
9. The grapheme-phoneme correspondences of English, 1: Graphemes beginning with consonant letters

© 2015 Greg Brooks, CC BY http://dx.doi.org/10.11647/OBP.0053.09

A reminder: in chapters 9 and 10, the meanings of ‘initial’, ‘medial’ and ‘final’ referring to positions in words are different from their meanings in chapters 3-7, which deal with the phoneme-grapheme direction: here they refer to positions in written words, since these chapters deal with the grapheme-phoneme direction. So, for instance, here the ‘magic <e>’ in split digraphs is described as being in word-final position, and consonant letters enclosed within split digraphs are in medial position.

9.0 Unwritten consonant phonemes

This is the appropriate place to recall that some occurrences of medial and linking /w/ (and possibly two of initial /w/) and a great many occurrences of medial and linking /j/ are not represented in the spelling at all – see sections 3.8.7-8. Following linguistic convention, these instances can be described as spelt by zero, hence the numbering of this section. Necessarily, as far as these cases are concerned, the rest is silence.

9.1 General introduction to the grapheme-phoneme correspondences

In chapters 9 and 10 I present the grapheme-phoneme correspondences from British English spelling to RP using the inventory of 284 graphemes listed in chapter 8. This chapter covers graphemes beginning with consonant letters, chapter 10 those beginning with vowel letters; for this purpose <y> counts as a vowel letter. This arrangement is followed even with graphemes which begin with a letter of one category but always or mostly have correspondences with phonemes of the other category, e.g. <ed> in past verb forms, <le> in table, etc., and those which have both consonant and vowel pronunciations, especially <i, u, y>.

The distinction between the main system and the rest which was arrived at in chapters 3 and 5 is maintained here, in mirror-image. That is, the only graphemes which are treated as part of the main system are the 89 listed in Table 8.3, and the only grapheme-phoneme correspondences which are treated as part of the main system are the converses of the 138 phoneme-grapheme correspondences involving those graphemes in the ‘main system’ columns of Tables 8.1-2; this principle is maintained even for correspondences whose frequencies in this direction are low.

Other grapheme-phoneme correspondences which involve the 89 main-system graphemes are treated as exceptions to the main system (even where their correspondences in this direction have high frequencies). Because some correspondences which are frequent in the phoneme-grapheme direction are rare in the grapheme-phoneme direction, and vice versa (which indicates both a mismatch between the two directions and therefore a basic misdesign in the overall system), in chapters 9 and 10 I have abandoned the distinction between frequent and rare correspondences for the main-system graphemes. However, most minor correspondences are again treated as Oddities. (For exceptions to the last statement, see sections 9.5 and 10.2).

Across chapters 9 and 10 all 89 graphemes of the main system are covered. However, there are only 76 entries, 13 fewer than the number of graphemes. Twelve of the missing entries are accounted for by the 12 principal doubled consonant spellings, each of which consists of two occurrences of the single letter which spells the same phoneme: <bb, dd, ff, gg, ll, mm, nn, pp, rr, ss, tt, zz>; the 13th arises because <dg, dge> share an entry. Since these ‘geminates’ have hardly any pronunciations different from the basic one of the corresponding single letter, and their two letters hardly ever belong to separate graphemes (for the only exception I know of, see Notes to section 9.15), each is covered within a joint entry with the single letter.

Otherwise, in both chapters the graphemes of the main system are listed in alphabetical order, with cross-references to show where those consisting of more than one letter are not covered under their initial letter. Minor graphemes are listed under the appropriate main-system grapheme, e.g. <bd> under <b>, <ae> under <a>, <err> under <er>, etc.

[For compulsive counters: 2- and 3-phoneme graphemes are treated differently in these two chapters from chapters 3 and 5. There, each such grapheme was logged under each of the relevant phonemes. Here, each such grapheme is logged only once, under its initial letter. But the total number of correspondences remains the same.]

9.2 When is a digraph not a digraph?

(Parallel questions apply to trigraphs and four-letter graphemes – see for example the competing possibilities for word-final <che> (section 9.9), the discussions of <gh> in the entry for <g, gg> (section 9.15), <ough> in the entry for <ou> (section 10.33), the discussion of vowel letters ‘in hiatus’ (section 10.42), and the paragraph beginning ‘When is a split digraph not a split digraph?’ in Appendix A, section A.6).

Some sequences of more than one letter which form main-system graphemes never or hardly ever occur except as those graphemes – a clear example is <ck>. Others have exceptions only at morpheme boundaries within words, e.g. <t, h> in a few words like carthorse, meathook, <o, o> in cooperate, zoology. Other main-system graphemes again occur only in restricted positions, so that all other occurrences of the same sequence of letters contain more than one grapheme – see for example <ce> (section 9.8). I attempt to give clear guidance related to each main-system grapheme (and in section 9.44 state a generalisation about the six graphemes other than <sh> which are pronounced /ʃ/), but in the end effectively have to assume that a human reader (as distinct from a computerised text-to-speech system) can recognise both morpheme boundaries within compound words and multi-letter graphemes within stem words. Carney (1994: 286-7) states the same assumption.

I also assume that readers of this book will realise that other occurrences of the sequences of letters which constitute minor multi-letter graphemes follow the general rules; therefore I do not waste space saying (for example) ‘Occurrences of <p, s> other than word-initially consist of separate graphemes’. Conversely, where a correspondence for a single letter is said to be ‘regular’, this does not include cases where the letter forms part of another grapheme; for example, ‘<c> is pronounced /s/ before <e, i, y>’ does not include its di/trigraphic occurrences in <ce, ci, sci>.

Both assumptions work better for graphemes beginning with consonant letters than for those beginning with vowel letters – but that is true of generalisations for the two sets of graphemes as a whole.

9.3 Frequencies

The frequencies in these chapters are derived from Gontijo et al. (2003). They used a corpus of 17.9 million words (the CELEX database, Version 2.5, Baayen et al., 1995) in which both the British spelling and (a computerised version of) the RP pronunciation of every one of the 160,595 different words is represented. (The authors do, however, point out that 2,887 lines (1.8%) in the database contain multi-word expressions, of which the longest is European Economic Community; hence the number of lines with unique single words is actually 157,708.) Gontijo et al. based their graphemic analysis on that of Berndt et al. (1987), which was based on a corpus of only 17,000 words in US spelling, but adapted it for British spelling and expanded it to deal with rarer graphemes as their analysis proceeded. Ultimately, Gontijo et al.’s database contained a set of 195 graphemes and 461 grapheme-phoneme correspondences. While these numbers are rather smaller than my overall totals of 284 graphemes and 543 correspondences (see section 8.2), most of the ‘missing’ graphemes and correspondences are rare and would only be found by a total spelling nerd (= me).

As will be apparent, Gontijo et al. used a different corpus from Carney. Also, unlike Carney, they did not lemmatise their corpus (= remove suffixes and reduce words to their stem forms); nor did they ignore high-frequency words like of, there, where. However, they did relate the number of occurrences of a grapheme to the number of times each word appeared in the database – that is, they calculated text frequencies rather than lexical frequencies – see the discussion in section 3.3. Even so, their frequencies are not the mirror-image of Carney’s. Producing mirror-image frequencies would require using exactly the same database, the same set of conventions (especially whether to lemmatise or not), and the same set of graphemes for the analyses in both directions. Such an analysis has yet to be undertaken.

Having established their sets of graphemes and correspondences, Gontijo et al. calculated the number of occurrences of each grapheme and its frequency within the whole database, and the frequency of every grapheme-phoneme correspondence as a subset of all the correspondences for the relevant grapheme. For example, they calculated that:

  • grapheme <a> accounted for 3,746,713 of the total of 67,590,620 grapheme occurrences
  • <a> therefore represented 5.55% of all the grapheme occurrences in the database
  • <a> pronounced /ə/ occurred 591,123 times, and that correspondence therefore represented 15.8% of the 3,746,713 correspondences for grapheme <a>.
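
The two kinds of percentage just described can be reproduced with simple division. The following is a minimal sketch in Python, not Gontijo et al.'s own procedure; the variable and function names are illustrative assumptions, and only the three counts are taken from the figures quoted above. Plain division of the quoted counts gives a grapheme share within about a hundredth of a percentage point of the quoted 5.55%.

```python
# Counts quoted above from Gontijo et al. (2003); all names here are illustrative.
total_grapheme_occurrences = 67_590_620
a_occurrences = 3_746_713
a_pronounced_schwa = 591_123


def percentage(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


# Frequency of the grapheme <a> within the whole database.
grapheme_share = percentage(a_occurrences, total_grapheme_occurrences)

# Frequency of the correspondence <a> -> /ə/ among all correspondences for <a>.
correspondence_share = percentage(a_pronounced_schwa, a_occurrences)

print(f"<a> as a share of all grapheme occurrences: {grapheme_share:.2f}%")
print(f"<a> pronounced /ə/ as a share of <a>: {correspondence_share:.1f}%")
```

The second figure is the kind of within-grapheme percentage reported in the entries of chapters 9 and 10.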

To arrive at the percentages presented in chapters 9 and 10, I have modified Gontijo et al.’s results in various ways. To give just two examples:

1) The way they (and Mountford, 1998) analysed word-final <e> resulted in far too many split digraphs, trigraphs, etc.; e.g. they treated <a, e> in collapse as an example of <a.e>. In my opinion, <a, e> here are better analysed as <a> pronounced /æ/ and <e> as part of <se> pronounced /s/;

2) Their system recognised too few graphemes ending in <r>, e.g. <air> in dairy is split into <ai> pronounced /eə/ and <r> pronounced /r/, whereas my analysis posits that the <r> in such cases is not only a grapheme in its own right spelling /r/ but also part of <air> spelling /eə/ – see sections 5.6.3, 7.1 and 10.6, and section A.8 in Appendix A.

Rather than listing all the differences between my calculations and Gontijo et al.’s, let me just say that, where I could, I have re-allocated sets of words and correspondences in accordance with my analysis, and then re-calculated the frequencies of the correspondences within graphemes.

The outcomes are that:

  • I give no percentages for a large number of minor graphemes, those which have only one pronunciation and for which it would be otiose to keep saying ‘100%’. This applies to 154 of the 195 minor graphemes across these two chapters
  • for the 41 minor graphemes with more than one pronunciation I give percentages only in the few cases where Gontijo et al.’s data provide them, otherwise not
  • I give separate percentages for the correspondences of as many main-system graphemes as possible, including (again, where Gontijo et al.’s data provide them) for the minor correspondences of such graphemes, e.g. under <ch>; for the main exceptions to this see the first paragraph of section 10.1.

9.4 The general picture: the regular pronunciations of English graphemes beginning with consonant letters

This chapter contains 38 main entries for graphemes beginning with consonant letters, in alphabetical order, even though Table 8.1 lists 58 graphemes spelling consonant phonemes in the main system. The reasons for the discrepancy are:

  • as mentioned above, the 12 geminate spellings have joint entries with the single letters, and <dg, dge> have a joint entry
  • all the correspondences for <ed, ew, i, u, ue, u.e, y>, consonantal, vocalic and 2-phoneme, are covered in chapter 10.
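
For compulsive counters, the arithmetic behind these figures can be checked mechanically. This is a small sketch in Python whose variable names are my own; the counts and grapheme lists are those stated in this chapter.

```python
# Counts and lists as stated in the text; variable names are illustrative only.
consonant_graphemes_in_table_8_1 = 58
covered_in_chapter_10 = ["ed", "ew", "i", "u", "ue", "u.e", "y"]
geminates = ["bb", "dd", "ff", "gg", "ll", "mm", "nn", "pp", "rr", "ss", "tt", "zz"]
shared_entries = 1  # <dg, dge> have a joint entry

# Main-system graphemes left for this chapter.
graphemes_in_chapter_9 = consonant_graphemes_in_table_8_1 - len(covered_in_chapter_10)

# Geminates share an entry with the single letter, and <dge> shares one with <dg>.
main_entries = graphemes_in_chapter_9 - len(geminates) - shared_entries

# The same 13 joint entries explain the gap between 89 graphemes and 76 entries overall.
entries_across_both_chapters = 89 - len(geminates) - shared_entries

print(graphemes_in_chapter_9, main_entries, entries_across_both_chapters)
```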

For the 51 main-system graphemes covered in this chapter, the general picture can be summed up as follows:

  • The 21 graphemes listed in Table 9.1 have only one pronunciation each (except for one tiny exception under <b>):

Table 9.1: 21 main-system consonant graphemes with only one pronunciation each.

These graphemes     are always pronounced as these phonemes
b, bb               /b/
ck                  /k/
dd                  /d/
dg, dge             /ʤ/
ff                  /f/
k                   /k/
mm                  /m/
nn                  /n/
p, pp               /p/
q                   /k/
r, rr               /r/
sh                  /ʃ/
ssi *               /ʃ/
tch                 /ʧ/
tt                  /t/
ve *                /v/
w                   /w/

* For these graphemes, the statement that they have only one pronunciation each involves defining the circumstances in which they constitute separate graphemes carefully; the rest are pronounced as shown in all positions in the word where they occur – this qualification is needed to recognise that several do not occur initially and others do not occur finally; all 21 occur medially.

  • The 20 graphemes listed in Table 9.2 have only one frequent pronunciation each:

Table 9.2: 20 main-system consonant graphemes with only one frequent pronunciation each.

These graphemes     are mostly pronounced as these phonemes
ch                  /ʧ/
ci *                /ʃ/
d                   /d/
f (ignoring of)     /f/
gg                  /g/
h                   /h/
j                   /ʤ/
l, ll               /l/
le *                /əl/
m                   /m/
ng                  /ŋ/
nn                  /n/
ph                  /f/
ss                  /s/
ti                  /ʃ/
v                   /v/
wh                  /w/
z, zz               /z/

* For these graphemes, the statement that they have only one frequent pronunciation each involves defining the circumstances in which they constitute separate graphemes carefully; the rest are pronounced as shown in all positions in the word where they occur – this qualification is needed to recognise that several do not occur initially and others do not occur finally; all 20 occur medially.

  • The nine graphemes listed in Table 9.3 have two main pronunciations each, and the circumstances in which the two pronunciations occur can be defined quite closely:

Table 9.3: Nine main-system consonant graphemes with two regular pronunciations each.

This grapheme       has these two main pronunciations
c                   /k, s/
ce                  /s, ʃ/
g                   /g, ʤ/
n                   /n, ŋ/
se                  /z, s/
si                  /ʒ, ʃ/
t                   /t, ʧ/
th                  /θ, ð/
x                   /ks, z/

<s> is the only main-system grapheme beginning with a consonant letter which is a major problem: it is mainly pronounced /s/ but has lots of exceptions (mainly where it is pronounced /z/) for which no rules can be stated, especially in medial position.

This means that 41 of these 51 graphemes have only one, or only one frequent, pronunciation, and the other 10 have only two main pronunciations each; none have more than two main-system pronunciations.

For completeness, it should also be noted that many minor consonant graphemes also have highly predictable pronunciations, e.g. word-final <que>. In fact, of the 107 graphemes beginning with consonant letters that are outside the main system, only 12 <cc che cz gh gn mn nd phth sc sch te xh> have more than one pronunciation. In any attempt (not made here) to estimate the overall regularity of the system this would need to be taken into account. However, many minor graphemes are so rare that they would not affect the regularity calculation unless they occur in high-frequency words.

To complete the picture for graphemes beginning with consonant letters, Table 9.4 lists all 51 of them and shows their main-system and minor correspondences and numbers of Oddities. Table 9.4 is almost but not quite the mirror-image of Table 8.1 because:

  • graphemes which begin with consonant letters but vowel phonemes (e.g. <ho> in honest) are included here;
  • graphemes which begin with vowel letters but consonant phonemes (e.g. <ue> pronounced /juː/) are not included here but in Table 10.1.

Table 9.4: Main-system graphemes beginning with consonant letters, by main-system and minor correspondences and numbers of Oddities.

Main system

The rest

Grapheme

Basic phoneme

Other main-system correspondences

Exceptions to main system (minor correspondences)

Number of Oddities * which the grapheme ‘leads’

b

/b/

/p/

6

bb

/b/

c

/k/

/s/

/ʃ ʧ/

12

ce

/s/

/ʃ/

ch

/ʧ/

/k ʃ ʤ/

3

ci

/ʃ/

/ʧ ʒ/

ck

/k/

1

d

/d/

/ʤ/

7

dd

/d/

1

dg

/ʤ/

dge

/ʤ/

f

/f/

/v/

2

ff

/f/

1

g

/g/

/ʤ/

/k ʒ/

12

ge

/ʤ/

/ʒ/

gg

/g/

/ʤ/

h

/h/

/j/

5

j

/ʤ/

/j ʒ h/

1

k

/k/

4

l

/l/

/əl/

1

le

/əl/

/l/

ll

/l/

/j lj/

1

m

/m/

/əm/

5

mm

/m/

1

n

/n/

/ŋ/

/ən/

7

ng

/ŋ/

/n/ŋk/

3

nn

/n/

1

p

/p/

5

ph

/f/

/p v/

2

pp

/p/

2

q

/k/

2

r

/r/

3

rr

/r/

2

s

/s/

/z ʒ/

/ʃ/

12

se

/s/

/z/

3

sh

/ʃ/

si

/ʒ/

/ʃ/

/z/

ss

/s/

/ʃ z/

1

ssi

/ʃ/

t

/t/

/ʧ/

/ʃ s/

5

tch

/ʧ/

th

/ð/

/θ/

/t ʧ tθ/

2

ti

/ʃ/

/ʧ ʒ/

tt

/t/

1

v

/v/

/f/

1

ve

/v/

w

/w/

2

wh

/w/

/h/

x

/ks/

/z k gz kʃ gʒ eks/

4

z

/z/

/s ʒ ts/

2

zz

/z/

/ts/

Total

51

12

49

123

51

63

172

Grand total of correspondences: 235

* including 2- and 3-phoneme pronunciations and doubled spellings which are not part of the main system.
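
The totals in Table 9.4 can be cross-checked as follows. This is a sketch in Python with illustrative names; the per-grapheme Oddity counts are read off the table above, and graphemes with no Oddities are simply omitted from the dictionary.

```python
# Column totals as given in the last rows of Table 9.4.
basic_phonemes = 51
other_main_system = 12
exceptions = 49

# Number of Oddities which each grapheme 'leads', as listed in Table 9.4.
oddities = {
    "b": 6, "c": 12, "ch": 3, "ck": 1, "d": 7, "dd": 1, "f": 2, "ff": 1,
    "g": 12, "h": 5, "j": 1, "k": 4, "l": 1, "ll": 1, "m": 5, "mm": 1,
    "n": 7, "ng": 3, "nn": 1, "p": 5, "ph": 2, "pp": 2, "q": 2, "r": 3,
    "rr": 2, "s": 12, "se": 3, "ss": 1, "t": 5, "th": 2, "tt": 1, "v": 1,
    "w": 2, "x": 4, "z": 2,
}

total_oddities = sum(oddities.values())
main_system = basic_phonemes + other_main_system  # correspondences in the main system
the_rest = exceptions + total_oddities            # correspondences outside it
grand_total = main_system + the_rest

print(total_oddities, main_system, the_rest, grand_total)
```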

9.5 Order of description

In most of the 38 main entries in this chapter I list the items in this order:

1) The basic phoneme. In my opinion, each of these graphemes has a basic phoneme, the one that seems most natural as its pronunciation. Where the basic phoneme is the only pronunciation of the grapheme it is labelled ‘Only phoneme’. Where a geminate spelling always or mostly has the same pronunciation as the single letter they are shown together. However, there are five geminate spellings which are minor graphemes: <cc, jj, kk, vv, ww> - these are listed under Oddities below the single letter. <hh> occurs too, but only at the morpheme boundary in compound words, e.g. witchhunt, and <q, x> appear doubled only in brand names or foreign words. These three are therefore mentioned only to exclude them.

2) Any other phoneme which counts as a main-system pronunciation of the grapheme, as defined above. Where there are no such phonemes this subheading is omitted.

These two categories constitute the main system for grapheme-phoneme correspondences for graphemes beginning with consonant letters. Correspondences in the main system are shown in 9-point type, the rest in smaller 7.5-point type.

3) Any doubled-letter grapheme which is not part of the main system (this sub-heading is also omitted where it is not relevant).

4) Exceptions to the main system, including any 2- or 3-phoneme correspondences for the main grapheme(s). The reason for listing exceptions to the main system separately from the Oddities is that this is the clearest way of showing where the main rules break down.

5) The geminate spelling plus final <e>, if it occurs. Where it might but does not, I say so; elsewhere I omit this heading.

6) Oddities, minor graphemes which begin with the letter(s) of the main grapheme and occur only in restricted sets of words.

7) Any 2- or 3-phoneme graphemes which include, but do not have entirely the same spelling as, the main grapheme. Almost all the 2- and 3-phoneme graphemes are also Oddities, but a few belong to the main system and are included there.

Most entries end with Notes, and two (<s, se>) have Tables.

The only exceptions to this ordering are 15 of the graphemes which have only one pronunciation each: <b, bb, ck, dg, dge, k, p, pp, q, r, rr, sh, ssi, tch, ve>. Under each of these there is just one heading, ‘Only phoneme’, and it is automatically part of the main system without having to be so labelled; however, most of these entries have Notes. The other 6 graphemes which have only one pronunciation each (<dd, ff, mm, nn, tt, w>) have/are within more extended entries.

Where a grapheme cannot appear in all of initial, medial and final positions there is usually a note to this effect at the head of its entry, with this exception: because doubled consonant spellings never occur word-initially (except <ll> in llama, llano), the headings where doubled spellings appear are not labelled to this effect.

9.6 <b, bb>

The main system

Only phoneme (almost)

/b/

100%

e.g. rabid, rabbit

The rest

pronounced

Exception to main system

<b>

/p/

only in presbyterian pronounced /prespɪˈtɪəriːjən/ (also pronounced /prezbɪˈtɪəriːjən/), where the <b> devoices to /p/ if the <s> is pronounced /s/

Word-final doubled letter + <e>

(does not occur)

Oddities

<bd>

/d/

only in bdellium /ˈdeliːjəm/

<bh>

/b/

only in abhor(red) /əˈbɔː(d)/, abhorrent /əˈbɒrənt/, bhaji, bhang(ra), bhindi, Bhutan and a few other rare words from the Indian sub-continent. <b, h> are usually separate graphemes at a morpheme boundary, as in clubhouse, subheading

<bp>

/p/

only in subpoena /səˈpiːnə/

<bt>

/t/

only in debt, doubt, subtle. /b/ surfaces in debit, indubitable, subtility – see section 7.2

<bu>

/b/

only in build, buoy, buy

<bv>

/v/

only in obvious pronounced /ˈɒviːjəs/

2-phoneme graphemes

(none)

Note

For <ba> in syllabary, and for <be> in deliberate, gooseberry /ˈgʊzbriː/, liberal, raspberry /ˈrɑːzbriː/, strawberry /ˈstrɔːbriː/, see section 6.10.

9.7 <c>

N.B. <ce, ch, ci, ck, tch> have separate entries.

The main system

Basic phoneme

/k/

67%

e.g. cat. Regular before <a, o, u> and consonant letters

Other phoneme

/s/

30%

e.g. city. Regular before <e, i, y>

The rest

Exceptions to main system

pronounced

3% in total

<c>

/k/

before <e, i, y> only in arced, arcing, Celt, Celtic (but the Glasgow football team is /ˈseltɪk/), sceptic, synced, syncing (which means that the spelling synch for this verb is better), and words beginning encephal- pronounced /eŋkefəl-/ (also pronounced with /ensefəl-/). Also, in July 2006 the superlative adjective chicest /ˈʃiːkɪst/ appeared on a magazine cover – the comparative would presumably be chicer

<c>

/s/

other than before <e, i, y> only in apercu, facade (lacking their French cedillas)

<c>

/ʃ/

only in officiate, speciality, specie(s), superficiality and sometimes ap/de-preciate, associate. See Notes

<c>

/ʧ/

only in cellist, cello, cicerone (twice), concerto (second <c>)

Word-final doubled letter + <e>

(does not occur; in recce <cc, e> are separate graphemes)

Oddities

<cc>

/ks/

almost 100% before <e, i, y>, where (following the general rules for <c> above) the two letters are separate graphemes, e.g. accent, occiput, coccyx. This entry, with 2 graphemes corresponding separately to 2 phonemes, strictly speaking does not belong in this book based on correspondences to and from single graphemes, but it has to be included for clarity over the single-phoneme correspondences of <cc> in the next four paragraphs; <cc> pronounced /ks/ is not counted in the overall totals of correspondences

<cc>

/ʧ/

before <e, i> only in bocce, cappuccino. There are no occurrences of <cc> pronounced /ʧ/ before <y>

<cc>

/s/

before <i> only in flaccid, succinct pronounced /ˈflæsɪd, səˈsɪŋkt/ (also pronounced (regularly) /ˈflæksɪd, səkˈsɪŋkt/). There are no occurrences of <cc> pronounced /s/ before <e, y>

<cc>

/k/

before <e, i, y> only in baccy, biccy, recce /ˈrekiː/ (short for reconnoitre), soccer, speccy, streptococci

<cc>

/k/

100% before <a, o, u>, e.g. occasion, account, occur

<cch>

/k/

only in bacchanal, Bacchante, bacchic, ecchymosis, gnocchi, saccharide, saccharine, zucchini

<cq>

/k/

only in acquaint, acquiesce, acquire, acquisitive, acquit, with the <u> being pronounced /w/

<cqu>

/k/

(not /kw/) only in lacquer, picquet, racquet

<ct>

/t/

only in Connecticut, indict, victualler, victuals. /t/ surfaces in indiction – see section 7.2

<cu>

/k/

only in biscuit, circuit

<cz>

/ʧ/

only in czardas /ˈʧɑːdæʃ/, Czech /ʧek/

<cz>

/z/

only in czar(ina) /zɑː(ˈriːnə)/

2-phoneme graphemes

(none, but see <cc> pronounced /ks/ under Oddities)

Notes

Given the small numbers of words in which the major correspondences do not apply, those two correspondences stated context-sensitively mean that pronunciations of <c> as a single-letter grapheme are 97% predictable.
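
The two context-sensitive correspondences can be written as a single rule. The sketch below, in Python, is only an illustration under stated assumptions: the function name is hypothetical, the caller is assumed to have already established that the <c> is a single-letter grapheme (not part of <ce, ch, ci, ck, tch>), and word-final <c> is assumed to follow the /k/ pattern, which the entry above does not state explicitly.

```python
# Letters before which single-letter <c> is regularly pronounced /s/.
SOFT_CONTEXT = set("eiy")


def regular_c(word: str, index: int) -> str:
    """Predict the main-system phoneme for single-letter <c> at word[index].

    Assumes the caller has already excluded <c> inside multi-letter graphemes.
    """
    word = word.lower()
    if word[index] != "c":
        raise ValueError("no <c> at that position")
    following = word[index + 1: index + 2]  # empty string when <c> is word-final
    # /s/ before <e, i, y>; /k/ before <a, o, u> and consonant letters.
    # Word-final <c> is assumed here to be /k/ as well.
    return "s" if following in SOFT_CONTEXT else "k"


print(regular_c("cat", 0), regular_c("city", 0), regular_c("cycle", 2))
# The rule is expected to mispredict the listed exceptions, e.g. Celt:
print(regular_c("Celt", 0))
```

For the exceptions listed under ‘The rest’ the rule gives the wrong answer by design, which is the sense in which the pronunciation of single-letter <c> is 97%, not 100%, predictable.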

Medial <c> pronounced /ʃ/ is always followed by <i(e)>, but the <i(e)> is a separate grapheme pronounced /iː/. Some of the relevant words have alternative pronunciations with /s/, e.g. appreciate as /əˈpriːʃiːjeɪt/ or /əˈpriːsiːjeɪt/, associate as /əˈsəʊʃiːjeɪt/ or /əˈsəʊsiːjeɪt/ (taking associate as a verb; the noun of the same spelling ends in /ət/), species as /ˈspiːʃiːz/ or /ˈspiːsiːz/. However, when verbs ending in <-ciate> are nominalised with the suffix /ən/ spelt <-ion>, which compulsorily changes the final /t/ of the verb to medial /ʃ/, in many RP-speakers’ accents a phonological constraint seems to operate against medial /ʃ/ occurring twice; for example appreciation, association are pronounced /əpriːsiːˈjeɪʃən, əˈsəʊsiːˈjeɪʃən/, not /əpriːʃiːˈjeɪʃən, əˈsəʊʃiːˈjeɪʃən/.

For <ca> in adverbs ending <-ically>, which is always pronounced /ɪkliː/, apothecary and forecastle pronounced /ˈfəʊksəl/, and for <co> in chocolate, decorative, see section 6.10.

9.8 <ce>

Never initial.

The main system

For both categories and for estimated percentages see Notes.

Basic phoneme

/s/

except in a few suffixed forms (see section 6.4), only word-final, e.g. fence, once, voice. In final position there is only one exception

Other phoneme

/ʃ/

never initial; word-finally only in liquorice pronounced /ˈlɪkərɪʃ/ (also pronounced /ˈlɪkərɪs/); otherwise only medial: regular in the ending <-aceous> pronounced /ˈeɪʃəs/, e.g. cretaceous, curvaceous, herbaceous, sebaceous and about 100 other words, mostly scientific and all very rare, plus cetacean, crustacea(n), Echinacea, ocean, siliceous

The rest

pronounced

Exception to main system

word-final <ce>

/ʃ/, not /s/

only in liquorice pronounced /ˈlɪkərɪʃ/

Oddities

(none)

2-phoneme graphemes

(none)

Notes

Gontijo et al. (2003) do not recognise word-final <ce> as a separate grapheme, so give data only for its medial occurrences. However, it is clear that in both of the very restricted circumstances where it is a separate grapheme <ce> is virtually 100% regular.

In all unsuffixed words with medial <ce> as a digraph the stress falls on the vowel preceding /s/ spelt <ce>, and that vowel is spelt with a single letter which has its letter-name pronunciation (only exception: siliceous /sɪˈlɪʃəs/).

In many words, word-final <e> after <c> following a single vowel letter is also part of a split digraph with the vowel letter; see the entries for the six split digraphs in chapter 10, sections 10.4/17/24/28/38/40. However, in some words the vowel letter preceding <ce> is a separate grapheme with its ‘short’ pronunciation, e.g. practice; for these exceptions also see the sections just cited.

In all cases other than those defined above, <c, e> are separate graphemes; in particular, note oceanic /əʊsiːˈjænɪk/, panacea /pænəˈsiːjə/. Word-final <c, e> are separate graphemes only in fiance, glace (now increasingly spelt even in English text with French <é>).

9.9 <ch>

N.B. <tch> has a separate entry.

The main system

Basic phoneme

/ʧ/

87%

e.g. chew, detach

The rest

pronounced

Exceptions to main system

<ch>

/k/

10%

regular (no exceptions) before a consonant letter, e.g. aurochs, chlamydia, chloride, chlorine, chrism, Christ(ian(ity)), Christmas, Christopher, chrome, chromosome, chronic and every other word beginning <chron->, chrysalis, chrysanthemum, drachma, lachrymose, ochre, pinochle, pulchritude, sepulchre, strychnine, synchronise, technical, technique; also in many words of Greek origin, e.g. amphibrach, anarchy, anchor, archaic and every other word beginning <arch-> where the next letter is a vowel letter (exceptions: arch-enemy, archer, with /ʧ/), brachial, brachycephalic, bronchi(al/tis), catechis-e/m, chalcedony, chameleon, chaos, character, charisma, chasm, chemical, chemist, chiasma, chimera, chiropody (also pronounced with initial /ʃ/), choir, cholesterol, cholera, choral, chord, choreography, chorus, chyle, chyme, cochlea, diptych, distich, echo, epoch, eschatology, eucharist, eunuch, hierarch(y) and every other polysyllabic non-compound word ending <-arch(y)>, hypochondriac, ichor, lichen pronounced /ˈlaɪkən/ (also pronounced /ˈlɪʧən/), machination, malachite, mechani-c/sm, melanchol-y/ic, orchestra, orchid, pachyderm, parochial, pentateuch, psyche and all its derivatives, scheme, schizo and all its derivatives, scholar, school, stochastic, stomach, synecdoche, trachea, triptych, trochee. Words of non-Greek origin in this group are ache, baldachin, chianti, chiaroscuro, cromlech, Czech, masochist, Michael, mocha, oche, scherzo, schooner; also broch, loch, pibroch, Sassenach when pronounced with /k/ rather than Scots /x/. See Notes

<ch>

/ʃ/

2%

phonemically and orthographically word-finally only in (Germanic) milch, mulch, Welch; otherwise only in about 50 words of mainly French origin, namely (initially) chagrin, chaise, chalet, chamois, champagne, chancre, chandelier, chaperone, charabanc, charade, charlatan, Charlotte, chassis, chateau, chauffeu-r/se, chauvinism, chef, chemise, chenille, cheroot, chevalier, chevron, Chicago, chi-chi (twice), chic(ane(ry)), chiffon, chignon, chivalr-ic/ous/y, chute; also sometimes in (Greek) chiropody (hence the punning shop name Shuropody); (medially) attache, brochure, cachet, cachou, cliche, crochet, duchesse, echelon, embouchure, Eustachian, machete, machicolation, machine, marchioness, nonchalant, parachute, pistachio, recherche (twice), ricochet, ruching, sachet, touche; (phonemically but not orthographically word-finally) fiche, gouache, moustache, niche pronounced /niːʃ/ (also pronounced /nɪʧ/), pastiche, quiche, ruche. Contrast word-final <che> pronounced /ʃ/ and word-final <ch, e> as separate graphemes, below

<ch>

/ʤ/

1%

only in ostrich, sandwich, spinach pronounced /ˈɒstrɪʤ, ˈsæmwɪʤ, ˈspɪnɪʤ/

Oddities

<che>

/ʃ/

only in barouche and about 13 words of French origin, namely (medially) only rapprochement; (finally) avalanche, blanche, brioche, cache, cartouche, cloche, creche, douche, farouche, gauche, louche, panache. In all these words the final <e> is irrelevant to the pronunciation of the preceding vowel grapheme. Contrast the words where word-final <e> after <ch> is instead part of a split digraph (ache and fiche … ruche two paragraphs above) and word-final <ch, e> as separate graphemes, below

<che>

/ʧ/

only in niche pronounced /nɪʧ/ (also pronounced /niːʃ/)

<chs>

/ʃ/

only in fuchsia /ˈfjuːʃə/

2-phoneme graphemes

(none)

Notes

There are a few cases in which word-final <ch, e> constitute two graphemes rather than one: attache, cliche, recherche, touche with /ʃeɪ/ (sometimes spelt even in English text with French <é>), menarche, oche, psyche, synecdoche with /kiː/, but there appear to be no cases at all in which <c, h> are separate graphemes.

<ch> is also sometimes pronounced /x/ as in Scots broch, dreich, loch, Sassenach and German-style pronunciations of names like Schumacher, but I have not included this correspondence in my analysis because /x/ is not a phoneme of RP.
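
The one fully regular context in this entry, /k/ before a consonant letter, can be combined with the basic phoneme in a rough guesser. The following Python sketch is an illustration only: the function name and the small exception dictionary are hypothetical (the dictionary holds just a few words copied from the lists above), and the sketch assumes a stem word, since it cannot recognise morpheme boundaries in compounds or the separate graphemes <tch, che, chs>.

```python
VOWEL_LETTERS = set("aeiouy")  # <y> counts as a vowel letter in these chapters

# A tiny sample of the listed exceptions, not the full lists.
LISTED_EXCEPTIONS = {
    "chaos": "k",     # Greek origin: /k/ before a vowel letter
    "chef": "ʃ",      # French origin
    "machine": "ʃ",
    "sandwich": "ʤ",
}


def guess_ch(word: str) -> str:
    """Guess the phoneme spelt by the first <ch> in a stem word."""
    word = word.lower()
    if word in LISTED_EXCEPTIONS:
        return LISTED_EXCEPTIONS[word]
    position = word.find("ch")
    if position == -1:
        raise ValueError("no <ch> in word")
    following = word[position + 2: position + 3]  # empty when <ch> is word-final
    if following and following not in VOWEL_LETTERS:
        return "k"  # regular before a consonant letter
    return "ʧ"      # basic phoneme everywhere else


for example in ["chew", "detach", "chloride", "technical", "chef"]:
    print(example, guess_ch(example))
```

Any realistic use would need the complete exception lists given above, which is precisely why those lists are set out in full.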

9.10 <ci>

Only medial.

The main system

Basic phoneme

/ʃ/

100%

regular when both preceded and followed by vowel letters, e.g. audacious, magician, specious. Extension: commercial, where the preceding <er> digraph nevertheless spells a (long) vowel phoneme. See also Notes

The rest

pronounced

Exceptions to main system

<ci>

/ʧ/

only in ancient /ˈeɪnʧənt/, ciabatta /ʧəˈbætə/

<ci>

/ʒ/

only, exceptionally but increasingly, in coercion pronounced /kəʊˈwɜːʒən/ (usually pronounced /kəʊˈwɜːʃən/)

Oddities

(none)

2-phoneme graphemes

(none)

Notes

In most cases the stress falls on the vowel preceding /ʃ/ spelt <ci>, and that vowel is spelt with a single letter which has its letter-name pronunciation. Exceptions: if the preceding vowel letter is <i> it is pronounced /ɪ/, e.g. magician; also precious, special with /e/.

In all other cases, <c, i> are separate graphemes.

9.11 <ck>

Never initial.

The main system

Only phoneme

/k/

100%

e.g. black

The rest

pronounced

Exceptions to main system

(none)

Oddity

<ckgu>

/g/

only in blackguard /ˈblægəd, ˈblægɑːd/

2-phoneme graphemes

(none)

Note

The only word in which <c, k> belong to separate morphemes and therefore graphemes seems to be acknowledge, and even there the phoneme is /k/. This counts as a curious ‘surfacing’ sound – see section 7.2.

9.12 <d, dd>

N.B. <dg, dge> have a separate entry. <ed>, as in past tense and participle verb forms, has a separate entry in chapter 10, section 10.15.

The main system

Basic phoneme

/d/

<d> 99%, <dd> 100%

e.g. bud, buddy

The rest

pronounced

Exceptions to main system

<d>

/ʤ/

1% of correspondences for <d>. Never word-final; regular initially and medially before <u> followed by another vowel letter or <r>, e.g. arduous, assiduous, (in)credulous, deciduous, dual/duel (cf. the homophone jewel), due (cf. the homophones dew, Jew), duet, duke, dune, dupe, duty, education, graduate pronounced either /ˈgræʤuːwət/ (noun) or /ˈgræʤuːweɪt/ (verb), durable, duration, duress, during, endure, fraudulen-ce/t, glandular, modul-e/ar, nodul-e/ar, pendulum, sedulous, procedure, verdure (cf. the homophone verger); also in gradual, individual, residual whether pronounced with /ʤuːwəl/ or /ʤəl/ (for the elision of the <u> see section 6.10). Also in a few words before <eu, ew>: deuce (cf. the homophone juice), various words beginning with (Greek) deuter-, dew (cf. the homophones due, Jew), grandeur. See Notes

Word-final doubled letter + <e>

(does not occur)

Oddities

<ddh>

/d/

only in Buddha and derivatives, saddhu

<de>

/d/

only in aide, blende, blonde, horde and in bade, forbade (past tenses of bid, forbid) pronounced /bæd, fəˈbæd/ (also pronounced /beɪd, fəˈbeɪd/)

<dh>

/d/

only in a few loanwords from the Indian subcontinent, e.g. dhobi, dhoti, dhow, Gandhi, jodhpurs, sandhi, Sindh

<di>

/ʤ/

only in cordial pronounced /ˈkɔːʤəl/ (also pronounced /ˈkɔːdiːjəl/), incendiary, intermediary, stipendiary, subsidiary pronounced with /ʤəriː/, soldier

<dj>

/ʤ/

only in about 10 words containing the (Latin) prefix <ad->: adjacent, adjective, adjoin, adjourn, adjudge, adjudicate, adjunct, adjure, adjust, adjutant, plus djinn

<dne>

/n/

only in Wednesday

<dt>

/t/

only in veldt

2-phoneme graphemes

(none)

Notes

For <da> in dromedary, lapidary, laudanum, legendary, secondary, <de> in broadening, considerable, gardener, launderette, widening and <di> in medicine see section 6.10.

All the words in which <d> is pronounced /ʤ/ were formerly pronounced with the sequence /dj/, and conservative RP-speakers may still pronounce them that way (or imagine they do). Pronunciations with /dj/ would require an analysis with the <d> pronounced /d/ and the /j/-glide as part of the pronunciation of the <u> and following <r> or vowel letter. See <t>, section 9.33, for the largely parallel correspondence to voiceless /ʧ/, and <di> in the Oddities.

9.13 <dg, dge>

Only phoneme

/ʤ/

100%

e.g. badger, bridge, bridging, curmudgeon

Note

There seem to be no cases where <d, g(e)> are separate graphemes except at morpheme boundaries, e.g. headgear.

N.B. <ed> Though this grapheme has mainly consonant pronunciations, because it begins with a vowel letter it is covered in chapter 10, section 10.15.

9.14 <f, ff>

For percentages see Notes.

The main system

Basic phoneme

/f/

<f>

e.g. full. 100% provided of is treated as a special case

<ff>

100%. e.g. cliff, staff

Other phoneme for <f>

/v/

only in of and roofs pronounced /ruːvz/

The rest

pronounced

Exceptions to main system

(none)

Word-final doubled letter + <e>

<ffe>

/f/

only in gaffe, giraffe, pouffe; also in usual pronunciation of different, difference, sufferance (but not afferent, efferent) – see also section 6.10

Oddities

<fe>

/f/

only in carafe and some instances of elided vowels – see Notes

<ft>

/f/

only in often, soften

2-phoneme graphemes

(none)

Notes

Gontijo et al. (2003) found that 88% of all occurrences of <f> in their database were <f> pronounced /v/ in of, and only 12% were <f> pronounced /f/ in other words, but this is thoroughly misleading. Provided <f> in of is recognised as a special case (and roofs pronounced /ruːvz/ is rare), all other graphemes beginning <f> are pronounced /f/, = 100% predictable.

For <(f)fe> in cafetiere, conference, deference, difference, different; offering, preferable, preference, sufferance, <fi> in definitely, <for> in comfortable, <fu> in beautifully, dutifully see section 6.10.

9.15 <g, gg>

N.B. <dg(e), ge, ng> have separate entries. The entry for <ng> also covers all the cases where <n> before <g> is a separate grapheme.

The main system

Basic phoneme

/g/

<g> 71%, <gg> 70%

e.g. game, braggart, egg. Regular except for <g> before <e, i, y>, but see the exceptions. Also see Notes

Other phoneme for <g>

/ʤ/

28% of correspondences for <g>

Regular before <e, i, y>. See Notes

The rest

pronounced

Exceptions to main system

exceptions for <g> are 1% of its correspondences in total

<g>

/g/

before <e, i, y> in auger, beget, bogie, bogey, conger, eager, finger, fogey, gear, gecko, geese, gel (/gel/, conservative pronunciation of girl; contrast gel ‘viscous liquid’ pronounced /ʤel/), geld, get, geyser, hegemon-y/ic, laager, lager, monger and all its compounds, renege (for this word see also <e.e>, section 10.17, and Notes to next section), target (contrast parget, with regular /ʤ/), tiger, together; anthropophagi, begin, giddy, gill (‘lung of fish’; contrast gill ‘quarter of a pint’ pronounced /ʤɪl/ and see Notes), gillie (also spelt ghillie), gilt, gimbal(s) (also pronounced with /ʤ/), gimlet, gimp, gird, girdle, girl, girn, girt, girth, give, gizzard, yogi and first <g> in gig, giggle, gingham, gynaecology

<g>

/ʤ/

not before <e, i, y> only in gaol, margarine (also pronounced with /g/), Reg, veg, and second <g> in mortgagor

<g>

/k/

only in length, lengthen, strength, strengthen pronounced /leŋkθ, ˈleŋkθən, streŋkθ, ˈstreŋkθən/ (also pronounced /lenθ, ˈlenθən, strenθ, ˈstrenθən/) - for the rationale of this analysis see Notes under /ŋ/, section 3.8.2 – and in angst /æŋkst/, disguise /dɪsˈkaɪz/, disgust pronounced /dɪsˈkʌst/, i.e. identically to discussed; disguise, disgust are also pronounced /dɪzˈgaɪz, dɪzˈgʌst/, i.e. with <s, g(u)> both voiced rather than voiceless

<g>

/ʒ/

initially, only in genre, gilet; medially, only in aubergine, conge, dirigiste, largesse, negligee, protege, regime, tagine and lingerie pronounced /ˈlænʒəriː/ (also pronounced /ˈlɒnʤəreɪ/)

<gg>

/ʤ/

30% of correspondences for <gg>, but occurs only in arpeggio, exaggerate, loggia, Reggie, suggest, veggie, vegging. See Notes

Word-final doubled letter + <e>

(does not occur)

Oddities

<gh>

/f/

75% of pronunciations for <gh>, but see Notes. Medially, only in draught, laughter; otherwise only word-final and only in chough, cough, enough, laugh, rough, slough (‘shed skin’), sough, tough, trough

<gh>

/g/

25% of pronunciations for <gh>, but see Notes. Word-final only in ugh; otherwise only in afghan, aghast, burgher, ghastly, ghat, ghee, gherkin, ghetto, ghillie (also spelt gillie), ghost, ghoul, ogham, sorghum and a few more rare words

<gh>

/k/

only in hough /hɒk/

<gh>

/p/

only in misspelling of hiccup as *hiccough

<gi>

/ʤ/

only in allegiance, collegial, contagio-n/us, egregious, legion, litigious, plagiaris-e/m, prestigious, region, religio-n/us, vestigial

<gl>

/l/

only in a few Italian loan words, namely imbroglio, intaglio, seraglio, tagliatelle

<gm>

/m/

only in apophthegm, diaphragm, epiphragm, paradigm, phlegm, syntagm. /g/ surfaces in paradigmatic, phlegmatic, syntagma(tic) – see section 7.2

<gn>

/n/

only in (initially) gnarl, gnash, gnat, gnaw, gneiss, gnome, gnosis, Gnostic, gnu (only exception: gnocchi, with /nj/, though gnu could also be analysed that way, with <gn> pronounced /nj/ and <u> pronounced /uː/ rather than /juː/ - take your pick); (medially) cognisance (also pronounced with /gn/), physiognomy, recognise pronounced /ˈrekənaɪz/ (usually pronounced /ˈrekəgnaɪz/); (finally) align, arraign, assign, benign, campaign, coign, condign, consign, deign, design, ensign, feign, foreign, impugn and a few other very rare words in –pugn, malign, reign, resign, sign, sovereign, thegn; also phonemically word-final in champagne, cologne where the final <e> is part of a split digraph with the letter before the <g>. /g/ surfaces in agnostic, diagnosis, prognosis, malignant, pugnacious, repugnant, assignation, designation, resignation, signal, signature - see section 7.2. For exceptions to <gn> pronounced /n/ see the 2-phoneme grapheme below

<gne>

/n/

only word-final and only in cockaigne, epergne, frankalmoigne /kəˈkeɪn, ɪˈpɜːn, ˈfræŋkælmɔɪn/. In soigne /swaːˈnjeɪ/ <gn, e> are separate graphemes

<gu>

/g/

only in (initially) guarantee, guard, guerrilla, guess, guest, guide, guild, guilder, guile, guillemot, guillotine, guilt, guinea, guise, guitar, guy and a few more rare words; (medially) baguette, dengue, disguise pronounced /dɪzˈgaɪz/ (also pronounced /dɪsˈkaɪz/), languor (the <u> surfaces as /w/ in languid, languish – see section 7.2) and suffixed forms of a few words in next category, e.g. cataloguing, demagoguery; (phonemically word-finally) plague, vague; fatigue, intrigue; brogue, drogue, rogue, vogue; fugue and a few more rare words; in this group the vowel letter before <g> and the final <e> form a split digraph - contrast ague /ˈeɪgjuː/ and dengue /ˈdeŋgeɪ/, and see <ngu, ngue> under <ng>. Also see Notes

<gue>

/g/

only word-final and only in analogue, catalogue, colleague, decalogue, demagogue, dialogue, eclogue, epilogue, ideologue, league, monologue, morgue, pedagogue, prologue, prorogue, synagogue, where the final <e> is irrelevant both to the ‘short’ pronunciation of <o> and to the ‘long’ pronunciations of <ea, or> preceding <gu>. In US spelling several of these words are spelt without the final <ue>

2-phoneme grapheme

<gn>

/nj/

only in chignon, cognac, gnocchi, lasagne, lorgnette, mignonette, monsignor, poignant, seigneur, soigne, vignette and possibly gnu

Notes

Given the small numbers of words in which the major correspondences for <g> do not apply, those two correspondences stated context-sensitively mean that pronunciations of <g> are 99% predictable. There are, however, a few homograph pairs with <g> pronounced /g/ in one and /ʤ/ in the other: gel /gel/ (posh pronunciation of girl) v. /ʤel/ ‘hair lotion’; gill /gɪl/ ‘lung of fish’ v. /ʤɪl/ ‘quarter of a pint’; Gillingham /ˈgɪlɪŋəm/ in Dorset and Norfolk v. /ˈʤɪlɪŋəm/ in Kent.
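The two context-sensitive correspondences for single <g> can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name and the two exception sets are hypothetical, the sets contain just a sample of the words listed in this section, and exceptions are looked up by whole word rather than by individual letter position.

```python
# Illustrative sketch only; names and word lists are assumptions, not part of the analysis.
SOFT_CONTEXT = set("eiy")  # letters before which <g> is regularly /ʤ/

# Sample of the listed words with <g> pronounced /g/ before <e, i, y>.
HARD_BEFORE_EIY = {"get", "give", "girl", "tiger", "target", "begin", "eager", "geese"}
# Listed words with <g> pronounced /ʤ/ although no <e, i, y> follows.
SOFT_ELSEWHERE = {"gaol", "margarine", "reg", "veg"}


def pronounce_single_g(word: str, index: int) -> str:
    """Return '/g/' or '/ʤ/' for the single-letter grapheme <g> at word[index]."""
    word = word.lower()
    if word[index] != "g":
        raise ValueError("no <g> at that position")
    next_letter = word[index + 1] if index + 1 < len(word) else ""
    soft_by_rule = next_letter in SOFT_CONTEXT
    # Listed exceptions override the context-sensitive rule.
    if soft_by_rule and word in HARD_BEFORE_EIY:
        return "/g/"
    if not soft_by_rule and word in SOFT_ELSEWHERE:
        return "/ʤ/"
    return "/ʤ/" if soft_by_rule else "/g/"


print(pronounce_single_g("game", 0))     # regular /g/
print(pronounce_single_g("tangent", 3))  # regular /ʤ/ before <e>
print(pronounce_single_g("tiger", 2))    # listed exception
```

The homograph pairs mentioned above (gel, gill, Gillingham) cannot be resolved by a lookup of this kind, since the spelling alone does not identify which word is meant.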

For words containing <n, g> before <e, i> in which the pronunciation of the <g> as /g/ is irregular see section 9.24.

Despite /ʤ/ being 30% of correspondences for <gg> I have not recognised it as a major correspondence because it occurs in so few words, and its high frequency seems to be almost entirely due to the two common words exaggerate, suggest - and suggest, pronounced /səˈʤest/ in RP, has a different pronunciation in General American: /səgˈʤest/; here the <g>’s are separate graphemes representing separate phonemes – but this is no more ‘regular’ than the RP pronunciation because it is the only case where two consecutive <g>’s do not form a digraph – indeed, the only case of geminate consonant letters which would otherwise constitute a digraph not doing so.

The contexts in which <gh> is pronounced /g/ are easily defined – but so is the list of about a dozen words where this correspondence occurs. <gh> is also sometimes pronounced /x/ as in Irish lough and names like McCullough, Naughtie, but I have not included this correspondence in my analysis because /x/ is not a phoneme of RP.

<gh> is never a separate grapheme after <ai, ei> - see <aigh, eigh> under <ai, e>, sections 10.5, 10.12. However, no rule can be defined to distinguish the 10 or 11 words where <gh> is a separate grapheme pronounced /f/ after <au, ou> from those where <augh, ough> are four-letter graphemes, so these just have to be learnt. See also <augh> under <au>, section 10.9, and Notes to section 10.33 on <ough>.

<gu> mostly has 2-phoneme pronunciations, e.g. /gw/ in anguish, distinguish, extinguish, guacamole, guano, guava, iguana, language, languish, linguist, penguin, sanguine, segue, unguent; /gʌ/ in gulf, gust, etc.

For <ga> in vinegary, <go> in allegory, category, <gu> in figurative, see section 6.10.

9.16 <ge>

N.B. <dge> has a joint entry with <dg>.

The main system

For both categories and for absence of percentages see Notes.

Basic phoneme

/ʤ/

word-initially, only in geograph-er/y, geomet-er/ry, Geordie, George, Georgia(n), georgic; rare medially, but cf. burgeon, dungeon, gorgeous, hydrangea, pageant, sergeant, sturgeon, surgeon, vengeance where the following vowel letter or digraph is pronounced /ə/, plus pigeon with /ɪ/; also dangerous, vegetable – see section 6.10; also singeing, swingeing (as distinct from singing, swinging), whingeing; word-finally, regular in hundreds of words ending <-age> pronounced /ɪʤ/, e.g. garage pronounced /ˈgærɪʤ/, haemorrhage, image, language, mortgage, village (for other words in <-age> see previous section); also in, e.g., allege, blancmange, change, college, flange, hinge, lounge, orange, sacrilege, scavenge

Rare phoneme

/ʒ/

never initial; medially, only in bourgeois(ie), mangetout; word-finally, only in about 25 words of mainly French origin, namely beige, cortege, concierge, liege, melange, rouge and, with the <e> also forming part of the split digraphs <a.e, i.e, u.e> (for dual-functioning see section 7.1), in badinage, barrage, camouflage, collage, corsage, decalage, décolletage, dressage, entourage, espionage, fuselage, garage pronounced /ˈgærɑːʒ/, massage, mirage, montage, triage, sabotage; prestige; luge

The rest

(None).

Notes

Gontijo et al. (2003) do not recognise <ge> as a grapheme, so give no data for it. However, given that very few words have <ge> pronounced /ʒ/, the percentage for /ʤ/ would be high.

In many words, final <e> after <g> following <a> is part of a split digraph with the <a> - see section 10.4. There are also a very few examples ending <ege, ige, oge, uge> (sections 10.17/24/28/38) and none ending <yge> (section 10.40). On split digraphs see also section A.6, and for dual-functioning see section 7.1.

Except in the roughly 24 words listed under the basic phoneme, initial and medial <g, e> are always separate graphemes. Word-finally, the only such cases appear to be conge, protege with /ʒeɪ/ (sometimes spelt even within English text with French <é>), sylloge with /ʤiː/. In renege /rɪˈneɪg/ I analyse <e.e> as a split digraph pronounced /eɪ/ - see sections 10.17 and A.6 – and the <g> as a single-letter grapheme pronounced (uniquely in this position, and irregularly before <e>) /g/ (contrast allege, college, sacrilege with /ʤ/, cortege with /ʒ/).

N.B. For <gg> see under <g>.

9.17 <h>

Never occurs as a single-letter grapheme in word-final position.

N.B. <ch, ph, sh, tch, th, wh> have separate entries.

The main system

Basic phoneme

/h/

100%

e.g. cohort, have

The rest

pronounced

Doubled letter

(<hh> occurs only in compound words, e.g. bathhouse, where the two letters belong to separate morphemes and graphemes)

Exceptions to main system

<1%

<h>

/j/

only in a very few words between 2 vowels, namely annihilate, vehement, vehicle, vehicular /əˈnaɪjɪleɪt, ˈviːjəmənt, ˈviːjɪkəl, viːˈjɪkjələ/

Oddities

<hea>

/ɪ/

only in forehead pronounced /ˈfɒrɪd/

<heir>

/eə/

only in heir and derivatives (but there is /r/-linking in heiress, inherit – see section 3.6; and in inherit /h/ also surfaces; see section 7.2)

<ho>

/ɒ/

only in bonhomie, honest, honour and derivatives

<hu>

/w/

only in chihuahua (twice)

2-phoneme grapheme

<hour>

/aʊwə/

only in hour

N.B. For <i> pronounced as the consonant phoneme /j/ see, nevertheless, the entry for <i> in chapter 10, section 10.22.

9.18 <j>

The main system

Basic phoneme

/ʤ/

100%

e.g. jet, majesty

The rest

pronounced

Doubled letter

<jj>

/ʤ/

only in hajj

Exceptions to main system

<1% in total

<j>

/j/

only in hallelujah /hælɪˈluːjə/, and majolica pronounced /maɪˈjɒlɪkə/ (also pronounced /məˈʤɒlɪkə/)

<j>

/ʒ/

only in jihad, raj and some rare French loanwords, e.g. bijou, goujon, jabot, jalousie, jupe

<j>

/h/

only in fajita, jojoba (twice), marijuana, mojito, Navajo

Oddities

(none)

2-phoneme graphemes

(none)

9.19 <k>

N.B. <ck> has a separate entry.

The main system

Only phoneme

/k/

100%

e.g. kelp, kit, sky

The rest

pronounced

Doubled letter

<kk>

/k/

only in chukker, dekko, pukka and inflected forms of trek, e.g. trekkie

Exceptions to main system

(none)

Word-final doubled letter + <e>

(does not occur)

Oddities

<ke>

/k/

only in Berkeley, burke

<kh>

/k/

only in astrakhan, gurkha, gymkhana, khaki, khan, khazi, khedive, sheikh, Sikh. See Note

<kn>

/n/

only in knack(er(s)), knap, knave, knead, knee, knell, knew, knick(er(s)), knickerbocker, knick-knack, knife, knight, knit, knob, knobbly, knock, knoll, knot, know(ledge), knuckle and a few more very rare words. Contrast Knesset, with /kn/, and for acknowledge see section 7.2

2-phoneme graphemes

(none)

Note

<kh> also occurs in transcriptions of some Russian names, e.g. Khrushchev, Mikhail, where it is meant to represent the /x/ phoneme, like <ch> in Scots loch – but since (a) most English-speakers instead pronounce these names with /k/ (as in the words listed above under Oddities), and (b) the correspondence with /x/ occurs only in names, I have not included this correspondence in my analyses.

9.20 <l, ll>

N.B. <le> has a separate entry.

The main system

Basic phoneme

/l/

100%

e.g. lift, fill

The rest

pronounced

Exceptions to main system

<l>

as 2-phoneme sequence /əl/

only in axolotl, dirndl, shtetl /ˈæksəlɒtəl, ˈdɜːndəl, ˈʃtetəl/

<ll>

/j/

only in French-/Spanish-like pronunciations of bouillabaisse, marseillaise, tortilla /buːjaːˈbes, mɑːseɪˈjez, tɔːˈtiːjɑː/

<ll>

as 2-phoneme sequence /lj/

only in carillon /kəˈrɪljən/

Word-final doubled letter + <e>

<lle>

/l/

medially, only in decollet-age/ee; otherwise only final and only in the ending -ville, e.g. vaudeville, plus bagatelle, belle, braille, chanterelle, espadrille, fontanelle, gazelle, grille, pastille, nacelle, quadrille (but not reveille, tagliatelle where the <e> is pronounced /iː/). In chenille, tulle I analyse <ll> as pronounced /l/ and <i.e, u.e> as split digraphs pronounced /iː, uː/ - see sections 5.7.2, 5.7.6, A.6 – and medially in guillemot <lle> is pronounced /liː/

Oddity

<lh>

/l/

only in philharmonic, silhouette

2-phoneme graphemes

(see above)

Note

For <lle> in chancellery, jewellery see section 6.10.

9.21 <le>

Only final.

The main system

Basic pronunciation

/əl/

100%

only word-final after a consonant letter, e.g. table, visible

The rest

pronounced

Exceptions to main system

<le>

/l/

medially, only in Charles; otherwise only word-final and only in aisle, cagoule, clientele, gargoyle, gunwale, joule, isle, lisle, voile. See Notes

Oddities

(none)

2-phoneme graphemes

(The basic pronunciation is a 2-phoneme sequence)

Notes

In many words where final <le> follows a vowel letter and the main rule above therefore does not apply, word-final <e> after <l> following a single vowel letter is part of a split digraph with the vowel letter; see the entries for the six split digraphs in chapter 10, sections 10.4/17/24/28/38/40.

Initial and medial <l, e> are always two separate graphemes. Word-finally, the only such cases (i.e. the <e> is neither part of a split digraph nor part of a digraph with <l>) appear to be souffle (sometimes spelt even within English text with French <é>) with /leɪ/, facsimile, hyperbole, ukulele with /liː/, biennale, finale, guacamole, tamale with either.

The reason for picking out aisle, cagoule, clientele, gargoyle, joule, isle, lisle, voile as having word-final <le> is that the preceding vowel grapheme would be pronounced the same if the <e> were not present. Some of the spellings would then look even odder, but cagoule does have the alternative spelling kagoul.

N.B. For <ll> see under <l>.

9.22 <m, mm>

The main system

Basic phoneme

/m/

100%

e.g. mum, sum, mummy, summit

The rest

pronounced

Exceptions to main system

<1% in total

<m>

as 2-phoneme sequence /əm/

only word-finally, but regular in all the words ending in <-sm>, e.g. chasm, enthusiasm, orgasm, phantasm, pleonasm, sarcasm, spasm, several words ending in –plasm (e.g. ectoplasm), chrism, prism, schism and all the many other words ending in –ism, macrocosm, microcosm, abysm, aneurysm (also spelt aneurism), cataclysm, paroxysm, plus algorithm, rhythm and a few other very rare words; also film pronounced /ˈfɪləm/ in some Irish accents

Word-final doubled letter + <e>

<mme>

/m/

now only in oriflamme and (non-computer) programme since gram and its derivatives are no longer spelt *gramme, etc.; in consomme (sometimes spelt even within English text with French <é>), <mm, e> are separate graphemes

Oddities

<mb>

/m/

only word-final and only in dithyramb, lamb; climb, limb; aplomb, bomb, catacomb, comb, coomb, coxcomb, coulomb, hecatomb, rhomb, tomb, womb; crumb, dumb, numb, plumb, rhumb, succumb, thumb and a few more very rare words. /b/ surfaces in dithyrambic, bombard(ier), bombastic, rhomb-ic/us, crumble and supposedly, according to some authorities, in thimble (from thumb) - see section 7.2. The word-form number has the two pronunciations /ˈnʌmbə/ (‘amount, numeral’) and /ˈnʌmə/ (‘having less feeling’, comparative form of the adjective numb)

<mbe>

/m/

only word-final and only in buncombe (‘nonsense’; also spelt bunkum), co(o)mbe (‘short valley’; also spelt coomb); contrast flambe /ˈflɒmbeɪ/ (sometimes spelt even within English text with French <é>), where <m, b, e> are all separate graphemes

<me>

/m/

never initial; mainly word-final and there only in become, come, some, welcome and the adjectival suffix /səm/ spelt <-some>, e.g. handsome (contrast hansom); medially only in camera, emerald, omelette, ramekin pronounced /ˈræmkɪn/ (also pronounced /ˈræmɪkɪn/) – see section 6.10 – and Thames

<mn>

/m/

100% of pronunciations of <mn> but see Notes. Only word-final and only in autumn, column, condemn, contemn, damn, hymn, limn, solemn. /n/ surfaces in autumnal, columnar, columnist, condemnation, contemner, damnable, damnation, hymnal, hymnody, solemnity - see section 7.2

<mn>

/n/

<1% of pronunciations of <mn> but see Notes. Only in mnemonic, mnemonist. /m/ surfaces in amnesia, amnesty - see section 7.2

2-phoneme grapheme

(see above)

Notes

Given the very different word positions of <mn> pronounced /m, n/ this grapheme is 100% predictable. Given that it never occurs medially it is also very easy to distinguish from instances of <m, n> as separate graphemes.
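The positional rule for <mn> stated in the first Note can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes the input is a stem word: inflected forms such as columns keep the stem's pronunciation of <mn>, which a purely positional check like this one would miss.

```python
def pronounce_mn(word: str) -> str:
    """Sketch of the positional rule for the letter sequence <m, n> in stem words."""
    word = word.lower()
    if word.startswith("mn"):
        return "/n/"   # word-initial digraph, e.g. mnemonic
    if word.endswith("mn"):
        return "/m/"   # word-final digraph, e.g. autumn
    if "mn" in word:
        return "/mn/"  # medial: <m, n> are separate graphemes, e.g. amnesia
    raise ValueError("word does not contain <mn>")


for example in ("mnemonic", "autumn", "autumnal", "amnesia"):
    print(example, pronounce_mn(example))
```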

For <ma> in customary, <me> in camera, emerald, omelette, <mi> in admirable, family see section 6.10.

9.23 <n, nn>

N.B. <ng> has a separate entry, which also covers all the cases where <n> before <g> is a separate grapheme, including those mentioned here where the <n> is pronounced /ŋ/.

The main system

Basic phoneme

/n/

<n> 85%, <nn> 100%

e.g. tin, tinny. For <n>, /n/ is regular except before <c> pronounced /k/ and before <ch, g, k, q, x>. See Notes

Other phoneme for <n>

/ŋ/

15%

regular before <c> pronounced /k/ and before <ch, g, k, q, x>, e.g. concur pronounced /kəŋˈkɜː/, uncle, zinc; anchor, synchronise; angle, England, fungus, language, langur, length pronounced /leŋkθ/, longevity, prolongation, single; ankle, sink, thanks; banquet, conquer; anxiety, anxious, larynx, lynx. See Notes

The rest

pronounced

Exceptions to main system

<1%

<n>

as 2-phoneme sequence /ən/

only in Haydn (I mention him in memory of Chris Upward of the Simplified Spelling Society) and most contractions of not with auxiliary verbs, i.e. isn’t, wasn’t, haven’t, hasn’t, hadn’t, doesn’t, didn’t, couldn’t, shouldn’t, wouldn’t, mayn’t, mightn’t, mustn’t, oughtn’t, usedn’t, some of which are rare to the point of disuse, plus durstn’t, which is dialectal/comic; in all of these except mayn’t the preceding phoneme is a consonant. Other contractions of not with auxiliary verbs (ain’t, aren’t, can’t, daren’t, don’t, shan’t, weren’t, won’t), i.e. all those with a preceding vowel phoneme (except mayn’t) are monosyllabic (though some Scots say /ˈdeərənt/ with a preceding consonant and therefore two syllables, and also /r/-linking – see section 3.6). Curiously, innit, being a contraction of isn’t it, reduces isn’t to a single syllable

Word-final doubled letter + <e>

<nne>

/n/

only word-final and only in cayenne, comedienne, cretonne, doyenne, tonne and a few other rare words

Oddities

<nc>

/ŋ/

only in charabanc /ˈʃærəbæŋ/

<nd>

/m/

only in sandwich /ˈsæmwɪʤ/

<nd>

/n/

only in grandfather, Grandma (hence the frequent misspelling *Granma – cf. section 4.4.7 on Gran(d)dad), handsome (cf. hansom (cab)), landscape

<nd>

/ŋ/

only in handcuffs, handkerchief /ˈhæŋkʌfs, ˈhæŋkəʧɪf/

<ne>

/n/

non-finally, only in vineyard (and even there it is stem-final within a compound word) and with an elided vowel (see section 6.10) in confectionery, generative, stationery, vulnerable; otherwise only word-final after a vowel letter and only in about 35 words, namely bowline, Catherine, clandestine pronounced /klænˈdestɪn/ (also pronounced /ˈklændəstaɪn/), cocaine, compline, crinoline, demesne, (pre)destine, determine, discipline, done, engine, ermine, examine, famine, feminine, genuine, gone, groyne, heroine, hurricane pronounced /ˈhʌrɪkən/ (also pronounced /ˈhʌrɪkeɪn/), illumine, intestine, jasmine, marline, masculine, medicine, migraine, moraine, peregrine, ptomaine, saccharine, sanguine, scone pronounced /skɒn/ (also pronounced /skəʊn/), shone, urine, vaseline, wolverine. In all but one of these words the <e> is phonographically redundant, in that its removal would not affect the pronunciation - the preceding vowel letter (if single) does not have its ‘letter-name’ pronunciation, and where there are two vowel letters they either form a digraph (cocaine, groyne, migraine, moraine, ptomaine) or are pronounced separately (genuine). The exception is done, which needs to be kept visually distinct from don, as heroine and marline (‘rope’) are from heroin and marlin (‘fish’). The only words in which final <n, e> are separate graphemes are aborigine, acne, anemone

<nt>

/n/

only in denouement, divertissement, rapprochement

<nw>

/n/

only in gunwale

2-phoneme grapheme

(see above)

Notes

Given the small numbers of words in which the major correspondences for <n> do not apply, those two correspondences stated context-sensitively mean that pronunciations of <n> are virtually 100% predictable. Actually, they occur even without being consciously noticed because of the phonological context.

Some words beginning encephal-, e.g. encephalitis, are pronounced either /ens-/, with the predominant pronunciation of <n> as /n/, or /eŋk-/, with the regular pronunciation of <n> as /ŋ/ before <c> pronounced /k/.

For <na> in concessionary, coronary, culinary, discretionary, extraordinary /ɪkˈstrɔːdənriː/, imaginary, legionary, mercenary, missionary, ordinary, precautionary, preliminary, probationary, pulmonary, reactionary, revolutionary, stationary, urinary, veterinary /ˈvetrɪnriː/, visionary, <ne> in confectionery, general, generative, millinery, stationery, <nou> in honourable see section 6.10.

9.24 <ng>

Never initial.

The main system

Basic phoneme

/ŋ/

100%

e.g. bang, sing, long, young, bung. Regular word-finally, with no exceptions (in RP). /g/ surfaces in long-er/est, strong-er/est, young-er/est, diphthongise, elongate, prolongation, and /ʤ/ in longevity – see section 7.2. Medially in stem words, only in clangour, hangar, but there are thousands of occurrences in suffixed forms, e.g. clangorous, clingy, hanger, ringer, singer, singing, stinger, swinging, wringer. See Notes

The rest

pronounced

Exceptions to main system

<1%

<ng>

/n/ or /ŋk/

only in length, lengthen, strength, strengthen. See under /n, k, ŋ/, sections 3.4.5, 3.6.1, 3.7.2

Oddities

<ngh>

/ŋ/

only in dinghy, gingham, Singhalese /ˈdɪŋiː, ˈgɪŋəm, sɪŋəˈliːz/ (contrast <ng, h> as separate graphemes in shanghai /ʃæŋˈhaɪ/)

<ngu>

/ŋ/

only in a very few suffixed forms of words in next category, e.g. haranguing, tonguing. See also end of section 6.4

<ngue>

/ŋ/

only in harangue, meringue, tongue /həˈræŋ, məˈræŋ, tʌŋ/ (contrast <n, gu, e> as separate graphemes in dengue /ˈdeŋgeɪ/)

2-phoneme graphemes

See <ng> possibly pronounced /ŋk/, four rows above, and Notes

Notes

Medially in stem and compound words, the letters <n, g> are always separate graphemes representing separate phonemes except in the words listed under exceptions to the main system and Oddities above.

Before <e, i, y> the regular pronunciation of <n, g> is /nʤ/ (e.g. Abinger, angel, congeal, danger, dungeon, engender, ginger, harbinger, messenger, tangent; engine, ingenious, laryngitis; dingy, stingy), i.e. <n, g> follow their main rules. Exceptions:

1) <n, g> pronounced /ŋg/ before <e, i> (there appear to be no such cases before <y>): anger, conger, finger, hunger, linger, long-er/est, malinger, mangel, monger, strong-er/est, young-er/est; diphthongise, fungi – here the <n> has its regular pronunciation before <g> - see previous section, but the pronunciation of the <g> as /g/ is the irregular one before <e, i>

2) <n, g> pronounced /nʒ/ before <e> (there appear to be no such cases before <i, y>): only in ingenue, lingerie pronounced /ˈlænʒəriː/ (also pronounced /ˈlɒnʤəreɪ/)

3) <n, g> pronounced /ŋʤ/ before <e> (there appear to be no such cases before <i, y>): only in longevity

4) <ng> pronounced /ŋ/ before <e, i, y>: none in stem words, but as noted above there are hundreds of suffixed examples.

Before <a, o, u> and consonant letters the regular pronunciation is /ŋg/ (e.g. angle, elongate, England, fungus, language, langur, prolongation, single), i.e. the <n> has its regular pronunciation before <g> - see previous section, and the pronunciation of the <g> is also regular. Exceptions:

1) <ng> pronounced /ŋ/ before <a, o> (there appear to be no exceptions before <u>): only in clangorous, clangour, hangar

2) <ng> pronounced /n/ or /ŋk/ before a consonant letter: see length, etc., in the exceptions to the main system above.

Word-finally, <n, ge> are always separate graphemes representing separate phonemes, with <n> always pronounced /n/ and <ge> usually pronounced /ʤ/ - but this is a small set: arrange, change, grange, mange, range, strange; flange, orange, phalange; challenge, revenge, scavenge; cringe, fringe, hinge, singe, swinge, tinge, whinge; sponge; lounge, scrounge; lunge, plunge. To avoid confusion with singing, swinging, the verbs singe, swinge retain the <e> before <-ing>: singeing, swingeing, as does spongeing to avoid the mispronunciation that might arise from *sponging. Exceptions:

1) with final <n, ge> pronounced /nʒ/: only in melange

2) with final <n, g, e> as three separate graphemes: only in conge /ˈkɒnʒeɪ/ (sometimes spelt even within English text with French <é>).
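The regular pronunciations of the letter sequence <n, g> described in these Notes, together with a handful of the listed exceptions, can be sketched in Python as below. The function name and the exception table are hypothetical and cover only a sample of the words mentioned; the sketch also assumes stem words, because suffixed forms such as singer or clingy keep /ŋ/ and would need morphological information that the spelling alone does not supply.

```python
# Illustrative sketch only; a sample of the exceptions listed in the Notes above.
EXCEPTIONS = {
    "anger": "/ŋg/", "finger": "/ŋg/", "hunger": "/ŋg/", "linger": "/ŋg/",
    "fungi": "/ŋg/",          # /ŋg/ before <e, i>
    "ingenue": "/nʒ/",        # /nʒ/ before <e>
    "longevity": "/ŋʤ/",      # /ŋʤ/ before <e>
    "clangour": "/ŋ/", "hangar": "/ŋ/",  # /ŋ/ before <a, o>
}


def pronounce_n_g(word: str) -> str:
    """Return the pronunciation of the first <n, g> sequence in a stem word."""
    word = word.lower()
    pos = word.find("ng")
    if pos == -1:
        raise ValueError("word does not contain <ng>")
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    following = word[pos + 2:pos + 3]  # empty string when <ng> is word-final
    if following == "":
        return "/ŋ/"    # word-final digraph, e.g. sing
    if following in ("e", "i", "y"):
        return "/nʤ/"   # e.g. danger, engine, dingy, change
    return "/ŋg/"       # before <a, o, u> and consonant letters, e.g. angle


for example in ("sing", "danger", "angle", "finger", "hangar", "change"):
    print(example, pronounce_n_g(example))
```

Word-final <n, ge> as in change falls out of the same branch as medial <n, g> before <e>, which matches the statement above that both letters follow their main rules there.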

N.B. For once, one with their initial but unwritten /w/ see the entry for <o> in chapter 10, section 10.27; and for all the graphemes beginning <oi> which have correspondences beginning with consonant phoneme /w/ (<oi, oir, oire, ois>) see the entry for <oi> in chapter 10, section 10.29.

9.25 <p, pp>

N.B. <ph> has a separate entry.

The main system

Only phoneme

/p/

100%

e.g. apt, apple

The rest

pronounced

Exceptions to main system

(none)

Word-final doubled letter + <e>

<ppe>

/p/

only in grippe, steppe

Oddities

<pb>

/b/

only in Campbell, cupboard, raspberry /ˈkæmbəl, ˈkʌbəd, ˈrɑːzbriː/

<pe>

/p/

only in cantaloupe, troupe /ˈkæntəluːp, truːp/ (contrast canape, recipe /ˈkænəpeɪ, ˈresɪpiː/). See Notes

<pn>

/n/

only word-initial and only in words derived from Greek πνεῦμα pneuma (‘breath’) or πνεύμων pneumon (‘lung’), e.g. pneumatic, pneumonia

<pph>

/f/

only in sapphic, sapphire, Sappho /ˈsæfɪk, ˈsæfaɪə, ˈsæfəʊ/

<ps>

/s/

only word-initial and only in some words of mainly Greek origin, e.g. psalm, psalter, psephology, pseud(o) and many compounds, psionic, psittacosis, psoriasis, psych(e/o) and many compounds, and a few more very rare words. /p/ surfaces in metempsychosis – see section 7.2

<pt>

/t/

only in Deptford, ptarmigan, pterodactyl (Greek, = ‘wing finger’), pterosaur (Greek, = ‘wing lizard’), Ptolem-y/aic, ptomaine, receipt and a few more very rare words. /p/ surfaces in archaeopteryx, helicopter, reception, receptive – see section 7.2

2-phoneme graphemes

(none)

Notes

In the vast majority of cases of word-final <p, e> the <e> is part of a split digraph (except canape (sometimes spelt even within English text with French <é>), recipe) and the <p> is a separate grapheme (including in canape, recipe).

For <pa> in comparable, separate /ˈseprət/ (adjective), separatist, <pe> in deepening, desperate, halfpenny, opening, operable, operative, prosperous, temperament, temperature, twopenny, <pi> in aspirin, <po> in corporal, corporate, policeman pronounced /ˈpliːsmən/, temporary see section 6.10.

9.26 <ph>

The main system

Basic phoneme

/f/

99%

e.g. philosophy and many other words mainly of Greek origin

The rest

pronounced

Exceptions to main system

<1% in total

<ph>

/p/

only in diphtheria, diphthong, naphtha, ophthalmic, shepherd. The first four also have pronunciations with /f/ – e.g. /ˈdɪfθɒŋ/ versus /ˈdɪpθɒŋ/

<ph>

/v/

only in nephew pronounced /ˈnevjuː/ (also pronounced /ˈnefjuː/), Stephen

Oddities

<phth>

/t/

only in phthisic, phthisis pronounced /ˈtaɪsɪk, ˈtaɪsɪs/

<phth>

/θ/

only in apophthegm /ˈæpəθem/, phthalate /ˈθæleɪt/

2-phoneme graphemes

(none)

Note

<p, h> are separate graphemes only at morpheme boundaries in compound words, e.g. cuphook, tophat. And <ph, th> are separate graphemes in some of the words listed just above.

N.B. For <pp> see under <p>.

9.27 <q>

The main system

Only phoneme

/k/

100%

e.g. quick

The rest

pronounced

Doubled letter

(does not occur)

Exceptions to main system

(none)

Oddities

For percentages see Note

<qu>

only /k/
(not /kw/)

occurs initially or medially (never finally) in about 46 words mainly of French origin, namely bouquet, conquer (/w/ surfaces in conquest – see section 7.2), coquette, croquet, croquette, etiquette, exchequer, liqueur, liquor, liquorice, maquis, mannequin, marquee, marquetry, masquerade, mosquito, parquet, piquant, quatrefoil, quay, quenelle, quiche, so(u)briquet, tourniquet, and, in more conservative speakers’ accents, questionnaire, quoits; medially also in applique, communique, manque, risque where the final <-e> is a separate grapheme (sometimes written even within English text as French <é>), unlike the words in the next paragraph; also phonemically but not orthographically word-final in opaque; claque, plaque; antique, bezique, boutique, clique, critique, mystique, oblique, physique, pique, technique, unique; toque; peruque; and a few more rare words where the final <e> is part of a split digraph with a preceding vowel letter spelling variously /eɪ, ɑː, iː, əʊ, uː/

<que>

as a trigraph
pronounced only /k/
(not /kw/ plus vowel)

occurs word-initially only in queue and medially only in milquetoast (where it is nevertheless stem-final in a compound word); otherwise only word-finally and only in about 18 words mainly of French origin, namely:

(1) with a preceding consonant letter such that <que> could be replaced by <k> without changing the pronunciation: arabesque, barque, basque, brusque pronounced /brʌsk/ (also pronounced /bruːsk/), burlesque, casque, catafalque, grotesque, marque, masque, mosque, picturesque, romanesque, statuesque, torque. However, in this group barque, basque, casque, marque, masque, torque are kept visually distinct from bark, bask, cask, mark, mask, torc;

(2) with a preceding vowel letter with a short pronunciation such that <que> could be replaced by <ck> without changing the pronunciation: baroque, cheque (cf. US check), monocoque, plaque pronounced /plæk/ (also pronounced /plɑːk/)

2-phoneme graphemes

(none)

Note

Gontijo et al. (2003) do not recognise <que> as a separate grapheme. However, their calculations show that <qu, que> pronounced /k/ together constitute 9% of pronunciations of <qu> and that the other 91% of occurrences of <qu> are pronounced /kw/.

9.28 <r, rr>

Never word-final as separate graphemes.

The main system

Only phoneme

/r/

100%

e.g. very, berry

The rest

pronounced

Exceptions to main system

(none)

Word-final doubled letter + <e>

<rre>

/r/

occurs only in barre, bizarre, parterre, where it forms part of the four-letter graphemes <arre, erre> and is not pronounced /r/ (except that <rr> represents /r/ after /r/-linking in bizarrery – see section 3.6)

Oddities

<re>

/ə/

100% of pronunciations of word-final <re>. Only word-final, and in that position almost entirely regular, e.g. centre, mitre. The only exceptions appear to be genre, macabre /ˈʒɒnrə, məˈkɑːbrə/, where <r, e> are separate graphemes representing separate phonemes

<re>

/r/

only in forehead pronounced /ˈfɒrɪd/

<rh>

/r/

only in words of Greek origin, e.g. rhinoceros, rhododendron. There are some 2-phoneme exceptions at morpheme boundaries, e.g. poorhouse, warhorse

<rrh>

/r/

only medially and only in a few words of Greek origin, namely amenorrhoea, arrhythmia, cirrhosis, diarrhoea, gonorrhoea, haemorrhage, haemorrhoid, lactorrhoea, pyorrhoea, pyrrhic. N.B. In catarrh, myrrh <rrh> is not a separate grapheme, but forms part of the four-letter graphemes <arrh, yrrh> and is not pronounced /r/ (but in catarrhal /r/-linking occurs – see section 3.6)

2-phoneme graphemes

(none)

Note

For full treatment of /r/-linking, implying when stem-final <r> is and is not pronounced, see section 3.6.

9.29 <s, ss>

N.B. <se, sh, si, ssi> have separate entries.

The main system

Basic phoneme

/s/

<s> 56%, <ss> 89%

e.g. cats, grass. For <s>, except within split digraphs, /s/ is regular in all positions, including when <s> is a grammatical suffix or a contracted form after voiceless non-sibilant consonants. Only exceptions in word-initial position: sorbet (sometimes), sugar, sure and German pronunciations of sauerkraut, spiel, stein, strafe, stumm. For medial and final positions see Notes and Table 9.5. For <ss> see the exceptions to the main system, and <ssi>, section 9.33

Other phonemes for <s>

/z/

43%

e.g. dogs. Never word-initial (except in sorbet pronounced /ˈzɔːbeɪ/ (also pronounced /ˈsɔːbeɪ/) and German pronunciation of sauerkraut). Regular within split digraphs, and when <s> is a grammatical suffix or a contracted form after stem-final vowels and voiced non-sibilant consonants. For final position otherwise and medial position, see Notes and Table 9.5

/ʒ/

<1%

always preceded by a vowel letter and followed by <ua, ur>; only medial and only in casual, sensual, usual, visual; (dis/en/fore-)closure, com/ex-posure, embrasure, erasure, leisure, measure, pleasure, treasure(r), treasury, usur-y/er/ious. Despite its rarity in the grapheme-phoneme direction, this correspondence belongs in the main system because of its status as a main-system correspondence in the phoneme-grapheme direction – see section 3.8.4

The rest

pronounced

Exceptions to main system

See also Table 9.5

<s>

/∫/

<1% of pronunciations of <s>. Only in (initially) sugar, sure, and German pronunciations of spiel, stein, strafe, stumm; (medially) asphalt pronounced /ˈæ∫felt/ (also pronounced /ˈæsfælt/), censure, commensurate, ensure, insure, tonsure

<ss>

/∫/

7% of pronunciations of <ss>. Only in assure, fissure, issue, pressure, tissue

<ss>

/z/

5% of pronunciations of <ss>. Only in Aussie, brassiere, dessert, dissolve (but contrast dissolution, with /s/), hussar, Missouri, possess (first <ss>), scissors

Word-final doubled letter + <e>

<sse>

/s/

except in divertissement, only word-final, e.g. bouillabaisse, crevasse, duchesse, finesse, fosse, impasse, lacrosse, largesse, mousse, noblesse, palliasse, wrasse and a few more rare words (and contrast retrousse /rəˈtruːseɪ/, sometimes spelt even within English text with French <é>)

Oddities

<sc>

/s/

98% of pronunciations of <sc>, but see Notes. Regular before <e, i, y>, e.g. ascend, disciple, scythe. Irregularly, also in corpuscle, muscle; /k/ surfaces in corpuscular, muscular – see section 7.2. Exception: sceptic, with /sk/, which is also the regular pronunciation (following the general rules for <s, c>) before <a, o, u> (corpuscle, muscle appear to be the only occurrences of <sc> before a consonant letter). For other exceptions see next 2 paragraphs

<sc>

/∫/

1% of pronunciations of <sc>. Only in conscientious, crescendo, fascis-m/t

<sc>

/z/

<1% of pronunciations of <sc>. Only in crescent pronounced /ˈkrezənt/ (also pronounced /ˈkresənt/)

<sce>

/s/

only word-finally in verbs ending <-esce>, e.g. acquiesce, coalesce, convalesce, deliquesce, effervesce, evanesce and some other very rare words, plus reminisce. The final <e> surfaces as /ə/ in some suffixes, e.g. convalescent – see section 7.2

<sch>

/∫/

only in maraschino, meerschaum, schedule, schemozzle, schist, schistosomiasis, schlemiel, schlep, schlock, schmaltz, schmo(e), schmooze, schnapps, schnauzer, schnitzel, schnozzle, schuss, schwa, seneschal. Except in these words and schism (next paragraph) and in a few cases across a morpheme boundary (discharge, escheat, eschew, mischance, mischief, mischievous, with /sʧ/), <s, ch> is always pronounced /sk/, e.g. school. For absence of percentages here and in next paragraph see Notes

<sch>

/s/

only in schism pronounced /ˈsɪzəm/

<sci>

/∫/

only in conscience, conscious, fascia, luscious /ˈkɒnʃəns, ˈkɒnʃəs, ˈfeɪʃə, ˈlʌʃəs/

<sj>

/∫/

only in sjambok /ˈʃæmbɒk/

<st>

/s/

regular before final <-en, -le>, e.g. chasten, christen, hasten, fasten, glisten, listen, moisten (exception: tungsten); castle, forecastle (whether pronounced /ˈfəʊksəl/ or /ˈfɔːkɑːsəl/), nestle, pestle, trestle, wrestle, bristle, Entwistle, epistle, gristle, thistle, whistle, apostle, jostle, throstle, bustle, hustle, rustle; otherwise only in chestnut, Christmas, durstn’t, dustbin, dustman, mistletoe, mustn’t, ostler, Postlethwaite, Thistlethwaite, Twistleton, waistcoat pronounced /ˈweɪskəʊt/ and sometimes ghastly. /t/ surfaces in apostolic, epistolary – see section 7.2

<sth>

/s/

only in asthma, isthmus if pronounced without /θ/

<sw>

/s/

only in answer, coxswain, sword /ˈɑːnsə, ˈkɒksən, sɔːd/ and boatswain pronounced /ˈbəʊsən/ (also pronounced /ˈbəʊtsweɪn/)

2-phoneme grapheme

<s>

/ɪz/

only, following an apostrophe, in regular singular and irregular plural possessive forms after a sibilant consonant (/s, z, ∫, ʒ, ʧ, ʤ/), e.g. Brooks’s (book), jazz’s (appeal), Bush’s (government), (the) mirage’s (appearance), (the) Church’s (mission), (the) village’s (centre), (the) geese’s (cackling)

Notes

Given that /s/ is the regular pronunciation of medial <s>, Table 9.5 lists categories where medial <s> is instead pronounced /z/, plus sub-exceptions with /s/ (and a very few sub-sub-exceptions with /z/).

And given that /s/ is the regular pronunciation of word-final <s> (including when it is a grammatical suffix or contracted form after a voiceless non-sibilant consonant), here is a list of categories where word-final <s> is instead pronounced:

  • /z/

    1) regularly after vowels and voiced non-sibilant consonants when <s> is a grammatical suffix (regular noun plural and third person singular present tense verb and, following an apostrophe, regular singular and irregular plural possessive) or contracted from is, has. This includes plurals in <-es> pronounced /iːz/ of words of Greek and Latin origin which have singulars in <-is> pronounced /ɪs/, e.g. axes, crises, diagnoses, testes

    2) in a few function words: always, as, his, sans, and cos where this is the abbreviation of because

    3) plus a few content words: lens, missus, and series, species (whether singular or plural), plus cos, the lettuce and the abbreviation of cosine, which vary in pronunciation between /kɒz/ and /kɒs/

  • /ɪz/ - see the 2-phoneme pronunciation above.
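The regular behaviour of <s> as a grammatical suffix or contracted form, as described above, amounts to a three-way choice conditioned by the final phoneme of the stem. The following Python sketch is purely illustrative: the function name and the set of voiceless non-sibilant consonants are assumptions made for the example, the sibilant set is the one given under the 2-phoneme grapheme above, and the individually listed function and content words (always, as, lens, etc.) are deliberately left out.

```python
# Sibilants as listed in the text. Both glyphs that the text uses for the
# voiceless palato-alveolar fricative are accepted.
SIBILANTS = {"s", "z", "ʃ", "∫", "ʒ", "ʧ", "ʤ"}

# Assumption for this sketch: the voiceless non-sibilant consonants
# that can end a stem in RP.
VOICELESS_NON_SIBILANTS = {"p", "t", "k", "f", "θ"}


def suffix_s_pronunciation(stem_final_phoneme):
    """Return the pronunciation of suffix or contracted <s> after a stem.

    The argument is the last phoneme of the stem, not its last letter.
    """
    if stem_final_phoneme in SIBILANTS:
        # For <s> alone this arises only after an apostrophe, e.g. Church's.
        return "ɪz"
    if stem_final_phoneme in VOICELESS_NON_SIBILANTS:
        return "s"
    # Everything else: vowels and voiced non-sibilant consonants.
    return "z"


# Hypothetical usage: stems paired with their final phoneme.
for stem, final in [("cat", "t"), ("dog", "g"), ("Church", "ʧ"), ("bee", "iː")]:
    print(stem, "+ <s> ->", suffix_s_pronunciation(final))
```

Working from the stem-final phoneme rather than the stem-final letter is the point of the sketch: the choice between /s/, /z/ and /ɪz/ is conditioned by pronunciation, which the spelling alone does not always reveal.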

For <(s)sa> in adversary, emissary, necessary, <(s)so> in promissory, reasonable, seasoning, <ste> in christening, listener, listening see section 6.10.

The percentages of /ʃ, z/ as pronunciations of <ss> are due solely to the high frequencies of a few words with these correspondences.

The percentages for <sc> depend on recognising it as a digraph rather than as two single-letter graphemes. However, the fact that it is mainly a digraph before <e, i, y> and hardly ever a digraph elsewhere helps with this.

Gontijo et al. (2003) state that /s/ accounts for 96% of pronunciations of <sch> and /ʃ/ for only 4% - but since <sch> pronounced /s/ occurs only in schism their corpus must have been very strange in this respect.

Table 9.5: Medial <s> pronounced /z/, with sub-exceptions pronounced /s/ and sub-sub-exceptions pronounced /z/.

For other exceptions see above.

Categories where medial <s> is exceptionally pronounced /z/

Sub-exceptions where medial <s> is pronounced /s/ (with a few sub-sub-exceptions with /z/)

Almost always before <b> and always before <d, g, l, m>, but except before <m>, where there are hundreds of examples (e.g. chasm, prism, seismic, talisman), this is a small set: asbestos, busby, husband, lesbian, presbyter, presbyterian pronounced /prezbɪˈtɪəriːjən/, raspberry (taking <pb> to be a spelling of /b/); Tuesday, Wednesday, Thursday, wisdom; phosgene; gosling, grisly, Islam, measles, measly, muslim, muslin, Oslo (but the Norwegian pronunciation has /s/), quisling

only in presbyterian pronounced /prespɪˈtɪəriːjən/, where the <b> also devoices, unusually, to /p/

Mostly after <m>, e.g. crimson, flimsy, helmsman, whimsical, whimsy

hamster

Mostly after <w>, e.g. blowsy, drowsy, frowsy

frowsty

In the prefix <trans-> where the following phoneme is a vowel or a voiced consonant, e.g. transact, transgress, transit(ion), translate, transmit, transmute

transitive, transom

Mostly between vowel letters

Where the following letter is <e, i> followed by another vowel letter - see the main entries for <se, si>;

In compounds, e.g. aforesaid, antiseptic, beside, research;

Always in the endings <-osity, -sive, -some>;

Mostly in the ending <-sy> (sub-sub-exceptions with /z/: busy, cosy, daisy, poesy, posy, queasy, and derived forms such as cheesy, easy, lousy (despite the /s/ in louse – see Notes to next section), noisy, nosy, prosy, rosy);

In prefix <dis-> (sub-sub-exceptions with /z/: disaster, disease);

In prefix <mis->;

In a set of Greek words ending <-sis> in singular and <-ses> in plural: analysis, basis, crisis, diagnosis, emphasis, oasis, prognosis, thesis;

Plus asylum, basin, bison, chrysalis, comparison, crusade, desecrate, desolate, desultory, dysentery, episode, gasoline, garrison, isolate, isosceles and other words beginning <iso->, kerosene, mason, nuisance, palisade, parasite, parasol, philosophy, prosecute, sausage, unison and sometimes venison

In the ‘sugar’ words dextrose, glucose, lactose, sucrose the ending <-ose> can be pronounced /əʊs/ or /əʊz/ and this may also be true of many of the (mostly rare) adjectives ending in <-ose> - but morose, verbose (at least) have only /əʊs/

In a few other odd words: absolve, absorb, absorption, bowser, geyser, hawser, observe, palsy, pansy, tansy

9.30 <se>

Never initial.

The main system

For both categories see Notes and Table 9.6. For the absence of percentages see Notes.

Basic phoneme

/s/

only word-final. Regular after a consonant letter; otherwise unpredictable

Other phoneme

/z/

only word-final. Regular (no exceptions) after <ai, au, ui>, but this covers only 10 words; otherwise unpredictable

The rest

pronounced

Exceptions to main system

(none)

Oddities

(N.B. All medial, therefore not classified as exceptions to main system)

<se>

/∫/

only in gaseous pronounced /ˈgeɪ∫əs/ (also pronounced /ˈgæsiːjəs/)

<se>

/z/

only in gooseberry /ˈgʊzbriː/, housewife ‘sewing kit’ pronounced /ˈhʌzɪf/

<se>

/ʒ/

only in nausea, nauseous pronounced /ˈnɔːʒə(s)/ (also pronounced /ˈnɔːziːjə(s)/)

2-phoneme graphemes

(none)

Notes

Gontijo et al. (2003) do not recognise <se> as a separate grapheme, hence the absence of percentages. I have based my choice of /s/ as the basic phoneme for <se> on its predominance in Table 9.6. This is admittedly a sort of lexical, rather than a text, frequency (see section 3.3).

Initial <s, e> and (except in the few Oddities listed) medial <s, e> always are/belong to separate graphemes. Word-finally, the only words in which <s, e> are separate single-letter graphemes appear to be tsetse, usually pronounced /ˈtetsiː/, and the three French loanwords blase, expose (‘report of scandal’) and rose (‘pink wine’), with /eɪ/ (increasingly spelt even within English text with French <é>). In almost all other cases of final <s, e> the <e> is part of a split digraph and the <s> is a single-letter grapheme – see previous section. Part of my definition of a split digraph (see section A.6 in Appendix A) is that the leading letter is not preceded by another vowel letter. This makes it easy to define and identify almost all the words ending <se> where these letters do form a digraph, namely those where <-se> is preceded by two vowel letters or a consonant letter: see again Table 9.6, which also distinguishes the relevant words according to /s, z/ pronunciations.

In the last row of the table are listed the only nine words in which the vowel letter before the <s> is a single vowel letter preceded in turn by a consonant letter, so that that vowel letter and the final <e> look as though they ought to form a split digraph, but do not; these are the only exceptions to my definition of grapheme <se> just above besides the four words listed earlier in the previous paragraph.

Given that the pronunciation of house as a verb is /haʊz/, the pronunciation of houses /ˈhaʊzɪz/ as a singular verb is regular, but as a plural noun shows a very rare irregularity: if it were regular it would be /ˈhaʊsɪz/ (the noun stem /haʊs/ plus the plural ending /ɪz/ which is regular after sibilant consonants). The voicing of the stem-final consonant is shared only with some words ending in /f/ in the singular but /vz/ in the plural, e.g. leaf/leaves, or in /θ/ in the singular but /ðz/ in the plural (in RP), e.g. bath(s), plus lousy with /z/ from louse with /s/ (and contrast mous(e)y with /s/).

Table 9.6: /s, z/ as pronunciations of word-final <se>.

/s/

/z/

After <ai, au, ui>

(none)

all, but this is a small set: appraise, braise, chaise, praise; applause, cause, clause, pause; bruise, cruise

After <ea, ee, oi, oo, ou, u>

cease, crease, decease, decrease, grease, increase, lease, release; geese; porpoise, tortoise; goose, loose, moose, noose, vamoose; douse, grouse, louse, mouse, Scouse, souse, spouse, plus house /haʊs/ as a singular noun (see Notes); use (noun)

appease, ease, please, tease; cheese; noise, poise; choose; arouse, blouse, carouse, espouse, rouse, plus house /haʊz/ as a verb and (suffixed) houses /ˈhaʊzɪz/ as a plural noun and singular verb (see Notes); fuse, muse, use (verb)

After <r, w> (which here always form part of a vowel digraph)

all except those shown on right, including dowse (/daʊs/ ‘splash with water’, variant spelling of douse)

only in parse; hawse, tawse; browse, dowse (/daʊz/ ‘detect water’), drowse

After any other consonant letter

all except cleanse

only in cleanse

After consonant + vowel, so looking as though there is a split digraph

all, but this is a small set because final <e> after <s> is normally part of a split digraph (see Notes above Table and previous section): carcase, purchase; diocese /ˈdaɪəsɪs/; mortise, practise, premise, promise, treatise; purpose

(none)

For <se> in arsenal, arsenic see section 6.10.
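The predictable rows of Table 9.6 can be restated as a short decision procedure. The Python sketch below is an illustration under stated assumptions rather than a complete rule: the function name is hypothetical, it returns None wherever the table gives no rule (after <ea, ee, oi, oo, ou, u>) or where the final <e> would normally belong to a split digraph, and it makes no attempt to separate the two pronunciations of dowse.

```python
# <y> counts as a vowel letter, as stated in section 9.1.
VOWEL_LETTERS = set("aeiouy")

# Words from Table 9.6 with /z/ after <r, w>. dowse is omitted because
# it has both /s/ and /z/ pronunciations with different meanings.
Z_AFTER_R_OR_W = {"parse", "hawse", "tawse", "browse", "drowse"}


def final_se_pronunciation(word):
    """Return 's' or 'z' for word-final <se>, or None if no rule applies."""
    if len(word) < 4 or not word.endswith("se"):
        return None
    before = word[:-2]
    if before[-2:] in {"ai", "au", "ui"}:
        return "z"  # regular, no exceptions in the table
    if before[-1] in VOWEL_LETTERS:
        # Either an unpredictable row of the table or a likely split digraph.
        return None
    if word == "cleanse" or word in Z_AFTER_R_OR_W:
        return "z"  # the listed exceptions after a consonant letter
    return "s"  # regular after a consonant letter


# Hypothetical usage with words taken from the table.
for w in ["praise", "cruise", "horse", "cleanse", "browse", "cheese"]:
    print(w, final_se_pronunciation(w))
```

Returning None for the unpredictable cases keeps the sketch honest about what the table does and does not license: those words would have to be looked up individually.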

9.31 <sh>

Only phoneme

/∫/

100%

e.g. ship, fish

Note

The only cases where, exceptionally, <s, h> do not form a digraph but belong to separate graphemes are at morpheme boundaries in compound words, e.g. mishandle, mishap, mishit. In dishonest, dishonour, however, there is no /h/ phoneme, so the letter <h> is (according to your analysis) either ‘silent’ or part of a grapheme <ho> pronounced /ɒ/. I prefer the latter analysis – see /ɒ/, section 5.4.4, and <h>, section 9.16.

9.32 <si>

Only medial.

The main system

Basic phoneme

/ʒ/

55%

regular when both preceded and followed by vowel letters, e.g. vision. In all such words the stress falls on the vowel preceding /ʒ/ spelt <si>, and that vowel is always spelt with a single letter and has its letter-name pronunciation, e.g. evasion, cohesion, erosion, collusion, except that <i> is always short /ɪ/, e.g. collision. See Notes

Other phoneme

/∫/

45%

regular between a preceding consonant letter (which is always one of <l, n, r>) and a following vowel letter, e.g. emulsion, repulsion; pension, tension; aversion, controversial, excursion, reversion, torsion, version. In all these cases the stress falls on the vowel preceding <l, n, r>. Also, where the preceding consonant letter is <l, n> the preceding vowel is spelt with a single letter which has its ‘short’ pronunciation; where the consonant letter is <r> it forms a digraph with the vowel letter and the pronunciation is either /ɜː/ where the digraph is <er, ur> or /ɔː/ where it is <or> (there are no words ending <-arsion, -irsion>). See Notes

The rest

pronounced

Exception to main system

medial <si>

/z/

only in business. See also section 6.10

Oddities

(none)

2-phoneme graphemes

(none)

Notes

<s, i> never form a digraph word-initially or -finally; medially they form a digraph only when followed by stem-final <-on>, plus business, controversial.

Given that the contexts in which the two pronunciations occur are almost entirely distinct, <si> is almost 100% predictable. The only exception is that version is now often pronounced /ˈvɜːʒən/ rather than /ˈvɜːʃən/.
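Because the two contexts are defined by the letter preceding <si>, the main-system correspondences can be sketched as a simple check on that letter. The Python function below is a minimal illustration with a hypothetical name; it assumes lower-case input, looks only for <si> followed by <-on>, and does not cover business, controversial or the variable pronunciation of version.

```python
# <y> counts as a vowel letter, as stated in section 9.1.
VOWEL_LETTERS = set("aeiouy")


def medial_si_pronunciation(word):
    """Return 'ʒ' or 'ʃ' for digraph <si> before <-on>, else None."""
    i = word.rfind("sion")
    if i < 1:
        return None  # no <sion>, or nothing precedes it
    previous = word[i - 1]
    if previous in VOWEL_LETTERS:
        return "ʒ"  # e.g. vision, erosion
    if previous in "lnr":
        return "ʃ"  # e.g. emulsion, pension, torsion
    return None  # e.g. <ssi> in passion, which has its own entry


# Hypothetical usage with words from the entry above.
for w in ["vision", "collision", "pension", "torsion", "passion"]:
    print(w, medial_si_pronunciation(w))
```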

N.B. For <ss> see under <s>.

9.33 <ssi>

Only medial.

Only phoneme

/∫/

100%

regular when both preceded and followed by vowel letters, e.g. accession, admission, discussion, fission, intercession, obsession, passion, percussion, permission, recession, remission. Exception: dossier, in either pronunciation (/ˈdɒsiːjə, ˈdɒsiːjeɪ/). In all these cases, including dossier, the stress falls on the vowel preceding /∫/ spelt <ssi>, and that vowel is spelt with a single letter which has its ‘short’ pronunciation

Note

In all other cases, <ss, i> are/belong to separate graphemes, e.g. in missile, passive.

9.34 <t, tt>

N.B. <tch, th, ti> have separate entries.

Basic phoneme

/t/

<t> 94%, <tt> 100%

e.g. rat, rattle

Other phoneme for <t>

/ʧ/

2% of pronunciations of <t>

regular before <u> followed by either another vowel letter or a single consonant letter and then a vowel letter, e.g. (in initial position) tuba, tube, tuber, Tuesday pronounced /ˈʧuːzdiː/, tuition pronounced /ʧuːˈwɪ∫ən/, tulip, tumour, tumult, tumultuous, tumulus, tuna, tune pronounced /ˈʧuːn/, tunic, tureen, tutor; (medially) impromptu; gargantuan, perpetuate; attitude, multitude, solitude; statue, virtue; habitue; intuition, pituitary; costume; fortunate, fortune, importune, opportune; capture, mature and dozens of other words in <-ture> and derivatives such as adventurous(ly), natural(ly); centurion, century, saturate; virtuoso; obtuse; de/in/pro/re/sub-stitution; also in several groups where the stress is always on the syllable preceding /ʧ/ spelt <t>: actual(ly), perpetual(ly), virtual(ly) and several other words in <-tual(ly)>; actuary, estuary, mortuary, obituary, sanctuary, statuary, voluptuary; congratulate, fistula, petulan-t/ce, postulant, postulate, spatula, titular; contemptuous, fatuous, impetuous, tempestuous, tumultuous (again) and several other words in <-tuous>. Though rare in this direction, this correspondence qualifies as part of the main system because of the high frequency and predictability of /ʧ/ spelt <t> - see section 3.7.2

The rest

pronounced

Exceptions to main system

<t>

/∫/

5% of pronunciations of <t>. Mainly before <iat> with the <i> pronounced /iː/, e.g. differentiate, expatiate, ingratiate, initiate, negotiate, propitiate, satiate, substantiate, vitiate, plus minutiae, otiose pronounced /ˈəʊʃiːjəʊs, ˈəʊʃiːjəʊz/ (also pronounced /ˈəʊtiːjəʊs, ˈəʊtiːjəʊz/), partiality, ratio. Partial exceptions: novitiate can be pronounced with or without the /iː/: /nəˈvɪʃiːjət, nəˈvɪʃət/ and can therefore follow either this rule or the main rule for <ti>, see section 9.37; also, some of the words listed have alternative pronunciations with /s/, e.g. negotiate, substantiate as either /nɪˈgəʊʃiːjeɪt, səbˈstænʃiːjeɪt/ or /nɪˈgəʊsiːjeɪt, səbˈstænsiːjeɪt/. See also next paragraph

<t>

/s/

<1% of pronunciations of <t>. Only the penultimate <t> in about 10 words ending in <-tiation>, e.g. differentiation, initiation, negotiation, propitiation, transubstantiation, and only for RP-speakers who avoid having two occurrences of medial /∫/ in such words (see Notes under /∫/, section 3.7.3), plus a few words where <t> is alternatively pronounced /∫/ - see previous paragraph. In French, on the other hand, /s/ is one of the most frequent correspondences for <t>

Word-final doubled letter + <e>

<tte>

/t/

only word-final, e.g. cigarette, gavotte. All such words have stress on the syllable ending in /t/ spelt <-tte> except etiquette, omelette, palette, which have stress on the first syllable. In latte <tt, e> are separate graphemes, as are <u.e, tt> in butte

Oddities

<te>

/t/

mainly word-final and in that position in at least 120 words, namely

- Bacchante, composite, compote, confidante, cote, debutante, definite, detente, dirigiste, enceinte, entente, entracte, exquisite, favourite, granite, hypocrite, infinite, minute (‘sixtieth of an hour’), opposite, perquisite, plebiscite, pointe, requisite, riposte, route, svelte

- about 30 nouns/adjectives in <-ate> pronounced /ət/ where the verbs with the same spelling are pronounced with /eɪt/, e.g. advocate, affiliate, aggregate, alternate (here with also a difference in stress and vowel pattern: noun/adjective pronounced /ɔːlˈtɜːnət/, verb pronounced /ˈɔːltəneɪt/), animate, appropriate, approximate, articulate, associate, certificate, coordinate, curate (here with also a difference in meaning and stress: noun (‘junior cleric’) pronounced /ˈkjʊərət/, verb (‘mount an exhibition’) pronounced /kjʊəˈreɪt/), degenerate, delegate, deliberate (here with also a difference in syllable structure: adjective /dɪˈlɪbrət/ with three syllables and an elided vowel – see section 6.10; verb /dɪˈlɪbəreɪt/ with four syllables), designate, desolate, duplicate, elaborate, estimate, expatriate, graduate, initiate, intimate, legitimate, moderate, pontificate (here with unrelated (?) meanings: noun (‘pope’s reign’) pronounced /pɒnˈtɪfɪkət/, verb (‘speak pompously’) pronounced /pɒnˈtɪfɪkeɪt/), precipitate (but here only the adjective has /ət/; the noun as well as the verb has /eɪt/), predicate, separate (here too with a difference in syllable structure: adjective /ˈseprət/ with two syllables and an elided vowel – see section 6.10; verb /ˈsepəreɪt/ with three syllables), subordinate, syndicate, triplicate. In the verbs and the many other nouns and adjectives with this ending pronounced /eɪt/, <e> is part of the split digraph <a.e> pronounced /eɪ/ and the <t> on its own is pronounced /t/

- a further set of at least 60 nouns/adjectives in <-ate> pronounced /ət/ with no identically-spelt verb, e.g. accurate, adequate, agate, appellate, celibate, climate, collegiate, conglomerate, (in)considerate, consulate, consummate, delicate, desperate, (in)determinate, directorate, disconsolate, doctorate, electorate, episcopate, extortionate, fortunate, illegitimate, immaculate, immediate, inanimate, in(sub)ordinate, inspectorate, intricate, inviolate, (bacca)laureate, legate, (il)literate, novitiate, obdurate, palate, particulate, (com/dis)passionate, private, profligate, proletariate, (dis)proportionate, protectorate, proximate, roseate, senate, surrogate, (in)temperate, triumvirate, ultimate, (in)vertebrate (a few of these words have related verb forms with <-ate> pronounced /eɪt/: animate, legitimate, mediate, subordinate, violate)

- possibly just one word where both noun and verb have <-ate> pronounced /ət/: pirate

- <te> pronounced /t/ also occurs medially in a few words in rapid speech, e.g. interest, literacy, literal, literary, literature, sweetener, veterinary – see section 6.10.

In all cases where <te> is pronounced /t/ the <e> is phonographically redundant, but in a couple it makes the words visually distinct from words without the <e> and with an unrelated meaning: point, rout.

Exceptions where word-final <t, e> are separate graphemes: coyote, dilettante, (piano)forte, karate, machete /məˈʃetiː/, and the French loanwords diamante, naivete, pate (‘paste’), saute (sometimes spelt even within English text with French <é>)

<te>

/ʧ/

only in righteous

<ts>

/z/

only in tsar

<tsch>

/ʧ/

only in kitsch, putsch

<tw>

/t/

only in two and derivatives, e.g. twopence, twopenny. /w/ surfaces in between, betwixt, twain, twelfth, twelve, twenty, twice, twilight, twilit, twin - see section 7.2

2-phoneme graphemes

(none)

Notes

For <ta> in budgetary, commentary, dietary, dignitary, fragmentary, hereditary, military, momentary, monetary, pituitary, planetary, proprietary, salutary, sanitary, secretary pronounced /ˈsekrətriː/, sedentary pronounced /ˈsedəntriː/ (also pronounced /sɪˈdentəriː/), solitary, tributary, unitary, voluntary, <tau> in restaurant, <(t)te> in cemetery, dysentery, entering, et cetera, interest, literacy, literal, literature, literary /ˈlɪtrəriː/, monastery, mystery /ˈmɪstriː/, presbytery, sweetener, veterinary /ˈvetrɪnriː/, utterance, <to> in amatory, auditory, conciliatory, conservatory, contributory, declamatory, defamatory, de/ex/re/sup-pository, desultory, dilatory, dormitory, explanatory, exploratory, factory /ˈfæktriː/, history /ˈhɪstriː/, inflammatory, inhibitory, interrogatory, inventory pronounced /ˈɪnvəntriː/ (also pronounced /ɪnˈventəriː/), laboratory, lavatory, mandatory, nugatory, obligatory, observatory, offertory, oratory, predatory, preparatory, promontory, purgatory, repertory, retaliatory, signatory, statutory, territory, transitory, victory /ˈvɪktriː/, <tu> in accentual, actual(ly), actuary, adventurous(ly), conceptual, contractual, effectual, estuary, eventual, factual, habitual, intellectual, mortuary, mutual, natural(ly), obituary, perpetual, punctual, ritual, sanctuary, statuary, spiritual, textual, virtual, voluptuary see section 6.10.

All the words in which <t> is pronounced /ʧ/ were formerly pronounced with the sequence /tj/, and conservative RP-speakers may still pronounce them that way (or imagine they do). Pronunciations with /tj/ would require an analysis with the <t> pronounced /t/ and the /j/-glide as part of the pronunciation of the <u> and following <r> or vowel letter. See <d>, section 9.11, for the largely parallel correspondence to voiced /ʤ/, <di> in the Oddities there, and <ti>, section 9.37.
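The orthographic context given above for <t> pronounced /ʧ/ (before <u> followed by either another vowel letter or a single consonant letter and then a vowel letter) can be written out as a check on the letters following a given <t>. The Python sketch below is illustrative only: the function name is hypothetical, it assumes lower-case input, and it tests the spelling context alone. Since it cannot see stress or vowel quality, it will also accept words with short <u> such as study, and it does not cover impromptu, where <u> is word-final.

```python
# <y> counts as a vowel letter, as stated in section 9.1.
VOWEL_LETTERS = set("aeiouy")


def in_t_u_context(word, i):
    """Return True if the <t> at index i stands before <u> + vowel letter,
    or before <u> + single consonant letter + vowel letter."""
    if word[i] != "t" or word[i + 1 : i + 2] != "u":
        return False
    after_u = word[i + 2 : i + 4]  # up to two letters following the <u>
    if after_u[:1] and after_u[0] in VOWEL_LETTERS:
        return True  # e.g. statue, actual
    return (
        len(after_u) == 2
        and after_u[0] not in VOWEL_LETTERS
        and after_u[1] in VOWEL_LETTERS
    )  # e.g. tube, fortune


# Hypothetical usage: (word, index of the <t> being tested).
for w, i in [("tube", 0), ("statue", 3), ("fortune", 3), ("turn", 0), ("tumble", 0)]:
    print(w, in_t_u_context(w, i))
```

A check of this kind is best read as a necessary condition on the spelling, to be combined with information about stress and pronunciation before any word is assigned /ʧ/.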

9.35 <tch>

Only phoneme

/ʧ/

100%

e.g. match

Note

There appear to be no cases where <t, ch> are separate graphemes.

9.36 <th>

The main system

Basic phoneme

/ð/

88%

in all (content and function) words ending in <-ther>, e.g. brother, either, except anther, ether, panther, and in all function words (except both, through and Scots outwith), i.e. although, than, that, the, thee, their, them, then, thence, there, these, they, thine, this, thither, those, thou (archaic second person singular pronoun), though, thus, thy, with, without; also in a very few other stem content words, namely algorithm, bequeath, betroth (but troth has /θ/), booth, brethren, farthing, fathom, heathen (but (unrelated) heath has /θ/), mouth /maʊð/ (verb), oath /əʊð/ (verb), rhythm, smithereens, smooth, swarthy, withy and derivatives, e.g. betrothal, plus some other derived forms: earthen, loathsome, norther-n/ly, smithy, souther-n/ly, worthy, even though their stems earth, loath, north, smith, south, worth have /θ/. Also, in RP, in plurals of some nouns which have /θ/ in the singular, e.g. baths, oaths, paths, youths /bɑːðz, əʊðz, pɑːðz, juːðz/

Other phoneme

/θ/

12%

in three function words (both, through and Scots outwith) and in most content words, e.g. anther, ether, methane, method, mouth /maʊθ/ (noun), oath /əʊθ/ (noun), panther, pith, thigh, thin, thou (informal abbreviation meaning ‘thousandth of an inch/thousand pounds/dollars’), threw

The rest

pronounced

Exceptions to main system

<1% in total

<th>

/t/

only in Thai, thali, Thame, Thames, Therese, Thomas, thyme, Wrotham /ˈruːtəm/

<th>

/ʧ/

only in posthumous /ˈpɒsʧəməs/

<th>

as 2-phoneme sequence /tθ/

only in eighth /eɪtθ/

Oddities

<the>

/θ/

only in Catherine with first <e> elided (see section 6.10), saithe (/seɪθ/, ‘fish of cod family’)

<the>

/ð/

only word-final and only in breathe, loathe, seethe, sheathe, soothe, staithe, teethe, wreathe. Only exceptions: absinthe /æbˈsænt/, (the river) Lethe /ˈliːθiː/ (in Greek mythology), nepenthe /neˈpenθiː/

2-phoneme grapheme

(see above)

Notes

The communicative load of the /θ, ð/ distinction is very low – there are remarkably few minimal pairs differing strictly and only in these phonemes; even scraping the dictionary for rare words I have managed to identify only 10 such pairs. The only ones which are also identical in spelling appear to be mouth, oath, thou (for the distinctions in use/meaning see above), and the only pairs which are not identical in spelling appear to be lo(a)th/loathe, sheath/sheathe, teeth/teethe, wreath/wreathe, where the words in each pair are related in meaning, plus ether/either pronounced /ˈiːðə/ (also pronounced /ˈaɪðə/), sooth/soothe, thigh/thy, where they are not. Other pairs differing visually only in the absence or presence of final <e> (bath/bathe, breath/breathe, cloth/clothe, lath/lathe, swath/swathe) have a further phonological difference in the pronunciation of the preceding vowel grapheme; similarly, seeth (/ˈsiːjɪθ/, archaic 3rd person singular of see) differs from seethe /siːð/ in having two syllables rather than one.

The only cases where <t, h> do not form a digraph are at morpheme boundaries in compound words, e.g. adulthood, bolthole, carthorse, coathook, goatherd, hothouse, meathook, pothole, warthog.

For <tho> in catholic (as well as <the> in Catherine), see section 6.10.
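The two largest patterns in the table above (function words and words ending in <-ther>) can be turned into a rough first-pass guess at the pronunciation of <th>. The following Python sketch is illustrative only: the function name `th_phoneme` is hypothetical, the word lists are copied from the entries above, and the sketch deliberately ignores the content-word exceptions with /ð/ (e.g. smooth, rhythm), the few words with /t/, and compounds such as pothole where <t, h> are separate graphemes.

```python
# Hypothetical sketch: a first-pass guess at /ð/ vs /θ/ for <th>, using only
# the function-word list and the <-ther> pattern given in this section.
VOICED_FUNCTION_WORDS = {
    "although", "than", "that", "the", "thee", "their", "them", "then",
    "thence", "there", "these", "they", "thine", "this", "thither", "those",
    "thou", "though", "thus", "thy", "with", "without",
}
# Words ending in <-ther> that nevertheless have /θ/.
THER_EXCEPTIONS = {"anther", "ether", "panther"}


def th_phoneme(word):
    """Return 'ð' or 'θ' as a guess for <th> in word, or None if no <th>."""
    w = word.lower()
    if "th" not in w:
        return None
    if w in VOICED_FUNCTION_WORDS:
        return "ð"
    if w.endswith("ther") and w not in THER_EXCEPTIONS:
        return "ð"
    # Default for everything else, including both, through, outwith.
    return "θ"
```

Because the default is /θ/, the three voiceless function words need no special handling; the cost is that the listed content words with /ð/ are guessed wrongly and would need their own lookup set.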

9.37 <ti>

Only medial. For all categories see Notes.

The main system

Basic phoneme

/ʃ/

94%

regular when followed by <a, e, o>, e.g. confidential, inertia, infectious, nation, quotient; cf. Ignatius

The rest

pronounced

Exceptions to main system

<ti>

/ʧ/

5% Regular when preceded by <s> and followed by <o>, but occurs only in combustion, con/di/indi/in/sug-gestion, exhaustion, question, rumbustious, plus Christian

<ti>

/ʒ/

<1% only in equation

Oddities

(none)

2-phoneme graphemes

(none)

Notes

Given the different contexts in which /ʃ, ʧ/ occur, these pronunciations are almost 100% predictable.

In all cases other than those defined above, <t, i> are separate graphemes, e.g. in consortium pronounced /kənˈsɔːtiːjəm/ (also but less often pronounced /kənˈsɔːʃəm/), till, native; also in a few words which are exceptions to the main rule above: cation /ˈkætaɪən/, consortia pronounced /kənˈsɔːtiːjə/ (less often but, by the main rule above, more regularly pronounced /kənˈsɔːʃə/), fortieth, otiose, pitiable; and in two words which are sub-exceptions to <ti> pronounced /ʧ/, namely bastion /ˈbæstiːjən/, Christianity /krɪstiːˈjænɪtiː/; also, the first <ti> is pronounced /siː/ in about 10 words ending in <-tiation>, e.g. differentiation, initiation, negotiation, propitiation, transubstantiation, but only by RP-speakers who avoid having two occurrences of medial /ʃ/ in a word of this sort. See also sections 3.7.3 and 9.35.
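As a rough illustration of how the contexts in this section interact, the rule for medial <ti> might be sketched in Python as follows. The function name `ti_phoneme` and the set names are hypothetical, the exception sets simply hard-code words listed in this section, and the sketch works on spelling alone. It therefore over-applies to inflected forms such as cities, where <t, i> are separate graphemes, and it does not model the /siː/ variant in words ending in <-tiation>.

```python
# Hypothetical sketch of the context rule for medial <ti> described above.
ZH_WORDS = {"equation"}  # the one word listed with /ʒ/
# Words listed above in which <t, i> are separate graphemes despite the context.
SEPARATE = {"cation", "consortia", "consortium", "fortieth", "otiose",
            "pitiable", "bastion", "christianity"}
CH_EXTRA = {"christian"}  # /ʧ/ although <ti> is followed by <a>, not <o>


def ti_phoneme(word):
    """Return the predicted phoneme for the first medial <ti> in word,
    or None if <t, i> are treated as separate graphemes."""
    w = word.lower()
    i = w.find("ti", 1)  # medial only, so skip any word-initial <ti>
    if i == -1 or i + 2 >= len(w) or w in SEPARATE:
        return None
    if w in ZH_WORDS:
        return "ʒ"
    prev, nxt = w[i - 1], w[i + 2]
    if w in CH_EXTRA or (prev == "s" and nxt == "o"):
        return "ʧ"
    if nxt in {"a", "e", "o"}:
        return "ʃ"
    return None
```

The /ʧ/ test is made before the /ʃ/ test because its context (<s> before, <o> after) is a special case of the more general one.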

N.B. For <tt> see under <t>.

N.B. Though <u, u.e> have or are involved in various consonantal pronunciations see, nevertheless, the entries for <u, u.e> in chapter 10, sections 10.36, 10.38.

9.38 <v>

N.B. <ve> has a separate entry.

The main system

Basic phoneme

/v/

100%

e.g. very, oven

The rest

pronounced

Exception to main system

<v>

/f/

only in kvetch, svelte, svengali, veldt

Doubled letter

<vv>

/v/

only in bevvy, bovver, chavvy, chivvy, civvy, divvy, flivver, lavvy, luvv-y/ie, navvy, revving, savvy, skivvy, spivvery, spivvy

Word-final doubled letter + <e>

(does not occur)

Oddities

(none)

2-phoneme graphemes

(none)

Note

For <vou> in favourable, favourite see section 6.10.

9.39 <ve>

Only phoneme

/v/

never initial; for medial position see Notes; frequent word-finally

Notes

<ve> pronounced /v/ occurs medially in average, deliverable, evening (noun, ‘late part of day’, pronounced /ˈiːvnɪŋ/, as distinct from the verb of the same spelling, ‘levelling’, pronounced /ˈiːvənɪŋ/), every, several, sovereign (for these words see also section 6.10), and in a large number of regular plural nouns and singular verbs, e.g. haves (vs have-nots), gives, grieves, initiatives, dissolves, lives (verb), loves, improves, stoves, preserves, mauves, gyves; also in a small number of irregular plural nouns ending in <-ves> pronounced /vz/ where the singular forms have <-f> pronounced /f/, namely calves, dwarves (the form dwarfs also exists), elves, halves, hooves, leaves, loaves, scarves, (our/your/them-)selves, sheaves, shelves, thieves, turves (the form turfs also exists), wharves, wolves, plus a very few nouns where the <f> in the singular is within the split digraph <i.e>: knives, lives (/laɪvz/; the singular verb of the same spelling is pronounced /lɪvz/), (ale/good/house/mid-)wives (but if housewife ‘sewing kit’ pronounced /ˈhʌzɪf/ has a plural it is presumably pronounced /ˈhʌzɪfs/).

In only 33 words, in my analysis (behave, conclave, forgave, gave, shave, Khedive, suave, wave; breve, eve; alive, archive, arrive, deprive, drive, five, hive, jive, live (adjective, /laɪv/), naive, ogive, recitative, revive, survive, swive, wive; alcove, cove, drove, mangrove, move, prove; gyve) is the <e> of final <ve> part not only of that digraph but also of a split digraph with a preceding single vowel letter. In practice this makes no difference – the word-final phoneme is /v/, so this aspect hardly needs analysing.

Gontijo et al. (2003) do not recognise <ve> as a separate grapheme. However, word-finally and medially before final <s>, <ve> always indicates /v/ regardless of whether it is so recognised, so is 100% predictable. Only the medial occurrences in average, deliverable, evening (‘late part of day’), every, several, sovereign are problematic.

In other medial occurrences and all initial occurrences <v, e> are separate graphemes, e.g. vest, oven. The only word in which final <v, e> are separate graphemes appears to be agave /əˈgɑːviː/.

9.40 <w>

N.B. (1) <wh> has a separate entry.
(2) <aw, ew, ow> have separate entries in chapter 10, sections 10.10/21/34.

The main system

Basic phoneme

/w/

100%

e.g. way

The rest

pronounced

Exceptions to main system

(none)

Doubled letter

<ww>

/w/

only in bowwow, glowworm, powwow, slowworm

Oddity

<wr>

/r/

only in awry (only non-initial example), wrap, wrasse, wreck, wren, wrench, wrest(le), wretch(ed), wriggle, wring, wrinkle, wrist, write, wrong, Wrotham /ˈruːtəm/, wrought, wry and a few more rare words. The only words in which <w, r> do not form a digraph appear to be cowrie, dowry

2-phoneme graphemes

(none)

9.41 <wh>

The main system

Basic phoneme

/w/

80%

e.g. what, which. See Notes

The rest

pronounced

Exceptions to main system

<wh>

/h/

20% Only in who, whom, whose, whole, whoop(ing), whooper, whore

Oddities

(none)

2-phoneme graphemes

(none)

Notes

The high percentage for <wh> pronounced /h/ is due to the very high frequency of who, whose, whole, and recognition of the few words where this correspondence obtains should not be problematic.

Where <wh> is pronounced /w/ in RP, in many Scots accents it is pronounced /ʍ/, which is the voiceless counterpart of /w/ and sounds roughly like ‘hw’; but because /ʍ/ is not a phoneme of RP this correspondence is not included in my analyses.

The very few cases where <w, h> do not form a digraph are at morpheme boundaries in compound words, e.g. sawhorse.

9.42 <x>

The main system

Basic pronunciation

/ks/

82%

e.g. box, next, six

The rest

Doubled letter

(does not occur)

pronounced

Exceptions to main system

18% in total

<x>

/z/

regular in initial position, e.g. xylophone (except that some people pronounce the Greek letter name xi as /ksaɪ/); medial only in anxiety pronounced /æŋˈzaɪjɪtiː/ (also pronounced /æŋˈgzaɪjɪtiː/); rare word-finally. See Notes

<x>

/k/

only in coxswain /ˈkɒksən/ and before <c> pronounced /s/ in a small group of words of Latin origin, namely exceed, excel(lent), except, excerpt, excess, excise, excite

<x>

as 2-phoneme sequence /gz/

16% Only in some polysyllabic words of Latin origin, namely anxiety pronounced /æŋˈgzaɪjɪtiː/ (also pronounced /æŋˈzaɪjɪtiː/), auxiliary, exact, exaggerate, exalt, exam(ine), example, exasperate, executive, executor, exemplar, exemplify, exempt, exert, exigency, exiguous, exile pronounced /ˈegzaɪl/ rather than /ˈeksaɪl/, exist, exonerate, exorbitant, exordium, exotic, exuberant, exude, exult and a few more rare words; also in Alexandra, Alexander and becoming frequent in exit /ˈegzɪt/ (also pronounced /ˈeksɪt/). See Notes

<x>

as 2-phoneme sequence /kʃ/

1% Only in 3 words of Latin origin: flexure, luxury, sexual /ˈflekʃə, ˈlʌkʃəriː, ˈsekʃ(uːw)əl/

<x>

as 2-phoneme sequence /gʒ/

only in luxuriance, luxuriant, luxuriate, luxurious /lʌgˈʒʊəriːj-əns/ənt/eɪt/əs/

<x>

as 3-phoneme sequence /eks/

only in X-ray, etc. One of only two 3-phoneme graphemes in the whole language

Oddities

(none)

2-phoneme sequences

(in addition to the basic pronunciation and three of the exceptions to the main system)

<xe>

/ks/

only in annexe, axe, deluxe

<xh>

/gz/

only in 7 polysyllabic words of Latin origin: exhaust(ion), exhibit, exhilarat-e/ion, exhort, exhume

<xh>

/ks/

only in 3 polysyllabic derivatives of words in the previous group: exhibition, exhortation, exhumation

<xi>

/kʃ/

only in anxious, complexion, connexion (also spelt connection), crucifixion, fluxion, (ob)noxious

3-phoneme sequence

(see above)

Notes

In almost all words beginning <ex-> followed by a vowel letter, if the stress is on the initial <e>, the <x> is pronounced /ks/, but if the stress is on the next vowel the <x> is pronounced /gz/. The only exceptions are exile, which is usually pronounced /ˈegzaɪl/, i.e. with initial stress but irregular /gz/ (though a regularised spelling pronunciation /ˈeksaɪl/ is sometimes heard); exit, which (conversely) is usually pronounced /ˈeksɪt/, i.e. with initial stress and regular /ks/, but is increasingly heard as (irregular) /ˈegzɪt/, perhaps under the influence of exile; and cf. doxology, luxation, proximity with /ks/ despite the stress being on the following vowel. This tendency to pronounce <x> as /gz/ before the stressed vowel applies also to the given names Alexandra, Alexander, but their abbreviated forms Alexa, Alex have /ks/ because the stress falls earlier.
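The stress-based pattern for <ex-> can be written as a small Python sketch. This is an illustration under stated assumptions, not a full analysis: the function name `ex_x_pronunciation` is hypothetical, and because spelling does not show stress, the position of the stress has to be supplied by the caller as the parameter `stress_on_initial_e`. Only the two exceptions named above (exile, exit) are hard-coded, each with what the text gives as its usual pronunciation.

```python
# For this purpose <y> counts as a vowel letter, as elsewhere in this chapter.
VOWEL_LETTERS = set("aeiouy")
# Exceptions named in the text, with their usual pronunciation of <x>.
EX_EXCEPTIONS = {"exile": "gz", "exit": "ks"}


def ex_x_pronunciation(word, stress_on_initial_e):
    """Predict 'ks' or 'gz' for <x> in a word beginning <ex-> + vowel letter.

    stress_on_initial_e is supplied by the caller (hypothetical parameter):
    True if the main stress falls on the initial <e>.
    """
    w = word.lower()
    if not (w.startswith("ex") and len(w) > 2 and w[2] in VOWEL_LETTERS):
        raise ValueError("rule applies only to <ex-> followed by a vowel letter")
    if w in EX_EXCEPTIONS:
        return EX_EXCEPTIONS[w]
    return "ks" if stress_on_initial_e else "gz"
```

For example, `ex_x_pronunciation("exact", False)` gives `"gz"`, while a word such as except, where <x> is followed by a consonant letter, falls outside the rule and raises an error.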

<x> pronounced /z/ occurs word-initially only in some words of Greek origin, namely xanthine, xanthoma, xanthophyll, xenon, xenophobia and several other words beginning xeno-, Xerox and several other words beginning xero-, xylem, xylene, xylophone and several other words beginning xylo-. Word-finally, the plurals of some French loanwords ending in <-eau> are sometimes spelt French-style with <x> as well as <s>, e.g. beau-s/x, bureau-s/x, flambeau-s/x, gateau-s/x, plateau-s/x, portmanteau-s/x, trousseau-s/x; indeed, my dictionary gives only the <x> form in bandeaux, chateaux, rondeaux, tableaux. In all these cases <x> is also pronounced /z/. In my opinion the <x> form is outmoded and unnecessary.

For <xo> in inexorable see section 6.10.

9.43 <z, zz>

The main system

Basic phoneme

/z/

<z> 97%, <zz> 97%

e.g. zoo, dazzle, jazz

The rest

pronounced

Exceptions to main system

3% of pronunciations of both graphemes in total

<z>

/s/

only in blitz(krieg), chintz, ersatz, glitz, howitzer, kibbutz, kibitz, klutz, lutz, pretzel, quartz, ritz, schmaltz, schnitzel, seltzer, spritz(er), Switzerland, waltz, wurlitzer

<z>

/ʒ/

only in azure pronounced /ˈæʒə, ˈeɪʒə/ (also pronounced /ˈæzj(ʊ)ə, ˈeɪzj(ʊ)ə/), seizure /ˈsiːʒə/

<z>

as 2-phoneme sequence /ts/

only in Alzheimer’s, bilharzia, nazi (but Churchill said /ˈnɑːziː/), scherzo, schizo(-)

<zz>

as 2-phoneme sequence /ts/

only in intermezzo, paparazzi, pizza, pizzicato

Word-final doubled letter + <e>

(does not occur)

Oddities

<ze>

/z/

only word-final. In other positions <z, e> are separate graphemes, e.g. in zest. The only word in which final <z, e> are separate graphemes is kamikaze

<zi>

/ʒ/

only in brazier, crozier, glazier pronounced /ˈbreɪʒə, ˈkrəʊʒə, ˈgleɪʒə/ (also pronounced /ˈbreɪziːjə, ˈkrəʊziːjə, ˈgleɪziːjə/)

2-phoneme sequences

(see above)

Note

The spelling <zh> is also used to represent /ʒ/, but because it occurs only in transcriptions of Russian names, e.g. Zhivago, Zhores, I have not added it to the inventory of graphemes.

9.44 Some useful generalisations about graphemes beginning with consonant letters

Almost all occurrences of geminate consonant letters are pronounced identically to the single letter. (Rule 28 in Clymer, 1963/1996 expresses this as ‘When two of the same consonants are side by side, only one is heard.’) To experienced users of English this may seem too obvious to state, but there are known instances (see, for example, Burton, 2007: 27) of adult literacy learners saying, when this was pointed out to them, ‘Why did no-one ever tell me that? I thought there must be two sounds because there are two letters, and could never work them out’. And I have witnessed an 11-year-old boy having to be taught this by his catch-up scheme tutor.

There are minor exceptions under <gg, ll, ss, zz> among main-system graphemes (and a few more under geminate consonants among the rest), but the only major set of exceptions is words with <cc> pronounced /ks/. Even here most instances exhibit regular correspondences: the first <c> is pronounced regularly /k/ before a consonant letter, and the second <c> is pronounced regularly /s/ before <e, i, y>, e.g. accent, occiput, coccyx. The real irregularities are therefore the few words with <cc> pronounced /k/ before <e, i, y> (baccy, biccy, recce (short for reconnoitre), soccer, speccy, streptococci) and the two words with <cc> sometimes pronounced /s/ before <i> (flaccid, succinct). Both of the latter have more regular pronunciations with /ks/, and there seem to be no such exceptions before <e, y>. This generalisation about geminate consonant letters is a very strong rule.
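The regular behaviour of <cc> can be sketched in a few lines of Python. The function name `cc_pronunciation` is hypothetical, and the sketch encodes only the regular pattern, so it gives the regular (and here wrong) answer /ks/ for the listed exceptions such as soccer.

```python
def cc_pronunciation(word):
    """Predict the pronunciation of the first <cc> in word by the regular
    pattern: /ks/ before <e, i, y>, otherwise /k/. Returns None if no <cc>."""
    w = word.lower()
    i = w.find("cc")
    if i == -1:
        return None
    nxt = w[i + 2:i + 3]  # the letter after <cc>, or '' if word-final
    return "ks" if nxt in {"e", "i", "y"} else "k"
```

Comparing the output with the listed exceptions shows how small the irregular residue is: only the six words with /k/ before <e, i, y>, and the /s/ variants of flaccid and succinct, disagree with the prediction.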

The five non-geminate doubled consonant graphemes (<ck, dg, dge, tch, ve>) and three of the four digraphs with <h> as the second letter (<ph, sh, th>) have virtually no irregular pronunciations, even though <th> has two major regular ones. <ch> is the exception, with several irregular pronunciations.

In addition, the lists in this chapter reveal two useful context-sensitive patterns:

  • The six main-system graphemes other than <sh> which are pronounced /ʃ/, namely <ce, ci, sci, si, ssi, ti>, are fairly easy to distinguish from occurrences of these sequences which are not pronounced /ʃ/: these graphemes occur with this pronunciation only in medial position, and then mainly between two vowel letters, though <si> always has a consonant letter between it and the preceding vowel letter, and <ci, sci> sometimes have. Also, five of these graphemes (in these contexts) have only one pronunciation. (The exception is <ti>, where a few words have /ʧ/ instead, one (equation) has /ʒ/, and in a few words with two occurrences of <ti> before a vowel letter (e.g. negotiation) there is alternation between /ʃ/ and /s/.) This pattern, unlike the next one, would be difficult to formulate as a rule, and learners need to pick it up;
  • The ‘soft’ pronunciations of <c, g> as /s, ʤ/ occur in similar contexts to each other (before <e, i, y>), and the ‘hard’ pronunciations /k, g/ correspondingly elsewhere.

The latter generalisation is simple enough to be taught as a rule, but teachers need to be alert to cases where learners may over-generalise it. It never applies to <ch, tch>, and learners will find various groups of (real or apparent) exceptions (some very rare):

1) exceptions to ‘<c> followed by <e, i, y> is pronounced /s/’:

  • (with /k/) arced, arcing, Celt, Celtic (but the Glasgow football team is /ˈseltɪk/), chicer, chicest, sceptic (in British spelling) and words beginning encephal- pronounced /eŋkefəl-/ (also pronounced with /ensefəl-/)
  • (with /ʃ/) cetacean, crustacea(n), Echinacea, liquorice, ocean, siliceous and words ending in <-aceous> pronounced /ˈeɪʃəs/, e.g. cretaceous, curvaceous, herbaceous, sebaceous and about 100 others, mostly scientific and all very rare; also officiate, speciality, specie(s), superficiality and sometimes ap/de-preciate, associate
  • (with /ʧ/) only cellist, cello, concerto
  • (with <cc> pronounced /k/) baccy, biccy, recce (short for reconnoitre), soccer, speccy, streptococci;

2) exceptions to ‘<c> is pronounced /k/ everywhere else (except before <h>)’:

  • (with /s/): apercu, facade;

3) exceptions to ‘<g> followed by <e, i, y> is pronounced /ʤ/’:

  • (a fair number with /g/ (see section 9.15), some of them high-frequency words): gear, get, tiger; giggle, girl, give
  • (with other but rare pronunciations): see section 9.15;

4) exceptions to ‘<g> is pronounced /g/ everywhere else’:

  • (with /ʤ/): gaol, margarine, Reg, veg, and second <g> in mortgagor
  • (with other but mostly rare pronunciations): see section 9.15
  • (with <gg> pronounced /ʤ/): arpeggio, exaggerate, loggia, Reggie, suggest, veggie, vegging.

For practical purposes with young learners, the rule about the ‘soft’ and ‘hard’ pronunciations of <c, g> can be considered 100% reliable, though they would probably need to be taught liquorice and ocean.
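The basic ‘soft’/‘hard’ rule is simple enough to state as a short Python sketch. The function name `soft_hard` and its `index` parameter are hypothetical, and the sketch encodes only the basic rule for a single <c> or <g>: it knows nothing about the exceptions listed above, so it will, for instance, wrongly predict /ʤ/ for get.

```python
SOFT_TRIGGERS = {"e", "i", "y"}
SOFT = {"c": "s", "g": "ʤ"}
HARD = {"c": "k", "g": "g"}


def soft_hard(word, index):
    """Predict the phoneme for <c> or <g> at word[index] by the basic rule.

    Returns None for <c> before <h>, since the rule never applies to
    <ch, tch>.
    """
    w = word.lower()
    letter = w[index]
    if letter not in SOFT:
        raise ValueError("rule covers only <c> and <g>")
    nxt = w[index + 1] if index + 1 < len(w) else ""
    if letter == "c" and nxt == "h":
        return None
    return SOFT[letter] if nxt in SOFT_TRIGGERS else HARD[letter]
```

Running such a function over a word list and comparing its output with the exception lists above would be one way of checking how far the rule can be trusted for a given set of teaching materials.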

Inspection of the headings of sections 9.6-43 will show that about half give the percentage of the basic pronunciation as 100%, and several others are close to that. In quite a few other cases attention to the context will combine lower percentages into something over 90% or in the upper 80%’s. The only ones in the lower 80%’s are <wh, x>. Overall the predictability of the pronunciations of main-system graphemes beginning with consonant letters is probably over 90%. The two major exceptions are medial and final <s> and word-final <se>, both of which have the two main pronunciations /s, z/, and for which few useful generalisations can be given. Even so, the pronunciations of consonant graphemes are much more predictable than those of vowel graphemes, as is obvious from chapter 10.