Phoneme Frequency Tables

We've identified two ways to rank the possible spellings of any given phoneme for our data set of over 26,000 words. In the first, we count the number of times a phoneme-grapheme correspondence occurs in the list of words. In the second, we give a weight to each word in the list depending on its frequency in connected text, then multiply each occurrence of the correspondence by that weight.

Why Adding a Frequency Weight Matters

Let's consider the sound /v/, which can be represented by the letter 'v' or the letter 'f'. In our data, there are about 1,700 words that use 'v' and only one word ('of') that uses 'f' (technically two, since 'thereof' is the other). Let's look at an excerpt from "Wizard of Oz" by L. Frank Baum to see why this matters.

When Dorothy stood in the doorway and looked around, she could see nothing but the great gray prairie on every side. Not a tree nor a house broke the broad sweep of flat country that reached to the edge of the sky in all directions. The sun had baked the plowed land into a gray mass, with little cracks running through it. Even the grass was not green, for the sun had burned the tops of the long blades until they were the same gray color to be seen everywhere.

As we can see above, the word 'of' occurs three times in this short passage, as often as all the words that include the letter 'v' combined ('every', 'Even', and 'everywhere'). It is one of the most common words in English!
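
The count above can be checked with a few lines of Python. This is only a rough sketch: it uses the presence of the letter 'v' as a stand-in for "the sound /v/ spelled with 'v'", which happens to hold for this excerpt but is an assumption, not a general rule.

```python
import re

# The excerpt quoted above, from L. Frank Baum.
EXCERPT = (
    "When Dorothy stood in the doorway and looked around, she could see "
    "nothing but the great gray prairie on every side. Not a tree nor a "
    "house broke the broad sweep of flat country that reached to the edge "
    "of the sky in all directions. The sun had baked the plowed land into "
    "a gray mass, with little cracks running through it. Even the grass "
    "was not green, for the sun had burned the tops of the long blades "
    "until they were the same gray color to be seen everywhere."
)

# Lowercase the text and split it into words, dropping punctuation.
words = re.findall(r"[a-z']+", EXCERPT.lower())

# Occurrences of 'of' (the /v/ sound spelled with 'f').
of_count = words.count("of")

# Words containing the letter 'v' (assumed here to carry the /v/ sound).
v_words = [w for w in words if "v" in w]

print("of:", of_count)
print("words with 'v':", v_words)
```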

So, if you're reading connected text and come across the sound /v/, it's about as likely to be spelled with the letter 'f' as it is with the letter 'v'. This is a rather extreme example. For many of the other sounds, the two rankings give similar results, but we can still use this data to inform our teaching approaches.

Below you can see the breakdown of each correspondence for the sound /v/. We have a table for the direct frequencies and one for the weighted frequencies.

Grapheme    Frequency (total)    Frequency (%)
v           2,096                84.08
ve          393                  15.76
f           2                    0.08
vv          2                    0.08

Grapheme    Weighted Frequency (total)    Weighted Frequency (%)
f           4,137,188.65                  43.80
v           3,559,712.14                  37.69
ve          1,747,984.76                  18.51
vv          111.77                        0.00

Frequency Tables by Phoneme

We've run rankings for our entire data set. Below you can find the frequency and weighted frequency for the spellings of each phoneme. You can click on each grapheme to see the words with that correspondence in our Word List Builder.

See our Methodology at the bottom for more details.

Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
a
4,897 99.49 15,987,659.81 98.48
eah
1 0.02 140,297.71 0.86
au
8 0.16 50,826.56 0.31
al
6 0.12 48,540.10 0.30
a.e
8 0.16 6,623.63 0.04
ai
2 0.04 237.08 0.00
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
e
8,356 96.55 13,588,258.80 87.10
ea
219 2.53 689,226.80 4.42
ai
37 0.43 647,964.29 4.15
a
19 0.22 511,072.37 3.28
ie
17 0.20 116,997.53 0.75
ay
1 0.01 46,282.60 0.30
eo
6 0.07 1,610.44 0.01
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
i
12,094 97.48 25,087,652.69 97.34
ee
1 0.01 337,818.60 1.31
y
217 1.75 111,553.97 0.43
i.e
63 0.51 86,417.44 0.34
u
6 0.05 54,100.23 0.21
o
1 0.01 37,722.70 0.15
e
7 0.06 36,908.82 0.14
ei
13 0.10 15,907.97 0.06
ie
5 0.04 6,259.95 0.02
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
o
2,605 97.20 5,496,344.84 93.96
ough
16 0.60 224,265.22 3.83
o.e
6 0.22 47,584.47 0.81
oh
4 0.15 42,596.08 0.73
ow
10 0.37 20,472.38 0.35
oa
17 0.63 8,363.32 0.14
e
18 0.67 7,751.62 0.13
ou
4 0.15 2,148.66 0.04
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
a
1,510 73.95 3,908,972.70 83.87
aw
167 8.18 222,133.37 4.77
au
273 13.37 180,893.55 3.88
al
30 1.47 171,684.07 3.68
ea
29 1.42 85,184.34 1.83
augh
17 0.83 67,099.80 1.44
ah
4 0.20 19,356.51 0.42
awe
6 0.29 3,787.37 0.08
i
6 0.29 1,441.06 0.03
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
o
3,083 29.70 9,309,240.02 29.56
e
439 4.23 7,820,763.24 24.83
a
3,384 32.60 7,523,153.05 23.89
u
2,771 26.70 4,609,617.63 14.64
o.e
49 0.47 1,004,372.30 3.19
ou
357 3.44 634,155.24 2.01
i
146 1.41 213,752.02 0.68
au
9 0.09 126,002.13 0.40
a.e
91 0.88 116,295.36 0.37
u.e
6 0.06 69,400.66 0.22
oo
10 0.10 39,081.15 0.12
ae
6 0.06 12,931.57 0.04
ah
20 0.19 12,026.23 0.04
y
5 0.05 825.44 0.00
ea
1 0.01 376.50 0.00
oe
2 0.02 247.03 0.00
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
a
2,490 50.64 2,584,783.86 25.18
a.e
1,025 20.85 2,287,218.71 22.29
ay
281 5.71 1,757,598.91 17.12
ai
650 13.22 1,022,324.57 9.96
e.e
19 0.39 749,307.13 7.30
ey
34 0.69 719,392.33 7.01
e
231 4.70 402,970.36 3.93
ei
41 0.83 350,891.65 3.42
ea
53 1.08 308,270.68 3.00
eigh
37 0.75 46,559.46 0.45
aigh
7 0.14 17,812.68 0.17
u
3 0.06 10,386.62 0.10
et
29 0.59 4,089.28 0.04
ee
10 0.20 890.73 0.01
ae
6 0.12 770.22 0.01
ah
1 0.02 147.61 0.00
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
e
2,592 31.71 15,303,261.25 58.44
y
2,439 29.83 4,636,652.06 17.71
ea
771 9.43 2,135,472.83 8.16
ee
653 7.99 1,832,052.85 7.00
i
948 11.60 681,685.09 2.60
e.e
97 1.19 666,382.90 2.54
ie
471 5.76 442,009.58 1.69
ey
92 1.13 153,463.81 0.59
ei
54 0.66 153,312.19 0.59
eo
3 0.04 149,259.13 0.57
i.e
45 0.55 30,565.82 0.12
oe
4 0.05 725.25 0.00
ay
3 0.04 673.47 0.00
ae
3 0.04 257.72 0.00
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
i
1,359 49.93 5,261,092.82 49.77
i.e
680 24.98 2,144,818.58 20.29
y
317 11.65 1,957,133.23 18.51
igh
193 7.09 867,226.42 8.20
ie
71 2.61 142,684.26 1.35
eye
14 0.51 109,888.85 1.04
ye
9 0.33 29,004.54 0.27
y.e
22 0.81 22,485.22 0.21
ia
17 0.62 14,674.39 0.14
eigh
4 0.15 8,285.49 0.08
ai
13 0.48 5,618.79 0.05
ay
6 0.22 4,541.69 0.04
ei
16 0.59 3,075.85 0.03
oy
1 0.04 295.96 0.00
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
o
3,092 70.90 7,692,163.77 64.06
o.e
466 10.69 1,437,519.15 11.97
ow
271 6.21 1,165,854.84 9.71
ou
67 1.54 850,427.05 7.08
oa
282 6.47 201,313.93 1.68
oh
1 0.02 173,629.30 1.45
a
88 2.02 160,414.71 1.34
oo
29 0.66 127,899.30 1.07
oe
26 0.60 96,975.42 0.81
ough
10 0.23 94,067.25 0.78
eau
12 0.28 3,334.90 0.03
ew
6 0.14 2,325.85 0.02
au
6 0.14 1,141.87 0.01
eaux
4 0.09 240.93 0.00
aux
1 0.02 62.34 0.00
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
o
100 6.54 5,854,557.42 53.07
ou
66 4.32 3,210,647.67 29.10
oo
427 27.93 601,478.54 5.45
u
540 35.32 481,407.04 4.36
ew
117 7.65 304,695.94 2.76
u.e
135 8.83 254,494.38 2.31
ue
57 3.73 129,632.19 1.18
ough
4 0.26 117,003.02 1.06
ui
47 3.07 52,116.68 0.47
eu
26 1.70 13,402.34 0.12
oe
10 0.65 11,956.17 0.11
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
u
600 76.82 647,090.87 61.86
u.e
94 12.04 223,821.48 21.40
ew
26 3.33 71,173.33 6.80
eau
11 1.41 58,296.59 5.57
ue
28 3.59 41,147.90 3.93
eu
17 2.18 3,236.84 0.31
ou
1 0.13 702.00 0.07
uu
4 0.51 624.14 0.06
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
oul
7 1.73 844,557.35 38.45
oo
185 45.79 838,291.91 38.16
u
200 49.50 434,429.69 19.78
o
10 2.48 79,088.16 3.60
ou
2 0.50 391.33 0.02
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
ou
528 67.43 2,120,075.90 65.59
ow
249 31.80 1,109,544.75 34.32
ough
6 0.77 2,896.92 0.09
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
oi
179 61.72 236,381.31 53.81
oy
111 38.28 202,916.95 46.19
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
er
3,913 67.00 6,358,157.53 63.19
or
615 10.53 895,874.83 8.90
ur
434 7.43 627,090.31 6.23
ir
211 3.61 497,736.23 4.95
ere
2 0.03 442,426.19 4.40
ure
120 2.05 315,398.73 3.13
ar
190 3.25 239,483.17 2.38
ear
60 1.03 215,502.08 2.14
re
97 1.66 189,762.66 1.89
r
32 0.55 110,404.38 1.10
urr
59 1.01 61,143.69 0.61
our
51 0.87 37,350.46 0.37
orr
4 0.07 29,438.42 0.29
err
30 0.51 23,029.25 0.23
ro
5 0.09 11,432.20 0.11
irr
8 0.14 6,011.42 0.06
yr
9 0.15 2,058.31 0.02
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
ure
42 37.17 68,401.33 47.81
ur
56 49.56 55,580.32 38.85
eur
15 13.27 19,083.87 13.34
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
b
3,946 95.43 8,022,237.82 98.75
bu
19 0.46 56,625.68 0.70
bb
170 4.11 45,264.29 0.56
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
p
5,818 92.89 7,664,090.79 93.50
pp
445 7.11 532,754.96 6.50
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
g
1,968 86.28 4,143,570.86 94.29
gu
57 2.50 154,646.26 3.52
gg
215 9.43 63,254.22 1.44
gue
23 1.01 21,300.79 0.48
gh
18 0.79 11,534.29 0.26
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
c
5,611 67.91 7,005,962.16 58.52
k
1,485 17.97 3,803,932.10 31.78
ck
768 9.29 789,497.70 6.59
ch
219 2.65 201,773.81 1.69
cc
107 1.29 143,891.04 1.20
qu
25 0.30 11,814.67 0.10
que
27 0.33 10,239.86 0.09
cu
4 0.05 2,419.37 0.02
q
3 0.04 683.97 0.01
cch
8 0.10 675.35 0.01
kh
6 0.07 536.07 0.00
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
d
7,286 83.04 19,240,167.03 90.14
ed
1,317 15.01 1,906,237.21 8.93
dd
171 1.95 197,960.63 0.93
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
t
12,474 90.68 32,737,159.44 94.84
tt
529 3.85 897,564.27 2.60
ed
684 4.97 784,333.10 2.27
bt
17 0.12 46,605.65 0.14
te
18 0.13 27,471.56 0.08
th
10 0.07 12,934.33 0.04
tte
20 0.15 9,558.84 0.03
pt
4 0.03 2,906.58 0.01
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
f
2 0.08 4,137,188.65 43.80
v
2,096 84.08 3,559,712.14 37.69
ve
393 15.76 1,747,984.76 18.51
vv
2 0.08 111.77 0.00
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
f
2,779 80.76 7,370,160.02 89.76
ff
308 8.95 524,702.94 6.39
ph
324 9.42 176,319.45 2.15
gh
30 0.87 139,369.26 1.70
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
j
461 31.82 937,141.94 43.57
g
602 41.55 619,796.98 28.82
ge
170 11.73 385,594.98 17.93
dge
59 4.07 66,170.57 3.08
d
30 2.07 34,647.20 1.61
gi
25 1.73 34,560.44 1.61
dg
68 4.69 23,523.35 1.09
di
7 0.48 22,564.43 1.05
gg
11 0.76 21,099.28 0.98
dj
16 1.10 5,819.97 0.27
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
ch
753 65.54 1,775,433.29 77.90
t
184 16.01 297,950.40 13.07
tch
196 17.06 151,789.44 6.66
ti
11 0.96 53,668.64 2.35
c
4 0.35 368.16 0.02
tl
1 0.09 2.50 0.00
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
s
5,881 91.26 10,503,965.31 92.87
se
37 0.57 425,800.57 3.76
z
431 6.69 177,845.39 1.57
es
12 0.19 144,261.57 1.28
ss
14 0.22 28,488.65 0.25
zz
50 0.78 12,560.76 0.11
ze
14 0.22 12,209.46 0.11
x
5 0.08 4,696.22 0.04
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
s
9,805 78.80 16,191,319.47 79.30
c
1,126 9.05 1,522,834.71 7.46
ss
876 7.04 1,124,719.62 5.51
ce
310 2.49 947,092.47 4.64
se
156 1.25 456,228.04 2.23
st
52 0.42 93,580.39 0.46
sc
96 0.77 74,443.11 0.36
ps
22 0.18 6,525.75 0.03
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
si
97 61.39 94,240.96 50.17
s
31 19.62 85,888.88 45.72
ge
14 8.86 4,754.86 2.53
g
8 5.06 1,106.02 0.59
z
2 1.27 977.69 0.52
ti
2 1.27 415.73 0.22
ci
1 0.63 190.04 0.10
ss
2 1.27 180.92 0.10
j
1 0.63 99.30 0.05
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
sh
971 39.73 1,841,411.21 54.10
ti
1,078 44.11 971,826.35 28.55
s
23 0.94 193,499.38 5.68
ci
123 5.03 150,736.44 4.43
ssi
73 2.99 124,772.29 3.67
ch
63 2.58 29,078.81 0.85
ss
15 0.61 26,856.62 0.79
sci
13 0.53 20,067.88 0.59
c
19 0.78 14,009.90 0.41
si
22 0.90 12,997.33 0.38
shi
8 0.33 11,314.38 0.33
t
18 0.74 4,019.40 0.12
che
5 0.20 2,163.09 0.06
sc
5 0.20 828.64 0.02
sch
7 0.29 202.72 0.01
x
1 0.04 2.50 0.00
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
th
186 88.57 21,522,822.01 99.90
the
24 11.43 22,289.47 0.10
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
th
660 100.00 3,461,508.65 100.00
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
m
5,460 93.78 13,835,561.60 97.90
mm
291 5.00 221,918.03 1.57
mn
15 0.26 40,918.79 0.29
mb
53 0.91 32,941.85 0.23
gm
3 0.05 378.06 0.00
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
m
79 100.00 20,107.42 100.00
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
n
11,961 96.63 31,325,176.28 97.04
kn
67 0.54 631,396.93 1.96
nn
274 2.21 250,874.09 0.78
gn
75 0.61 72,742.36 0.23
pn
1 0.01 382.32 0.00
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
ng
3,535 87.07 4,150,634.86 82.91
n
519 12.78 844,914.55 16.88
ngue
6 0.15 10,445.03 0.21
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
l
7,637 88.15 11,836,993.01 75.95
ll
1,027 11.85 3,747,532.03 24.05
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
le
858 45.74 1,132,345.30 54.77
al
621 33.10 672,332.87 32.52
el
169 9.01 109,820.41 5.31
l
91 4.85 62,556.45 3.03
il
59 3.14 54,961.51 2.66
all
44 2.35 26,850.20 1.30
ol
8 0.43 6,898.80 0.33
ul
9 0.48 769.80 0.04
yl
6 0.32 603.02 0.03
ell
3 0.16 286.88 0.01
ull
6 0.32 124.18 0.01
ill
2 0.11 1.00 0.00
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
r
10,268 96.38 19,579,044.00 97.02
rr
267 2.51 430,091.29 2.13
wr
103 0.97 165,941.18 0.82
rh
16 0.15 4,538.41 0.02
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
h
1,353 98.69 8,997,666.60 94.16
wh
13 0.95 557,017.71 5.83
j
5 0.36 1,207.69 0.01
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
w
1,155 85.62 8,327,455.60 78.39
wh
135 10.01 2,256,577.93 21.24
u
49 3.63 36,617.36 0.34
o
10 0.74 2,766.30 0.03
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
o.e
6 85.71 570,434.25 88.14
o
1 14.29 76,740.20 11.86
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
y
140 54.47 4,509,208.59 95.81
i
112 43.58 196,300.39 4.17
j
3 1.17 733.30 0.02
ll
1 0.39 332.72 0.01
g
1 0.39 31.18 0.00
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
qu
334 94.89 382,610.22 95.69
cqu
18 5.11 17,246.65 4.31
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
x
646 99.54 721,828.91 99.77
xe
3 0.46 1,658.21 0.23
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
xi
6 54.55 12,089.55 65.79
x
5 45.45 6,287.61 34.21
Phoneme Grapheme Frequency (total) Frequency (%) Weighted Frequency (total) Weighted Frequency (%)
tz
10 40.00 3,306.51 49.24
zz
8 32.00 1,702.95 25.36
z
4 16.00 1,556.14 23.18
ts
3 12.00 148.87 2.22

Methodology

In our data we break each word into its phoneme-grapheme correspondences. We use two rankings to score their frequency: direct and weighted.

For direct ranking:
Frequency (total) - We count the number of times a correspondence occurs across the words in our list. If it is used more than once in a word, it is counted for each occurrence.

Frequency (%) - The percentage of a phoneme's occurrences in our word list that are represented by that grapheme.
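
The direct ranking can be sketched in Python as follows. This is a minimal illustration, not our actual pipeline: the `WORDS` dictionary, its segmentations, and the function names are all hypothetical, chosen only to show how repeated correspondences within a word are each counted.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each word maps to its (phoneme, grapheme) pairs.
# These segmentations are illustrative and are not taken from the real data set.
WORDS = {
    "of": [("/o/", "o"), ("/v/", "f")],
    "van": [("/v/", "v"), ("/a/", "a"), ("/n/", "n")],
    "give": [("/g/", "g"), ("/i/", "i"), ("/v/", "ve")],
    "vivid": [("/v/", "v"), ("/i/", "i"), ("/v/", "v"), ("/i/", "i"), ("/d/", "d")],
}


def direct_frequencies(words):
    """Count every occurrence of each (phoneme, grapheme) correspondence."""
    counts = Counter()
    for pairs in words.values():
        counts.update(pairs)  # repeats within a single word are each counted
    return counts


def direct_percentages(counts):
    """Share of each grapheme among all spellings of the same phoneme."""
    totals = defaultdict(int)
    for (phoneme, _grapheme), n in counts.items():
        totals[phoneme] += n
    return {pair: 100 * n / totals[pair[0]] for pair, n in counts.items()}


counts = direct_frequencies(WORDS)
shares = direct_percentages(counts)
for (phoneme, grapheme), n in sorted(counts.items()):
    print(phoneme, grapheme, n, f"{shares[(phoneme, grapheme)]:.2f}")
```

In this toy list, 'vivid' contributes two counts to 'v' = /v/, which is why 'v' ends up with 3 of the 5 occurrences of /v/.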

For weighted ranking:
We use two different word frequency lists to determine a word's frequency. First, we use a Project Gutenberg word frequency list from 2006 to get a baseline for written text. Since Project Gutenberg (PG) is limited to public domain books, it does not have recent words like 'aliens' and 'radiation'. For our second list we use Open Subtitles, a collection of open source movie subtitles, since it includes recent movies and hence modern words.

We multiply the Open Subtitles frequency by 4 to bring it in line with values from the Project Gutenberg frequencies and then average the two frequencies. We wanted common words like 'it' and 'that' to be roughly equivalent for both data sets.

Weighted Frequency (total) - For each word, we multiply the number of times a particular correspondence occurs in the word by the word's frequency. We then add these values together across all words for that correspondence.

Here's an example. This is the calculation for 'f' = /v/.
PG frequency for 'of': 33,950,064.00
Subtitle frequency for 'of': 1,847,884.00
Average frequency: (33,950,064.00 + (1,847,884.00 x 4)) / 2 = 20,670,800.00
Number of times 'f' = /v/ in the word 'of': 1
Weighted Frequency for 'f' = /v/ in the word 'of': 20,670,800.00 x 1 = 20,670,800.00

We repeat this calculation for the far less common word 'thereof' to get the weighted frequency of 15,143.25.

Finally, we sum the weighted frequencies and divide by 5, the lowest frequency for all correspondences, to make the numbers more manageable.
Weighted Frequency (total): (20,670,800.00 + 15,143.25) / 5 = 4,137,188.65
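
The worked example above can be reproduced with a short Python sketch. The function and constant names are our own illustrative choices. The weighted value for 'thereof' is entered directly because its underlying PG and subtitle counts are not listed here.

```python
SUBTITLE_SCALE = 4  # multiplier applied to Open Subtitles frequencies
SCALE_DIVISOR = 5   # lowest frequency for all correspondences


def word_frequency(pg_freq, subtitle_freq):
    """Average the PG frequency with the scaled subtitle frequency."""
    return (pg_freq + subtitle_freq * SUBTITLE_SCALE) / 2


def weighted_total(occurrences):
    """Sum count-in-word x word frequency, then scale the result down.

    `occurrences` is a list of (count_in_word, word_frequency) pairs,
    one per word that contains the correspondence.
    """
    return sum(count * freq for count, freq in occurrences) / SCALE_DIVISOR


of_freq = word_frequency(33_950_064, 1_847_884)
thereof_freq = 15_143.25  # value quoted above; source counts not listed

# 'f' = /v/ occurs once in 'of' and once in 'thereof'.
total = weighted_total([(1, of_freq), (1, thereof_freq)])

print(f"{of_freq:,.2f}")  # 20,670,800.00
print(f"{total:,.2f}")    # 4,137,188.65
```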

Weighted Frequency (%): The percentage of the time that the phoneme, when you come across it in text, is represented by that particular grapheme.
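
As a quick check, both percentage columns for /v/ can be recomputed from the totals in the tables. The sketch below uses only the published /v/ totals; the variable and function names are illustrative.

```python
# Totals for the sound /v/, copied from the tables on this page.
V_DIRECT = {"v": 2096, "ve": 393, "f": 2, "vv": 2}
V_WEIGHTED = {
    "f": 4_137_188.65,
    "v": 3_559_712.14,
    "ve": 1_747_984.76,
    "vv": 111.77,
}


def as_percentages(totals):
    """Convert per-grapheme totals into percentages of the phoneme total."""
    grand_total = sum(totals.values())
    return {g: round(100 * t / grand_total, 2) for g, t in totals.items()}


def ranked(totals):
    """Graphemes ordered from most to least frequent."""
    return sorted(totals, key=totals.get, reverse=True)


direct_pct = as_percentages(V_DIRECT)
weighted_pct = as_percentages(V_WEIGHTED)

print(direct_pct)            # 'v' leads with 84.08
print(weighted_pct)          # 'f' leads with 43.8
print(ranked(V_DIRECT)[0], ranked(V_WEIGHTED)[0])  # v f
```

The top-ranked spelling changes from 'v' to 'f' once word frequency is taken into account, which is the effect described in the 'of' example.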

* These calculations are based on the count of phoneme-grapheme correspondences across 26,755 words as of August 2023.

Other References

For previous work in this area, see Paul Hanna's paper "Phoneme-Grapheme Correspondences as Cues to Spelling Improvement".