Dot, dot, dot, dash, dash, dash, dot, dot, dot: SOS! You have almost certainly heard of this so-called ‘Morse Code’. Morse Code is a code in which letters, numbers and even some punctuation marks are represented by dots and dashes. Back in the day, this was a very useful way of communicating over long distances. With a so-called telegraph, people would send electric pulses down a wire, where a short pulse was a dot and a longer pulse was a dash. Predetermined pauses were used to indicate, for example, a space in a sentence. The inventor of this code, Samuel Morse, tried to design it in such a way that it could be transmitted as fast as possible; that is, he tried to make his code optimal. But how optimal was his code really?
Before I can explain how Samuel Morse tried to optimize his code, we need to know what the Morse alphabet is. The Morse alphabet specifies which sequence of dots and dashes represents each letter, number or punctuation mark.
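The alphabet is easiest to read as a lookup table. Here is a minimal Python sketch of its letter part, assuming the International Morse alphabet (digits and punctuation are left out). The names `MORSE` and `encode` are made up for this illustration.

```python
# Letters of the International Morse alphabet: '.' is a dot, '-' is a dash.
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}


def encode(word):
    """Return the Morse code of a word, with a space between letters."""
    return " ".join(MORSE[letter] for letter in word.upper())


print(encode("SOS"))  # ... --- ...
```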
But why is, for example, the letter ‘A’ represented by ‘dot-dash’ and the letter ‘B’ by ‘dash-dot-dot-dot’? This all has to do with how frequently these letters occur in the ‘average word’. As stated previously, a dot is a short electric pulse, while a dash is a somewhat longer one. It follows that sending a ‘B’ takes longer than sending an ‘A’. The letter ‘A’ is far more common than the letter ‘B’, so it made sense to Samuel Morse to give ‘A’ the shorter code. He tried to optimize his code by doing this for every letter, and that is how he came up with the entire Morse alphabet.
At the time Samuel Morse invented the code, there was no technology to measure precisely how often each letter occurs in the most commonly used words. These days we do have the means to perform such counts, and they show some interesting results. The following table shows the frequency of each letter in English (UK):
Letter | Frequency | Letter | Frequency |
a | 0.081 | n | 0.067 |
b | 0.015 | o | 0.075 |
c | 0.028 | p | 0.019 |
d | 0.043 | q | 0.0009 |
e | 0.127 | r | 0.060 |
f | 0.022 | s | 0.063 |
g | 0.020 | t | 0.091 |
h | 0.061 | u | 0.028 |
i | 0.070 | v | 0.010 |
j | 0.0015 | w | 0.024 |
k | 0.008 | x | 0.0015 |
l | 0.040 | y | 0.019 |
m | 0.024 | z | 0.0007 |
How can we use these frequencies to see how well Morse did the ‘optimization job’? Let a dot last 1 second and a dash last 3 seconds, and assume that the pause between two consecutive dots or dashes within a letter is 1 second. Sending an ‘A’ (dot-dash), for example, would then take 1+1+3 = 5 seconds. The following table shows the number of seconds it takes to send each letter, sorted from the letter with the highest frequency to the letter with the lowest frequency:
Letter | Frequency | No. seconds |
e | 0.127 | 1 |
t | 0.091 | 3 |
a | 0.082 | 5 |
o | 0.075 | 11 |
i | 0.070 | 3 |
n | 0.067 | 5 |
s | 0.063 | 5 |
h | 0.061 | 7 |
r | 0.060 | 7 |
d | 0.043 | 7 |
l | 0.040 | 9 |
c | 0.028 | 11 |
u | 0.027 | 7 |
m | 0.024 | 7 |
w | 0.023 | 9 |
f | 0.022 | 9 |
g | 0.020 | 9 |
y | 0.019 | 13 |
p | 0.018 | 11 |
b | 0.015 | 9 |
v | 0.010 | 9 |
k | 0.008 | 9 |
j | 0.002 | 13 |
x | 0.0015 | 11 |
q | 0.001 | 13 |
z | 0.0007 | 11 |
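The ‘No. seconds’ column can be reproduced with a few lines of Python. This is only a sketch under the assumptions stated above (a dot takes 1 second, a dash takes 3 seconds, and there is a 1-second pause between symbols within a letter). The function name `duration` is made up for this example, and codes are written with '.' for a dot and '-' for a dash.

```python
def duration(code):
    """Seconds needed to send one letter written as dots ('.') and dashes ('-')."""
    symbol_time = sum(1 if symbol == "." else 3 for symbol in code)
    pauses = max(len(code) - 1, 0)  # one 1-second pause between consecutive symbols
    return symbol_time + pauses


print(duration(".-"))   # A: 1 + 1 + 3 = 5
print(duration("---"))  # O: 3 + 1 + 3 + 1 + 3 = 11
print(duration(".."))   # I: 1 + 1 + 1 = 3
```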
From this table we can immediately conclude that Morse's optimization was not really that optimal after all. For example, the letter ‘O’ takes 11 seconds to transmit and the letter ‘I’ takes only 3 seconds, even though ‘O’ is more frequent than ‘I’.
Researchers re-arranged the entire Morse Code to be optimal using this table. The result was that with the old Morse Alphabet it took on average 6.1 seconds to send a letter, while with theirs it took only 5.7 seconds. That is a saving of 0.4 seconds per letter, or roughly 7%, which can make a huge difference in the long run. And as we always say: time is money!
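Both averages can be checked against the table above. The sketch below is not the researchers' actual method, which is not described here. It assumes that ‘optimal’ simply means handing out the same 26 codes again, so that the most frequent letter gets the shortest code, the next most frequent the next shortest, and so on. The names `TABLE` and `average_seconds` are made up for this example.

```python
# (letter, frequency, seconds) copied from the frequency/seconds table above.
TABLE = [
    ("e", 0.127, 1),  ("t", 0.091, 3),   ("a", 0.082, 5),  ("o", 0.075, 11),
    ("i", 0.070, 3),  ("n", 0.067, 5),   ("s", 0.063, 5),  ("h", 0.061, 7),
    ("r", 0.060, 7),  ("d", 0.043, 7),   ("l", 0.040, 9),  ("c", 0.028, 11),
    ("u", 0.027, 7),  ("m", 0.024, 7),   ("w", 0.023, 9),  ("f", 0.022, 9),
    ("g", 0.020, 9),  ("y", 0.019, 13),  ("p", 0.018, 11), ("b", 0.015, 9),
    ("v", 0.010, 9),  ("k", 0.008, 9),   ("j", 0.002, 13), ("x", 0.0015, 11),
    ("q", 0.001, 13), ("z", 0.0007, 11),
]


def average_seconds(frequencies, durations):
    """Frequency-weighted mean number of seconds per letter."""
    total = sum(f * d for f, d in zip(frequencies, durations))
    # The frequencies are rounded and do not sum to exactly 1, so normalise.
    return total / sum(frequencies)


frequencies = [f for _, f, _ in TABLE]
durations = [d for _, _, d in TABLE]

# Morse's assignment: each letter keeps its own duration.
morse_average = average_seconds(frequencies, durations)

# Assumed rearrangement: most frequent letter gets the shortest code.
rearranged_average = average_seconds(
    sorted(frequencies, reverse=True), sorted(durations)
)

print(round(morse_average, 1))       # 6.1
print(round(rearranged_average, 1))  # 5.7
```

Pairing the largest frequencies with the smallest durations minimises the weighted sum (this is the rearrangement inequality), which is why a simple sort is enough under this assumption.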
This article was written by Lars Beute