Techniques | College of Biological Sciences

Techniques

The ideas behind error-correcting codes are not really difficult, they just take a few pages to explain. The explanation is entirely verbal and symbolic, and does not require conventional mathematics. Suppose the following phrase is to be protected from errors that might occur when it is later mechanically read:

joy of a misty dawn

Careful application of redundancy is one of the keys to protection. For example, the

phrase could simply be replicated three times. When the copies were later mechanically

read, if any two of them matched, one of the two matching copies could be taken as correct. That would be a form of “majority rule.” If none of the three matched, there would be no majority and an error would be reported.

The problem with such a simple approach is not just that it wastes space, but that the method itself is prone to systematic error. A mechanical reader that misinterpreted one

copy could easily misinterpret the other copies in exactly the same way. For example, if the mechanical reader frequently misinterpreted the symbols j and i, then all three copies could match and be incorrect at the same time.

More useful is what Otto Schmitt (1913–1998) called “non-replicate redundancy.” More

than one copy is kept, but each in a different guise. In this case, full copies are not needed

at all; compressed copies that are functionally related to the full copy will suffice.

Error detection

The phrase to be protected can be compressed by repeatedly mapping pairs of adjacent

symbols into a single symbol. Suppose for illustration that the only symbols of interest are the lower-case letters of the English alphabet, plus a null symbol representing blank space. Arrange the symbols in alphabetic order, with the blank ﬁrst, so that each symbol has a successor. Thus a is the successor of blank, b is the successor of a, c of b, and forth. The last symbol of the alphabet, z, also has a successor; it merely wraps around to the beginning, making the successor of z be blank.

Now the line can be compressed to generate simple non-replicate redundancy in the following way. A blank space followed by any symbol is compressed to that symbol. An a followed by any symbol is compressed to the successor of that symbol. A b followed by any symbol is compressed to the second successor of that symbol (the successor of the successor). A c followed by any symbol is compressed to the third successor. And so forth. The entire set of compressions become the following 27 ×27 table with a regular structure:

(Below, the symbol represents a blank space.)

Now the process of compression begins by replacing the ﬁrst pair of adjacent symbols with the corresponding value from the table. The ﬁrst pair of symbols is jo. In row j and column o of the table is the symbol y. Therefore, the line

joy_of_a_misty_dawn

is rewritten as

yy_of_a_misty_dawn

Now the process repeats in the same way, replacing yy with the corresponding value from row y, column y of the table, which is w, giving

w_of_a_misty_dawn

As the table shows, any symbol followed by a blank space () is replaced by that symbol, so the line becomes

wof_a_misty_dawn

The process repeats until only a single symbol remains:

kf_a_misty_dawn

q_a_misty_dawn

qa_misty_dawn

r_misty_dawn

rmisty_dawn

disty_dawn

msty_dawn

ety_dawn

yy_dawn

w_dawn

wdawn

_awn

awn

The ﬁnal symbol in this sequence is a guard symbol representing the entire line, in this case k. Now the line can be written with that guard symbol in front, like this:

k joy of a misty dawn