
Information Reduction 1: Coding

Two types of coding

In a previous post, I described two fundamentally different types of coding. In the first, the intention is to carry all the information contained in the source over into the encoded version. In the second, on the other hand, we deliberately refrain from doing this. It is the second – the information-losing – type that is of particular interest to us.

When I highlighted this difference in my presentations twenty years ago and the phrase ‘information reduction’ appeared prominently in my slides, my project partners pointed out that this might not go down too well with the audience. After all, everyone wants to win; nobody wants to lose. How can I promote a product for which loss is a quality feature?

Well, sometimes we have to face the fact that the thing we have been trying to avoid at all costs is actually of great value. And that’s certainly the case for information-losing coding.

Medical coding

Our company specialised in the encoding of free-text medical diagnoses. Our program read the diagnoses that doctors write in free text in their patients’ medical records and automatically assigned each of them a code from a standard coding system (ICD-10) with about 15,000 codes (Switzerland, 2019). This sounds like a lot, but the number is small considering the billions of distinguishable diagnoses and diagnostic formulations that occur in the field of medicine (see article). Of course, an individual code cannot contain more information than the coding system is able to discern for the case in question. The full-text diagnoses usually contained more information than this, and our task was to automatically extract the relevant parts from the free texts in order to assign the correct code. We were fairly successful in this attempt.

Coding is part of a longer chain

But coding is only one step in a bigger process. Firstly, the information-processing chain extends from codes to flat rates per case (Diagnosis Related Groups = DRGs). Secondly, the free texts to be coded in the medical record are themselves the result of a multi-stage chain of information processing and reduction that has already been performed. Overall, a hospital case involves a chain made up of the following stages from patient examination to flat rate per case:

  • Patient: amount of information contained in the patient.
  • Doctor: amount of information about the patient that the doctor recognises.
  • Medical record: amount of information documented by the doctor.
  • Diagnoses: amount of information contained in the texts regarding the diagnoses.
  • Codes: amount of information contained in the diagnosis codes.
  • Flat rate per case: amount of information contained in the flat rate per case.

The information is reduced at every step, usually quite dramatically. The question is, how does this process work? Can the reduction be automated? And is it a determinate process, or one in which multiple options exist?
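As a toy illustration of what such a reduction step might look like in code, here is a minimal Python sketch of the last two stages of the chain. The diagnosis texts are illustrative; S52.5 is the ICD-10 code for a fracture of the lower end of the radius, while the flat-rate label ‘DRG-X’ is purely hypothetical.

# A toy sketch of the last two reduction steps in the chain.
# S52.5 is the ICD-10 code for a fracture of the lower end of the radius;
# the flat rate label 'DRG-X' is purely hypothetical.

diagnosis_to_code = {
    "distal radius fracture": "S52.5",
    "fractura radii loco classico": "S52.5",
    "Colles fracture": "S52.5",
}

code_to_flat_rate = {"S52.5": "DRG-X"}

for text, code in diagnosis_to_code.items():
    print(f"{text} -> {code} -> {code_to_flat_rate[code]}")

# All three formulations end up in the same code and the same flat rate:
# the differences between them are discarded and cannot be reconstructed.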



Translation: Tony Häfliger and Vivien Blandford

Two Types of Coding 2

The two types of coding in set diagrams

I would like to return to the subject of my article Two types of coding 1 and clarify the difference between the two types of coding using set diagrams. I believe this distinction is so important, both for the field of semantics and for information theory in general, that it should be generally understood.

Information-preserving coding

The information-preserving type of coding can be represented using the following diagram:


Fig. 1: Information-preserving coding (1:1, all codes reachable)

The original form is shown on the left and the encoded form on the right. The red dot on the left could, for example, represent the letter A and the dot on the right the Morse code sequence dot dash. Since this is a 1:1 representation, you can always find your way back from each element on the right to the initial element on the left, i.e. from the dot dash of Morse code to the letter A.


Fig. 2: Information-preserving coding (1:1, not all codes reachable)

Of course, a 1:1 coding system preserves information even if not all codes are used. Since the unused codes can never arise during coding, they play no role at all. For each element of the right-hand set that is actually used as a code, there is exactly one element in the initial set. The coding is therefore reversible without loss of information, i.e. decodable, and the original form can be restored from every code that occurs.


Fig. 3: Information-preserving coding (1:n)

With a 1:n system of coding, too, the original form can be reconstructed without loss. An original element can be coded in different ways, but each code has only one original element. There is thus no danger of not getting back to the initial value. Again, it does not matter whether or not all possible codes (elements on the right) are used, since unused codes never need to be reached and therefore do not have to be retranslated.

For all the coding ratios shown so far (1:1 and 1:n), the original information can be fully reconstructed. It doesn’t matter whether we choose a ratio of 1:1 or 1:n, or whether all possible codes are used or some remain free. The only important thing is that each code can only be reached from a single original element. In the language of mathematics, information-preserving codes are injective relations.
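As a minimal sketch of this criterion (with freely chosen example relations), the following Python function tests whether a coding relation is injective, i.e. whether every code is reached from at most one original element and the coding is therefore losslessly decodable:

def is_injective(relation):
    """relation: pairs (original, code).
    True if no code is reached from two different originals."""
    source_of = {}
    for original, code in relation:
        if code in source_of and source_of[code] != original:
            return False  # two originals share a code: information is lost
        source_of[code] = original
    return True

print(is_injective([("A", ".-"), ("E", ".")]))     # Fig. 1/2 (1:1): True
print(is_injective([("A", "c1"), ("A", "c2")]))    # Fig. 3 (1:n): True
print(is_injective([("a", "x"), ("b", "x")]))      # Fig. 4 (n:1): False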

Information-reducing coding

Fig. 4: Information-reducing coding (n:1)

In this type of coding, several elements from the initial set point to the same code, i.e. to the same element in the set of resulting codes. This means that the original form can no longer be reconstructed at a later time. The red dot in the figure on the right, for example, represents a code for which there are three different initial forms. The information about the difference between the three dots on the left is lost in the dot on the right and can never be reconstructed. Mathematicians call this a non-injective relation. Coding systems of this type lose information.
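The loss becomes tangible if we try to decode anyway, as in this little sketch (the element names are arbitrary): all that can be recovered from the shared code is the set of candidate originals, not the one that was actually encoded.

from collections import defaultdict

# n:1 coding as in Fig. 4: three originals map to the same code 'red'
coding = [("a", "red"), ("b", "red"), ("c", "red")]

preimage = defaultdict(set)
for original, code in coding:
    preimage[code].add(original)

print(preimage["red"])  # {'a', 'b', 'c'} -- which one was meant is unknowable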

Although this type of coding is less ‘clean’, it is nevertheless the one that interests us most, as it typifies many processes in reality.

Two Types of Coding 1

A simple broken bone

In the world of healthcare, medical diagnoses are encoded to improve transparency. This is necessary because they can be formulated in such a wide variety of different ways. For example, a patient may suffer from the following:

– a broken arm
– a distal radius fracture
– a fractura radii loco classico
– a closed extension fracture of the distal radius
– a Raikar’s fracture, left
– a bone fracture of the left distal forearm
– an Fx of the dist. radius l.
– a Colles fracture

Even though they are constructed from different words and abbreviations, all the above expressions can be used to describe the same factual situation, some with more precision than others. And this list is by no means exhaustive. I have been studying such expressions for decades and can assure you without any exaggeration whatsoever that there are billions of different formulations for medical diagnoses, all of them absolutely correct.

Of course, this huge array of free texts in all their variations cannot be processed statistically. The diagnoses are therefore encoded, often using the ICD (International Classification of Diseases) system, which comprises between 15,000 and 80,000 different codes depending on the variant. That’s a lot of codes, but much clearer than the billions of possible text formulations it replaces.

Incidentally, the methods used to automate the interpretation of texts so that it can be performed by a computer program are a fascinating subject.

Morse code 

Morse code is used for communication in situations where it’s only possible to send very simple signals. The sender encodes the letters of the alphabet in the form of dots and dashes, which are then transmitted to the recipient, who decodes them by converting them back into letters. An E, for example, becomes a dot and an A becomes a dot followed by a dash. The process of encoding/decoding is perfectly reversible, and the representation unambiguous.
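A minimal sketch in Python, using a small excerpt of the real Morse table: because no two letters share a code, the table can simply be inverted for decoding.

# Excerpt of the Morse table; E and A as in the text above.
MORSE = {"E": ".", "A": ".-", "T": "-", "N": "-."}

# Inversion is possible only because the mapping is injective:
# no two letters share a code.
DECODE = {code: letter for letter, code in MORSE.items()}

encoded = [MORSE[ch] for ch in "EAT"]
print(encoded)                               # ['.', '.-', '-']
print("".join(DECODE[c] for c in encoded))   # EAT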

Cryptography

In the field of cryptography, too, we need to be able to translate the code back into its original form. This approach differs from Morse code only in that the translation rule is usually a little more complicated and is known only to a select few. As with Morse code, however, the encrypted form needs to carry the same information as the original.
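As a deliberately simple (and utterly insecure) toy illustration, here is an XOR cipher sketch in Python: the key plays the role of the translation rule known only to a select few, and applying it a second time restores the original exactly.

message = b"a closed extension fracture of the distal radius"
key = 0x5A  # the 'secret' translation rule (a toy value, not real cryptography)

encrypted = bytes(b ^ key for b in message)
decrypted = bytes(b ^ key for b in encrypted)  # XOR with the same key reverses it

assert decrypted == message  # only the form changed; no information was lost
print(encrypted)
print(decrypted.decode())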

Information reduction

Morse code and cryptographic codes are both designed so that the receiver can ultimately recreate the original message. The information itself needs to remain unchanged, with only its outer form being altered.

The situation is quite different for ICD coding. Here, we are not dealing with words that are interchangeable on a one-for-one basis such as tibia and shinbone – the ICD is not, and was never intended to be, a reversible coding system. Instead, ICD codes are like drawers in which different diagnoses can be placed, and the process of classification involves deliberately discarding information which is then lost for ever. This is because there is simply too much detail in the diagnoses themselves. For example, a fracture can have the following independent characteristics:

– Name of the bone in question
– Site on the bone
– State of the skin barrier (open, closed)
– Joint involvement (intra-articular, extra-articular)
– Direction of the deformity (flexion, extension, etc.)
– Type of break line (spiral, etc.)
– Number and type of fracture fragments (monoblock, comminuted)
– Cause (trauma, tumour metastasis, fatigue)
– etc.

All these characteristics can be combined, which multiplies the number of possibilities. A statistical breakdown naturally cannot take all combination variants into account, so the diagnostic code covers only a few. In Germany and Switzerland, the ICD can cope with fewer than 20,000 categories for the entire field of medicine. The question of what information the ‘drawers’ can and cannot take into account is an important topic both for players within the healthcare system and for those of us who are interested in information theory and its practical application. Let’s turn now to the coding process.
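Before we do, a rough back-of-the-envelope calculation in Python shows how quickly the combinations outgrow the available codes. The value counts per characteristic below are freely assumed for illustration; the point is only that they multiply.

# Freely assumed numbers of values per fracture characteristic (illustrative)
values_per_characteristic = {
    "bone": 20, "site": 5, "skin barrier": 2, "joint involvement": 2,
    "deformity direction": 4, "break line": 4, "fragments": 3, "cause": 3,
}

combinations = 1
for count in values_per_characteristic.values():
    combinations *= count

print(combinations)          # 57600 combinations for fractures alone
print(combinations > 20000)  # True: more than the entire ICD has categories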

Two types of coding

I believe that the distinction described above is an important one. On the one hand, we have coding systems that aim to preserve the information itself and change only its form, such as Morse code and cryptographic systems. On the other hand, we have systems such as those used to encode medical diagnoses. These aim to reduce the total amount of information, because it is simply too large and needs to be cut down – usually dramatically – for the sake of clarity. Coding to reduce information behaves very differently from coding to preserve information.

This distinction is critical. Mathematical models and scientific theories that apply to information-preserving systems are not suitable for information-reducing ones. In terms of information theory, we are faced with a completely different situation.

What Information Does a Bit Convey?


What does a bit mean?

The question may seem trivial – after all, everyone knows that a bit represents a choice between 0 and 1. Doesn’t it?

So, what’s the problem?

The problem is that 0 and 1 are not the only possible answers. 0 and 1 are just one pair of possible instantiations of a bit – especially useful when we are dealing with numbers – but there are many more possibilities.

Classic examples are:

– yes and no
– true and false
– positive and negative
– on and off

to name but a few. Other options include male/female, inside/outside, good/bad and any other binary pair you care to think of, as well as the associated inversions: for example, 1/0 as well as 0/1.

Bits select information. This information can consist of numbers, but information encompasses much more than numbers. Laypeople think that bits are based on 0 and 1, but 0 and 1 are only one possible instantiation of something more general, namely information.

How does the bit know which two states it can represent?

You may think that the quality of the information (numeric value, truth value, polarity, direction, etc.) is an inherent property of the bit itself, with one bit representing 0/1 and another representing, for example, true/false. This is not the case. Chip manufacturers do not assign individual properties to the bits within their chips in this way. From a technical point of view, all bits are identical, and it is precisely this simplicity and neutrality that makes binary, i.e. bit-based, technologies so attractive.

Only when the computer runs a program do the previously neutral bits take on individual value pairs such as 0/1, true/false, etc. It is thus the program that assigns a meaning to the two states of the bit.
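A minimal sketch of this assignment in Python: one and the same bit is given three different meanings purely by the table, i.e. the program, that reads it (the interpretations are freely chosen).

bit = 1  # the 'technical' bit: which of the two states is selected

# Three programs assign three different meanings to the same two states
as_number = {0: 0, 1: 1}
as_truth  = {0: False, 1: True}
as_switch = {0: "off", 1: "on"}

print(as_number[bit], as_truth[bit], as_switch[bit])  # 1 True on

# Nothing in the bit itself says which reading is correct;
# the meaning comes entirely from the table, i.e. from the program.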

Of course, this has practical benefits, because it means that the meaning assigned to a particular bit within a chip can be redefined time and time again, depending on what program happens to be running. But it also means we have to face the fact that this meaning is not an inherent property of the bit itself, but a function of the program that calls it up, and thus of different bits entirely – those of the program.

But where do these bits take their meaning from? Of course, these bits also have value pairs assigned to them from the outside in exactly the same way. And so it goes on, with the meaning of value pairs always being assigned from the outside by other bits. In other words, this is an infinite regress in which every bit that explains another bit itself has to be explained by yet another.


Where does the chain end?

This search for the bits that explain other bits never ends; that is the nature of an infinite regress. But we do still have a chance of finding the end of the chain: the search is only hopeless as long as we remain within the confines of the computer. Human beings, however, can think beyond these limits. The program was written for a certain purpose and it is human beings – programmers and users – who decide what meaning they want the bits to have. This meaning, and thus the specific individual value pairs assigned to the bits, emerges offline at the end of the regress as an implicit understanding in the minds of human beings.

This has taken us beyond the world of bits, and I would argue that this is unavoidable: until we escape these bounds, we remain in a world that is precise but completely devoid of meaning. Individual bits only take on meaning when this is assigned from the outside, i.e. when we link them to information that is meaningful to us as human beings. This allows us to resolve the infinite regress.

Seen in isolation, the two states of the bit are completely neutral, which means we can assign them any meaning we like. Technically, that’s a stroke of genius. But we shouldn’t deceive ourselves that we can generate meaningful information from bits alone. There always needs to be an ‘outside’ that assigns meaning to the bits.


We thus have two types of information:

1. The isolated bit:

This indicates which of the two states of the bit is selected without describing the states themselves. It is the ‘technical’ bit of information theory.

2. The meaning assigned to the bit:

This tells us what the bit is all about, i.e. which two states it allows us to choose between. This is the qualitative information that can be expressed using the bit. Although we assign this information to the bit itself, it disappears as soon as we look at the bit in isolation.

These two types of information are fundamentally different.

Despite – or maybe precisely because of – this difference, they belong together. It is only by their combination that meaningful information is produced.

D. M. MacKay: Selective and descriptive information content

Following the information science pioneer D. M. MacKay, we can call the two types of information the selective and the descriptive information content.