
What is Entropy?

Definition of Entropy

The term entropy is often avoided because it is considered complex. The phenomenon of entropy, however, is constitutive for everything that is going on in our lives. A closer look is worth the effort.

Entropy is a measure of information and it is defined as:

Entropy is the information
– known at micro,
– but unknown at macro level.

The challenge of this definition is:

  1. to understand what is meant by the micro and macro states and
  2. to understand why entropy is a difference.

What is Meant by Micro and Macro Level?

The micro level contains the details (i.e. a lot of information), the macro level contains the overview (i.e. less, but more targeted information). The distance between the two levels can be very small (as with the bit, where the micro level knows just two pieces of information: on or off) or huge, as with the temperature (macro level) of the coffee in a coffee cup, where the kinetic energies of the many molecules (micro level) determine the temperature of the coffee. The number of molecules in the cup is really large (on the order of Avogadro’s number, about 10^23) and the entropy of the coffee in the cup is correspondingly high.

Entropy is thus defined by the two states and their difference. However, states and difference are neither constant nor absolute, but a question of observation, therefore relative.

Let’s take a closer look at what this relativity means for the macro level.

What is the Relevant Macro Level?

In many fields like biology, psychology, sociology, etc. and in art, it is obvious to me as a layman that the notion of the two levels is applicable, too. These fields are, of course, more complex than a coffee cup, so that the simple thermodynamic relationship between micro and macro becomes more complex.

In particular, it is conceivable to have a mixture of several macro-states occurring simultaneously. For example, an individual (micro level), may belong to the macro groups of the Swiss, the computer scientists, the older men, the contemporaries of 2024, etc – all at the same time. Therefore, applying entropy reasoning to sociology is not as straightforward as the simple examples like Boltzmann’s coffee cup, Salm’s lost key, or a basic bit might suggest.

Entropy as a Difference

The micro and macro level of an object both have their own entropy. But what really matters is the difference between the two entropies. The bigger the difference, the more is unknown on the macro level about the micro level.
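
The difference between the two levels can be made concrete with a toy example. The following is only a minimal sketch under assumptions that are not part of the text above: the micro level is the exact sequence of fair coin flips, the macro level knows only the number of heads, and the function names are hypothetical.

```python
from math import comb, log2


def micro_information(n_coins: int) -> float:
    """Bits needed to name the exact sequence of n fair coin flips (micro level)."""
    return float(n_coins)  # log2(2 ** n_coins) = n_coins


def hidden_information(n_coins: int, n_heads: int) -> float:
    """Bits still unknown at the macro level, which only knows the number of heads.

    This is log2 of the number of micro sequences compatible with that count.
    """
    return log2(comb(n_coins, n_heads))


n = 10
for heads in (0, 1, 5):
    print(heads, round(hidden_information(n, heads), 2))
# prints:
# 0 0.0
# 1 3.32
# 5 7.98
```

With "0 heads" the macro level already knows everything, so nothing is hidden. With "5 heads" almost 8 of the 10 bits of the micro level remain unknown to the macro level.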

The difference between micro and macro level says a lot about the way we perceive information. In simple words: when we learn something new, information is moved from micro to macro state.

The conventional definition of entropy states that it represents the information present in the micro but absent in the macro state. This definition of entropy via the two states means that the much more detailed microstate is not primarily visible to the macrostate. This is exactly what Niklas Luhmann meant when he spoke of intransparency¹.

When an observer interprets the incoming signals (micro level) at his macro level, he attempts to gain order or transparency from an intransparent multiplicity. How he does this is an exciting story. Order – a clear and simple macro state – is the aim in many places: in my home, when I tidy up the kitchen or the office; in every biological body, when it tries to maintain constant form and chemical ratios; in society, when unrest and tensions are a threat; in the brain, when the countless signals from the sensory organs have to be integrated in order to recognise the environment in a meaningful interpretation; and so on. Interpretation is always a simplification, a reduction of information, i.e. a reduction of entropy.

Entropy and the Observer

An essential point is that the information reduction from micro to macro state is always carried out by an active interpreter and guided by his interest.

The human body, e.g., controls the activity of the thyroid hormones via several control stages, which guarantee that the resulting state (macro state) of the activity of body and mind remains within an adequate range even in case of external disturbances.

The game of building up a macro state (order) out of the many details of a micro state is to be found everywhere in biology, sociology and in our everyday life.

There is – in all these examples – an active control system that steers the reduction of entropy in terms of the bigger picture. This control in the interpretation of the microstate is a remarkable phenomenon. Whenever transparency is wanted, an information-rich micro state must be simplified to a macro state with fewer details.

Entropy can then be measured as the difference in information from the micro to the macro level. When the observer interprets signals from the micro level, he creates transparency from intransparency.

Entropy, Re-Entry and Oscillation

We can now have a look at the entropy relations in the re-entry phenomenon as described by Spencer-Brown². Because the re-entry ‘re-enters’ the same distinction that it has just identified, there is hardly any information difference between before and after the re-entry and therefore hardly any difference between its micro and macro state. After all, it is the same distinction.

However, there is a before and an after, which may oscillate, whereby its value becomes imaginary (this is precisely described in chapter 11 of Spencer-Brown’s book ‘Laws of Form’²). Re-entries are very common in thinking and in complex fields like biology or sociology, where actions and their consequences meet their own causes. These loops or re-entries are exciting, both in thought processes and in societal analysis.

The re-entries lead to loops in the interpretation process and in many situations these loops can have puzzling logical effects (see Paradoxes and Logic, Part 1 and Part 2). In chapter 11 of ‘Laws of Form’², Spencer-Brown describes the mathematical and logical effects around the re-entry in detail. In particular, he develops how logical oscillations occur due to the re-entry.

Entropy comes into play whenever descriptions of the same object occur simultaneously at different levels of detail, i.e. whenever an actor (e.g. a brain or the kitchen cleaner) wants to create order by organising an information-rich and intransparent microstate in such a way that a much simpler and easier to read macrostate develops.

We could say that the observer actively creates a new macro state from the micro state. However, the micro state remains and still has the same amount of entropy as before. Only the macro state has less. When I comb my hair, all the hairs are still there, even if they are arranged differently. A macro state is created, but the information can still be described at the detailed micro level of all the hairs, albeit slightly altered in the arrangement on the macro level.

Re-entry – on the other hand – is a powerful logical pattern. For me, re-entry and entropy complement each other in the description of reality. Distinction and re-entry are very elementary. Entropy, on the other hand, always arises when several things come together and their arrangement is altered or differently interpreted.

See also:
Five preconceptions about entropy
Category: Entropy

Translation: Juan Utzinger


¹ Niklas Luhmann, Die Kontrolle von Intransparenz, ed. Dirk Baecker, Berlin: Suhrkamp 2017, pp. 96-120

² George Spencer-Brown, Laws of Form, London 1969 (Bohmeier, Leipzig, 2011)

George Spencer-Brown’s Distinction and the Bit

continues Paradoxes and Logic (Part 2)


History

Before we examine George Spencer-Brown’s (GSB’s) distinction as a basic element for logic, physics, biology and philosophy, it is helpful to compare it with another, much better-known basic form, namely the bit. This allows us to better understand the nature of GSB’s distinction and the revolutionary nature of his innovation.

Bits and GSB forms can both be regarded as basic building blocks for information processing. Software structures are technically based on bits, but the forms of GSB (‘draw a distinction’) are just as simple, fundamental and astonishingly similar. Nevertheless, there are characteristic differences.

 Fig. 1: Form and bit show similarities and differences

Both the bit and the Spencer-Brown form were found in the early phase of computer science, so they are relatively new ideas. The bit was described by C. E. Shannon in 1948, the distinction by George Spencer-Brown (GSB) in his book ‘Laws of Form’ in 1969, only about 20 years later. 1969 fell in the heyday of the hippie movement and GSB was warmly welcomed at Esalen, an intellectual hotspot and starting point of this movement. This may, on the other hand, have put him in a bad light and hindered the established scientific community from looking closer into his ideas. While the handy bit vivified California’s nascent high-tech information movement, Spencer-Brown’s mathematical and logical revolution was largely ignored by the scientific community. It’s time to overcome this disparity.

Similarities between Distinction and Bit

Both the form and the bit refer to information. Both are elementary abstractions and can therefore be seen as basic building blocks of information.

This similarity reveals itself in the fact that both denote a single action step – albeit a different one – and both assign a maximally reduced number of results to this action, exactly two.

Table 1: Both Bit and Distinction each contain
one action and two possible results (outcomes)

Exactly one Action, Exactly Two Potential Results

The action of the distinction is – as the name says – the distinction, and the action of the bit is the selection. Both actions can be seen as information actions and are as such fundamental, i.e. not further reducible. The bit does not contain further bits, the distinction does not contain further distinctions. Of course, there are other bits in the vicinity of the bit and other distinctions in the vicinity of a distinction. However, both actions are to be seen as fundamental information actions. Their fundamentality is emphasised by the smallest possible number of results, namely two. The number of results cannot be smaller, because a distinction of 1 is not a distinction and a selection of 1 is not a selection. Both are only possible if there are two potential results.

Both distinction and bit are thus indivisible acts of information of radical, non-increasable simplicity.

Nevertheless, they are not the same and are not interchangeable. They complement each other.

While the bit has seen a technical boom since 1948, its prerequisite, the distinction, has remained unmentioned in the background. It is all the more worthwhile to bring it to the foreground today and shed new light on what links mathematics, logic, the natural sciences and the humanities.


Differences

Information Content and Shannon’s Bit

Both form and bit refer to information. In physics, the quantitative content of information is referred to as entropy.

At first glance, the information content when a bit is set or a distinction is made appears to be the same in both cases, namely the information that distinguishes between two states. This is clearly the case with a bit. As Shannon has shown, its information content is log2(2) = 1. Shannon called this dimensionless value 1 bit. The bit therefore contains – not surprisingly – the information of one bit, as defined by Shannon.

The Bit and its Entropy

The bit measures nothing other than entropy. The term entropy originally came from thermodynamics and was used to calculate the behaviour of heat engines. In thermodynamics, entropy is the partner term of energy, but it applies – like the term energy – to all fields of physics, not just to thermodynamics.

What is Entropy?

Entropy is a measure of information content. If I do not know something and then discover it, information flows. In a bit, there are two possible states before I know which one is true. When I find out which of the two states is true, I receive a small basic portion of information with the quantitative value of 1 bit.

One bit decides between two results. If more than two states are possible, the number of bits increases logarithmically with the number of possible states; so it takes three binary choices (bits) to find the correct choice out of 8 possibilities. The number of bits thus grows with the logarithm of the number of possibilities, as the example shows.

Dual choice = 1 bit = log2(2)
Quadruple choice = 2 bits = log2(4)
Octuple choice = 3 bits = log2(8)

The information content of a single bit is always the information content of a single binary choice, i.e. log2(2) = 1.
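
The logarithmic relation can be checked by counting yes/no questions. The sketch below is only an illustration; the function names are hypothetical, and the halving strategy is my assumption of how the binary choices are made.

```python
from math import log2


def bits_needed(n_choices: int) -> float:
    """Information content, in bits, of one choice among n equally likely options."""
    return log2(n_choices)


def count_binary_questions(n_choices: int, target: int) -> int:
    """Find `target` among range(n_choices) by halving; return the number of yes/no questions."""
    low, high = 0, n_choices  # remaining candidates are low .. high - 1
    questions = 0
    while high - low > 1:
        mid = (low + high) // 2
        questions += 1  # one binary choice: "is the target below mid?"
        if target < mid:
            high = mid
        else:
            low = mid
    return questions


for n in (2, 4, 8):
    print(n, bits_needed(n), count_binary_questions(n, target=n - 1))
# prints:
# 2 1.0 1
# 4 2.0 2
# 8 3.0 3
```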

The bit as a physical quantity is dimensionless, i.e. a pure number. This is fitting because the information about the choice is neutral, and not a length, a weight, an energy or a temperature. The bit serves well as the technical unit of quantitative information content. What is different with the other basic unit of information, the form of Spencer-Brown?


The Information Content of the Form

The information content of the bit is exactly 1 if the two outcomes of the selection have exactly the same probability. As soon as one of the two states is less probable, its choice reveals more information. When it is selected despite its lower prior probability, this makes more of a difference and reveals more information to us. The less probable its choice is, the greater the information will be if it is selected. The classic bit is a special case in this regard: the probability of its two states is equal by definition and the information content of the choice is exactly 1.
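
In Shannon’s terms, the information revealed by an outcome of probability p is -log2(p). A minimal sketch follows; the function names are hypothetical, and the average over both outcomes is added here for comparison and is not discussed in the text above.

```python
from math import log2


def surprisal(p: float) -> float:
    """Information in bits revealed when an outcome of probability p occurs."""
    return -log2(p)


def binary_entropy(p: float) -> float:
    """Average information in bits of a two-state choice with probabilities p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome reveals nothing
    return p * surprisal(p) + (1 - p) * surprisal(1 - p)


print(surprisal(0.5))       # 1.0  (the classic bit)
print(surprisal(0.125))     # 3.0  (a rare state reveals more when it is selected)
print(binary_entropy(0.5))  # 1.0
```

The rare state carries more than 1 bit when it occurs, while the average over both states falls below 1 bit as soon as the probabilities are unequal.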

This is entirely different with Spencer-Brown’s form of distinction. The decisive factor lies in the ‘unmarked space’. The distinction distinguishes something from the rest and marks it. The rest, i.e. everything else, remains unmarked. Spencer-Brown calls it the ‘unmarked space’.

We can and must now assume that the remainder, the unmarked, is much greater, and the probability of its occurrence is much higher than the probability that the marked will occur. The information content of the mark, i.e. of drawing the distinction, is therefore usually greater than 1.

Of course, the distinction is about the marked and the marked is what interests us. That is why the information content of the distinction is calculated based on the marked and not the unmarked.

How large is the space of the unmarked? We would do well to assume that it is infinite. I can never know what I don’t know.

The difference in information content, measured as entropy, is the first difference we can see between bit and distinction. The information content of the bit, i.e. its entropy, is exactly 1. In the case of the distinction, it depends on how large the unmarked space is; because the unmarked space is always larger than the marked space, the entropy of the distinction is always greater than 1.
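
How the information content of the mark grows with the unmarked space can be sketched numerically. This is a rough illustration under an assumption of my own: the spaces are made of countable items that are all equally likely. `mark_information` is a hypothetical name.

```python
from math import log2


def mark_information(marked: int, unmarked: int) -> float:
    """Bits revealed by hitting the marked side, assuming all items are equally likely."""
    p_marked = marked / (marked + unmarked)
    return -log2(p_marked)


for unmarked in (1, 3, 1023):
    print(unmarked, mark_information(1, unmarked))
# prints:
# 1 1.0
# 3 2.0
# 1023 10.0
```

With one marked and one unmarked item we are back at the bit and its value of 1. Every doubling of the whole space adds one more bit, so the value has no upper limit as the unmarked space grows.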


Closedness and Openness

Fig. 1 above shows the most important difference between distinction and bit, namely their external boundaries. These are clearly defined in the case of the bit.

The Meaning in the Bit

The bit contains two states, one of which is activated, the other not. Apart from these two states, nothing can be seen in the bit and all other information is outside the bit. Not even the meanings of the two states are defined. They can mean 0 and 1, true and false, positive and negative or any other pair that is mutually exclusive. The bit itself does not contain these meanings, only the information as to which of the two predefined states was selected. The meaning of the two states is regulated outside the bit and assigned from outside. This neutrality of the bit is its strength. It can take on any meaning and can therefore be used anywhere where information is technically processed.
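
That the meaning is assigned from outside can be shown in a few lines. This is only a sketch; the tables and names are hypothetical examples and not part of any real system.

```python
bit = 1  # the bit only records which of its two states was selected

# Hypothetical code tables, defined outside the bit
MEANINGS = {
    "number": {0: 0, 1: 1},
    "logic": {0: False, 1: True},
    "sign": {0: "negative", 1: "positive"},
}


def interpret(bit_value: int, table_name: str):
    """Look up what the bit means under an externally chosen code table."""
    return MEANINGS[table_name][bit_value]


for name in MEANINGS:
    print(name, interpret(bit, name))
# prints:
# number 1
# logic True
# sign positive
```

The same bit yields three different meanings, and none of them is stored in the bit itself.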

The Meaning in the Distinction

The situation is completely different with the distinction. Here the meaning is marked. To do this, the inside of the distinction is distinguished from the outside. The outside, however, is open and there is nothing that does not belong to it. The ‘unmarked space’, in principle, is infinite. A boundary is defined, but it is the distinction itself. That is why the distinction cannot really separate itself from the outside, unlike the bit. In other words: The bit is closed, the distinction is not.


Differences between Distinction and Bit

There are two essential differences between distinction and bit.

Table 2: Differences between Distinction (Form) and Bit


Consequences

The two differences between distinction and bit have some interesting consequences.

Example NLP (Natural Language Processing)

The bit, due to its defined and simple entropy and its closed borders, has the technological advantage of simple usability, which we exploit in the software industry. Distinctions, on the other hand, are more realistic due to their openness. For our specific task of interpreting medical texts, we therefore found it necessary to introduce openness into the bit world of technical software through certain principles. The keywords here are:

  1. Introduction of an acting subject that evaluates the input according to its own internal rules,
  2. Working with changing ontologies and classifications,
  3. Turning away from the classical, i.e. static and monotonic, logic and turning towards a non-monotonic logic,
  4. Integration of time as a logical element (not just as a variable).
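
What non-monotonic means in point 3 can be illustrated with a textbook-style default rule. This is only a toy sketch with hypothetical facts and names. It is not the logic used in our software.

```python
def conclusions(facts: set[str]) -> set[str]:
    """Apply one default rule: birds fly, unless they are known to be penguins."""
    derived = set(facts)
    if "bird" in derived and "penguin" not in derived:
        derived.add("flies")  # default conclusion, valid only until contradicted
    return derived


print("flies" in conclusions({"bird"}))             # True
print("flies" in conclusions({"bird", "penguin"}))  # False
```

In a monotonic logic, adding a fact can only add conclusions. Here the additional fact "penguin" withdraws a conclusion that was valid before.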

Translation: Juan Utzinger

 

Paradoxes and Logic (Part 2)

continues Paradoxes and Logic (part 1)


“Draw a Distinction”

Spencer-Brown introduces the elementary building block of his formal logic with the words ‘Draw a Distinction’. Figure 1 shows this very simple formal element:

Fig 1: The form of Spencer-Brown

A Radical Abstraction

In fact, his logic consists exclusively of this building block. Spencer-Brown has thus achieved an extreme abstraction that is more abstract than anything mathematicians and logicians have found so far.

What is the meaning of this form? Spencer-Brown is aiming at an elementary process, namely the ‘drawing of a distinction’. This elementary process now divides the world into two parts, namely the part that lies within the distinction and the part outside.

Fig. 2: Visualisation of the distinction

Figure 2 shows what the formal element of Fig. 1 represents: a division of the world into what is separated (inside) and everything else (outside). The angle of Fig. 1 thus becomes mentally a circle that encloses everything that is distinguished from the rest: ‘draw a distinction’.

The angular shape in Fig. 1 therefore refers to the circle in Fig. 2, which encompasses everything that is recognised by the distinction in question.

Perfect Continence

But why does Spencer-Brown draw his elementary building block as an open angle and not as a closed circle, even though he refers to closedness by explicitly saying ‘Distinction is perfect continence’, i.e. he assigns a perfect inclusion to the distinction? Why he nevertheless shows the continence as an open angle will become clear later, and will reveal itself to be one of Spencer-Brown’s ingenious decisions (↝ imaginary logic value, to be discussed later).

Marked and Unmarked

In addition, it is possible to name the inside and the outside as the marked (m = marked) and the unmarked (u = unmarked) space and use these designations later in larger and more complex combinations of distinctions.

Fig. 3: Marked (m) and unmarked (u) space

Distinctions combined

To use the building block in larger logic statements, it can now be put together in various ways.

Fig. 4: Three combined forms of differentiation

Figure 4 shows how distinctions can be combined in two ways. Either as an enumeration (serial) or as a stacking, by placing further distinctions on top of prior distinctions. Spencer-Brown works with these combinations and, being a genuine mathematician, derives his conclusions and proofs from a few axioms and canons. In this way, he builds up his own formal mathematical and logical system of rules. Its derivations and proofs need not be of urgent interest to us here, but they show how carefully and with what mathematical meticulousness Spencer-Brown develops his formalism.
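
The two ways of combining can be tried out in a few lines. The sketch below makes two assumptions that are not stated in the text above. First, a distinction is written as a pair of parentheses, so that `()()` is a serial combination and `(())` a stacked one. Second, the evaluation follows the two axioms of ‘Laws of Form’: repeating a mark side by side changes nothing, and crossing a boundary twice cancels. All function names are hypothetical.

```python
def parse(expr: str) -> list:
    """Parse a string of parentheses into nested lists, one list per distinction."""
    stack = [[]]  # stack[0] collects the forms standing side by side at top level
    for ch in expr:
        if ch == "(":
            stack.append([])
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            inner = stack.pop()
            stack[-1].append(inner)
        elif not ch.isspace():
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def space_is_marked(forms: list) -> bool:
    """Serial combination: a space is marked if any form in it is marked."""
    return any(cross_is_marked(content) for content in forms)


def cross_is_marked(content: list) -> bool:
    """Stacking: a distinction is marked exactly when the space inside it is unmarked."""
    return not space_is_marked(content)


def evaluate(expr: str) -> str:
    return "marked" if space_is_marked(parse(expr)) else "unmarked"


print(evaluate("()"))    # marked
print(evaluate("()()"))  # marked    (serial repetition changes nothing)
print(evaluate("(())"))  # unmarked  (stacking twice cancels)
```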

Re-Entry

The re-entry is now what leads us to the paradox. It is indeed the case that Spencer-Brown’s formalism makes it possible to draw the formalism of real paradoxes, such as the barber’s paradox, in a very simple way. The re-entry acts like a shining gemstone (sorry for the poetic expression), which takes on a wholly special function in logical networks, namely the linking of two logical levels, a basic level and its meta level.

The trick here is that the same distinction is made on both levels: it is one and the same distinction, but on two levels, and this one distinction refers to itself, from one level to the other, from the meta-level to the basic level. This is the form of paradox.

Example: Barber Paradox

We can now notate the Barber paradox using Spencer-Brown’s form:

 

Fig. 5: Distinction of the men in the village who shave themselves (S) or do not shave themselves (N)

Fig. 6: Notation of Fig. 5 as perfect continence

Fig. 5 and Fig. 6 show the same operation, namely the distinction between the men in the village who shave themselves and those who do not.

So how does the barber fit in? Let’s assume he has just got up and is still unshaven. Then he belongs to the inside of the distinction, i.e. to the group of unshaven men N. No problem for him, he shaves quickly, has breakfast and then goes to work. Now he belongs to the men S who shave themselves, so he no longer has to shave. The problem only arises the next morning. Now he’s one of those men who shave themselves – so he doesn’t have to shave. Unshaven as he is now, however, he is one of the men he has to shave. But as soon as he shaves himself, he belongs to the group of self-shavers, so he doesn’t have to be shaved. In this manner, the barber switches from one group (S) to the other (N) and back. A typical oscillation occurs in the barber’s paradox – and in all other real paradoxes, which all oscillate.

How does the Paradox Arise?

Fig. 7: The barber (B) shaves all men who do not shave themselves (N)

Fig. 7 shows the distinction between the men N (red) and S (blue). This is the base level. Now the barber (B) enters. On a logical meta-level, it is stated that he shaves the men N, symbolised by the arrow in Fig. 7.

The paradox arises between the basic and meta level. Namely, when the question is asked whether the barber, who is also a man of the village, belongs to the set N or the set S. In other words:

→  Is  B an  N  or an  S ?  

The answer to this question oscillates. If B is an N, then he shaves himself (Fig. 7). This makes him an S, so he does not shave himself. As a result of this second cognition, he becomes an N and has to shave himself. Shaving or not shaving? This is the paradox and its oscillation.
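
The oscillation can be replayed step by step. In this small sketch, each application of the barber’s rule to himself is treated as one discrete step. That is my own simplification, and the function names are hypothetical.

```python
def reenter(shaves_himself: bool) -> bool:
    """Apply the barber's rule to the barber: he shaves exactly those who do not shave themselves."""
    return not shaves_himself


def trace(start: bool, steps: int) -> list[str]:
    """Follow the barber's group through several re-entries, starting as N (False) or S (True)."""
    state = start
    labels = ["S" if state else "N"]
    for _ in range(steps):
        state = reenter(state)
        labels.append("S" if state else "N")
    return labels


print(trace(False, 5))  # ['N', 'S', 'N', 'S', 'N', 'S']
```

The sequence never settles. Every answer produces its own opposite in the next step.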

How is it created? By linking the two levels. The barber is an element of the meta-level (macro level), but at the same time an element of the base level (micro level). Barber B is an acting subject on the meta-level, but an object on the base level. The two levels are linked by a single distinction: on the meta-level, B is the subject and sees the distinction from the outside; on the base level, he is at the same time an object of this distinction and thus labelled as N or S. Which is true? This is the oscillation, caused by the re-entry.

The re-entry is the logical core of all true paradoxes. Spencer-Brown’s achievement lies in the fact that he presents this logical form in a radically simple way and abstracts it formally to its minimal essence.

The paradox is reduced to a single distinction that is read on two levels, firstly fundamentally (B is N or S) and then as a re-entry when considering whether B shaves himself.

The paradox is created by the re-entry in addition to a negation: he shaves the men who do not shave themselves. Re-entry and negation are mandatory in order to generate a true paradox. They can be found in all genuine paradoxes, in the barber paradox, the liar paradox, the Russell paradox, etc.

George Spencer-Brown’s achievement is that he has reduced the paradox to its essential formal core:

→ A (single) distinction with a re-entry and a negation.

His discoveries of distinction and re-entry have far-reaching consequences with regard to logic, and far beyond.


Let’s continue the investigation, see:  Form (Distinction) and Bit

Translation: Juan Utzinger


 

Paradoxes and Logic (Part 1)


Logic in Practice and Theory

Computer programs consist of algorithms. Algorithms are instructions on how and in what order an input is to be processed. Algorithms are nothing more than applied logic and a programmer is a practising logician.

But logic is a broad field. In a very narrow sense, logic is a part of mathematics; in a broad sense, logic is everything that has to do with thinking. These two poles show a clear contrast: The logic of mathematics is closed and well-defined, whereas the logic of thought tends to elude precise observation: How do I come to a certain thought? How do I construct my thoughts when I think? And what do I think just in this moment, when I think about my thinking? While mathematical logic works with clear concepts and rules, which are explicit and objectively describable, the logic of thinking is more difficult to grasp. Are there any rules for correct thinking, just as there are rules in mathematical logic for drawing conclusions in the right way?

When I look at the differences between mathematical logic and the logic of thought, something definitely strikes me: Thinking about my thinking defies objectivity. This is not the case in mathematics. Mathematicians try to safeguard every tiny step of thought in a way that is clear and objective and comprehensible to everyone as soon as they understand the mathematical language, regardless of who they are: the subject of the mathematician remains outside.

This is completely different with thinking. When I try to describe a thought that I have in my head, it is my personal thought, a subjective event that primarily only shows itself in my own mind and can only be expressed to a limited extent by words or mathematical formulae.

But it is precisely this resistance that I find appealing. After all, I wish to think ‘correctly’, and it is tempting to figure out how correct thinking works in the first place.

I could now fall back on mathematical logic. But the brain doesn’t work that way. In what way then? I have been working on this for many decades, in practice, concretely in the attempt to teach the computer NLP (Natural Language Processing). The aim has been to find explicit, machine-comprehensible rules for understanding texts, an understanding that is a subjective process and, being subjective, cannot easily be made objective.

My computer programmes were successful, but the really interesting thing is the insights I was able to gain about thinking, or more precisely, about the logic with which we think.

My work has given me insights into the semantic space in which we think, the concepts that reside in this space and the way in which concepts move. But the most important finding concerned time in logic. I would like to examine this more closely, and to do so we first look at paradoxes.

Real Paradoxes

Anyone who seriously engages with logic, whether professionally or out of personal interest, will sooner or later come across paradoxes. A classic paradox, for example, is the barber’s paradox:

The Barber Paradox

The barber of a village is defined by the fact that he shaves all the men who do not shave themselves. Does the barber shave himself? If he does, he is one of the men who shave themselves and whom he therefore does not shave. But if he does not shave himself, he is one of the men he shaves, so he also shaves himself. As a result, he is one of the men he does not have to shave. So he doesn’t shave – and so on. That’s the paradox: if he shaves, he doesn’t shave. If he doesn’t shave, he shaves.
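
That neither answer works can be checked mechanically. This is a minimal sketch with hypothetical function names: for the barber, "the barber shaves him" and "he shaves himself" are the same statement, so the rule would require a truth value to equal its own negation.

```python
def barber_shaves(man_shaves_himself: bool) -> bool:
    """The rule: the barber shaves a man exactly when that man does not shave himself."""
    return not man_shaves_himself


def consistent_for_barber(shaves_himself: bool) -> bool:
    """For the barber, being shaved by the barber means shaving himself."""
    return shaves_himself == barber_shaves(shaves_himself)


solutions = [value for value in (True, False) if consistent_for_barber(value)]
print(solutions)  # []
```

For any other man in the village the rule causes no problem, because being shaved by the barber and shaving himself are two different things for him.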

The same pattern can be found in other paradoxes, such as the liar paradox and many others. You might think that these kinds of paradoxes are far-fetched and don’t really play a role. But paradoxes do play a role, at least in two places: in maths and in the thought process.

Russell’s Paradox and Kurt Gödel’s Incompleteness Theorems

Russell’s paradox revealed a gap in set theory. Its ‘set of all sets that do not contain themselves as an element’ follows the same pattern as the barber of the barber paradox and leads to the same kind of unsolvable paradox. Kurt Gödel’s two incompleteness theorems are somewhat more complex, but are ultimately based on the same pattern. Both Russell’s and Gödel’s paradoxes have far-reaching consequences in mathematics. Russell’s paradox has led to the fact that set theory can no longer be formed using sets alone, because this leads to untenable contradictions. Zermelo therefore supplemented the sets with classes and thus gave up the perfectly closed nature of set theory.

Gödel’s incompleteness theorems, too, are ultimately based on the same pattern as the Barber paradox. Gödel showed that every consistent formal system (formal in the sense of the mathematicians) that is powerful enough to express arithmetic must contain statements that can neither be formally proven nor disproven. A hard blow for mathematics and its formal logic.

Spencer-Brown and the “Laws of Form”

Russell’s refutation of the simple set concept and Gödel’s proof of the incompleteness of formal logic suggest that we should think more closely about paradoxes. What exactly is the logical pattern behind Russell’s and Gödel’s problems? What makes set theory and formal logic incomplete?

The question kept me occupied for a long time. Surprisingly, it turned out that paradoxes are not just annoying evils, but that it is worth using them as meaningful elements in a new formal logic. This step was demonstrated in exemplary fashion by the mathematician George Spencer-Brown in his 1969 book ‘Laws of Form’, including a maximally simple formalism for logic.


I would now like to take a closer look at the structure of paradoxes, as Spencer-Brown has pointed them out, and the consequences this has for logic, physics, biology and more.

continue: Paradoxes and Logic (Part 2)

Translation: Juan Utzinger


 

Five Preconceptions about Entropy

Which of these Preconceptions do you Share?

  1. Entropy is for nerds
  2. Entropy is incomprehensible
  3. Entropy is thermodynamics
  4. Entropy is noise
  5. Entropy is absolute


Details

1. Entropy is the Basis of our Daily Lives

Nerds like to be interested in complex topics and entropy fits in well, doesn’t it? It helps them to portray themselves as superior intellectuals. This is not your game and you might not see any practical reasons to occupy yourself with entropy. This attitude is very common and quite wrong. Entropy is not a nerdy topic, but has a fundamental impact on our lives, from elementary physics to practical everyday life.

Examples (according to W. Salm¹)

  • A hot coffee cup cools down over time
  • Water evaporates in an open container
  • Pendulums that have been set swinging come to rest after a while
  • Iron rusts
  • Magnets become weaker after some years
  • Lessons learnt are forgotten
  • Combed hair becomes dishevelled
  • White shirts become stained
  • Rocks crumble
  • Radioactive elements decay

So there are plenty of reasons to look into the phenomenon of entropy, which can be found everywhere in everyday life. But most people tend to avoid the term. Why is that? This is mainly due to the second preconception.


2. Entropy is a Perfectly Understandable and Indispensable Fundamental Concept

It is true that, at first glance, entropy is rather confusing. However, entropy is only difficult to understand because of persistent preconceptions (see points 4 and 5, below). These ubiquitous preconceptions are the obstacles that make the concept of entropy seem incomprehensible. Overcoming them not only helps to understand many real and practical phenomena, but also sheds light on the foundations that hold our world together.


3. Entropy Plays a Role Everywhere in Nature

The term entropy stems from thermodynamics. But we should not be misled by this. In reality, entropy is something that exists everywhere in physics, chemistry, biology and also in art and music. It is a general and abstract concept and it refers directly to the structure of things and the information they contain.

Historically, the term was introduced less than 200 years ago in thermodynamics and was associated with the possibility of allowing heat (energy) to flow. It helped to understand the mode of operation of machines (combustion engines, refrigerators, heat pumps, etc.). The term is still taught in schools this way.

However, thermodynamics only shows a part of what entropy is. Its general nature was only described by C.E. Shannon2 in 1948. The general form of entropy, also known as Shannon or information entropy, is the proper, i.e. the fundamental form. Heat entropy is a special case.

Through its application to heat flows in thermodynamics, entropy as heat entropy was given a concrete physical dimension, namely J/K, i.e. energy per temperature. However, this is the special case of thermodynamics, which deals with energies (heat) and temperature. If entropy is understood in a very general and abstract way, it is dimensionless, a pure number.
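The link between the two forms can be written down for the simplest case. The following is only a sketch, assuming that a macro state is compatible with W equally likely micro states; the symbols W, H and k_B (Boltzmann’s constant) do not appear in the text above and are introduced here for illustration.

```latex
\begin{align}
S &= k_B \ln W        && \text{(heat entropy, in J/K)} \\
H &= \log_2 W         && \text{(information entropy, a pure number, counted in bits)} \\
S &= (k_B \ln 2)\, H  && \text{(since } \ln W = \ln 2 \cdot \log_2 W \text{)}
\end{align}
```

Under this assumption, the physical dimension J/K sits entirely in the conversion factor k_B ln 2 (about 9.57 × 10⁻²⁴ J/K per bit), while H itself remains dimensionless.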

As the discoverer of abstract and general information entropy, Shannon gave this number a name, the “bit”. For his work as an engineer at the Bell telephone company, Shannon used the dimensionless bit to calculate the flow of information in the telephone wires. His information entropy is dimensionless and applies not only in thermodynamics, but everywhere where information and flows play a role.
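To make the dimensionless number tangible, here is a minimal sketch of Shannon’s formula H = −Σ p·log₂(p). The function name and the example probabilities are my own choices for illustration, not taken from Shannon’s text.

```python
import math

def shannon_entropy_bits(probabilities):
    """Return the Shannon entropy H = -sum(p * log2(p)) in bits.

    Outcomes with probability 0 contribute nothing and are skipped.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Illustrative distributions (assumed values, not from the article):
fair_coin = shannon_entropy_bits([0.5, 0.5])    # exactly one bit
biased_coin = shannon_entropy_bits([0.9, 0.1])  # less than one bit
fair_die = shannon_entropy_bits([1 / 6] * 6)    # log2(6) bits
```

A fair coin carries exactly one bit. The biased coin carries less, because its outcome is easier to guess. No unit of energy or temperature appears anywhere in the calculation.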


4. Entropy is the Difference between not Knowing and Knowing

Many of us learnt at school that entropy is a measure of noise and chaos. Additionally, the second law of thermodynamics tells us that entropy can only ever increase. Thus, disorder should only increase. However, identifying entropy with noise or even chaos is misleading.

There are good reasons for this misleading idea: If you throw a sugar cube into the coffee, its well-defined crystal structure dissolves, the molecules disperse disorderly in the liquid and the sugar shows a transition from ordered to disordered. This decay of order can be observed everywhere in nature. In physics, it is entropy that drives the decay of order according to the second law. And decay and chaos can hardly be equated with Shannon’s concept of information. Many scientists thought the same way and therefore equated information with negentropy (entropy with a negative sign). At first glance, this doesn’t seem to be a bad match. In this view, entropy is noise and the absence of noise, i.e. negentropy, would then be information. Actually logical, isn’t it?

Not quite, because information is contained both in the sugar cube and in the dissolved sugar molecules floating in the coffee. In some ways, there is even more information in the individually floating molecules because each has its own path. Their bustling movements contain information. For us coffee drinkers, however, the bustling movements of the many molecules in the cup do not contain useful information and appear only chaotic. Can this chaos be information?

The problem is our conventional idea of information. Our idea is too static. I suggest that we see entropy as something that denotes a flow, namely the flow between not knowing and knowing. This dynamic is characteristic of learning, of absorbing new information.

Every second, an incredible amount of things happen in the cosmos that could be known. The information in the entire world can only increase. This is what the second law says, and what increases is entropy, not negentropy. Wouldn’t it be much more obvious to put information in parallel with entropy and not with negentropy? More entropy would then mean more information and not more chaos.

Where can the information be found? In the noise or in the absence of noise? In entropy or in negentropy?


Two Levels

Well, the dilemma can be solved. The crucial step is to accept that entropy is the tension between two states, the overview state and the detail state. The overview does not need the details, but only sees the broad lines. C.F. Weizsäcker speaks of the macro level. The broad lines are the information that interests us. Details, on the other hand, appear to us as unimportant noise. But the details, i.e. the micro level, contain more information, usually a whole lot more. Just take the movements of the water molecules in the coffee cup (micro level), whose chaotic bustle contains more information than the single indication of the temperature of the coffee (macro level). Both levels are connected and their information depends on each other in a complex way. Entropy is the difference between the two amounts of information. The amount is always greater at the detail level (micro level), because there is always more to know in the details than in the broad lines.

But because the two levels refer to the same object, you as the observer can look at the details or the big picture. Both belong together. The gain in information about details describes the transition from the macro to the micro level, the gain in information about the overview describes the opposite direction.

So where does the real information lie? At the detailed level, where many details can be described, or at the overview level, where the information is summarised and simplified in a way that really interests us?

The answer is simple: information is contained in both the macro and the micro level. Entropy is the transition between the two levels and, depending on what interests us, we can make the transition in one direction or the other.


Example Coffee Cup

This is classically demonstrated in thermodynamics. The temperature of my coffee can be seen as a metric for the average kinetic energy of the individual liquid molecules in the coffee cup. The information contained in the many molecules is the micro state, the temperature is the macro state. Entropy is the knowledge that is missing in the macro state but is present in the micro state. But for me as a coffee drinker, only the knowledge of the macro state, the temperature of the coffee, is relevant. This is not present in the micro state insofar as it does not depend on the individual molecules, but rather statistically on the totality of all molecules. It is only in the macro state that knowledge about temperature becomes tangible.

For us, only the macro state shows relevant information. But there is additional information in the noise of the details. How exactly the molecules move is a lot of information, but these details don’t matter to me when I drink coffee, only their average speed determines the temperature of the coffee, which is what matters to me.

The information-rich and constantly changing microstate has a complex relationship with the simple macroinformation of temperature. The macro state also influences the micro state, because the molecules have to move within the statistical framework set by the temperature. Both pieces of information depend on each other and are objectively present in the object at the same time. What differs is the level or scope of observation. The difference in the amount of information in the two levels determines the entropy.

These conditions have been well known since Shannon2 and C.F. Weizsäcker. However, most schools still teach that entropy is a measure of noise. This is misleading. Entropy should always be understood as a delta, as a difference (distance) between the information in the overview (macro state) and the information in the details (micro state).
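The idea of entropy as a delta can be tried out on a toy system that is much smaller than a coffee cup. The sketch below is an analogy of my own, not an example from Salm or Shannon: ten coins stand in for the molecules, the exact sequence of heads and tails is the micro state, and the number of heads plays the role of the temperature (macro state). Assuming all sequences are equally likely, the entropy of a macro state is the number of bits that are known at micro level but still missing at macro level.

```python
import math

def entropy_of_macrostate(n_coins, n_heads):
    """Bits still needed to name the exact sequence once the head count is known.

    math.comb counts the micro states (sequences) compatible with the
    macro state (head count); all of them are assumed equally likely.
    """
    microstates = math.comb(n_coins, n_heads)
    return math.log2(microstates)

n = 10  # the full micro description of ten coins takes ten bits
for heads in (0, 1, 5):
    missing = entropy_of_macrostate(n, heads)
    print(f"{heads} heads: {missing:.2f} of {n} bits are unknown at macro level")
```

With zero heads the macro state already fixes every coin, so nothing is missing. With five heads almost eight of the ten bits remain unknown. The coins are the same in each case; only the distance between the two levels of description changes.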


5. Entropy is a Relative Value

The fact that entropy is always a distance, a delta, i.e. mathematically a difference, also results in the fact that entropy is not an absolute value, but rather a relative value.

Example Coffee Cup
Let’s take the coffee cup as an example. How much entropy is in there? If we only look at the temperature, then the microstate corresponds to the average kinetic energy of the molecules. But the coffee cup contains even more information: How strong is the coffee? How strongly sweetened? How strong is the acidity? What flavours does it contain?

Example School Building
Salm1 gives the example of a lost door key that a teacher is looking for in a school building. If he knows which classroom the key is in, he has not yet found it. At this moment, the microstate only names the room. Where in the room is the key? Perhaps in a cupboard. In which one? At what height? In which drawer, in which box? The micro state varies depending on the depth of the enquiry. It is a relative value.

Because the information entropy is always a difference, the entropy, i.e. the span between overview and details, can always be extended to even more details.

Entropy is a relative value. If we specify it in absolute terms, we set – without explicitly declaring it – a lowest level (classroom, shelf or drawer). This is legitimate as long as we are aware that the seemingly absolute value only represents the distance to the assumed micro-level.
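The lost key can be put into numbers. The building layout below is purely hypothetical (Salm gives no figures), and every place at a given level is assumed to be equally likely. The macro state is always the same: the key is somewhere in the building.

```python
import math

# Hypothetical building layout, chosen only for illustration.
ROOMS = 16
CUPBOARDS_PER_ROOM = 4
DRAWERS_PER_CUPBOARD = 8

def entropy_bits(n_equally_likely_places):
    """Bits needed to single out one of n equally likely places."""
    return math.log2(n_equally_likely_places)

# Same key, same building, three different choices of the micro level:
to_room = entropy_bits(ROOMS)
to_cupboard = entropy_bits(ROOMS * CUPBOARDS_PER_ROOM)
to_drawer = entropy_bits(ROOMS * CUPBOARDS_PER_ROOM * DRAWERS_PER_CUPBOARD)
```

With these assumed numbers the entropy is 4 bits down to the room, 6 bits down to the cupboard and 9 bits down to the drawer. None of the three values is the entropy of the key in any absolute sense; each one only measures the distance to the micro level we have chosen.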

Statics and Dynamics 

Energy and entropy are two complementary quantities that permeate the entire description of nature (physics, chemistry, biology). The two fundamental laws of physics each contain one of the two general quantities E (energy) and S (entropy):

  1. Law:  ∆E = 0    or:   dE/dt = 0
  2. Law:  ∆S ≥ 0    or:   dS/dt ≥ 0

Energy remains constant over time (in a closed system), while entropy can only increase. In other words: energy is a static value and shows what does not change, while entropy is essentially dynamic and shows flows, e.g. in heat engines, in Shannon’s information flow in telephone wires and whenever our thoughts flow and we learn and think.

Entropy and Time

Entropy is essentially linked to the phenomenon of time by the second law (∆S ≥ 0). While energy remains constant in a closed system (Noether’s theorem), entropy changes over time and increases in a closed system. Entropy therefore knows time, not only heat entropy in particular, but also the much more general information entropy.


Conclusion

  • Entropy is a key concept in physics and information theory.
  • The term entropy comes from thermodynamics, but the concept of entropy refers to information in general.
  • Thermodynamic entropy is the special case, information entropy is the general concept.
  • Everything that happens physically, chemically and in information processing, whether technical or biological, has to do with entropy. In particular, everything that has to do with information flows and structures. In other words, everything that really interests us.
  • Entropy is always relative and refers to the distance between the macro and micro levels.
  • The macro level contains less information than the micro level.
  • The macro level contains the information of interest.
  • Neither is absolute: the micro level can always be described in more detail. The macro level is defined from the outside: What is of interest? The temperature of the coffee? The concentration of sugar molecules? The acidity? The caffeine content …
  • Only the definition of the two states makes it possible to specify the entropy in seemingly absolute terms. However, what counts for entropy is the relative value, i.e. the delta between the two states. This delta, the entropy, determines the flow.
  • The flow happens in time.

(Translation: Juan Utzinger)


1 Salm, W: Entropie und Information – naturwissenschaftliche Schlüsselbegriffe, Aulis Verlag Deubner, Köln, 1997

2 Shannon, C.E. and Weaver, W.: The Mathematical Theory of Communication, University of Illinois Press, Urbana, 1949


See also: What is Entropy?

The theory of the three worlds (Penrose)

The theory of the three worlds

There are practical questions which concern our specific lives, and there are theoretical questions which seemingly don’t. However, there are also theoretical considerations which definitely concern our practical everyday lives. One of these is the three worlds theory, which deals with questions as to which worlds we specifically live in.

On what foundation is our everyday existence based? The theory of the three worlds points to the fact that we simultaneously live in three completely different worlds. Practically, this does not constitute a problem for us; theoretically, however, the question arises as to how three worlds which are so different from each other are able to meet in reality at all.

Roger Penrose has named the three worlds as follows:
A) the Platonic world,
B) the physical world,
C) the mental world.

This is Roger Penrose’s original graph:

figure: the three worlds (Platonic, physical, mental) connected by arrows

Platonic world: The world of ideas. Mathematics, for example, is completely located in the Platonic world.

Physical world: The real, physical world with things that are in a specific place at a specific time.

Mental world: My subjective perceptions without which I would not be able to recognise the other worlds, but also my thoughts and ideas as I experience them.

The circular relationship between the three worlds

The arrows between the spheres indicate the circular relationship that these worlds engage in together:

Platonic → physical: Behind physics, there is mathematics. Physics is inconceivable without higher mathematics. Evidently, the physical world complies with mathematical laws with staggering precision. Is the real world therefore determined by mathematics?

Physical → mental: My brain is part of the physical world. According to common understanding, the neurons of the brain tissue determine my brain performance with their electric switches.

Mental → Platonic: Great thinkers are capable of formulating the laws of mathematics in their thoughts (mental world); these laws “come into being” in their heads.

This, then, is the circular process: The Platonic world (mathematics) determines the physical one, which is the basis of human thought. In human thought, in turn, mathematics (and other ideas) are located. These mathematical laws … and here we come full circle.

The scope of the three worlds

What is also interesting are the opening funnels in Penrose’s sketch, which together with the arrows point from one world to the next. Penrose uses them to indicate the fact that the world that follows in the circular process merely requires part of the world from which it emerges during the generation process.

Platonic → physical: Only a small part of mathematical findings can be used in physics. Seen in this light, the physical laws only need (are?) an excerpt from mathematics.

Physical → mental: My brain is a very small part of the physical world.

Mental → Platonic: My brain deals with many things; mathematics and abstract ideas are only a part of it.

The Platonic world is then the origin of the physical world again. However, the proportions do not appear to work out properly. This resembles the famous impossible staircase:

figure: the impossible staircase

The impossible staircase

As an aside:
The impossible staircase was devised by Roger Penrose together with his father, Lionel Penrose, and is also called the Penrose steps – or the Escher-Penrose steps after the Dutch graphic artist M.C. Escher, who, inter alia, inspired Douglas Hofstadter to write his book Gödel, Escher, Bach. The endlessness with which the steps ascend can seemingly be graphically represented without any problems, but from a logical point of view it is eminently intricate (self-referential taboo).

For Penrose, there is a mystery in the three worlds. He writes that undoubtedly there are not three separate worlds in reality but only one, and at present we are not even able to divine the true nature of this world. This is therefore about three worlds in one – and thus about their differences and the form of their interlinkage.

Not an abstract theory

The three worlds are not an abstract theory but can be recognised in our own world of private experiences. They play an important part in music, for instance. The example of music also enables us to see how the three worlds interact. More about this on this website.


Translation: Tony Häfliger and Vivien Blandford

Artificial and natural intelligence: the difference

What is real intelligence? 

Paradoxically, the success of artificial intelligence helps us to identify essential conditions of real intelligence. If we accept that artificial intelligence has its limits and, in comparison with real intelligence, reveals clearly discernible flaws – which is precisely what we recognised and described in previous blog posts – then these descriptions do not only show what artificial intelligence lacks, but also where real intelligence is ahead of artificial intelligence. Thus we learn something crucial about natural intelligence.

What have we recognised? What are the essential differences? In my view, there are two properties which distinguish real intelligence from artificial intelligence. Real intelligence

– also works in open systems and

– is characterised by a conscious intention.

 

Chess and Go are closed systems

In the blog post on cards and chess, we examined the paradox that a game of cards appears to require less intelligence from us humans than chess, whereas it is precisely the other way round for artificial intelligence. In chess and Go, the computer beats us; at cards, however, we are definitely in with a chance.

Why is this the case? – The reason is the closed nature of chess, which means that nothing happens that is not provided for. All the rules are clearly defined. The number of fields and pieces, the starting positions and the way in which the pieces may move, who plays when and who has won at what time and for what reasons: all this is unequivocally set down. And all the rules are explicit; whatever is not defined does not play a part: what the king looks like, for instance. The only important thing is that there is a king and that, in order to win the game, his opponent has to checkmate him. In an emergency, a scrap of paper with a “K” on it is enough to symbolise the king.

Such closed systems can be described with mathematical clarity, and they are deterministic. Of course, intelligence is required to win them, but this intelligence may be completely mechanical – that is, artificial intelligence.

Pattern recognition: open or closed system?

This looks different in the case of pattern recognition where, for example, certain objects and their properties have to be identified on images. Here, the system is basically open: not only can images with completely new properties be introduced from the outside, but the decisive properties that have to be recognised can themselves vary. The matter is thus not as simple, clearly defined and closed as in chess and Go. Is it a closed system, then?

No, it isn’t. Whereas in chess, the rules place a conclusive boundary around the options and objectives, such a safety fence must be actively placed around pattern recognition. The purpose of this is to organise the diversity of the patterns in a clear order. This can only be done by human beings. They assess the learning corpus, which includes as many pattern examples as possible, and allocate each example to the appropriate category. This assessed learning corpus then assumes the role of the rules of chess and determines how new input will be interpreted. In other words: the assessed learning corpus contains the relevant knowledge, i.e. the rules according to which previously unknown input is interpreted. It corresponds to the rules of chess.

The AI system for pattern recognition is thus open as long as the learning corpus has not been integrated; with the assessed corpus, however, such a system becomes closed. In the same way that the chess program is set clear limits by the rules, expert assessment provides the clear-cut corset which ultimately defines the outcome in a deterministic way. As soon as the assessment has been made, a second and purely mechanical intelligence is capable of optimising the behaviour within the defined limits – and ultimately to a degree of perfection which I as a human being will never be able to achieve.

Who, though, specifies the content of the learning corpus which turns the pattern recognition program into a technically closed system? It is always human experts who assess the pattern inputs and who thus direct the future interpretation done by the AI system. In this way pattern recognition can be turned into a closed task like a game of chess or Go which can be solved by a mechanical algorithm.

In both cases – in the initially closed game program (chess and Go) as well as in the subsequently closed pattern recognition program – the algorithm finds a closed situation, and this is the prerequisite for an artificial, i.e. mechanical intelligence to be able to work.

Conclusion 1:
AI algorithms can only work in closed spaces.

In the case of pattern recognition, the human-made learning corpus provides this closed space.

Conclusion 2:
Real intelligence also works in open situations.

Is there any intelligence without intention?

Why is artificial intelligence unable to work in an open space without assessments introduced from outside? Because it is only the assessments introduced from outside that make the results of intelligence possible. And assessments cannot be provided purely mechanically by the AI but are always linked to the assessors’ views and intentions.

Besides the differentiation between open and closed systems, our analysis of AI systems shows us still more about real intelligence, for artificial and natural intelligence also differ from each other with regard to the extent to which individual intentions play a part in their decision-making.

In chess programs, the objective is clear: to checkmate the opponent’s king. The objective which determines the assessment of the moves, namely the intention to win, does not have to be laboriously recognised by the program itself but is intrinsically given.

With pattern recognition, too, the role of the assessment intention is crucial, for what kind of patterns should be distinguished in the first place? Foreign tanks versus our own tanks? Wheeled tanks versus tracked tanks? Operational ones versus damaged ones? All these distinctions make sense, but the AI must be set, and adjusted to, a specific objective, a specific intention. Once the corpus has been assessed in a certain direction, it is impossible to suddenly derive a different property from it.

As in the chess program, the artificial intelligence is not capable of finding the objective on its own: in the chess program, the objective (checkmate) is self-evident; in pattern recognition, the assessors involved must agree on the objective (foreign/own tanks, wheeled/tracked tanks) in advance. In both cases, the objective and the intention come from the outside.
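The dependence on the assessors’ intention can be shown with a very small sketch. It uses a nearest-neighbour lookup as a stand-in for a neural network, which is a deliberate simplification on my part; the feature vectors, labels and function names are all hypothetical. The point is only that the same corpus, assessed with two different intentions, answers two different questions.

```python
# Hypothetical feature vectors: (number_of_wheels, has_foreign_markings)
corpus = [(8, 0), (8, 1), (0, 0), (0, 1)]

# Two different assessments of the SAME corpus, both supplied from outside.
labels_by_drive = ["wheeled", "wheeled", "tracked", "tracked"]
labels_by_origin = ["own", "foreign", "own", "foreign"]

def classify(sample, corpus, labels):
    """Return the label of the corpus entry closest to the sample."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(range(len(corpus)),
                  key=lambda i: squared_distance(sample, corpus[i]))
    return labels[nearest]

new_image = (6, 1)  # a previously unseen input
print(classify(new_image, corpus, labels_by_drive))
print(classify(new_image, corpus, labels_by_origin))
```

The mechanical part (finding the nearest entry) is identical in both calls. Whether the program reports the kind of drive or the origin of the tank is decided entirely by the list of labels, i.e. by the people who assessed the corpus.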

Conversely, natural intelligence has to determine itself what is important and what is unimportant, and what objectives it pursues. In my view, an active intention is an indispensable property of natural intelligence and cannot be created artificially.

Conclusion 3:
In contrast to artificial intelligence, natural intelligence is characterised by the fact that it is able to judge, and deliberately orient, its own intentions.


This is a blog post about artificial intelligence. You can find further posts through the overview page about AI.


Translation: Tony Häfliger and Vivien Blandford

Now where in artificial intelligence is the intelligence located?


In a nutshell: the intelligence is always located outside.


a) Rule-based systems

The rules and algorithms of these systems are created by human beings, and no one will ascribe real intelligence to a pocket calculator. The same also applies to all other rule-based systems, however refined they may be. The rules are devised by human beings.

b) Conventional corpus-based systems (neural networks)

These systems always use an assessed corpus, i.e. a collection of data which have already been evaluated  (details). This assessment decides according to what criteria each individual corpus entry is classified, and this classification then constitutes the real knowledge in the corpus.

However, the classification cannot be derived from the data of the corpus itself but is always introduced from the outside. And it is not only the allocation of a data entry to a class that can only be done from the outside; rather, the classes themselves are not determined by the data of the corpus, either, but are provided from the outside – ultimately by human beings.

The intelligence of these systems is always located in the assessment of the data pool, i.e. the allocation of the data objects to predefined classes, and this is done from the outside, by human beings. The neural network which is thus created does not know how the human brain has found the evaluations required for it.

c) Search engines

Search engines constitute a special type of corpus-based system and are based on the fact that many people use a certain search engine and decide with their clicks which internet links can be allocated to the search string. Ultimately, search engines only average the traces which the many users leave with their context knowledge and their intentions. Without the human brains of the users who have used the search engines so far, the search engines would not know where to point new queries.

d) Game programs (chess, Go, etc.) / deep learning

This is where things become interesting, for in contrast to the other corpus-based systems, such programs do not require any human beings who assess the corpus from the outside. Here, the corpus consists of the moves of games previously played. Does this mean, then, that such systems have an intelligence of their own?

Like the pattern recognition programs (b) and the search engines (c), the Go program has a corpus which in this case contains all the moves of the test games played before. The difference from the classic AI systems consists in the fact that the assessment of the corpus (i.e. the moves of the games) is already defined by the success in the actual game. Thus no human being is required who has to make a distinction between foreign tanks and our own tanks in order to provide the template for the neural network. The game’s success can be directly recognised by the machine, i.e. the algorithm itself; human beings are not required.

With classic AI systems, this is not the case, and a human being who assesses the individual corpus items is indispensable. Added to this, the assessment criterion is not given unequivocally, as it is with Go. Tank images can be categorised in completely different ways (wheeled/tracked tanks, damaged/undamaged tanks, tanks in towns/open country, in black and white/coloured pictures, etc.). This leaves the options for interpreting the assessment wide open. For all these reasons, an automatic categorisation is impossible with classic AI systems, which therefore always require an assessment of the learning corpus by human experts.

In the case of chess and Go, it is precisely this that is not required. Chess and Go are artificially designed and completely closed systems and thus indeed completely determined in advance. The board, the rules and the objective of the game – and thus also the assessment of the individual moves – are given automatically. Therefore no additional intelligence is required; instead, an automatism can play test games with itself within a predefined, closed setting and in this way attain the predefined objective better and better until it is better than any human being.
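How the rules alone can supply the assessment of every move is easiest to see in a game far smaller than chess or Go. The sketch below uses a simple take-away game (1 to 3 stones per move, whoever takes the last stone wins) and an exhaustive search instead of deep learning; both are my own substitutions for illustration. No human assessor appears anywhere: whether a position is good or bad follows mechanically from the rules and the objective.

```python
from functools import lru_cache

MAX_TAKE = 3  # rule of this toy game: take 1 to 3 stones per move

@lru_cache(maxsize=None)
def is_winning(stones):
    """True if the player to move can force a win.

    With no stones left the player to move has already lost,
    because the opponent took the last stone.
    """
    return any(not is_winning(stones - take)
               for take in range(1, min(MAX_TAKE, stones) + 1))

def best_move(stones):
    """Return a move that leaves the opponent in a losing position, or None."""
    for take in range(1, min(MAX_TAKE, stones) + 1):
        if not is_winning(stones - take):
            return take
    return None
```

For example, `best_move(10)` returns 2, because leaving 8 stones puts the opponent in a position from which every move loses. This works only because board, rules and objective are completely closed; the sketch has nothing to say about situations in which they are not.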

In the case of tasks which have to be solved not in an artificial game setting but in reality, however, the permitted moves and objectives are not completely defined, and there is leeway for strategy. An automatic system like deep learning cannot be applied in open, i.e. real situations.

It goes without saying that in practice, a considerable intelligence is required to program victory in Go and other games, and we may well admire the intelligence of the engineers at Google, etc., for that, yet once again it is their human intelligence which enables them to develop the programs, and not an intelligence which the programs designed by them are able to develop themselves.

Conclusion

AI systems can be very impressive and very useful, but they never have an intelligence of their own.

Static and dynamic IF-THEN, Part 2

(This blog post continues the introduction to the dynamic IF-THEN.)

Several IF-THENs next to each other

Let’s have a look at the following situation:

IF A, THEN B
IF A, THEN C

If a conclusion B and, at the same time, a conclusion C can be drawn from a premise A, then which conclusion is drawn first?

Static and dynamic logic

In terms of classical logic, this does not matter since A, B and C always exist simultaneously in a static system and do not change their truthfulness. Therefore it does not matter whether one or the other conclusion is drawn first.

This is completely different in dynamic logic – i.e. in a real situation. If I opt for B, it may be that I “lose sight” of option C. After all, statement B is usually related to further statements, and these may continue to occupy my processor, which means that the processor does not have any time at all for statement C.
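The difference can be sketched with a processor that has only a limited number of steps. The rule base, the function name and the step budget below are hypothetical; the sketch merely assumes that the processor follows one line of thought to its end before returning to the alternatives.

```python
# Hypothetical rule base: each statement lists the conclusions it permits.
RULES = {
    "A": ["B", "C"],
    "B": ["D", "F"],
    "C": ["E"],
}

def explore(start, first_choice, budget):
    """Process statements depth-first and return them in the order reached.

    `first_choice` is the conclusion tried first wherever it is available;
    `budget` is the number of statements the processor has time for.
    """
    reached = []
    pending = [start]  # stack: the most recently added statement comes first
    while pending and budget > 0:
        current = pending.pop()
        if current in reached:
            continue
        reached.append(current)
        budget -= 1
        conclusions = RULES.get(current, [])
        # Push the preferred conclusion last so that it is popped first.
        for statement in sorted(conclusions, key=lambda s: s == first_choice):
            pending.append(statement)
    return reached

b_first = explore("A", "B", budget=4)    # follows B and never gets to C
c_first = explore("A", "C", budget=4)    # follows C and never gets to D or F
unlimited = explore("A", "B", budget=100)
```

With a budget of four steps, the two runs end up with different sets of statements. With an unlimited budget, which corresponds to the static view, both orders reach the same six statements and the order no longer matters.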

Dealing with contradictions

There is an additional factor: further conclusions drawn from statements B and C lead to further statements D, E, F, etc.  In a static system, all the statements resulting from further valid conclusions must be compatible with each other. This absolute certainty does not exist in the case of real statements. Therefore it cannot be ruled out that, say, statements D and E contradict each other. And in this situation, it does matter whether we reach D or E first.

A dynamic system must be able to deal with this situation. It must be able to actively hold the contradictory statements D and E and “weigh them up against each other”, i.e. analyse their relevance and plausibility while taking their individual contexts into consideration – if you like, this is the normal way of thinking.

In this process, it matters whether I “weigh up” B or C first. Depending on the option I choose, I will end up in a totally different “field” of statements. It is certain that occasionally, statements from the two fields contradict each other. For static logic, this would be tantamount to the collapse of the system. For dynamic logic, however, this is perfectly normal – indeed, a contradiction is the reason for having a closer look at the system of statements from this position. It is the tension that drives the system and keeps the thought process active – until the contradictions are resolved.
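What “holding” and “weighing up” contradictory statements might look like can be hinted at in a few lines. This is only a sketch under strong assumptions: the list of contradictions and the plausibility values are invented, and a single number per statement is of course far cruder than the contextual weighing described above.

```python
# Hypothetical knowledge: D and E cannot both be true.
CONTRADICTIONS = [("D", "E")]

def find_conflicts(held):
    """Return the contradictory pairs of which both members are currently held."""
    return [(x, y) for x, y in CONTRADICTIONS if x in held and y in held]

def weigh(pair, plausibility):
    """Return the more plausible statement of a contradictory pair."""
    return max(pair, key=lambda s: plausibility[s])

held = {"A", "B", "C", "D", "E"}
plausibility = {"D": 0.4, "E": 0.7}  # invented weights

conflicts = find_conflicts(held)  # the system keeps running despite the conflict
preferred = [weigh(pair, plausibility) for pair in conflicts]
```

The essential point is that finding a conflict does not stop the program. The conflict is returned as an ordinary result that triggers further work, here the comparison of the two plausibility values.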

Describing this dynamism of thinking is the objective of logodynamics.

Truth – a search process

The fact that the truthfulness of the statements is not determined from the start may be regarded as a weakness of the dynamic system. Then again, this is exactly our own human situation: we do NOT know from the start what is true and what isn’t, and we first have to develop our system of statements. Static logic is unable to tell us how this development works – it is precisely for this that we need a dynamic logic.

Thinking and time

In real thinking, time plays a part. It matters which conclusion is drawn first. Admittedly, this makes things in dynamic logic slightly more difficult. However, if we want to track down the processes in a real situation, we have to accept that real processes always take place in time. We cannot remove time from thinking – nor can we remove it from our logic.

Yet static logic does so. This is why it is only capable of describing one result of thinking, the final point of a process in time. What happens during thinking is the subject of logodynamics.

Determinism – a cherished habit

If I can conclude both B and C from A and if, depending on which conclusion I draw first, the thought process evolves in a different direction, then I will have to face another disagreeable fact: namely that I am unable to derive from the initial situation (i.e. the set of statements A that I accept as being truthful) what direction I will pursue. In other words: my thought process is not determined – at any rate not by the set of what I have already recognised.

On the one hand, this is regrettable, for I can never be quite sure whether I draw the right conclusions since I simply have too many options. On the other hand, this also provides me with freedom. At the moment when I must decide to follow the path through B or C first – and that without already seeing through the system as a whole, i.e. in my real situation – at this moment I also gain the freedom to make the decision myself.

Freedom – there is no certainty

Logodynamics thus explores the thought process for systems which still have to find truth. These systems, too, are not in a position to examine an unlimited number of conclusions at the same time. This is the real situation. It means that these systems have a certain arbitrariness in that they are capable of making decisions at their own discretion. Logodynamics explores the thought structures that emerge in the process, their advantages and disadvantages, and what may be taken for granted.

It is clear that derivability cannot be taken for granted. This is regrettable, and we would prefer to be on the safe side. Yet it is only this uncertainty that enables us to think freely.

The dynamic IF-THEN is necessary

In terms of practical thinking, the point is that static logic is not equal to the process of finding truth. Static logic merely describes the found consistent system. The preceding discussion of whether the system resolves the contradictions and how it does so, will only become apparent through a logodynamic description.

In other words: static logic is incomplete. To examine the real thought process, the somewhat trickier dynamic logic is indispensable. It deprives us of certainty but provides us with a more realistic tool.

This is a blog post about dynamic logic. The preceding post made a distinction between the dynamic and the static IF-THEN.

Artificial Intelligence (Overview)

Is AI dangerous or useful?

This question is currently the subject of extensive debate. The aim here is not to repeat well-known opinions, but to shed light on the basics of the technology that you are almost certainly unaware of. Or do you know where AI gets its intelligence from?

For a quarter of a century, I have been working with ‘intelligent’ IT systems and I am astonished that we ascribe real intelligence to artificial intelligence at all. That’s exactly what it doesn’t have. Its intelligence always comes from humans, who not only provide the data, but also have to evaluate its meaning before the AI can use it. Only then can AI surprise us with its impressive performance and countless useful applications in a wide variety of areas. How does it achieve this?

In 2019, I started a blog series on this topic; an overview follows below. In 2021, I summarised the articles in a book entitled “Wie die künstliche Intelligenz zur Intelligenz kommt” (in German). The blog posts listed below form the basis of the book.

While the book is in German, the blog series is available in both German and English.



Earlier Posts (basis of the AI book)

Rule-based or corpus-based?

These are the two fundamentally different methods of computer intelligence: a system can be based either on rules or on a collection of data (a corpus). In the introductory post, I present the two with the help of two characteristic anecdotes:


With regard to success, the corpus-based systems have obviously outstripped the rule-based ones:


The rule-based systems had a more difficult time of it. What are their challenges? How can they overcome their weaknesses? And where is their intelligence situated inside them?


How are corpus-based systems set up? How is their corpus compiled and assessed? What are neural networks all about? And what are the natural limits of corpus-based systems?


Next, we’ll have a look at search engines, which are also corpus-based systems. How do they arrive at their proposals? Where are their limits and dangers? Why, for instance, is it inevitable that bubbles are formed?


Is a program capable of learning without human beings providing it with useful pieces of advice? It appears to work with deep learning. To understand this, we first compare a simple card game with chess: what requires more intelligence? Surprisingly, it becomes clear that for a computer, chess is the simpler game.

With the help of the general conditions of the board games Go and chess, we recognise under what conditions deep learning works.


In the following blog post, I’ll provide an overview of the AI types known to me. I’ll draw a brief outline of their individual structures and of the differences in the way they work.

So where is the intelligence?


The considerations reveal what distinguishes natural intelligence from artificial intelligence:


AI systems only show their capabilities when the task is clear and simple. As soon as the question becomes complex, they fail. Or they fib by arranging beautiful sentences found in their treasure trove of data in such a way that the result sounds intelligent (ChatGPT, LaMDA). They do not work with logic, but with statistics, i.e. with probability. But is what appears to be true always true?

The weaknesses necessarily follow from the design principle of AI. Further articles deal with this:

Games and Intelligence (2): Deep Learning

Go and chess

The Asian game of Go shares many similarities with chess while being simpler and more sophisticated at the same time.

The same as in chess:
– Board game → clearly defined playing field
– Two players (more would immediately increase complexity)
– Unequivocally defined possibilities of playing the stones (clear rules)
– The players place stones alternately (clear timeline).
– No hidden information (as, for instance, in cards)
– Clear objective (the player who has surrounded the larger territory wins)

Simpler in Go:
– Only one type of piece: the stone (unlike in chess: king, queen, etc.)

More complex/requires more effort:
– Go has a larger playing field (a 19 × 19 grid compared with the 8 × 8 chessboard).
– The higher number of fields and stones requires more computation.
– Despite its very simple rules, Go is a highly sophisticated game.

Summary

Compared with their common features, the differences between Go and chess are minimal. In particular, Go satisfies the strongly limiting preconditions a) to d), which enable an algorithm to tackle the job:

a) a clearly defined playing field,
b) clearly defined rules,
c) a clearly defined course of play,
d) a clear objective. (Cf. also preceding blog post)

Go and deep learning

Google has beaten the best human Go players. This victory was achieved by means of a type of AI which is called deep learning. Many people think that this proves that a computer – i.e. a machine – can be genuinely intelligent. Let us therefore have a closer look at how Google managed to do this.

Rule- or corpus-based, or a new, third system?

The strategies of the known AI programs are either rule-based or corpus-based. In previous posts, we asked ourselves where the intelligence in these two strategies comes from, and we realised that the intelligence in rule-based AI is injected into the system by the human experts who establish the rules. Corpus-based AI also requires human beings, since all the inputs into the corpus must be assessed (e.g. friendly/hostile tanks), and these assessments can always be traced back to people even if this is not immediately obvious.

However, what does this look like in the case of deep learning? Obviously, it does not require any human beings any longer in order to provide specific assessments – in Go, with regard to the individual moves’ chances of winning; rather, it is sufficient for the program to play against itself and find out on its own which moves have proved most successful. In this, deep learning does NOT depend on human intelligence and – in chess and Go – even turns out to be superior to human intelligence.

Deep learning is corpus-based

Google’s engineers undoubtedly did a fantastic job. Whereas in conventional corpus-based applications, the data for the corpus have to be compiled laboriously, this is quite simple in the case of the Go program: the engineers simply have the computer play against itself, and every game is an input into the corpus. No one has to take the trouble to trawl the internet or any other source for data; instead, the computer is able to generate a corpus of any size very simply and quickly. Although like the programs for pattern recognition, deep learning for Go continues to depend on a corpus, this corpus can be compiled in a much simpler way – and automatically at that.

Yet it gets even better for deep learning. Not only is the compilation of the corpus much simpler, but the assessment of the single moves in the corpus is also very easy: Finding out the best move from among all the moves that are possible at any given time no longer requires any human experts. How does this work? How is deep learning capable of drawing intelligent conclusions without any human intelligence at all? This may be astonishing, but if we look at it in more detail, it becomes clear why this is indeed the case.

The assessment of corpus inputs

The difference is the assessment of the corpus inputs. To illustrate this, let’s have another look at the tank example. Its corpus consists of tank images, and a human expert has to assess each picture according to whether it shows one of our own tanks or a foreign tank. As explained, this requires human experts. In our second example, the search engine, it is also human beings, namely the users, who assess whether the link to a website suggested in the corpus fits the input search string. Both types of AI cannot do without human intelligence.

With deep learning, however, this is really different. The assessment of the corpus, i.e. the individual moves that make up the many different Go test games, does not require any additional intelligence. The assessment automatically results from the games themselves, since the only criterion is whether the game has been won or lost. This, however, is known to the corpus itself since it has registered the entire course of every game right to the end. Therefore the way in which every game has proceeded, automatically contains its own assessment – assessments by human beings are no longer required.
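How a corpus can assess itself may be sketched in Python with a toy game. The sketch is an assumption-laden illustration, not Google's actual method: instead of Go it uses a hypothetical miniature game (a pile of ten stones, each player takes one to three, whoever takes the last stone wins), the moves are random, and no neural network is trained. It only shows that every recorded move can be labelled with the result of its own game, without any human assessor.

```python
import random
from collections import defaultdict


def self_play_game(rng, pile=10):
    """Play one game of the toy game with random moves.

    Returns the recorded moves [(player, pile_before, taken), ...]
    and the winner (the player who took the last stone).
    """
    moves, player = [], 0
    while pile > 0:
        taken = rng.randint(1, min(3, pile))
        moves.append((player, pile, taken))
        pile -= taken
        winner = player          # whoever moved last so far
        player = 1 - player
    return moves, winner


def build_assessed_corpus(n_games, seed=0):
    """Generate games and label every move with the outcome of its game."""
    rng = random.Random(seed)
    stats = defaultdict(lambda: [0, 0])      # (pile, taken) -> [wins, plays]
    for _ in range(n_games):
        moves, winner = self_play_game(rng)
        for player, pile, taken in moves:
            stats[(pile, taken)][1] += 1
            if player == winner:             # the assessment comes from the game itself
                stats[(pile, taken)][0] += 1
    return stats


stats = build_assessed_corpus(5000)
win_rate = {key: wins / plays for key, (wins, plays) in stats.items()}
```

The dictionary win_rate is the assessed corpus in condensed form. Entries in which a move takes all the remaining stones necessarily have a win rate of 1.0, because that move ends the game in the mover's favour. A real deep learning system would feed such self-generated assessments into the training of a neural network.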

The natural limits of deep learning

The above, however, also reveals the conditions in which deep learning is possible at all: for the course of the game and the assessment to be clear-cut, there must not be any surprises. Ambiguous situations and uncontrollable outside influences are not allowed. For everything to be flawlessly calculable, the following is indispensable:

1. A closed system

This is given by the properties a) to c) (cf. preceding post), which games like chess and Go possess, namely

a) a clearly defined playing field,
b) clearly defined rules,
c) a clearly defined course of play.

A closed system is necessary for deep learning to work. Such a system can only be an artificially constructed system, for there are no closed systems in nature. It is no accident that chess and Go are particularly suitable for AI since games always have this aspect of being consciously designed. Games which integrate chance as part of the system, such as cards in the preceding post, are not absolutely closed systems any longer and therefore less suitable for artificial intelligence.

2. A clearly defined objective

A clearly defined objective – point d) in the preceding post – is also necessary for the assessment of the corpus to take place without any human interference, because the objective of the process under investigation and the assessment of the corpus inputs are closely connected. We must understand that the target of the corpus assessment is not given by the corpus data. Data and assessment are two different things. We have already discussed this in the example of the tanks, where we saw that a corpus input, i.e. the pixels of a tank photograph, did not automatically contain its own assessment (hostile/friendly). The assessment is a piece of information which is not intrinsic to the individual data (pixels) of an image; rather, it has to be fed into the corpus from the outside (by an interpreting intelligence). Therefore the same corpus input can also be assessed in very different ways: if the corpus is told whether an individual image is one of our own tanks or a foreign tank, it still does not know whether it is a tracked tank or a wheeled tank. With all such images, assessments can go in very different directions – unlike with chess and Go, where a move in a game (which is known to the corpus) is solely assessed according to the criterion of whether it is conducive to winning the game.

Thus chess and Go pursue a simple, clearly defined objective. In contrast to these two games, however, tank pictures allow for a wide variety of assessment objectives. This is typical of real situations. Real situations are always open, and in such situations, various and differing assessments can make sense and are absolutely appropriate. For the purpose of assessment, an instance (intelligence) outside the data has to establish the connection between the data and the assessment objective. This function is always linked to an instance with a certain intention.

Machine intelligence, however, lacks this intention and therefore depends on being provided with it by an objective from the outside. If the objective is as self-evident as it is in chess and Go, this is not a problem, and the assessment of the corpus can indeed be conducted by the machine itself without any human intelligence. In such unequivocal situations, machine deep learning is genuinely capable of working – indeed, even of beating human intelligence.

However, this only applies if the rules and the objective of a game are clearly defined. In all other cases, it is not an algorithm that is required but “real” intelligence, i.e. intelligence with a deliberate intention.

Conclusion

  1. Deep learning (DL) works.
  2. DL uses a corpus-based system.
  3. DL is capable of beating human intelligence in certain applications.
  4. However, DL only works in a closed system.
  5. DL only works if the objective is clear and unequivocal.

Ad 4) Closed systems are not real but are either obvious constructs (like games) or idealisations of real circumstances (= models). Such idealisations are invariably simplifications with reduced information content. They are therefore incapable of mapping reality completely.

Ad 5) The objective, i.e. the “intention”, corresponds to a subjective momentum. This subjective momentum distinguishes natural from machine intelligence. The machine must be provided with it in advance.

This is a blog post about artificial intelligence.


Translation: Tony Häfliger and Vivien Blandford

Overview of the AI systems

All the systems we have examined so far, including deep learning, can in essence be traced back to two methods: the rule-based method and the corpus-based method. This also applies to the systems we have not discussed to date, namely simple automata and hybrid systems, which combine the two above approaches. If we integrate these variants, we will arrive at the following overview:

A: Rule-based systems

Rule-based systems are based on calculation rules. These rules are invariably IF-THEN commands, i.e. instructions which assign a certain result to a certain input. These systems are always deterministic, i.e. a certain input always leads to the same result. Also, they are always explicit, i.e. they involve no processes that cannot be made visible, and the system is always completely transparent – at least in principle. However, rule-based systems can become fairly complex.

A1: Simple automaton (pocket calculator type)

Fig. 1: Simple automaton

Rules are also called algorithms (“Algo”) in Fig. 1. Input and outputs (results) need not be figures. The simple automaton distinguishes itself from other systems in that it does not require any special knowledge base, but works with a few calculation rules. Nevertheless, simple automata can be used to make highly complex calculations, too.

Perhaps you would not describe a pocket calculator as an AI system, but the differences between a pocket calculator and the more highly developed systems right up to deep learning are merely gradual in nature – i.e. precisely of the kind that is being described on this page. Complex calculations soon strike us as intelligent, particularly if we are unable to reproduce them that easily with our own brains. This is already the case with simple arithmetic operations such as divisions or root extraction, where we quickly reach our limits. Conversely, we regard face recognition as comparatively simple because we are usually able to recognise faces quite well without a computer. Incidentally, nine men’s morris is also part of the A1 category: playing it requires a certain amount of intelligence, but it is complete in itself and easily controllable with an AI program of the A1 type.

A2: Knowledge-based system

Fig. 2: Compiling a knowledge base (IE=Inference Engine)

These systems distinguish themselves from simple automata in that part of their rules have been outsourced to a knowledge base. Fig. 2 indicates that this knowledge base has been compiled by a human being, and Fig. 3 shows how it is applied. The intelligence is located in the rules; it originates from human beings – in the application, however, the knowledge base is capable of working on its own.

Fig. 3: Application of a knowledge-based system

The inference engine (“IE” in Figs. 2 and 3) corresponds to the algorithms of the simple automaton in Fig. 1. In principle, algorithms, the inference engine and the rules of the knowledge bases are always rules, i.e. explicit IF-THEN commands. However, these can be interwoven and nested in a variety of different ways. They can refer to figures or concepts. Everything is made by human experts.

The rules in the knowledge base are subordinate to the rules of the inference engine. The latter control the flow of the interpretation, i.e. they decide what rules of the knowledge base are to be applied and how they are to be implemented. The rules of the inference engine are the actual program that is read and executed by the computer. The rules of the knowledge base, however, are not directly executed by the computer, but indirectly through the instructions provided by the inference engine. This is nesting – which is typical of commands, i.e. software in computers; after all, the rules of the inference engine are not implemented directly but read by deeper rules right down to the machine language at the core (in the kernel) of a computer. In principle, however, the rules of the knowledge base are calculation rules just like the rules of the inference engine, but in a “higher” programming language. It is an advantage if the human domain experts, i.e. the human specialists, find this programming language particularly easy and safe to read and use.
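The division of labour between knowledge base and inference engine can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple forward-chaining engine with a static logic; the rules about tanks are invented for the example, and the sketch does not represent any particular product.

```python
# Knowledge base: rules written down by a (hypothetical) domain expert as data.
# Each rule reads: IF all conditions are known THEN the conclusion is known.
KNOWLEDGE_BASE = [
    ({"armoured", "has gun"}, "tank"),
    ({"tank", "has tracks"}, "tracked tank"),
    ({"tank", "has wheels"}, "wheeled tank"),
]


def infer(facts, rules):
    """Inference engine: decides which rules are applied and in what order.

    It is the only part that is executed directly; the rules of the
    knowledge base are merely read and interpreted by it.
    """
    known = set(facts)
    changed = True
    while changed:                      # repeat until nothing new follows
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


result = infer({"armoured", "has gun", "has tracks"}, KNOWLEDGE_BASE)
```

Here result contains the two derived statements "tank" and "tracked tank" in addition to the three input facts. The same input always produces the same result, and every step can be traced back to an explicit rule, which is what makes such a system deterministic and transparent.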

With regard to the logic system used in inference engines, we distinguish between rule-based systems

– with a static logic (ontologies type / semantic web type),
– with a dynamic logic (concept molecules type).

For this, cf. the blog post on the three innovations of rule-based AI.

B: Corpus-based systems

Corpus-based systems are compiled in three steps (Fig. 4). In the first step, as large as possible a corpus is collected. The collection does not contain any rules, only data. Rules would be instructions; however, the data of the corpus are not instructions: they are pure data collections, texts, images, game processes, etc.

Fig. 4: Compiling a corpus-based system

These data must now be assessed. As a rule, this is done by a human being. In the third step, a so-called neural network is trained on the basis of the assessed corpus. In contrast to the data corpus, the neural network is again a collection of rules like the knowledge base of the rule-based systems A. Unlike those, however, the neural network is not constructed by a human being but built and trained by the assessed corpus. Unlike the knowledge base, the neural network is not explicit, i.e. it is not readily accessible.
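The three steps can be illustrated with a deliberately tiny Python sketch. It assumes a single artificial neuron (a perceptron) instead of a real neural network, and the six corpus entries with their two numerical features are made up for the example; real image corpora are vastly larger.

```python
# Step 1: collect a corpus. It contains only data, no rules.
# Each (hypothetical) entry stands for two features extracted from an image.
corpus = [(0.9, 0.2), (0.8, 0.1), (0.7, 0.3), (0.2, 0.9), (0.1, 0.7), (0.3, 0.8)]

# Step 2: a human expert assesses every entry (1 = own tank, 0 = foreign tank).
assessments = [1, 1, 1, 0, 0, 0]


def train_perceptron(data, labels, epochs=20, rate=0.1):
    """Step 3: derive the weights from the assessed corpus."""
    w1 = w2 = bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in zip(data, labels):
            predicted = 1 if w1 * x1 + w2 * x2 + bias > 0 else 0
            error = target - predicted        # non-zero only for wrong answers
            w1 += rate * error * x1
            w2 += rate * error * x2
            bias += rate * error
    return w1, w2, bias


def classify(weights, entry):
    """Application: only the trained weights are needed, not the corpus."""
    w1, w2, bias = weights
    x1, x2 = entry
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0


weights = train_perceptron(corpus, assessments)
```

After training, classify(weights, entry) reproduces the expert's assessment for all six corpus entries. The "rules" are now hidden in three numbers that nobody wrote down by hand, which hints at why such a network is not explicit in the way a knowledge base is.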

Fig. 5: Application of a corpus-based system

In their applications, both neural networks and the rule-based systems are fully capable of working without human beings. Even the corpus is no longer necessary. All the knowledge is located in the algorithms of the neural network. In addition, neural networks are also quite capable of interpreting poorly structured contents such as a mess of pixels (i.e. images), where rule-based systems (A type) very quickly reach their limits. In contrast to these, however, corpus-based systems are less successful with complex outputs, i.e. the number of possible output results must not be too large since if it is, the accuracy rate will suffer. Best suited here are binary outputs of the “our tank – foreign tank” type (cf. preceding post) or of “male author – female author” in the assessment of Twitter texts. For such tasks, corpus-based systems are vastly superior to rule-based ones. This superiority quickly declines, however, when it comes to finely differentiated outputs.

Three subtypes of corpus-based AI

The three subtypes differ from each other with regard to who or what assesses the corpus.

Fig. 6: The three types of corpus-based system and how they assess their corpus

B1: Pattern recognition type

I described this type (top in Fig. 6) in the tank example. The corpus is assessed by a human expert.

B2: Search engine type

Cf. middle diagram in Fig. 6: in this type, the corpus is assessed by the customers. I described such a system in the search engine post.

B3: Deep learning type

In contrast to the above types, this one (bottom in Fig. 6) does not require a human being to train or assess the neural network. The assessment results solely from the way in which the games proceed. The fact that deep learning is only possible in very restricted conditions is explained in the post on games and intelligence.

C: Hybrid systems

Of course the above-mentioned methods (A1-A2, B1-B3) can also be combined in practice.

Thus a face identification system, for instance, may work in such a way that in the images provided by a surveillance camera, a corpus-based system B1 is capable of recognising faces as such, and in the faces the crucial shapes of eyes, mouth, etc. Subsequently, a rule-based system A2 uses the points marked by B1 to calculate the proportions of eyes, nose, mouth, etc., which characterise an individual face. Such a combination of corpus- and rule-based systems allows for individual faces to be recognised in images. The first step would not be possible for an A2 system, the second step would be far too complicated and inaccurate for a B1 system. A hybrid system makes it possible.
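The interplay of the two steps might look as follows in Python. This is only a sketch under assumptions: the corpus-based step B1 is replaced by a stand-in function that looks up prepared landmark coordinates, and the single proportion computed in step A2 is a hypothetical choice; real systems use many more points and measures.

```python
import math


def detect_landmarks(image):
    """Stand-in for the corpus-based step B1.

    A real system would run a trained network on the pixels; here the
    landmark coordinates are simply read from prepared (hypothetical) data.
    """
    return image["landmarks"]


def eye_mouth_ratio(landmarks):
    """Rule-based step A2: explicit geometry on the points marked by B1."""
    left, right = landmarks["left_eye"], landmarks["right_eye"]
    eye_distance = math.dist(left, right)
    eye_centre = ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2)
    eye_to_mouth = math.dist(eye_centre, landmarks["mouth"])
    return eye_to_mouth / eye_distance


# Hypothetical camera image with landmarks already attached.
image = {"landmarks": {"left_eye": (30, 40), "right_eye": (70, 40), "mouth": (50, 100)}}
ratio = eye_mouth_ratio(detect_landmarks(image))
```

For the sample points, the eyes are 40 units apart and the mouth lies 60 units below their midpoint, so the ratio is 1.5. Such proportions, computed by explicit rules, could then be compared with stored values for known individuals.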


In the following blog post, I will answer the question as to where the intelligence is located in all these systems. But you have probably long found the answer yourself.

This is a blog post about artificial intelligence.


Translation: Tony Häfliger and Vivien Blandford

Games and intelligence (1)

Chess or jass: what requires more intelligence?

(Jass is a very popular Swiss card game of the same family as whist and bridge, though more homespun than the latter.)

It is generally assumed that chess requires more intelligence, since less intelligent players evidently stand a chance of winning at cards, whereas they do not in chess. If we consider, however, what a computer program must be able to do in order to win, the picture soon looks different: chess is clearly simpler for a machine.

This may surprise you, but it is worth looking at the features the two games have in common, as well as their differences – and of course, both have a great deal to do with our topic of artificial intelligence.

Common features

a) Clearly defined playing field

The chessboard has 64 black and white squares; only the pieces that are situated on these squares play a part. At cards, the bridge table could be regarded as a playing field, as could the so-called square “jass carpet” that is placed on a restaurant table; it is the material playing field in the same way that the material chessboard is for chess. If we are interested in successful playing behaviour, however, the colour of the jass carpet or the make of the chessboard is immaterial; what counts is solely the abstract, “IT-type” playing field: where, in a mathematical sense, can our chess pieces and playing cards move? And in this respect, the situation is completely clear at cards, too: the cards are in a clearly defined place at any given time, either in a player’s hand ready to be played, or in front of a player as a trick already won, or on the table as a face-up card to be seen by everyone. Both chess and cards can therefore be said to have a clearly defined playing field.

b) Clear rules

Here, too, there is hardly any difference between the two games. Although there are all sorts of variants of whist and bridge, and although jass rules differ from village to village and even from restaurant to restaurant (which may occasionally lead to heated discussions), as soon as a set of rules has been agreed upon, the situation is clear. As in chess, it is clear what goes and what doesn’t, and the players’ possible activities are clearly defined.

c) Clear course of play

Here again, the games do not differ from each other. At any point in time, there is precisely one player who is permitted to act, and his or her options are clearly defined.

d) Clear objective

Chess is about checkmating the opponent’s king; card games are about scoring points or tricks, depending on the variant. Games do not last an eternity. A card game is over when all the cards have been played; in chess, the draw and stalemate rules prevent a game from going on indefinitely. There is always one clear winner, there are always clear losers, and if need be there is a definitive tie.

Differences

e) Clear starting situation?

In chess, the starting situation is identical in every game; all pieces start at their appointed place. At cards, however, the pack of cards is shuffled before every game. Whereas in chess, we always start from precisely the same situation, we have to envisage a new one before every card game. Chance thus plays an important role in cards; in chess, it has been deliberately excluded. This is bound to have consequences. Since I have to factor in chance at cards, I cannot rely on certainties like in chess, but have to rely on probabilities.

f) Hidden information?

A lack of knowledge remains a challenge for card players throughout the game. Whereas in chess, everything is openly recognisable for each player on the board, card games literally thrive on players NOT knowing where the cards are. Therefore they must guess – i.e. rely on probabilities – and run certain risks. There is no guessing in chess; the situation is always clear, open and evident. Of course, this makes it substantially easier to describe the situation in chess; at cards, however, this lack of knowledge makes a description of the situation difficult.
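What "relying on probabilities" means at the start of a hand can be worked out in a short Python example. It assumes a pack of 36 cards dealt evenly to four players, as is usual in jass, so that I see my own nine cards and 27 cards remain hidden; the function names are chosen for this illustration only.

```python
from fractions import Fraction
from math import comb


def p_holds_card(hand_size, unseen_cards):
    """Probability that one particular hidden hand contains one particular card."""
    return Fraction(hand_size, unseen_cards)


def p_holds_at_least_one(wanted, hand_size, unseen_cards):
    """Probability that a hidden hand contains at least one of `wanted` cards."""
    none = Fraction(comb(unseen_cards - wanted, hand_size),
                    comb(unseen_cards, hand_size))
    return 1 - none


# I do not hold the ace of hearts: how likely is it that my partner does?
partner_has_ace = p_holds_card(hand_size=9, unseen_cards=27)

# How likely is it that my partner holds at least one of two cards I am missing?
partner_has_one_of_two = p_holds_at_least_one(wanted=2, hand_size=9, unseen_cards=27)
```

The first value is 1/3, the second 22/39, i.e. a little more than one half. A program can compute such figures exactly, but they change with every card that is played, and they say nothing yet about how the other players will judge the same situation.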

g) Probabilities and emotions (psychology)

If I do not know everything, I have to rely on probabilities. Experience shows that this is something that we human beings are generally very bad at. We let ourselves be guided by emotions much more strongly than we care to admit. Fears and hopes determine our expectations, and we often grossly misjudge probabilities. An AI program naturally has an edge over us in this respect since it does not have to cope with emotions and is much better at computing probabilities. Yet the machine wants to beat its opponent and will therefore have to assess its opponent’s reactions correctly. The AI program would therefore do well to take its opponent’s flawed handling of probabilities into its considerations, but this is not very easy in terms of algorithms. How does it recognise an optimist? Human players try to read their opponents while trying to mislead them about their own emotions at the same time. This is part of the game. It is no use to the program if it makes computations without any emotions while being incapable of recognising and assessing its opponent’s emotions.

h) Communication

Chess is played by one player against the other. Card games usually involve four players playing each other in pairs. This aspect, i.e. that two individuals have to coordinate their actions, makes the game interesting, and it would be fatal for a card game program to neglect this aspect. But how should we program this? What has to be taken into account here, too, is point f) above, namely the fact that I cannot see my partner’s cards; I neither know my partner’s cards nor my opponents’. Of course my partner and I are interested in coordinating our game, and part of this is that we communicate our options (hidden cards) and our strategies (intentions for driving the game forward) to each other. If, for instance, I hold the ace of hearts, I would like my partner to lead hearts to enable me to win the trick. However, I am not allowed to tell him that openly – yet an experienced card player would not find this a problem. First of all, the run of the game often reveals who holds the ace of hearts. Of course it is not easy to discover this because both the cards that have already been played and possible tactics and strategies have to be taken into consideration. The number of options, the computation of the probabilities and the psychology of the players all come into play here, which can result in very exciting conflict situations – which ultimately also makes the game attractive. In chess, however, with its constantly very explicit situation, circumstances are a great deal simpler in this respect.

But this is not all:

i) The legal grey area

Is it really true that my partner and I are unable to exchange communication about our cards and strategies? Officially, of course, this is prohibited – but can this ban really be implemented in practice?

Of course it can’t. Whereas in chess, it is practically solely the explicit moves that play a part, there is a great deal of additional information at cards which a practised player must be able to read. How am I smiling when I’m playing a card? If I hold the ace of hearts, which can win the next trick, I obviously want my partner to help me and lead hearts. One possibility of achieving this in a jass game is to play a minor heart and place it on the table with distinctive emphasis. A practised partner will easily read this as a signal for him to lead hearts next time rather than diamonds to enable me to win the trick with my ace. No one will really be able to ban anyone from leading a card in a certain way, provided that this is done with sufficient discretion. Partners who are well attuned to each other do not only know the completely legal signals which they automatically emit through the selection of the cards they play, but also some signals from the grey area with which they coordinate their game.

These signals constitute information which an ambitious AI will have to be able to identify and process. The volume of information which it has to process for this purpose is not only much larger than the volume of information in chess, it is also not limited in any way. My AI plays two human opponents, and those two also communicate with each other. The AI should be able to recognise their communication in order not to be hopelessly beaten. The signals agreed upon by the opponents may of course vary and be of any degree of sophistication. How can my AI discover what arrangements the two made prior to the game?
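How such a signal could enter an AI's reasoning can be sketched with Bayes' rule. The following is only a minimal illustration: the function name `update_belief` and all numbers are hypothetical, and it assumes that the AI already knows how often a given gesture accompanies the ace of hearts.

```python
from fractions import Fraction

def update_belief(prior, p_signal_if_holds, p_signal_if_not):
    """Bayes' rule: probability that a player holds the ace, given the observed signal."""
    evidence = prior * p_signal_if_holds + (1 - prior) * p_signal_if_not
    return prior * p_signal_if_holds / evidence

# All numbers are hypothetical and chosen only for illustration.
prior = Fraction(1, 3)              # the ace is equally likely to be with any of the three other players
p_signal_if_holds = Fraction(3, 5)  # assumed chance of the emphatic lead when the player holds the ace
p_signal_if_not = Fraction(1, 10)   # assumed chance of the same gesture without the ace

posterior = update_belief(prior, p_signal_if_holds, p_signal_if_not)
print(posterior)  # 3/4
```

The arithmetic is the easy part. The difficulty described above lies in the two assumed likelihoods: they depend on arrangements the opponents made before the game, and these are exactly what the AI does not know.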

Conclusion

Card games are much more difficult to program than chess

If we want to develop a program for a card game, we will have to take into consideration aspects e) to i), which hardly play any part in chess. In terms of algorithms, however, aspects e) to i) constitute a difficult challenge owing to the imponderables.

In comparison with card games, chess is substantially less difficult for a computer because

– there is always the same starting situation,
– there is no hidden information,
– no probabilities need to be taken into account,
– human emotions play a small part,
– there is no legal grey area because no exchange of information between partners is possible.

For an AI program, chess is therefore the simpler game. It is completely defined, i.e. the volume of information that is in the game is very small, clearly disclosed and clearly limited. This is not the case with card games.


This is a blog post about artificial intelligence. In the second part about games and intelligence, I will deal with Go and deep learning.


Translation: Tony Häfliger and Vivien Blandford

How real is the probable?

AI can only see whatever is in the corpus

Corpus-based systems are on the road to success. They are “disruptive”, i.e. they change our society substantially within a very short period of time – reason enough for us to recall how these systems really work.

In previous blog posts I explained that these systems consist of two parts, namely a data corpus and a neural network. Of course, the network is unable to recognise anything that is not already in the corpus. The blindness of the corpus automatically continues in the neural network, and the AI is ultimately only able to produce what is already present in the data of the corpus. The same applies to incorrect input in the corpus: this will reappear in the results of the AI and, in particular, lessen their accuracy.

When we bring to mind the mode of action of AI, this fact is banal, since the learning corpus is the basis for this kind of artificial intelligence. Only that which is in the corpus can appear in the results, and errors and lack of precision in the corpus automatically diminish the validity of the results.

What is less banal is another aspect, which is also essentially tied up with the artificial intelligence of neural networks. It is the role played by probability. Neural networks work through probabilities. What precisely does this mean, and what effects does it have in practice?

Neural networks make assessments according to probability

Starting point

Let’s look again at our search engine from the preceding post. A customer of our search engine enters a search string. Other customers before him have already entered the same search string. We therefore suggest those websites to the customer which have been selected by the earlier customers. Of course we want to place those at the top of the customer’s list which are of most interest to him (cf. preceding post). To be able to do so, we assess all the customers according to their previous queries. How we do this in detail is naturally our trade secret; after all, we want to gain an edge over our competitors. No matter how we do this, however – and no matter how our competitors do it – we end up weighting previous users’ suggestions. On the basis of this weighting process, we select the proposals which we present to our enquirer and the order in which we display them. Here, probabilities are the crucial factor.

Example

Let us assume that enquirer A asks our search engine a question, and the two customers B and C have already asked the same question as A and left their choice, i.e. the addresses of the websites selected by them, in our well-stocked corpus. Which selection should we now prefer to present to A, that of B or that of C?

Now we have a look at the assessments of the three customers: to what extent do B’s and C’s profiles correspond with A’s profile? Let’s assume that we arrive at the following correspondences:

Customer B:  80%
Customer C: 30%

Naturally we assume that B corresponds better with A than C and that A is therefore served better by B’s answers.
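The weighting step can be sketched in a few lines. This is a minimal illustration and not how any real search engine works: the correspondence scores are the ones from the example, while the website names and the function `rank_suggestions` are hypothetical.

```python
# Correspondence of earlier customers with enquirer A (values from the example above).
correspondence = {"B": 0.80, "C": 0.30}

# Hypothetical website selections that B and C left in the corpus.
selections = {
    "B": ["site-b1.example", "site-b2.example"],
    "C": ["site-c1.example"],
}

def rank_suggestions(correspondence, selections):
    """List the websites of the best-matching customers first, without duplicates."""
    ranked = []
    for customer in sorted(correspondence, key=correspondence.get, reverse=True):
        for site in selections[customer]:
            if site not in ranked:
                ranked.append(site)
    return ranked

suggestions = rank_suggestions(correspondence, selections)
print(suggestions)  # B's selections come before C's
```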

But is this truly the case?

The question is justified, for after all, there is no complete correspondence with either of the two other users. It may be the case that it is precisely the 30% with which A and C correspond which concerns A’s current query. In that case, it would be unfortunate to give B’s answer priority, particularly if the 80% correspondence with B concerns completely different fields which have nothing to do with the current query. Admittedly, this deviation from probability is improbable in a specific case, but it is not impossible – and this is the actual crux of probabilities.

Now in this case, we reasonably opted for B, and we may be certain that probability is on our side. In terms of our business success, we may confidently rely on probability. Why?

This is connected with the law of large numbers. In an individual case as described above, C’s answer may indeed be the better one. In most cases, however, B’s answers will be more to our customer’s liking, and we are well advised to provide him with that answer. This is the law of large numbers. Essentially, it is the basis of the phenomenon of probability:

In an individual case, something improbable may happen; across many cases, however, we may rely on the probable being what usually happens.
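A small simulation may make this tangible. It is only a sketch under a loudly stated assumption: we pretend that B's answer is the better one in 80% of all queries. That figure is hypothetical (the 80% above is a profile correspondence, not a hit probability), as is the function name `hit_rate`.

```python
import random

def hit_rate(p_b_is_better, n_queries, seed=0):
    """Share of queries in which always presenting B's answer was the right choice."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    hits = sum(rng.random() < p_b_is_better for _ in range(n_queries))
    return hits / n_queries

P_B_IS_BETTER = 0.8  # hypothetical assumption, not a value from the corpus

few = hit_rate(P_B_IS_BETTER, 10)        # a handful of queries: the share can deviate noticeably
many = hit_rate(P_B_IS_BETTER, 100_000)  # many queries: the share settles close to 0.8

print(few, many)
```

With only a few queries the share of hits can lie well away from 0.8; with many queries it stays close to it. Each single query, however, can still be one of the cases in which C's answer would have been the better one.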

Conclusion for our search engine
  1. If we are interested in being right in most cases, we stick to probability.
  2. At the same time, we accept that we may miss the target in rare cases.

Conclusion for corpus-based AI in general

What applies to our search engine generally applies to any corpus-based AI since all these systems work on the basis of probability. Thus the conclusion for corpus-based AI is as follows:

  1. If we are interested in being right in most cases, we stick to probability.
  2. At the same time, we accept that we may miss the target in rare cases.

We must acknowledge that corpus-based AI has an inherent weak point, a kind of Achilles’ heel of an otherwise highly potent technology. We should therefore continue to watch this heel carefully:

  1. Incidence:
    When is the error most likely to occur, when can it be neglected? This is connected with the size and quality of the corpus, but also with the situation in which the AI is used.
  2. Consequence:
    What are the consequences if rare cases are neglected?
    Can the permanent averaging and observing of solely the most probable solutions be called intelligent?
  3. Interdependencies:
    With regard to the fundamental interdependencies, the connection with the concept of entropy is of interest: the second law of thermodynamics states that in an isolated system, what happens is always what is more probable, and thermodynamics measures this probability with the variable S, which it defines as entropy.
    What is probable is what happens, both in thermodynamics and in our search engine – but how does a natural intelligence choose?

The next blog post will be about games and intelligence, specifically about the difference between chess and a Swiss card game.

This is a post about artificial intelligence.


Translation: Tony Häfliger and Vivien Blandford

Rule-based AI: Where is the intelligence situated?

Two AI variants: rule-based and corpus-based

The two AI variants mentioned in previous blog posts are still topical today, and they have registered some remarkable successes. The two differ from each other not least in where precisely their intelligence is situated. Let’s first have a look at the rule-based system.

Structure of a rule-based system

In the Semfinder company, we used a rule-based system. I drew the following sketch of it in 1999:

Semantic interpretation system

Green: data
Yellow: software
Light blue: knowledge ware
Dark blue: knowledge engineer

The sketch consists of two rectangles, which represent different locations. The rectangle bottom left shows what happens in the hospital; the rectangle top right additionally shows what goes on in knowledge engineering.

In the hospital, our coding program reads the doctors’ free texts, interprets them and converts them into concept molecules, and allocates the relevant codes to them with the help of a knowledge base. The knowledge base contains the rules with which the texts are interpreted. In our company, these rules were drawn up by people (human experts). The rules are comparable to the algorithms of a software program, apart from the fact that they are written in a “higher” programming language to ensure that non-IT specialists, i.e. the domain experts, who in our case are doctors, are able to establish them easily and maintain them safely. For this purpose, they use the knowledge base editor, which enables them to view the rules, to test them, to modify them or to establish completely new ones.

Where, then, is the intelligence situated?

It is situated in the knowledge base – but it is not actually a genuine intelligence. The knowledge base is incapable of thinking on its own; it only carries out what a human being has instilled into it. I have therefore never described our system as intelligent. At the very least, intelligence means that new things can be learnt, but the knowledge base learns nothing. If a new word crops up or if a new coding aspect is integrated, then this is not done by the knowledge base but by the knowledge engineer, i.e. a human being. Everything else (hardware, software, knowledge base) only carries out what human beings have prescribed. The intelligence in our system was always and exclusively a matter of human beings – i.e. a natural rather than an artificial intelligence.
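The division of labour between program, knowledge base and knowledge engineer can be illustrated with a toy example. This is a deliberately simplified sketch and not the Semfinder system: the rule format, the function `interpret` and the codes are hypothetical, and concept molecules are not modelled at all.

```python
# Hypothetical knowledge base: each rule maps a set of required terms to a code.
knowledge_base = [
    ({"fracture", "femur"}, "CODE-FEMUR-FRACTURE"),
    ({"fracture"}, "CODE-FRACTURE-UNSPECIFIED"),
]

def interpret(text, rules):
    """Return the code of the most specific rule whose terms all occur in the text."""
    words = set(text.lower().replace(",", " ").replace(".", " ").split())
    matches = [(terms, code) for terms, code in rules if terms <= words]
    if not matches:
        return None  # no rule applies: the program cannot invent one
    return max(matches, key=lambda match: len(match[0]))[1]

first = interpret("Fracture of the left femur.", knowledge_base)
unknown = interpret("Rupture of the spleen.", knowledge_base)

# Only a human knowledge engineer extends the knowledge base.
knowledge_base.append(({"rupture", "spleen"}, "CODE-SPLEEN-RUPTURE"))
after_edit = interpret("Rupture of the spleen.", knowledge_base)

print(first, unknown, after_edit)
```

The program finds a code for the first text, returns nothing for the unknown one, and handles it only after a person has added the missing rule. Nothing in the code learns anything by itself.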

Is this different in the corpus-based method? In the following post, we will therefore have a closer look at a corpus-based system.

 

This is a post about artificial intelligence.


Translation: Tony Häfliger and Vivien Blandford