How real is the probable?

AI can only see whatever is in the corpus

Corpus-based systems are on the road to success. They are “disruptive”, i.e. they change our society substantially within a very short period of time – reason enough for us to recall how these systems really work.

In previous blog posts I explained that these systems consist of two parts, namely a data corpus and a neural network. Of course, the network is unable to recognise anything that is not already in the corpus. The blindness of the corpus automatically continues in the neural network, and the AI is ultimately only able to produce what is already present in the data of the corpus. The same applies to incorrect input in the corpus: this will reappear in the results of the AI and, in particular, lessen their accuracy.

When we bear in mind how this kind of AI works, this fact is banal: the learning corpus is the basis for this form of artificial intelligence. Only that which is in the corpus can appear in the results, and errors and lack of precision in the corpus automatically diminish the validity of the results.

What is less banal is another aspect, which is also essentially tied up with the artificial intelligence of neural networks. It is the role played by probability. Neural networks work through probabilities. What precisely does this mean, and what effects does it have in practice?

Neural networks make assessments according to probability

Starting point

Let’s look again at our search engine from the preceding post. A customer of our search engine enters a search string. Other customers before him have already entered the same search string. We therefore suggest those websites to the customer which have been selected by the earlier customers. Of course we want to place those at the top of the customer’s list which are of most interest to him (cf. preceding post). To be able to do so, we assess all the customers according to their previous queries. How we do this in detail is naturally our trade secret; after all, we want to gain an edge over our competitors. No matter how we do this, however – and no matter how our competitors do it – we end up weighting previous users’ suggestions. On the basis of this weighting process, we select the proposals which we present to our enquirer and the order in which we display them. Here, probabilities are the crucial factor.

Example

Let us assume that enquirer A asks our search engine a question, and the two customers B and C have already asked the same question as A and left their choice, i.e. the addresses of the websites selected by them, in our well-stocked corpus. Which selection should we now prefer to present to A, that of B or that of C?

Now we have a look at the assessments of the three customers: to what extent do B’s and C’s profiles correspond with A’s profile? Let’s assume that we arrive at the following correspondences:

Customer B: 80%
Customer C: 30%

Naturally we assume that B corresponds better with A than C and that A is therefore served better by B’s answers.

But is this truly the case?

The question is justified, for after all, there is no complete correspondence with either of the two other users. It may be the case that it is precisely the 30% with which A and C correspond which concerns A’s current query. In that case, it would be unfortunate to give B’s answer priority, particularly if the 80% correspondence with B concerns completely different fields which have nothing to do with the current query. Admittedly, this deviation from probability is improbable in a specific case, but it is not impossible – and this is the actual crux of probabilities.

Now in this case, we reasonably opted for B, and we may be certain that probability is on our side. In terms of our business success, we may confidently rely on probability. Why?

This is connected with the law of large numbers. In an individual case as described above, C’s answer may indeed be the better one. In most cases, however, B’s answers will be more to our customer’s liking, and we are well advised to provide him with that answer. This is the law of large numbers. Essentially, it is the basis of the phenomenon of probability:

In an individual case, something improbable may happen; across many cases, however, we may rely on the fact that, as a rule, what is probable is what happens.
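This behaviour is easy to illustrate with a small simulation. Let us assume, purely for illustration (the figures are invented), that B’s selection satisfies our enquirer with a probability of 0.8 and C’s with a probability of 0.3. In an individual query, C may well win; over many queries, B wins reliably:

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

def satisfied(p):
    """One query: is the enquirer satisfied with the suggested selection?"""
    return random.random() < p

# Invented hit rates, loosely modelled on the profile correspondences above.
P_B, P_C = 0.8, 0.3

trials = 10_000
b_hits = sum(satisfied(P_B) for _ in range(trials))
c_hits = sum(satisfied(P_C) for _ in range(trials))

print(f"B satisfies the enquirer in {b_hits / trials:.0%} of all queries")
print(f"C satisfies the enquirer in {c_hits / trials:.0%} of all queries")
```

In a single trial, nothing stops C from being the better answer; it is only across the 10,000 trials that the law of large numbers makes B the safe choice.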

Conclusion for our search engine
  1. If we are interested in being right in most cases, we stick to probability.
  2. At the same time, we accept that we may miss the target in rare cases.

Conclusion for corpus-based AI in general

What applies to our search engine generally applies to any corpus-based AI since all these systems work on the basis of probability. Thus the conclusion for corpus-based AI is as follows:

  1. If we are interested in being right in most cases, we stick to probability.
  2. At the same time, we accept that we may miss the target in rare cases.

 We must acknowledge that corpus-based AI has an inherent weak point, a kind of Achilles’ heel of an otherwise highly potent technology. We should therefore continue to watch this heel carefully:

  1. Incidence:
    When is the error most likely to occur, when can it be neglected? This is connected with the size and quality of the corpus, but also with the situation in which the AI is used.
  2. Consequence:
    What are the consequences if rare cases are neglected?
    Can the permanent averaging and observing of solely the most probable solutions be called intelligent?
  3. Interdependencies:
    With regard to the fundamental interdependencies, the connection with the concept of entropy is of interest: the second law of thermodynamics states that in an isolated system, what happens is always what is more probable, and thermodynamics measures this probability with the variable S, which it defines as entropy.
    What is probable is what happens, both in thermodynamics and in our search engine – but how does a natural intelligence choose?
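The connection can be made precise. Boltzmann defined the entropy S of a macro state in terms of the number W of micro states that realise it:

```latex
S = k_B \ln W
```

The macro state compatible with the most micro states is the most probable one, and it is towards this state that an isolated system evolves. “What is probable is what happens” is thus literally the content of the second law.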

The next blog post will be about games and intelligence, specifically about the difference between chess and a Swiss card game.

This is a post about artificial intelligence.


Translation: Tony Häfliger and Vivien Blandford

Rule-based AI: Where is the intelligence situated?

Two AI variants: rule-based and corpus-based

The two AI variants mentioned in previous blog posts are still topical today, and they have registered some remarkable successes. The two differ from each other not least in where precisely their intelligence is situated. Let’s first have a look at the rule-based system.

Structure of a rule-based system

In the Semfinder company, we used a rule-based system. I drew the following sketch of it in 1999:

Semantic interpretation system

Green: data
Yellow: software
Light blue: knowledge ware
Dark blue: knowledge engineer

The sketch consists of two rectangles, which represent different locations. The rectangle bottom left shows what happens in the hospital; the rectangle top right additionally shows what goes on in knowledge engineering.

In the hospital, our coding program reads the doctors’ free texts, interprets them and converts them into concept molecules, and allocates the relevant codes to them with the help of a knowledge base. The knowledge base contains the rules with which the texts are interpreted. In our company, these rules were drawn up by people (human experts). The rules are comparable to the algorithms of a software program, apart from the fact that they are written in a “higher” programming language to ensure that non-IT specialists, i.e. the domain experts, who in our case are doctors, are able to establish them easily and maintain them safely. For this purpose, they use the knowledge base editor, which enables them to view the rules, to test them, to modify them or to establish completely new ones.
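To give a rough idea of what such human-made rules look like, here is a deliberately tiny sketch. The patterns, codes and rule format are invented for illustration; the actual knowledge base at Semfinder was incomparably richer and worked on concept molecules, not raw strings:

```python
# A toy rule base: each rule maps a text pattern to a code.
# Patterns and codes are purely illustrative.
RULES = [
    {"if_text_contains": "fracture of the femur", "then_code": "S72"},
    {"if_text_contains": "fracture",              "then_code": "T14.2"},
]

def interpret(free_text):
    """Apply the most specific matching rule, as a domain expert would expect."""
    for rule in sorted(RULES, key=lambda r: -len(r["if_text_contains"])):
        if rule["if_text_contains"] in free_text.lower():
            return rule["then_code"]
    return None  # no rule matched; a knowledge engineer would add one

print(interpret("Patient admitted with fracture of the femur"))  # S72
```

The point of the “higher” language is precisely that a doctor can read, test and extend such rules without touching the software underneath.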

Where, then, is the intelligence situated?

It is situated in the knowledge base – but it is not actually a genuine intelligence. The knowledge base is incapable of thinking on its own; it only carries out what a human being has instilled into it. I have therefore never described our system as intelligent. At the very least, intelligence means that new things can be learnt, but the knowledge base learns nothing. If a new word crops up or if a new coding aspect is integrated, then this is not done by the knowledge base but by the knowledge engineer, i.e. a human being. All the rest (hardware, software, knowledge base) only carry out what they have been prescribed to do by human beings. The intelligence in our system was always and exclusively a matter of human beings – i.e. a natural rather than an artificial intelligence.

Is this different in the corpus-based method? In the following post, we will have a closer look at a corpus-based system.

 


Information Reduction 8: Different Macro States

Two states at the same time

In my last article I showed how a system can be described at two levels: that of the micro and that of the macro state. At the micro level, all the information is present in full detail; at the macro level there is less information but what there is, is more stable. We have already discussed the example of the glass of water, where the micro state describes the movement of the individual water molecules, whereas the macro state encompasses the temperature of the liquid. In this article I would like to discuss how different the relationship between micro and macro states can be.

Does the macro state depend on the micro state?

In terms of its information content, the macro state is always smaller than the micro state. But does it have an existence of its own at all, or is it simply a consequence of the micro state? To what extent is the macro state really determined by the micro state? In my opinion, there are major differences between the different situations in this respect. This becomes clear when we consider the question of how to predict the future of the systems.

Glass of water

If we know the kinetic energy of the many individual molecules that make up a glass of water, we also know its temperature – the macro state can be deduced from knowledge about the micro state. In this case, we also know how it will develop: if the system remains closed, the temperature will remain constant. The macro state remains the same, even though there is a lot of information speeding around in the micro state. The temperature only changes when external influences – and in particular energy flows – come to bear. So, why does the temperature remain the same? It all comes down to the law of conservation of energy. The total amount of energy in the closed system remains constant, which means that however the variables in the micro state change, the macro state remains the same.
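This can be made tangible with a small numerical sketch. Take the kinetic energies of many simulated “molecules” as the micro state and their mean energy as a stand-in for the temperature, i.e. the macro state. However wildly we redistribute energy between molecules, as long as the total is conserved the macro state does not move (numbers and units are, of course, invented):

```python
import random

random.seed(0)

# Micro state: the kinetic energy of each molecule (arbitrary units).
energies = [random.random() for _ in range(100_000)]

def temperature(micro):
    """Macro state: a single number summarising the whole micro state."""
    return sum(micro) / len(micro)

t_before = temperature(energies)

# "Collisions": redistribute energy between random pairs of molecules,
# conserving the total energy of each pair.
for _ in range(100_000):
    i, j = random.randrange(len(energies)), random.randrange(len(energies))
    total = energies[i] + energies[j]
    split = random.random()
    energies[i], energies[j] = total * split, total * (1 - split)

t_after = temperature(energies)
print(abs(t_after - t_before) < 1e-9)  # the macro state has not moved
```

The micro state after the loop is completely different from the one before; the macro state is indistinguishable.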

But why does the law of conservation of energy apply? This is closely linked to the Hamilton principle or principle of least action. This is one of the most fundamental rules in nature and by no means confined to thermodynamics.

The closed thermodynamic system is an ideal system that hardly ever occurs in such a pure form in nature; in reality, it is always an approximation. Let us now compare this abstract system with some systems that really do exist in the natural world.

Water waves and Bénard cells

One such real system can be observed as a wave on the surface of a body of water. In my opinion, Bénard cells, as described in the work of Prigogine, fall into the same category. In both cases, the macroscopic structures come into being as open systems. Both cells and waves can only arise due to external influences, with Bénard cells forming due to a temperature gradient and gravity, and water waves forming due to wind and gravity. The structures arise due to the effects of these external forces, which interact to produce macroscopic structures that, interestingly enough, remain in place for long periods. Their persistence is astonishing. Why does the wave maintain its shape, when the particles of matter it is made up of are constantly changing?

The macroscopic structures formed in such open systems are much more complex than those of an isolated thermal system. Opposing external forces (such as wind and gravity) give rise to completely new forms – waves and cells. The external forces are necessary for the form to emerge and persist, but the resulting macroscopic form itself is new and is not inherent to the external forces, which are very simple in terms of information content.

Just like in the thermal system, we have two levels here: the macro level of the simple outer form (cell or wave) and a micro level of the many molecules that make up the body of this form. And, once again, the macro level – i.e. the form – is much simpler in terms of information content than the micro level, which consists of a huge number of molecules. The wave retains its shape over a long period of time, while the underlying molecules move about frantically. The wave continues to roll, capturing new molecules along the way, which now make up the wave. At any given moment the form, i.e. the coming together of the macro state from the individual molecules, appears completely determined. The information that makes up the form, however, is much easier to grasp at the macro level. The movements of the many individual molecules that make up the wave are there, but do not seem necessary to describe the form of the wave. It looks as though the new macro state is best explained by the old one.

In contrast to more highly developed organisms, the structure of both water waves and Bénard cells disappears as soon as the forces from outside diminish. Our own existence, like that of any other organic life, depends on structures that are much slower to disappear. That is to say: the macro state needs to be strengthened in relation to the micro state.

The thermostat

The macro state can be bolstered by giving it a controller. Imagine a heating system with a temperature sensor. When the temperature drops, the heating comes on; when it gets too high, the heating goes off. This keeps the temperature, i.e. the macro state, constant. But, of course, this heating system is anything but closed from a thermodynamic point of view. And temperature sensors and control systems to support the macro state and keep it constant are a human invention, not a naturally occurring phenomenon like water waves. Does such a thing exist in the natural world?
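Sketched minimally, such a controller reads the macro variable and feeds back into the dynamics. The set point, hysteresis and “room physics” below are made-up numbers; the point is only the control loop itself:

```python
def thermostat(temp, heating_on, set_point=21.0, hysteresis=0.5):
    """One control step: switch the heating based on the measured temperature."""
    if temp < set_point - hysteresis:
        return True    # too cold: heating comes on
    if temp > set_point + hysteresis:
        return False   # too warm: heating goes off
    return heating_on  # within the band: leave the heating as it is

def simulate(hours=48, temp=15.0):
    heating_on = False
    history = []
    for _ in range(hours):
        heating_on = thermostat(temp, heating_on)
        temp += 1.0 if heating_on else -0.3  # crude room dynamics
        history.append(temp)
    return history

history = simulate()
# After an initial warm-up the temperature oscillates around the set point.
print(round(min(history[20:]), 1), round(max(history[20:]), 1))
```

Without the controller the temperature would simply drift; with it, the macro state is held in a narrow band, whatever happens at the micro level.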

Autopoiesis and autopersistence

Of course, such control systems are also found in nature. During my medical studies I was impressed by the number and complexity of control circuits in the human organism. Control is always based upon information. The study of medicine made it evident to me that information is an essential part of the world.

The automatic formation of the wave or Bénard cell is a phenomenon known as autopoiesis. Waves and cells are not stable, but biological organisms are – or, at any rate, they are much more stable than waves. This is because biological organisms incorporate their own control systems. It’s as if a wave were to become aware of its own utter dependency on the wind and respond by actively seeking out its source of sustenance (the wind) or by creating a structure within itself to preserve its energy for the lean times when the wind is not blowing.

This is exactly what the human body – and in fact every biological body – does. It is a macro state that can maintain itself by controlling its micro state and deploying control processes in response to its environment.

Biological systems

This type of system differs from insulated thermal systems by its ability to create shapes, and from simple, randomly created natural shapes such as a water wave by its ability to actively assist the shape’s survival. This is because biological systems can respond to their environment to ensure their own survival. Biological systems differ from the simpler autopoietic systems in their ability to maintain a constant shape for longer thanks to complex internal controls and purposeful activity in response to their environment.

If a system is to maintain a constant form, it needs some kind of memory to preserve the pattern. And, if it is to respond purposefully to its environment, it helps if it has some kind of idea about this outside world. Both this memory of its own pattern and the simplified idea about the outside world need to be represented as information within the biological system; otherwise it will not be able to maintain its form over time. The biological system thus has some kind of information-based interior. Because of the properties described above, biological systems are always interpreting systems.


This is an article from the series Information reduction.



What the corpus knows – and what it doesn’t

Compiling the corpus

In a previous post we saw how the corpus – the basis for the neural network of AI – is compiled. The neural network is capable of interpreting the corpus in a refined manner, but of course the neural network cannot extract anything from the corpus that is not in it in the first place.

Neural Network and Corpus
Fig. 1: The neural network extracts knowledge from the corpus

How is a corpus compiled? A domain expert assigns images of a certain class to a certain type, for instance “foreign tanks” vs “our tanks”. In Fig. 2, these categorisations carried out by the experts are the red arrows, which evaluate the tank images in this example.

Expert and Corpus
Fig. 2: Making the categorisations in the corpus

Of course, the human expert’s assignations of the individual images according to the target categories must be correct – but that is not enough. There are fundamental limits to the evaluability of the corpus by a neural network, no matter how refined this may be.

Chance reigns if a corpus is too small

If I have only coloured images of our own tanks and only black and white ones of the foreign tanks (cf. introductory post about AI), the system can easily be led astray and identify all the coloured images as our own tanks and all the black and white ones as tanks of a foreign army. Although this defect can be remedied with a sufficiently large corpus, the example illustrates how important it is that a corpus consists of correct elements. If chance (coloured/black and white) is apt to play a crucial role in a corpus, the system will easily draw the wrong conclusions. Chance plays a greater role the smaller the corpus, but also the higher the number of possible “outcomes” (searched-for variables).
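The colour trap is easy to reproduce. Below, a naive learner picks whichever single feature best separates the training labels. In this deliberately tiny, invented corpus, “in colour vs black and white” happens to predict the label perfectly, so that is what the learner latches onto:

```python
# Toy corpus: each image reduced to two features (invented for illustration).
# By accident, "in_colour" matches the label exactly in this small sample.
corpus = [
    {"in_colour": True,  "has_turret": True,  "label": "own"},
    {"in_colour": True,  "has_turret": False, "label": "own"},
    {"in_colour": False, "has_turret": True,  "label": "foreign"},
    {"in_colour": False, "has_turret": True,  "label": "foreign"},
]

def best_single_feature(corpus, features):
    """Pick the feature whose value best predicts the label."""
    def accuracy(f):
        # Predict "own" when the feature is True, "foreign" otherwise.
        hits = sum((ex[f] and ex["label"] == "own")
                   or (not ex[f] and ex["label"] == "foreign")
                   for ex in corpus)
        return hits / len(corpus)
    return max(features, key=accuracy)

print(best_single_feature(corpus, ["in_colour", "has_turret"]))  # in_colour
```

With a large and well-mixed corpus the accidental correlation would dissolve; with four images, chance decides what the system “learns”.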

Besides these relative obstacles, there are also fundamental limits to the evaluability of an AI corpus. This is what we are going to look at next.

Tracked or wheeled tank?

Whatever is not in the corpus cannot be extracted from it. Needless to say, I cannot classify aircraft with a tank corpus.

Corpus and Network
Fig. 3: The evaluation is decisive – corpus with friendly and hostile tanks and a network programmed accordingly.

What, though, if our tank system is intended to find out whether a tank is tracked or wheeled? In principle, the corpus may well contain images of both types of tanks. How can the tank AI from our example recognise them?

The simple answer is: it can’t. In the corpus, the system has many images of tanks and knows whether each one is hostile or friendly. But is it wheeled or not? This information is not part of the corpus (yet) and can therefore not be extracted by the AI. Human beings may be capable of evaluating each individual image accordingly, as they did with the “friendly/hostile” properties, but then this would be an intelligence external to the AI that would make the distinction. The neural network is incapable of doing this on its own since it does not know anything about tracks and wheels. It has only learnt to distinguish our tanks from foreign ones. To establish a new category, the relevant information must first be fed into the corpus (new red arrows in Fig. 2), and then the neural network must be trained to answer new questions.

Such training need not necessarily be done on the tank corpus. The system could also make use of a corpus of completely different vehicles that have been categorised according to whether they move on wheels or tracks. The distinction learnt there can then be transferred to the tank corpus automatically – but the external wheels/tracks system must first be trained, and again with categorisations made by human beings.

On its own, without predefined examples, the AI system will not be able to make this distinction.

Conclusions

  1. Only such conclusions can be drawn from a corpus as are already contained in it.
  2. Categorisations (the red arrows in Fig. 2) invariably come from the outside, i.e. from human beings.

In our tank example, we have examined a typical image recognition AI. However, do the conclusions drawn from it (cf. above) also apply to other corpus-based systems? And isn’t there something like “deep learning”, i.e. the possibility that an AI system learns on its own?

Let us therefore have a look at a completely different type of corpus-based AI in the next blog post.


Where is intelligence situated in corpus-based AI?

In a preceding post we saw that in rule-based AI, intelligence is situated in the rules. These rules are drawn up by people, and the system is as intelligent as the people who have formulated them. Where, then, is intelligence situated in corpus-based AI?

The answer is somewhat more complicated than in the case of rule-based systems. Let us therefore have a closer look at the structure of such a corpus-based system. It is established in three steps:

  1. compiling as large a data collection as possible (corpus),
  2. assessing this data collection,
  3. training the neural network.

The network can be applied as soon as it has been established:

  4. applying the neural network.

Let’s have a closer look at the four steps.

Step 1: Compiling the data collection

In our tank example, the corpus (data collection) consists of photographs of tanks. Images are typical of corpus-based intelligence, but the collection may of course also contain other kinds of information such as customers’ queries submitted to search engines or GPS data from mobile phones. The typical feature is that each individual entry is made up of so many individual elements (e.g. pixels) that evaluating them with rules consciously drawn up by people becomes too labour-intensive. In such cases, rule-based systems are not worthwhile.

A collection of data alone, however, is not enough. The data now have to be assessed.

Step 2: Assessing the corpus

corpus and neural network
Fig. 1: Corpus-based system

Fig. 1 displays the familiar picture of our tank example. On the left-hand side, you can see the corpus. In this figure, it has already been assessed; the assessment is symbolised by the black and green flags on the left of each tank image.

In simplified terms, the assessed corpus can be imagined as a two-columned table. The left-hand column contains the information about the images, the right-hand column contains the assessment, and the arrow between them is the categorisation, which thus becomes an essential part of the corpus in that it states to which category (o or f) each individual image belongs, i.e. how it has been assessed.

Table of assessments
Tab. 1: Corpus with assessment (o=own, f=foreign)

Typically, the volumes of information in the two columns differ greatly in size. Whereas the assessment in the right-hand column of our tank example consists of precisely one bit, the image in the left-hand column contains all the pixels of the photograph; each and every pixel’s position, colour, etc. have been stored – i.e. a rather large data volume. This difference in the size ratio is typical of corpus-based systems – and if you have philosophical interests, I would like to point out its nexus with the issue of information reduction and entropy (cf. posts on information reduction). At the moment, however, the focus is on intelligence in corpus-based AI systems, and we note that in the corpus, every image is allocated its correct target category.
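The asymmetry can be put into numbers. For a hypothetical 1024 × 768 colour photograph at 24 bits per pixel, the left-hand column of the table holds millions of bits, while the right-hand column holds exactly one:

```python
width, height, bits_per_pixel = 1024, 768, 24  # assumed image format

image_bits = width * height * bits_per_pixel   # left-hand column: the photo
label_bits = 1                                 # right-hand column: own/foreign

print(f"image: {image_bits:,} bits")           # image: 18,874,368 bits
print(f"label: {label_bits} bit")
```

Almost nineteen million bits on one side, a single bit on the other: the assessment is an extreme reduction of information.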

We do not know how this categorisation happens, for it is carried out by human beings with the neurons in their own heads. These human beings are unlikely to be conscious of the precise behaviour of their neurons, and thus could not identify the rules by which this process is governed. They do know, however, what the images represent, and they indicate this in the corpus by assigning them to the relevant categories. This categorisation is introduced into the corpus from the outside by human beings; it is one hundred per cent man-made. At the same time, this assessment is an absolute condition and the basis for the establishment of the neural network. Later, too, when the completely trained neural network no longer requires the corpus with the categorisations that were brought in from the outside, it will still have been necessary for the network to be set up to be able to work at all.

Where, then, does this intelligence come from in the assignment to the categories o) and f)? It is ultimately human beings who carry out this categorisation (and may also fail to do it correctly); it is their intelligence. Once the categorisation in the corpus has been noted, this is not active intelligence any longer, but fixed knowledge.

Expert and corpus
Fig. 2: Assessing the corpus

The assessment of the corpus is a crucial stage, for which intelligence is undoubtedly required. The compiled data collection has to be assessed, and the domain expert who carries out this assessment has to guarantee that it is correct. In Fig. 2, the domain expert’s intelligence is represented by the yellow sphere. The corpus receives the knowledge thus generated through the categorisations, which in turn are represented as red arrows in Fig. 2.

Knowledge is something different from intelligence. In a certain sense, it is passive. In this sense, the pieces of information contained in the corpus are objects of knowledge, i.e. categorisations which have been formulated and need not be processed any longer. Conversely, intelligence is an active principle which is capable of making valuations on its own as is done by the human expert. The elements in the corpus, however, are data or – in the case of the above-mentioned results of the experts’ intelligence – permanently formulated knowledge.

To distinguish this knowledge from intelligence, I did not colour it yellow in Fig. 2, but green.

Thus we usefully distinguish between three things:

data (the data collection in the corpus),
knowledge (the completed assessment of these data),
intelligence (the ability to carry out this assessment).

Step 3: Learning stage

training the neural network
Fig. 3: Learning stage

At the learning stage, the neural network is established on the basis of the learning corpus. The success of this process again requires a considerable degree of intelligence; this time, it comes from AI experts, who enable the learning stage to work and who control it. A crucial role is played by algorithms here: they are responsible for the correct evaluation of the knowledge in the corpus and for the neural network taking precisely that shape which will ensure that all the categorisations contained in the corpus can be reproduced by the network itself.

The extraction of knowledge and the algorithms used in the process are symbolised by the brown arrow between corpus and network. The algorithms may appear to display a certain degree of intelligence even though they do not do anything that has not been predefined by the IT experts and the knowledge in the corpus. The emerging neural network itself does not have any intelligence of its own but is the result of this process and thus of the experts’ intelligence. It contains a substantial amount of knowledge, however, and is therefore coloured green in Fig. 3, like the knowledge in Fig. 2. In contrast to the corpus, however, the categorisations (red arrows) are significantly more complex – just as a neural network is a more complex structure than a simple two-column table (Tab. 1).
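In miniature, this learning stage can be sketched as fitting the simplest conceivable “network” – a single perceptron – to an assessed toy corpus. The feature vectors are invented and a real system is vastly more complex, but the principle is the same: the algorithm adjusts the network until it reproduces every categorisation contained in the corpus:

```python
# Toy assessed corpus: (feature vector, category), with o=own, f=foreign.
corpus = [
    ([1.0, 0.0], "o"), ([0.9, 0.1], "o"),
    ([0.0, 1.0], "f"), ([0.1, 0.9], "f"),
]

# Perceptron learning rule: nudge the weights after every mistake.
w, b = [0.0, 0.0], 0.0
for _ in range(20):                            # a few passes over the corpus
    for x, label in corpus:
        target = 1 if label == "o" else -1
        out = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
        if out != target:                      # wrong: adjust the weights
            w = [wi + target * xi for wi, xi in zip(w, x)]
            b += target

def predict(x):
    """The trained, now fixed network categorises an input on its own."""
    return "o" if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else "f"

print(all(predict(x) == label for x, label in corpus))  # True
```

Everything the algorithm does here was predefined: the learning rule by the IT experts, the targets by the assessed corpus. Once trained, the fixed `predict` no longer needs the corpus at all.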

There is something else that distinguishes the knowledge in the network from the corpus: the corpus contains knowledge about individual cases whereas the network is abstract. It can therefore also be applied to cases that have been unknown to date.

Step 4: Application

Application of a neural network
Fig. 4: Application of a neural network

In Fig. 4, a previously unknown image is assessed by the neural network and categorised according to the knowledge stored in the network. This does not require a corpus any longer, nor does it require an expert; the “trained” but now fixed wiring in the neural network is enough. At this moment, the network is no longer capable of learning anything new. However, it is able to attain perfectly impressive achievements with a completely new input. This performance has been enabled by the preceding work, i.e. the establishment of the corpus, the (hopefully) correct assessments and the algorithms of the learning stage. Behind the learning corpus, there is the domain experts’ human intelligence; behind the algorithms of the learning stage, there is the IT experts’ human intelligence.

Conclusion

What appears to be artificial intelligence to us is the result of the perfectly human, i.e. natural intelligence of domain experts and IT experts.

In the next post, we will have an even closer look at what kind of knowledge a corpus really contains, and at what AI can get out of the corpus and at what it can’t.


The three innovations of rule-based AI

Have the neural networks outpaced the rule-based systems?

It cannot be ignored: corpus-based AI has overtaken rule-based AI by far. Neural networks are making the running wherever we look. Is the competition dozing? Or are rule-based systems simply incapable of yielding equivalent results to those of neural networks?

My answer is that, as a matter of principle, the two methods are predisposed to perform very different functions. A look at their respective modes of action makes clear what each of them can usefully be employed for. Depending on the problem to be tackled, one or the other has the advantage.

Yet the impression remains: the rule-based variant seems to be on the losing side. Why is that?

In what dead end has rule-based AI got stuck?

In my view, rule-based AI is lagging behind because it is unwilling to cast off its inherited liabilities – although doing so would be so easy. It is a matter of

  1. acknowledging semantics as an autonomous field of knowledge,
  2. using complex concept architectures,
  3. integrating an open and flexible logic (NMR).

We have been doing this successfully for more than 20 years. What do the three points mean in detail?

Point 1: acknowledging semantics as an autonomous field of knowledge

Usually, semantics is considered to be part of linguistics. In principle, there would not be any objection to this, but linguistics harbours a trap for semantics which is hardly ever noticed: linguistics deals with words and sentences. The error consists in perceiving meaning, i.e. semantics, through the filter of language, and assuming that its elements have to be arranged in the same way as language does with words. Yet language is subject to one crucial limitation: it is linear, i.e. sequential – one letter follows another, one word comes after another. It is impossible to place words in parallel next to each other. When we are thinking, however, we are able to do so. And when we investigate the semantics of something, we have to do so in the way we think and not in the way we speak.

Thus we have to find formalisms for concepts as they occur in thought. The limitation imposed by the linear sequence of elements – and the resulting necessity to reproduce compounds and complex relational structures with grammatical tricks, in a makeshift way and differently in every language – does not apply to thinking. As a result, the structures on the side of semantics are completely different from those on the side of language.

Word ≠ concept

What certainly fails to work is a simple “semantic” annotation of words. A word can have many, very different meanings, and one meaning (= a concept) can be expressed with different words. If we want to analyse a text, we must not look at the individual words in isolation but always at the general context. Let’s take the word “head”. We may speak of the head of a letter or the head of a company. We can integrate the context into our concept by associating the concept of <head< with other concepts. Thus there is a <body part<head< and a <function<head<. The concept on the left (<body part<) then states the type of the concept on the right (<head<). We are thus engaged in typification: we look for the semantic type of a concept and place it in front of the subconcept.
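The typification just described can be mimicked in a few lines of Python. This is a hypothetical illustration, not the formalism actually used; the concept names are taken from the example above:

```python
# A typed concept pairs a semantic type with a subconcept, so the
# ambiguous word "head" resolves into two distinct concepts.
typed_concepts = {
    ("body part", "head"),   # <body part<head<
    ("function", "head"),    # <function<head<
}

def meanings_of(word, concepts):
    """Return the semantic types of all typed readings of an ambiguous word."""
    return sorted(t for (t, c) in concepts if c == word)

print(meanings_of("head", typed_concepts))  # ['body part', 'function']
```

The word alone is ambiguous; only the typed pair is a concept.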

Consistently composite data elements

The use of typified concepts is nothing new. However, we go further and create extensive structured graphs, which then constitute the basis for our work. This is completely different from working with words. The concept molecules that we use are such graphs; they possess a very special structure to ensure that they can be read easily and quickly by both people and machines. This composite representation has many advantages, among them the fact that combinatorial explosion is countered very simply and that the number of atomic concepts and rules can thus be drastically cut. Thanks to typification and the use of attributes, similar concepts can be refined at will, which means that by using molecules we are able to speak with a high degree of precision. In addition, the precision and transparency of the representation have very much to do with the fact that the special structure of the graphs (molecules) has been directly derived from the multifocal concept architecture (cf. Point 2).

Point 2: using complex concept architectures

Concepts are linked by means of relations in the graphs (molecules). The above-mentioned typification is such a relation: when the <head< is perceived as a <body part<, then it is of the <body part< type, and there is a very specific relation between <head< and <body part<, namely a so-called hierarchical or ‘is-a’ relation – the latter because in the case of hierarchical relations, we can always say ‘is a’, i.e. in our case: the <head< is a <body part<.

Typification is one of the two fundamental relations in semantics. We allocate a number of concepts to a superordinate concept, i.e. their type. Of course this type is again a concept and can therefore be typified again in turn. This results in hierarchical chains of ‘is-a’ relations with increasing specification, such as <object<furniture<table<kitchen table<. When we combine all the chains of concepts subordinate to a type, the result is a tree. This tree is the simplest of the four types of architecture used for an arrangement of concepts.
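Such ‘is-a’ chains can be sketched as follows – a minimal illustration using the furniture example from the text, not the actual concept architecture:

```python
# Each entry maps a concept to its immediate type; combining all chains
# under a common root yields the tree described in the text.
is_a = {
    "kitchen table": "table",
    "table": "furniture",
    "furniture": "object",
    "chair": "furniture",
}

def chain(concept):
    """Follow 'is-a' links up to the root,
    e.g. <object<furniture<table<kitchen table<."""
    path = [concept]
    while path[-1] in is_a:
        path.append(is_a[path[-1]])
    return path

print(chain("kitchen table"))
# ['kitchen table', 'table', 'furniture', 'object']
```

Every type is itself a concept, so the chain can be extended upwards at will.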

This tree structure is our starting point. However, we must acknowledge that a mere tree architecture has crucial disadvantages which preclude the establishment of really precise semantics. Those who are interested in the improved and more complex types of architecture and their advantages and disadvantages will find a short description of the four types of architecture on the website of meditext.ch.

In the case of the concept molecules, we have geared the entire formalism, i.e. the intrinsic structure of the rules and molecules themselves, to the complex architectures. This has many advantages, for the concept molecules now have precisely the same structure as the axes of the multifocal concept architecture. The complex folds of the multifocal architecture can be conceived of as a terrain, with the dimensions or semantic degrees of freedom as complexly interlaced axes. The concept molecules now follow these axes with their own intrinsic structure. This is what makes computing with molecules so easy. It would not work like this with simple hierarchical trees or multidimensional systems. Nor would it work without consistently composite data elements whose intrinsic structure follows the ramifications of the complex architecture almost as a matter of course.

Point 3: integrating an open and flexible logic (NMR)

For theoretically minded scientists, this point is likely to be the toughest, for classic logic appears indispensable to most of them, and many bright minds are proud of their proficiency in it. Classic logic is indeed indispensable – but it has to be used in the right place. My experience shows me that we need another logic in NLP (Natural Language Processing), namely one that is not monotonic. Such non-monotonic reasoning (NMR) enables us to attain the same result with far fewer rules in the knowledge base. At the same time, maintenance is made easier. Also, it is possible for the system to be constantly developed further because it remains logically open. A logically open system may disquiet a mathematician, but experience shows that an NMR system works substantially better for the rule-based comprehension of the meaning of freely formulated text than a monotonic one.
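A toy sketch of what non-monotonicity means in practice – the bird example is a standard illustration, not drawn from the author’s system:

```python
# In monotonic logic, adding knowledge can never retract a conclusion.
# With a default rule ("birds fly") plus exceptions, it can: one exception
# entry replaces what would otherwise be many explicit per-case rules.
birds = {"sparrow", "eagle", "penguin"}
flightless = {"penguin"}              # specific knowledge, added later

def can_fly(animal):
    """Default: birds fly - unless an exception is known."""
    if animal in flightless:
        return False                  # the exception overrides the default
    return animal in birds

print(can_fly("sparrow"))   # True  (by default)
print(can_fly("penguin"))   # False (default retracted by specific knowledge)
```

This is why an NMR knowledge base can stay much smaller: the default covers the normal case, and only the exceptions need stating.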

Conclusion

Today, the rule-based systems appear to be lagging behind the corpus-based ones. This impression is deceptive, however, and derives from the fact that most rule-based systems have not yet succeeded in jumping ahead of themselves and becoming more modern. This is why they are either

  • only applicable to clear tasks in a small and well-defined domain, or
  • very rigid and therefore hardly employable, or
  • so resource-hungry that they become unmaintainable.

If, however, we use consistently composite data elements and more advanced concept architectures, and if we deliberately refrain from monotonic conclusions, a rule-based system will enable us to get further than a corpus-based one – for the appropriate tasks.

Rule-based and corpus-based systems differ a great deal from each other, and depending on the task in hand, one or the other has the edge. I will deal with this in a later post.

The next post will deal with the current distribution of the two AI methods.

This is a post about artificial intelligence.


Translation: Tony Häfliger and Vivien Blandford

Specification of the challenges for rule-based AI

Rule-based AI is lagging behind

The distinction between rule-based AI and corpus-based AI makes sense in several respects since the two systems work in completely different ways. This does not only mean that their challenges are completely different, it also means that as a consequence, their development trajectories are not parallel in terms of time.

In my view, the only reason for this is that rule-based AI has reached a dead end from which it will only be able to extricate itself once it has correctly identified its challenges. This is why these challenges will be described in more detail below.

Overview of the challenges

In the preceding post, I listed four challenges for rule-based AI. Basically, the first two cannot be remedied: it takes experts to draw up the rules, and these must be experts both in abstract logic and in the specialist field concerned. There is not much that can be changed about this. The second challenge will also remain: finding such experts will remain a problem.

The situation is better for challenges three and four, namely the large number of rules required, and their complexity. Although it is precisely these two that represent seemingly unalterable obstacles of considerable size, the necessary insights may well take the edge off them. However, both challenges must be tackled consistently, and this means that we will have to jettison some cherished old habits and patterns of thought. Let’s have a closer look at this.

The rules require a space and a calculus

Rule-based AI consists of two things:

  • rules which describe a domain (specialist field) in a certain format, and
  • an algorithm which determines which rules are executed at what time.

In order to build the rules, we require a space which specifies the elements which the rules may consist of and thus the very nature of the statements that can be made within the system. Such a space does not exist of its own accord but has to be deliberately created. Secondly, we require a calculus, i.e. an algorithm which determines how the rules thus established are applied. Of course, both the space and the calculus can be created in completely different ways, and these differences “make the difference”, i.e. they enable a crucial improvement of rule-based AI, albeit at the price of jettisoning some cherished old habits.
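The two ingredients can be sketched in code: the rule format below defines a minimal “space” (conditions and conclusions over a fixed vocabulary of atomic statements), and a naive forward-chaining loop plays the role of the “calculus”. The medical atoms and rules are invented for illustration and do not reflect the author’s actual system:

```python
# Space: each rule is (set of required facts, conclusion to add).
rules = [
    ({"fever", "cough"}, "flu suspected"),
    ({"flu suspected", "short of breath"}, "see doctor"),
]

def forward_chain(facts, rules):
    """Calculus: apply rules until no new conclusions can be drawn."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            if condition <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

result = forward_chain({"fever", "cough", "short of breath"}, rules)
print(sorted(result))
```

Both parts can be varied independently, and it is exactly these design choices – what the space admits and how the calculus fires rules – that “make the difference”.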

Three innovations

In the 1990s, we therefore invested in both the fundamental configuration of the concept space and the calculus. We established our rule-based system on the basis of the following three innovations:

  • data elements: we consistently use composite data elements (concept molecules);
  • space: we arrange concepts in a multidimensional-multifocal architecture;
  • calculus: we rely on non-monotonic reasoning (NMR).

These three elements interact and enable us to capture a greater number of situations more accurately with fewer data elements and rules. The multifocal architecture enables us to create better models, i.e. models which are more appropriate to their situations and contain more details. Since the number of elements and rules decreases at the same time, we succeed in going beyond the boundaries which previously constrained rule-based systems with regard to extent, precision and maintainability.

In the next post, we will investigate how the three above-mentioned innovations work.


The challenges for rule-based AI

Rule-based in comparison with corpus-based

Corpus-based AI (the “Tanks” type; cf. introductory AI post) successfully overcame its weaknesses (cf. preceding post). This was the result of a combination of “brute force” (improved hardware) and an ideal window of opportunity, i.e. when during the super-hot phase of internet expansion, companies such as Google, Amazon, Facebook and many others were able to collect large volumes of data and feed their data corpora with them – and a sufficiently big data corpus is the linchpin of corpus-based AI.

Brute force was not enough for rule-based AI, however, nor was there any point in collecting lots of data, since data also have to be organised for rule construction – and largely manually at that, i.e. by human expert specialists.

Challenge 1: different mentalities

Not everyone is equally fascinated by the process of building algorithms. Building algorithms requires a particular faculty of abstraction combined with a meticulous streak – at least where abstractions are concerned. No matter how small an error in the rule construction may be, it will inevitably have an impact. Mathematicians possess the consistently meticulous mentality that is called for here, but natural scientists and engineers also tend to have it. Of course, accountants must be meticulous too, but AI rule construction additionally requires creativity.

Salespersons, artists and doctors, however, work in a different field. Abstractions are often incidental; the importance lies in what is tangible and specific. Empathy for other people can also be very important, or someone has to be able to act with speed and precision, as is the case with surgeons. These characteristics are all very valuable, but they are less relevant to algorithm construction.

This is a problem for rule-based AI because rule construction requires the skills of one camp and the knowledge of the other: it requires the mentality that makes a good algorithm designer combined with the way of thinking and the knowledge of the specialist field to which the rules refer. Such combinations of specialist knowledge with a talent for abstraction are rare. In the hospitals in which I worked, both cultures were quite clearly visible in their separateness: on the one hand the doctors, who at best accepted computers for invoicing or certain expensive technical devices but had a low opinion of information technology in general, and on the other hand the computer scientists, who did not have a clue about what the doctors did and talked about. The two camps simply avoided each other most of the time. Needless to say, it was not surprising that the expert systems designed for medical purposes only worked for very small specialist fields – if they had progressed beyond the experimentation stage at all.

Challenge 2: where can I find the experts?

Experts who are creative and equally at home in both mentality camps are obviously hard to find. This is aggravated by the fact that there are no training facilities for such experts. Equally pressing are the following questions: where are the instructors who are conversant with the current challenges? Which diplomas are valid for what? And how can an investor in this new field evaluate whether the experts employed are fit for purpose and whether the project is moving in the right direction?

Challenge 3: the sheer volume of detailed rules required

The fact that a large volume of detailed knowledge is required to be able to draw meaningful conclusions in a real situation was already a challenge for corpus-based AI. After all, it was only with really large corpora, i.e. thanks to the internet and a boost in computer performance, that it succeeded in gathering the huge volume of detailed knowledge which is one of the fundamental prerequisites for every realistic expert system.

For rule-based AI, however, it is particularly difficult to provide the large volume of knowledge since this provision of knowledge requires people who manually package this large volume of knowledge into computer-processable rules. This is very time-consuming work, which additionally requires hard-to-find human specialist experts who are able to meet the above-mentioned challenges 1 and 2.

In this situation, the question arises as to how larger-scale rule systems which actually work can be built at all. Could there be any possibilities for simplifying the construction of such rule systems?

Challenge 4: complexity

Anyone who has ever tried to really underpin a specialist field with rules discovers that they quickly encounter complex questions to which they find no solutions in the literature. In my field of Natural Language Processing (NLP), this is obvious. The complexity cannot be overlooked here, which is why it is imperative to deal with it. In other words: the principle of hope is not adequate to the task; rather, the complexity must be made the subject of debate and be studied intensively.

What complexity means and how it can be countered will be the subject matter of a further post. Of course, complexity must not result in an excessive increase in rules (cf. challenge 3). The question which therefore arises for rule-based AI is: how can we build a rule system which takes into consideration the wealth of details and complexity while still remaining simple and manageable?

The good news is: there are definitely answers to this question.

In a following post, the challenges will be specified.


Corpus-based AI overcomes its weaknesses

Two AI variants: rule-based and corpus-based

In the preceding post, I mentioned the two fundamental approaches to attempting to imbue computers with intelligence, namely the rule-based approach and the corpus-based approach. In a rule-based system, the intelligence is situated in a rule pool that is deliberately designed by people. In the corpus-based method, the knowledge is contained in the corpus, i.e. in a data collection which is analysed by a sophisticated program.

The performance of both methods has been massively boosted since the 1990s. The most impressive boost has been achieved with the corpus-based method, which is now regarded as the artificial intelligence proper and is making headlines across the board today. What, then, are the crucial improvements of the two methods? To begin with, we’ll have a look at how corpus-based AI works.

How does corpus-based AI work?

Corpus-based AI (c-AI) consists of two parts:

  1. the corpus,
  2. algorithms (neural network).
Fig. 1: Structure of a corpus-based AI system

The corpus, which is also called learning corpus, is a collection of data. This can consist of photographs of tanks or faces, but also of collections of search queries, for instance of Google. What is important is that the corpus already contains the data in a weighted form. In the tank example, it has been written into the corpus whether the tanks are friendly or hostile. The collection of faces contains information about the owners of those faces. In the case of the search queries, Google records the links that a searcher clicks, i.e. which suggestion offered by Google is successful. Thus the learning corpus contains knowledge which the corpus-based AI is going to use.

Now the c-AI has to learn. The aim is for the AI to be able to categorise a new tank image, a new face or a new query correctly. For this purpose, the c-AI makes use of the knowledge in the corpus, i.e. the pictures of the tank collection, where it is noted for each image whether the tank is ours or foreign – as represented in Fig. 1.

Now the second component of the c-AI comes into play: the algorithm. Essentially, this is a neural network. It consists of several layers of “neurons” which pick up the input signals, process them and then transmit their own signals to the next higher level. Fig. 1 shows how the first (yellow) neuron layer picks up the signals (pixels) from the image and, after processing them, forwards its own signals to the next (orange) layer until finally, the network arrives at the result of “our tank” or “foreign tank”.
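The layered signal flow can be sketched as follows. The weights and input values are invented for illustration – a real network learns its weights from the corpus – and the two tiny layers stand in for the far larger layers of an actual system:

```python
import math

def layer(signals, weights):
    """One neuron layer: weighted sums of the incoming signals,
    squashed to (0, 1) by a sigmoid, passed on as the layer's own signals."""
    return [1 / (1 + math.exp(-sum(w * s for w, s in zip(row, signals))))
            for row in weights]

pixels = [0.9, 0.1, 0.4]                        # input signals (e.g. pixel values)
hidden = layer(pixels, [[1.0, -1.0, 0.5],       # first layer of "neurons"
                        [0.2, 0.8, -0.3]])
output = layer(hidden, [[1.5, -1.5]])           # final layer: one output neuron
print("our tank" if output[0] > 0.5 else "foreign tank")
```

Each layer only sees the signals of the layer below it; the categorisation emerges at the top.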

When the neural network is now shown a new image that has not been assessed yet, the process is precisely the same as with the training pictures. If the network has been trained well, the program should be able to categorise on its own, i.e. the neural network should be able to discern whether the tank is ours or someone else’s.

Fig. 2: Search query to the neural network about an unclassified tank

The significance of the data corpus for corpus-based AI

A corpus-based AI finds its detailed knowledge in the corpus that has been specially compiled for it and evaluates the connections which it discovers there. The corpus therefore contains the knowledge which the c-AI evaluates. In our example, the knowledge consists in the connection of the photograph, i.e. a set of seemingly random pixels, with a simple binary piece of information (our tank/foreign tank). This knowledge is already part of the corpus before the algorithms conduct an evaluation. The algorithms of the c-AI thus do not detect anything that is not already in the corpus. However, the c-AI is now also able to apply the knowledge found in the corpus to new, unassessed cases.

The challenges for corpus-based AI

The challenges for c-AI are unequivocal:

  1. Corpus size: the more images there are in the corpus, the higher the certainty of the categorisation. A corpus that is too small will result in faulty results. The size of the corpus is crucial for the precision and reliability of the results.
  2. Hardware: the processing power required by a c-AI is very high and becomes higher the more precise the method is intended to be. Hardware performance is the decisive factor for the practical applicability of the method.

This quickly clarifies how c-AI has been able to improve its performance so impressively in the last two decades:

  1. The data volumes which Google and other organisations are capable of collecting on the internet have increased drastically. In this respect, Google profits from a significant amplification effect: the more queries Google receives, the better the corpus and thus its hit rate; the better the hit rate, the more queries Google will receive.
  2. The hardware that is required to evaluate the data is becoming less expensive and more performant. Today, internet companies and other organisations operate huge server farms, without which the processor-intensive evaluations would not be possible in the first place.

Besides the corpus and the hardware, the sophistication of the algorithms naturally also plays a part. However, the algorithms were not bad even decades ago. In comparison with the other two factors – hardware and corpus – the progress made in the field of algorithms only plays a modest part in the impressive success of c-AI.

The success of corpus-based AI

The challenges for c-AI were tackled by the big corporations and organisations extremely successfully.

The above description of the operating mode of c-AI, however, should also reveal the weaknesses immanent in the system, which are accorded less media attention. I will discuss them in more detail in a later post.

Next we will have a look at the challenges for rule-based AI.


AI: Vodka and tanks

AI in the last century

AI is a big buzzword today but was already of interest to me in my field of natural language processing in the 1980s and 1990s. At that time, there were two methods which were occasionally labelled AI, but they could not have been more different from each other. The exciting thing is that these two different methods still exist today and continue to be essentially different from each other.

AI-1: vodka

The first method, i.e. the one already used by the very first computer pioneers, was purely algorithmic, i.e. rule-based. Aristotle’s syllogisms are a paradigm of this type of rule-based system:

Premise 1: All human beings are mortal.
Premise 2: Socrates is a human being.
Conclusion: Socrates is mortal.

The expert posits premises 1 and 2, the system then draws the conclusion autonomously. Such systems can be underpinned mathematically. Set theory and first-order logic are often regarded as a safe mathematical basis. Theoretically, such systems were thus watertight. In practice, however, things looked somewhat different. Problems were caused by the fact that even the smallest details had to be included in the rule system; if they were not, the whole system would “crash”, i.e. draw completely absurd conclusions. The effort required to correct these details grew disproportionately with the extent of the knowledge covered. At best, the systems worked for small special fields for which clear-cut rules could be found; when it came to wider fields, however, the rule bases were too large and were no longer maintainable. A further serious problem was the fuzziness which is peculiar to many expressions and which is difficult to grasp with such hard-coded systems.

Thus this type of AI came in for increasing criticism. The following translation attempt may serve as an example of why this was the case. An NLP program translated sentences from English into Russian and then back again. The input of the biblical passage “The spirit is willing but the flesh is weak.” resulted in the retranslation “The vodka is good but the meat is rotten.”

This story may or may not have happened precisely like this, but it demonstrates the difficulties encountered in attempts to capture language with rule-based systems. The initial euphoria associated with the “electronic brain” and “machine intelligence” since the 1950s fizzled out, the expression “artificial intelligence” became obsolete and was replaced by the term “expert system”, which sounded less pretentious.

Later, in about 2000, the stalwarts of rule-based AI were buoyed up again, however. Tim Berners-Lee, the pioneer of the WWW, launched the Semantic Web initiative with the purpose of improving the usability of the internet. The experts of rule-based AI, who had been educated at the world’s best universities, were ready and willing to establish knowledge bases for him, which they now called ontologies. With all due respect to Berners-Lee and his efforts to introduce semantics to the net, it must be said that after almost 20 years, the Semantic Web initiative has not substantially changed the internet. In my view, there are good reasons for this: the methods of classic mathematical logic are too rigid to map the complex processes of thinking – more about this in other posts, particularly on static and dynamic logic. At any rate, both the classic rule-based expert systems of the 20th century and the Semantic Web initiative have fallen short of the high expectations.

AI-2: tanks

However, there were alternatives which tried to correct the weaknesses of rigid propositional logic as early as the 1990s. For this purpose, the mathematical toolkit was extended.

Such an attempt was fuzzy logic. A statement or a conclusion was now no longer unequivocally true or false; rather, its veracity could be weighted. Besides set theory and predicate logic, probability calculus was now also included in the mathematical toolkit of the expert systems. Yet some problems remained: again, there had to be precise and elaborate descriptions of the rules that were applicable. Thus fuzzy logic was also part of rule-based AI, even though it was equipped with probabilities. Today, such programs work perfectly well in small, well-demarcated technical niches, beyond which they are insignificant.

At that time, another alternative was constituted by the neural networks. They were considered to be interesting; however, their practical applications tended to attract some derision. To illustrate this, the following anecdote was bandied about:

The US Army – which has been an essential driver of computer technology all along – is supposed to have set up a neural network for the identification of US and foreign tanks. A neural network operates in such a way that the final conclusions are found through several layers of conclusions by the system itself. People need not input any rules any longer; they are generated by the system itself.

How is the system able to do this? It requires a learning corpus for this purpose. In the case of tank recognition, this consisted of a series of photographs of American and Russian tanks. Thus it was known for every photograph whether the tank was American or Russian, and the system was trained until it was capable of generating the required categorisation itself. The experts only exerted an indirect influence on the program in that they established the learning corpus; the program compiled the conclusions in the neural network autonomously – without the experts knowing precisely what rules the system used to draw which conclusions from which details. Only the result had to be correct, of course. Now, once the system had completely integrated the learning corpus, it could be tested by being shown a new input, for instance a new tank photo, and it was expected to categorise the new image correctly on the basis of the rules it had found in the learning corpus. As mentioned before, this categorisation was conducted by the system on its own, without the experts exerting any further influence and without them knowing how conclusions were drawn in a specific case.

It was said that this worked perfectly with regard to tank recognition. No matter how many photos were shown to the program, the categorisation was always spot on. The experts could hardly believe that they had really created a program with a 100% identification rate. How could this be? Ultimately, they discovered the reason: the photos of the American tanks were in colour, those of the Russian tanks were in black and white. Thus the program only had to recognise the colour; the contours of the tanks were irrelevant.
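The anecdote can be re-enacted in a few lines. The corpus entries are invented, and the hard-coded `classify` function merely stands in for the shortcut the network implicitly learned:

```python
# A biased corpus: the label accidentally correlates perfectly with an
# irrelevant feature (colour photo vs black-and-white photo).
corpus = [
    {"colour": True,  "label": "American"},   # American photos: all in colour
    {"colour": True,  "label": "American"},
    {"colour": False, "label": "Russian"},    # Russian photos: all b/w
    {"colour": False, "label": "Russian"},
]

def classify(photo):
    """The learned 'shortcut': decide by colour, ignore the tank entirely."""
    return "American" if photo["colour"] else "Russian"

accuracy = sum(classify(p) == p["label"] for p in corpus) / len(corpus)
print(accuracy)  # 1.0 - a perfect score on the biased corpus

# A colour photo of a Russian tank exposes the shortcut:
print(classify({"colour": True}))  # 'American' - wrong for that tank
```

The 100% identification rate says nothing about tanks, only about the hidden regularity in the corpus.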

Rule-based vs corpus-based

The two anecdotes show what problems were lying in wait for rule-based and corpus-based AI at the time.

  • In the case of rule-based AI (vodka), they were
    – the rigidity of mathematical logic,
    – the fuzziness of our words,
    – the necessity to establish very large knowledge bases,
    – the necessity to use specialist experts for the knowledge bases.
  • In the case of corpus-based AI (tanks), they were
    – the lack of transparency of the paths along which conclusions were drawn,
    – the necessity to establish a very large and correct learning corpus.

I hope that I have been able to describe the characters and modes of operation of the two AI types with the two (admittedly somewhat unfair) examples above, including the weaknesses which characterise each type.

Needless to say, the challenges persist. In the following posts I will show how the two AI types have responded to them and where the intelligence now really resides in the two systems. To begin with, we’ll have a look at corpus-based AI.



Combinatorial explosion

Objects and relations

Let us first take a set of objects and consider how many connections (relations) there are between them, leaving aside the nature of the relationships and focussing solely upon their number. This is quite a simple task, because there is always exactly one relation between any two objects. Even if the two objects are entirely unrelated, this fact has a meaning and is thus useful information. We can count the number of possible connections between the objects and compare the number of objects with the number of possible relations.

Fig 1: Seven objects and their relations

Figure 1 shows seven objects (blue) and their relations (red). Every object is connected to every other object. Thus, in our example, each of the 7 objects is connected to 7 − 1 = 6 other objects, giving a total of 7 * 6 / 2 = 21 relations. The general mathematical formula for this is NR = (NO² − NO) / 2, where NR is the number of relations and NO is the number of objects.
As we can see from the formula, the number of relations increases in proportion to the square of the number of objects. Or, to put it non-mathematically:

There are always a great many more relations than there are objects!

Below is a small table showing the number of relations for a given number of objects:

NO      NR
--------------
1       0
2       1
3       3
4       6
5       10
6       15
7       21
8       28
9       36
10      45
100     4,950
1000    499,500

Table 1: Objects and relations
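The formula and the table values can be checked with a quick sketch:

```python
def n_relations(n_objects):
    """NR = (NO**2 - NO) / 2: one relation for every pair of objects."""
    return (n_objects ** 2 - n_objects) // 2

for n in (7, 10, 100, 1000):
    print(n, n_relations(n))
# 7 21
# 10 45
# 100 4950
# 1000 499500
```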

While the numbers in the first column are small, the quadratic increase is not particularly noticeable. However, as these numbers rise it quickly becomes more marked. Before we turn our attention to the practical implications of this, let us first take a look at the number of possible combinations.

Objects and combinations

The term ‘combination’ refers to the ways in which a number of objects can be combined with each other. Whereas a relation always relates to precisely two objects, combinations can include any number of objects from 1 to all (= NO).


Table 2: Objects and combinations

Table 2 shows objects and the number of combinations between them for 1 to 4 objects, with the number of objects in the first column and the number of combinations in the second. The objects are identified by letters (a, b, c, d) and the possible combinations are shown in the column on the far right. When there is only one object (a) there is just one combination consisting solely of this element; when there are 2 the number of combinations rises to 3, when there are 3 to 7, and when there are 4 to 15. The number of combinations per object therefore increases even faster than the number of relations (as described above). The formula for this is: NC = 2^NO − 1.

As we saw earlier, the relationship between objects and their relations is quadratic. The relationship between objects and combinations, on the other hand, is based upon exponential growth, meaning that it rises even more quickly. When there are 10 objects, the number of combinations is 1023; when there are 100, this figure rises to an incredible 1,267,650,600,228,229,401,496,703,205,375 or about 1.27 * 10^30!

The number of combinations thus increases extremely rapidly.

This exponential increase forms the basis for the combinatorial explosion.
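This exponential growth can be checked with a short Python sketch; the object names a–d are just placeholders:

```python
from itertools import combinations

def nc(no: int) -> int:
    # Number of non-empty subsets of NO objects: NC = 2^NO - 1
    return 2 ** no - 1

# Cross-check against explicit enumeration for the four objects a-d
objects = ["a", "b", "c", "d"]
explicit = [c for k in range(1, len(objects) + 1)
            for c in combinations(objects, k)]
print(len(explicit))  # 15, matching nc(4)
print(nc(10))         # 1023
```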

Combinatorial explosion

Let’s suppose we have a number of different objects with different properties, for example:

4 shapes: round, square, triangular, star-shaped.
8 colours: red, orange, yellow, green, blue, brown, white, black.
7 materials: wood, PVC, aluminium, cardboard, paper, glass, stone.
3 sizes: small, medium-sized, large.

We can now combine these four classes and their 22 properties in any way we want. For example, an object may be triangular, green, medium-sized and made of PVC. Based upon these 22 properties, how many different types of objects can we distinguish between?

We can select one property independently from each of the four classes (shape, colour, material, size), giving a total of 4 x 8 x 7 x 3 = 672 possible combinations. This means that with these 22 properties we can describe 672 different types of object. For every additional class, the number of possibilities is multiplied again.
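The same count can be obtained by enumerating the combinations directly, here in Python:

```python
from itertools import product

# The four classes and their 22 properties from the example above
shapes = ["round", "square", "triangular", "star-shaped"]
colours = ["red", "orange", "yellow", "green", "blue", "brown", "white", "black"]
materials = ["wood", "PVC", "aluminium", "cardboard", "paper", "glass", "stone"]
sizes = ["small", "medium-sized", "large"]

# One property from each class describes one type of object
object_types = list(product(shapes, colours, materials, sizes))
print(len(object_types))  # 672
```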

It doesn’t take many additional classes before the number of possible combinations explodes.


This is the combinatorial explosion. And it plays a critical role in any information processing – especially when the information relates to the real world, where the number of classes has no natural limit.

Information Reduction 7: Micro and Macro State

Examples of information reduction

In previous texts we looked at examples of information reduction in the following areas:

  • Coding / classification
  • Sensory perception
  • DRG (Flat rate per case)
  • Opinion formation
  • Thermodynamics

What do they have in common?

Micro and macro state

What all these examples have in common is that, in terms of information, there are two states: a micro state with a great many details and a macro state with much less information. One very clear example that many of us will remember from our school days is the relationship between the two levels in thermodynamics.

The two states exist simultaneously, and have less to do with the object itself than with the perspective of the observer. Does he need to know everything, down to the last detail? Or is he more interested in the essence, i.e. the simplified information of the macro state?

Micro and macro state in information theory

The interplay of micro and macro states was first recognised in thermodynamics. In my opinion, however, this is a general phenomenon, which is closely linked to the process of information reduction. It is particularly helpful to differentiate between the two states when investigating information processing in complex situations.
Wherever the amount of information is reduced, a distinction can be drawn between a micro and a macro state. The micro state is the one that contains more information, the macro state less. Both describe the same object, but from different perspectives.
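The gap between the two states can be illustrated in a few lines of code. This is not a physical simulation – the speed distribution and its units are invented – but it shows how many micro values collapse into one macro value:

```python
import random

# Micro state: one speed value per molecule (invented numbers, arbitrary units)
random.seed(0)
micro_state = [random.gauss(500.0, 50.0) for _ in range(100_000)]

# Macro state: the whole list is reduced to a single summary value
macro_state = sum(micro_state) / len(micro_state)
print(f"{len(micro_state):,} micro values -> 1 macro value: {macro_state:.1f}")
```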

The more detailed micro state is considered to be ‘more real’

We tend to think we are seeing something more clearly if we can discern more details. So we regard the detailed micro state as the actual reality and the macro state as either an interpretation or a consequence of this.

… but the low-information macro state is more interesting

Remarkably, however, the low-information state is of more interest to us than the micro state. In the micro state, there are simply too many details. These are either irrelevant to us (thermodynamics, sensory perception) or they obstruct our clear view of the goal represented by the macro state (coding, classification, opinion-forming, flat rate per case).

Strange antagonism

There is thus a strange antagonism between the two states, with one seeming more real and the other more relevant, as if these two qualities were somehow at odds with one another. The more detailed the information, the less the many single data points have to do with the overall perspective, which thus increasingly disappears from sight. On the other hand: the more intensively the view strives for relevance, the more it detaches itself from the details of reality. This paradoxical relationship between micro and macro state is characteristic of all information reduction relationships and highlights both the importance of, and the challenges associated with, such processes.

Are there differences between the various processes of information reduction?

Absolutely. The only thing they have in common is that it is possible to display the data at a detailed micro level or at a macro level containing little information, with the latter usually being more relevant.

Such processes always involve a reduction in information, but the way in which it is reduced differs. At this point it would be illuminating to examine these differences – which play a decisive role in many issues – more closely. Read more in the next post.


This is a page about information reduction — see also overview.

Translation: Tony Häfliger and Vivien Blandford

Information Reduction 6: The Waterglass, Revisited

Is that physics?

In my article Information reduction 5: The classic glass of water, I drew upon the example of a glass of water to illustrate the principle of information reduction. In this example, the complex and detailed information about the kinetic energy of water molecules (micro level) is reduced to simple information about the temperature of the water.

Of course, a physicist might criticise this example – and quite rightly so, because the glass of water is actually much more complicated than this. Boltzmann’s calculations only apply to the ideal gas, i.e. one whose molecules do not interact except when they collide and exchange their individual movement information.

An ideal gas

The ideal gas is an idealisation you won’t find anywhere in the real world. Other forces exist between individual molecules than the purely mechanical ones, and the situation in our glass of water is no different. Because water is a liquid not a gas and because much stronger bonds exist between molecules in liquids than between gas molecules, these additional bonds complicate the picture.

Water

Moreover, water is a special case. The water molecule (H2O) is a strong dipole, which means it has a strong electrical charge difference between its two poles, the negatively charged pole with the oxygen atom (O) and the positively charged pole with the two hydrogen atoms (H2). As a result of this strong polarity, multiple water molecules join together. If such agglomerations were to be maintained, the water would be a solid (such as ice) rather than a liquid. But since they are only temporary, the water remains a liquid, but a special one that behaves in a very particular way. See, for example, the current research of Gerald Pollack.

Physics and information science

A glass of water probably isn’t the example a physicist would have chosen, but I’m not going to change it. It’s as good an example as any to explain the ratio of information at the micro and macro levels. Boltzmann’s calculations are only approximately correct, but his thesis holds: the temperature of an object is the macro-level information that summarises the many data points about the chaotic movements of the individual molecules at the micro level.

The glass of water may be a bad example to a physicist. For our consideration of micro and macro states, however, it makes no difference whether we are considering an ideal gas or a glass of water: there is always a huge information gap between the macro state and the micro state, and that is the salient point. In a glass of water, the micro state contains billions of times more information than the macro state. And, interestingly, although the micro state is richer in information, it is the macro state that is of greater interest to us.

The transition

How does the transition from micro to macro state take place in different cases? Clearly, this transition is slightly different in the glass of water than in the ideal gas due to the special properties of the H2O molecule. And the transition from the micro to the macro state is completely different in our other examples of classification, concept formation and framing that are not drawn from the physical world. We will now go into these peculiarities. See the posts to come.


This is a page about information reduction — see also overview.

Translation: Tony Häfliger and Vivien Blandford

Logodynamics

What is logic for?

Is logic about thinking? I used to think so, believing that logic was something like the ‘doctrine of thinking’, or even the ‘doctrine of correct thinking’. A closer look, however, reveals that what we call logic, and the field of study that goes by this name, is about proving rather than thinking. Classical logic is in fact the science of the proof.

But there’s a lot more to thinking than proving. If you want to prove something, first you have to find the evidence. Then you have to assess this evidence in context – a context that can change. And what do you do about contradictions? I believe it is the job of logic to investigate the question of how we think in a more general sense. It should be more than just a science of proof. But how do we arrive at such an extended version of logic?

The decisive step for me was the realisation that there are two types of logic: one static and one dynamic. Only when we dare to leave the safe garden of static logic can we begin to examine real thinking.

Classical logic = logostatics

Classical logic shaped Western intellectual life for more than two millennia – from the syllogisms of Aristotle through the scholasticism of the Middle Ages, including the teachings of Thomas Aquinas, to the first-order logic (FOL) of mathematicians, which represents the widely accepted state of the art today. These systems of logic are truly static. Every statement within them has a generally valid, absolute truth value; the statement is either true or false – and that must not change. In other words: the logical building is static. Mathematicians call such logic monotonic.

Logodynamics

Although contradictions cannot be tolerated in a classical system of logic, in a dynamic one they make up crucial elements in the network of statements. It’s the same in our own minds, where contradictions are nothing more than starting points for our thinking. After all, contradictions – e.g. observations that are incompatible with one another – force us to take a closer look. If statements are contradictory, it makes us want to reflect on where the truth lies. Contradictions, forbidden in classical logic, are actually the starting point for thinking in dynamic logic. Just as in physics, where an electric voltage supplies the energy that allows current to flow, in logic a contradiction provides the tension that drives us to carry on thinking.

But continuing to think also means always being open to completely new statements. This is another way that logodynamics differs from classical logic. The classical system first defines its ‘world’, i.e. all the elements that may be used subsequently, or indeed at all. The system must be closed. Classical logic requires a clear demarcation (definition) of the world of a system of statements (both true and false) before any conclusions can be drawn in this closed world of statements. By contrast, our thinking is by no means closed. We can always include new objects, test new differentiations for known objects, find new reasons and re-evaluate existing ones. In other words: we can learn. Therefore, a system of logic that approximates the way people think must always be open.

In a classical system of logic, time does not exist. Everything that is true is always true. The situation is very different in a logodynamic system. What is considered true today may be recognised as an error tomorrow. Without this possibility there is no learning. The logodynamic system recognises time as a necessary and internal element. This fundamentally changes the logical mechanism, the ‘basic switch’ of logic, namely the IF-THEN. The IF-THEN of dynamic logic always has a time element to it – the IF always comes before the THEN. A static system could, at most, recognise time as an object for consideration, along the lines of one of its variables, but not as something that plays a role in its own functioning.

Thus, a logodynamic system has the following three properties that differentiate it from a logostatic one:

  1. Non-monotonicity: contradictions in the system are allowed.
  2. Openness: new elements can appear in the system at any time.
  3. System-internal time: time passes between IF and THEN.
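A minimal sketch of these three properties in code; the class and its method names are my own invention, not an established formalism:

```python
# Sketch of a logodynamic belief base: statements carry a timestamp
# (system-internal time), any new statement may arrive (openness), and
# contradictory statements are recorded rather than rejected (non-monotonicity).
class BeliefBase:
    def __init__(self):
        self.beliefs = {}  # statement -> (truth value, time of entry)
        self.clock = 0

    def tell(self, statement: str, value: bool):
        self.clock += 1                                # time passes
        self.beliefs[statement] = (value, self.clock)  # revision is allowed

    def contradictions(self):
        # Crude illustration: holding "X" and "not X" with the same
        # truth value counts as a tension that invites further thinking.
        return [s for s, (v, _) in self.beliefs.items()
                if ("not " + s) in self.beliefs
                and self.beliefs["not " + s][0] == v]

kb = BeliefBase()
kb.tell("it rains", True)
kb.tell("not it rains", True)  # tolerated, not fatal
print(kb.contradictions())     # the starting point for thinking
kb.tell("it rains", False)     # what was true may later be false
```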

(Translation: R. Waddington)

IF-THEN: Static or Dynamic

IF-THEN and Time

It’s a commonly held belief that there’s nothing complicated about the idea of IF-THEN from the field of logic. However, I believe this overlooks the fact that there are actually two variants of IF-THEN that differ depending on whether the IF-THEN in question possesses an internal time element.

Dynamic (real) IF-THEN

For many of us, it’s self-evident that the IF-THEN is dynamic and has a significant time element. Before we can get to our conclusion – the THEN – we closely examine the IF – the condition that permits the conclusion. In other words, the condition is considered FIRST, and only THEN is the conclusion reached.

This is the case not only in human thinking, but also in computer programs. Computers allow lengthy and complex conditions (IFs) to be checked. These must be read from the computer’s memory by its processor. It may even be necessary to perform small calculations contained in the IF statements and then compare their results with the set IF conditions. These queries naturally take time. Even though the computer may be very fast and the time needed to check the IF minimal, it is still measurable. Only AFTER checking can the conclusion formulated in the computer language – the THEN – be executed.
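This can be made visible in a few lines of Python; the condition itself is arbitrary, the point is only that evaluating the IF takes a measurable amount of time before the THEN runs:

```python
import time

data = list(range(1_000_000))

start = time.perf_counter()
# The IF: a non-trivial condition the processor must actually evaluate ...
if sum(data) % 2 == 0:
    # ... and only THEN, measurably later, is the conclusion reached.
    result = "even"
else:
    result = "odd"
elapsed = time.perf_counter() - start

print(result, f"- checking the IF took {elapsed * 1e6:.0f} microseconds")
```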

In human thinking, as in the execution of a computer program, the IF and the THEN are clearly separated in time. This should come as no surprise, because both the sequence of the computer program and human thinking are real processes that take place in the real, physical world, and all real-world processes take time.

Static (ideal) IF-THEN

It may, however, surprise you to learn that in classic mathematical logic the IF-THEN takes no time at all. The IF and the THEN exist simultaneously. If the IF is true, the THEN is automatically and immediately also true. Actually, even speaking of a before and an after is incorrect, since statements in classical mathematical logic always take place outside of time. If a statement is true, it is always true, and if it is false, it is always false (= monotony, see previous posts).

The mathematical IF-THEN is often explained using Venn diagrams (set diagrams). In these visualisations, the IF may, for example, be represented by a set that is a subset of the THEN set. For mathematicians, IF-THEN is a relation that can be derived entirely from set theory. It’s a question of the (unchangeable) states of true or false rather than of processes, such as thinking in a human brain or the execution of a computer program.
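This set-theoretic reading can be sketched directly; the sets and element names below are invented for illustration:

```python
# Static IF-THEN as set inclusion: "IF x is a square THEN x is a rectangle".
squares = {"s1", "s2"}
rectangles = {"s1", "s2", "r1"}  # every square is also a rectangle

# The implication holds exactly when the IF-set is a subset of the THEN-set.
# This is a timeless state of affairs, not a process with a before and after.
implication_holds = squares <= rectangles
print(implication_holds)  # True
```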

Thus, we can distinguish between
  • Static IF-THEN:
    In ideal situations, i.e. in mathematics and in classical mathematical logic.
  • Dynamic IF-THEN:
    In real situations, i.e. in real computer programs and in the human brain.

Dynamic logic uses the dynamic IF-THEN

If we are looking for a logic that corresponds to human thinking, we must not limit ourselves to the ideal, i.e. static, IF-THEN. The dynamic IF-THEN is a better match for the normal thought process. This dynamic logic that I am arguing for takes account of time and needs the natural – i.e. the real and dynamic – IF-THEN.

If time is a factor and the world may be a slightly different place after the first conclusion has been drawn, it matters which conclusion is drawn first. Unless you allow two processes to run simultaneously, you cannot draw both conclusions at the same time. And even if you do, the two parallel processes can influence each other, complicating the matter still further. For this reason along with many others, dynamic logic is much more complex than the static variant. This increases our need for a clear formalism to help us deal with this complexity.

Static and dynamic IF-THEN side by side

The two types of IF-THEN are not mutually exclusive; they complement each other and can coexist. The classic, static IF-THEN describes logical states that are self-contained, whereas the dynamic variant describes logical processes that lead from one logical state to another.

This interaction between statics and dynamics is comparable with the situation in physics, where we find statics and dynamics in mechanics, and electrostatics and electrodynamics in the study of electricity. In these fields, too, the static part describes the states (without time) and the dynamic part the change of states (with time).