What Can I Know?


This website is powered by the question of how thinking works.


Information and Interpretation

How is data assigned a meaning? What does information consist of? The answer seems clear, as the bit is generally regarded as its building block.

Entropy is the quantity by which information appears in physics – thanks to C. E. Shannon, the inventor of the bit. Bits measure entropy and are regarded as the measure of information. But what is entropy and what does it really have to do with information?


Artificial Intelligence (AI)

Today there is a lot of talk about AI. I have been creating such systems for forty years – but without labelling them with this publicity term.

  • The big difference: corpus-based and rule-based AI
  • How real is the probable?
  • Which requires more intelligence: jassen (a popular Swiss card game) or chess?
  • What distinguishes biological intelligence from machine intelligence?

What is called AI today is invariably based on neural networks. What is behind this technology? Neural networks are extremely successful – but are they intelligent?

-> Can machines be intelligent? 


Logic

Mathematical logic, to many, appears to be the ultimate in rationality and logic. I share the respect for the extraordinary achievements of the giants on whose shoulders we stand. However, we can also think beyond this:

  • Are statements always either true or false?
  • Can classical logic with its monotonicity really be used in practice?
  • How can time be incorporated into logic?
  • Can we approach logical contradictions in a logically correct way?

Aristotle’s classical syllogisms still influence our view of the world today. This is because they gave rise to the ‘first order logic’ of mathematics, which is generally regarded as THE classical logic. Is there a formal way out of this restrictive and static logic, which has a lot to do with our static view of the world?

-> Logic: From statics to dynamics


Semantics and NLP (Natural Language Processing)

Our natural language is simply ingenious and helps us to communicate abstract ideas. Without language, humanity’s success on our planet would not have been possible.

No wonder, then, that the science that seeks to explain this key to human success is considered particularly worthwhile. In the past, researchers believed that by analysing language and its grammar they could formally grasp the thoughts it conveys, a view which is still taught in some linguistics departments today. In practice, however, Google’s ‘Large Language Model’ (LLM) technology has shattered this claim.

As a third option, I argue in favour of a genuinely semantic approach that avoids the gaps in both the grammar and the LLMs. We will deal with the following:

  • Word and meaning
  • Semantic architectures
  • Concept molecules

-> Semantics and Natural Language Processing (NLP)


Scales: Music and Maths

A completely different topic, which also has to do with information and the order in nature, is the theory of harmony. Rock and hits are based on a simple theory of harmony, jazz and classical music on complex ones. But why do these information systems work? Not only can these questions be answered today, the answers also provide clues to the interplay between the forces of nature.

  • Why do all scales span an octave?
  • The overtone series is not a scale!
  • Standing waves and resonance
  • Prime numbers and scales

-> How musical scales evolved


The author

My name is Hans Rudolf Straub. More information about me can be found here.


Books

On the topics of computational linguistics, philosophy of information, NLP and concept molecules:

The Interpretive System, H.R. Straub, ZIM-Verlag, 2020 (English version)
More about the book

Das interpretierende System, H.R. Straub, ZIM-Verlag, 2001 (German version)
More about the book

On the subject of artificial intelligence:

Wie die künstliche Intelligenz zur Intelligenz kommt, H.R. Straub, ZIM-Verlag, 2021 (Only available in German)
More about the book
Ordering the book from the publisher

You can subscribe to the newsletter here.


Thank You

Many people have helped me to develop these topics. Wolfram Fischer introduced me to the secrets of Unix, C++ and SQL and gave me the opportunity to build my first semantic interpretation programme. Norbert Frei and his team of computer scientists actively helped to realise the concept molecules. Without Hugo Mosimann and Maurus Duelli, Semfinder would neither have been founded nor would it have been successful. The same applies to Christine Kolodzig and Matthias Kirste, who promoted and supported Semfinder in Germany. Csaba Perger and Annette Ulrich were Semfinder’s first employees, full of commitment and clever ideas and – as knowledge engineers – provided the core for the emerging knowledge base.

Wolfram Fischer actively helped me with the programming of this website. Most of the translations into English were done by Vivien Blandford and Tony Häfliger, as well as Juan Utzinger.

Thank you sincerely!

Artificial and natural intelligence: the difference

What is real intelligence? 

Paradoxically, the success of artificial intelligence helps us to identify essential conditions of real intelligence. If we accept that artificial intelligence has its limits and, in comparison with real intelligence, reveals clearly discernible flaws – which is precisely what we recognised and described in previous blog posts – then these descriptions do not only show what artificial intelligence lacks, but also where real intelligence is ahead of artificial intelligence. Thus we learn something crucial about natural intelligence.

What have we recognised? What are the essential differences? In my view, there are two properties which distinguish real intelligence from artificial intelligence. Real intelligence

– also works in open systems and

– is characterised by a conscious intention.

 

Chess and Go are closed systems

In the blog post on cards and chess, we examined the paradox that a game of cards appears to require less intelligence from us humans than chess, whereas it is precisely the other way round for artificial intelligence. In chess and Go, the computer beats us; at cards, however, we are definitely in with a chance.

Why is this the case? – The reason is the closed nature of chess, which means that nothing happens that is not provided for. All the rules are clearly defined. The number of fields and pieces, the starting positions and the way in which the pieces may move, who plays when and who has won at what time and for what reasons: all this is unequivocally set down. And all the rules are explicit; whatever is not defined does not play a part: what the king looks like, for instance. The only important thing is that there is a king and that, in order to win the game, his opponent has to checkmate him. In an emergency, a scrap of paper with a “K” on it is enough to symbolise the king.

Such closed systems can be described with mathematical clarity, and they are deterministic. Of course, intelligence is required to win them, but this intelligence may be completely mechanical – that is, artificial intelligence.

Pattern recognition: open or closed system?

This looks different in the case of pattern recognition where, for example, certain objects and their properties have to be identified on images. Here, the system is basically open: not only can images with completely new properties be introduced from the outside, but the decisive properties that have to be recognised can themselves vary. The matter is thus not as simple, clearly defined and closed as in chess and Go. Is it a closed system, then?

No, it isn’t. Whereas in chess, the rules place a conclusive boundary around the options and objectives, such a safety fence must be actively placed around pattern recognition. The purpose of this is to organise the diversity of the patterns in a clear order. This can only be done by human beings. They assess the learning corpus, which includes as many pattern examples as possible, and allocate each example to the appropriate category. This assessed learning corpus then assumes the role of the rules of chess and determines how new input will be interpreted. In other words: the assessed learning corpus contains the relevant knowledge, i.e. the rules according to which previously unknown input is interpreted. It corresponds to the rules of chess.

The AI system for pattern recognition is thus open as long as the learning corpus has not been integrated; with the assessed corpus, however, such a system becomes closed. In the same way that the chess program is set clear limits by the rules, expert assessment provides the clear-cut corset which ultimately defines the outcome in a deterministic way. As soon as the assessment has been made, a second and purely mechanical intelligence is capable of optimising the behaviour within the defined limits – and ultimately to a degree of perfection which I as a human being will never be able to achieve.

Who, though, specifies the content of the learning corpus which turns the pattern recognition program into a technically closed system? It is always human experts who assess the pattern inputs and who thus direct the future interpretation done by the AI system. In this way pattern recognition can be turned into a closed task, like a game of chess or Go, which can be solved by a mechanical algorithm.

In both cases – in the initially closed game program (chess and Go) as well as in the subsequently closed pattern recognition program – the algorithm finds a closed situation, and this is the prerequisite for an artificial, i.e. mechanical intelligence to be able to work.

Conclusion 1:
AI algorithms can only work in closed spaces.

In the case of pattern recognition, the human-made learning corpus provides this closed space.

Conclusion 2:
Real intelligence also works in open situations.

Is there any intelligence without intention?

Why is artificial intelligence unable to work in an open space without assessments introduced from outside? Because it is only the assessments introduced from outside that make the results of intelligence possible. And assessments cannot be provided purely mechanically by the AI but are always linked to the assessors’ views and intentions.

Besides the differentiation between open and closed systems, our analysis of AI systems shows us still more about real intelligence, for artificial and natural intelligence also differ from each other with regard to the extent to which individual intentions play a part in their decision-making.

In chess programs, the objective is clear: to checkmate the opponent’s king. The objective which determines the assessment of the moves, namely the intention to win, does not have to be laboriously recognised by the program itself but is intrinsically given.

With pattern recognition, too, the role of the assessment intention is crucial, for what kind of patterns should be distinguished in the first place? Foreign tanks versus our own tanks? Wheeled tanks versus tracked tanks? Operational ones versus damaged ones? All these distinctions make sense, but the AI must be set, and adjusted to, a specific objective, a specific intention. Once the corpus has been assessed in a certain direction, it is impossible to suddenly derive a different property from it.
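The dependence on the assessment objective can be illustrated with a small sketch. It is only an illustration under stated assumptions: the two features per image, the label lists and the nearest-neighbour rule are all invented here and stand in for a real neural network. The point is that the same images, assessed with two different intentions, produce two different interpreters.

```python
import math

# Hypothetical toy corpus: every "image" is reduced to two invented features
# (hull length in metres, number of visible wheels).
IMAGES = [(6.5, 0), (7.0, 0), (6.8, 8), (7.2, 8)]

# Two assessments of the SAME images, each supplied from outside by people
# pursuing a different objective.
LABELS_BY_OWNER = ["own", "foreign", "own", "foreign"]
LABELS_BY_DRIVE = ["tracked", "tracked", "wheeled", "wheeled"]


def classify(new_image, images, labels):
    """Return the label of the nearest corpus entry (1-nearest-neighbour)."""
    distances = [math.dist(new_image, image) for image in images]
    nearest = distances.index(min(distances))
    return labels[nearest]


unknown = (6.6, 8)
by_owner = classify(unknown, IMAGES, LABELS_BY_OWNER)
by_drive = classify(unknown, IMAGES, LABELS_BY_DRIVE)
print(by_owner, by_drive)
```

Nothing in the feature values decides which of the two label lists is the right one. That choice is made before the mechanical part starts.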

As in the chess program, the artificial intelligence is not capable of finding the objective on its own: in the chess program, the objective (checkmate) is self-evident; in pattern recognition, the assessors involved must agree on the objective (foreign/own tanks, wheeled/tracked tanks) in advance. In both cases, the objective and the intention come from the outside.

Conversely, natural intelligence has to determine itself what is important and what is unimportant, and what objectives it pursues. In my view, an active intention is an indispensable property of natural intelligence and cannot be created artificially.

Conclusion 3:
In contrast to artificial intelligence, natural intelligence is characterised by the fact that it is able to judge, and deliberately orient, its own intentions.


This is a blog post about artificial intelligence. You can find further posts through the overview page about AI.


Translation: Tony Häfliger and Vivien Blandford

Now where in artificial intelligence is the intelligence located?


In a nutshell: the intelligence is always located outside.


a) Rule-based systems

The rules and algorithms of these systems are created by human beings, and no one will ascribe real intelligence to a pocket calculator. The same also applies to all other rule-based systems, however refined they may be. The rules are devised by human beings.

b) Conventional corpus-based systems (neural networks)

These systems always use an assessed corpus, i.e. a collection of data which have already been evaluated. This assessment decides according to what criteria each individual corpus entry is classified, and this classification then constitutes the real knowledge in the corpus.

However, the classification cannot be derived from the data of the corpus itself but is always introduced from the outside. And it is not only the allocation of a data entry to a class that can only be done from the outside; rather, the classes themselves are not determined by the data of the corpus, either, but are provided from the outside – ultimately by human beings.

The intelligence of these systems is always located in the assessment of the data pool, i.e. the allocation of the data objects to predefined classes, and this is done from the outside, by human beings. The neural network which is thus created does not know how the human brain has found the evaluations required for it.

c) Search engines

Search engines constitute a special type of corpus-based system and are based on the fact that many people use a certain search engine and decide with their clicks which internet links can be allocated to the search string. Ultimately, search engines only average the traces which the many users leave with their context knowledge and their intentions. Without the human brains of the users who have used the search engines so far, the search engines would not know where to point new queries.
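The averaging of user traces described above can be sketched as a simple click tally. This is a deliberately reduced illustration, not how any real search engine is implemented; the click log, the example URLs and the function names are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (search string, link the user finally clicked).
CLICK_LOG = [
    ("jaguar", "cars.example/jaguar"),
    ("jaguar", "zoo.example/jaguar"),
    ("jaguar", "cars.example/jaguar"),
    ("python", "docs.example/python"),
]


def build_index(click_log):
    """Count, per search string, how often each link was chosen by users."""
    index = defaultdict(Counter)
    for query, link in click_log:
        index[query][link] += 1
    return index


def rank(index, query):
    """Links for a query, most-clicked first; unseen queries yield nothing."""
    counts = index.get(query, Counter())
    return [link for link, _ in counts.most_common()]


index = build_index(CLICK_LOG)
print(rank(index, "jaguar"))
print(rank(index, "ocelot"))
```

For a search string that no user has entered before, this sketch has nothing to offer, which mirrors the point that the knowledge comes from the users' earlier decisions.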

d) Game programs (chess, Go, etc.) / deep learning

This is where things become interesting, for in contrast to the other corpus-based systems, such programs do not require any human beings to assess the corpus from the outside. Here, the corpus consists of the moves of games previously played. Does this mean, then, that such systems have an intelligence of their own?

Like the pattern recognition programs (b) and the search engines (c), the Go program has a corpus which in this case contains all the moves of the test games played before. The difference from the classic AI systems consists in the fact that the assessment of the corpus (i.e. the moves of the games) is already defined by the success in the actual game. Thus no human being is required who has to make a distinction between foreign tanks and our own tanks in order to provide the template for the neural network. The game’s success can be directly recognised by the machine, i.e. the algorithm itself; human beings are not required.

With classic AI systems, this is not the case, and a human being who assesses the individual corpus items is indispensable. Added to this, the assessment criterion is not given unequivocally, as it is with Go. Tank images can be categorised in completely different ways (wheeled/tracked tanks, damaged/undamaged tanks, tanks in towns/open country, in black and white/coloured pictures, etc.). This leaves the choice of how to interpret the corpus wide open. For all these reasons, an automatic categorisation is impossible with classic AI systems, which therefore always require an assessment of the learning corpus by human experts.

In the case of chess and Go, it is precisely this that is not required. Chess and Go are artificially designed and completely closed systems and thus indeed completely determined in advance. The board, the rules and the objective of the game – and thus also the assessment of the individual moves – are given automatically. Therefore no additional intelligence is required; instead, an automatism can play test games with itself within a predefined, closed setting and in this way attain the predefined objective better and better until it is better than any human being.

In the case of tasks which have to be solved not in an artificial game setting but in reality, however, the permitted moves and objectives are not completely defined, and there is leeway for strategy. An automatic system like deep learning cannot be applied in open, i.e. real situations.

It goes without saying that in practice, a considerable intelligence is required to program victory in Go and other games, and we may well admire the intelligence of the engineers at Google, etc., for that, yet once again it is their human intelligence which enables them to develop the programs, and not an intelligence which the programs designed by them are able to develop themselves.

Conclusion

AI systems can be very impressive and very useful, but they never have an intelligence of their own.

Artificial Intelligence (Overview)

Is AI dangerous or useful?

This question is currently the subject of extensive debate. The aim here is not to repeat well-known opinions, but to shed light on the basics of the technology that you are almost certainly unaware of. Or do you know where AI gets its intelligence from?

For a quarter of a century, I have been working with ‘intelligent’ IT systems and I am astonished that we ascribe real intelligence to artificial intelligence at all. That’s exactly what it doesn’t have. Its intelligence always comes from humans, who not only provide the data, but also have to evaluate its meaning before the AI can use it. Only then can AI surprise us with its impressive performance and countless useful applications in a wide variety of areas. How does it achieve this?

In 2019, I started a blog series on this topic, an overview of which you can see below. In 2021, I then summarised the articles in a book entitled “Wie die künstliche Intelligenz zur Intelligenz kommt” (in German). See below a list of blog posts which form the basis of the book.

While the book is in German, the blog series is available both in German and English.


Latest Posts about AI

English Posts:

German Posts:


Earlier Posts (basis of the AI book)

Rule-based or corpus-based?

These are the two fundamentally different methods of computer intelligence. They can either be based on rules or a collection of data (corpus). In the introductory post, I present the two with the help of two characteristic anecdotes:


With regard to success, the corpus-based systems have obviously outstripped the rule-based ones:


The rule-based systems had a more difficult time of it. What are their challenges? How can they overcome their weaknesses? And where is their intelligence situated inside them?


How are corpus-based systems set up? How is their corpus compiled and assessed? What are neural networks all about? And what are the natural limits of corpus-based systems?


Next, we’ll have a look at search engines, which are also corpus-based systems. How do they arrive at their proposals? Where are their limits and dangers? Why, for instance, is it inevitable that bubbles are formed?


Is a program capable of learning without human beings providing it with useful pieces of advice? It appears to work with deep learning. To understand this, we first compare a simple card game with chess: what requires more intelligence? Surprisingly, it becomes clear that for a computer, chess is the simpler game.

With the help of the general conditions of the board games Go and chess, we recognise under what conditions deep learning works.


In the following blog post, I’ll provide an overview of the AI types known to me. I’ll draw a brief outline of their individual structures and of the differences in the way they work.

So where is the intelligence?


The considerations reveal what distinguishes natural intelligence from artificial intelligence:


AI systems only show their capabilities when the task is clear and simple. As soon as the question becomes complex, they fail. Or they fib by arranging beautiful sentences found in their treasure trove of data in such a way that the result sounds intelligent (ChatGPT, LaMDA). They do not work with logic, but with statistics, i.e. with probability. But is what appears to be true always true?

The weaknesses necessarily follow from the design principle of AI. Further articles deal with this:

Games and Intelligence (2): Deep Learning

Go and chess

The Asian game of Go shares many similarities with chess while being simpler and more sophisticated at the same time.

The same as in chess:
– Board game → clearly defined playing field
– Two players (more would immediately increase complexity)
– Unequivocally defined possibilities of playing the stones (clear rules)
– The players place stones alternately (clear timeline).
– No hidden information (as, for instance, in cards)
– Clear objective (the player who has surrounded the larger territory wins)

Simpler in Go:
– Only one type of piece: the stone (unlike in chess: king, queen, etc.)

More complex/requires more effort:
– Go has a larger playing field (19 × 19 intersections, compared with 8 × 8 squares in chess).
– The higher number of fields and stones requires more computation.
– Despite its very simple rules, Go is a highly sophisticated game.

Summary

Compared with their common features, the differences between Go and chess are minimal. In particular, Go satisfies the strongly limiting preconditions a) to d), which enable an algorithm to tackle the job:

a) a clearly defined playing field,
b) clearly defined rules,
c) a clearly defined course of play,
d) a clear objective.

(Cf. also preceding blog post)

Go and deep learning

Google has beaten the best human Go players. This victory was achieved by means of a type of AI which is called deep learning. Many people think that this proves that a computer – i.e. a machine – can be genuinely intelligent. Let us therefore have a closer look at how Google managed to do this.

Rule- or corpus-based, or a new, third system?

The strategies of the known AI programs are either rule-based or corpus-based. In previous posts, we asked ourselves where the intelligence in these two strategies comes from, and we realised that the intelligence in rule-based AI is injected into the system by the human experts who establish the rules. Corpus-based AI also requires human beings, since all the inputs into the corpus must be assessed (e.g. friendly/hostile tanks), and these assessments can always be traced back to people even if this is not immediately obvious.

However, what does this look like in the case of deep learning? Apparently, it no longer requires any human beings to provide specific assessments, which in Go concern the individual moves’ chances of winning; rather, it is sufficient for the program to play against itself and find out on its own which moves have proved most successful. In this, deep learning does NOT depend on human intelligence and – in chess and Go – even turns out to be superior to human intelligence.

Deep learning is corpus-based

Google’s engineers undoubtedly did a fantastic job. Whereas in conventional corpus-based applications, the data for the corpus have to be compiled laboriously, this is quite simple in the case of the Go program: the engineers simply have the computer play against itself, and every game is an input into the corpus. No one has to take the trouble to trawl the internet or any other source for data; instead, the computer is able to generate a corpus of any size very simply and quickly. Although like the programs for pattern recognition, deep learning for Go continues to depend on a corpus, this corpus can be compiled in a much simpler way – and automatically at that.

Yet it gets even better for deep learning. Not only is the compilation of the corpus much simpler, but the assessment of the single moves in the corpus is also very easy: Finding out the best move from among all the moves that are possible at any given time no longer requires any human experts. How does this work? How is deep learning capable of drawing intelligent conclusions without any human intelligence at all? This may be astonishing, but if we look at it in more detail, it becomes clear why this is indeed the case.

The assessment of corpus inputs

The difference is the assessment of the corpus inputs. To illustrate this, let’s have another look at the tank example. Its corpus consists of tank images, and a human expert has to assess each picture according to whether it shows one of our own tanks or a foreign tank. As explained, this requires human experts. In our second example, the search engine, it is also human beings, namely the users, who assess whether the link to a website suggested in the corpus fits the input search string. Both types of AI cannot do without human intelligence.

With deep learning, however, this is really different. The assessment of the corpus, i.e. the individual moves that make up the many different Go test games, does not require any additional intelligence. The assessment automatically results from the games themselves, since the only criterion is whether the game has been won or lost. This, however, is known to the corpus itself since it has registered the entire course of every game right to the end. Therefore the way in which every game has proceeded automatically contains its own assessment – assessments by human beings are no longer required.
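How a game can assess its own corpus can be shown with a very small sketch. The assumptions are considerable and should be kept in mind: instead of Go, the example uses a tiny take-away game (a pile of five stones, each player takes one to three, whoever takes the last stone wins), and instead of a neural network it keeps a plain table of win rates. All names are invented for the illustration. What it shares with the text is the principle that the outcome of each self-played game labels every move in it.

```python
import random
from collections import defaultdict


def self_play_game(rng, start_pile=5):
    """Play one random game; return each player's moves and the winner."""
    pile, player = start_pile, 0
    moves = {0: [], 1: []}
    while True:
        take = rng.randint(1, min(3, pile))
        moves[player].append((pile, take))
        pile -= take
        if pile == 0:
            return moves, player  # whoever takes the last stone wins
        player = 1 - player


def build_corpus(games=20000, seed=0):
    """Tally [wins, plays] for every (pile, take) pair seen in self-play."""
    rng = random.Random(seed)
    stats = defaultdict(lambda: [0, 0])
    for _ in range(games):
        moves, winner = self_play_game(rng)
        for player, played in moves.items():
            for state_move in played:
                stats[state_move][1] += 1
                if player == winner:
                    # The result of the game itself is the assessment.
                    stats[state_move][0] += 1
    return stats


def best_move(stats, pile):
    """Pick the move with the highest observed win rate for this pile size."""
    candidates = [take for take in (1, 2, 3) if (pile, take) in stats]
    return max(
        candidates,
        key=lambda take: stats[(pile, take)][0] / stats[(pile, take)][1],
    )


stats = build_corpus()
print({pile: best_move(stats, pile) for pile in (1, 2, 3, 5)})
```

No human labels any move here, but the board, the rules and the winning condition were all fixed in advance by whoever wrote `self_play_game`.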

The natural limits of deep learning

The above, however, also reveals the conditions in which deep learning is possible at all: for the course of the game and the assessment to be clear-cut, there must not be any surprises. Ambiguous situations and uncontrollable outside influences are not allowed. For everything to be flawlessly calculable, the following is indispensable:

1. A closed system

This is given by the properties a) to c) (cf. preceding post), which games like chess and Go possess, namely

a) a clearly defined playing field,
b) clearly defined rules,
c) a clearly defined course of play.

A closed system is necessary for deep learning to work. Such a system can only be an artificially constructed system, for there are no closed systems in nature. It is no accident that chess and Go are particularly suitable for AI since games always have this aspect of being consciously designed. Games which integrate chance as part of the system, such as cards in the preceding post, are not absolutely closed systems any longer and therefore less suitable for artificial intelligence.

2. A clearly defined objective

A clearly defined objective – point d) in the preceding post – is also necessary for the assessment of the corpus to take place without any human interference, because the objective of the process under investigation and the assessment of the corpus inputs are closely connected. We must understand that the target of the corpus assessment is not given by the corpus data. Data and assessment are two different things. We have already discussed this in the example of the tanks, where we saw that a corpus input, i.e. the pixels of a tank photograph, did not automatically contain its own assessment (hostile/friendly). The assessment is a piece of information which is not intrinsic to the individual data (pixels) of an image; rather, it has to be fed into the corpus from the outside (by an interpreting intelligence). Therefore the same corpus input can also be assessed in very different ways: if the corpus is told whether an individual image is one of our own tanks or a foreign tank, it still does not know whether it is a tracked tank or a wheeled tank. With all such images, assessments can go in very different directions – unlike with chess and Go, where a move in a game (which is known to the corpus) is solely assessed according to the criterion of whether it is conducive to winning the game.

Thus chess and Go pursue a simple, clearly defined objective. In contrast to these two games, however, tank pictures allow for a wide variety of assessment objectives. This is typical of real situations. Real situations are always open, and in such situations, various and differing assessments can make sense and are absolutely appropriate. For the purpose of assessment, an instance (intelligence) outside the data has to establish the connection between the data and the assessment objective. This function is always linked to an instance with a certain intention.

Machine intelligence, however, lacks this intention and therefore depends on being provided with it by an objective from the outside. If the objective is as self-evident as it is in chess and Go, this is not a problem, and the assessment of the corpus can indeed be conducted by the machine itself without any human intelligence. In such unequivocal situations, machine deep learning is genuinely capable of working – indeed, even of beating human intelligence.

However, this only applies if the rules and the objective of a game are clearly defined. In all other cases, it is not an algorithm that is required but “real” intelligence, i.e. intelligence with a deliberate intention.

Conclusion

  1. Deep learning (DL) works.
  2. DL uses a corpus-based system.
  3. DL is capable of beating human intelligence in certain applications.
  4. However, DL only works in a closed system.
  5. DL only works if the objective is clear and unequivocal.

Ad 4) Closed systems are not real but are either obvious constructs (like games) or idealisations of real circumstances (= models). Such idealisations are invariably simplifications with reduced information content. They are therefore incapable of mapping reality completely.

Ad 5) The objective, i.e. the “intention”, corresponds to a subjective element. This subjective element distinguishes natural from machine intelligence. The machine must be provided with it in advance.

This is a blog post about artificial intelligence.


Translation: Tony Häfliger and Vivien Blandford

Overview of the AI systems

All the systems we have examined so far, including deep learning, can in essence be traced back to two methods: the rule-based method and the corpus-based method. This also applies to the systems we have not discussed to date, namely simple automata and hybrid systems, which combine the two above approaches. If we integrate these variants, we will arrive at the following overview:

A: Rule-based systems

Rule-based systems are based on calculation rules. These rules are invariably IF-THEN commands, i.e. instructions which assign a certain result to a certain input. These systems are always deterministic, i.e. a certain input always leads to the same result. Also, they are always explicit, i.e. they involve no processes that cannot be made visible, and the system is always completely transparent – at least in principle. However, rule-based systems can become fairly complex.
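A minimal sketch of such a system follows. The thresholds and category names are invented for the illustration; only the structure matters: an ordered list of explicit IF-THEN rules written by a person, applied mechanically.

```python
# Hypothetical rules, each an explicit IF-THEN pair written by a person.
# They are checked in order; the first rule whose condition holds decides.
RULES = [
    (lambda t: t < 0, "freezing"),
    (lambda t: t < 15, "cold"),
    (lambda t: t < 25, "mild"),
]
DEFAULT = "hot"


def interpret(temperature_c):
    """Assign a result to an input by applying the first matching rule."""
    for condition, result in RULES:
        if condition(temperature_c):
            return result
    return DEFAULT


print(interpret(-3), interpret(20), interpret(30))
```

The same input always produces the same result, and every decision can be traced back to one line in `RULES`, which is what determinism and transparency mean here.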

A1: Simple automaton (pocket calculator type)

Fig. 1: Simple automaton

Rules are also called algorithms (“Algo”) in Fig. 1. Inputs and outputs (results) need not be numbers. The simple automaton distinguishes itself from other systems in that it does not require any special knowledge base, but works with a few calculation rules. Nevertheless, simple automata can be used to make highly complex calculations, too.

Perhaps you would not describe a pocket calculator as an AI system, but the differences between a pocket calculator and the more highly developed systems right up to deep learning are merely gradual in nature – i.e. precisely of the kind that is being described on this page. Complex calculations soon strike us as intelligent, particularly if we are unable to reproduce them that easily with our own brains. This is already the case with simple arithmetic operations such as divisions or root extraction, where we quickly reach our limits. Conversely, we regard face recognition as comparatively simple because we are usually able to recognise faces quite well without a computer. Incidentally, nine men’s morris is also part of the A1 category: playing it requires a certain amount of intelligence, but it is complete in itself and easily controllable with an AI program of the A1 type.

A2: Knowledge-based system

Fig. 2: Compiling a knowledge base (IE=Inference Engine)

These systems distinguish themselves from simple automata in that part of their rules have been outsourced to a knowledge base. Fig. 2 indicates that this knowledge base has been compiled by a human being, and Fig. 3 shows how it is applied. The intelligence is located in the rules; it originates from human beings – in the application, however, the knowledge base is capable of working on its own.

Fig. 3: Application of a knowledge-based system

The inference engine (“IE” in Figs. 2 and 3) corresponds to the algorithms of the simple automaton in Fig. 1. In principle, the algorithms, the inference engine and the rules of the knowledge base are all rules, i.e. explicit IF-THEN commands. However, these can be interwoven and nested in a variety of ways. They can refer to numbers or concepts. Everything is made by human experts.

The rules in the knowledge base are subordinate to the rules of the inference engine. The latter control the flow of the interpretation, i.e. they decide which rules of the knowledge base are to be applied and how they are to be executed. The rules of the inference engine are the actual program that is read and executed by the computer. The rules of the knowledge base, however, are not executed directly by the computer, but indirectly through the instructions provided by the inference engine. This is nesting, which is typical of commands, i.e. of software in computers; after all, the rules of the inference engine are not executed directly either, but are read by deeper rules right down to the machine language at the core (in the kernel) of a computer. In principle, however, the rules of the knowledge base are calculation rules just like the rules of the inference engine, only written in a “higher” programming language. It is an advantage if the human domain experts, i.e. the human specialists, find this programming language particularly easy and safe to read and use.
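The division of labour between knowledge base and inference engine can be sketched roughly as follows. This is only an illustration under simple assumptions: the rules and facts are invented, and the engine is a plain forward-chaining loop, not the engine of any particular product.

```python
# Hypothetical knowledge base: rules are data, written by a domain expert.
# Each rule reads: IF all premises are known facts THEN add the conclusion.
KNOWLEDGE_BASE = [
    ({"fever", "cough"}, "suspected infection"),
    ({"suspected infection", "chest pain"}, "order chest x-ray"),
]

def infer(facts, rules):
    """Inference engine: apply the rules repeatedly until nothing new follows."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

result = infer({"fever", "cough", "chest pain"}, KNOWLEDGE_BASE)
print(sorted(result))
```

Only `infer` is program code in the narrow sense; the entries in `KNOWLEDGE_BASE` are never executed directly but are read and applied by the engine, which is the nesting described above.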

With regard to the logic system used in inference engines, we distinguish between rule-based systems

– with a static logic (ontologies type / semantic web type),
– with a dynamic logic (concept molecules type).

For this, cf. the blog post on the three innovations of rule-based AI.

B: Corpus-based systems

Corpus-based systems are compiled in three steps (Fig. 4). In the first step, a corpus that is as large as possible is collected. The collection does not contain any rules, only data. Rules would be instructions; the data of the corpus, however, are not instructions: they are pure collections of data such as texts, images, records of games, etc.

Fig. 4: Compiling a corpus-based system

These data must now be assessed. As a rule, this is done by a human being. In the third step, a so-called neural network is trained on the basis of the assessed corpus. In contrast to the data corpus, the neural network is again a collection of rules like the knowledge base of the rule-based systems A. Unlike those, however, the neural network is not constructed by a human being but built and trained by the assessed corpus. Unlike the knowledge base, the neural network is not explicit, i.e. it is not readily accessible.
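The three steps (collect, assess, train) can be sketched in a few lines of Python. Please note the assumptions: the data points and labels are invented, and a very simple nearest-centroid classifier stands in for the neural network, which would be far more elaborate in practice.

```python
# Steps 1 and 2: a tiny corpus of data points, each assessed (labelled) by a human.
# The feature values and labels are made up for illustration.
ASSESSED_CORPUS = [
    ((1.0, 1.0), "ours"), ((1.5, 0.5), "ours"),
    ((3.0, 3.0), "foreign"), ((3.5, 2.5), "foreign"),
]

def train(corpus):
    """Step 3: derive one centroid per label from the assessed corpus."""
    sums, counts = {}, {}
    for features, label in corpus:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(model, features):
    """Application: only the trained model is needed, not the corpus."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))

model = train(ASSESSED_CORPUS)
print(classify(model, (1.2, 0.8)))
print(classify(model, (3.2, 2.9)))
```

Nobody wrote the decision rule by hand here: it results from the labelled data. In this toy case the model is still easy to inspect, whereas the weights of a real neural network are not.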

Fig. 5: Application of a corpus-based system

In their applications, both neural networks and rule-based systems are fully capable of working without human beings. Even the corpus is no longer necessary. All the knowledge is located in the algorithms of the neural network. In addition, neural networks are quite capable of interpreting poorly structured contents such as a mess of pixels (i.e. images), where rule-based systems (A type) very quickly reach their limits. In contrast to these, however, corpus-based systems are less successful with complex outputs, i.e. the number of possible output results must not be too large, since otherwise the accuracy rate will suffer. Best suited here are binary outputs of the “our tank – foreign tank” type (cf. preceding post) or of the “male author – female author” type in the assessment of Twitter texts. For such tasks, corpus-based systems are vastly superior to rule-based ones. This superiority quickly declines, however, when it comes to finely differentiated outputs.

Three subtypes of corpus-based AI

The three subtypes differ from each other with regard to who or what assesses the corpus.

Fig. 6: The three types of corpus-based system and how they assess their corpus

B1: Pattern recognition type

I described this type (top in Fig. 6) in the tank example. The corpus is assessed by a human expert.

B2: Search engine type

Cf. middle diagram in Fig. 6: in this type, the corpus is assessed by the customers. I described such a system in the search engine post.

B3: Deep learning type

In contrast to the above types, this one (bottom in Fig. 6) does not require a human being to train or assess the neural network. The assessment results solely from the way in which the games proceed. The fact that deep learning is only possible in very restricted conditions is explained in the post on games and intelligence.

C: Hybrid systems

Of course the above-mentioned methods (A1-A2, B1-B3) can also be combined in practice.

Thus a face identification system, for instance, may work in such a way that in the images provided by a surveillance camera, a corpus-based system B1 is capable of recognising faces as such, and in the faces the crucial shapes of eyes, mouth, etc. Subsequently, a rule-based system A2 uses the points marked by B1 to calculate the proportions of eyes, nose, mouth, etc., which characterise an individual face. Such a combination of corpus- and rule-based systems allows for individual faces to be recognised in images. The first step would not be possible for an A2 system, the second step would be far too complicated and inaccurate for a B1 system. A hybrid system makes it possible.
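The rule-based second step of such a hybrid can be sketched as follows. The landmark coordinates are invented and simply stand in for whatever a B1 system would deliver; the two ratios are arbitrary examples of "proportions", not a recognised biometric standard.

```python
import math

# Assumed output of the corpus-based step (B1): landmark coordinates in pixels.
# The values are invented; a real detector would supply them.
landmarks = {
    "left_eye": (30.0, 40.0),
    "right_eye": (70.0, 40.0),
    "nose": (50.0, 60.0),
    "mouth": (50.0, 80.0),
}

def face_proportions(points):
    """Rule-based step (A2): explicit calculations on the marked points."""
    eye_distance = math.dist(points["left_eye"], points["right_eye"])
    nose_to_mouth = math.dist(points["nose"], points["mouth"])
    eye_mid = (
        (points["left_eye"][0] + points["right_eye"][0]) / 2,
        (points["left_eye"][1] + points["right_eye"][1]) / 2,
    )
    eyes_to_mouth = math.dist(eye_mid, points["mouth"])
    # Ratios do not depend on the image scale, unlike raw pixel distances.
    return {
        "nose_mouth / eye_distance": nose_to_mouth / eye_distance,
        "eyes_mouth / eye_distance": eyes_to_mouth / eye_distance,
    }

print(face_proportions(landmarks))
```

Finding the landmarks in raw pixels is the part left to the corpus-based system; once the points are marked, the calculation itself is a handful of transparent rules.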


In the following blog post, I will answer the question as to where the intelligence is located in all these systems. But you have probably long found the answer yourself.

This is a blog post about artificial intelligence.


Translation: Tony Häfliger and Vivien Blandford

Games and intelligence (1)

Chess or jass: what requires more intelligence?

(Jass is a very popular Swiss card game of the same family as whist and bridge, though more homespun than the latter.)

Generally, it is assumed that chess requires more intelligence, for obviously less intelligent players definitely stand a chance of winning at cards while they don’t in chess. If we consider, however, what a computer program must be able to do in order to win, the picture soon looks different: chess is clearly simpler for a machine.

This may surprise you, but it is worth looking at the features the two games have in common, as well as their differences – and of course, both have a great deal to do with our topic of artificial intelligence.

Common features

a) Clearly defined playing field

The chessboard has 64 black and white fields; only the pieces that are situated on these fields play a part. At cards, the bridge table could be regarded as a playing field, as could the so-called square “jass carpet” that is placed on a restaurant table; it is the material playing field in the same way that the material chessboard is for chess. If we are interested in successful playing behaviour, however, the colour of the jass carpet or the make of the chess board are immaterial; what counts is solely the abstract, i.e. “IT-type” of playing field: where can our chess pieces and playing cards move in a more mathematical way? And in this respect, the situation is completely clear at cards, too: the cards are in a clearly defined place at any given time, either in a player’s hand ready to be played, or in front of a player as a trick already won, or on the table as a face-up card to be seen by everyone. Both chess and cards can therefore be said to have a clearly defined playing field.

b) Clear rules

Here, too, there is hardly any difference between the two games. Although there are all sorts of variants of whist and bridge, and although jass rules differ from village to village and even from restaurant to restaurant (which may occasionally lead to heated discussions), as soon as a set of rules has been agreed upon, the situation is clear. As in chess, it is clear what goes and what doesn’t, and the players’ possible activities are clearly defined.

c) Clear course of play

Here again, the games do not differ from each other. At any point in time, there is precisely one player who is permitted to act, and his or her options are clearly defined.

d) Clear objective

Chess is about checkmating the opponent’s king; card games are about scoring points or tricks, depending on the variant. Games do not last an eternity. A card game is over when all the cards have been played; in chess, the draw and stalemate rules prevent a game from going on indefinitely. There is always a clear winner and a clear loser or, if need be, a definitive tie.

Differences

e) Clear starting situation?

In chess, the starting situation is identical in every game; all pieces start at their appointed place. At cards, however, the pack of cards is shuffled before every game. Whereas in chess, we always start from precisely the same situation, we have to envisage a new one before every card game. Chance thus plays an important role in cards; in chess, it has been deliberately excluded. This is bound to have consequences. Since I have to factor in chance at cards, I cannot rely on certainties like in chess, but have to rely on probabilities.

f) Hidden information?

A lack of knowledge remains a challenge for card players throughout the game. Whereas in chess, everything is openly recognisable for each player on the board, card games literally thrive on players NOT knowing where the cards are. Therefore they must guess – i.e. rely on probabilities – and run certain risks. There is no guessing in chess; the situation is always clear, open and evident. Of course, this makes it substantially easier to describe the situation in chess; at cards, however, this lack of knowledge makes a description of the situation difficult.
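A small worked example of what "relying on probabilities" means at the card table. It assumes a 36-card pack dealt evenly to four players, nine cards each, and asks how likely it is, before the first card is played, that my partner holds one specific card that I do not hold myself.

```python
from fractions import Fraction

# Assumption: a 36-card pack dealt evenly to four players (nine cards each).
PACK_SIZE = 36
HAND_SIZE = 9

def chance_partner_holds_card():
    """Probability that one specific card I do not hold is in my partner's hand,
    at the start of the game, before any card has been played."""
    unseen = PACK_SIZE - HAND_SIZE  # the 27 cards in the other three hands
    return Fraction(HAND_SIZE, unseen)

print(chance_partner_holds_card())  # 1/3
```

In chess there is nothing comparable to compute, because no part of the position is hidden. At cards, this figure changes with every card played and every inference drawn from the run of the game.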

g) Probabilities and emotions (psychology)

If I do not know everything, I have to rely on probabilities. Experience shows that this is something that we human beings are generally very bad at. We let ourselves be guided by emotions much more strongly than we care to admit. Fears and hopes determine our expectations, and we often grossly misjudge probabilities. An AI program naturally has an edge over us in this respect since it does not have to cope with emotions and is much better at computing probabilities. Yet the machine wants to beat its opponent and will therefore have to assess its opponent’s reactions correctly. The AI program would therefore do well to take its opponent’s flawed handling of probabilities into consideration, but this is not very easy in terms of algorithms. How does it recognise an optimist? Human players try to read their opponents while trying to mislead them about their own emotions at the same time. This is part of the game. It is no use to the program if it makes computations without any emotions while being incapable of recognising and assessing its opponent’s emotions.

h) Communication 

Chess is played by one player against the other. Card games usually involve four players playing each other in pairs. This aspect, i.e. that two individuals have to coordinate their actions, makes the game interesting, and it would be fatal for a card game program to neglect this aspect. But how should we program this? What has to be taken into account here, too, is point f) above, namely the fact that I cannot see my partner’s cards; I neither know my partner’s cards nor my opponents’. Of course my partner and I are interested in coordinating our game, and part of this is that we communicate our options (hidden cards) and our strategies (intentions for driving the game forward) to each other. If, for instance, I hold the ace of hearts, I would like my partner to lead hearts to enable me to win the trick. However, I am not allowed to tell him that openly – yet an experienced card player would not find this a problem. First of all, the run of the game often reveals who holds the ace of hearts. Of course it is not easy to discover this because both the cards that have already been played and possible tactics and strategies have to be taken into consideration. The number of options, the computation of the probabilities and the psychology of the players all come into play here, which can result in very exciting conflict situations – which ultimately also makes the game attractive. In chess, however, with its constantly very explicit situation, circumstances are a great deal simpler in this respect.

But this is not all:

i) The legal grey area

Is it really true that my partner and I are unable to exchange communication about our cards and strategies? Officially, of course, this is prohibited – but can this ban really be implemented in practice?

Of course it can’t. Whereas in chess, it is practically solely the explicit moves that play a part, there is a great deal of additional information at cards which a practised player must be able to read. How am I smiling when I’m playing a card? If I hold the ace of hearts, which can win the next trick, I obviously want my partner to help me and lead hearts. One possibility of achieving this in a jass game is to play a minor heart and place it on the table with distinctive emphasis. A practised partner will easily read this as a signal for him to lead hearts next time rather than diamonds to enable me to win the trick with my ace. No one will really be able to ban anyone from leading a card in a certain way, provided that this is done with sufficient discretion. Partners who are well attuned to each other do not only know the completely legal signals which they automatically emit through the selection of the cards they play, but also some signals from the grey area with which they coordinate their game.

These signals constitute information which an ambitious AI will have to be able to identify and process. The volume of information which it has to process for this purpose is not only much larger than the volume of information in chess, it is not limited by any manner of means either. My AI plays two human opponents, and those two also communicate with each other. The AI should be able to recognise their communication in order not to be hopelessly beaten. The signals agreed upon by the opponents may of course vary and be of any degree of sophistication. How can my AI discover what arrangements the two made prior to the game?

Conclusion

Card games are much more difficult to program than chess

If we want to develop a program for a card game, we will have to take into consideration aspects e) to i), which hardly play any part in chess. In terms of algorithms, however, aspects e) to i) constitute a difficult challenge owing to the imponderabilities.

In comparison with card games, chess is substantially less difficult for a computer because

– there is always the same starting situation,
– there is no hidden information,
– no probabilities need to be taken into account,
– human emotions play a small part,
– there is no legal grey area because no exchange of information between partners is possible.

For an AI program, chess is therefore the simpler game. It is completely defined, i.e. the volume of information that is in the game is very small, clearly disclosed and clearly limited. This is not the case with card games.


This is a blog post about artificial intelligence. In the second part about games and intelligence, I will deal with Go and deep learning.


Translation: Tony Häfliger and Vivien Blandford

How real is the probable?

AI can only see whatever is in the corpus

Corpus-based systems are on the road to success. They are “disruptive”, i.e. they change our society substantially within a very short period of time – reason enough for us to recall how these systems really work.

In previous blog posts I explained that these systems consist of two parts, namely a data corpus and a neural network. Of course, the network is unable to recognise anything that is not already in the corpus. The blindness of the corpus automatically continues in the neural network, and the AI is ultimately only able to produce what is already present in the data of the corpus. The same applies to incorrect input in the corpus: this will reappear in the results of the AI and, in particular, lessen their accuracy.

When we bring to mind the mode of action of AI, this fact is banal, since the learning corpus is the basis for this kind of artificial intelligence. Only that which is in the corpus can appear in the results, and errors and lack of precision in the corpus automatically diminish the validity of the results.

What is less banal is another aspect, which is also essentially tied up with the artificial intelligence of neural networks. It is the role played by probability. Neural networks work through probabilities. What precisely does this mean, and what effects does it have in practice?

Neural networks make assessments according to probability

Starting point

Let’s look again at our search engine from the preceding post. A customer of our search engine enters a search string. Other customers before him have already entered the same search string. We therefore suggest those websites to the customer which have been selected by the earlier customers. Of course we want to place those at the top of the customer’s list which are of most interest to him (cf. preceding post). To be able to do so, we assess all the customers according to their previous queries. How we do this in detail is naturally our trade secret; after all, we want to gain an edge over our competitors. No matter how we do this, however – and no matter how our competitors do it – we end up weighting previous users’ suggestions. On the basis of this weighting process, we select the proposals which we present to our enquirer and the order in which we display them. Here, probabilities are the crucial factor.

Example

Let us assume that enquirer A asks our search engine a question, and the two customers B and C have already asked the same question as A and left their choice, i.e. the addresses of the websites selected by them, in our well-stocked corpus. Which selection should we now prefer to present to A, that of B or that of C?

Now we have a look at the assessments of the three customers: to what extent do B’s and C’s profiles correspond with A’s profile? Let’s assume that we arrive at the following correspondences:

Customer B:  80%
Customer C: 30%

Naturally we assume that B corresponds better with A than C and that A is therefore served better by B’s answers.
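One conceivable way of turning such correspondences into an ordering is sketched below. It is only an assumption for illustration: the actual weighting of a real search engine is, as mentioned, a trade secret, and the web addresses here are invented.

```python
# Correspondence of the earlier customers' profiles with enquirer A (figures from the example).
CORRESPONDENCE = {"B": 0.8, "C": 0.3}

# Hypothetical selections the earlier customers left in the corpus for the same question.
SELECTIONS = {
    "B": ["site-1", "site-2"],
    "C": ["site-3", "site-2"],
}

def rank_for_enquirer(correspondence, selections):
    """Score each address by the summed correspondence of the customers who chose it."""
    scores = {}
    for customer, addresses in selections.items():
        for address in addresses:
            scores[address] = scores.get(address, 0.0) + correspondence[customer]
    return sorted(scores, key=lambda address: scores[address], reverse=True)

print(rank_for_enquirer(CORRESPONDENCE, SELECTIONS))
```

An address chosen by both B and C ends up at the top, followed by B's other choice and then C's.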

But is this truly the case?

The question is justified, for after all, there is no complete correspondence with either of the two other users. It may be the case that it is precisely the 30% with which A and C correspond which concerns A’s current query. In that case, it would be unfortunate to give B’s answer priority, particularly if the 80% correspondence with B concerns completely different fields which have nothing to do with the current query. Admittedly, this deviation from probability is improbable in a specific case, but it is not impossible – and this is the actual crux of probabilities.

Now in this case, we reasonably opted for B, and we may be certain that probability is on our side. In terms of our business success, we may confidently rely on probability. Why?

This is connected with the law of large numbers. In an individual case as described above, C’s answer may indeed be the better one. In most cases, however, B’s answers will be more to our customer’s liking, and we are well advised to provide him with that answer. This is the law of large numbers. Essentially, it is the basis of the phenomenon of probability:

In an individual case, something improbable may happen; over many cases, however, we may rely on the probable outcome being the one that usually occurs.
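The effect can be made visible with a small simulation. The figure of 70% is an invented assumption: it stands for how often B's answer really is the better one, and it is not the same thing as the 80% profile correspondence in the example.

```python
import random

# Assumption for illustration: B's answer is the better one in 70% of all queries.
P_B_IS_BETTER = 0.7

def hit_rate(num_queries, rng):
    """Always present B's answer and measure how often that was the better choice."""
    hits = sum(rng.random() < P_B_IS_BETTER for _ in range(num_queries))
    return hits / num_queries

rng = random.Random(42)  # fixed seed so that the run is repeatable
for n in (10, 1000, 100000):
    print(n, round(hit_rate(n, rng), 3))
```

With only a handful of queries the hit rate can lie well away from 0.7; with many queries it settles close to it. Any single query, however, can still be one of the misses.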

Conclusion for our search engine
  1. If we are interested in being right in most cases, we stick to probability.
  2. At the same time, we accept that we may miss the target in rare cases.

Conclusion for corpus-based AI in general

What applies to our search engine generally applies to any corpus-based AI since all these systems work on the basis of probability. Thus the conclusion for corpus-based AI is as follows:

  1. If we are interested in being right in most cases, we stick to probability.
  2. At the same time, we accept that we may miss the target in rare cases.

 We must acknowledge that corpus-based AI has an inherent weak point, a kind of Achilles’ heel of an otherwise highly potent technology. We should therefore continue to watch this heel carefully:

  1. Incidence:
    When is the error most likely to occur, when can it be neglected? This is connected with the size and quality of the corpus, but also with the situation in which the AI is used.
  2. Consequence:
    What are the consequences if rare cases are neglected?
    Can the permanent averaging and observing of solely the most probable solutions be called intelligent?
  3. Interdependencies:
    With regard to the fundamental interdependencies, the connection with the concept of entropy is of interest: the second law of thermodynamics states that in an isolated system, what happens is always what is more probable, and thermodynamics measures this probability with the variable S, which it defines as entropy.
    What is probable is what happens, both in thermodynamics and in our search engine – but how does a natural intelligence choose?
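For orientation, the textbook relation behind this statement is Boltzmann's entropy formula. It is not given in the post itself and is added here only as a reminder; $W$ denotes the number of microstates compatible with a given macrostate and $k_B$ is the Boltzmann constant:

```latex
S = k_B \ln W
```

A macrostate that can be realised in more ways (larger $W$) is more probable and has a higher entropy.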

The next blog post will be about games and intelligence, specifically about the difference between chess and a Swiss card game.

This is a post about artificial intelligence.


Translation: Tony Häfliger and Vivien Blandford

Rule-based AI: Where is the intelligence situated?

Two AI variants: rule-based and corpus-based

The two AI variants mentioned in previous blog posts are still topical today, and they have registered some remarkable successes. The two differ from each other not least in where precisely their intelligence is situated. Let’s first have a look at the rule-based system.

Structure of a rule-based system

In the Semfinder company, we used a rule-based system. I drew the following sketch of it in 1999:

Semantic interpretation system

Green: data
Yellow: software
Light blue: knowledge ware
Dark blue: knowledge engineer

The sketch consists of two rectangles, which represent different locations. The rectangle bottom left shows what happens in the hospital; the rectangle top right additionally shows what goes on in knowledge engineering.

In the hospital, our coding program reads the doctors’ free texts, interprets them and converts them into concept molecules, and allocates the relevant codes to them with the help of a knowledge base. The knowledge base contains the rules with which the texts are interpreted. In our company, these rules were drawn up by people (human experts). The rules are comparable to the algorithms of a software program, apart from the fact that they are written in a “higher” programming language to ensure that non-IT specialists, i.e. the domain experts, who in our case are doctors, are able to establish them easily and maintain them safely. For this purpose, they use the knowledge base editor, which enables them to view the rules, to test them, to modify them or to establish completely new ones.

Where, then, is the intelligence situated?

It is situated in the knowledge base – but it is not actually a genuine intelligence. The knowledge base is incapable of thinking on its own; it only carries out what a human being has instilled into it. I have therefore never described our system as intelligent. At the very least, intelligence means that new things can be learnt, but the knowledge base learns nothing. If a new word crops up or if a new coding aspect is integrated, then this is not done by the knowledge base but by the knowledge engineer, i.e. a human being. Everything else (hardware, software, knowledge base) only carries out what human beings have instructed it to do. The intelligence in our system was always and exclusively a matter of human beings – i.e. a natural rather than an artificial intelligence.

Is this different in the corpus-based method? In the following post, we will therefore have a closer look at a corpus-based system.

 

This is a post about artificial intelligence.


Translation: Tony Häfliger and Vivien Blandford

Intelligence in the search engine

How does intelligence get into a search engine?

Let’s assume that you are building a search engine. In the process, you do not want to avail yourself of the services of expensive and not always faultless domain experts, but want to build the search engine solely with sufficient data servers (the hardware for the corpus) and ingenious software. In principle, you will use a neural network with a corpus. How do you inject intelligence into your system?

Trick 1: Let the customers train the corpus

As in the tank AI of previous blog posts, a search engine depends on categorisations, this time provided by customers’ allocation of input texts (search string) to a list of web addresses which might be interesting for their searches. To find the relevant addresses, your system is again based on a learning corpus, which this time consists of the list of your previous customers’ search inputs. The web addresses which the previous customers have clicked from among those offered to them are qualified as positive hits in the corpus. When it comes to new queries – also from other customers – you simply indicate the addresses which have received most clicks to date. They can’t be all that bad, after all, and the system gets more refined with every query and the following click. And it still applies that the bigger the corpus, the more precise the system.
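Trick 1 can be sketched in a few lines. The click log and the web addresses are invented, and plain counting stands in for whatever a real search engine does; the point is only that the ranking comes entirely from earlier customers' clicks.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (search string, web address the customer clicked).
CLICK_LOG = [
    ("jass rules", "example.org/jass"),
    ("jass rules", "example.org/jass"),
    ("jass rules", "example.com/cards"),
    ("chess openings", "example.net/chess"),
]

def build_corpus(click_log):
    """Count the clicks per address for every search string."""
    corpus = defaultdict(Counter)
    for query, address in click_log:
        corpus[query][address] += 1
    return corpus

def suggest(corpus, query):
    """Return the addresses for this search string, most-clicked first."""
    return [address for address, _ in corpus.get(query, Counter()).most_common()]

corpus = build_corpus(CLICK_LOG)
print(suggest(corpus, "jass rules"))
```

For a search string that nobody has entered before, `suggest` returns an empty list: the system has nothing to offer beyond what is already in the corpus.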

Again, the categorisations originate outside the system as they are provided by people who have assessed the selection offered to them by the search engine by placing their clicks according to their preferences. They did so

  • with their human intelligence and
  • in line with their individual interests.

The second point is particularly interesting. We might have a closer look at this later.

Trick 2: Assess the customers at the same time

Not every categorisation by every customer is equally relevant. As a search engine operator, you can optimise in two directions:

  • Assess the assessors:
    You know all your customers’ inputs, so you can easily find out how reliable these customers’ categorisations, i.e. the web addresses they clicked in connection with their search strings, are. Not all the customers are equally proficient in this respect. The more other customers click the same web address for the same search string, the safer the categorisation will also be for future queries. You can now use this information in order to weight your customers: the customer who has so far had the most reliable categorisations, i.e. the one who most often chose what the others also chose, is given most weight. A customer who was followed by fewer others will be regarded as less reliable. This weighting process will increase the probability that the future search results will rate those websites higher which are of interest to most customers.
  • Assess the searchers:
    Not every search engine user has the same interests. You are able to take this into consideration since you know all their previous inputs. You can make use of these inputs to generate a profile of this customer. This will naturally enable you to select the search results for him or her accordingly. Assessors with a profile similar to the searcher’s will weight the potential addresses similarly, too, and you will be able to personalise the search results even more in the customer’s interest.

For you as a search engine operator, it is in any case worth generating a profile of all your customers for an improvement in the quality of search suggestions alone.
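The idea of "assessing the assessors" can be sketched as follows. The customers, queries and addresses are invented, and the reliability measure (the share of a customer's clicks that match the majority click) is just one simple assumption among many possible ones.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (customer, search string, clicked web address).
CLICKS = [
    ("anna", "q1", "site-a"), ("ben", "q1", "site-a"), ("cleo", "q1", "site-b"),
    ("anna", "q2", "site-c"), ("ben", "q2", "site-c"), ("cleo", "q2", "site-c"),
    ("anna", "q3", "site-d"), ("ben", "q3", "site-d"), ("cleo", "q3", "site-e"),
]

def majority_choices(clicks):
    """Most-clicked address per search string."""
    per_query = defaultdict(Counter)
    for _, query, address in clicks:
        per_query[query][address] += 1
    return {q: counts.most_common(1)[0][0] for q, counts in per_query.items()}

def reliability(clicks):
    """Weight per customer: share of their clicks that match the majority."""
    majority = majority_choices(clicks)
    agree, total = Counter(), Counter()
    for customer, query, address in clicks:
        total[customer] += 1
        agree[customer] += address == majority[query]
    return {customer: agree[customer] / total[customer] for customer in total}

def weighted_ranking(clicks, query):
    """Rank addresses for a query by the summed weights of those who clicked them."""
    weights = reliability(clicks)
    scores = Counter()
    for customer, q, address in clicks:
        if q == query:
            scores[address] += weights[customer]
    return [address for address, _ in scores.most_common()]

print(reliability(CLICKS))
print(weighted_ranking(CLICKS, "q3"))
```

A customer who usually clicks what the others click gains weight, and one who often deviates loses it. The same mechanism that improves the hit rate for the majority therefore also pushes minority choices down the list.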

Consequences

  1. Search engines become more precise the more they are used.
    This applies to all the corpus-based systems, i.e. to all technologies with neural networks: the larger their corpus, the higher their precision. They can be capable of amazing feats.
  2. A remarkable feedback effect can be observed in this connection: the bigger the corpus, the better the quality of the search engine, which is why it is used more often, which in turn enlarges its corpus and thus boosts its attractiveness in comparison with competitors. This effect inevitably results in such monopolies as are typical of all applications of corpus-based software.
  3. All the categorisations were primarily made by human beings. The basis of intelligence – the categorising inputs in the corpus – is still provided by human beings. In the case of search engines, these are all the individual users who in this way input their knowledge into the corpus. Which means that the intelligence in AI is not all that artificial after all.
  4. The tendency towards bubble formation is inherent in corpus-based systems: if search engines generate profiles of their customers, they can offer them better search results. In a self-referential loop, this inevitably leads to bubble formation: users with similar views are brought increasingly closer together by the search engines since in this way, these users are provided with the search results which correspond most closely to their individual interests and views. They will come across deviating views less and less often.

The next post will be about a further important aspect of corpus-based systems, namely the role of probability.

This is a post about artificial intelligence.


Translation: Tony Häfliger and Vivien Blandford

What the corpus knows – and what it doesn’t

Compiling the corpus

In a previous post we saw how the corpus – the basis for the neural network of AI – is compiled. The neural network is capable of interpreting the corpus in a refined manner, but of course the neural network cannot extract anything from the corpus that is not in it in the first place.

Neural Network and Corpus
Fig. 1: The neural network extracts knowledge from the corpus

How is a corpus compiled? A domain expert assigns images of a certain class to a certain type, for instance “foreign tanks” vs “our tanks”. In Fig. 2, these categorisations carried out by the experts are the red arrows, which evaluate the tank images in this example.

Expert and Corpus
Fig. 2: Making the categorisations in the corpus

Of course, the human expert’s assignations of the individual images according to the target categories must be correct – but that is not enough. There are fundamental limits to the evaluability of the corpus by a neural network, no matter how refined this may be.

Chance reigns if a corpus is too small

If I have only coloured images of our own tanks and only black and white ones of the foreign tanks (cf. introductory post about AI), the system can easily be led astray and identify all the coloured images as our own tanks and all the black and white ones as tanks of a foreign army. Although this defect can be remedied with a sufficiently large corpus, the example illustrates how important it is that a corpus consists of correct elements. If chance (coloured/black and white) is apt to play a crucial role in a corpus, the system will easily draw the wrong conclusions. Chance plays a greater role the smaller the corpus, but also the higher the number of possible “outcomes” (searched-for variables).
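How chance can mislead a learner on a small corpus is easy to reproduce. In the sketch below, the images are reduced to two invented yes/no features, and a deliberately naive learner that picks the single most predictive feature stands in for the neural network.

```python
# Hypothetical small corpus: each image is reduced to two made-up features.
# By chance, every image of our own tanks is in colour and every foreign one is not.
SMALL_CORPUS = [
    ({"is_colour": True, "has_long_barrel": True}, "ours"),
    ({"is_colour": True, "has_long_barrel": False}, "ours"),
    ({"is_colour": True, "has_long_barrel": True}, "ours"),
    ({"is_colour": False, "has_long_barrel": True}, "foreign"),
    ({"is_colour": False, "has_long_barrel": False}, "foreign"),
]

def feature_accuracy(corpus, feature):
    """Accuracy of the rule 'feature is True means ours', or of its negation if that is better."""
    hits = sum(features[feature] == (label == "ours") for features, label in corpus)
    return max(hits, len(corpus) - hits) / len(corpus)

def best_feature(corpus):
    """Pick the feature that separates the two classes best on this corpus."""
    return max(corpus[0][0], key=lambda feature: feature_accuracy(corpus, feature))

print(best_feature(SMALL_CORPUS))
print(feature_accuracy(SMALL_CORPUS, "is_colour"))
```

On this corpus the colour of the photograph predicts the label perfectly, so the learner latches on to it, although it says nothing about the tank itself. Only additional images that break the coincidence would correct this.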

Besides these relative obstacles, there are also fundamental limits to the evaluability of an AI corpus. This is what we are going to look at next.

Tracked or wheeled tank?

Whatever is not in the corpus cannot be extracted from it. Needless to say, I cannot classify aircraft with a tank corpus.

Corpus and Network
Fig. 3: The evaluation is decisive – corpus with friendly and hostile tanks and a network programmed accordingly.

What, though, if our tank system is intended to find out whether a tank is tracked or wheeled? In principle, the corpus may well contain images of both types of tanks. How can the tank AI from our example recognise them?

The simple answer is: it can’t. In the corpus, the system has many images of tanks and knows whether each one is hostile or friendly. But is it wheeled or not? This information is not part of the corpus (yet) and can therefore not be extracted by the AI. Human beings may be capable of evaluating each individual image accordingly, as they did with the “friendly/hostile” properties, but then this would be an intelligence external to the AI that would make the distinction. The neural network is incapable of doing this on its own since it does not know anything about tracks and wheels. It has only learnt to distinguish our tanks from foreign ones. To establish a new category, the relevant information must first be fed into the corpus (new red arrows in Fig. 2), and then the neural network must be trained to answer new questions.

Such training need not necessarily be done on the tank corpus. The system could also make use of a corpus of completely different vehicles for which it is known whether they move on wheels or on tracks. Although the distinction can then be transferred to the tank corpus automatically, the external wheels/tracks system must first be trained – and again with categorisations made by human beings.

On its own, without predefined examples, the AI system will not be able to make this distinction.
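The point can be sketched with a toy data structure. This is a minimal illustration, not a real image corpus: the class, the field names and the image identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CorpusEntry:
    image_id: str
    # Categorisations supplied from outside (the red arrows in Fig. 2),
    # stored as question -> answer.
    labels: Dict[str, str] = field(default_factory=dict)


def known_answer(entry: CorpusEntry, question: str) -> Optional[str]:
    """Return the stored categorisation, or None if nobody ever supplied one."""
    return entry.labels.get(question)


corpus = [
    CorpusEntry("tank_001", {"allegiance": "own"}),
    CorpusEntry("tank_002", {"allegiance": "foreign"}),
]

# The corpus can answer the question it was assessed for ...
assert known_answer(corpus[0], "allegiance") == "own"
# ... but it holds nothing about a question no expert has assessed yet.
assert known_answer(corpus[0], "propulsion") is None

# Only an external assessment (here: a hypothetical expert's entry)
# makes the new category available for training.
corpus[0].labels["propulsion"] = "tracked"
assert known_answer(corpus[0], "propulsion") == "tracked"
```

The pixels of the image may well show the tracks, but without the added label there is no target for a network to be trained on.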

Conclusions

  1. Only such conclusions can be drawn from a corpus as are part of that corpus.
  2. Categorisations (the red arrows in Fig. 2) invariably come from the outside, i.e. from human beings.

In our tank example, we have examined a typical image recognition AI. However, do the conclusions drawn from it (cf. above) also apply to other corpus-based systems? And isn’t there something like “deep learning”, i.e. the possibility that an AI system learns on its own?

Let us therefore have a look at a completely different type of corpus-based AI in the next blog post.



Translation: Tony Häfliger and Vivien Blandford

Where is intelligence situated in corpus-based AI?

In a preceding post we saw that in rule-based AI, intelligence is situated in the rules. These rules are drawn up by people, and the system is as intelligent as the people who have formulated them. Where, then, is intelligence situated in corpus-based AI?

The answer is somewhat more complicated than in the case of rule-based systems. Let us therefore have a closer look at the structure of such a corpus-based system. It is established in three steps:

  1. compiling as large a data collection as possible (corpus),
  2. assessing this data collection,
  3. training the neural network.

The network can be applied as soon as it has been established:

  4. applying the neural network.

Let’s have a closer look at the four steps.

Step 1: Compiling the data collection

In our tank example, the corpus (data collection) consists of photographs of tanks. Images are typical of corpus-based intelligence, but the collection may of course also contain other kinds of information such as customers’ queries submitted to search engines or GPS data from mobile phones. The typical feature is that the data of each singular entry are made up of so many individual elements (e.g. pixels) that their evaluation by rules consciously drawn up by people becomes too labour-intensive. In such cases, rule-based systems are not worthwhile.

A collection of data alone, however, is not enough. The data now have to be assessed.

Step 2: Assessing the corpus

corpus and neural network
Fig. 1: Corpus-based system

Fig. 1 displays the familiar picture of our tank example. On the left-hand side, you can see the corpus. In this figure, it has already been assessed; the assessment is symbolised by the black and green flags on the left of each tank image.

In simplified terms, the assessed corpus can be imagined as a two-columned table. The left-hand column contains the information about the images, the right-hand column contains the assessment, and the arrow between them is the categorisation, which thus becomes an essential part of the corpus in that it states to which category (o or f) each individual image belongs, i.e. how it has been assessed.

Table of assessments
Tab. 1: Corpus with assessment (o=own, f=foreign)

Typically, the volumes of information in the two columns differ greatly in size. Whereas the assessment in the right-hand column of our tank example consists of precisely one bit, the image in the left-hand column contains all the pixels of the photograph; each and every pixel’s position, colour, etc. have been stored – i.e. a rather large data volume. This difference in the size ratio is typical of corpus-based systems – and if you have philosophical interests, I would like to point out its nexus with the issue of information reduction and entropy (cf. posts on information reduction). At the moment, however, the focus is on intelligence in corpus-based AI systems, and we note that in the corpus, every image is allocated its correct target category.
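The difference in size between the two columns can be made concrete with a small calculation. This is a sketch under stated assumptions: the image dimensions are hypothetical, and the left-hand column is measured simply by its raw storage size, which is an upper bound on the information it carries, not its entropy.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AssessedImage:
    pixels: bytes      # left-hand column: raw image data
    category: str      # right-hand column: "o" (own) or "f" (foreign)


def information_ratio(entry: AssessedImage, n_categories: int = 2) -> float:
    """Bits of raw image data per bit of assessment."""
    image_bits = len(entry.pixels) * 8
    label_bits = math.log2(n_categories)   # exactly 1 bit for two categories
    return image_bits / label_bits


# Hypothetical 640x480 RGB photograph, one byte per colour channel.
photo = AssessedImage(pixels=bytes(640 * 480 * 3), category="o")
print(information_ratio(photo))   # prints 7372800.0
```

Even for this modest image size, more than seven million stored bits stand against a single bit of assessment.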

We do not know how this categorisation happens, for it is carried out by human beings with the neurons in their own heads. These human beings are unlikely to be conscious of the precise behaviour of their neurons, and thus could not identify the rules by which this process is governed. They do know, however, what the images represent, and they indicate this in the corpus by assigning them to the relevant categories. This categorisation is introduced into the corpus from the outside by human beings; it is one hundred per cent man-made. At the same time, this assessment is an absolute condition and the basis for the establishment of the neural network. Even later, when the fully trained neural network no longer requires the corpus, the categorisations that were brought in from the outside will still have been necessary for the network to be set up and to work at all.

Where, then, does this intelligence come from in the assignment to the categories o) and f)? It is ultimately human beings who carry out this categorisation (and may also fail to do it correctly); it is their intelligence. Once the categorisation in the corpus has been noted, this is not active intelligence any longer, but fixed knowledge.

Expert and corpus
Fig. 2: Assessing the corpus

The assessment of the corpus is a crucial stage, for which intelligence is undoubtedly required. The compiled data collection has to be assessed, and the domain expert who carries out this assessment has to guarantee that it is correct. In Fig. 2, the domain expert’s intelligence is represented by the yellow sphere. The corpus receives the knowledge thus generated through the categorisations, which in turn are represented as red arrows in Fig. 2.

Knowledge is something different from intelligence. In a certain sense, it is passive. In this sense, the pieces of information contained in the corpus are objects of knowledge, i.e. categorisations which have been formulated and need not be processed any longer. Conversely, intelligence is an active principle which is capable of making valuations on its own as is done by the human expert. The elements in the corpus, however, are data or – in the case of the above-mentioned results of the experts’ intelligence – permanently formulated knowledge.

To distinguish this knowledge from intelligence, I did not colour it yellow in Fig. 2, but green.

Thus we usefully distinguish between three things:

  • data (the data collection in the corpus),
  • knowledge (the completed assessment of these data),
  • intelligence (the ability to carry out this assessment).

Step 3: Learning stage

training the neural network
Fig. 3: Learning stage

At the learning stage, the neural network is established on the basis of the learning corpus. The success of this process again requires a considerable degree of intelligence; this time, it comes from AI experts, who enable the learning stage to work and who control it. A crucial role is played by algorithms here: they are responsible for the correct evaluation of the knowledge in the corpus and for the neural network taking precisely that shape which will ensure that all the categorisations contained in the corpus can be reproduced by the network itself.

The extraction of knowledge and the algorithms used in the process are symbolised by the brown arrow between corpus and network. The algorithms may appear to display a certain degree of intelligence even though they do not do anything that has not been predefined by the IT experts and the knowledge in the corpus. The emerging neural network itself does not have any intelligence of its own but is the result of this process and thus of the experts’ intelligence. It contains a substantial amount of knowledge, however, and is therefore coloured green in Fig. 3, like the knowledge in Fig. 2. In contrast to the corpus, however, the categorisations (red arrows) are significantly more complex, in precisely the way in which neural networks work in a more complex manner than a simple two-columned table does (Tab. 1).

There is something else that distinguishes the knowledge in the network from the corpus: the corpus contains knowledge about individual cases whereas the network is abstract. It can therefore also be applied to cases that have been unknown to date.
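The learning stage and the step from individual cases to an abstract rule can be sketched with the simplest trainable unit, a single perceptron. This is a deliberately reduced stand-in: real neural networks have many layers and are trained with other algorithms. The two numeric features per image and all values below are hypothetical.

```python
from typing import List, Sequence, Tuple

# (feature vector, category): +1 stands for "own", -1 for "foreign".
Example = Tuple[Sequence[float], int]


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Weighted sum of the features plus the bias (last element of w)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def train_perceptron(corpus: List[Example], max_epochs: int = 1000) -> List[float]:
    """Adjust the weights until every categorisation in the corpus is
    reproduced. Raises if this does not succeed within max_epochs,
    e.g. because the corpus is not linearly separable."""
    w = [0.0] * (len(corpus[0][0]) + 1)
    for _ in range(max_epochs):
        errors = 0
        for x, y in corpus:
            if y * score(w, x) <= 0:          # wrong or undecided answer
                for i, xi in enumerate(x):
                    w[i] += y * xi            # nudge weights towards y
                w[-1] += y
                errors += 1
        if errors == 0:
            return w
    raise RuntimeError("corpus could not be reproduced")


def classify(w: Sequence[float], x: Sequence[float]) -> str:
    return "own" if score(w, x) > 0 else "foreign"


# Hypothetical assessed corpus: individual cases with their categories.
corpus: List[Example] = [
    ((2.0, 1.0), 1), ((3.0, 1.5), 1),
    ((-1.0, -2.0), -1), ((-2.0, -0.5), -1),
]
weights = train_perceptron(corpus)

# The trained weights are abstract: they also answer for unseen cases.
print(classify(weights, (2.5, 0.5)))    # prints own
print(classify(weights, (-3.0, -1.0)))  # prints foreign
```

After training, the corpus is no longer consulted; the categorisations survive only in the weights.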

Step 4: Application

Application of a neural network
Fig. 4: Application of a neural network

In Fig. 4, a previously unknown image is assessed by the neural network and categorised according to the knowledge stored in the network. This does not require a corpus any longer, nor does it require an expert; the “trained” but now fixed wiring in the neural network is enough. At this moment, the network is no longer capable of learning anything new. However, it is able to attain perfectly impressive achievements with a completely new input. This performance has been enabled by the preceding work, i.e. the establishment of the corpus, the (hopefully) correct assessments and the algorithms of the learning stage. Behind the learning corpus, there is the domain experts’ human intelligence; behind the algorithms of the learning stage, there is the IT experts’ human intelligence.

Conclusion

What appears to be artificial intelligence to us is the result of the perfectly human, i.e. natural intelligence of domain experts and IT experts.

In the next post, we will have an even closer look at what kind of knowledge a corpus really contains, and at what AI can get out of the corpus and at what it can’t.



Translation: Tony Häfliger and Vivien Blandford

The three innovations of rule-based AI

Have the neural networks outpaced the rule-based systems?

It cannot be ignored: corpus-based AI has overtaken rule-based AI by far. Neural networks are making the running wherever we look. Is the competition dozing? Or are rule-based systems simply incapable of yielding equivalent results to those of neural networks?

My answer is that both methods are predisposed for performing very different functions as a matter of principle. A look at their respective modes of action makes clear what the two methods can usefully be employed for. Depending on the problem to be tackled, one or the other has an advantage.

Yet the impression remains: the rule-based variant seems to be on the losing side. Why is that?

In what dead end has rule-based AI got stuck?

In my view, rule-based AI is lagging behind because it is unwilling to cast off its inherited liabilities – although doing so would be so easy. It is a matter of

  1. acknowledging semantics as an autonomous field of knowledge,
  2. using complex concept architectures,
  3. integrating an open and flexible logic (NMR).

We have been doing this successfully for more than 20 years. What do the three points mean in detail?

Point 1: acknowledging semantics as an autonomous field of knowledge

Usually, semantics is considered to be part of linguistics. In principle, there would not be any objection to this, but linguistics harbours a trap for semantics which is hardly ever noticed: linguistics deals with words and sentences. The error consists in perceiving meaning, i.e. semantics, through the filter of language, and assuming that its elements have to be arranged in the same way as language does with words. Yet language is subject to one crucial limitation: it is linear, i.e. sequential – one letter follows another, one word comes after another. It is impossible to place words in parallel next to each other. When we are thinking, however, we are able to do so. And when we investigate the semantics of something, we have to do so in the way we think and not in the way we speak.

Thus we have to find formalisms for concepts as they occur in thought. Language is limited by the linear sequence of its elements and is therefore forced to reproduce compounds and complex relational structures in a makeshift way with grammatical tricks, and differently in every language. This structural limitation does not apply to thinking, and this results in structures on the side of semantics that are completely different from those on the side of language.

Word ≠ concept

What certainly fails to work is a simple “semantic” annotation of words. A word can have many and very different meanings. One meaning (= a concept) can be expressed with different words. If we want to analyse a text, we must not look at the individual words but always at the general context. Let’s take the word “head”. We may speak of the head of a letter or the head of a company. We can integrate the context into our concept by associating the concept of <head< with other concepts. Thus there is a <body part<head< and a <function<head<. The concept on the left (<body part<) then states the type of the concept on the right (<head<). We are thus engaged in typification. We look for the semantic type of a concept and place it in front of the subconcept.
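A minimal sketch of the difference between a word and its typified concepts follows. The lexicon, the context cues and the selection rule are hypothetical illustrations; they are not the formalism actually used in our system.

```python
from typing import Dict, List, NamedTuple, Set


class TypedConcept(NamedTuple):
    semantic_type: str   # the concept on the left, e.g. "body part"
    concept: str         # the concept on the right, e.g. "head"

    def __str__(self) -> str:
        return f"<{self.semantic_type}<{self.concept}<"


# Hypothetical mini-lexicon: one word, several typed concepts.
LEXICON: Dict[str, List[TypedConcept]] = {
    "head": [TypedConcept("body part", "head"), TypedConcept("function", "head")],
}

# Hypothetical context words that point to a semantic type.
CUES: Dict[str, Set[str]] = {
    "body part": {"neck", "injury", "hair"},
    "function": {"company", "department", "team"},
}


def interpret(word: str, context: Set[str]) -> List[TypedConcept]:
    """Keep only those readings whose type is supported by the context.
    If the context supports none of them, all readings remain possible."""
    readings = LEXICON.get(word, [])
    supported = [r for r in readings if CUES.get(r.semantic_type, set()) & context]
    return supported or readings


print([str(c) for c in interpret("head", {"company"})])   # prints ['<function<head<']
```

The word "head" alone stays ambiguous; only the type placed in front of it makes the concept definite.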

Consistently composite data elements

The use of typified concepts is nothing new. However, we go further and create extensive structured graphs, which then constitute the basis for our work. This is completely different from working with words. The concept molecules that we use are such graphs; they possess a very special structure to ensure that they can be read easily and quickly by both people and machines. This composite representation has many advantages, among them the fact that combinatorial explosion is countered very simply and that the number of atomic concepts and rules can thus be drastically cut. Thanks to typification and the use of attributes, similar concepts can be refined at will, which means that by using molecules we are able to speak with a high degree of precision. In addition, the precision and transparency of the representation have very much to do with the fact that the special structure of the graphs (molecules) has been directly derived from the multifocal concept architecture (cf. Point 2).

Point 2: using complex concept architectures

Concepts are linked by means of relations in the graphs (molecules). The above-mentioned typification is such a relation: when the <head< is perceived as a <body part<, then it is of the <body part< type, and there is a very specific relation between <head< and <body part<, namely a so-called hierarchical or ‘is-a’ relation – the latter because in the case of hierarchical relations, we can always say ‘is a’, i.e. in our case: the <head< is a <body part<.

Typification is one of the two fundamental relations in semantics. We allocate a number of concepts to a superordinate concept, i.e. their type. Of course this type is again a concept and can therefore be typified again in turn. This results in hierarchical chains of ‘is-a’ relations with increasing specification, such as <object<furniture<table<kitchen table<. When we combine all the chains of concepts subordinate to a type, the result is a tree. This tree is the simplest of the four types of architecture used for an arrangement of concepts.
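The chains and the resulting tree can be sketched in a few lines. The table of ‘is-a’ relations is a hypothetical example that extends the chain from the text by one sibling concept.

```python
from typing import Dict, List, Optional

# Hypothetical 'is-a' relations: each concept points to its type.
IS_A: Dict[str, Optional[str]] = {
    "object": None,              # root: has no type above it
    "furniture": "object",
    "table": "furniture",
    "kitchen table": "table",
    "chair": "furniture",
}


def chain(concept: str) -> List[str]:
    """Follow the 'is-a' relation up to the root; return the chain root-first."""
    path: List[str] = []
    current: Optional[str] = concept
    while current is not None:
        path.append(current)
        current = IS_A[current]
    return path[::-1]


def notation(concept: str) -> str:
    """Write a chain in the <type<subconcept< notation used above."""
    return "<" + "<".join(chain(concept)) + "<"


def subtree(root: str) -> Dict[str, dict]:
    """Combine all chains below a type into one nested tree."""
    return {c: subtree(c) for c, parent in IS_A.items() if parent == root}


print(notation("kitchen table"))   # prints <object<furniture<table<kitchen table<
print(subtree("object"))
```

Because every concept has exactly one type here, the result is necessarily a tree; this restriction is precisely what the more complex architectures relax.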

This tree structure is our starting point. However, we must acknowledge that a mere tree architecture has crucial disadvantages which preclude the establishment of semantics which are really precise. Those who are interested in the improved and more complex types of architecture and their advantages and disadvantages will find a short description of the four types of architecture on the website of meditext.ch.

In the case of the concept molecules, we have geared the entire formalism, i.e. the intrinsic structure of the rules and molecules themselves, to the complex architectures. This has many advantages, for the concept molecules now have precisely the same structure as the axes of the multifocal concept architecture. The complex folds of the multifocal architecture can be conceived of as a terrain, with the dimensions or semantic degrees of freedom as complexly interlaced axes. The concept molecules now follow these axes with their own intrinsic structure. This is what makes computing with molecules so easy. It would not work like this with simple hierarchical trees or multidimensional systems. Nor would it work without consistently composite data elements whose intrinsic structure follows the ramifications of the complex architecture almost as a matter of course.

Point 3: integrating an open and flexible logic (NMR)

For theoretically biased scientists, this point is likely to be the toughest, for classic logic appears indispensable to most of them, and many bright minds are proud of their proficiency in it. Classic logic is indeed indispensable – but it has to be used in the right place. My experience shows me that we need another logic in NLP (Natural Language Processing), namely one that is not monotonic. Such non-monotonic reasoning (NMR) enables us to attain the same result with far fewer rules in the knowledge base. At the same time, maintenance is made easier. Also, it is possible for the system to be constantly developed further because it remains logically open. A logically open system may disquiet a mathematician, but experience shows that an NMR system works substantially better for the rule-based comprehension of the meaning of freely formulated text than a monotonic one.
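What "not monotonic" means can be shown with the textbook example of a default rule and its exception. This is a generic sketch of default reasoning, in which the more specific rule wins; it is an assumption for illustration and not the calculus of our own system.

```python
from typing import Dict, List, Set, Tuple

# Each rule: (facts required, conclusion as (attribute, value)).
Rule = Tuple[Set[str], Tuple[str, bool]]

RULES: List[Rule] = [
    ({"bird"}, ("flies", True)),               # default
    ({"bird", "penguin"}, ("flies", False)),   # exception, more specific
]


def conclude(facts: Set[str]) -> Dict[str, bool]:
    """Apply all matching rules; for each attribute the most specific
    matching rule (the one requiring the most facts) wins."""
    best: Dict[str, Tuple[int, bool]] = {}
    for required, (attribute, value) in RULES:
        if required <= facts:                  # all required facts present
            specificity = len(required)
            if attribute not in best or specificity > best[attribute][0]:
                best[attribute] = (specificity, value)
    return {attribute: value for attribute, (_, value) in best.items()}


assert conclude({"bird"}) == {"flies": True}
# Adding a fact withdraws the earlier conclusion: this is non-monotonic.
assert conclude({"bird", "penguin"}) == {"flies": False}
```

In a monotonic system the default rule would have to list every exception in advance ("a bird that is not a penguin, not an ostrich, ..."); here a new exception is simply added as a further rule, which is why the rule base stays smaller and remains open.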

Conclusion

Today, the rule-based systems appear to be lagging behind the corpus-based ones. This impression is deceptive, however, and derives from the fact that most rule-based systems have not yet succeeded in jumping ahead of themselves and becoming more modern. This is why they are either

  • only applicable for clear tasks in a small and well-defined domain, or
  • very rigid and therefore hardly employable, or
  • so demanding of resources that they become unmaintainable.

If, however, we use consistently composite data elements and more complex concept architectures, and if we deliberately refrain from monotonic conclusions, a rule-based system will enable us to get further than a corpus-based one – for the appropriate tasks.

Rule-based and corpus-based systems differ a great deal from each other, and depending on the task in hand, one or the other has the edge. I will deal with this in a later post.

The next post will deal with the current distribution of the two AI methods.



Translation: Tony Häfliger and Vivien Blandford

Specification of the challenges for rule-based AI

Rule-based AI is lagging behind

The distinction between rule-based AI and corpus-based AI makes sense in several respects since the two systems work in completely different ways. This does not only mean that their challenges are completely different, it also means that as a consequence, their development trajectories are not parallel in terms of time.

In my view, the only reason for this is that rule-based AI has reached a dead end from which it will only be able to extricate itself once it has correctly identified its challenges. This is why these challenges will be described in more detail below.

Overview of the challenges

In the preceding post, I listed four challenges for rule-based AI. Basically, the first two cannot be remedied. The first is that it takes experts to draw up the rules, and these must be experts both in abstract logic and in the specialist field concerned; there is not much that can be changed about this. The second will also remain: finding such experts will continue to be a problem.

The situation is better for challenges three and four, namely the large number of rules required, and their complexity. Although it is precisely these two that represent seemingly unalterable obstacles of considerable size, the necessary insights may well take the edge off them. However, both challenges must be tackled consistently, and this means that we will have to jettison some cherished old habits and patterns of thought. Let’s have a closer look at this.

The rules require a space and a calculus

Rule-based AI consists of two things:

  • rules which describe a domain (specialist field) in a certain format, and
  • an algorithm which determines which rules are executed at what time.

In order to build the rules, we require a space which specifies the elements which the rules may consist of and thus the very nature of the statements that can be made within the system. Such a space does not exist of its own accord but has to be deliberately created. Secondly, we require a calculus, i.e. an algorithm which determines how the rules thus established are applied. Of course, both the space and the calculus can be created in completely different ways, and these differences “make the difference”, i.e. they enable a crucial improvement of rule-based AI, albeit at the price of jettisoning some cherished old habits.

Three innovations

In the 1990s, we therefore invested in both the fundamental configuration of the concept space and the calculus. We established our rule-based system on the basis of the following three innovations:

  • data elements: we consistently use composite data elements (concept molecules);
  • space: we arrange concepts in a multidimensional-multifocal architecture;
  • calculus: we rely on non-monotonic reasoning (NMR).

These three elements interact and enable us to capture a greater number of situations more accurately with fewer data elements and rules. The multifocal architecture enables us to create better models, i.e. models which are more appropriate to their situations and contain more details. Since the number of elements and rules decreases at the same time, we succeed in going beyond the boundaries which previously constrained rule-based systems with regard to extent, precision and maintainability.

In the next post, we will investigate how the three above-mentioned innovations work.



Translation: Tony Häfliger and Vivien Blandford

AI: Vodka and tanks

AI in the last century

AI is a big buzzword today but was already of interest to me in my field of natural language processing in the 1980s and 1990s. At that time, there were two methods which were occasionally labelled AI, but they could not have been more different from each other. The exciting thing is that these two different methods still exist today and continue to be essentially different from each other.

AI-1: vodka

The first method, i.e. the one already used by the very first computer pioneers, was purely algorithmic, i.e. rule-based. Aristotle’s syllogisms are a paradigm of this type of rule-based system:

Premise 1: All human beings are mortal.
Premise 2: Socrates is a human being.
Conclusion: Socrates is mortal.

The expert posits premises 1 and 2, the system then draws the conclusion autonomously. Such systems can be underpinned mathematically. Set theory and first-order logic are often regarded as a safe mathematical basis. Theoretically, such systems were thus watertight. In practice, however, things looked somewhat different. Problems were caused by the fact that even the smallest details had to be included in the rule system; if they were not, the whole system would “crash”, i.e. draw completely absurd conclusions. The effort required to correct these details grew disproportionately to the extent of the knowledge that was covered. At best, the systems worked for small special fields for which clear-cut rules could be found; when it came to wider fields, however, the rule bases were too large and were no longer maintainable. A further serious problem was the fuzziness which is peculiar to many expressions and which is difficult to grasp with such hard-coded systems.
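How a system draws the conclusion "autonomously" from posited premises can be sketched with a tiny forward-chaining loop. The representation of facts and rules as pairs of strings is an assumption made for this illustration; historical expert systems used far richer formats.

```python
from typing import Set, Tuple

Fact = Tuple[str, str]   # (predicate, individual), e.g. ("human", "Socrates")
Rule = Tuple[str, str]   # (if-predicate, then-predicate), read: "all X are Y"


def forward_chain(facts: Set[Fact], rules: Set[Rule]) -> Set[Fact]:
    """Apply the rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            # Iterate over a copy, since new facts are added inside the loop.
            for predicate, individual in list(derived):
                if predicate == premise and (conclusion, individual) not in derived:
                    derived.add((conclusion, individual))
                    changed = True
    return derived


rules = {("human", "mortal")}        # Premise 1: all human beings are mortal
facts = {("human", "Socrates")}      # Premise 2: Socrates is a human being

# Conclusion: Socrates is mortal.
assert ("mortal", "Socrates") in forward_chain(facts, rules)
```

The sketch also hints at the weakness described above: the system knows nothing beyond what has been entered, so every detail that matters has to be stated as an explicit rule or fact.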

Thus this type of AI came in for increasing criticism. The following translation attempt may serve as an example of why this was the case. An NLP program translated sentences from English into Russian and then back again. The input of the biblical passage “The spirit is willing but the flesh is weak.” resulted in the retranslation “The vodka is good but the meat is rotten.”

This story may or may not have happened precisely like this, but it demonstrates the difficulties encountered in attempts to capture language with rule-based systems. The initial euphoria associated with the “electronic brain” and “machine intelligence” since the 1950s fizzled out; the expression “artificial intelligence” became obsolete and was replaced by the term “expert system”, which sounded less pretentious.

Later, in about 2000, the stalwarts of rule-based AI were buoyed up again, however. Tim Berners-Lee, the pioneer of the WWW, launched the Semantic Web initiative with the purpose of improving the usability of the internet. The experts of rule-based AI, who had been educated at the world’s best universities, were ready and willing to establish knowledge bases for him, which they now called ontologies. With all due respect to Berners-Lee and his efforts to introduce semantics to the net, it must be said that after almost 20 years, the Semantic Web initiative has not substantially changed the internet. In my view, there are good reasons for this: the methods of classic mathematical logic are too rigid to map the complex processes of thinking – more about this in other posts, particularly on static and dynamic logic. At any rate, both the classic rule-based expert systems of the 20th century and the Semantic Web initiative have fallen short of the high expectations.

AI-2: tanks

However, there were alternatives which tried to correct the weaknesses of rigid propositional logic as early as the 1990s. For this purpose, the mathematical toolkit was extended.

One such attempt was fuzzy logic. A statement or a conclusion was now no longer unequivocally true or false; rather, its veracity could be weighted. Besides set theory and predicate logic, probability calculus was now also included in the mathematical toolkit of the expert systems. Yet some problems remained: again, there had to be precise and elaborate descriptions of the rules that were applicable. Thus fuzzy logic was also part of rule-based AI, even though it was equipped with probabilities. Today, such programs work perfectly well in small, well-demarcated technical niches, beyond which they are insignificant.
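A weighted veracity can be sketched with the common min/max operators of fuzzy logic, in which a statement holds to a degree between 0 and 1. The two membership functions and their thresholds are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def fuzzy_and(a: float, b: float) -> float:
    """Degree to which both statements hold (minimum operator)."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Degree to which at least one statement holds (maximum operator)."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Degree to which the statement does not hold."""
    return 1.0 - a


def clip(x: float) -> float:
    """Restrict a value to the interval [0, 1]."""
    return min(1.0, max(0.0, x))


def is_heavy(weight_tonnes: float) -> float:
    """Hypothetical rule: not heavy below 20 t, fully heavy above 60 t."""
    return clip((weight_tonnes - 20.0) / 40.0)


def is_fast(speed_kmh: float) -> float:
    """Hypothetical rule: not fast below 30 km/h, fully fast above 70 km/h."""
    return clip((speed_kmh - 30.0) / 40.0)


heavy = is_heavy(40.0)            # 0.5
fast = is_fast(60.0)              # 0.75
print(fuzzy_and(heavy, fast))     # prints 0.5
print(fuzzy_or(heavy, fast))      # prints 0.75
```

The sketch makes the remaining problem visible: every threshold and every combination still has to be written down by an expert as an explicit rule.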

At that time, another alternative was constituted by the neural networks. They were considered to be interesting; however, their practical applications tended to attract some derision. To illustrate this, the following anecdote was bandied about:

The US Army – which has been an essential driver of computer technology all along – is supposed to have set up a neural network for the identification of US and foreign tanks. A neural network operates in such a way that the final conclusions are found through several layers of conclusions by the system itself. People need not input any rules any longer; they are generated by the system itself.

How is the system able to do this? It requires a learning corpus for this purpose. In the case of tank recognition, this consisted of a series of photographs of American and Russian tanks. Thus it was known for every photograph whether it was American or Russian, and the system was trained until it was capable of generating the required categorisation itself. The experts only exerted an indirect influence on the program in that they established the learning corpus; the program compiled the conclusions in the neural network autonomously – without the experts knowing precisely what rules the system used to draw which conclusions from which details. Only the result had to be correct, of course. Now, once the system had completely integrated the learning corpus, it could be tested by being shown a new input, for instance a new tank photo, and it was expected to categorise the new image correctly on the basis of the rules it had found in the learning corpus. As mentioned before, this categorisation was conducted by the system on its own, without the experts exerting any further influence and without them knowing how conclusions were drawn in a specific case.

It was said that this worked perfectly with regard to tank recognition. No matter how many photos were shown to the program, the categorisation was always spot on. The experts could hardly believe that they had really created a program with a 100% identification rate. How could this be? Ultimately, they discovered the reason: the photos of the American tanks were in colour, those of the Russian tanks were in black and white. Thus the program only had to recognise the colour; the contours of the tanks were irrelevant.

Rule-based vs corpus-based

The two anecdotes show what problems were lying in wait for rule-based and corpus-based AI at the time.

  • In the case of rule-based AI (vodka), they were
    – the rigidity of mathematical logic,
    – the fuzziness of our words,
    – the necessity to establish very large knowledge bases,
    – the necessity to use specialist experts for the knowledge bases.
  • In the case of corpus-based AI (tanks), they were
    – the lack of transparency of the paths along which conclusions were drawn,
    – the necessity to establish a very large and correct learning corpus.

I hope that the two examples above (which admittedly are somewhat unfair) have enabled me to describe the characters and modes of operation of the two AI types, including the weaknesses which characterise each type.

Needless to say, the challenges persist. In the following posts I will show how the two AI types have responded to them and where the intelligence now really resides in the two systems. To begin with, we’ll have a look at corpus-based AI.


This is a blog post about artificial intelligence.

Translation: Tony Häfliger and Vivien Blandford