
Paradoxes and Logic (Part 1)


Logic in Practice and Theory

Computer programs consist of algorithms. Algorithms are instructions on how and in what order an input is to be processed. Algorithms are nothing more than applied logic and a programmer is a practising logician.

But logic is a broad field. In a very narrow sense, logic is a part of mathematics; in a broad sense, logic is everything that has to do with thinking. These two poles show a clear contrast: The logic of mathematics is closed and well-defined, whereas the logic of thought tends to elude precise observation: How do I come to a certain thought? How do I construct my thoughts when I think? And what do I think just in this moment, when I think about my thinking? While mathematical logic works with clear concepts and rules, which are explicit and objectively describable, the logic of thinking is more difficult to grasp. Are there any rules for correct thinking, just as there are rules in mathematical logic for drawing conclusions in the right way?

When I look at the differences between mathematical logic and the logic of thought, something definitely strikes me: Thinking about my thinking defies objectivity. This is not the case in mathematics. Mathematicians try to safeguard every tiny step of thought in a way that is clear, objective and comprehensible to everyone who understands the mathematical language, regardless of who they are: the mathematician as a subject remains outside.

This is completely different with thinking. When I try to describe a thought that I have in my head, it is my personal thought, a subjective event that primarily only shows itself in my own mind and can only be expressed to a limited extent by words or mathematical formulae.

But it is precisely this resistance that I find appealing. After all, I wish to think ‘correctly’, and it is tempting to figure out how correct thinking works in the first place.

I could now fall back on mathematical logic. But the brain doesn’t work that way. How does it work, then? I have been working on this question for many decades, in practice, concretely in the attempt to teach the computer NLP (Natural Language Processing). The aim has been to find explicit, machine-comprehensible rules for understanding texts. Such understanding is a subjective process and, being subjective, cannot easily be made objective from the outside.

My computer programmes were successful, but the really interesting thing is the insights I was able to gain about thinking, or more precisely, about the logic with which we think.

My work has given me insights into the semantic space in which we think, the concepts that reside in this space and the way in which concepts move. But the most important finding concerned time in logic. I would like to look at this more closely, and to that end we first turn to paradoxes.

Real Paradoxes

Anyone who seriously engages with logic, whether professionally or out of personal interest, will sooner or later come across paradoxes. A classic paradox, for example, is the barber’s paradox:

The Barber Paradox

The barber of a village is defined by the fact that he shaves all the men, and only those men, who do not shave themselves. Does the barber shave himself? If he does, he is one of the men who shave themselves and whom he therefore does not shave. But if he does not shave himself, he is one of the men he shaves, so he does shave himself. That in turn makes him one of the men he does not shave. So he doesn’t shave himself, and so on. That’s the paradox: if he shaves himself, he doesn’t shave himself. If he doesn’t shave himself, he does.
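The back-and-forth described above can be sketched in a few lines of Python. This is only an illustration under a simple assumption: the question "does the barber shave himself?" is encoded as one truth value, and the names `satisfies_rule` and `iterate` are hypothetical helpers introduced here.

```python
# Hypothetical encoding: `shaves_self` is the truth value of
# "the barber shaves himself".
def satisfies_rule(shaves_self: bool) -> bool:
    # The rule: the barber shaves a man exactly when that man does not
    # shave himself. Applied to the barber himself, this demands that
    # the value equals its own negation.
    return shaves_self == (not shaves_self)

# Static view: no truth value is consistent with the rule.
consistent = [value for value in (True, False) if satisfies_rule(value)]

# Step-by-step view: re-applying the rule flips the answer every time.
def iterate(start: bool, steps: int) -> list[bool]:
    states = [start]
    for _ in range(steps):
        states.append(not states[-1])
    return states

print(consistent)        # []
print(iterate(True, 4))  # [True, False, True, False, True]
```

Seen statically, the rule has no solution; seen step by step, it yields an endless alternation, which is the "and so on" of the paragraph above.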

The same pattern can be found in other paradoxes, such as the liar paradox and many others. You might think that these kinds of paradoxes are far-fetched and don’t really play a role. But paradoxes do play a role, at least in two places: in maths and in the thought process.

Russell’s Paradox and Kurt Gödel’s Incompleteness Theorems

Russell’s paradox revealed a gap in set theory. His ‘set of all sets that do not contain themselves as elements’ follows the same pattern as the barber of the barber paradox and leads to the same kind of unsolvable paradox. Kurt Gödel’s two incompleteness theorems are somewhat more complex, but are ultimately based on the same pattern. Both Russell’s and Gödel’s results have far-reaching consequences in mathematics. Russell’s paradox meant that sets can no longer be formed freely from any property whatsoever, because this leads to untenable contradictions. Zermelo therefore restricted the way in which sets may be formed, and later axiomatisations (von Neumann, Bernays, Gödel) supplemented sets with classes; the perfectly closed nature of the original set theory was thus given up.
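In the usual textbook notation (a standard formulation, not taken from this post), the set in question and the contradiction it produces can be written as follows, with R standing for the set of all sets that do not contain themselves:

```latex
R = \{\, x \mid x \notin x \,\}
\qquad\Longrightarrow\qquad
R \in R \iff R \notin R
```

Asking whether R contains itself plays the same role as asking whether the barber shaves himself.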

Gödel’s incompleteness theorems, too, are ultimately based on the same pattern as the barber paradox. Gödel showed that every consistent formal system (formal in the sense of the mathematicians) that is powerful enough to express arithmetic must contain statements that can neither be formally proven nor disproven. A hard blow for mathematics and its formal logic.

Spencer-Brown and the “Laws of Form”

Russell’s refutation of the simple set concept and Gödel’s proof of the incompleteness of formal logic suggest that we should think more closely about paradoxes. What exactly is the logical pattern behind Russell’s and Gödel’s problems? What makes set theory and formal logic incomplete?

The question kept me occupied for a long time. Surprisingly, it turned out that paradoxes are not just annoying evils, but that it is worth using them as meaningful elements in a new formal logic. This step was demonstrated in exemplary fashion by the mathematician George Spencer-Brown in his 1969 book ‘Laws of Form’, which includes a maximally simple formalism for logic.


I would now like to take a closer look at the structure of paradoxes, as Spencer-Brown has pointed it out, and at the consequences this has for logic, physics, biology and more.

Continued in: Paradoxes and Logic (Part 2)

Translation: Juan Utzinger


 

AI: Vodka and tanks

AI in the last century

AI is a big buzzword today but was already of interest to me in my field of natural language processing in the 1980s and 1990s. At that time, there were two methods which were occasionally labelled AI, but they could not have been more different from each other. The exciting thing is that these two different methods still exist today and continue to be essentially different from each other.

AI-1: vodka

The first method, i.e. the one already used by the very first computer pioneers, was purely algorithmic, i.e. rule-based. Aristotle’s syllogisms are a paradigm of this type of rule-based system:

Premise 1: All human beings are mortal.
Premise 2: Socrates is a human being.
Conclusion: Socrates is mortal.

The expert posits premises 1 and 2, the system then draws the conclusion autonomously. Such systems can be underpinned mathematically. Set theory and first-order logic are often regarded as a safe mathematical basis. Theoretically, such systems were thus watertight. In practice, however, things looked somewhat different. Problems were caused by the fact that even the smallest details had to be included in the rule system; if they were not, the whole system would “crash”, i.e. draw completely absurd conclusions. The effort needed to correct these details grew disproportionately to the scope of the knowledge covered. At best, the systems worked for small special fields for which clear-cut rules could be found; when it came to wider fields, however, the rule bases became too large and were no longer maintainable. A further serious problem was the fuzziness which is peculiar to many expressions and which is difficult to grasp with such hard-coded systems.
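How a rule-based system draws the conclusion from the two premises can be sketched as a tiny forward-chaining loop. This is a minimal illustration, not a reconstruction of any historical expert system; the data layout (facts as pairs, rules as "every X is also Y") and the name `forward_chain` are assumptions made for this example.

```python
# Facts are (predicate, subject) pairs.
facts = {("human", "Socrates")}   # Premise 2: Socrates is a human being
# A rule (x, y) reads "everything that is x is also y".
rules = [("human", "mortal")]     # Premise 1: all human beings are mortal

def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, subject in list(derived):
                new_fact = (conclusion, subject)
                if predicate == premise and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived

result = forward_chain(facts, rules)
print(("mortal", "Socrates") in result)  # True
print(("mortal", "Plato") in result)     # False: nobody told the system about Plato
```

The second line of output hints at the maintenance problem described above: whatever the expert has not entered explicitly does not exist for the system.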

Thus this type of AI came in for increasing criticism. The following translation attempt may serve as an example of why this was the case. An NLP program translated sentences from English into Russian and then back again. The input of the biblical passage “The spirit is willing but the flesh is weak.” resulted in the retranslation “The vodka is good but the meat is rotten.”

This story may or may not have happened precisely like this, but it demonstrates the difficulties encountered in attempts to capture language with rule-based systems. The initial euphoria associated with the “electronic brain” and “machine intelligence” since the 1950s fizzled out; the expression “artificial intelligence” became obsolete and was replaced by the term “expert system”, which sounded less pretentious.

Later, in about 2000, the stalwarts of rule-based AI were buoyed up again, however. Tim Berners-Lee, the pioneer of the WWW, launched the Semantic Web initiative with the purpose of improving the usability of the internet. The experts of rule-based AI, who had been educated at the world’s best universities, were ready and willing to establish knowledge bases for him, which they now called ontologies. With all due respect to Berners-Lee and his efforts to introduce semantics to the net, it must be said that after almost 20 years, the Semantic Web initiative has not substantially changed the internet. In my view, there are good reasons for this: the methods of classic mathematical logic are too rigid to map the complex processes of thinking – more about this in other posts, particularly on static and dynamic logic. At any rate, both the classic rule-based expert systems of the 20th century and the Semantic Web initiative have fallen short of the high expectations.

AI-2: tanks

However, there were alternatives which tried to correct the weaknesses of rigid propositional logic as early as the 1990s. For this purpose, the mathematical toolkit was extended.

One such attempt was fuzzy logic. A statement or a conclusion was now no longer unequivocally true or false; rather, its truth could be weighted. Besides set theory and predicate logic, probability calculus was now also included in the mathematical toolkit of the expert systems. Yet some problems remained: again, there had to be precise and elaborate descriptions of the rules that were applicable. Thus fuzzy logic was also part of rule-based AI, even though it was equipped with probabilities. Today, such programs work perfectly well in small, well-demarcated technical niches, beyond which they are insignificant.
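What "weighted truth" can look like in code is sketched below. The example assumes truth values between 0 and 1 and combines them with the min/max operators, which are one common choice in fuzzy logic and not necessarily what the systems of that time used. The membership function `warm` and its thresholds are invented purely for illustration.

```python
def fuzzy_and(a: float, b: float) -> float:
    return min(a, b)

def fuzzy_or(a: float, b: float) -> float:
    return max(a, b)

def fuzzy_not(a: float) -> float:
    return 1.0 - a

# Hypothetical, hand-written membership function for "warm":
# 0 at or below 15 degrees Celsius, 1 at or above 25, linear in between.
def warm(temp_c: float) -> float:
    return min(1.0, max(0.0, (temp_c - 15.0) / 10.0))

print(warm(20.0))                                    # 0.5
print(fuzzy_or(warm(20.0), warm(30.0)))              # 1.0
print(fuzzy_and(warm(20.0), fuzzy_not(warm(30.0))))  # 0.0
```

Note that `warm` still had to be written by hand, threshold by threshold. The weighting softens the true/false distinction, but the rules themselves remain the expert's job.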

At that time, another alternative was constituted by the neural networks. They were considered to be interesting; however, their practical applications tended to attract some derision. To illustrate this, the following anecdote was bandied about:

The US Army – which has been an essential driver of computer technology all along – is supposed to have set up a neural network for the identification of US and foreign tanks. A neural network operates in such a way that the final conclusions are found through several layers of conclusions by the system itself. People need not input any rules any longer; they are generated by the system itself.

How is the system able to do this? It requires a learning corpus for this purpose. In the case of tank recognition, this consisted of a series of photographs of American and Russian tanks. Thus it was known for every photograph whether it was American or Russian, and the system was trained until it was capable of generating the required categorisation itself. The experts only exerted an indirect influence on the program in that they established the learning corpus; the program compiled the conclusions in the neural network autonomously, without the experts knowing precisely what rules the system used to draw which conclusions from which details. Only the result had to be correct, of course. Now, once the system had completely integrated the learning corpus, it could be tested by being shown a new input, for instance a new tank photo, and it was expected to categorise the new image correctly on the basis of the rules it had found in the learning corpus. As mentioned before, this categorisation was conducted by the system on its own, without the experts exerting any further influence and without them knowing how conclusions were drawn in a specific case.

It was said that this worked perfectly with regard to tank recognition. No matter how many photos were shown to the program, the categorisation was always spot on. The experts could hardly believe that they had really created a program with a 100% identification rate. How could this be? Ultimately, they discovered the reason: the photos of the American tanks were in colour, those of the Russian tanks were in black and white. Thus the program only had to recognise the colour; the contours of the tanks were irrelevant.
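The effect in the anecdote can be reproduced with a toy learner. The sketch below deliberately uses a one-rule classifier (a "decision stump") instead of a neural network, because it fits in a few lines and makes the learned rule visible. All numbers are invented: each photo is reduced to two hypothetical features, a colour saturation and a made-up "shape score".

```python
# Each sample: ((colour_saturation, shape_score), label). Invented data in
# which the US photos are in colour and the Russian ones in black and white.
training_corpus = [
    ((0.9, 0.60), "US"), ((0.8, 0.40), "US"), ((0.7, 0.55), "US"),
    ((0.0, 0.45), "RU"), ((0.0, 0.50), "RU"), ((0.0, 0.35), "RU"),
]

def train_stump(samples):
    """Find the single feature and threshold that best split the corpus.

    Assumes exactly two labels. Returns
    (accuracy, feature_index, threshold, label_above, label_below).
    """
    labels = sorted({label for _, label in samples})
    best = None
    for f in range(len(samples[0][0])):
        values = sorted({x[f] for x, _ in samples})
        # Candidate thresholds: midpoints between neighbouring values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for above in labels:
                below = next(l for l in labels if l != above)
                hits = sum((above if x[f] > t else below) == y
                           for x, y in samples)
                accuracy = hits / len(samples)
                if best is None or accuracy > best[0]:
                    best = (accuracy, f, t, above, below)
    return best

def predict(stump, x):
    _, f, t, above, below = stump
    return above if x[f] > t else below

stump = train_stump(training_corpus)
print(stump[0], stump[1])                # 1.0 0  (perfect score, using feature 0: colour)
print(predict(stump, (0.0, 0.60)))       # RU: a US-shaped tank in black and white
print(predict(stump, (0.8, 0.40)))       # US: any photo in colour
```

The learner reaches 100% on its corpus by looking at the colour alone, and it misclassifies as soon as that accidental regularity no longer holds.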

Rule-based vs corpus-based

The two anecdotes show what problems were lying in wait for rule-based and corpus-based AI at the time.

  • In the case of rule-based AI (vodka), they were
    – the rigidity of mathematical logic,
    – the fuzziness of our words,
    – the necessity to establish very large knowledge bases,
    – the necessity to use specialist experts for the knowledge bases.
  • In the case of corpus-based AI (tanks), they were
    – the lack of transparency of the paths along which conclusions were drawn,
    – the necessity to establish a very large and correct learning corpus.

I hope that the two examples above, which admittedly are somewhat unfair, have allowed me to describe the characters and modes of operation of the two AI types, including the weaknesses which characterise each type.

Needless to say, the challenges persist. In the following posts I will show how the two AI types have responded to them and where the intelligence now really resides in the two systems. To begin with, we’ll have a look at corpus-based AI.


This is a blog post about artificial intelligence.

Translation: Tony Häfliger and Vivien Blandford