AI in the 20th Century — Part 1

Original article was published by Sreshta Putchala on Artificial Intelligence on Medium


We made great strides in AI in the 1950s. Alan Turing proposed the Imitation Game, and Claude Shannon published a paper on chess-playing programs. The first game-playing program was written in 1951 at the University of Manchester by Christopher Strachey. The program that became well known, however, was Arthur Samuel’s checkers player, which he demonstrated at the Dartmouth Conference in 1956.

Let’s play games!

The basic components of a program to play checkers, or similar board games, are a search algorithm that peers into the future, examining all variations, and an evaluation function that judges how good a board position is. Samuel’s program was the first to incorporate learning; it would improve with every game it played. This was achieved by tuning the parameters of the evaluation function based on the outcome of each game. This matters because the program ultimately chooses between alternative moves based on the evaluation function’s values: the better the evaluation, the better the choice of move. The program also remembered the values of board positions seen earlier. Samuel is one of the people who invented Alpha-Beta pruning, which can drastically cut down on the search effort. The program became good enough to beat its creator. One can imagine the kinds of fears that might have been raised amongst people after Frankenstein and R.U.R.
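The two components described above, a search that looks ahead and an evaluation function applied at the frontier, can be sketched in a few lines. This is a minimal illustration, not Samuel’s actual program: the “game tree” is a toy nested list whose leaves stand for already-computed evaluation values, and the function names are invented for the example.

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Return the minimax value of `node`, using Alpha-Beta pruning to
    skip branches that cannot influence the final choice of move."""
    if isinstance(node, (int, float)):   # leaf: an evaluation-function value
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:            # opponent would never allow this line
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if beta <= alpha:            # we would never choose this line
                break
        return value

# A textbook two-ply tree: three moves for us, each answered by two replies.
tree = [[3, 5], [6, 9], [1, 2]]
print(alphabeta(tree, True))  # → 6
```

The pruning pays off in the third branch: once its first reply scores 1, worse than the 6 already guaranteed, the remaining reply is never examined. Over deep trees this cutoff compounds, which is why Alpha-Beta so drastically reduces the search effort.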

At that time, computers were small (in memory size; physically, they were huge) and slow, and computing was still an esoteric field. The earliest proponents of AI concerned themselves with logic, language, and games. Chess and checkers were already a fascination and considered hallmarks of intelligence. Alex Bernstein had spoken about chess at Dartmouth and developed the first chess-playing program in 1957. There was optimism amongst many that machines would soon be the best chess players around, but also skepticism. Herbert Simon (1916–2001) and Allen Newell (1927–1992) predicted that a computer would be world champion within ten years. In 1968, David Levy, an international master, wagered that no machine would beat him in the next ten years, and won the bet by beating the strongest program of the time, Chess 4 from Northwestern University. However, Levy later lost to Deep Thought, originating from Carnegie Mellon University, and in 1997 its successor Deep Blue from IBM beat the then reigning world champion, Garry Kasparov.

While game-playing programs worked with a numerical evaluation function, processing natural language beckoned as a new avenue of exploration. Noam Chomsky had been developing his theory of generative linguistics and moving on to transformational grammar. Armed with a grammatical model of language, and aided by a bilingual lexicon, machine translation became an area of research, much to the public’s interest. However, the earliest programs were unable to contend with the richness of language, particularly when dealing with idioms. It was reported in the press that on translating “the spirit is willing, but the flesh is weak” into Russian and back into English, the output read “the vodka is fine, but the meat has got spoiled”, or something to that effect.

How to Understand Language?

Interestingly, a natural language parser developed in 1963 by Susumu Kuno at Harvard revealed a degree of ambiguity in the English language that often escapes human listeners, who latch on to one parse tree and the corresponding meaning. Given the input “Time flies like an arrow”, the parser produced at least four interpretations other than the one we usually assume. In 1966, the Automatic Language Processing Advisory Committee (ALPAC), appointed by the US government, delivered a negative report, leading to a decline in machine translation research funding. Like many other ideas, machine translation revived only with the increased availability of computing power in the latter half of the century.

Logic at the very core!

One task at which computers were successful was logical reasoning. At the Dartmouth Conference in 1956, Herbert Simon and Allen Newell, along with J. C. Shaw, demonstrated a working theorem prover called LT (Logic Theorist). The Logic Theorist proved many of the theorems in Principia Mathematica, even finding shorter and more elegant proofs for some of them. An attempt to publish a new proof in the Journal of Symbolic Logic, however, failed because a paper co-authored by a program was not acceptable! Another system that showed promise was a geometry theorem prover built by Gelernter in 1959. However, these theorem provers were based on search, and they faced its nemesis: an exponentially growing search space. Like many AI problems, geometry theorem proving too met a revival many years later. In 1961, James Slagle wrote the first symbolic integration program, SAINT, which formed the basis of many symbolic mathematics tools.

Perhaps Newell and Simon’s most significant contribution was their program GPS (General Problem Solver), which addressed general-purpose problem solving based on human thought processes. Their strategy, called Means-End Analysis (MEA), embodied a goal-directed search in which the problem solver repeatedly looks for methods (means) to achieve the most significant outstanding goal (end), till all goals are solved. Their work found a home at Carnegie Mellon University (CMU). It was first implemented in a production system language, OPS5, used to build expert systems. Subsequently, John Laird and Paul Rosenbloom built a general symbolic problem-solving architecture known as SOAR, which became a popular tool.
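The goal-directed loop of Means-End Analysis can be sketched compactly: pick an unmet goal, find an operator that achieves it, and recursively satisfy that operator’s own preconditions first. The domain below (making tea) and its operator table are entirely invented for illustration; GPS itself worked with a far more general difference table.

```python
# Each entry maps a goal to the operator (the "means") that achieves it,
# together with the preconditions that operator requires.
OPERATORS = {
    "have_tea":       {"pre": ["have_hot_water", "have_teabag"], "name": "brew tea"},
    "have_hot_water": {"pre": ["have_kettle"],                   "name": "boil water"},
    "have_kettle":    {"pre": [],                                "name": "fetch kettle"},
    "have_teabag":    {"pre": [],                                "name": "fetch teabag"},
}

def solve(goal, state, plan):
    """Achieve `goal`, recursively reducing it to subgoals first."""
    if goal in state:                      # end already met: nothing to do
        return True
    op = OPERATORS.get(goal)
    if op is None:                         # no means available for this end
        return False
    for pre in op["pre"]:                  # achieve every precondition first
        if not solve(pre, state, plan):
            return False
    state.add(goal)                        # applying the operator meets the goal
    plan.append(op["name"])
    return True

plan = []
solve("have_tea", set(), plan)
print(plan)  # → ['fetch kettle', 'boil water', 'fetch teabag', 'brew tea']
```

The recursion mirrors the text: each “end” pulls in the “means” that reduces the difference between the current state and the goal, until no difference remains.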

The emergence of neural networks!

Neural networks, an emergent-systems approach to problem solving built on the belief that a collection of interconnected simple processing elements is more than the sum of its parts, also appeared in those early years. Unlike classical AI systems, which are implemented in a top-down manner, neural networks are built by connecting “neurons” according to a certain architecture and then learning the weights of these connections through training. All the knowledge gets captured in these weights, though it is not straightforward to interpret them meaningfully. The first of these systems was the Perceptron, built in 1957 by Frank Rosenblatt (1928–1971) at Cornell University. The Perceptron was essentially a single-layer feed-forward neural network and could learn to classify certain patterns. However, Minsky and Papert (1969) gave a mathematical proof that the Perceptron could handle only a restricted class of patterns (those that are linearly separable), which proved to be a dampener for neural networks. It was only in the mid-eighties, with the revival of the Backpropagation algorithm for training multilayer neural networks by Rumelhart and colleagues, that research in neural networks came to the fore again, and with bigger and faster machines available, it was quite the rage amongst researchers in the nineties.
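A single-layer perceptron of the kind described above fits in a dozen lines: a weighted sum passed through a threshold, with the weights nudged after every mistake. This sketch (with invented parameter choices) learns the linearly separable AND function; running the same loop on XOR never converges, which is exactly the limitation Minsky and Papert formalized.

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Rosenblatt-style learning rule for two inputs: on each error,
    shift the weights toward (or away from) the misclassified input."""
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x0, x1), target in samples:
            out = 1 if w0 * x0 + w1 * x1 + b > 0 else 0
            err = target - out           # 0 when correct, +1 or -1 on a mistake
            w0 += lr * err * x0
            w1 += lr * err * x1
            b += lr * err
    return w0, w1, b

# AND is linearly separable, so the learned line classifies it perfectly.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w0, w1, b = train_perceptron(AND)
preds = [1 if w0 * x0 + w1 * x1 + b > 0 else 0 for (x0, x1), _ in AND]
print(preds)  # → [0, 0, 0, 1]
```

All the “knowledge” the network acquires lives in the three numbers `w0`, `w1`, `b`, which is also why, as the text notes, inspecting trained weights rarely yields a human-readable explanation.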

Here cometh the winter!

A hiatus called the “AI winter” set in with the drying up of funds in the late eighties, mainly due to the immense hype that AI projects had generated only to disappoint. Meanwhile, personal computers were growing by leaps and bounds, rendering Lisp machines, and even the language Lisp itself, the forte of a dedicated but small community.

(to be continued…)