Conceptual and Computational Intractability: Assessing the Relevance of Kolmogorov Complexity to Human Psychology

 

Pauli Salo

Cognitive Science

Department of Computer Science

University of Jyväskylä

 

 


 

 

 

Abstract

This article assesses the relevance of complexity, both computational complexity and Kolmogorov complexity, to human psychology. It is argued that the complexity problems, especially those associated with computational models, result from a residuum of false empiricist presuppositions holding that the complex behavioural repertoire of organisms is an emergent product of simple principles, whether in terms of gene regulation, environment or general learning algorithms, and that there is much empirical evidence that the kind of complexity characterizing human psychology is irreducible and hence computationally and conceptually intractable. Kolmogorov complexity is argued to be especially relevant to the matter. The source and nature of such complexity are discussed.

 

Keywords

Kolmogorov complexity, computational complexity, computational models, empiricism, learning, frame problem


 

 

 

 

1 Introduction

When a scientific concept is brought from its original context and applied to a new one, it often transforms into a metaphor. Such might be, at least in part, the case with the “computer metaphor” of the human mind (Boden, 1981, §1), originally invented for the purposes of attaining rigor in mathematics and mathematical logic through the syntactic manipulation of concrete symbols. True enough, the inventors themselves were quite occupied with an analysis of human thinking (Turing, 1936, 1950), and many of the most profound and innovative results concerning the human mind have been obtained by using these techniques (Chomsky, 1957, Marr, 1982, Rumelhart & McClelland, 1986), enough to warrant Steven Pinker's claim that the computational theory of the mind:

has solved millennia-old problems in philosophy, kicked off the computer revolution, posed the significant questions of neuroscience, and provided psychology with a magnificently fruitful research agenda (Pinker, 1997, p. 77).

After Alan Turing’s seminal paper (1936) on the topic of “computational intelligence,” the first thoughts in this direction began to emerge in the early forties and fifties, after the war (e.g. Hull, 1937, 1943, McCulloch & Pitts, 1943, Rochester, Holland, Haibt, & Duda, 1956, Turing, 1948/1968, Turing, 1950, Shannon, 1950b), but the idea really began to blossom in the sixties. Two prominent ideas shaped the psychological landscape at that time.

The first was the family of computational neural models and cybernetic models, forerunners of modern computational neuroscience and especially of what is known as ‘connectionism’ (Ashby, 1940, Baernstein & Hull, 1931, Bradner, 1937, Bennett & Ward, 1933, Crozier, 1929, Ellson, 1935, Hebb, 1949, Kreuger & Hull, 1931, Lotka, 1925, McCulloch & Pitts, 1943, Minsky, 1954, Neumann, 1958, Rashevsky, 1938, Ross, 1938, Stephens, 1929, Tolman, 1939, Troland, 1928, Rosenblatt, 1958, 1962, Rosenblueth, Wiener, & Bigelow, 1943, Selfridge, 1959, Walton, 1930; Dewey, 1896 is an important precursor), built in much the spirit of the ‘cybernetic’ models of the mind (see Shannon, 1948, Wiener, 1948, Wisdom, 1951). Behaviorists had been exploring the same associative techniques with their more restricted methodology (Hull, 1943).

The second idea is the conception of the mind as a ‘symbol manipulation device,’ a notion brought into psychology directly from the ideas of Turing, Hilbert and Leibniz, among others. The idea was to view the mind as a mechanism for storing and manipulating symbols and symbol structures (Chomsky, 1957, Craik, 1952, Feigenbaum & Feldman, 1963, Galanter & Gerstenhaber, 1956, Galanter & Smith, 1958, Gelernter & Rochester, 1958, Gelernter, 1959, Miller, Galanter, & Pribram, 1960, Newell, Shaw, & Simon, 1958a, Newell, Shaw, & Simon, 1958b, Newell & Simon, 1961, 1963, Turing, 1936). It was again the digital computer that remobilized the idea of ‘symbol manipulation:’ the early computers were used to compute numbers, but computers are not restricted to the manipulation of numbers; they can compute with formal symbols that represent, or ‘stand for,’ almost anything of human concern. In short, computers were able to solve problems of almost any imaginable kind rather than just missile trajectories and cryptoarithmetic problems.

The first area to implement the idea was chess and other simple games (Bernstein & Roberts, 1958, Kister, Stein, Ulam, Walden, & Wells, 1957, Newell, 1955, Newell et al., 1958b, Samuel, 1959, Shannon, 1950a), and it is not a long step from here to the claim that such machines are capable of thinking. This idea came from Turing, and culminated in Newell and Simon’s idea of the General Problem Solver (Newell & Simon, 1963).

Yet, contrary to what Pinker claims in the passage cited above, it is far from clear today how far this concept gets us in revealing the principles that cover what we may think of pretheoretically as ‘human psychology’ (Chomsky, 2000, Dreyfus, 1972, Fodor, 1983, 2000). One negative, and what I believe to be a most important, aspect of this issue concerns the phenomenon of computational complexity, or the “scaling problem” in its various forms. I will discuss the relevance of complexity to the notion of computational psychology and to psychology in general, addressing computational complexity as well as Kolmogorov complexity, with the emphasis on the latter, which has evoked some recent discussion (Bennett, 1988, Chaitin, 1979, Chater, 1996, 1999, Sambrook & Whiten, 1997). The importance of complexity here is that if the computational models of the human mind seem to fail systematically because of a certain kind of complexity, then it is possible that there is something in our mind, not just in the choice of the wrong models, which causes this to happen.

Here it must be mentioned that even if the most naïve enthusiasm towards this theory lies already in the past, and even if many psychologists are not involved in mathematical modelling, psychological theories are full of concepts borrowed from the realm of computation, such as “processors,” “subsystems,” “buses,” “information channels,” “mechanical homunculi,” “central executives,” “constituent structures,” “symbol manipulation,” “heuristics,” and “recursive search,” to mention only a few. Moreover, computers have made possible the execution of concrete computer simulations of psychological processes, thus offering a concrete platform for testing theories of remarkable computational complexity (one of the very first simulations was a simulation of Donald Hebb’s cell-assembly theory, Hebb, 1949, by Rochester et al., 1956).

1.1 The problem with complexity

Chess is a particularly easy task-environment to begin with because it is well-defined and yet, as is well known, a complex, nontrivial task. But in exactly what sense is it claimed to be ‘complex’? According to de Groot, already Selz (1922) tried, well before the appearance of concrete computational models, to “explain the course of the thought process as a strictly determined succession of operation carried out an obviously active subject” (this citation from de Groot, 1965, p. 51) such that it would yield “a complete (literally: ‘gapless’) description of the causal connections that govern the total course of intellectual and/or motor processes” (pp. 13-14). When this idea – certainly one important aspect of the computer metaphor of the mind – was turned into concrete causal models of chess players’ thinking in the 50’s, it became apparent that the tasks, both the simulation of the chess player’s thinking and chess playing itself, were complex in the sense that an enormous amount of computational resources, time in particular, was needed to beat a human player (Shannon, 1950b, Newell et al., 1958b, Stockmeyer & Chandra, 1979; see also Hartmanis & Stearns, 1965, Stearns, 1988). This is problematic not only in the sense that it often prohibits one from constructing actual simulations, but also from the psychological point of view, since human players are, compared to computers, notoriously effective in their thinking.

From a mathematical point of view, chess turned out to be complex because a given chess board position does not itself contain any trivial regularity from which the next good move could be computed. Rather, as Luc Longpré pointed out, “the non-randomness [regularity] is so intricately buried that any space or time bounded process would fail to detect it” (Longpré, 1992, p. 69). For a computer to detect a good move, it must evaluate millions of positions, while human experts seem to evaluate only approximately 50, maintaining a reasonable level of play (Newell & Simon, 1976).

Moving from chess to more realistic, real-world thinking, it, too, appeared to involve complexities not anticipated by many. The well-known ‘frame problem’ is a good example (McCarthy & Hayes, 1969, Pylyshyn, 1986). The frame problem originates in the following situation. A computer operates based only on the syntactic properties of the symbols it manipulates, while those symbols are assumed to represent the world, these semantic relations emerging somehow from outside of the system itself. Suppose, to keep this distinction in mind, that a robot wants it to be the case that P(a), where P(a) represents some fact. The robot knows that some facts are true or false, and it has a set of inferential rules and potential actions which allow it to carry out actions and, in doing so, change the facts of the world in order to bring about P(a). For instance, in order to get into a conversation with its friend by phone, the robot first needs to look up the phone number from the phone book, dial the number, and so on. Thus, having the premises, the inferential rules and the actions available, the robot is required to deduce a correct solution to the problem of obtaining P(a), and, after carrying out the deductions, initiate the plan. The problem is that a priori the robot cannot know which of the facts, inferential rules and actions are relevant for obtaining P(a). In order to determine the relevance of some particular fact the computer must engage in a phenomenally complex task of carrying out the deductions for each of the facts – the result of which is often just the annoying finding that the particular fact was not relevant. In order to solve this problem it was suggested that the relevant facts be “framed” beforehand so that only they were considered, yet this strategy fails since, as remarked by Fodor, nearly any fact could be relevant in any situation (Fodor, 2000, pp. 37-38). Hence the name “frame problem.” If you restrict relevance beforehand, you get notoriously unreliable robots, and if not, they are notoriously slow; humans are, in contrast, both reliable and effective.
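To make this combinatorial predicament concrete, consider the following minimal sketch (in Python); it is not any published formalization of the frame problem, and the knowledge base, the single toy rule, and the goal are all invented here for illustration. Lacking any prior “frame,” the robot can only test candidate premise sets one by one, and the space of candidates grows exponentially with the size of its knowledge base.

    from itertools import combinations

    facts = [f"fact_{i}" for i in range(20)]                 # hypothetical knowledge base
    rules = {("fact_3", "fact_7"): "phone_call_made"}        # one toy inference rule

    def entails(premises, goal):
        """Toy deduction: the goal follows if some rule's premises are all present."""
        return any(concl == goal and all(p in premises for p in prem)
                   for prem, concl in rules.items())

    checked = 0
    found = None
    for k in range(len(facts) + 1):                          # try ever larger premise sets
        for subset in combinations(facts, k):
            checked += 1
            if entails(subset, "phone_call_made"):
                found = subset
                break
        if found:
            break

    print("relevant premises:", found)
    print("premise sets examined:", checked, "of", 2 ** len(facts), "possible")

Restricting the candidate sets in advance corresponds to the “framing” strategy mentioned above, and brings with it the unreliability Fodor notes.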

These two problems, the problem with chess and the problem with common-sense inference, together with their numerous elucidations and variations, are sufficiently similar to suggest a similar treatment – there is a pattern in these failures. In both cases, when the inferential processes are analysed in terms of syntactic computation, a vast computational explosion results. Nevertheless, in both cases humans are notoriously effective and reliable. In fact, it almost seems as if humans could not be effective: that these task environments are computationally complex by the standards of mathematical rigor means that one could not possibly compute them effectively.

One should not be misled by the fact that both our examples involve what can be called ‘symbolic computations,’ that is, computational processes defined over recursively defined formal languages. The currently popular ‘neural networks’ are no exception in this respect. Indeed, when the first wave of connectionist models, beginning with Rosenblatt’s Perceptron (Rosenblatt, 1958), was shown to be defective by Minsky and Papert (Minsky & Papert, 1969), the models were defective precisely in the sense that they were computationally too complex and, when not complex, too simple in their behavioural repertoire. Though the situation got somewhat better with the new models in the 80’s (Hopfield, 1982, Rumelhart & McClelland, 1986; see papers in Anderson & Rosenfeld, 1988), the problem with computational complexity persists and has now become even worse (see Judd, 1990, 1996, Kolen & Goel, 1991, Minsky & Papert, 1990, Tesauro, 1987, Tesauro & Janssens, 1988, Redding, Kowalczyk, & Downs, 1993). Learning in a network is either vastly complex or, if it isn’t, applicable only to very simple stimulus-response pairings (classifications), as was put by Stephen Judd: “It is widely acknowledged that as networks get larger and larger and deeper, their learning time grows prohibitively” so that “whenever things get scaled up, the news regarding training time is so bad that we can hardly measure it” (Judd, 1996, p. 123). Furthermore, while the classical theory still fares rather well in some selected areas, such as modelling the competence of the speaker/hearer of a natural language, connectionism, committed as it is to what are called ‘Markov models,’ is already hopeless (Bever, Fodor, & Garrett, 1968, Chomsky & Miller, 1958, Fodor & Pylyshyn, 1988, Miller & Chomsky, 1963).[1] I will return to connectionism and the complexity of learning later.

Returning to the problem of computational complexity in problem solving, whether in the domain of chess or in routine common-sense inference, what the human mind does there seems nearly miraculous: the tasks are complex, implying, with mathematical certainty, that there ought not to exist effective algorithms, whether performed by our brains or by computers. Not surprisingly, there is a loophole here that seems to have some consequences worth a closer look.

When some problem is said to be “computationally complex” this usually means that, in order to design an algorithm that can solve the problem in every case, the amount of either time or space (or both) the algorithm requires in the worst case increases quite fast as a function of the size of the problem instance. That is, if you are to find the shortest path through some set of cities such that every city is visited once and only once, the time it takes to find that path grows exponentially with the number of cities, owing to the increase in the combinatorial possibilities that must be searched exhaustively. However, if there are only a finite number of instances of the given problem, i.e., city maps, then, from a mathematical point of view, the computation does not require any resources: it is possible to pair each of those maps directly with its shortest path by means of what is called a ‘table-lookup method’ and write this table into the memory of the computer. The same is true of chess, where, as is well known, there are only a finite number of board positions to consider, though the number of those positions is, of course, very large. Now when it is proved that chess is computationally complex (e.g., Stockmeyer & Chandra, 1979), this is understood in the sense that the board is assumed to be of arbitrary size, not restricted to the standard 8 x 8 board. We say that our standard chess is thus a “finite instance” of the more general problem of chess. Much the same could be true of common-sense reasoning and learning, where the “finite instance” might be determined by the actual ‘causal and logical laws that happen to be true of the actual world,’ making us effective only on this instance, perhaps owing to some evolutionary tinkering.
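The two senses just distinguished can be illustrated with a minimal sketch (in Python, with made-up city coordinates): the general shortest-path problem is solved here by exhaustive search, whose running time grows factorially with the number of cities, whereas one fixed finite instance can be answered by a precomputed table that requires no search at all, only storage.

    from itertools import permutations
    from math import dist

    def shortest_tour(cities):
        """Brute force: try every ordering of the cities (a path, not a cycle)."""
        best = None
        for order in permutations(cities):
            length = sum(dist(order[i], order[i + 1]) for i in range(len(order) - 1))
            if best is None or length < best[0]:
                best = (length, order)
        return best

    cities = [(0, 0), (2, 1), (1, 3), (4, 2), (3, 0)]
    print("search over 5! orderings:", round(shortest_tour(cities)[0], 3))

    # The "table-lookup" alternative for one finite instance: build the answer
    # offline once, then answer in constant time with no search at all.
    precomputed = {tuple(cities): shortest_tour(cities)}
    print("table lookup:", round(precomputed[tuple(cities)][0], 3))

The table, of course, only shifts the burden from computing time to the amount of stored information, which is exactly the issue pursued below.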

Thus, some authors have dismissed computational complexity as irrelevant on these grounds (Anderson, 1990, pp. 38-40), but too quickly, I believe. Finite instances can be simple or complex as well, though in a somewhat different sense. Consider the following table-lookup pairing ⟨1, 1⟩, ⟨2, 0⟩, ⟨3, 1⟩, ⟨4, 0⟩, ⟨5, 1⟩, . . . ⟨n, m⟩, n and m being natural numbers, where each odd number is paired with 1 and each even number with 0. Instead of asking how much time is required to solve this finite instance (namely, none), we may ask how much programming code is minimally required to implement it. In our case, assuming that the list is sufficiently long, it suffices to implement the (short) algorithm that determines whether a number is even or odd, plus the numbers n and m. That this algorithm is ‘short’ reflects the fact that this sequence would make a trivial game; similarly, it reflects the fact that, since chess and common-sense inference are so nontrivial, maybe, even if they both represent finite problem instances, they are complex in the sense in which the above pairing is not.
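A minimal sketch (in Python) makes the point about the parity pairing: however long the list, it is reproduced by a couple of lines of code plus the bound n, so its minimal description is tiny relative to its length.

    def parity_table(n):
        """Reproduce the first n pairs <k, 1 if k is odd else 0> from a very short program."""
        return [(k, k % 2) for k in range(1, n + 1)]

    print(parity_table(10))   # [(1, 1), (2, 0), (3, 1), ..., (10, 0)]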

In order to understand this notion of complexity we turn to the formalism known as Kolmogorov complexity or instance complexity. I will explain the basics of this formalism in the next section, and then return to its psychological implications and to various other claims that have been put forward concerning the significance of Kolmogorov complexity for psychology, plus other topics that have been raised in connection with complexity on a more general level.

2 Kolmogorov complexity in machines and men

The intuition is that a finite instance is simple when the algorithm that implements it, either by recognizing its members or by generating it as a list, is ‘short.’ Since obviously the algorithm need not be longer than the list itself, we may say that simple instances are those that can be compressed by means of the algorithms that, say, print them out. We may formulate this requirement by fixing some computational language or ‘specification method’ as a recursive function f from programs (plus inputs) to their outputs (products) and, presupposing f, saying that the size |p| in bits of the minimal program p which prints some x is the instance complexity of x:

Complexity(x) = min{|p| : f(p) = x}

If x can be generated by a computer program that is small relative to the size of x itself, then x is regular and simple. Otherwise it is complex. An obvious defect of this working definition is that it depends on the choice of f. A given string x may be simpler in terms of LISP than in terms of PASCAL, and we do not want the measure to be relative to some particular specification method. In fact, for each x it is possible to find some specification method for which Complexity(x) = 0. However, if we allow the specification method to vary freely, we obviously need to include it in the complexity measure as well. If x is short when coded in LISP, then the complexity of x must also include the information about which programming language, namely LISP, it is coded in. What, then, is the complexity of the LISP interpreter? Written in PASCAL?

A Universal Turing Machine (UTM) can be used to settle the matter. The UTM is a universal specification method which can simulate any other specification method (a computer and its input). Suppose that the input of the UTM contains the code of the particular specification method f (e.g., LISP vs. PASCAL) together with a program p, and that the UTM simulates p on f to print the given string x. Then we seek the minimal combination of f and p:

ComplexityUTM(x) = min{|pf| : UTM(p, f) = x}.

To be accurate, we also need information to tell p and f apart from each other, but we may omit this technicality here. It can now be proven that, when compared to any other specification method f, the complexity computed by this method can differ from it only by a constant, depending only on the choice of f, namely, by the size of the code required to specify f. That is, the difference in complexity between the universal method and LISP, for instance, can only be a constant. Hence we may take ComplexityUTM(x) to measure the complexity of any string, knowing that no other method would improve the result infinitely often. This is the invariance theorem of Kolmogorov complexity, established independently during the 1960s by R. Solomonoff (1964), A. N. Kolmogorov (1965), and G. Chaitin (1969). Let us denote the Kolmogorov complexity of x, as defined above, by “C(x)”.

This measure has the following properties, relevant to the present discussion. First, it determines how much a given string can be compressed in terms of algorithms and, as a consequence, how many computable invariances there are in it. For this reason, Kolmogorov complexity can be used to define “randomness”: a string can be said to be “random” if it is its own shortest description. The decimals of π are by this criterion non-random or, better, pseudo-random, since they can be produced by a small program; yet by many other tests the sequence is random. I will be using “random” and “pseudo-random” in this sense, reserving the term “random” only for strings that are Kolmogorov-complex. This is important, for I am going to argue that the human mind is complex in this quite strict, “non-reductive” sense.
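Although C(x) itself is uncomputable, the intuition can be illustrated with a minimal sketch (in Python) that uses compressed length as a crude, computable upper bound on description length; the choice of zlib and the string sizes are incidental. A rule-generated string shrinks to almost nothing, whereas bytes with no exploitable pattern hardly shrink at all.

    import os
    import zlib

    regular = b"01" * 50_000          # 100,000 bytes generated by a short rule
    random_ = os.urandom(100_000)     # 100,000 bytes with no exploitable pattern

    for name, s in [("regular", regular), ("random", random_)]:
        print(name, len(s), "->", len(zlib.compress(s, 9)))
    # The regular string compresses to a few hundred bytes at most; the random
    # one stays close to its original length. Compression gives only an upper
    # bound on C(x), but the contrast is the relevant point.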

Secondly, Kolmogorov complexity can be used to capture the property of instance complexity, as discussed above, by seeking the smallest algorithm that can define the given instance, say in chess (Orponen, Ko, Schöning, & Watanabe, 1994). Some instances are more complex than others with respect to this measure, even if they are finite and, from the perspective of computational complexity, not complex at all. Indeed, Kolmogorov complexity applies specifically to the measurement of finite objects.

For a more comprehensive survey of the properties of Kolmogorov complexity, see Li & Vitányi, 1997. Returning to the case of chess and related task-environments, we noted that, when considered as problems with instances of arbitrary size, they are computationally complex. This means that their regularity is “buried deep” enough to force the algorithm to compute; in other words, the surface properties of the board positions do not carry simple information about the next move. What happens, then, when one does not compute (millions of variations), as one might speculate is the case with effective human cognition? From the perspective of Kolmogorov complexity, these problems, or their relevant finite instances, become more and more irregular or even random.[2] If computations are required in order to “see” the invariances, and if there are no computations, there are no invariances to be seen. That is, a human player cannot “directly see” any logic in the board. This means that, in order to play effective and feasible chess, or to solve the decision problems of real life, one cannot but use brute information – algorithms with high information content. In other words, the processes – or better, the underlying psychological mechanisms – must be complex in terms of Kolmogorov complexity, hence quite literally complex: there is no short algorithmic description – a computational theory – of the process.

The reverse is also true. Suppose one wants to simulate problem solving in a task-environment which requires long algorithms but, instead, uses only very short ones. Then we expect such models to fail, either by being notoriously unreliable or by being notoriously slow – both properties which currently seem to characterise all kinds of computational models. It is then no accident that the popular computational models in use really are simple in terms of their Kolmogorov complexity.

For instance, a typical neural network is constituted by a simple piece of code that describes a single formal neuron, a piece of code that copies that neuron everywhere, and pseudo-random connections between the neurons. The result is a large network which is only apparently complex but, on closer inspection, very simple; it is simple in that it is defined by a short programming code. These models do not scale up, unsurprisingly. The same is true of more symbolic models that use recursion, an ingenious process that uses the same, often simple, code plus a block of structured memory over and over again. If the tasks, to repeat, are complex, then these models – or any models that are too simple – could not succeed, either as simulations or as explanations of the underlying mental processes.
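The point about networks can be put in the form of a minimal sketch (in Python, with NumPy assumed to be available; the network size and seed are arbitrary): the apparently huge object below is entirely generated by a few lines of code plus one integer, so its shortest description stays short no matter how many parameters it displays.

    import numpy as np

    def make_network(n_units, seed=0):
        """One repeated formal neuron plus pseudo-random connections between the copies."""
        rng = np.random.default_rng(seed)
        weights = rng.standard_normal((n_units, n_units))  # n_units**2 "parameters"
        activation = np.tanh                               # the same simple unit everywhere
        return weights, activation

    W, f = make_network(1000)
    print(W.size, "connection weights, all determined by a few lines of code and one seed")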

Importantly, this also rules out various once-popular computational heuristic strategies. A heuristic strategy is one which trades reliability for effectiveness, but it presupposes that the programming code itself – the principles used to implement the heuristic search – is descriptively simple. If it is true that the task-environment is complex, then it is also true that there is no simple heuristic solution available that would match the reliability and effectiveness of the human mind.

There is a trend in the literature of chess psychology to treat the human player as using a large number of chess-specific images or patterns, patterns that turn out to be some kind of copies of the actual boards. But as soon as one tries to use such information in a computer, it turns out that the strategy is hopeless. It is hopeless because, to be able to play reasonable chess, one does not need only images of the boards; rather, since the game is complex, one needs information concerning the abstract search space – which results, immediately, in a desperate search through that space. However, although humans do not search, they, too, require knowledge of the search tree, not of single board positions, and for much the same reason. So although a skilled chess player must find many board configurations “familiar,” it cannot be the case that his or her skill consists in “memorizing” them. As far as board positions are concerned, there are no useful patterns to memorize. What the skilled players must see and understand are the consequences of their positions – a fact that is revealed by analysing their protocols (de Groot, 1965). It is just that the search tree cannot be directly “seen” or perceived; rather, it must be ‘grasped.’

It is not only thought which seems subject to impenetrable complexity; perception or “seeing” is another domain where these phenomena have been recognized for a long time, at least ever since the behaviorist program collapsed. Behaviorist thinking was saturated with the idea that the responses of organisms could be predicted on the basis of invariances found in the stimuli. Thus, on a certain abstract but arguably relevant level of analysis, all computationalists are still behaviorists: it is their simple program code that describes a putative connection between a simple stimulus regularity or invariance and a response. Skinner would surely have called this an instance of an S-R law (Skinner, 1953, pp. 34-35). Yet it turned out that there were no such invariances, but only “randomness,” as Zenon Pylyshyn wrote: “If we attempt to describe human behaviour in terms of physical properties of the environment, we soon come to the conclusion that, except for tripping, falling, sinking, bouncing of walls, and certain autonomic reflexes, human behaviour is essentially random” (Pylyshyn, 1984, p. 12, emphasis mine). According to Fodor, physical invariants are absent in the impoverished sensory organs, so that “perception cannot, in general, be thought of as the categorization of physical invariants, however abstractly such invariance may be described” (Fodor, 1975, p. 48, footnote 15, emphasis mine). This means that, if true, there are no relevant invariants, and when there are no invariants, we are left with irregularity or noise, thus high Kolmogorov complexity. Much the same is true of linguistic elements, such as ‘phonemes’ (Chomsky & Halle, 1968) and ‘words,’ as was explained by Howard Lasnik:

But what is a word? What is an utterance? These notions are already quite abstract [...] The big leap is what everyone takes for granted. It’s widely assumed that the big step is going from sentences to transformation, but this in fact isn’t a significant step. The big step is going from “noise” to “word.” (Lasnik, 2000, p. 3.)

In criticizing Skinner’s theory of language, Chomsky (1959) observed that “the syntactic organization of an utterance is not something directly represented in any simple way in the physical structure of the utterance itself” (p. 55). For instance, a syntactic unit such as ‘being a noun in English’ “has no sensory/acoustic correspondent; there’s nothing that nouns qua nouns sound like, or look like on an oscilloscope” (Fodor, 1984, p. 245). We may again conclude that, owing to the randomness or “noise” in the physical stimulus that could putatively be viewed as the causal antecedent of some cognitive state or action, as was thought explicitly by the behaviorists and is thought implicitly by some modern computationalists, “the program [of behaviorism] seems increasingly hopeless as empirical research reveals how complex the mental structures of organisms, and the interactions of such structures, really are” (Fodor, 1975, p. 52). One might now add, if what I have argued is true, that “complexity” here can be taken in a fairly literal sense.[3]

To conclude, I have argued that the empirical evidence, both from the lack of credibility of behaviorist psychology and from the “scandalous”[4] failures of computational models of all kinds, suggests that we are attempting to understand a process – human perception and cognition – with an apparatus that is too simple where the processes might well be complex. As Tinbergen (1951) put it long ago, “we may now draw the conclusion that the causation of behavior is immensely more complex than was assumed in the generalizations of the past,” since a “number of internal and external factors act upon complex central nervous structures” (p. 64). On my view, we should interpret “more complex” to mean “literally complex”: a true model of such processes should not just look complex, it must be complex. With the help of Kolmogorov complexity we can attempt a better understanding of the nature of this complexity, so as to see why some theories are complex only in a highly misleading way, hiding their simplicity in a complex-looking outfit. This seems to be the case with neural network models as well as with more classical, recursion-based symbolic models, and many others. The right kind of complexity is not the kind that somehow “emerges” from simple principles; it is there as complexity tout court.

3 In search of simplicity?

Some authors have explicitly denied the present thesis, namely, that the human mind is, in an important sense, confronted with complexity, noise, or irregularity, by proposing that one of the fundamental cognitive principles is to search for simplicity or invariances. Behaviorists thought so, and they, in turn, followed the empiricists; it is then no surprise that those who seem to have accepted part of what I have claimed in the previous section are rationalists or, for lack of a better word, nativists.

Nick Chater has recently argued that the world around us is “immensely complex” yet, oddly enough, “highly patterned,” and that the ability to “find such patterns in the world is [...] of central importance throughout cognition” (Chater, 1999, p. 273). These patterns are found by following a fundamental principle of choosing the pattern that provides the simplest explanation of the available data. Chater then defined “simplicity” as ‘simple in terms of Kolmogorov complexity,’ arriving at a thesis virtually opposite to the one presented here. According to this proposal, there are simple invariances which, when found, explain a lot of our cognition and perception, so that the principle is raised to the status of a “fundamental principle” having “very broad scope” and can be “used as a starting point for the detailed rational analyses of a wide range of cognitive processes” (p. 264). What I think is, for the most part, just the opposite: there are no invariances or simplicity to be found in explaining cognition; cognition seems, on the contrary, to be saturated with irreducible complexity.

The matter is of course empirical, and can be argued only on an empirical basis. For instance, concerning what is currently understood about language learning, it is in fact the standard argument against empiricism and behaviorism to show that, were a child to choose the simplest, or even some relatively simple, hypothesis, he or she would fail in a way that goes against the empirical facts (see Crain & Lillo-Martin, 1999). The hypotheses that the child is required to formulate concerning the properties of the language(s) being learned are vastly too abstract to allow any kind of inductive method of seeking invariances to be applied; suffice it to consider a mentally retarded child who, in the course of language maturation, masters the rules in just a few years (Bellugi et al., 2001), whereas for linguists, trying to learn these facts from experience with the help of already matured knowledge of language plus sophisticated cognitive tools and the recorded wisdom of earlier generations of linguists, even a partial grasp of their full complexity takes a lifetime. It was the American structuralists, some fifty years ago, who applied these ideas to the analysis of language, with results that were not encouraging.[5] It seems that the majority of linguists who take language learning at all seriously, proposing concrete models rather than speculations, accept that “innate factors permit the organism to transcend experience, reaching a high level of complexity that does not reflect the limited and degenerate environment” (Chomsky, 1980, p. 34). But if this is true, it makes almost no sense to propose that learners seek to formulate the simplest possible hypothesis, given the impoverished sensory information available.

Consider the problem of ‘perceptual organization,’ keeping these facts in mind. That task has traditionally been construed as the question of how the perceptual system is able to derive “a complex and structured description of the perceptual world from patterns of activity at the sensory receptors” (Chater, 1996, p. 566), to which Chater suggested that we “view perceptual organization as a means of encoding the sensory stimulus” so that the perceptual system chooses the simplest possible encoding, hence capturing the “regularities” in the stimulus, in terms of Kolmogorov complexity (p. 568). But as it is with efficient chess or everyday thinking, so it is with the perception of linguistic categories: Chater’s strategy could, and presumably would, fail, there being nothing but more or less “noise” with respect to the linguistic categories we actually appear to hear in the stream of sounds. What we hear is very likely to be true, given that we succeed in conveying the correct message most of the time, but the perceptual mechanisms cannot be squeezing these facts out of sensory invariances. The invariances, the mentalists claim, are inside the mind/brain of the speaker.

True, simplicity is an important cognitive principle in the sense that we seem to prefer simple explanations to complex ones, and it is this ideal on which, presumably, a large part of our science (and its success) depends. One might claim that it is indeed raised to the status of a “fundamental principle” which could, when applied correctly, explain much of why we are as rational as we are. But finding the simplest explanation is a computationally intractable process (determining the Kolmogorov complexity of a string is altogether uncomputable), meaning that it is unlikely that, even if the process itself aims at simplicity, it is itself simple. On the contrary, it might be extremely complex, involving a large amount of background knowledge of the world, such that, when the outcome of the process is intuited by us as providing a simple or elegant account of the phenomenon we are investigating, hence agreeing with the world, the process itself draws on its rich databases. In other words, as put by Chater, “simplicity should be viewed as a goal of cognitive processing” (p. 294, emphasis mine), not as a property of the process itself.[6]

4 Learning, empiricism and rationalism

A somewhat common response to these difficulties is to reply that the missing information must be attained by learning or by inferring it from the proximal stimuli. According to this solution, a computational model that learns – transfers complexity from the environment to the system – provides a solution to the problem of complexity; according to the empiricist ideology, it is the solution to much of epistemology and psychology as a whole.

Yet the situation is actually even worse in the case of learning, where, as a consequence of these observations, one has been forced to rely upon a theory of “abduction” and “triggering” rather than induction or “invariance seeking.” If there are no invariances, there is no learning: one cannot learn anything by observing noise. Assuming a simple learner with the task of finding invariances in experience, one is at once engaged in an endless formulation and verification of possible hypotheses, without any guarantee of converging on a solution. This method is structurally similar to the search process in chess: the computer is required to search through an astronomical set of combinations, trying to find some solution satisfying some formal criterion; in the case of learning, the learner is trying to formulate the best hypothesis that fits the data when there is often an astronomical number of hypotheses to consider. One should not lose sight here of the fact that learning is a subtype of inference, whereas it is inference (the reliable and effective fixation of new beliefs) that brings computational systems to a halt. Thus, ‘general learning algorithms’ are computationally complex, so that, according to Judd (1996), “the scaleup problem will not be solved without a deeper understanding of the issues involved in learning, and this entails the development of narrower definitions for the kinds of learning we want to achieve” (p. 140). Similarly, according to Minsky and Papert (1990), “certain problems” are not “solvable within the framework of any single, homogeneous machine” (p. 248). This is exactly the correct hypothesis, I believe, for being “homogeneous” is just one subtype of what I have claimed to be Kolmogorov-simple, viz. “the same structure everywhere.” According to Minsky, what is required instead is a large collection of specialized agencies, a “society of mind” (Minsky, 1986; see also Fodor, 1983). But whenever we so specialize the system, or restrict learning to allow only “narrower definitions for the kinds of learning we want to achieve,” what is required is brute information. Domain specificity is not an emergent property; it must be intrinsic qua being a complex property of the system itself.

The world surely does not appear to us to be filled with noise but, instead, with order and regularities. Is this a paradox? Chater argues that “[w]ithout the ability to find such patterns an agent might as well be in a random world: It would be able to predict, explain, and understand nothing; and it would have no basis on which to choose its actions” (Chater, 1999, p. 273). Thus, that we predict the consequences of our own and others’ actions effectively and reliably seems to imply that there must be regularities and patterns in the “causal texture”[7] of the world.

It is true that there must be regularities in the world; the problem is that these regularities, if these speculations are correct, are not in the senses. If what the organism comes to believe is true, and if the organism is able to infer, most of the time at least, true conclusions from its premises, then that suffices to establish that there are patterns around us and that the world is not “random.” Yet this is different from the question of whether there are patterns in our senses, or how we come to have knowledge of such patterns since, obviously, “[h]umans have no direct access to the causal texture of the environment” (Bloom, 2000, p. 148). 

Consider again human language(s). If the properties of natural languages are, to a significant extent, determined by our genotype (DNA under the influence of the environment), then what one needs in order to obtain a true belief about some rule of language is not to have the relevant invariance bestowed upon one’s senses, but to have the same genes. In Fodor’s (2000) words, “all that’s needed to ensure that my innate beliefs about linguistic structure will allow me to learn the language that you speak is that you and I are conspecifics; and (hence) that you have the same innate beliefs about linguistic structure that I do” (p. 95). Thus, even if there are no invariances in the senses, there could perfectly well be patterns in the world, provided that we already knew them.[8]

What about “patterns” in the world that, as one might think, make our thinking possible? The eruption of the frame problem in its various forms suggests that the matter is not altogether simple. The case of chess, I believe, can give us a useful hint about where to look. For although chess surely has a pattern in it, that pattern is “buried deep” in the game. Much the same could be the case with the real world, whose principles and laws our thinking is supposed to mimic successfully. The patterns of the world might be “deep” in the same sense. Humans are notoriously efficient in seeing which facts are relevant, given some goal; computers are notoriously ineffective. So here again it seems that, much as in chess, the patterns are simple to us insofar as we already know them. That is why, in order to get into a conversation with a friend, we do not need to stop to wonder whether we still own a phone after consulting the phone book, or whether, when moving a plate, the potatoes will go along. These problems are simple for us, but extremely complex for a computer which tries to understand them with a minimal amount of information, so they might be simple for us because we possess, unlike computers, a head start in terms of information.

5 Conceptual tractability

Assuming that the human mind is truly complex, one is confronted with the question of whether it is altogether conceptually intractable, as argued by Dreyfus (1972) and Fodor (1983, 2000). Anderson warned against allowing too much detail into one’s theory:

A theory must be sufficiently simple that it is conceptually tractable, so that one can understand it and reason about its implications . . . a theory which is so complex that it cannot be understood . . . is useless, regardless of whether there is any other adequate theory that is simpler . . . it is easy to become deluded about how complex a computer-based theory can become before it is intractable . . . The goal of a scientific theory is to reduce the complexity found in nature to a few general laws . . . If human behavior is not amenable to treatment in terms that are much simpler than the behavior, it is not really amenable to scientific analysis. (Anderson, 1976, pp. 16-17.)

I fear Anderson, Dreyfus and Fodor might be right; if there are no simple principles, then we ought not to try to find them.

From a more positive side, consider the case of language where, as was argued in section three, perceptual organization is hopeless if understood as an algorithm for finding regularities in the stimulus. The computational problem of connecting stimuli to categorization might require a lot of information, but here we are thinking in terms of what linguists call “performance models”: the question of how linguistic categories are put to use. That is the central topic of investigation in other areas of psychology as well, and this is the area where complexity seems to erect impenetrable barriers to further progress. What linguists have succeeded in formulating are theories of linguistic competence, consisting only of the experience-transcending knowledge that speaker/hearers use when they engage in linguistic communication (Chomsky, 1965). The knowledge by itself need not be complex, for it is already abstracted away from what seems to be too complex to allow useful generalizations, that is, abstracted from experience and language use.

Discussing the abstract level of competence is one way to avoid the threat of conceptual intractability; the other is in terms of ontogeny and phylogeny. The complexity, assuming that it is there, must originate from some source, given that it cannot arise from simple principles, and it seems that both our phylogeny and our ontogeny might provide useful explanations for the existence of such complexity. Consider our DNA, an active molecule that, with the help of natural laws and the environment, is both the product of a long evolutionary process and the “seed” from which the complexity of an organism develops. DNA is not totally simple, but neither is it totally random. Nor is it particularly complex in terms of Kolmogorov complexity. How does it give rise to true complexity, which, by definition, cannot be compressed into a simple body of information?

Here it would simply be a mistake to suppose that the genotype is a static “blueprint” of the whole organism (Gottlieb, 1998, Johnston, 1987, Johnston & Edwards, 2002, Oyama, 2000), since the DNA, taken in isolation, is nowhere near able to construct an organism; rather, it requires carefully selected and very narrow intra- and extracellular environmental conditions at every phase of development in order to produce anything, which means that these structures together, the context and the code, are truly complex. It is there, in the specific environmental conditions, that the complexity of biological organisms dwells, and so if the organism is complex, it must have borrowed its complexity not from the DNA molecule but from the environment which makes the DNA do its work. This is consistent with the idea that, though there appear to be simple physical laws, the initial state of most systems is subject to considerable randomness and complexity due to the long evolution of the universe. We might be more a product of the history of our environment than of the history of our genotype, but the question of how the evolution of the physical universe brings such complexity about remains to be understood.

I believe there is a twist of irony in these remarks, if they indeed turn out to be true. It was the empiricists, not the rationalists, who first proposed that the complexity of the human mind comes more or less from the environment, not from within. Maybe they were wrong, not in claiming that the complexity originates in the environment, but in claiming that it is transferred to the organism via a cognitive process of some kind, whereas the truer picture could be that the information transfer lies in our biology.

 


REFERENCES

            Anderson, J. A., & Rosenfeld, E. (1988). Neurocomputing: Foundations of research. Cambridge, MA.: MIT Press.

            Anderson, J. R. (1976). Language, memory and thought. New York: Lawrence Erlbaum Associates.

            Anderson, J. R. (1990). The adaptive character of thought. New Jersey: Lawrence Erlbaum Associates.

            Ashby, W. R. (1940). Adaptiveness and equilibrium. Journal of Mental Sciences, 86, 478-483.

            Baernstein, H. D., & Hull, C. L. (1931). A mechanical model of the conditioned reflex. Journal of General Psychology, 5, 99-106.

            Bennett, C. H. (1988). Logical depth and physical complexity. In R. Herken (Ed.), The Universal Turing Machine; A Half-Century Survey. Oxford: Oxford University Press.

            Bennett, G. K., & Ward, L. B. (1933). A model of the synthesis of conditioned reflexes. American Journal of Psychology, 45, 339-342.

            Bernstein, A., & Roberts, M. (1958). Computers versus chess player. Scientific American, 198, 96-105.

            Bever, T. G., Fodor, J. A., & Garrett, M. A. (1968). A formal limitation of associationism. In T. R. Dixon & D. L. Horton (Eds.), Verbal Behavior and General Behavior Theory. Englewood Cliffs, N. J.: Prentice-Hall.

            Bloom, P. (Ed.). (1993). Language acquisition: Core readings. Cambridge, MA.: MIT Press.

            Boden, M. A. (1981). Minds and mechanisms: philosophical psychology and computational models. Brighton: Harvester Press.

            Bradner, H. (1937). A new mechanical "learner". Journal of General Psychology, 17, 414-419.

            Chaitin, G. J. (1969). On the length of programs for computing finite binary sequences. Journal of the Association for Computing Machinery, 13, 547-569.

            Chaitin, G. J. (1979). Toward a mathematical definition of 'life'. In R. D. Levine & M. Tribus (Eds.), The maximum entropy formalism. Cambridge, MA: MIT Press.

            Chater, N. (1996). Reconciling Simplicity and Likelihood Principles in Perceptual Organization. Psychological Review, 103(3), 566-581.

            Chomsky, N. (1957). Syntactic Structures. The Hague: Mouton.

            Chomsky, N. (1959). A Review of B. F. Skinner's Verbal Behavior. Language, 35(1), 26-58.

            Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA.: MIT Press.

            Chomsky, N. (1975). Reflections on Language. Great Britain: William Collins Sons & Co.

            Chomsky, N. (1980). Rules and Representations. Oxford: Basil Blackwell.

            Chomsky, N. (1986). Knowledge of language: Its nature, origin, and use. New York: Praeger.

            Chomsky, N. (2000). New Horizons in the Study of Language and Mind. Cambridge: Cambridge University Press.

            Chomsky, N., & Halle, M. (1968). The Sound Pattern of English. New York : Harper & Row.

            Chomsky, N., & Miller, G. (1958). Finite-state languages. Information and Control, 1, 91-112.

            Cowie, F. (1999). What's Within? New York: Oxford University Press.

            Craik, K. J. W. (1952). The Nature of Explanation. Cambridge: Cambridge University Press.

            Crain, S. (1991). Language acquisition in the absence of experience. Behavioral and Brain Sciences, 14, 597-650.

            Crozier, W. J. (1929). The study of living organisms. In C. Murchison (Ed.), The foundations of experimental psychology. Worcester: Clark University Press.

            de Groot, A. (1965). Thought and choice in chess. The Hague: Mouton.

            Dewey, J. (1896). The reflex arc concept in psychology. Psychological Review, 3, 357-370.

            Dreyfus, H. (1972). What computers can't do. New York: Harper & Row.

            Ellson, D. G. (1935). A mechanical synthesis of trial-and-error learning. Journal of General Psychology, 13, 212-218.

            Feigenbaum, E. A., & Feldman, J. (Eds.). (1963). Computers and thought (1 ed.). New York: McGraw-Hill.

            Fodor, J. A. (1975). The language of thought. New York: Crowell.

            Fodor, J. A. (1981). The Present Status of the Innateness Controversy. In J. A. Fodor (Ed.), Representations. Cambridge, MA: MIT Press.

            Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA.: MIT Press.

            Fodor, J. A. (1985). Fodor's Guide to Mental Representations. Mind, Spring, 55-97.

            Fodor, J. A. (1998a). Concepts. Oxford: Clarendon Press.

            Fodor, J. A. (1998b). Concepts: Where cognitive science went wrong. Oxford: Clarendon Press.

            Fodor, J. A. (2000). The mind doesn't work that way: The scope and limits of computational psychology. Cambridge, MA.: MIT Press.

            Fodor, J. A., & Pylyshyn, Z. (1988). Connectionism and cognitive architecture. A critical analysis. Cognition, 28, 3-71.

            Galanter, E., & Gerstenhaber, M. (1956). On thought: The Extrinsic theory. Psychological Review, 63, 218-227.

            Galanter, E., & Smith, W. A. S. (1958). Some experiments on a simple thought-problem. American Journal of Psychology, 71, 359-366.

            Gelernter, H. L. (1959). Realization of a geometry proving machine. Proceedings of the International Conference on Information Processing.

            Gelernter, H. L., & Rochester, N. (1958). Intelligent behavior in problem-solving machines. IBM Journal of Research and Development, 2, 336-345.

            Gold, E. M. (1967). Language identification in the limit. Information and Control, 10, 447-474.

            Gottlieb, G. (1998). Normally occurring environmental and behavioral influences on gene activity: From central dogma to probabilistic epigenesis. Psychological Review, 105, 792-802.

            Hartmanis, J., & Stearns, R. E. (1965). On the computational complexity of algorithms. Transactions of the American Mathematical Society, 117(5), 285-306.

            Hebb, D. O. (1949). The organization of behavior: A neuropsychological theory. New York: John Wiley.

            Hopfield, J. J. (1982). Neural Networks and Physical Systems with Emergent Collective Computational Abilities. Proceedings of the National Academy of Sciences, 79, 2554-2558.

            Hull, C. L. (1937). Mind, mechanism, and adaptive behavior. Psychological Review, 44, 1-32.

            Hull, C. L. (1943). Principles of behavior. New York: Appleton-Century-Crofts.

            Jenkins, L. (2000). Biolinguistics: Exploring the Biology of Language. Cambridge: Cambridge University Press.

            Johnston, T. D. (1987). The persistence of dichotomies in the study of behavioral development. Developmental Review, 7, 149-182.

            Johnston, T. D., & Edwards, L. (2002). Genes, Interactions, and the Development of Behavior. Psychological Review, x, 26-34.

            Judd, J. S. (1990). Neural network design and the complexity of learning. Cambridge, MA: MIT Press.

            Judd, S. J. (1996). Complexity of learning. In P. Smolensky & M. C. Mozer & D. E. Rumelhart (Eds.), Mathematical perspectives on neural networks. New Jersey: Erlbaum.

            Kister, J., Stein, P., Ulam, S., Walden, W., & Wells, M. (1957). Experiments in chess. Journal of the Association for Computing Machinery, 4, 174-177.

            Kolen, J. F., & Goel, A. K. (1991). Learning in parallel distributed processing networks: Computational complexity and information content. IEEE Transactions on Systems, Man, and Cybernetics, 21(2), 359-367.

            Kolmogorov, A. N. (1965). Three approaches to the quantitative definition of information. Problems of Information Transmission, 1(1), 1-7.

            Kreuger, R. G., & Hull, C. L. (1931). An electrochemical parallel to the conditioned reflex. Journal of General Psychology, 5, 262-269.

            Li, M., & Vitányi, P. (1997). Introduction to Kolmogorov complexity and its applications. New York: Springer-Verlag.

            Longpré, L. (1992). Resource bounded Kolmogorov Complexity and Statistical Tests. In O. Watanabe (Ed.), Kolmogorov Complexity and Computational Complexity (pp. 66-84). Berlin: Springer-Verlag.

            Lotka, A. J. (1925). Elements of physical biology. Baltimore: Williams & Wilkins.

            Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. New York: W. H. Freeman.

            McCarthy, J., & Hayes, P. (1969). Some Philosophical Problems from the Standpoint of Artificial Intelligence. In B. Meltzer & D. Michie (Eds.), Machine Intelligence (Vol. 4). Edinburgh: Edinburgh University Press.

            McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.

            Miller, G. A., & Chomsky, N. (1963). Finitary models of language users. In R. D. Luce & R. Bush & E. Galanter (Eds.), Handbook of mathematical psychology, vol. 2. New York: Wiley.

            Miller, G. A., Galanter, E., & Pribram, K. (1960). Plans and the Structure of Behavior. New York: Holt.

            Minsky, M. (1954). Neural-Analog Networks and the Brain-Model Problem. Ph.D. thesis, Princeton University.

            Minsky, M. (1986). The society of mind. New York: Simon and Schuster.

            Minsky, M. L., & Papert, S. A. (1969). Perceptrons: An introduction to computational geometry. Cambridge, MA: MIT Press.

            Minsky, M. L., & Papert, S. A. (1990). Perceptrons. Cambridge: MIT Press.

            Neumann, J. v. (1958). The Computer and the Brain. New Haven: Yale University Press.

            Newell, A. (1955). The chess machine, an example of dealing with a complex task by adaptation. Proceedings of the Western Joint Computer Conference, Los Angeles, March, pp. 101-108.

            Newell, A., Shaw, J., & Simon, H. A. (1958a). Elements of a theory of human problem solving. Psychological Review, 65, 151-166.

            Newell, A., Shaw, J. C., & Simon, H. (1958b). Chess playing programs and the problem of complexity. IBM Journal of Research and Development, 2, 320-335.

            Newell, A., & Simon, H. (1976). Computer science as empirical inquiry: Symbols and search. Communications of the ACM, 19, 113-126.

            Newell, A., & Simon, H. A. (1961). Computer simulation of human thinking. Science, 134, 2011-2017.

            Newell, A., & Simon, H. A. (1963). GPS: A program that simulates human thought. In E. A. Feigenbaum & J. Feldman (Eds.), Computers and thought (pp. 279-293). New York: McGraw-Hill.

            Orponen, P., Ko, K.-I., Schöning, U., & Watanabe, O. (1994). Instance Complexity. Journal of the Association for Computing Machinery, 41(1), 96-121.

            Oyama, S. (2000). The ontogeny of information: Developmental systems and evolution. Durham, NC: Duke University Press.

            Pinker, S. (1979). Formal models of language learning. Cognition, 7, 217-283.

            Pinker, S. (1997). How the Mind Works. New York: W. W. Norton & Company.

            Pitt, M. A., Myung, I. J., & Zhang, S. (2002). Toward a Method of Selecting Among Computational Models of Cognition. Psychological Review, 109(3), 472-491.

            Pylyshyn, Z. (Ed.). (1986). The robot's dilemma: the frame problem in artificial intelligence. Norwood: Ablex.

            Rashevsky, N. (1938). Mathematical biophysics. Chicago: University of Chicago Press.

            Redding, N. J., Kowalczyk, A., & Downs, T. (1993). Constructive higher-order network algorithm that is polynomial time. Neural Networks, 6, 997-1010.

            Redington, M., & Chater, N. (1998). Connectionist and statistical approaches to language acquisition: A distributional perspective. Language and Cognitive Processes, 13(2&3), 129-191.

            Redington, M., Chater, N., & Finch, S. (1998). Distributional Information: A Powerful Cue for Acquiring Syntactic Categories. Cognitive Science, 22(4), 425-469.

            Rochester, N., Holland, J. H., Haibt, L. H., & Duda, W. L. (1956). Tests on a cell assembly theory of the action of the brain, using a large digital computer. IRE Transactions on Information Theory, 80-93.

            Rosenblatt, F. (1958). The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, 65, 386-408.

            Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

            Rosenblueth, A., Wiener, N., & Bigelow, J. (1943). Behavior, purpose, and teleology. Philosophy of Science, 10, 18-24.

            Ross, T. (1938). The synthesis of intelligence - its implications. Psychological Review, 45, 185-189.

            Rumelhart, D. E., & McClelland, J. L. (1986). Parallel Distributed Processing: Explorations in the microstructure of cognition, vol 1. Cambridge, MA: MIT Press.

            Sambrook, T., & Whiten, A. (1997). On the Nature of Complexity in Cognitive and Behavioural Science. Theory & Psychology, 7(2), 191-213.

            Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 210-229.

            Selfridge, O. G. (1959). Pandemonium: A paradigm for learning, Proceedings of the Symposium on Mechanization of Thought Processes (pp. 513-526). London.

            Selz, O. (1922). Zur Psychologie des produktiven Denkens und des Irrtums. Bonn: Friedrich Cohen.

            Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423, 623-656.

            Shannon, C. E. (1950a). Programming a computer for playing chess. Philosophical Magazine, 41, 256-275.

            Shannon, C. E. (1950b). A Chess-playing Machine. In J. R. Newman (Ed.), The World of Mathematics (Vol. 4, 1956). New York: Simon and Schuster.

            Solomonoff, R. J. (1964). A formal theory of inductive inference, part 1, part 2. Information and Control, 7, 1-22, 224-254.

            Stearns, R. E. (1988). Juris Hartmanis: The Beginning of Computational Complexity. Proceedings: Structure in Complexity Theory, Third Annual Conference, 128-134.

            Stephens, J. M. (1929). A mechanical explanation of the law of effect. American Journal of Psychology, 41, 422-431.

            Stockmeyer, L. J., & Chandra, A. K. (1979). Provably difficult combinatorial games. SIAM Journal on Computing, 8, 151-174.

            Tesauro, G. (1987). Scaling relationships in back-propagation learning: Dependence on training set size. Complex Systems, 1, 367-372.

            Tesauro, G., & Janssens, R. (1988). Scaling relationships in back-propagation learning: Dependence on predicate order. Complex Systems, 2, 39-44.

            Tolman, E. C. (1939). Prediction of vicarious trial and error by means of the schematic sowbug. Psychological Review, 46, 318-336.

            Tolman, E. C., & Brunswik, E. (1935). The organism and the causal texture of the environment. Psychological Review, 42, 43-77.

            Troland, L. T. (1928). The Fundamentals of Human Motivation. New York: Hafner.

            Turing, A. M. (1936). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 42(2), 230-265.

            Turing, A. M. (1948/1968). Intelligent Machinery. In C. R. Evans & A. D. J. Robertson (Eds.), Cybernetics: Key Papers. Manchester: University Park Press.

            Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59, 433-460.

            Walton, A. (1930). Conditioning illustrated by an automatic mechanical device. American Journal of Psychology, 42, 110 f.

            Wiener, N. (1948). Cybernetics. New York: John Wiley.

            Wisdom, J. O. (1951). The hypothesis of cybernetics. British Journal for the Philosophy of Science, 2, 1-24.



[1] As pointed out by Chomsky, “In the case of language, the evidence for connectionist models is, for the moment, about zero. The most trivial problems that have been addressed – like learning a few hundred words – have been total failures” (Chomsky, 1993, p. 86).

[2] This fact cannot be proved mathematically. It may turn out that the finite instance of chess we actually play is trivial. In that case the game would lose all of its interest, there being a simple but hitherto hidden strategy for winning it. Chances are, I believe, that this will not happen.

[3]  Of course, this is just one way of stating the argument from “poverty of stimulus” (Chomsky, 1980, Chomsky, 1986).

[4] Fodor, 2000, p. 37.

[5] Though obviously not conclusive, since some psychologists are still in pursuit of such learning algorithms (see Cowie, 1999; Redington, Chater, & Finch, 1998; Redington & Chater, 1998; and, for a conflicting and, I believe, more correct view, Bloom, 1993; Crain, 1991; Gold, 1967; Jenkins, 2000; Pinker, 1979).

[6] One particularly relevant example of simplicity as a goal of our cognition is computational modelling itself. It is clear that one must choose a model which fits the data well but not so well that it also captures the effects of random variables (Pitt, Myung, & Zhang, 2002). In what sense does the task confronting our senses or our thinking differ from this?
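This trade-off can be made concrete with a minimal sketch (my illustration in Python, not the model-selection method of Pitt, Myung, & Zhang, 2002): candidate polynomial models of increasing flexibility are scored by a BIC-style criterion that rewards fit to the data but penalizes the number of free parameters, so the model that fits “too well” is rejected in favour of a simpler one.

    # A hedged illustration: BIC-style selection among polynomial models.
    # The data, the candidate models, and the use of BIC (rather than MDL proper)
    # are assumptions made for the sake of the example.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 40)
    y = 1.5 * x - 0.5 * x**2 + rng.normal(0.0, 0.3, size=x.size)  # quadratic signal plus noise

    def bic(degree):
        coeffs = np.polyfit(x, y, degree)            # least-squares fit of the candidate model
        rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
        n, k = x.size, degree + 1                    # sample size and parameter count
        return n * np.log(rss / n) + k * np.log(n)   # fit term plus complexity penalty

    best = min(range(1, 9), key=bic)
    print("selected polynomial degree:", best)       # typically the true degree (2), not 8

The most flexible model always has the smallest residual error, yet the penalty term ensures that the extra parameters, which merely absorb noise, count against it.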

First, it is evident that there is much noise in the perceptual channels; this is one of the essential points of the present discussion. That noise must be ignored, but although it must be ignored, it need not impose complexity on the system itself insofar as the computational system knows beforehand what to ignore. For instance, the perceptual system might be tuned to detect some (simple) features, ignoring all the rest. The point is, rather, that what the cognitive system ends up believing as a consequence of detecting one feature or another in the sensory stream is not rationally connected to the detected feature itself, that is, not connected by means of any kind of ‘inferential logic.’ The rationalists have thus speculated that the connection is abductive rather than inductive: the organism knows beforehand that a hypothesis h is verified by the occurrence of some data d, and infers h when d occurs. Another way to conceptualize this process is to say that the data d “triggers” the hypothesis h.
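The contrast between triggering and inference can be pictured with a small sketch (again my illustration, with invented feature and hypothesis labels): the mapping from detected data d to hypotheses h is fixed in advance, so encountering d merely activates the antecedently available h; nothing about h is derived from d by inductive logic.

    # Hypothetical trigger table: the pairing of features and hypotheses is given
    # beforehand, not learned or inferred from the data.
    TRIGGERS = {
        "short_voicing_onset": "phoneme /b/",
        "long_voicing_onset": "phoneme /p/",
        "doorknob_gestalt": "concept DOORKNOB",
    }

    def triggered(detected_features):
        # Abductive step: return whichever antecedent hypotheses the detected
        # features verify; unrecognized features simply trigger nothing.
        return [TRIGGERS[d] for d in detected_features if d in TRIGGERS]

    print(triggered(["doorknob_gestalt", "unmodelled_noise"]))  # ['concept DOORKNOB']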

In the case of phonology, for instance, h consists of a unit drawn from an abstract system of phonemes, known to a significant extent innately and triggered by some sound feature or another. The complexity lies, rather, in the fact that, because the phonemes are smeared in the process of articulation, a given physical sound feature might (correctly) trigger several such units and, vice versa, several physical sound features can be used to (correctly) trigger one such unit. The hearer must know these and other similarly complex effects in order to understand what occurs in his or her ears. Furthermore, these processes are likely to be entirely specific: for each abstract entity one needs more or less (domain-)specific code for the hearer to infer that it has occurred in the mind of the speaker.
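A minimal sketch of this many-to-many relation (with invented cue and phoneme labels, not an actual phonological analysis) shows why no simple invariance in the signal will do: one acoustic cue is compatible with several phonemes, one phoneme is smeared across several cues, and narrowing the candidates down requires phoneme-specific knowledge on the hearer’s part.

    # Hypothetical cue-to-phoneme table illustrating the many-to-many relation.
    CUE_TO_PHONEMES = {
        "high_frequency_burst": {"/t/", "/k/"},         # one cue, several candidate phonemes
        "short_voice_onset_time": {"/b/", "/d/"},
        "alveolar_formant_transition": {"/t/", "/d/"},  # one phoneme also spread over several cues
    }

    def candidate_phonemes(cues):
        # Intersect the candidate sets licensed by each detected cue; only knowledge
        # of how each phoneme is smeared across cues narrows the answer down.
        sets = [CUE_TO_PHONEMES[c] for c in cues if c in CUE_TO_PHONEMES]
        return set.intersection(*sets) if sets else set()

    print(candidate_phonemes(["high_frequency_burst"]))                                 # candidates /t/ and /k/
    print(candidate_phonemes(["high_frequency_burst", "alveolar_formant_transition"]))  # only /t/ remains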

[7] Tolman & Brunswik, 1935.

[8] This could not be the whole story, obviously, since there still seems to exist a connection between the distal stimulus and what one learns from it. “Why,” Fodor asks, “is it so often experiences of doorknobs and so rarely experience with whipped cream or giraffes, that leads one to lock to doorknobhood?” (Fodor, 1998a, p. 127). This is not a special problem if the relevant learning process is implemented by inductive reasoning, but abandoning that theory as empirically suspect runs directly into these problems. Thus, if the stimuli are complex, lacking simple invariances with respect to the knowledge the child attains, whether in the domain of language, concepts or knowledge, these cognitive structures cannot be learned from the environment; rather, “the structure of the sensorium [senses] is such that certain inputs trigger the availability of certain concepts” (Fodor, 1981, p. 273), or “much of the detailed structure of the visual system is ‘wired in,’ though triggering experience is required to set the system in operation” (Chomsky, 1975, p. 8). This “triggering theory” is, then, an alternative to the empiricist theory of learning, admittedly nowhere near complete (Fodor, 1998b), yet it offers some idea of what could be happening if not learning in terms of seeking invariances. Experience is necessary to the growing organism in much the same way as nutrition is. Perhaps the invariances that do appear in the stimuli, considered in isolation, are such that they suffice to trigger the availability of more abstract concepts and cognitive processes (based on the genetic endowment)? This is the theory, in its bare outlines, that the rationalist conception of the mind has converged on.
