
If you are visiting this site after reading my paper in Critical Reviews in Neurobiology (2007, 19(2-3): 119-202), please note the additional text and illustrations linked below. They were added after publication of the article in response to readers' questions and comments.

Example 1. Primitive gill control by a pacemaker neuron as an example of a simple NOCS.

A New Conceptual Understanding of the Brain



1. Introduction

2. The essence of the classical conceptual understanding of the brain

            2.1. The search for specific circuits

            2.2. Questions that are not properly answered by classical neuroscience

3. Biological neural networks perform computations

            3.1. The neural network computational principle

            3.2. Corollaries of the neural network computational principle

4. Functional building blocks of the brain

            4.1. Control of neuronal automatism by neural optimal control system

            4.2. Principles of interaction of various neural optimal control systems

                        4.2.1. Horizontal interaction of neural optimal control systems

                        4.2.2. Vertical interaction of neural optimal control systems

5. Mechanisms of normal brain-initiated behaviors: The physiology of neural optimal control systems

            5.1. Motor behaviors

                        5.1.1. Movement coordination in vertebrates: Understanding the cerebellum 

                        5.1.2. The skeletomotor cortico – basal ganglia – thalamocortical loop

            5.2. Non-motor behaviors: Further generalization of motor automatisms

6. Mechanisms of pathological brain-initiated behaviors: The pathophysiology of neural optimal control systems

            6.1. Basic mechanisms of malfunctions in a neural optimal control system: General considerations

            6.2. An example of the malfunction of an error distribution system within a neural optimal control system: Parkinson’s disease

                        6.2.1. Mechanisms of anti-parkinsonian treatments

            6.3. Other examples of clinical applications  

7. Conclusion

            7.1. Understanding structure through function

            7.2. What is next? The future of system neuroscience




Key Words: neural computation, neural network, neural optimal control system, central pattern generator, reflex, motor cortex, basal ganglia, dopaminergic neurons, Parkinson’s disease, deep brain stimulation, functional neurosurgical procedures

Abbreviations: CO - controlled object, CPG - central pattern generator, CS - conditioned stimulus, DBS - deep brain stimulation, DSCT - dorsal spinocerebellar tract, GPe - external segment of globus pallidus, GPi - internal segment of globus pallidus, MC - motor cortex, MPTP - 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine, NOCS - neural optimal control system, OCD - obsessive compulsive disorder, PD - Parkinson’s disease, PM - premotor area, SMA - supplementary motor area, SNc - substantia nigra pars compacta, SNr - substantia nigra pars reticulata, SOCT - spinoolivocerebellar tract, SRCT - spinoreticulocerebellar tract, STN - subthalamic nucleus, US - unconditioned stimulus, Vim - ventral intermediate nucleus, Voa - ventral oral anterior nucleus, Vop - ventral oral posterior nucleus, VSCT - ventral spinocerebellar tract.


1. Introduction


Behavioral phenomena of the brain range from simple reflexes to complex mind processes like cognition, perception, feeling, self-awareness, will, thinking, and reasoning. The overwhelming majority of research studies and explanations of brain behavioral phenomena, both simple and complex, are based on the classical traditional understanding of the brain.

In this article, the terms classical and traditional are used to indicate the generally accepted views of the brain that can be found in any neuroscience textbook. These views of the brain are first acquired at universities when students study brain-related sciences. Brain anatomy and neurophysiology are a starting point for classical traditional brain views. This acquired knowledge is rarely questioned later, because in one form or another, directly or indirectly, the classical understanding of the brain is reiterated in other brain-related fields (behavioral neuroscience, psychology, psychiatry, neurology, molecular and cellular neurobiology, etc.) and in the research conducted in these fields. This reiteration works like positive reinforcement and leaves no doubt about the correctness of classical views. The overwhelming majority of the neuroscience community believes that any future progress in brain understanding should be built on top of this classical foundation.

This article is about these classical views dominating the neurosciences, what is wrong with them and how these dominating views should be changed to accommodate relatively recent discoveries in the neurosciences. Two major discoveries led to this change in the understanding of the brain:

1.      The discovery of the neural network computational principle with the development of neurocomputing.

2.      The discovery of the generic functional organization of hierarchical neural systems that control automatic inborn motor behaviors in animals.

The first happened in the 1980s and the second in the late 1980s and early 1990s, i.e., quite a while ago.

The new understanding of the brain is the result of a theoretical generalization of these two discoveries. A science cannot prosper without theoretical generalizations, a statement that applies to system neuroscience as well. Did profound theoretical generalizations exist in the case of classical system neuroscience? Anyone who reads a neuroscience textbook will easily come to the conclusion that there are few theoretical generalizations; classical system neuroscience has not yet become a science with a strong theoretical foundation. There is plenty to remember, however, if one is required to pass a test or exam.

Both of the above discoveries should be categorized as technogenic in that they bring technical notions to neuroscience. To completely understand and actively use these discoveries in neuroscience will require a deep knowledge of mathematics, neurocomputing, control theory, and numerous adjacent technical disciplines (see Conclusion). It will mean a real multidisciplinary approach at work. Presently, the term multidisciplinary is most often just a buzzword. The term is well known to all scientists working in the field of neuroscience. It pleases everybody as long as it does not force anyone to invest a lot of time in studying another discipline. It brings respect, because it implies an enormous complexity of the field. However, the situation is different when it comes to the real multidisciplinary approaches used to explain brain phenomena by scientists who have invested a lot of time in studying other disciplines. Scientists who try to bridge the gap between different sciences most often find themselves in the gap between those sciences.

The majority of the neuroscience community are biologists or physicians, most of whom do not have a technical background. This situation cannot be changed quickly. Much needed educational changes, especially for those who want to study the mechanisms of normal and pathological complex brain-controlled behaviors, will require a lot of time, resources, and effort (see Conclusion). The only thing that can be changed quickly is the conceptual understanding of the brain. Switching to the new conceptual understanding can be achieved relatively easily. It does not require a deep knowledge of mathematics, neurocomputing, or control theory. The importance of conceptual views of the brain is hard to overestimate, because they ultimately determine our understanding of the relationship between structure and function and, consequently, problem formulation, and how we conduct research, both theoretical and experimental, including our strategies of analysis and synthesis.

It is obvious that there will be no serious theory of the brain without an adequate understanding of the structure – function relationship in the brain. And neuroscience, like any other science, cannot prosper without a good theory. In the absence of a unifying theory, the neurosciences’ multidisciplinary nature and a broad arsenal of powerful computerized tools used in both experimental and theoretical research have created quite an unusual situation. In the majority of research studies, the emphasis is on applying these tools to collecting new data, as if the data itself, if collected in sufficient quantities, could bring new understanding. Neuroscience developed a very specific research culture where a search for new phenomena and new details became a primary goal. An enormous number of publications in neuroscience attest to this approach. The number of publications exceeds an individual’s capacity to absorb them. That is why each scientist conducts research in his or her own niche in neuroscience without a plan to integrate the acquired knowledge into a bigger picture. It is a defensive mechanism against an overwhelming amount of information in the absence of a unifying theory or at least an adequate conceptual understanding of the brain.

Graham Hoyle predicted this situation more than 20 years ago. He wrote: “Neuroscience came to be the art of the do-able, with expediency ruling the day, rather than a soundly based intellectual domain. Three generations of neuroscientists have now been trained without any link to a widely accepted general theory of neural circuit function and neural integration. They have been given to believe that they are engaged in a massive fact-finding operation guided only by the relative softness of the seams in the body unknown that happened to face their individual picks! Science without larger questions provides a dismal prospect to a truly inquiring mind. Of course, to those who would make careers out of providing random facts, nothing could be nicer, so varied and so complex are nervous systems. There is enough material to occupy armies of such persons for centuries. But without some strong delineations neuroscience will continue to explode into myriad fragments. We shall end up with masses of descriptive minutiae of many nervous systems without advancing our overall understanding of how they do the job for which they evolved.” (Hoyle, 1984, p. 379). These words are a cry from the heart, and it is virtually impossible to better describe the need for a new conceptual understanding of the brain and subsequently a unifying brain theory.

Conceptual views glue together different neuroscience disciplines in order to explain brain functions and show the place of each discipline in the global picture. The change in conceptual understanding can be exciting and scary at the same time. It is exciting because it will bring the formulation and reformulation of new problems and, probably, solutions to the old ones. Both basic and applied brain sciences will benefit from this new conceptual understanding. For example, basic neuroscience may finally get a theory capable of explaining complex brain phenomena, while medical sciences like psychiatry and neurology may gain the ability to explain mechanisms of brain pathological symptoms. In the technical field, a more adequate understanding of the brain could help to create real brain-like intelligent machines that would inevitably change the course of human civilization. It is a scary prospect for some, because this new conceptual understanding may reveal the inadequacy of many theoretical and experimental approaches based on classical views. The benefit to neuroscience, however, is that it will become a science capable of asking the correct questions and obtaining answers to them.

This article was written with a single purpose – to facilitate the understanding of the new brain concept. The major focus of the article is on a systematic description of the experimental facts and ideas underlying the new brain concept. A systematic description is crucial because the new brain concept cannot be reduced to one experimental fact or idea and requires numerous logical derivations. There is no systematic description of the new brain concept in the scientific literature, although some pieces of it can be found. This article is structured to gradually overcome the major psychological barriers to the acceptance of these new ideas. It is written for everyone who is interested in a new conceptual understanding of brain-initiated behavioral phenomena, both normal and pathological. No knowledge of neurocomputing, mathematics, or control theory is required to understand the article. The only requirements are motivation, an open mind, asking correct questions, utilizing common sense, and applying straightforward logical derivations. This article will be most useful to students and young scientists who want to dedicate their lives to studying mechanisms of complex normal and pathological brain-initiated behavioral phenomena, and who still have time to enrich their educational background with the knowledge of mathematics, neurocomputing, control theory, and other related disciplines.

This article does not comprehensively cite previous work for two major reasons. First, the conceptual description makes it impossible, because it implies that details are omitted. Second, citation of research that should be considered inadequate in the framework of new conceptual ideas is intentionally avoided for the most part for obvious ethical reasons. After reading this article, one could easily find numerous examples of such inadequacies in the scientific literature. Nevertheless, all the information needed to grasp the described ideas is included here.

Any science, and neuroscience is not an exception, utilizes a reductionistic approach in one form or another. Conceptual views define how we conduct reduction, analysis, and synthesis, and hence define the outcome of the reductionistic approach. Numerous experimental and theoretical data are synthesized here to demonstrate the difference between the classical and the proposed conceptual views.

As a starting point, we could consider the brain as a controlling device created by alien technology whose workings we are trying to understand. In humans, the brain consists of more than 100 billion neurons. Most of the neurons have dendritic trees with thousands of synapses on them from other neurons. In turn, each neuron sends signals to tens, hundreds, or even thousands of other neurons. Engineers like the statement that all neurons in the brain are interconnected. It is probably true, if the term interconnected means any finite number of synaptic links between any given pair of neurons. In any case, it is obvious that applying a reductionistic approach to such a complex system as the brain requires dividing it into smaller parts to separately analyze the function of each part.

The analysis is mainly limited to motor control in animals for obvious reasons. Motor control is one of the most studied types of brain-controlled behaviors. Motor reflexes and centrally programmed movements are the foundation of any textbook of neuroscience. They are used to demonstrate how the brain controls motor behaviors. Central pattern generators (CPGs) for locomotion, scratching, breathing, etc. are typical examples of program control in animals. The term itself reflects the capability of these controlling centers to generate corresponding motor patterns in the absence of peripheral afferent feedback. Simple tonic commands activate CPGs. Classical studies of reflexes and CPGs were directed towards revealing specific neural circuits responsible for reflex responses and rhythm generation. Other fields of system neuroscience are using the same research strategies. It is understandable and generally accepted, even at the intuitive level, that basic principles of neural control should be similar in various parts of the brain. Any progress made in understanding motor control is applicable to other brain functions.

2. The essence of the classical conceptual understanding of the brain


2.1. The search for specific circuits


According to classical views, the function of any biological neural network must be explained in terms of neuronal interconnections, interacting nerve cells, and synaptic and cellular properties, including intracellular molecular mechanisms. The essence of the classical approach consists in finding within the brain’s neural network specific neural circuits that control corresponding forms of behavior. Analysis and synthesis are also rather straightforward within these classical views: An analysis is directed toward obtaining information about the circuit, and a synthesis should “recreate” the system – the model must mimic the behavior of the system. Sometimes this approach is defined as mechanistic (see, for example, Getting, 1986). But most often it is just used without any definition. In simple terms, the idea is that overlapping two sets of knowledge, that of specific circuit structure and that of the activity of its elements during function, should produce the desired understanding of the control mechanisms.

The foundation of this approach is laid during school years. It is a particular case of understanding the cause and effect relationship. This reductionistic approach works well for simple nonbiological systems. It is the approach we would use to reverse engineer rather simple electrical controlling devices, for example, a thermostat or refrigerator. To understand how they work, we would study properties of their elements, connections between them, and their behavior (activity) during system operation. The resultant understanding would emerge as electrical schematics and signal diagrams of these devices.

Several typical results (see, for example, Kandel et al., 2000 or Shepherd, 2003) related to the application of this approach to study the brain are summarized later in this section.

Typically, when a reflex is described, the shortest circuit is chosen. Longer circuits are either ignored or considered as performing reflex control and modulation, mechanisms that were never completely clarified within the framework of classical views. Synthesized knowledge of a specific reflex arc is usually shown in the form of a diagram where a specific group of afferent fibers or neurons is represented as one element (Fig. 1a). In reality, each afferent fiber or neuron of a preceding layer makes synaptic connections with almost all the neurons of the next layer (Fig. 1b). This principle of convergence and divergence is usually either ignored or mentioned without any meaningful interpretation. The existence of vast, almost total convergence and divergence was experimentally proven for certain neuronal connections. For example, a single motoneuron receives projections from all or almost all of the spindles of the muscles it innervates. Conversely, each Ia afferent fiber sends its terminals to all of the motoneurons supplying the muscle of origin (Mendell and Henneman, 1971).


Fig. 1. Disynaptic reflex arc.

a – classical diagram. b – actual converging and diverging connections between neuronal layers.
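
The contrast between the two panels of Fig. 1 can be illustrated with a small numerical sketch. It is not the argument of Section 3, only a toy example under stated assumptions: the weight matrix, the two afferent activity patterns, and the function names full_layer and collapsed_path are hypothetical and chosen purely for illustration. The sketch shows that two afferent patterns with the same average activity are indistinguishable once the afferent group is drawn as one element, while the all-to-all layer produces clearly different outputs for them.

```python
import numpy as np

def full_layer(weights, afferent_rates):
    """All-to-all layer (Fig. 1b style): every afferent reaches every neuron."""
    return np.maximum(0.0, weights @ afferent_rates)

def collapsed_path(gain, afferent_rates):
    """Single-element path (Fig. 1a style): the afferent group is one lumped signal."""
    return max(0.0, gain * float(np.mean(afferent_rates)))

# Hypothetical synaptic weights: 3 next-layer neurons x 4 afferents.
weights = np.array([[0.9, 0.1, 0.1, 0.1],
                    [0.1, 0.9, 0.1, 0.1],
                    [0.1, 0.1, 0.9, 0.1]])

# Two hypothetical afferent patterns with the same mean activity.
pattern_a = np.array([1.0, 0.0, 0.0, 1.0])
pattern_b = np.array([0.0, 1.0, 1.0, 0.0])

out_a = full_layer(weights, pattern_a)        # differs from out_b
out_b = full_layer(weights, pattern_b)
lumped_a = collapsed_path(1.0, pattern_a)     # identical to lumped_b
lumped_b = collapsed_path(1.0, pattern_b)
```

In this toy setting the lumped description discards exactly the information that distinguishes the two input patterns, which is one simple way to see that the diagram in Fig. 1a is not equivalent to the network in Fig. 1b.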


Clearly, the network in Fig. 1b cannot be reduced to the one in Fig. 1a. However, within the framework of the classical view it is not evident why. Section 3 will explain this discrepancy. The reason why classical neuroscience has not paid much attention to the principle of convergence and divergence is clear. The idea of a reflex arc is actually an idea of a path. Within this framework, the numerous pathways created by convergence and divergence are considered simply as parallel pathways providing redundancy and, hence, reliability. Dendrites are also just a part of the path. Connections between different reflex arcs, or connections of reflex arc elements with a surrounding network, or the fact that the same neuronal elements can belong to different reflex arcs are simply interpreted as the basis for reflex interaction. For instance, various reflexes can be in agonistic, antagonistic, or indifferent relationships. The first two are observed if there are excitatory and inhibitory relationships between reflexes, respectively. Quite often the idea of a final common pathway is mentioned in this context. This idea was initially proposed by Sherrington (1947). According to it, different reflexes are fighting for the final common pathway.

Results of CPG studies also demonstrate limitations in the classical approach. Let us consider CPGs for locomotion, for example. Only interneurons participating in rhythm generation have been considered as being part of a CPG, but why this is so is unclear. Complete mapping of connections between CPG neurons has only been achieved in very simple invertebrates where CPGs contained just a few neurons. It has been found that there is no single plan of generator construction, either in invertebrates or vertebrates. In invertebrates, some generators rely on pacemaker neurons to produce rhythm, while others utilize network mechanisms (Getting, 1986). In vertebrates, as could be expected, much less detailed information about generator circuitry has been obtained, especially in higher vertebrates. In the latter, neural networks appeared to be too complex for this approach. The resultant knowledge of a CPG is usually presented in the form of generalized schematics of interacting groups of specific neurons.

The first generator hypothesis was proposed by Brown (1914). According to this hypothesis, the locomotor generator consists of two mutually inhibiting flexor and extensor half-centers (Fig. 2a). All other vertebrate generator hypotheses introduced afterward contained mutual inhibition (symmetrical or asymmetrical) as one of their crucial components (for example, Fig. 2b). These schematics were often used as a foundation for computer simulations demonstrating how the corresponding rhythm is produced. However, these models cannot properly address issues such as the complexity of efferent programs generated by CPGs in higher vertebrates, for example, by CPGs for locomotion or scratching. These programs cannot be reduced to a simple alternation of flexors and extensors. As in descriptions of reflexes, the principle of convergence and divergence has been ignored in descriptions of CPGs.


Fig. 2. Conceptual models of central pattern generators.

a - Brown’s half-center hypothesis of locomotor generator controlling limb movements in vertebrates. F, E – flexor and extensor half-centers. Mf, Me – flexor and extensor motoneurons. b - conceptual model of segmental network generating locomotor rhythm in lamprey. Neuron symbols denote populations of neurons rather than single cells. All the spinal network neurons depicted within the box are excited by reticulospinal tonic brainstem neurons. The excitatory interneurons (E) excite all types of spinal neurons within the box. The inhibitory interneurons (I) whose axons cross the midline inhibit all the neurons of the contralateral box. Lateral interneurons (L) inhibit I interneurons. Motoneurons (M) send signals to the segmental muscles.
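
The kind of computer simulation built on the half-center schematic of Fig. 2a can be sketched as follows. This is a minimal illustration, not a reconstruction of any published model: it assumes two rate units with mutual inhibition, a constant tonic drive, and a slow self-adaptation (fatigue) term in the style of Matsuoka-type oscillators. The adaptation term, all parameter values, and the names simulate_half_center and count_switches are assumptions introduced here.

```python
import numpy as np

def simulate_half_center(beta=2.5, w=2.5, drive=1.0, tau=0.25, tau_adapt=0.5,
                         dt=0.005, t_end=20.0):
    """Euler integration of two mutually inhibiting rate units (flexor, extensor).

    beta  - strength of self-adaptation (assumed mechanism, not from Fig. 2a)
    w     - strength of mutual inhibition between the half-centers
    drive - constant tonic excitatory command
    """
    steps = int(t_end / dt)
    x = np.array([0.1, 0.0])   # internal states; small asymmetry breaks the tie
    v = np.zeros(2)            # adaptation states
    trace = np.empty((steps, 2))
    for k in range(steps):
        y = np.maximum(0.0, x)                       # firing rates
        dx = (-x - beta * v - w * y[::-1] + drive) / tau
        dv = (-v + y) / tau_adapt
        x = x + dt * dx
        v = v + dt * dv
        trace[k] = x
    return trace

def count_switches(trace, dt=0.005, skip=5.0):
    """Count how often the leading half-center changes after an initial transient."""
    start = int(skip / dt)
    flexor_leads = trace[start:, 0] > trace[start:, 1]
    return int(np.sum(flexor_leads[1:] != flexor_leads[:-1]))

switches_with_adaptation = count_switches(simulate_half_center())
switches_without_adaptation = count_switches(simulate_half_center(beta=0.0))
```

With the assumed adaptation term the two units alternate rhythmically under a constant tonic drive. With adaptation switched off (beta=0.0), mutual inhibition alone makes one unit win permanently in this particular rate model. Either way, the output is a simple alternation of two signals, which is the limitation of such schematics noted in the text.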


The same types of neurons can be parts of reflex arcs and various CPGs -- for example, CPGs for scratching, locomotion, shaking, etc. -- and this sharing of common elements is considered to be the foundation for the interaction of programs and reflexes. According to classical views, a CPG produces a rough program that is adjusted to environmental conditions by reflexes. During this interaction, reflexes become phase dependent. Based on the classical concept, one could conclude that each animal species has its own specific set of reflexes and program controls, and that the goal of neuroscience is to study them. In addition, we need to study the phase dependency of those reflexes. Taking into account the variety of movements, this goal seems impossible to achieve, especially in higher vertebrates. This is an actual blueprint of how the classical conceptual approach has been used to bridge the gap between neural processes and behavior in the case of reflexes and program controls.

The application of the classical strategy to explain functions of the highest brain levels seems especially questionable. How can the strategy of selecting specific circuits be applied to the highest brain levels? Should we choose a specific circuit for each specific behavior? In neuroscience textbooks (Kandel et al., 2000; Shepherd, 2003), the description of the highest brain functions based on the classical conceptual understanding of the brain is usually limited to function localization and the description of the direction of information flow: A command originates in point A and is conveyed to point B, or point A neurons excite point B neurons. This is how voluntary movements, for example, are explained, where the motor cortex is point A and the spinal cord point B. Even a superficial analysis shows that these ideas do not sufficiently explain complex brain behavioral phenomena. It must be pointed out that numerous neural circuit diagrams were created for various parts of the highest brain levels. Examples of such diagrams for the cerebellum and various areas of the cerebral cortex are plentiful in textbooks and scientific publications. They are usually known as microcircuits and are broadly used in computer simulations to reveal what those circuits are doing. The microcircuits are, however, generated the same way as reflex arcs, CPG diagrams, and other serial circuit models: A specific group of neurons is shown as one element. Such generalization leads to very serious and widespread misconceptions about neuronal interconnections. For example, based on the diagram shown in Fig. 3, one can conclude that a GPe neuron receives excitatory feedback from the same STN neuron to which it sends an inhibitory connection. However, it has never been shown experimentally that such connections really exist. It is more probable that excitatory recurrent connections originating from one STN neuron are distributed among a subgroup of GPe neurons that does not necessarily include the GPe neuron inhibiting this STN neuron. Obviously, such misconceptions can be the source of numerous erroneous conclusions about the functions of the underlying neural networks. The correctness of replacing a group of neurons with one element is also discussed in Section 3.


Fig. 3. Accepted serial circuit model of the skeletomotor cortico – basal ganglia – thalamocortical loop.

Excitatory and inhibitory synapses are shown by white and black circles, respectively. GPe - external segment of globus pallidus. GPi - internal segment of globus pallidus. PPN - pedunculopontine nucleus. SNc - substantia nigra pars compacta. SNr - substantia nigra pars reticulata. STN - subthalamic nucleus. TN – thalamic nuclei.

Attempts to explain behaviors based on the activity of neuronal populations are plentiful in classical neuroscience. Currently, there are quite widespread experimental approaches that analyze the activity of neuronal populations, for example, in the motor cortex during various motor behaviors, to explain how neural ensembles encode certain behavioral parameters (Georgopoulos et al., 1988; Nicolelis and Ribeiro, 2002; Lebedev et al., 2005). In such experiments, single or multielectrode extracellular recording is performed from neurons located in brain areas of interest. Neither neuronal interconnections nor their types are taken into account during the analysis, which includes various mathematical operations (one of these operations is usually calculating the mean) over the activity of the recorded neurons. The results of such analysis are then correlated with various motor parameters. The correlations found, however, do not explain how the activity of the ensemble is actually translated into the behavior. The limitations of such approaches will also be discussed in Section 3.
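
A population-level analysis of this kind can be sketched in a few lines. The example below is a simplified, noise-free illustration in the spirit of population-vector analyses and does not reproduce the procedure of any of the cited studies. The cosine tuning curve, the number of neurons, the firing-rate values, and the function names are all assumptions made for the sketch.

```python
import numpy as np

def cosine_tuned_rates(direction, preferred, baseline=10.0, depth=8.0):
    """Hypothetical firing rates: each neuron fires most for its preferred direction."""
    return baseline + depth * np.cos(direction - preferred)

def population_vector_angle(rates, preferred):
    """Sum preferred-direction unit vectors weighted by mean-subtracted rates."""
    weights = rates - rates.mean()
    px = np.sum(weights * np.cos(preferred))
    py = np.sum(weights * np.sin(preferred))
    return float(np.arctan2(py, px))

# 16 simulated neurons with evenly spaced preferred directions (an assumption).
preferred = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
true_direction = np.deg2rad(60.0)

rates = cosine_tuned_rates(true_direction, preferred)
decoded_direction = population_vector_angle(rates, preferred)
```

The decoded angle matches the movement direction in this idealized setting, yet nothing in the calculation refers to how the neurons are connected or to how their activity reaches the muscles. The computation only relates recorded rates to a movement parameter, which is the limitation described in the paragraph above.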

The pathophysiology of neural networks based on classical views is actually non-existent for obvious reasons. How is it possible to explain pathological behavior if we do not know how to explain normal behavior? At the same time, neurology and psychiatry have accumulated a plethora of data about symptoms specific to various brain disorders. These data are very well classified and are the foundation for differential diagnosis. We may know the cause of a disease, but do not know the mechanisms of its symptoms or how some symptom-alleviating procedures work (see Section 6). Neither psychiatry nor neurology has adequate explanations for the mechanisms of complex brain pathological phenomena like delusion, bradykinesia, tremor, akinesia, and speech impairments. Parkinson’s disease (PD) is a typical example of this situation. There is a push-pull theory of Parkinson’s disease (Wichmann et al., 2000) that is based on classical views. The theory is rather popular among neurologists and neurosurgeons but much less accepted by basic scientists who study the physiology of the basal ganglia. The theory is based on a very simple idea. Experiments have found that compared with normal animals, MPTP[1]-treated primates have higher discharge rates in the neurons in the subthalamic nucleus (STN) and the internal segment of the globus pallidus (GPi) (see Fig. 3). The serial circuit model of the basal ganglia suggested that the loss of striatal dopamine decreases the activity of inhibitory striatal neurons projecting directly to the GPi and increases the activity of inhibitory striatal neurons projecting to the external segment of the globus pallidus (GPe). The increased inhibition of the GPe permits more activity in the STN, whose excitatory projection, together with the weakened direct striatal inhibition, raises GPi activity.
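
The chain of sign changes in the serial circuit model can be traced with a toy steady-state calculation. This sketch only makes the qualitative reasoning of the push-pull theory explicit. The baseline values, the unit connection weights, the threshold-linear rate function, and the assumed excitatory STN-to-GPi link (as usually drawn in serial circuit diagrams) are all illustrative assumptions, not measured quantities.

```python
def relu(x):
    """Threshold-linear rate function: firing rates cannot be negative."""
    return max(0.0, x)

def serial_circuit_rates(striatum_direct, striatum_indirect):
    """Steady-state rates in a toy serial circuit with unit weights (assumed)."""
    gpe = relu(2.0 - striatum_indirect)        # indirect-pathway striatum inhibits GPe
    stn = relu(2.0 - gpe)                      # GPe inhibits STN
    gpi = relu(1.0 - striatum_direct + stn)    # direct striatum inhibits, STN excites GPi
    return {"GPe": gpe, "STN": stn, "GPi": gpi}

# Hypothetical striatal outputs: dopamine loss lowers the direct-pathway
# activity and raises the indirect-pathway activity.
normal = serial_circuit_rates(striatum_direct=1.0, striatum_indirect=1.0)
parkinsonian = serial_circuit_rates(striatum_direct=0.5, striatum_indirect=1.5)
```

In this toy calculation the parkinsonian case has lower GPe activity and higher STN and GPi activity than the normal case, in line with the discharge-rate changes the theory was built to account for. It says nothing about how such rate changes would produce motor symptoms.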

Based on this theory of PD, it was assumed that treatment should consist of restoring balance between the direct and indirect pathways from the striatum to the GPi and substantia nigra pars reticulata (SNr) by pharmacological or surgical treatments. In recent years, however, some scientists have begun to consider this theory inadequate and limited (Levy et al., 1997; Parent and Cicchetti, 1998; Obeso et al., 2000).

The push-pull theory does not explain the function of the basal ganglia: The authors of this theory state that “at present the physiologic functions of the basal ganglia remain unknown” (Wichmann et al., 2000). It also fails to explain how this imbalance in firing rates leads to parkinsonian motor symptoms. Probably for this reason, the authors modestly call their push-pull theory “a model of the pathologic mechanisms underlying movement disorders of basal ganglia origin”. In addition, the theory cannot explain the experimental findings of lesion studies of the basal ganglia in normal animals. Typically, such lesions have no or short-lived effects on skilled fine movements, or they evoke only mild bradykinesia (Wichmann et al., 2000). The results of these lesion studies, however, can easily be explained by the holographic properties of biological neural networks (Section 3).

Even simpler classical explanations of mechanisms of neurological symptoms or their alleviation using certain modern techniques like deep brain stimulation (DBS) can be found in the scientific literature (see Section 6). Several typical examples of these explanations are applications of DBS to treat certain forms of depression, obsessive compulsive disorder, and Tourette syndrome. In the case of depression, the positive effect of DBS is usually ascribed to changes in activity (excitation or inhibition) of the brain areas that are mono- or oligosynaptically connected to the stimulated area (see, for example, Mayberg et al., 2005). However, the mechanisms of these diseases and their treatment by DBS are most often described as unknown.

Other examples of attempts to apply classical views to explain pathological behaviors include tremor and seizure activity. In the case of tremor, attempts have been made to find a CPG that produces the rhythm. In the case of seizure activity, epileptic seizures have been considered to be a result of the activity of a seizure generator. Obviously, these approaches are similar to the CPG problem that was described above. Alternative explanations of these phenomena are given in Section 6. There have also been attempts to explain repetitive pathological behaviors based on the feedback loops, usually very numerous, of the systems involved.

As we see, explanations of the mechanisms of pathological brain symptoms carry an imprint of the classical conceptual understanding of the brain. Their essence consists in an attempt to find specific neural circuits responsible for pathological behavior.

One can logically conclude that, based on classical views, we do not know how the brain works.[2]


2.2. Questions that are not properly answered by classical neuroscience


It must be emphasized that the conclusions about mechanisms of reflexes, program controls, and other forms of brain-initiated behaviors in classical neuroscience were made based on observations of the activity of specific neurons during the execution of corresponding behaviors. In this connection, certain limitations of the microelectrode recording technique, one of the important foundations of the classical views, should be mentioned. The microelectrode recording technique, especially extracellular recording, works better with larger neurons; therefore, smaller neurons are underrepresented. For example, based on microelectrode recording, the CPG for locomotion in vertebrates is generally accepted to be located in the intermediate zone and the ventral horn of the spinal cord. However, non-microelectrode approaches have demonstrated that the leading role in rhythm generation belongs to dorsal horn neurons (Baev and Chub, 1989; Chub and Baev, 1991). But the most important disadvantage of the microelectrode technique is that it is not capable of providing us with information about dendritic processes during behavior. In other words, dendritic processes during behavior remain unobservable or incompletely observable. At the same time, most of the information processing, especially in vertebrates, takes place at the level of the dendrites, where the predominant number of synapses are located.

The classical brain concept creates an illusion of understanding that may appear intuitively correct. But is it a true understanding? A rather simple and straightforward understanding of the relationship between structure and function that is applicable to very simple systems lies at its core. As we shall see in Sections 3 and 4, only by acquiring some special notions from neurocomputing and control theory can we change this understanding of the structure – function relationship in the brain. For a curious mind, these classical views create more questions than answers. Several major questions are listed below:

1.      Is it correct to search for specific neural circuits within the brain’s mega-network to explain behaviors?

2.      Why can the same neurons be a part of different specific circuits?

3.      What is the role of divergence and convergence, and what is the purpose of parallel circuits?

4.      What is the role of dendritic trees?

5.      Is it adequate to use diagrams in which a functionally similar group of neurons is represented as one element for the purpose of explaining functions or computer simulations?

6.      Why are biological neural networks built the way they are? Classical neuroscience does not address this most fundamental question. There is not even room for this question within the framework of classical conceptual brain views.


The answers that have been provided by classical neuroscience are either non-existent or are an unjustifiable oversimplification. More adequate answers to these questions are given in Sections 3 and 4.


3. Biological neural networks perform computations


3.1. The neural network computational principle


The understanding of what biological neural networks do came from neurocomputing in the 1980s: They perform computations, and the converging and diverging connections within a network play a crucial role in these computations. Computation is a general term for any type of information processing. To understand how biological neural networks perform computations, let us consider the following mathematical construction, the so-called three-layer Kolmogorov’s unidirectional neural network (Fig. 4). The processing elements (n) of the first layer are fanout units that distribute input signals, the input vector components (x), to the processing elements of the second hidden layer. The processing elements of the hidden layer neither directly receive inputs from nor provide direct outputs to the external world. The transfer function (the rule that transforms input signals into output signals) of these units is similar to a linear-weighted sum. The output processing elements (m) of the third layer send signals to the external world, output vector (y). The transfer function of output units is highly nonlinear. A theorem was proven by Kolmogorov that such a three-layer neural network can implement any continuous mapping function, if its synaptic weights are adjusted properly (Kolmogorov, 1957). The history of this theorem is described in the article by Kurkova (1995). It is necessary to note that from a mathematical standpoint, numerous motor control functions are piecewise continuous. The analogy between this artificial construction and biological neural networks is clear. For example, input, hidden layer, and output elements correspond to sensory neurons, interneurons, and motoneurons, respectively. This network performs computations on input signals and generates corresponding motor control output.

Fig. 4           

Fig. 4. Architecture of Kolmogorov’s neural network. See text for further explanations.
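
The flow of signals through the three layers described above can be sketched in a few lines of Python. This is only an illustration of the data flow, not Kolmogorov's construction: the function name `forward`, the example weights, and the use of `tanh` as the nonlinear output transfer function are all assumptions made for the example.

```python
import math

def forward(x, hidden_weights, output_weights):
    """Propagate an input vector through a three-layer feedforward network."""
    # Layer 1: fan-out units pass every input component to every hidden unit.
    fanout = list(x)
    # Layer 2: each hidden unit forms a linear weighted sum of its inputs.
    hidden = [sum(w * xi for w, xi in zip(row, fanout)) for row in hidden_weights]
    # Layer 3: each output unit applies a nonlinear transfer function
    # (tanh is an assumed placeholder) to its weighted sum of hidden activity.
    return [math.tanh(sum(w * h for w, h in zip(row, hidden)))
            for row in output_weights]

# Hypothetical weights: 2 inputs -> 3 hidden units -> 1 output.
W_HIDDEN = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_OUTPUT = [[0.5, -0.5, 0.25]]
y = forward([1.0, 2.0], W_HIDDEN, W_OUTPUT)
```

Changing the entries of `W_HIDDEN` and `W_OUTPUT` changes the mapping from x to y while the architecture stays the same, which is the sense in which the function is stored in the synaptic weights.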


The number of layers in a neural network may be greater than three. In neurocomputing, numerous network architectures, corresponding theorems, and learning algorithms have been described that allow a neural network to implement any function of practical need by adjusting the network’s synaptic weights (see, for example, Hecht-Nielsen, 1990; Munakata, 1998). Backpropagation neural networks are one of the most popular architectures currently employed in neurocomputing. They received their name because of the special backpropagation[3] procedure that is utilized for learning. 

Biological neural networks are usually multilayered and multidirectional. They have numerous negative and positive feedback loops. Biological neurons are far more complex than artificial ones. Most brain neurons have an immense dendritic tree whose size significantly surpasses the size of the neuron itself. Remember, for example, Purkinje cells. From a computational perspective, the role of a dendritic tree becomes clear. It significantly increases the computational capabilities of a biological neuron. A single dendrite together with all its synapses can be considered as a network. Therefore, one neuron with its dendrites is analogous to a complex network.

The question emerges: How do these properties influence the computational power of biological neural networks? It is obvious that the inclusion of more complex neuronal elements in a network can significantly simplify the schematic solutions and decrease the size of the whole network. A pacemaker neuron can simplify the problem of rhythm generation, and a neuron with bistable membrane properties simplifies building circuitries that possess trigger properties. The feedback loops increase the calculating abilities of the network because recursion becomes possible. In general, the process of recursion consists in defining the value of a function by using other values of the same function. This procedure is important when it is impossible to move the controlled object to a desired state during one control step. Recursive computation can continue until a desired result that satisfies a particular calculation criterion is reached. From a mathematical standpoint, the class of recursive functions has maximal functional power. It will be shown in Section 4 that the presence of feedback loops in real neural networks can also be interpreted as a necessary condition for improving afferent information processing. They provide a substrate on which to build an internal model of controlled object behavior (also known as a forward model). Such a model plays a crucial role in the processing of afferent information to determine the current state of a controlled object. Consequently, from a computational perspective, real neural networks have more functional power than the unidirectional artificial neural networks described above. The existence of complex life forms with advanced nervous systems can also be considered evidence of the highly sophisticated computational capabilities of biological neural networks. Through evolution, nature has produced numerous network architectures that are yet to be discovered.
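
The role of recursion in control can be illustrated with a toy example. In the sketch below, the controlled object cannot reach its target in one control step, so the same step is applied to its own result until a stopping criterion is satisfied. The functions `control_step` and `drive_to_target`, the step limit, and the tolerance are hypothetical and chosen only for illustration.

```python
def control_step(state, target, max_step):
    """Move the state toward the target by at most max_step (limited controllability)."""
    error = target - state
    command = max(-max_step, min(max_step, error))
    return state + command

def drive_to_target(state, target, max_step=0.3, tolerance=1e-9, steps=0):
    """Recursively apply control steps until the stopping criterion is met."""
    if abs(target - state) <= tolerance:
        return state, steps
    return drive_to_target(control_step(state, target, max_step), target,
                           max_step, tolerance, steps + 1)

# The target lies further away than one step can cover, so several steps are needed.
final_state, n_steps = drive_to_target(0.0, 1.0)
```

Each call defines the result for the current state through the result for the next state, which is the recursive definition mentioned in the text; the feedback of the new state into the next call plays the part of a feedback loop.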

Neural networks have several unique features. First, they are multifunctional by nature; multiple functions can be stored within one network, and the quality of the approximations performed by each function degrades slowly as the number of stored functions increases. Second, they possess holographic properties: any network can continue to function after partial loss of its neuronal elements. Third, they can have multiple stable states that are called attractors. Neural networks, most often recurrently connected, tend to these stable states. Numerous studies have shown that there can be different attractors – point, line, ring, plane, cyclic, chaotic, etc. These attractors received their names based on their shape in the network’s state space (see Section 4.1). Attractor networks are the result of the neural network’s ability to implement any nonlinear dynamical system. The capability for stable, persistent activity is an important feature of biological neural networks. For example, cyclic and point attractors can be the foundation for CPGs and memory, respectively. Attractors of various shapes can also be the foundation for complex sensory detectors.
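
Two of these features, point attractors and tolerance to the loss of elements, can be demonstrated with a very small recurrent network. The sketch below uses a Hopfield-type network with Hebbian weights as an assumed stand-in; it is not a model of any particular biological network, and the names `train_hopfield`, `settle`, and `lesion` as well as the stored pattern are invented for the example.

```python
def train_hopfield(patterns):
    """Hebbian weights for a small Hopfield-type network (an assumed stand-in)."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights

def settle(weights, state, sweeps=5):
    """Update units one at a time until the state stops changing."""
    state = list(state)
    for _ in range(sweeps):
        previous = list(state)
        for i, row in enumerate(weights):
            drive = sum(w * s for w, s in zip(row, state))
            state[i] = 1 if drive >= 0 else -1
        if state == previous:
            break
    return state

def lesion(weights, lost_units):
    """Silence the given units by removing all of their connections."""
    return [[0.0 if (i in lost_units or j in lost_units) else w
             for j, w in enumerate(row)]
            for i, row in enumerate(weights)]

PATTERN = [1, -1, 1, -1, 1, -1, 1, -1]
NOISY = [-1, 1, 1, -1, 1, -1, 1, -1]   # first two units flipped
weights = train_hopfield([PATTERN])
recalled = settle(weights, NOISY)                            # falls into the attractor
recalled_after_loss = settle(lesion(weights, {6, 7}), NOISY)  # two units removed
```

Starting from the distorted state, the network settles into the stored pattern, which acts as a point attractor. After two of the eight units are disconnected, the six surviving units still settle into their part of the pattern.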

The conceptualization of learning in biological neural networks becomes very straightforward based on computational views. Any process of learning should appropriately change neural network parameters - synaptic weights or properties of neuronal elements, or both. As a result, the network becomes capable of computing new functions.

Over the past two decades, computational neuroscience has become accepted by neurobiology. Scientists working in motor and visual neurobiology are already utilizing computational views. However, computational neuroscience has not yet become a full-fledged part of neurobiology. The notion of computation still remains broadly ignored or misunderstood. This problem will be addressed in the next Section and in Section 4.


3.2. Corollaries of the neural network computational principle


The neural network computational principle has numerous corollaries. The most important of these corollaries are mentioned below.

            a. Selecting a specific circuit from a biological neural network, an approach used by classical neuroscience to explain behaviors like reflexes, program controls, etc., is an unjustifiable oversimplification that does not provide an adequate explanation of the corresponding behavior. The network computational principle is missing in these circuits. A multilayer computing network cannot be reduced to a serial chain of elements. The serial circuit diagram serves only one useful purpose – it shows the direction of information flow in the actual neural network.

            b. Computational models based on classical serial circuit diagrams are also inadequate. Such models are the result of a fashion. When computational neuroscience started to gain popularity, some neuroscientists decided to join the trend without having a deep understanding of what network computation means. Computer simulations based on serial circuits like the ones shown in Figs. 2 and 3 do not account for the network computational principle (for example, Ullström et al., 1998; Hill et al., 2003; Contreras-Vidal and Stelmach, 1995). Formally, serial circuits also perform computations, but they are much simpler than a network’s computations. Such models just mimic the experimental correlations between the activity of various network elements and the parameters of the specific behavior without explaining the function that is computed by the corresponding network (see Conclusion). It is worth mentioning that the term computational neuroscience is sometimes misinterpreted. For some scientists, the mere fact that computers perform computations in such models based on serial circuits means that it is computational neuroscience. Another example of such misunderstanding of the term computation is the application of various mathematical operations (averaging, extracting directional components from neuronal activity, etc.) to the neuronal activity of a population of neurons registered during a specific behavior (see Section 2). There are usually several synaptic transmissions between the neurons of this population and the motor output. The correlations between behavioral parameters and parameters of neuronal activity revealed by such studies also do not explain computations performed by the corresponding networks. These issues will also be discussed in Section 7.

            c. The distribution of synaptic weights across any specific network has never been addressed by classical neurobiology. Methods that would study this distribution do not exist. Moreover, synaptic processes in dendrites, where most of the computations take place, have an extremely low level of observability. But what if it were possible to obtain information about synaptic weight distribution across a network including dendrites? Is it really necessary to obtain this information for a specific biological network to understand the function that it is computing? The answer is no. Neurocomputing shows us that after a neural network has been successfully trained to perform its goal, its weights have no direct meaning to us. That is, it is impossible to extract underlying rules that may be implied from the neural network. Even after rather intensive research into this problem in neurocomputing, the statement still stands (Munakata, 1998).

            d. The network computational principle changes the understanding of the structure-function relationship. As mentioned earlier, the classical understanding of the structure – function relationship calls for the synthesis of a detailed knowledge of connections between interacting neurons, their properties, and their corresponding behavior. As we can see, the result of such synthesis has very little to do with explaining the function computed by the network that is being studied.

            e. The network computational principle is necessary but not sufficient to solve the problem of the structure-function relationship in the brain. At first glance, the network computational principle is the necessary missing link. One might assume that now, when we finally understand what biological neural networks are doing, we will be able to successfully unveil their functions. However, further analysis shows that the network computational principle is necessary but not sufficient to solve the problem of the structure-function relationship in the brain. The network computational principle taken alone “makes” the problem of the structure-function relationship even more complex. It does not show an alternative to the classical conceptual understanding of the brain and, consequently, of the structure – function relationship in it. The problem is complicated by the fact that the same function can be computed by different networks and that the same network can compute different functions. There is one well-known device that can do this – a computer. The way to complement the network computational principle by using a functional approach, understanding structure through function, will be discussed in the next Section.


4. Functional building blocks of the brain


Understanding structure through function requires a system approach that uses a different type of reduction. In control theory it is called decomposition, and it is based on a functional principle. A decomposition of the whole system is usually done in such a way that any functional subsystem includes a controlling part and its controlled object. A controlling system sends controlling signals to its controlled object and receives feedback signals from it. Application of this decomposition principle to the brain immediately leads to very constructive results. It becomes clear that the brain is a hierarchical system in which each higher hierarchical level treats its lower hierarchical level as the controlled object (see also Section 4.2.2). Each controlling level performs numerous functions specific to the repertoire of a given animal. The more evolutionarily complex an animal is, the richer the repertoire is. The principal feature of these functions is their automatic nature, whether inborn or acquired. An animal possesses a vast variety of inborn automatisms at birth and acquires new ones during its lifetime. An automatism is abolished after lesioning the corresponding brain part.

This extension of the concept of automatisms has profound meaning. It now includes acquired skills, whereas the definition commonly used in neurobiology covers only inborn behaviors. This generalization allows analyzing the brain from a unified perspective. Program controls like locomotion, breathing, postural control, navigating in the environment, reading, writing, etc. are automatisms. Reflexes, inborn and acquired, conditioned and instrumental, are also automatisms. The logical conclusion is to consider learning as the process necessary for the formation of a new automatism.

The term automatism may be considered a generalization of the term function in biology. Its use helps avoid ambiguities related to the term function. We have become used to terms like “function of the muscle”, “function of the heart”, “function of the neuron”, “function of the spinal cord”, “function of the thalamus”, etc. These terms mean that we take a certain structure and try to understand the purpose of its existence, i.e., what it is doing. Putting function in the first place does not allow us to use the common definition of function, because it would mean that we conduct reduction the same way it was done before. The term function is also used in mathematics, neurocomputing, and computational neuroscience. The notion that biological neural networks perform computation is very useful, but it does not explain the purpose of computation. In Section 4.1, the relationship between neural computation and automatism will be clarified.

The more generalized definition of automatism leads to a reformulation of the major questions confronting neuroscience. Traditional questions focused us on the construction of neural networks that could generate programs for effector organs or could be substrates for different reflex arcs. Now we can introduce a more fruitful and universally applicable question: What is the basis for the automatisms of different parts of a nervous system? The answer to this question allows the formulation of a more comprehensive theory of brain function. In other words, the theory should also explain principles of coordination of automatisms at the same hierarchical level and principles of their hierarchical relationship.

It is logical to start from the lowest hierarchical automatisms. Even these automatisms, however, can be rather complex behaviors. Such automatisms are well known in neurobiology – locomotion, scratching, breathing, swallowing, etc. An executive organ (for instance, the hindlimb in cats) can perform not only locomotor and scratching movements, but other movements as well – flexor reflex, extensor reflex, shaking, etc. Moreover, all these movements can be performed by a spinal animal. Therefore, the part of the spinal cord that controls all these movements, automatisms, is a rather capable controlling system. It is responsible for a broad set of automatisms. Clearly, a solution of the CPG problem described in Section 2 is a prerequisite for the new theoretical approach. But the solution should explain not only the nature of a CPG but also how it interacts with signals arriving at it via peripheral afferent feedback.

We need a theoretical basis that is not only necessary but also sufficient to analyze biological neural networks. In other words, this theoretical basis should possess a necessary level of completeness to properly describe biological neural networks. It is logical to suggest that there are common principles of brain organization in the animal kingdom. Otherwise our quest to understand the brain is doomed.


4.1. Control of neuronal automatism by neural optimal control system


Experiments in cats have shown that hindlimb CPGs for locomotion and scratching possess a model of object behavior (Baev et al., 1991a, 1991b; Baev and Shimansky, 1992). From the point of view of control theory, this situation can mean only one thing: The controlling system is an optimal one and the internal model is the result of its optimality. Locomotor and scratching CPGs for the hindlimb share the same controlled object and spinal circuitry. Therefore, a CPG is an operating regime of a neural optimal control system (NOCS). The next logical conclusions are obvious:

1. The spinal NOCS that controls the hindlimb also controls all of its other movements, that is, it is multifunctional.

2. Other body parts are also controlled by NOCSs.

3. Hierarchically higher motor control levels are also NOCSs. Higher NOCSs also contain models of their controlled objects' behavior, but their controlled objects are lower NOCSs.

4. Non-motor automatisms are also controlled by NOCSs and the brain is a hierarchical system of interacting NOCSs.

These conclusions lead to formulation of the following problems: How is a generic NOCS constructed and how do various NOCSs interact? But before describing a generic NOCS, let us consider what optimization is.

Optimization is a well-known procedure in technical fields. It allows finding the best solution for a particular process. The principle of this procedure is rather simple – to find the minimum or maximum value of a certain criterion or criteria while changing other parameters of the process or the system under consideration. Typical examples of optimization processes include finding the shape of a car or a boat that produces minimal drag, minimization of energy consumption by a system, maximization of power output, etc. Numerous mathematical methods have been developed in various technical fields including control theory to design optimal processes and systems. Biology also provides us with countless examples of optimization processes. Presently it is a well-known fact that each animal species is optimized for its ecological niche in respects ranging from its body shape to its nervous system. The whole process of evolution should be considered as the process of optimization. Detailed knowledge of how biological systems perform optimization is absent. Obviously, it will be one of the major goals of future biology including neurobiology. However, one of the basic optimization mechanisms is already known: Natural selection. Conceptually, numerous well-known processes, such as learning, should be considered optimization processes.

            The technical meaning of the notion optimal differs from the same notion in non-technical fields. In the latter case, this notion usually does not imply any constructive meaning; it just describes something that is better than the others. In technical fields, the word optimal is used to mean that the process or the system is actually shaped depending on what optimization criterion or criteria are chosen. From a theoretical perspective, only one so-called global minimum or maximum can exist for any process or system, which means that only one global solution exists, if all possible optimization criteria are taken into account.
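
The constructive meaning of optimization can be shown with a minimal sketch: a criterion is evaluated while a parameter is varied, and the search narrows in on the value that gives the minimum. The ternary search used here is just one simple method that works for a criterion with a single minimum; the criterion `energy_cost`, its minimum at a speed of 2, and the search interval are hypothetical.

```python
def minimize(criterion, low, high, tolerance=1e-6):
    """Ternary search for the minimum of a single-minimum criterion on [low, high]."""
    while high - low > tolerance:
        left = low + (high - low) / 3
        right = high - (high - low) / 3
        if criterion(left) < criterion(right):
            high = right   # the minimum cannot lie to the right of `right`
        else:
            low = left     # the minimum cannot lie to the left of `left`
    return (low + high) / 2

def energy_cost(speed):
    """Hypothetical criterion: a cost with a single minimum at speed = 2."""
    return (speed - 2.0) ** 2 + 1.0

best_speed = minimize(energy_cost, 0.0, 10.0)
```

A maximization task can be handled by the same procedure after inverting the sign of the criterion.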

A generic NOCS includes two major functional units – a controller and an internal model of the dynamics of the controlled object (CO) (Fig. 5). The controller utilizes information about the current state of the CO to compute the control signal that will move the CO from its initial point to its destination point along the optimal trajectory. The internal model is a predictive mechanism. At any given moment, it predicts the next most probable state of the CO after the CO receives the controlling signal from the controller. This internal source of afferent information, expected or model afferent flow, interacts with actual afferent flow in a rather unusual way, and this interaction is far from trivial. The model afferent flow is treated by the NOCS and hence by the CPG as a component of actual afferent flow (Baev et al., 1991a, 1991b). This type of interaction between model and actual afferent flows is defined as parity interaction. Both flows produce primary afferent depolarization in the spinal cord. This mechanism enables the NOCS to pay the highest attention to the most active informational channels. Silent or weakly active channels are discarded by this mechanism and hence are considered unreliable sources of information by the controlling system. This explains the holographic properties of the afferent system, that is, why after partial or complete limb deafferentation, the CPG retains the ability to generate rhythm. After complete limb deafferentation, the control system relies solely on the information provided by the internal model. Therefore, the generation of a rhythmic motor pattern by a CPG in the absence of peripheral afferent feedback is a reverberation not between flexor and extensor half-centers as suggested by classical views (see Fig. 2), but rather within the loop that includes the internal model (Fig. 5).

Fig. 5


Fig. 5. A generic NOCS. See text for explanations.
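
The loop formed by the controller and the internal model can be illustrated with a deliberately simplified simulation. It is a toy sketch and not the model proposed in the cited experiments: the function `run_nocs`, the step size of 0.25, the switching limits of ±1, and the equal averaging of the two flows are all assumptions. Its only purpose is to show that a rhythm can persist when the actual afferent flow is removed and the loop closes through the internal model alone.

```python
def run_nocs(steps, afferent_available=True):
    """Simulate a toy controller + internal-model loop; return the motor commands."""
    position = 0.0   # actual state of the controlled object (limb)
    estimate = 0.0   # state estimate held by the control system
    command = 1.0    # +1 = flexion drive, -1 = extension drive
    commands = []
    for _ in range(steps):
        # Controller: reverse the drive when the estimated state reaches a limit.
        if estimate >= 1.0:
            command = -1.0
        elif estimate <= -1.0:
            command = 1.0
        commands.append(command)
        # Internal model: predict the next state from the estimate and the command.
        predicted = estimate + 0.25 * command
        # Controlled object: the limb actually moves.
        position = position + 0.25 * command
        # Model flow and actual flow are merged on equal terms; a silent channel
        # is discarded, so after deafferentation only the model flow remains.
        flows = [predicted] + ([position] if afferent_available else [])
        estimate = sum(flows) / len(flows)
    return commands

intact = run_nocs(24, afferent_available=True)
deafferented = run_nocs(24, afferent_available=False)
```

In this toy case the model is perfect, so the command sequence is the same with and without peripheral feedback, and in both cases it alternates between flexion and extension drive.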


Any NOCS receives two types of afferent signals: initiating signals and signals that contain the current informational context in which the system as a whole finds itself. There is only one difference between these two types of afferent signals. A NOCS attempts to minimize initiating signals, more exactly, to minimize the integral measure of initiating signals by using informational signals to compute proper output. Initiating signals are analogous to "energetic" signals. They activate NOCSs resulting in a realization of any NOCS’s corresponding automatism. Taking into account the concepts mentioned above, one may now formulate the following definition of an automatism: An automatism is a program of action or sequential actions that is stored in the NOCS and, during execution, leads to a minimization of its corresponding initiating signal. Withdrawal or escape reactions and locomotion are typical examples of automatisms that serve to avoid danger, i.e., to minimize the initiating signal that signifies danger. Reflexes, being automatisms, usually serve to minimize certain initiating signals. However, often the final goal of an automatism is not minimization but maximization of the initiating signal. This is the case, for example, during pleasurable reactions, when the corresponding signals are maximized. This situation poses no contradiction, because the nervous system can easily invert the sign of the signal so that the latter becomes minimal instead of maximal. It should be noted that appropriate segregation of afferent flow into these two types of signals can only be made at the level of the recipient NOCS. The same signal can be interpreted differently by different NOCSs. It may be either an initiating or an informational signal, or even both. This concept will be explored later with different examples of NOCSs.

            Informational and initiating signals can have a short or long time span depending on the particular control task. They have a long time span when the controllability (see below) of the object is very low, and it requires a longer time to remove a corresponding initiating signal by sending control influences to the controlled object. It is necessary to distinguish subtypes of initiating signals, for example, one that starts an automatism and another that indicates a mismatch between model and real flows -- an error signal -- that goes to the model subsystem of a network and to the higher level (see Fig. 6). Both types have to be minimized during control. A mismatch signal is usually the initiating signal for learning processes within the model (see below). It initiates a learning automatism. In the discussion below, each signal is described as precisely as possible to avoid confusion.

Fig. 6


Fig. 6. Major functional blocks of an NOCS. Informational and initiating signals are shown by black and white arrows, respectively. See text for further details.


Although intuitively understood, the term controlled object requires additional explanation. It is possible to talk about a controlled object by using either a limited, narrow definition or a broader and general description. In the first case, it is the executive organ itself, for instance, a limb, or a lower NOCS. In the second case, it is the object of the first definition together with its entire surrounding environment. If a body part such as a limb is the controlled object, then the environment with which the limb interacts is also a part of the controlled object. In the case of a NOCS, its own controlled object and all interconnecting neural systems to which the given NOCS sends efferent signals and from which it receives afferent signals are a part of the broader controlled object. It is the whole surrounding brain when considering the highest functional levels. At first, such a description sounds strange because for a lower NOCS to predict the behavior of the whole brain seems impossible. Such a NOCS is capable of only restricted predictions. For instance, a spinal NOCS is not capable of predicting descending influences based on visual, acoustic, or vestibular information received by higher brain levels. This broad understanding of a controlled object has profound consequences: (a) An internal model cannot predict everything. There is always unaccounted information, which in effect means that afferent channels are noisy; (b) A lower NOCS is incapable of determining the existence of higher functional levels; (c) A higher level NOCS has to “speak” with a lower one by using the language of the lower level, i.e., the language of lower level initiating and informational afferent signals. In the discussion below the more limited and narrow understanding of controlled objects will be used primarily. Cases in which a broad understanding is required will be specified.

There are two major reasons for an NOCS to have a model of the controlled object behavior: incomplete observability and incomplete controllability of the controlled object. Incomplete observability means that an NOCS does not receive complete information from the controlled object. There is always some unaccounted information caused by numerous factors. An environment with which a controlled object interacts has its own dynamics and may influence the controlled object in unexpected ways. The latter is usually a cause of perturbations. The construction of a sensory system usually allows it to receive information from only part of the environment. This point is well illustrated by the highest brain levels. The visual system, for example, receives information from only a portion of the environment. There are also other reasons for incomplete observability. For example, an error in the execution of a centrally generated motor program will lead to unexpected afferent signals. Some small error in execution is always present even under normal circumstances. Biological neural networks perform approximations of functions with a certain degree of accuracy. Incomplete controllability means that the controlling system cannot bring the controlled object into a new state in a single control step. Multiple controlling steps require storing the results of intermediate computations within the controlling system to perform recursive computations. It has been shown in control theory that a control system must have a model of the object behavior to optimally control incompletely observable and incompletely controllable objects.

A control system uses the internal model in many ways. Several of them are listed below. An NOCS uses model afferent flow to filter peripheral information to determine the current state of the controlled object. An NOCS uses a special integration procedure to merge both these flows (Fig. 6). As a result, the current state of the controlled object is determined with a higher precision than that based on either source separately (Shimansky, 2000). The complexity of the interaction between real and model afferent flows varies in different NOCSs. Another advantage of internal models is that they give NOCSs the ability to compare model and actual afferent flows. Error, or mismatch, signals are computed in this way. These signals are very important and find multiple uses in a controlling system. One of them is learning (see below).
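The gain in precision obtained by merging the two flows can be illustrated with a minimal sketch. The text does not specify the integration procedure, so the inverse-variance weighting used here, the Gaussian noise, and all names (`fuse`, `mismatch`, `demo`) are assumptions made only for illustration.

```python
import random


def fuse(model_value, model_var, sensed_value, sensed_var):
    """Merge two noisy estimates of one state (assumed inverse-variance weighting)."""
    w_model = 1.0 / model_var
    w_sensed = 1.0 / sensed_var
    fused = (w_model * model_value + w_sensed * sensed_value) / (w_model + w_sensed)
    fused_var = 1.0 / (w_model + w_sensed)
    return fused, fused_var


def mismatch(model_value, sensed_value):
    """Error signal: actual afferent flow minus model afferent flow."""
    return sensed_value - model_value


def demo(true_state=1.0, model_sd=0.5, sensed_sd=0.8, trials=20000, seed=0):
    """Return mean squared errors of the model flow, the sensed flow, and their merge."""
    rng = random.Random(seed)
    sq_model = sq_sensed = sq_fused = 0.0
    for _ in range(trials):
        m = rng.gauss(true_state, model_sd)    # hypothetical model afferent flow
        s = rng.gauss(true_state, sensed_sd)   # hypothetical peripheral afferent flow
        f, _fused_var = fuse(m, model_sd ** 2, s, sensed_sd ** 2)
        sq_model += (m - true_state) ** 2
        sq_sensed += (s - true_state) ** 2
        sq_fused += (f - true_state) ** 2
    return sq_model / trials, sq_sensed / trials, sq_fused / trials
```

Under these assumptions the merged estimate has a smaller variance than either flow alone, which is the property the paragraph above attributes to the integration procedure.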

An internal model should be tuned to the controlled object as precisely as possible. Thus, an NOCS is a learning system that continually adjusts the model and the controller to function properly. As mentioned above, there are always errors in execution and prediction. The simplest way to think of a model adjustment process is to think of it as learning to mimic the actual afferent flow as precisely as possible. The mismatch between actual and model flows is used to guide this process of learning. The goal of this learning is to minimize the error signal, or more precisely, the integral measure of this error signal.
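As a rough illustration of tuning a model to mimic the actual afferent flow, the sketch below adjusts a single model gain so that the summed squared mismatch decreases. The linear one-parameter model and the gradient-style (least-mean-squares) update are assumptions chosen for brevity; they are not the mechanism proposed in the text, which relies on trial-and-error learning described later. The names `tune_model`, `rate`, and `epochs` are hypothetical.

```python
def tune_model(inputs, actual_flow, rate=0.1, epochs=50):
    """Fit a one-parameter model (flow = gain * input) by reducing the mismatch."""
    gain = 0.0
    history = []                          # summed squared mismatch per epoch
    for _ in range(epochs):
        total = 0.0
        for u, y in zip(inputs, actual_flow):
            error = y - gain * u          # mismatch between actual and model flow
            gain += rate * error * u      # assumed least-mean-squares correction
            total += error ** 2
        history.append(total)
    return gain, history


# Hypothetical controlled object whose afferent flow is 2.5 times the command.
commands = [0.1 * k for k in range(1, 11)]
afferent = [2.5 * u for u in commands]
learned_gain, errors = tune_model(commands, afferent)
```

Here the integral measure of the error is approximated by the per-epoch sum in `errors`, which shrinks as the learned gain approaches that of the hypothetical object.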

In control theory, it is customary to use such terms as state space and control space to describe a control system behavior. The first is a generalization of the term phase space that is broadly used in physics to describe a mechanical system. For example, the position of a material point in a phase space would require six coordinates – three to describe a position and three to describe a velocity along the corresponding space coordinates. In most of the literature, the term phase space is used as a synonym of state space in general. Within this framework, a control task consists of moving the controlled object in such a state space from an initial point to a destination point along the optimal or quasioptimal trajectory. It is worth mentioning that the state of a neural network itself can be considered as a point in a definite state space, and a process of computation can also be described as a motion from start to finish. The dimension of state space depends on the complexity of the particular system. The control space simply includes all possible controls for a given system.

An NOCS requires a rather simple command to execute one of its automatisms. Scratching or locomotor movements, as mentioned earlier, can be initiated by a simple tonic command. In the language of control theory it means that the NOCS is provided only with target state space coordinates at the destination point (Fig. 6). The NOCS does the rest – it determines the coordinates of the initial point A and moves its controlled object to the destination point B along the optimal trajectory. In terms of control theory, it uses state space vectors at the initial and destination points to compute the optimal control vector. Here we can give a natural and simple explanation of the difference between rhythmic, phasic, and tonic movements. In rhythmic movements, the velocity state space coordinate at the destination point is not set to zero, and a moving body part, for instance, a hindlimb’s tip during scratching, performs oscillating movements that are targeted at the destination point. A phasic movement from point A to point B is performed when the velocity state space coordinate at the destination point is set to zero. A tonic movement is usually performed against some other force that may include muscle antagonists. It is also characterized by a zero value of the velocity state space coordinate at the destination point, but requires non-zero force to maintain the position. An initiating command can come not only from a higher hierarchical level, but also from the same hierarchical level or from the controlled object (Fig. 6). For example, a painful stimulus can evoke locomotion, or a flexor reflex and corresponding cross-extensor reflex. The same or very similar movements can be evoked by commands coming from higher hierarchical levels.

The functional blocks of an NOCS were described above. Anatomically, however, they may be inseparable from each other, especially in simple systems, where a single pacemaker neuron may be a substrate both for the controller and the model. Complex systems (see Section 5) can have the model and the controller anatomically separated. However, one functional block does not make sense without the other. 

An NOCS reacts to a perturbation in a very specific manner. The scenario of an NOCS perturbation is described by Bellman’s principle (see Bronshtein and Semendyayev, 1998). According to this principle, any optimal control system attempts to perform optimally after a perturbation, i.e., it will try to move the controlled object along an optimal trajectory from the state in which the perturbation has left it.
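Bellman’s principle can be made concrete with a small discrete sketch. The grid, the cell costs, and the function names below are hypothetical; the point is only that a controller following a precomputed cost-to-go behaves optimally from whatever state a perturbation leaves the object in.

```python
# Hypothetical state space: cost of entering each cell of a 4 x 4 grid.
COSTS = [
    [1, 1, 1, 1],
    [1, 5, 5, 1],
    [1, 5, 5, 1],
    [1, 1, 1, 1],
]
GOAL = (3, 3)


def neighbours(r, c, rows, cols):
    """Yield the cells reachable in one control step."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc


def cost_to_go(costs, goal):
    """Repeat Bellman updates until the optimal cost-to-go stops changing."""
    rows, cols = len(costs), len(costs[0])
    value = [[float("inf")] * cols for _ in range(rows)]
    value[goal[0]][goal[1]] = 0.0
    changed = True
    while changed:
        changed = False
        for r in range(rows):
            for c in range(cols):
                if (r, c) == goal:
                    continue
                best = min(costs[nr][nc] + value[nr][nc]
                           for nr, nc in neighbours(r, c, rows, cols))
                if best < value[r][c]:
                    value[r][c] = best
                    changed = True
    return value


def follow(value, costs, start, goal):
    """Move greedily along the cost-to-go; return the path and its total cost."""
    rows, cols = len(costs), len(costs[0])
    path, total, pos = [start], 0.0, start
    while pos != goal:
        pos = min(neighbours(pos[0], pos[1], rows, cols),
                  key=lambda n: costs[n[0]][n[1]] + value[n[0]][n[1]])
        total += costs[pos[0]][pos[1]]
        path.append(pos)
    return path, total


VALUE = cost_to_go(COSTS, GOAL)
planned_path, planned_cost = follow(VALUE, COSTS, (0, 0), GOAL)
# A perturbation displaces the object to (1, 1); the same rule is simply reapplied.
recovered_path, recovered_cost = follow(VALUE, COSTS, (1, 1), GOAL)
```

In this sketch the cost accumulated after the displacement equals the optimal cost-to-go of the displaced state, so no special recovery program is needed.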

Decomposition of any functional part of the brain into NOCSs can be done in different ways. One can separate the whole controlling system into NOCSs and their respective controlled objects. On the other hand, one can combine numerous NOCSs in one complex system and also consider it as an NOCS. In both situations, the method will succeed if an NOCS and its corresponding controlled object are designated correctly, i.e., if a functional approach is used.

Example 1.

primitive gill controlling system

Figure. Primitive gill control by a pacemaker neuron as simple NOCS.

a, b - gill control before and after deafferentation.  pn - pacemaker neuron. opr - open position receptor. mf - muscle force. g - gill, closing - up.


It is clearly seen that after deafferentation the controlling system uses the internal model to generate controlling commands. It is obvious that membrane properties of the pacemaker neurons play an important role in generating the model afferent flow.

Learning. It follows from the network computational principle that both network and cellular parameters can be changed during learning. The parameters can be synaptic weights, shapes of dendrites, membrane and other cellular properties, etc. Even new neurons can be generated in some areas of the brain, although it is not a trivial problem to incorporate a new neuron into a functioning neural network (see Section 6.2.1). NOCSs and underlying biological neural networks can use numerous learning strategies to adjust approximated functions. At the highest brain levels those strategies and learning mechanisms can be very complex. Trial-and-error is the most basic and universal strategy, and numerous other types of learning may have derived from it during the course of evolution. This strategy requires some temporal and spatial relationships between initiating and informational signals during learning. These requirements are discussed below. The mechanisms responsible for changing network and cellular parameters are not discussed here, as they are not important for a conceptual understanding of the process.

The term trial-and-error implies that a trial occurs first and is followed by an error signal. This means that informational signals arrive in the controlling system earlier than initiating ones. This type of learning can occur if initiating signals evoke, for example, fluctuations in the weights of those network synapses that transmit informational signals and whose activity preceded the error. Both an increase and a decrease in synaptic weights are allowed. The stronger the signals are, the bigger the fluctuations. This process is usually called random search. It converges to the state where initiating signals are minimal. This method is universal because it gives a neural network the ability to find a new decision when there are no preexisting algorithms to direct its output appropriately.
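The random search described above can be sketched as follows. The quadratic form of the initiating signal, the rule of keeping a fluctuation only when the signal decreases, and the names `initiating_signal` and `random_search` are assumptions introduced for illustration; the text itself states only that fluctuations grow with signal strength and that the process converges to a minimal initiating signal.

```python
import random


def initiating_signal(weights, context, safe_output):
    """Hypothetical aversive signal: squared distance of the output from a safe value."""
    output = sum(w * x for w, x in zip(weights, context))
    return (output - safe_output) ** 2


def random_search(context, safe_output, steps=500, seed=1):
    """Fluctuate informational synaptic weights; keep changes that reduce the signal."""
    rng = random.Random(seed)
    weights = [0.0] * len(context)
    signal = initiating_signal(weights, context, safe_output)
    history = [signal]
    for _ in range(steps):
        scale = signal ** 0.5             # stronger signal, bigger fluctuations
        trial = [w + rng.gauss(0.0, scale) for w in weights]   # increase or decrease
        trial_signal = initiating_signal(trial, context, safe_output)
        if trial_signal < signal:         # assumed acceptance rule
            weights, signal = trial, trial_signal
        history.append(signal)
    return weights, history


# Hypothetical informational context preceding the initiating signal.
final_weights, signal_history = random_search([1.0, 0.5, -0.3], safe_output=2.0)
```

Because the fluctuation size is tied to the signal, the search quiets down by itself as the initiating signal approaches its minimum.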

Primitive classical conditioning can be described by this scheme. Classical conditioning is a type of associative learning also known as Pavlovian conditioning. Possible temporal relationships between a conditioned stimulus (CS) and an unconditioned aversive stimulus (US) during classical conditioning are shown in Fig. 7.

Fig. 7. Types of classical aversive conditioning.

CS – conditioned stimulus. US – unconditioned stimulus.

The most efficacious conditioning paradigm is simultaneous conditioning when a CS terminates at the end rather than at the beginning of a US. An NOCS tries to minimize initiating signals. The US is an initiating signal and the CS is informational context. The initiating signal informs the control system that the object is in a dangerous zone of its state space such as when the controlled object, for example, a limb, receives a painful stimulus. Any information preceding the initiating signal, such as information from tactile receptors or distant receptors like antennae, functions as the informational context. Obviously, such a primitive system will be in a state of random search, i.e., in the state of learning, until the controlling output that optimally minimizes the initiating signal is calculated in response to the informational context (Fig. 8). The controlling signal (for instance, a motor command to a muscle) will first move the object from the danger zone, and an initiating signal will either not be generated or will be minimal. It is obvious that such a system will compute different functions depending on the informational signal that preceded the initiating signal, the features of the controlled object, and the parameters of the initiating signal.



 Fig. 8. Minimization of initiating signal during learning.

a, b, c – beginning, middle and end of learning, respectively. CO – controlled object. ic – informational context, is – initiating signal, and m – movement. The diagram in the middle represents the gradual decrease in integral intensity (I) of initiating signals caused by aversive stimuli during learning.


Classical conditioning of the eyelid closure reflex is a typical example of this learning scheme with tone and air puff to the eye being the CS and US, respectively (see Section 5.1.1). At the end of learning, eyelid closure is produced by the controlling system to minimize the initiating signals evoked by the air puff.  A similar learning scheme can be used to tune the model to the controlled object. In this case, various error signals play the role of the initiating signal.

As already mentioned above, the trial-and-error strategy requires certain obvious conditions. An error signal must be delivered to the appropriate neurons and hence to their corresponding synapses (spatial credit assignment), during a uniquely appropriate window in time (temporal credit assignment) (Minsky, 1963; Barto et al., 1983). Clearly, delay and especially trace conditioning rely on trace processes evoked by informational signals. Otherwise, this type of learning could not work. Therefore, delay and trace conditioning need some form of memory mechanisms to be capable of solving a credit assignment problem. At lower control levels, the time window preceding error signals can be brief: tens or hundreds of milliseconds. Cellular properties could play a role in such short-term memory. At higher levels when the time between CS and US can be seconds, minutes, or hours, special memory mechanisms must be involved for this type of learning to work effectively.
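One way to picture such a trace process is a decaying memory left at each informational synapse, read out when the error signal arrives. The exponential shape, the time constant, and the names in the sketch below are assumptions for illustration only.

```python
import math


def eligibility(last_active_ms, error_time_ms, tau_ms=100.0):
    """Credit per synapse at error time (assumed exponentially decaying trace)."""
    credit = {}
    for synapse, t in last_active_ms.items():
        lag = error_time_ms - t
        # Only activity that preceded the error signal can receive credit.
        credit[synapse] = math.exp(-lag / tau_ms) if lag >= 0 else 0.0
    return credit


# Hypothetical informational synapses and the times of their last activity (ms).
activity = {"recent": 950.0, "old": 500.0, "too_late": 1020.0}
credit = eligibility(activity, error_time_ms=1000.0)
```

With a time constant of this order, only signals within tens to hundreds of milliseconds before the error retain appreciable credit; longer CS–US intervals would require a much slower trace or a separate memory mechanism.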

Being universal, however, this method may be very slow when a search is performed in the hyperspace of a network’s parameters. Evolution could use several improvements to accelerate this process. Specific mechanisms could determine the gradient of an initiating signal. System parameters could be adjusted in a dependent fashion (pattern adjustment). Incorrect decisions could be memorized (possible at the highest levels). Finally, during evolution, systems based on random search methods could find a rule or a set of deterministic rules to adjust their computed functions for certain situations. For example, an initiating signal could possess a sign (excitatory or inhibitory) and show the direction of any necessary synaptic changes in informational synapses. Any improvement in learning must be based on the specifics of a network as well as its cellular and molecular mechanisms. Hence, the more complex the strategy involved in the search for new decisions, the more complex its mechanisms should be.

Three major categories of learning are known in neurocomputing – supervised, graded or reinforcement, and self-organization (Hecht-Nielsen, 1990; Munakata, 1998). The learning described above belongs to the reinforcement type of learning. Supervised training implies a regimen in which the neural network receives a sequence of pairs of input and correct output vectors. During self-organization, a network modifies itself only in response to input signals.

In the neuroscience literature, the term “teaching signal” is broadly used to define reinforcement learning. Strictly speaking, however, there is no teacher in reinforcement learning: a correct answer is not delivered to the learning system, only an estimate of the system’s performance.

Learning requires complex metabolic processes that are initiated by error signals and can also be rather time consuming. In complex hierarchical systems, an accumulation of such processes (errors) at various levels might have led to the invention of special mechanisms like sleep to properly address the accumulated errors.    

Based on the notions of NOCS and computation introduced in this and previous sections, the relationship between neural automatism and computation becomes clear. The relationship between the terms automatism and computation is similar to the relationship between a goal and the means by which the goal is achieved. To improve an existing automatism or to create a new one, the computational abilities of each functional subdivision within a particular control system must be adjusted appropriately by using the procedure of learning.

Neurocomputing helps to clarify what must be achieved during the ontogenetic development of an NOCS. In neurocomputing, various learning algorithms are used to adjust network parameters, the synaptic weights. Synaptic weights are usually selected randomly at the beginning of a training session. Training often takes a long time and sometimes never converges. Lengthy training, however, is not a problem. Even though months of continuous training may be needed, once successful, the final network configuration can easily be copied to other systems. Obviously, the benefits can be significant. Biological neural networks evolved along similar lines. In evolution, successful network solutions were then transferred to future generations by genetic automatisms.

A biological neural network must be genetically predetermined to compute specific classes of functions. Sources of initiating and informational signals (detectors) for each hierarchical level also must appear during specific developmental stages. If they do not appear, the system will be unable to learn and to create a necessary model of controlled object behavior, and will therefore lack the ability to perform proper control tasks. Learning is a necessary developmental component for any neuronal automatism, because genetic information cannot account for the numerous environmental conditions that confront an animal. The process of learning would take too long without initial genetic structural approximations of the networks used to compute specific classes of functions. Without them, it would be normal for learning to fail to create essential automatisms during a lifetime, and the existence of complex biological systems would be impossible. The situation would be analogous to finding a new solution without the benefit of any previous knowledge.

Rather well documented embryonic motility attests to these views. Embryonic motility starts early in ontogenesis when limb innervation is established (Hamburger, 1963; Hamburger and Oppenheim, 1967; Bradley, 1999). It is a process that should be considered as necessary for creating and tuning the internal model and the controller. The process of tuning the model to the controlled object and the controller to perform optimal movements persists through ontogenesis and ends with the death of an individual.


4.2. Principles of interaction of various neural optimal control systems


Almost all animal species have hierarchical nervous systems that include numerous NOCSs. Only the simplest diffuse nervous systems in the simplest animals are an exception to the rule. Functions of various NOCSs must be coordinated in space and time to achieve any meaningful control goal. There are two possible types of interactions between NOCSs in hierarchical systems – horizontal and vertical. In horizontal interactions, NOCSs of the same hierarchical level interact with each other to coordinate their functions. In vertical interactions, NOCSs of different hierarchical levels interact with each other in such a way that the hierarchically lower systems are controlled objects for the higher ones.


4.2.1. Horizontal interaction of neural optimal control systems


Based on the construction of a generic NOCS such as that described above, it is not difficult to draw conclusions about the mechanisms by which various NOCSs interact to solve the horizontal coordination problem. Any given NOCS should receive information from all other NOCSs with which it coordinates its activity, and its internal model should be capable of predicting this information as precisely as possible. For this reason, anatomical interconnections between all possible pairs of NOCSs must exist to solve the coordination problem (Fig. 9). This construction is used in invertebrates where various ganglia that control limb movements are interconnected. In vertebrates, propriospinal neurons play a similar role in the spinal cord. In such systems, information from higher NOCSs that is not available at the lower levels (for example, information from distant rostral receptors) should also be delivered to all lower level NOCSs. In their turn, lower NOCSs should send information about their activity to the higher NOCS that controls them. Obviously, this construction has limited capacity. It is relatively efficient at solving simple lower level coordination problems and can be responsible for behaviors like escape, startle, and various vestibular reactions. When the number of participating NOCSs becomes large and other sensory modalities are added, this mechanism becomes awkward and inefficient. A more efficient solution evolved in nervous systems with a well developed hierarchy (see an example of the cerebellum below).



Fig. 9. Solution of coordination problem in simple animals like crustaceans.

See text for explanations.

4.2.2. Vertical interaction of neural optimal control systems


The term hierarchy was mentioned earlier. It is a familiar term; however, the constructive meaning of this notion has not always been completely understood. The major advantage of a hierarchical system is that a rather complex behavior produced by a lower control level can be started by a simple command from a higher level. This advantage means that a trajectory at the lower level within its state space corresponds to a point within a state space of the higher command level. When the higher level jumps to another point, the lower level changes its trajectory.

In its turn, the higher level can be a controlled object for an even higher control level. This means that the higher level state space becomes more discrete and the higher level model predicts in what state and when the transition of the controlled object to its next state will occur. In a multilevel hierarchical system, a very high degree of abstraction can be achieved at the highest hierarchical levels. In the case of motor control, the higher levels control more abstracted parameters like intensity of locomotion, direction of locomotion, etc. For example, directional and place neurons have been found in the brain (Crutcher and DeLong, 1984; Kobayashi, 1997; Sergio and Kalaska, 1997).

Sensory systems are also hierarchical, and each level can be considered as an NOCS. Descending control in sensory systems is well known. In such systems, a lower detector is fixed upon a particular feature when a corresponding descending control signal arrives. On the other hand, a mismatch signal will go higher and higher until it is caught by a competent level (Fig. 10a). It is obvious that secondary, tertiary, etc. detectors can be created this way. As a result of this process, a hierarchy of sensory detectors can exist. It is noteworthy that higher sensory levels including the cerebral cortex can send initiating and informational signals to lower detectors and, consequently, tune them to any desirable feature because new minimization criteria become available to the lower levels.





Fig. 10. Advantage of hierarchy in sensory and motor systems.

a – minimization of an initiating signal (white arrow) is necessary to tune a sensory detector to a specific feature. b – higher level can help to remove an initiating signal that was not caught by a lower level.


Let us consider a scenario in which a high level detector -- for instance, a detector that processes information from distant receptors to recognize a dangerous object -- can be used to improve behavior. The low level of control does not receive an informational context that precedes a painful stimulus and cannot avoid the latter, because it cannot produce a necessary controlling output according to the scheme shown in Fig. 8. The initiating signal will be spread to the higher level where it can be minimized, if the information from the distant detector is used by the higher control level to compute the minimizing controlling output sent to the lower level (Fig. 10b). Another possibility of improving behavior in hierarchical NOCSs is by sending a new initiating signal (new minimization criterion) to a lower level. In this situation, any available informational context, including a context sent by a higher level, can be used to minimize this initiating signal. Finally, any conditioned reflex learned at the highest level can initiate very complex lower automatisms.

5. Mechanisms of normal brain-initiated behaviors: The physiology of neural optimal control systems


In the previous section, the statement was made that NOCS architecture and principles of interactions between various NOCSs are the key to understanding the mechanisms of complex behaviors. In this section, two examples of motor behaviors – movement coordination and the function of the skeletomotor loop in vertebrates -- will be described to support this statement. Later, several logical conclusions will be made to show how non-motor brain behavioral phenomena can be explained.


5.1. Motor behaviors


Spinal NOCSs are very potent. Their controlling repertoire ranges from simple reflexes to postural reactions and complex rhythmic movements like locomotion, scratching, shaking, etc. In the highest vertebrates, fine arm and finger movements that allow object manipulation are added to this repertoire. Obviously, complex movements require high quality coordination and movement initiation. The first is performed by the cerebellum, and the second by the motor cortex or, more exactly, by the skeletomotor cortico – basal ganglia – thalamocortical loop.


5.1.1. Movement coordination in vertebrates: Understanding the cerebellum  


The role of the cerebellum in movement coordination is well known. Without the cerebellum, movements are poorly coordinated, but possible. This is the most mysterious experimental and clinical fact. Any theoretical explanation of the cerebellar function must explain this fact.

Motor coordination requires the interaction of different NOCSs. There are two major mechanisms available to the system for solving the coordination problem. The first one was described in Section 4.2.1. It is the anatomical arrangement that results in mutual interconnections between all possible pairs of interacting NOCSs. The second is the creation of one coordinating central dispatcher which receives information from all NOCSs and other sources like the visual, acoustic, vestibular system, etc., processes it, and sends corresponding commands back to each NOCS (Fig. 11). The cerebellum has been proposed to be the central dispatcher created by nature and elaborated in the evolution of vertebrates (Baev and Shimansky, 1992; Baev, 1994; Baev, 1998). The cerebellum accumulates knowledge regarding spatio-temporal correlations between the input signals and optimal control signals that should be sent to each NOCS in a particular situation. It is also the place where the brain puts together in real time all the information about the controlled object – the body and the environment.


Fig. 11. Solution of coordination problem in vertebrates by utilizing the cerebellum to play the role of coordination central dispatcher.
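The difference in wiring cost between the two coordination schemes can be shown with a short count. The sketch assumes that each reciprocal interconnection is counted once and that the dispatcher needs one reciprocal link per NOCS; the function names are hypothetical.

```python
def pairwise_links(n_nocs):
    """Reciprocal links needed when every pair of NOCSs is interconnected."""
    return n_nocs * (n_nocs - 1) // 2


def dispatcher_links(n_nocs):
    """Reciprocal links needed when each NOCS talks only to a central dispatcher."""
    return n_nocs


for n in (4, 20, 100):
    print(n, pairwise_links(n), dispatcher_links(n))
```

The pairwise count grows quadratically with the number of NOCSs while the dispatcher count grows linearly, which is one way to read the statement that the first scheme becomes awkward as the number of participating NOCSs increases.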


The cerebellar coordination function can easily be conceptualized by using the semantics of its afferent inputs (Fig. 12). These semantics become clear when an interpretation of the data about the activity of the spinocerebellar loops during locomotion and scratching is performed based on theoretical views described earlier. This data can be found in the book by Arshavsky et al. (1986). There are two afferent inputs to the cerebellum: the mossy and climbing fiber systems. Three ascending tracts bring information from the spinal cord to the cerebellum via the mossy fiber system: the spinoreticulocerebellar tract (SRCT), the ventral spinocerebellar tract (VSCT), and the dorsal spinocerebellar tract (DSCT). The SRCT and VSCT are rhythmically active during both real and fictive locomotion and scratching. Fictive motor behavior is observed after pharmacological neuromuscular paralysis. The DSCT is rhythmically active only during real locomotion and scratching. This rhythmic activity disappears after deafferentation or immobilization when fictive behavior is present.

Only one ascending tract, the spinoolivocerebellar tract (SOCT), brings information to the cerebellum via the climbing fiber system. The SOCT has a low level of activity and modulation during fictive and real unperturbed movements. Climbing fibers evoke complex spikes in Purkinje cells. During locomotion, complex spikes occur only in response to an unexpected perturbation of movement. When an animal learns to overstep an obstacle placed in its path, the intensity of the complex spikes that occur when the limb contacts the obstacle (the perturbation) is maximal at the beginning of the learning process. At the end of the learning process (i.e., there is no contact between the limb and the obstacle during locomotion), their intensity is minimal (Lou and Bloedel, 1992).



Fig. 12. Semantics of cerebellar inputs.

SOCT – spinoolivocerebellar tract. SRCT – spinoreticulocerebellar tract. VSCT – ventral spinocerebellar tract. DSCT – dorsal spinocerebellar tract. cfs, mfs – climbing and mossy fiber systems, respectively. 



The cerebellum sends information to the spinal cord via rubro-, reticulo-, and vestibulospinal tracts. These descending tracts are rhythmically active during real and fictive locomotion and scratching. This rhythmic activity almost disappears after cerebellectomy.

Obviously, information from the SRCT and VSCT can be interpreted as information about the state of the model of the corresponding lower NOCS (i.e., information about how the lower NOCS “sees” the controlled object). Information arriving at the cerebellum via the DSCT can be regarded as actual peripheral afferent information unaccounted for by the internal model. The cerebellum also receives rich informational context (visual, auditory, and vestibular information) through mossy fibers. The cerebellum processes all the afferent information that it receives to obtain the most reliable information about the current state of its controlled object, in this case the entire body of the animal and its surrounding environment.

The SOCT conveys information to the cerebellum through the climbing fiber system about errors or mismatches at the lower NOCSs. Error signals occur when the content of model afferent flow fails to coincide with peripheral afferent flow. Olivary neurons are activated during errors of execution of actual movements, during perturbations, and when unexpected afferent information is received (for example, when a limb contacts an obstacle).

The cerebellum does not create its own model of object behavior because it does not need to do so. It uses models from other NOCSs. The cerebellar complexity and size are the consequence of the following factors. It needs an enormous memory capacity to store its acquired knowledge, and it works in real time and processes enormous amounts of information to perform coordination tasks. Poorly coordinated movements are possible without the cerebellum because the lower NOCSs still perform motor control within their limited capabilities.

For the type of learning described in Section 4 to be effective, the error signal should reach the cerebellum after the informational signal. This sequence has been shown experimentally. When a stimulus is applied to the spinal peripheral nerve, the mossy fiber information reaches the Purkinje cells 10 to 15 ms earlier than the climbing fiber information (Eccles et al., 1967). It is obvious that the short-term memory mechanisms needed to permit learning (Section 4) must be present in the cerebellum. The cerebellum must be capable of memorizing a brief prehistory (what happened before the error signal arrived - from tens to at most several hundreds of milliseconds). Any informational signal within this time frame can be used to learn how to avoid future mistakes (i.e., to avoid receiving an error signal). Such informational signals allow the cerebellar circuitry to learn how to generate a controlling output that minimizes error signals. This process is the essence of the cerebellar coordination function.

If the cerebellum is provided with the necessary informational and initiating signals, this brief interval can be extended significantly. The cerebral cortex sends information to the cerebellum through both cerebellar afferent systems (i.e., it sends informational and initiating signals). The cerebellum is thus provided with new minimization criteria and necessary informational context, and it becomes possible to learn more complex motor tasks.

The cerebral cortex also benefits from the cerebellum. From the cerebellum, the cortex receives precise information about the current state of the controlled object. The cerebellum receives model and real afferent flows from other NOCSs and provides the cerebral cortex with information about any mismatch between these two signals (i.e., novelty in object behavior). Information of this type may play a significant role in complex pattern recognition (i.e., when cognitive motor learning occurs). Based on the law of symmetry, we can suggest that the cerebral cortex also provides the cerebellum with model, expected, and real afferent information as do other NOCSs.

Classical conditioning of the eyelid closure response was mentioned in Section 4.1. The role of the cerebellum in this conditioned response has been demonstrated experimentally (Thompson, 1989). Nerve impulses evoked by the unconditioned stimulus (air puff to the eye) are conveyed to the cerebellum through the inferior olive and its climbing fiber system. The conditioned stimulus (tone) is conveyed to the cerebellum through the mossy fiber system. This is completely in line with the role of the cerebellum in motor coordination described above in this section.


5.1.2 The skeletomotor cortico – basal ganglia – thalamocortical loop


This “skeletomotor” (or “motor”) loop consists of “closed” and “open” loops (Fig. 13). In the closed loops, the target cortical area sends projections to the basal ganglia. The motor circuit is closed by thalamocortical projections to the supplementary motor area (SMA), the premotor area (PM), and the motor cortex (MC). The open loop includes the arcuate premotor area (APA) and the somatosensory cortex (SC). All skeletomotor corticostriate projections are functionally interrelated and are progressively integrated in their passage through the basal ganglia to the thalamus, and from there to specific cortical areas (Alexander and Crutcher 1990; Alexander et al. 1986). Thus, the skeletomotor cortico - basal ganglia - thalamocortical loop receives inputs from much larger regions of the cortex than those to which it projects its signals.

Fig. 13


Fig. 13. Functional organization of the skeletomotor cortico – basal ganglia - thalamocortical loop.

APA - arcuate premotor area. CO - controlled object. GPi - internal segment of globus pallidus. MC - motor cortex. PM - premotor cortex. SC - somatosensory cortex. SMA - supplementary motor area. SNr - substantia nigra pars reticulata. TN - thalamic nuclei.



The skeletomotor loop contains important feedback elements that include dopaminergic neurons of the substantia nigra pars compacta and the adjacent mesocorticolimbic group. Early studies suggested that the substantia nigra pars compacta projects to the striatum while adjacent tegmental dopaminergic neurons make mesocorticolimbic connections. Currently, the complex organization of the cell body subgroups - the one located in the substantia nigra pars compacta and the other in the ventral tegmental area - is no longer defined in terms of striatal or mesocorticolimbic projections,  because these projections are intermingled. Some mesocorticolimbic projections originate in the substantia nigra and vice versa (Le Moal and Simon, 1991). Dopaminergic neurons receive inputs from various brain structures: the cerebral cortex, putamen, nucleus caudatus, entopeduncular nucleus, dorsal nucleus of raphe, central nucleus of amygdala, bed nucleus of the stria terminalis, and other structures.

            Results of recordings from striatal neurons in different behavioral paradigms are intriguing. Some neurons are activated during the expectation period before an externally imposed stimulus is presented. Such a reaction is called predictive firing. The activity of dopaminergic neurons is even more intriguing. Dopaminergic neurons of the pars compacta and mesencephalic tegmentum react to behavioral or environmental change (i.e., to unpredicted stimuli). If the stimulus can be predicted, the situation is different. For example, in conditioning paradigms, dopaminergic neurons respond to the unconditioned stimulus (US) at the beginning of learning trials. As the animal learns the task, the cells respond to the conditioned stimulus (CS), which itself cannot be predicted, and no longer respond to the US (Ljungberg et al., 1992; Schultz 1998). However, they will fire if the US is not presented at its previously predicted time interval. For example, if the stimulus appears earlier than expected, the neuron again fires.

            Within the framework of the proposed conceptual views, the anatomical and physiological data can easily be explained. Closed neuronal networks are the substrate for the model (Fig. 13). Although a functional subdivision should not be completely identified with an anatomical subdivision, we may do so as a first approximation. Accordingly, the basal ganglia form a system that predicts the behavior of the controlled object, while the part of the cerebral cortex included in the closed loop is considered the controller. At this hierarchical level, specific complex neural networks, like the basal ganglia, evolved to predict the behavior of the controlled object. The cortical areas included in the open loop can be conceptualized as detectors of complex features of the controlled object providing additional informational context for the controlling system. The error distribution system within the skeletomotor cortico-basal ganglia-thalamocortical circuit includes the dopaminergic neurons of the substantia nigra pars compacta and mesencephalic tegmentum. As expected from this scheme, the error distribution system is not activated when the behavior of the controlled object is executed and predicted correctly.
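The behavior expected from such an error distribution system can be sketched with a simple delta rule. This is an illustrative assumption, not a claim about the actual learning rule used by the basal ganglia; the function name `learn_prediction` and the value of `learning_rate` are hypothetical.

```python
def learn_prediction(n_trials, learning_rate=0.3):
    """Repeatedly present a stimulus and let the model learn to predict it.

    Assumption: the model holds a single number, its prediction of the
    stimulus, and corrects it in proportion to the mismatch (delta rule).
    Returns the final prediction and the mismatch seen on every trial.
    """
    prediction = 0.0
    errors = []
    for _ in range(n_trials):
        outcome = 1.0                      # the stimulus is delivered
        error = outcome - prediction       # mismatch: actual minus predicted
        errors.append(error)
        prediction += learning_rate * error
    return prediction, errors

prediction, errors = learn_prediction(20)
# Early trials: large mismatch, the error system is strongly activated.
# Late trials: the stimulus is predicted and the mismatch is close to zero.
# If the stimulus is now omitted, the mismatch is large again:
omission_mismatch = abs(0.0 - prediction)
```

The mismatch is largest on the first trial, fades as the prediction improves, and reappears when the expected stimulus is withheld, which parallels the recordings described above.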

The body of the animal and the environment are the controlled object for the skeletomotor loop. As part of the controlled object, the environment can be complex and includes static and dynamic objects. Dynamic objects can be passively or actively moving objects, such as other animals. An animal’s survival depends on how effectively its internal model predicts the behavior of such objects. A prey must successfully predict the behavior of a predator to avoid being caught, and a predator must have a model of its prey to hunt successfully. A person driving a car must have models of surrounding objects - of the road, pedestrians, cars, and so on. Such objects are incompletely observable and incompletely controllable.

Various motor control levels, such as initiating systems of the brain stem, cerebellum, and even other cortical levels, are controlled objects for the skeletomotor cortico-basal ganglia-thalamocortical loop.   The skeletomotor cortico - basal ganglia - thalamocortical loop is a hierarchical system with numerous subloops subordinated to one another. The higher the subloop in the hierarchy, the more abstract are the parameters processed by the loop. The relative position of various objects, direction, place in space and speed of an object’s movement are typical examples of variables encoded at the level of skeletomotor loops.

The model can be used in two ways. First, it can be used during execution of a movement to predict the transition to a new state. Second, it can be used during the planning phase of a movement that needs a full-scale model because afferent information about future states is lacking. It is hard to imagine this process without using a model. Moreover, a cause-effect model is unconstrained by real time and may function at rates faster than real time (e.g., rates necessary for rapid multistep planning). Using the model without the constraint of real time requires efferent and afferent channels to be turned off until a correct decision is found. At this level, functional deafferentation or deefferentation can be achieved with various inhibitory mechanisms.
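The second use of the model, planning with efferent and afferent channels switched off, can be sketched as follows. The one-dimensional `model_step`, the set of commands, and the exhaustive search are all hypothetical simplifications chosen only to keep the example short.

```python
from itertools import product

def model_step(state, command):
    """Hypothetical internal model: a command shifts the state by its value."""
    return state + command

def plan(start, target, horizon=3, commands=(-1, 0, 1)):
    """Try every command sequence on the model and keep the best one.

    Nothing is sent to the controlled object while planning: each sequence
    is evaluated only on the internal model, so the search is not tied to
    the real time course of the movement.
    """
    best_sequence, best_error = None, float("inf")
    for sequence in product(commands, repeat=horizon):
        state = start
        for command in sequence:
            state = model_step(state, command)   # simulated transition
        error = abs(target - state)
        if error < best_error:
            best_sequence, best_error = sequence, error
    return best_sequence, best_error

# Reachable target: a sequence with zero predicted error is found.
sequence, error = plan(start=0, target=2)
# Unreachable target within three steps: the closest sequence is returned.
far_sequence, far_error = plan(start=0, target=5)
```

Only the chosen sequence would then be released to the efferent channel; all other candidates exist solely inside the model.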

This model must allow very rapid tuning to an object. Environmental changes occur quickly, and the model must immediately account for any changes. Hence, dopaminergic learning must happen quickly. Many strategies can be used to accelerate learning in biological networks. Very complex network and molecular automatisms could underlie the rapid learning that occurs in the basal ganglia. The numerous mediators found in the basal ganglia are indirect evidence for this contention. Rapid learning also must occur in the controller, that is, in the cortical regions included in the skeletomotor cortico-basal ganglia-thalamocortical loop.

The system at the higher level must “jump” from one state to another while performing controlling functions. The optimal (easiest) way to create such a system is to build it on the basis of pacemaker neurons or neurons possessing bistable properties. The latter neurons have been found in the basal ganglia (Baufreton et al. 2003). Circuit triggers also should be used to build the system.

The skeletomotor loop not only initiates and coordinates lower automatisms. Like the cerebellum, which can add new components to the current motor program, it can also add necessary details (for example, fine finger movements) to lower motor programs during their execution.


5.2. Non-motor behaviors: Further generalization of motor automatisms


Presently, five basal ganglia - thalamocortical circuits are distinguished: the already mentioned "skeletomotor", the "oculomotor", the "dorsolateral prefrontal", the "lateral orbitofrontal", and the "anterior cingulate" (Alexander et al., 1986; Alexander and Crutcher 1990). All of them have features similar to those described above for the skeletomotor loop. Each cortico - basal ganglia - thalamocortical circuit receives its multiple corticostriate projections only from functionally related cortical areas. Each circuit is formed by partially overlapping corticostriate inputs, which are progressively integrated in their passage through the pallidum and substantia nigra (pars reticulata) to the thalamus, and from there to a definite cortical area. Usually the target area is one of those that sent projections to the basal ganglia. Each loop is a combination of "open" and "closed" loops. The apparent uniformity of synaptic organization at corresponding levels of these loops and the parallel nature of these circuits led to the opinion that similar neuronal operations are performed at comparable stages of each of the five loops.

            Application of the new brain concept to other loops can be found in previous publications (Baev 1997, 1998). Several things should always be taken into account when applying the new concept to other cortico – basal ganglia – thalamocortical loops. Each of these loops has a controlled object whose behavior the controlling system tries to predict as precisely as possible. Each loop has a specific set of informational and initiating signals. All these loops have a dopaminergic error distribution system. There may be and should be other error distribution systems as well, but they remain unknown. There is a hierarchical relationship between the loops: The higher the loop, the more abstract its controlled objects, and the more abstract its state space. The above-described motor loop is subordinated to higher loops. For example, feeding or mating behaviors initiated by higher loops include a motor component. Moreover, this perpetual increase in the abstraction of parameters, which accompanies the creation of new hierarchical levels, was probably the basis for the evolution of language itself. Physically adding new hierarchical levels to create new abstractions has obvious limitations. In humans, language evolved to solve this problem. It became a universal tool for creating new abstractions, because new abstractions could be constructed by implementing new symbolic concepts or words. The state space at the level of the prefrontal loops is discrete. Theoretically, they can process objects such as words. Therefore, thinking can be conceptualized as movement of abstract objects in an abstract state space.

After the creation of language, the power of an intelligent biological network computer in the form of an individual or society began to rely heavily on external memory mechanisms. The capability for accumulating knowledge was otherwise limited by the human life span. Initially, the role of external memory was performed by story-telling in an individual’s social group. With the development of writing, the social group acquired an unlimited ability to store information in a collective external memory. The evolutionary process in human societies depends on the effectiveness of access to information contained within this collective memory.


6. Mechanisms of pathological brain-initiated behaviors: The pathophysiology of neural optimal control systems


The pathophysiology of brain disorders must be built anew if NOCSs are considered the functional blocks of the brain. Pathology in any functional block of an NOCS may lead to pathological symptoms. Here, the term pathology is used in its most general meaning and denotes any abnormal change in an NOCS. These changes can be neural death or malfunction or abnormal synaptic transmission caused by infection, toxins, trauma, genetic defect, aging, and so on. General considerations about basic regularities of malfunctions in NOCSs are presented in the next section. Holographic properties of biological neural networks should always be remembered when analyzing any consequence of damage to a neural network.


6.1. Basic mechanisms of malfunctions in a neural optimal control system: General considerations


Any functional subdivision of an NOCS can malfunction. In addition, an NOCS may receive incorrect afferent inputs, either informational or initiating. Only a few possible basic scenarios are described below.

Damage to initiating channels. Damage to initiating channels may cause signals initiating a specific automatism controlled by the NOCS to be lost partially or completely. Corresponding impairment of the automatism ranges from partial to complete. Under normal circumstances, an initiating system sets a target state for the controlled system. When parts of the signals do not arrive in the controlled system, the target states cannot be set properly.

One example of damage to the subtype of initiating system called an error distribution system, which initiates learning of the internal model, is discussed in the next section. In general, the result of partial damage to an error distribution system that supplies the controller, the model, or a system such as the cerebellum depends on its construction. In some systems, for example, the cerebellum, each Purkinje cell receives an error signal from only one error distribution neuron. In others, several error distribution neurons may project to the same target neuron. The result of partial damage also depends on a system’s ability to reinnervate corresponding neurons that have lost their error inputs (a process usually called sprouting). Complete destruction of an error distribution system leads to a complete loss of the corresponding automatism. For example, damage to the inferior olive, which is a part of the error distribution system of the cerebellum, leads to symptoms similar to those associated with cerebellectomy (see, for instance, Dahhaoui et al., 1992).

A neural network calculating error signals could begin to function abnormally. Several scenarios are possible when these signals arrive in a modeling network. In one case, a new function that minimizes this abnormal initiating signal can be calculated by the model. Therefore, in the final stage of the learning process, the model network starts to calculate this incorrect function. The expression of pathological symptoms then depends on how far the predicted state varies from the actual state of the object. In another case, no function minimizes the mismatch signal. For instance, the device comparing model and actual afferent flows generates an output signal without considering input signals. The system will then search its state space persistently, which will result in the destruction of the model. With time, the neural network may also partially or completely reject an error input that it cannot minimize. Similar reasoning applies when a mismatch signal is used to tune a network responsible for control law.

Damage to informational context. After diffuse and mild destruction of informational channels, a system could function but its resolution would be affected adversely. The precision of control (of movements, for example) would decrease. Severe diffuse destruction may lead to loss of function because a large decrease in precision precludes normal function. Total deafferentation also causes loss of function. An NOCS is a learning system. In the absence of afferent flow, and consequently proper error signals, an internal model eventually loses its capacity to predict future states of the controlled object. How fast this process happens depends on the speed of learning processes in the NOCS. If a specific informational input (e.g., visual or acoustic input or input from other parts of the body) is completely destroyed, the system loses its ability to predict the behavior of the object that corresponds to this type of informational context. For instance, if the part of the model describing the behavior of a specific body part receives no informational context from another body part, these parts cannot be coordinated. All other damage (between mild and total) to informational channels leads to intermediate dysfunctions. It is also obvious that when the system lacks part of the informational context that can be used to calculate the necessary controlling signal, the model is relearned (i.e., retuned to the new situation).

Damage to the controller or modeling network. When a neural network loses part of its elements, it can still function but with less precision. The bigger the loss, the bigger the decrease in the network’s computational precision. As a result, the controller will generate less precise commands and the model will produce less accurate predictions. Obviously, this less precise control command would be translated into corresponding behavior of the CO or lower NOCSs. The CO or the lower NOCSs are just following commands, but overall behavior can be erroneous for a given situation.
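The graded loss of precision can be illustrated under one strong simplifying assumption: the network output is the average of many units, each carrying independent noise of the same size. The function `readout_noise` and its parameters are hypothetical and serve only to show the trend.

```python
import math

def readout_noise(n_units, unit_noise=1.0):
    """Noise of an averaged readout from n_units independent noisy units.

    Assumption: every surviving unit contributes equally and its noise
    (standard deviation unit_noise) is independent of the others, so the
    noise of the average falls as 1 / sqrt(n_units).
    """
    if n_units <= 0:
        raise ValueError("no surviving units: the function is lost")
    return unit_noise / math.sqrt(n_units)

intact = readout_noise(100)        # full network
half_lost = readout_noise(50)      # half of the elements destroyed
mostly_lost = readout_noise(4)     # only a few elements remain
```

Under this assumption, losing half of the units increases the noise by a factor of about 1.4 rather than abolishing the output, which corresponds to the gradual degradation described above.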

Optimal filter malfunctions. The term optimal filter describes the process of filtering information with the purpose of detecting a certain signal, or signals, by using certain predictions. It is an important mechanism necessary for pattern recognition. In the case of an NOCS, the model flow is used to determine the state of the controlled object. Many things can go wrong with this process. Suppose the afferent flow delivers unusual information as the result of pathology or an unusual intervention, for example, muscle tendons vibrating at a frequency of 70 Hz. It has been shown experimentally that the latter intervention produces motor illusions (Cordo et al., 2005). The reasons for this effect are obvious: The model does not predict this afferent flow and the state of the controlled object is determined incorrectly. There can be another situation when the model flow prevails, that is, when actual afferent flow is treated by the system as an unreliable source of information. At the lower level, this malfunction can also lead to an illusion. At the highest levels, both situations can be the basis for delusions. Obviously, various other scenarios involving malfunctions are possible that can lead to improper control.

The pathophysiology of NOCSs depends on learning processes. An NOCS’s capacity to learn should be considered when symptoms or treatments of any brain disorder are analyzed. With time, the network adjusts itself to a partial loss of its elements by invoking learning mechanisms. Changes in the control law or in the model will inevitably start learning processes in the system. The same is true for any changes in afferent inputs. In connection with this, one possibility should be discussed: Pathologic afferent flow produced by any other subsystem of the brain begins to arrive in the normally functioning NOCS. If this pathologic flow is correlated with other afferent signals, the learning process will stop when the model starts to properly describe this pathologic afferent flow and its correlations (association) with other components of normal afferent flow. This process can be called “mastering” pathology. In this case, any associative afferent flow can provoke pathological behavior.

The ability of neural networks to form attractors was mentioned in Section 3. Obviously, learning is the foundation for this capability. Attractors can be formed in various functional blocks of NOCSs, and they can play a significant role in the normal functions of an NOCS. They can be responsible for the stable normal behavior of an NOCS. However, some attractors can be responsible for pathological behaviors (see Section 6.3).

Functional subdivisions of an NOCS can be anatomically inseparable (see Section 4); therefore, damage to an NOCS can result in the malfunction of several of its functional blocks. The resulting symptom, or symptoms, can be understood only by conducting rigorous theoretical analyses that should include computer simulation experiments (see Section 7.2).

Obviously, in addition to all the above-mentioned situations, the hierarchical relationship between various NOCSs should always be taken into account when analyzing mechanisms of pathological symptoms. An error made at a higher level can be sent to a lower level NOCS in the form of an incorrect command signal. Likewise, a signal about an error made by a lower level NOCS will go higher and higher in the hierarchy of NOCSs until it reaches a level that is capable of taking care of the error.


6.2. An example of the malfunction of an error distribution system within a neural optimal control system: Parkinson’s disease  


Numerous diseases involve the skeletomotor cortico – basal ganglia – thalamocortical loop. One of them is Parkinson’s disease (PD), the most studied neurodegenerative disorder. Experimental and clinical studies have shown that the death of dopaminergic neurons in the substantia nigra leads to PD. The first symptoms of the disease are usually expressed when only about 15% of the dopaminergic neurons remain. Akinesia, muscular rigidity, and tremor are the main PD symptoms (Greene et al., 1992; Montgomery et al., 1991).

            The pars compacta of the substantia nigra is part of the error distribution system (see Section 5.1.2). Therefore, PD is a disorder of the error distribution system (Baev 1995, 1997, 1998). Without precise error signals, the controlling system, the skeletomotor loop, cannot function properly, because the model incorrectly predicts the state of the object in these patients. The modeling network cannot slide down to the global minimum on the error surface and stays in one of the local minima, for example, A or B, or is overthrown from A to B because the error signal is inappropriately large (see Fig. 14). The modeling network could have generated more precise output if it had been provided with more precise error signals. The controller also does not function properly and sends erroneous signals to the controlled object.

Fig. 14


Fig. 14. Error surface profile.

I – intensity of an error signal. S – state of the modeling network. See text for further explanations.




Several remarks should be made regarding prediction. In general, as follows from Section 3, biological neural networks cannot function with 100% precision because of the presence of noise in the informational channels. Therefore, any prediction is made with a certain error. This means that it is necessary to talk not about a predicted specific point in the controlled object’s state space but rather about a certain region where the controlled object most probably should be based on the prediction. Similarly, when an NOCS determines the state of its controlled object, we should also consider a certain region in state space where the controlled object is most probably located. From a functional standpoint, this means that small errors are allowable and that an NOCS will not find any error if predicted and actual regions coincide or at least partially overlap. This situation is what happens in normal subjects during motor control; predicted and actual states overlap. In Parkinson’s disease, predicted and actual states do not overlap (Fig. 15).

Fig. 15


Fig. 15. Predicted and actual states of CO in normal and parkinsonian subjects.

S1, S2 – state space coordinates.



Symptoms of PD reflect the decreased precision of the error signals. Incorrect prediction results in the system’s inability to accurately determine the current state of the controlled object. The controlling system treats incorrect predictions as if the controlled object were perturbed by an unaccounted external force. It tries to adjust the state of the object during the next step of control. The search for a correct control causes specific symptoms, depending on the object controlled by the skeletomotor subsystem. The lower level skeletomotor subloop, for example, performs postural control of a body part held in a certain position. Remember that the motor loop performs most of its control of spinal NOCSs through brainstem NOCSs so that the spinal level receives only parameters for the destination point. In this case, the skeletomotor loop “thinks” that the brainstem NOCS is not in the desired state. This situation looks like overregulation, in which the controlling system always misses the equilibrium point and generates sequential phasic movements. As a result, this part of the body will demonstrate tremor. If a higher skeletomotor control level is involved, however, large-amplitude, rather complex involuntary movements may be observed.

            Another mechanism of tremor can exist when the controller makes a mistake and sets erroneous parameters for a destination point, for example if the controller erroneously initiates a rhythmic program instead of a phasic one by setting non-zero velocity at the destination point. This mechanism explains why parkinsonian tremor can be present after deafferentation (McAuley and Marsden, 2000).

            More complex explanations are needed for bradykinesia and akinesia. In these cases, the prediction of the model differs significantly from the real state of the object. The model also may predict several states with equal probabilities. Such a system needs much more time to choose among these states, to minimize the error signal, and to slide down to the global minimum on the error surface. Severe akinesia results when the model predicts an absolutely unreal situation.


6.2.1. Mechanisms of anti-parkinsonian treatments


Currently, two major groups of anti-parkinsonian therapies are available – drug therapy and functional neurosurgical procedures. Mechanisms of these therapies are explained based on the ideas described above.

Levodopa is the most effective anti-parkinsonian drug. Levodopa is the precursor required by the brain to produce dopamine. One of the mechanisms of levodopa therapy consists in increasing the gain of the error distribution system. More intense error signals help the system to slide down to the global minimum on the error surface. However, the amplification of error signals has both advantages and disadvantages. The progression of PD is accompanied by a gradual decrease in the resolution power of the error distribution system and an increase in the dosage of medication. Hence, levodopa can be effective until the decrease in precision reaches a certain level. At this level, patients exhibit medically induced symptoms (e.g., dyskinesias) because overamplified error signals cause large adjustments in the model. As a result, the system jumps over the global minimum while trying to minimize error signals (see Fig. 14). When this stage of PD is reached, symptoms can be alleviated by decreasing the resolution power of the model network (see below). Another mechanism of levodopa therapy consists in an increase of spontaneous dopamine release that is equivalent to adding noise to the error signal (see below).
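The advantages and disadvantages of amplifying the error signal can be illustrated with a toy search on a discrete error surface. Everything in this sketch is a hypothetical simplification: the surface values, the positions of the minima, and the idea that the gain of the error signal translates into a fixed jump size are assumptions made only for illustration and are not taken from Fig. 14.

```python
# Hypothetical error surface over 13 discrete states of the modeling network.
# Local minimum A at state 2, global minimum at state 6, local minimum B at 10.
ERROR_SURFACE = [5, 4, 3, 4, 6, 3, 0, 3, 6, 4, 2, 4, 5]

def settle(start, jump):
    """Let the network jump by a fixed distance until no jump lowers the error.

    Assumption: `jump` stands in for the gain of the error signal; a larger
    gain produces a larger adjustment of the model on every step.
    """
    state = start
    while True:
        candidates = [s for s in (state - jump, state + jump)
                      if 0 <= s < len(ERROR_SURFACE)]
        best = min(candidates, key=ERROR_SURFACE.__getitem__, default=state)
        if ERROR_SURFACE[best] >= ERROR_SURFACE[state]:
            return state            # no improvement at this jump size
        state = best

weak = settle(start=2, jump=1)       # weak error signal
adequate = settle(start=2, jump=4)   # amplified error signal
excessive = settle(start=2, jump=8)  # overamplified error signal
```

With these made-up numbers, the weak signal leaves the network in local minimum A, the amplified signal carries it to the global minimum, and the overamplified signal throws it past the global minimum into local minimum B.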

Functional neurosurgical procedures include three major treatments for PD - partial lesioning of basal ganglia structures, deep brain stimulation (DBS) of certain structures, and neurotransplantation. The last one is not generally accepted and remains experimental and controversial. Attempts to explain the mechanisms underlying these functional neurosurgical procedures based on traditional theoretical interpretations create more questions than answers. Empirical observations generated by functional neurological surgery have unearthed two puzzling problems related to partial lesioning of the basal ganglia. First, why does partial lesioning of various basal ganglia structures alleviate parkinsonian symptoms? The globus pallidus pars interna (GPi), subthalamic nucleus (STN), and ventral oral anterior and posterior (Voa and Vop) thalamic nuclei are well-known targets for partial lesions. Evidence suggests that lesions placed in other basal ganglia structures (e.g., the caudate nucleus, putamen, or external segment of the globus pallidus (GPe)) also can improve symptoms (see Baev et al. 2002). Therefore, although some lesions are more effective than others, partial lesioning of any basal ganglia structure alleviates parkinsonian symptoms. The second problem is related to the observation that lesions outside the basal ganglia - thalamocortical circuit also alleviate parkinsonian symptoms. Many neurosurgeons consider the ventral intermediate nucleus (Vim) a good target for placing a lesion during thalamotomy. The Vim receives kinesthetic afferent inputs from contralateral body parts and from the cerebellum. Why partial lesioning of this nucleus reduces or even eliminates tremor in parkinsonian patients is unknown. Both of these puzzles lead to an even more profound enigma: How can destroying part of a network improve its function?

As mentioned earlier, the modeling network in parkinsonian patients could do better, if provided with more precise error signals. The network’s precision decreases when the basal ganglia network that generates model afferent flow is partially destroyed. A better relationship between the error precision and the precision of the modeling network is established; their resolution powers start matching each other. As a result, the region of possible predicted object states becomes bigger so that it now overlaps with possible real states of the object (Fig. 16b). The system no longer finds error in its prediction and does not try to correct the object position in its state space. Therefore, a positive clinical effect is obtained at the expense of a decrease in the resolution of the modeling network. It is necessary to note that from a theoretical standpoint a decrease in resolution also changes the error surface profile that can help the system to slide down to the global minimum.

Fig. 16


Fig. 16. Predicted and actual CO states in parkinsonian patients after various therapies.

Coordinates are the same as in Fig. 15. a – before any intervention. b - after a lesion placed in the basal ganglia. c - after a lesion placed in the circuitry processing information about actual CO’s state. d - after a lesion placed in both circuitries - the circuitry processing actual afferent flow and the model one. e – stimulation of the basal ganglia – adding noise to the modeling system. f – dual action of noise and functional block on the modeling system.


The decrease in resolution depends on the location of the lesion in the network. The cortico - basal ganglia - thalamocortical loop is usually compared to a funnel. A lesion in such a system will have the greatest effect if it is placed in the narrowest part of the funnel (i.e., near its output). In fact, GPi, STN, and thalamic nuclei are very effective places for lesions. However, the STN is probably the most effective site to improve parkinsonian symptoms. Based on the existence of excitatory feedback from STN to GPe, it has been postulated that this feedback is the foundation for selectivity regulation or precision tuning of the network that determines how accurate the prediction should be (Baev et al. 2002). The functional meaning of the feedback is similar to recurrent collateral inhibition. In sensory systems, the latter works like a focusing (contrasting) mechanism. A partial lesion of the STN renders this focusing mechanism less effective, thereby increasing the overall effect of partial lesioning of this nucleus. The connections of the pedunculopontine nucleus, which sends excitatory connections to the GPi and pars reticulata of the substantia nigra, are reminiscent of those of the STN.

The situation is symmetrical when the network processing real afferent flow (i.e., Vim, the known place of cerebellar projections) is lesioned. The predictions of the model remain the same. What changes is the region of the possible controlled object states generated by the system processing the real afferent flow. It becomes bigger, and the two regions, predicted and actual, start to overlap, alleviating parkinsonian symptoms, e.g., tremor (Fig. 16c). In reality, a thalamotomy usually involves Vim, Vop, and possibly even Voa. Involvement of both the model network and the network processing real afferent flow decreases the resolution of both systems (Fig. 16d). The final outcome is superior to lesioning only the modeling network or the network processing real afferent flow.
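The overlap argument can be expressed as a small geometric sketch. It assumes, only for illustration, that the predicted and actual regions are disks in the S1, S2 state space and that a lesion simply enlarges the radius of the corresponding disk; the function name `error_detected` and all numbers are hypothetical.

```python
import math

def error_detected(predicted, actual, predicted_radius, actual_radius):
    """True when the predicted and actual regions do not overlap.

    Assumption: each region is a disk in the (S1, S2) state space; a lower
    resolution of a network corresponds to a larger radius of its disk.
    """
    distance = math.dist(predicted, actual)
    return distance > predicted_radius + actual_radius

predicted_center = (0.0, 0.0)
actual_center = (3.0, 4.0)          # centers are 5 units apart

# Before any intervention: the regions are disjoint, an error is found.
before = error_detected(predicted_center, actual_center, 2.0, 2.0)
# Lesion in the modeling network: the predicted region grows.
model_lesion = error_detected(predicted_center, actual_center, 3.5, 2.0)
# Lesion in the network processing real afferent flow: the actual region grows.
afferent_lesion = error_detected(predicted_center, actual_center, 2.0, 3.5)
# Both networks involved: both regions grow.
both_lesions = error_detected(predicted_center, actual_center, 3.0, 3.0)
```

In this sketch only the first case reports an error; in the other three the regions overlap and the system no longer tries to correct the position of the object.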

Therefore, partial lesions must be considered as a treatment based on the holographic properties of biological neural networks. Less than 10% of the total volume of a specific structure is usually destroyed. For the decrease in resolution to be effective, the error distribution system must preserve some functional capability before any lesion is placed in the modeling network. In other words, predictions of the model should approximate reality. Otherwise, the two will not overlap after the lesion, and symptoms will not improve. Partial lesions of various structures in PD should be considered as palliative, symptomatic interventions. They are not curative because they do not stop the fundamental degenerative process. However, they can slow the disease’s rate of progression because the overloading of the error distribution system decreases after lesions are placed.

A second type of treatment, deep brain stimulation (DBS), was developed relatively recently as a treatment for PD and other neurological disorders (see Section 6.3). The method has been implemented broadly, without a fundamental understanding of how it works. As a method of treating PD, DBS is even more puzzling than lesioning. High-frequency (above 100 Hz) stimulation of the same structures (GPi, thalamus, and STN) whose lesioning successfully alleviates parkinsonian symptoms produces an identical effect.

Two possible mechanisms could underlie the efficacy of DBS. First, stimulation might functionally block regions immediately adjacent to the electrode tip. The mechanism underlying DBS would be similar to that underlying lesioning. Although the results of chronic stimulation are similar to the clinical results of placing a lesion, neural tissue is not immediately destroyed. Turning off the source of current produces a reversible type of functional lesion. Second, in addition to blocking, activation may be involved. If so, influences spread to other brain regions via both fibers passing through the stimulated region and axons of neurons excited during stimulation. There are no clear data about the activation effects of DBS (Ashby, 2000), but concrete data suggest that activation components of DBS do exist. In the operating room, stimulating lesion targets has long been known to alleviate symptoms. This effect is rapid and may involve activation rather than neuronal inhibition. Long-lasting trains of stimuli alter the excitability of axons in complex ways. Axon excitability may increase, leading to spontaneous discharges, or it may decrease so that the axon is less responsive to electrical stimuli. It has been shown experimentally that STN stimulation releases dopamine and glutamate in the striatum and STN, respectively (Lee et al. 2004). These activating influences do not provide the system with meaningful information for signal processing. Thus, the second mechanism simply introduces noise into the network, and an obvious question appears: Why does adding noise to such a system improve its function?

Adding noise effectively helps the system to slide down the error surface to its global minimum, much like shaking an uneven sloping surface helps a ball slide down to the lowest point. In the proposed theoretical framework, introducing noise can by itself decrease the resolution of the system. Because high precision is impossible with a noisy background, the noise forces the system to reduce its resolution power (Fig. 16e). The dual action of noise and functional lesion on the modeling network is shown in Fig. 16f.
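The ball-on-a-shaken-surface analogy can be made concrete with a toy calculation. The following is only a minimal sketch under stated assumptions: the one-dimensional error surface, the step size, and the linearly decaying noise level are all hypothetical choices made for illustration, and nothing here is meant as a model of the basal ganglia. Plain descent started in the shallow valley stays there, while descent with gradually fading noise usually ends in the deeper valley.

```python
import math
import random


def error(x):
    """Hypothetical error surface: a deep valley near x = -1 and a shallow one near x = +1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad(x):
    """Derivative of the error surface above."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def descend(x0, steps=5000, lr=0.02, start_temp=0.0, rng=None):
    """Gradient descent with optional noise that fades linearly to zero.

    start_temp = 0 gives ordinary (noise-free) descent.
    """
    rng = rng or random.Random(0)
    x = x0
    for i in range(steps):
        temp = start_temp * (1.0 - i / steps)  # assumed noise schedule
        noise = math.sqrt(2.0 * lr * temp) * rng.gauss(0.0, 1.0)
        x = x - lr * grad(x) + noise
    return x


if __name__ == "__main__":
    quiet = descend(1.2)
    print("no noise: x = %.2f, error = %.2f" % (quiet, error(quiet)))
    runs = [descend(1.2, start_temp=1.0, rng=random.Random(seed)) for seed in range(30)]
    in_deep_valley = sum(1 for x in runs if x < 0)
    print("with noise: %d of %d runs end in the deeper valley" % (in_deep_valley, len(runs)))
```

The noise does not carry any information about where the deeper valley is; it only makes it possible to leave the shallow one, which is the point made in the text.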

Obviously, the dual action of DBS is essential to improving this method by enhancing one action or another. In the case of chronic stimulation, more noise is introduced into the system as the frequency of stimulation decreases, and vice versa – an enhancement of the blocking effect accompanies the increase in stimulation frequency. An important conclusion about future strategies for DBS can be made based on this frequency dependency. For each stimulated structure, there should be an optimal ratio between how much neural tissue is blocked and how much noise is added to the controlling system to obtain the maximum therapeutic effect. This optimal ratio depends on the structure and stimulation parameters. Future theoretical, experimental, and clinical research should be dedicated to finding these ratios. A functional block in one structure and the addition of noise to another also might be a future strategy for DBS, but several electrodes and more sophisticated stimulators will be needed.

Although neurotransplantation did not prove to be a successful treatment for PD, it deserves comment because of the hype associated with it. Recent hype is mostly related to brain stem cell research. When degenerative loss of nigrostriatal dopaminergic cells was discovered to be the etiological basis of PD, the possibility of substituting them with transplanted dopamine-producing cells, whether of neural, paraneural, or transfected cell origin (see Marciano and Greene, 1992), was embraced.

Within the framework of the proposed conceptual views, transplantation should be considered as a way to restore damaged function by replacing a whole automatism (organ transplantation) or by implanting new structural elements (tissue or cellular transplantation) in the malfunctioning system with the hope that they will be integrated so that normal function can be restored. In neurobiology, only tissue and cellular transplantation has been studied intensively, especially fetal neurotransplantation. Transplantation of the whole brain or its entire parts remains science fiction.

In neurotransplantation, the underlying simplistic assumption is that transplanted tissue or cells will establish correct connections with the surrounding neurons. As a result, the damaged system would be rewired completely or almost completely and lost function would be restored. Neurotransplantation differs from the transplantation of entire organs (e.g., heart, liver, or kidney), in which a damaged automatism is replaced with a new one. In contrast, neurotransplantation is supposed to repair a damaged neuronal automatism by adding new neuronal elements, based on the assumption that these new elements “know” how to establish correct connections with surrounding neuronal structures. However, these cells do not “know” how to establish correct connections in an adult brain. During embryogenesis, the appearance of new neural cells is perfectly coordinated in time so that correct connections can be established. In an adult brain, a transplanted neural or stem cell will not receive the sequence of signals needed to establish the necessary connections. Therefore, a transplanted cell most likely can establish only aberrant connections with surrounding neurons.

Ironically, the establishment of wrong connections with the surrounding neurons is most often mentioned as evidence that transplanted cells have been integrated with the surrounding neural tissue. Experimental facts like growth of the dendritic tree, new synapses, and the limited appearance of new neurons in an adult brain are considered strong support for neurotransplantation. These capabilities are necessary but not sufficient for successful neurotransplantation. The capability to heal a skin cut on a finger should not be identified with the capability to regrow an entire extremity. A network must possess complex, specific computational mechanisms to be able to incorporate new neuronal elements. Neurocomputing has shown that scaling up a neural network is not a simple matter. For example, to extend a trained neural network of 200 neurons to 201 neurons, an entire training session would be needed. The more complex the network is, the more complex these mechanisms should be. Genes of regeneration for neural tissue were blocked during the evolution of higher vertebrates. The salamander is the last species on the evolutionary ladder of vertebrates that is capable of regenerating neural tissue. Consequently, new neurons can appear in the adult brain in ontogenesis and become integrated into the surrounding neural tissue only in regions that have solved the above computational problem (i.e., where computational problems related to incorporating new neurons are not very complex). Theoretically, genetic engineering could provide transplanted neurons with information about how to find target neurons in an adult brain. Possibly, genes of regeneration could be unblocked and improved in higher vertebrates. If these improvements became possible, science would have surpassed the creative achievements of nature. A rather detailed analysis of the neurotransplantation idea has been previously published (Baev et al. 2002).


6.3. Other examples of clinical applications


One very important feature of a neural network, the ability to form attractors, was mentioned in Sections 3.1 and 6.1. In the latter section, it was noted that attractors can explain extremely well various stable behaviors controlled by neural networks, but they also can be responsible for pathological behaviors. Several of these behaviors are analyzed below.

Suppose that a pathological stable state was formed in an NOCS. It would mean that this NOCS would spend some time in this state and would tend to slide to this state under certain afferent influences. Let us continue using the term attractor introduced in Section 3.1, although the NOCS is much more complex than a neural network and can have other mechanisms, yet unknown, responsible for stable behavior. The stronger the point attractor is, the longer the time that the NOCS “sticks” to this state. Obviously, when the NOCS is in the attractor state it will initiate the corresponding behavior of its CO, and the hierarchical location of the CO will determine the observed symptoms. This line of reasoning immediately leads to numerous clinical applications, for example, Tourette’s syndrome. When the pathological attractor is at the lower levels of the skeletomotor loop, simple motor tics will be observed. More complex movements are observed when the attractor occurs in the higher skeletomotor levels. If one continues this reasoning, one will include obsessive compulsive disorder (OCD) in the group of diseases caused by pathological attractors. Very complex obsessive compulsive behaviors can be evoked if the attractor is located in the cingulate gyrus, for example. Interestingly enough, some patients with Tourette’s syndrome also have obsessive compulsive disorder. Finally, if the attractor is located in that part of the cingulate gyrus that is responsible for mood control, severe depression or a manic state can be observed.
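The notion of a point attractor used above can be illustrated with the simplest textbook case. The sketch below assumes a small Hopfield-type network with Hebbian weights, which is a stand-in chosen for illustration only; as noted, an NOCS is much more complex than such a network. The stored patterns, the network size, and the function names are hypothetical. A state that is pushed away from a stored pattern by a few "afferent" perturbations slides back to it.

```python
def train_hebbian(patterns):
    """Build symmetric weights (zero diagonal) that store the given +1/-1 patterns."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def settle(weights, state, max_sweeps=10):
    """Update units one at a time until no unit changes (a stable state is reached)."""
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(weights[i][j] * state[j] for j in range(n))
            if field == 0:
                continue  # no net input: the unit keeps its current value
            new_value = 1 if field > 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Two hypothetical stored states (they are orthogonal to each other).
STORED_A = [1] * 8 + [-1] * 8
STORED_B = [1, -1] * 8
WEIGHTS = train_hebbian([STORED_A, STORED_B])

if __name__ == "__main__":
    perturbed = list(STORED_A)
    for i in (0, 1, 2):  # flip three units
        perturbed[i] = -perturbed[i]
    print("returns to stored state A:", settle(WEIGHTS, perturbed) == STORED_A)
```

In this picture, a "stronger" attractor is one with a wider basin: more of the possible starting states end up in it.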

Tourette’s syndrome is also often associated with attention deficit hyperactivity disorder. The latter can be considered as a competition between attractors. Attention can be conceptualized as the capacity of an NOCS to remain in a certain state. If there is another, useless attractor that can prevail, then attention to the first one will suffer.

Other disorders, for example, drug addiction, can also be conceptualized this way. The NOCS has learned (found a shortcut for) how to maximize a certain pleasurable feeling by administering drugs. The attractor can become very strong and can dictate the individual’s behavior. This situation bears a striking similarity to OCD.

Chaotic attractors can cause pathological behaviors like epilepsy. Again, the attractor can occur in different NOCSs, and a lot depends on the hierarchical level of the NOCS. For the immediately higher NOCS (for which the chaotic attractor NOCS is the CO), this is just an unpredictable perturbation. These perturbations are kept in check most of the time. But, if the perturbations become very intense, or the controlling NOCS cannot keep them in check anymore (because of fatigue, for example), generalized seizures may occur because the signal about the perturbation, the error signal, is spread through various hierarchical levels. In the case of the cerebral cortex, this process involves the dopaminergic system and has been described in a previous publication (Baev, 1997).  

A pathological attractor can be a natural part of the controlling repertoire. Quite often pathological behaviors are described as “normal behaviors gone wrong.” From a theoretical perspective, it is possible to consider some pathological attractors as very strong attractors in the repertoire of the NOCS. The difference between the pathological and normal attractors in this case is that more numerous afferent signals cause the NOCS to slide to the pathological state.

Recently, it has been shown that DBS can be an effective treatment for Tourette’s syndrome, OCD, depression, and epilepsy. Two components of DBS mechanisms were described in the previous section – adding noise to the system and functional block (analogous to a lesion, but reversible). An obvious question arises: Can these components account for the alleviation of the symptoms of these diseases? Any noise added to the system will help to throw the system out of the attractor state. Causing functional block, and hence decreasing the resolution power of the system, will also help to avoid the attractor state. Functional block of afferent or efferent channels (for example, stimulation of the internal capsule, which has been shown to be effective in some cases) is also equivalent to a decrease in resolution power.

As we see, there is no need to look for specific circuits that are responsible for certain pathological brain-initiated behaviors. A pathological brain-initiated behavior can be the result of a neural network state, not of a specific circuit within it.

Obviously, the explanation of DBS mechanisms described in Section 6.2 is applicable to other therapeutic stimulation procedures, like spinal cord or peripheral nerve stimulation. The same components – functional block and noise – are major mechanisms of these procedures as well. The final result of stimulation depends on which NOCS and which of its parts are involved.

Recent technology has produced the first successful neuroprosthetic devices, for example, cochlear implants. There have also been attempts to use neural signals recorded from the motor cortex to control limb muscles in paralyzed patients. The new conceptual views of the brain will help to better address these issues. At the core of these technologies should be an understanding of the hierarchical place of the NOCS of interest and its level of parameter abstraction. For example, if the level encodes highly abstracted parameters such as direction and velocity of limb movement, there is no way to extract parameters for single muscle control. Similarly, a sensory prosthetic device delivering information to a certain sensory level should present this information in an abstracted form suitable to this level.


7. Conclusion


Based on the discussion above, it is possible to address two extremely important issues related to the profound change in the conceptual understanding of the brain. The first issue relates to the relationship between structure and function in the brain. The second issue relates to the future of system neuroscience, more precisely, how to accelerate the acceptance of the new ideas by the neuroscience community.


7.1. Understanding structure through function


Although the organizational principles of biological neural networks were introduced in Section 4, the problem of the relationship between the NOCS and the computational capabilities of underlying neural networks has not even been broached in subsequent sections of the article. It has just been implied that underlying neural networks are capable of performing the necessary computations. Even intuitively it is obvious that these computations can be very complex and extensive, especially at the highest brain levels. The emphasis has been placed on demonstrating the great advantage of the new concept of how functional building blocks are put together in the brain compared to the classical brain concept. The new conceptual model provides an effective approach to studying various brain functions and their pathologies. The most amazing thing about this approach is that it is already effective even without knowledge of the computational capabilities of underlying neural networks.

We have seen that the same neural network, the NOCS or one of its components, can compute various functions within a certain class or classes of functions. The classical approach does not address neural network computational capabilities at all. An important question arises: Is the new approach, understanding structure through function, capable of addressing the problem of computational capabilities of biological neural networks? If the answer is yes, then how can it be done?

The answer to the first question is a simple yes. However, how it can be done requires some explanation. An NOCS moves its controlled object along optimal trajectories in its state space (if there is no perturbation; if there is a perturbation, the NOCS will still try to perform the optimal movement). Hence, if we know the controlled object, the goal of movement, and the optimization criteria, we can calculate the optimal trajectory. This can be done for a whole variety of optimal movements of the same controlled object. The controlled object can be a physical one, like a body part, or an abstract one that moves in abstract state space. When the optimal trajectories are known, it should be possible to study which computations should be performed by the NOCS and its functional blocks to move the controlled object along optimal trajectories. Only after we know these items can we ask the questions: What types of networks are capable of computing the necessary functions or classes of necessary functions? Are the analyzed neural networks capable of doing this? The answers to these questions will explain how controlled object states are encoded, and the role of the network architecture, neuronal properties, and their interconnections in the computational capabilities of the networks of interest.
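As a simple illustration of calculating an optimal trajectory once the controlled object, the goal, and the optimization criterion are fixed, consider the following minimal sketch. Everything specific in it is an assumption made for the example and is not taken from the text: the controlled object is a point moving in one dimension, the goal is to go from rest at one position to rest at another in a given time, and the criterion is minimum jerk (the smoothest possible movement), for which a closed-form solution is well known.

```python
def min_jerk_position(x_start, x_goal, duration, t):
    """Position at time t on the minimum-jerk path from x_start to x_goal.

    Assumes the movement starts and ends at rest (zero velocity and acceleration).
    """
    tau = min(max(t / duration, 0.0), 1.0)  # normalized time in [0, 1]
    shape = 10.0 * tau ** 3 - 15.0 * tau ** 4 + 6.0 * tau ** 5
    return x_start + (x_goal - x_start) * shape


def min_jerk_velocity(x_start, x_goal, duration, t):
    """Velocity at time t on the same path (time derivative of the position)."""
    tau = min(max(t / duration, 0.0), 1.0)
    shape_rate = 30.0 * tau ** 2 - 60.0 * tau ** 3 + 30.0 * tau ** 4
    return (x_goal - x_start) * shape_rate / duration


if __name__ == "__main__":
    # Hypothetical movement: 0.0 -> 0.3 (e.g., meters) in 1.0 (e.g., seconds).
    for step in range(6):
        t = step * 0.2
        print("t = %.1f  position = %.4f  velocity = %.4f" % (
            t,
            min_jerk_position(0.0, 0.3, 1.0, t),
            min_jerk_velocity(0.0, 0.3, 1.0, t),
        ))
```

A different optimization criterion (for example, minimum energy or minimum time) would give a different trajectory for the same controlled object and goal, which is why all three items must be known before the required computations can be studied.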

It is always necessary to remember that networks with different architectures can perform the same computations. Moreover, it has been shown experimentally that biological neural networks are so robust that the same network can use different mechanisms to achieve a computational goal. For example, it has been shown that spontaneous rhythmic activity in the embryonic chick spinal cord was blocked for 30-90 min after bath application of either an excitatory amino acid and a nicotinic cholinergic antagonist, or glycine and a GABA antagonist, but then reappeared in the presence of the drugs. The efficacy of the antagonists was assessed by their continued ability to block spinal reflex pathways during the reappearance of spontaneous activity. The authors concluded that these findings suggest that spontaneous rhythmic output is a general property of developing spinal networks (Chub and O’Donovan, 1998).

At this time, the relationship between the architecture of biological neural networks and their computational abilities remains mostly unknown (see Section 3). This problem will be the most challenging one for the foreseeable future. Its solution will depend on the mutual efforts of researchers in neuroscience, neurocomputing, and control theory. As we have already seen in previous sections, biological neural networks are multifunctional. On one hand, both the controller and the model can be embodied by the same network, i.e., they can be anatomically inseparable. On the other hand, the same network is capable of computing different functions, and conversely, different networks can compute the same function. In the philosophy of neuroscience, the multiple-realizability thesis has been discussed for a long time. It actually means that the reductionistic approach has its limits, and the proponents of so-called “ruthless reductionism” in neuroscience (see Bickle, 1998) should accept multiple realizability and, probably, become less “ruthless”.

Optimality is the major feature of an NOCS, and the following obvious question arises: Are all neural automatisms optimal? The answer to this question is not a straightforward “yes”. Well-learned automatisms are optimal or suboptimal. Automatisms that are in the process of being acquired are obviously not optimal yet. An automatism that has not been used for a long time can lose its optimality. Optimal trajectories along which the CO can be moved by an NOCS have become optimal by using various available optimization criteria during the learning process. NOCSs are also structurally optimized. Revealing neural network optimization mechanisms is an important task for neuroscience in the future.

In essence, the brain is a hierarchical self-adapting network computer in which each hierarchical level is specialized to perform certain functions. Let us draw an analogy with a nonbiological computer to better understand this concept. A computer based on the same computational principle can be built with different elements: mechanical, optical, electrical, molecular, etc. For the sake of clarity, let us restrict the discussion to modern transistor-based computers. They can be built from different microchips, but still perform the same job defined by software. It is also well known that different software, based on different algorithms, can perform the same or similar functions. Biological network computers possess the same features. Different initiating signals evoke execution of different programs by the same network. Various network architectures can be used to achieve the same or similar functional goals. It is possible to build a hierarchical controlling system from nonbiological computers. The same is true for biological network computers.

Suppose someone who does not know what a computer is tries to understand how it performs its function, for example, robotic arm control, by using various sensors, motors, and analog-to-digital and digital-to-analog converters. There is almost no doubt that a stage of research similar to classical neuroscience would be repeated, and attempts would be made to find within this controlling system the specific circuit responsible for the robotic arm movements. Elements whose activity correlated with the robotic arm movements would be found in the computer, and their connections could be described. Illustrations demonstrating these correlations could be generated. Even dependencies between the parameters of the population activities of these elements and the robotic arm movements would be found. It is not necessary to continue this reasoning, because it is clear that until the notions of algorithm, software, processor, memory, and some other computer-related notions are introduced, there is no chance to understand how the computer works. These notions are related to functional computer architecture. Obviously, until there is an understanding of the functional architecture of the brain, i.e., a new conceptual understanding of the brain, there is no chance to understand the brain’s functions.

Biological neural network computers (brains) possess numerous features that modern computer science is only starting to tackle: self-adaptation, self-optimization, holographic properties, robustness, and numerous other features that are not yet achievable in modern computer science. Clearly, it is easier to copy biological principles to do this than to invent them anew.

When we better understand the inadequacy of applying classical views to the study of multifunctional systems, it will be possible to go back to the discussion postponed in Sections 3 and 4. It was mentioned there that there are numerous computer models in the scientific literature based on serial circuits, for instance, models of CPGs. They are described as computational models explaining corresponding behavior. In Section 3, it was stated that those models have little to do with computational neuroscience. If they are not related to the computational capabilities of underlying neural networks, what are they? Classical models of CPGs always include mutually inhibiting neurons (unless it is a very simple generator built on a pacemaker neuron). From the perspective of network computation, this reflects the fact that the controlled object performs movements in opposite directions, corresponding controlling and model signals should be generated by the network, and corresponding correlations between network neurons are stored in the form of synaptic weights – excitatory, inhibitory, or both. Such models usually mimic some features of corresponding behavior without revealing its actual mechanisms. A toy frog also jumps, but it does not provide us with an understanding of how the real one actually controls its jumps. Toys usually mimic some external feature of behavior. The simpler the mimicked object, the closer the resemblance between the toy and the mimicked object. This is why modeling of simple CPGs in invertebrates can be rather “convincing”.

Suppose one wants to combine computational views with the classical approach, understanding function through structure, to figure out what computations are performed by a certain complex neural network by synthesizing all available information about it. For the sake of clarity, let us assume that a supercomputer is used to simulate this complex neural network based on collected information. This scenario is quite possible, because a supercomputer itself quite often is considered as a panacea for complex problems. However, this scenario is bound to fail. The knowledge accumulated by neurocomputing together with the low observability of complex biological neural networks (see Section 3) attests to this. There will always be unaccounted parameters in such computer simulations, and everyone who has conducted computer simulation experiments knows that unaccounted parameters result in disastrously erroneous conclusions; changing or adding one parameter can radically change the computed function. At the same time, such an approach, if applied to specific subnetworks, can have some positive results. It could help to better understand what types of computations some parts of a bigger network are capable of performing.
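The sensitivity to a single parameter mentioned above can be seen even in the smallest possible case. The following sketch assumes one threshold unit with two binary inputs, which is a textbook toy and not a model of any biological network; the weights and bias values are hypothetical. Changing only the bias turns the computed function from logical AND into logical OR.

```python
def threshold_unit(inputs, weights, bias):
    """Fire (1) if the weighted sum of inputs plus bias is positive, else stay silent (0)."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def truth_table(weights, bias):
    """Outputs for the inputs (0,0), (0,1), (1,0), (1,1), in that order."""
    return [threshold_unit((a, b), weights, bias) for a in (0, 1) for b in (0, 1)]


if __name__ == "__main__":
    weights = (1.0, 1.0)
    print("bias -1.5:", truth_table(weights, -1.5))  # behaves as AND
    print("bias -0.5:", truth_table(weights, -0.5))  # behaves as OR
```

If one parameter decides which function a single unit computes, a simulation of a large network with many unmeasured parameters cannot be expected to reproduce the function of the real network.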


7.2. What is next? The future of system neuroscience


Changing conceptual views is much more difficult than changing experimental methods. System neuroscience would not be the first science to undergo radical changes in its conceptual basis. The history of science has numerous such examples. Based on this history, it can be predicted that there will be two major polar groups of neuroscientists – those who will accept the new conceptual understanding and will actively try to make the best use of it, and those who will actively resist it. The second group usually regards new ideas as an inconvenient truth that requires too many changes. There is only one truth – real truth. Inconvenient truth does not exist.

“Complex problems have simple, easy-to-understand, wrong answers.”[4] Such answers can create a rather strong illusion of knowledge and understanding. It is a natural process to replace simple wrong answers with more appropriate complex ones. The new conceptual brain views have to be accepted as soon as possible if we want to understand brain mechanisms. This acceptance will require very broad changes and will not be an easy undertaking. Generations of neuroscientists were trained and are still being trained based on the classical conceptual views of the brain. The prevailing majority of research in system neuroscience is conducted based on classical ideas. In other words, we are dealing with great system inertia. It will not be easy to change something that has been in circulation for generations. Special measures are required to accelerate the necessary changes. Education, education, education should be at the core of these measures.

I propose that the way we teach biological and medical students should be changed. There is a huge educational distance between biological or medical science and technical science[5]. It will not be an easy task to bring them closer. Biological and medical sciences are more descriptive, while technical sciences rely more on logical derivations based on mathematics. For those students who are planning to become scientists in the field of system neuroscience, mathematics, control theory, and neurocomputing are a must. New textbooks of neuroscience must be written in which new explanations of normal and pathological brain-initiated behaviors are given. This long-term goal will bring results in the future.

Changing the direction of research in system neuroscience will be the most difficult part. Science has become an industry with its own specific money distribution system, and neuroscience is no exception to the rule. The system of funding research (for example, the system of peer reviewing) is very good at funding the best understood peers. The new brain concept is not mainstream yet, so the major problem will be: Who will judge the merit of publications and grant applications based on the new brain concept? The following example demonstrates how inert the system is. The solution of the CPG problem was found by 1992. As was already mentioned in Section 4, a CPG is not a specific circuit; it is an operating regime of an NOCS. But even now, there are CPG studies based on classical views. Here is a quotation from a recent paper: “A comprehensive understanding of any network requires identification of the participating neurons and deciphering of the wiring diagram. In the case of CPGs, so far, a complete wiring diagram has only been obtained for a small number of rhythmic motor systems, such as the Crustacean gustatory CPG and the swimming CPGs in the tadpole and the lamprey” (see Kiehn and Kullander, 2004). The CPG is a ghost. Specific CPG circuits do not exist. But, as we see, the chasing of this ghost continues.

There is one more issue related to future research in system neuroscience based on the new conceptual understanding of the brain. Future research will be much more difficult to conduct than what we have now and will require resources practically unseen in modern neuroscience. Understanding neural network structure through function, an approach described in Section 7.1, will require extremely intensive and extensive theoretical analysis, and costly computer simulations. Presently, there is no single group in the world that is using this approach on a full scale to understand brain behavioral mechanisms. However, computer simulations should not be used to synthesize detailed knowledge about the studied system, as would be done under the classical model (see Section 7.1). Detailed simulation of supercomplex systems is not realistic at this time or in the foreseeable future. Simplified analogies are necessary today to demonstrate the basic mechanisms of normal and pathological brain-initiated behaviors. Such analogies will greatly accelerate the acceptance of the new conceptual understanding of the brain. They are especially important in the field of applied neuroscience – neurology and psychiatry, where some new treatments became mainstream without a proper understanding of the underlying mechanisms (see Section 6). The new brain concept will help us to better understand the prospects and limitations of current and potential therapies.

I would not like to leave the reader with the impression that classical system neuroscience is completely useless. That is not the case; however, classical neuroscience was just a stage in our understanding of the brain, and it is time to change our conceptual views. We should not forget that almost all knowledge about the brain has been accumulated based on classical views. Neuronal properties, anatomy of neuronal interconnections, function localization, directions of information flow in the brain, activity of neurons during various behaviors, and so on, i.e., all knowledge accumulated till now, are the foundation for new interpretations based on the new brain concept. The reinterpretation, however, will require a lot of work. Even figuring out the semantics of inputs to various NOCSs will not be easy. It will be like looking for a needle in a haystack. Original research studies usually did not intend to study signal semantics.

Finally, it is necessary to point out that the approach described in this article is entirely applicable to the study of psychological phenomena such as cognition, self-awareness, and perception. It is also applicable to various non-neuronal network systems, such as the intracellular molecular control system, the immune system, and social systems; that is, controlled objects and their corresponding optimal control systems can be found in them as well.




I am very grateful to Phil Pomeroy, Vice President of Neurosciences at St. Joseph’s Hospital and Medical Center, for his unconditional support of my theoretical research. I am also very grateful to Dawn Mutchler, Editor, Neuroscience Publications at Barrow Neurological Institute, for her tremendous help with preparing the text for this web page.




Alexander, G. E., DeLong, M.R., Strick, P.L., 1986. Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Ann Rev Neurosci. 9, 357-381.

Alexander, G. E., Crutcher, M.D., 1990. Functional architecture of basal ganglia circuits: Neural substrates of parallel processing. TINS. 13, 266-271.

Arshavsky, Y. I., Gelfand, I.M., Orlovsky, G.N., 1986. Cerebellum and rhythmic movements, Springer-Verlag, Berlin.

Ashby, P., 2000. What does stimulation in the brain actually do?, in: Lozano, A.M., (Ed.), Movement Disorder Surgery. Prog Neurol Surg, pp. 236-245.

Baev, K. V., Chub, N.L., 1989. The role of different spinal cord regions in generation of spontaneous motility in chick embryo. Neurophysiology, New York: Plenum. Engl. transl. from Neirofiziologiya (Kiev). 21, 124-126.

Baev, K. V., Esipenko, V.B., Shimansky, Yu.P., 1991a. Afferent control of central pattern generators: Experimental analysis of locomotion in the decerebrate cat. Neurosci. 43, 237-247.

Baev, K. V., Esipenko, V.B., Shimansky, Yu.P., 1991b. Afferent control of central pattern generators: Experimental analysis of scratching in the decerebrate cat. Neurosci. 40, 239-256.

Baev, K. V., Shimansky, Yu.P., 1992. Principles of organization of neural systems controlling automatic movements in animals. Progr Neurobiol. 39, 45-112.

Baev, K. V., 1994. Learning in systems controlling motor automatisms. Rev Neurosci. 5, 55-87.

Baev, K. V., 1995. Disturbances of learning processes in the basal ganglia in the pathogenesis of Parkinson’s disease: A novel theory. Neurol Res. 17, 38-48.

Baev, K. V., 1997. Highest level automatisms in the nervous system: A theory of functional principles underlying the highest forms of brain function. Progr Neurobiol. 51, 129-166.

Baev, K. V., 1998. Biological neural networks: Hierarchical concept of brain function, Birkhäuser, Boston.

Baev, K. V., Greene, K.A., Marciano, F.F., Samanta, J.E.S., Shetter, A.G., Smith, K.A., Stacy, M.A., Spetzler, R.F., 2002. Physiology and pathophysiology of cortico - basal ganglia - thalamocortical loops: Theoretical and practical aspects. Progr Neuro-Psychopharm Biol Psychiat. 26, 771-804.

Barto, A. G., Sutton, R.S., Anderson, C.W., 1983. Neuronlike elements that can solve difficult learning control problems. IEEE Trans Syst Man Cyber. 13, 835-846.

Baufreton, J., Garret, M., Rivera, A., de la Calle, A., Gonon, F., Dufy, B., Bioulac, B., Taupignon, A., 2003. D5 (Not D1) dopamine receptors potentiate burst-firing in neurons of the subthalamic nucleus by modulating an L-type calcium conductance. J Neurosci. 23, (3), 816-825.

Bickle, J., 1998. Psychoneural reduction: The new wave, MIT Press, Cambridge, MA.

Bradley, N. S., 1999. Transformations in embryonic motility in chick: Kinematic correlates of type I and II motility at E9 and E12. J Neurophysiol. 81, 1486-1494.

Bronshtein, I. N., Semendyayev, K.A., 1998. Handbook of mathematics, Springer-Verlag, Berlin, Heidelberg.

Brown, T. G., 1914. On the nature of the fundamental activity of the nervous centres; together with an analysis of the conditioning of rhythmic activity in progression, and a theory of the evolution of function in the nervous system. J Physiol. 48, 18-46.

Chub, N., O'Donovan, M.J., 1998. Blockade and recovery of spontaneous rhythmic activity after application of neurotransmitter antagonists to spinal networks of the chick embryo. J Neurosci. 18, (1), 294-306.

Chub, N. L., Baev, K.V., 1991. The influence of N-methyl-D-aspartate on spontaneous activity generated by isolated spinal cord of 16-20-day old chick embryos. Neurophysiology, New York: Plenum. Engl. transl. from Neirofiziologiya (Kiev). 24, 205-213.

Contreras-Vidal, J. L., Stelmach, G.E., 1995. A neural model of basal ganglia-thalamocortical relations in normal and parkinsonian movement. Biol Cybern. 73, (5), 467-476.

Cordo, P. J., Gurfinkel, V.S., Brumagne, S., Flores-Vieira, C., 2005. Effect of slow, small movement on the vibration-evoked kinesthetic illusion. Exp Brain Res. 167, (3), 324-334.

Crutcher, M. D., DeLong, M.R., 1984. Single cell studies of the primate putamen. II. Relations to direction of movement and pattern of muscular activity. Exp Brain Res. 53, (2), 244-258.

Dahhaoui, M., Stelz, T., Caston, J., 1992. Effects of lesion of the inferior olivary complex by 3-acetylpyridine on learning and memory in the rat. J Comp Physiol. 171, (5), 657-664.

Eccles, J., Ito, M., Szentagothai, J., 1967. The cerebellum as a neuronal machine, Springer-Verlag, Berlin.

Georgopoulos, A. P., Kettner, R.E., Schwartz, A.B., 1988. Primate motor cortex and free arm movements to visual targets in three-dimensional space. II. Coding of the direction of movement by a neuronal population. J Neurosci. 8, 2928-2937.

Getting, P. A., 1986. Understanding central pattern generators: insights gained from the study of invertebrate systems, in: Grillner, S., Stein, P.S.G., Stuart, D.G., Forssberg, H., Herman, R., (Eds.), Neurobiology of Vertebrate Locomotion. MacMillan Press, London, pp. 231-244.

Greene, K. A., Marciano, F.F., Golfinos, J.G., Shetter, A.G., Lieberman, A.N., Spetzler, R.F., 1992. Pallidotomy in levodopa era. Adv Clinical Neurosci. 2, 257-281.

Hamburger, V., 1963. Some aspects of the embryology of behavior. Quart Rev Biol. 38, 342-365.

Hamburger, V., Oppenheim, R., 1967. Prehatching motility and hatching behavior in the chick. J Exp Zool. 166, 171-204.

Hecht-Nielsen, R., 1990. Neurocomputing, Addison-Wesley Publishing Company, San Diego.

Hill, A. A. V., Masino, M.A., Calabrese, R.L., 2003. Intersegmental coordination of rhythmic motor patterns. J Neurophysiol. 90, 531-538.

Hoyle, G., 1984. The scope of neuroethology. Behav and Brain Sci. 7, 367-412.

Kandel, E. R., Schwartz, J.H., Jessell, T.M., 2000. Principles of neural science, fourth ed. McGraw-Hill, USA.

Kiehn, O., Kullander, K., 2004. Central pattern generators deciphered by molecular genetics. Neuron. 42, (3), 317-321.

Kobayashi, T., Nishijo, H., Fukuda, M., Bures, J., Ono, T., 1997. Task-dependent representations in rat hippocampal place neurons. J Neurophysiol. 78, 597-613.

Kolmogorov, A. N., 1957. On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. Dokl. Akad. Nauk USSR (in Russian). 114, 953-956.

Kurkova, V., 1995. Kolmogorov’s theorem, in: Arbib, M., (Ed.), The handbook of brain theory and neural networks. A Bradford Book, The MIT Press, Cambridge, Massachusetts, London, England, pp. 501-502.

Le Moal, M., Simon, H., 1991. Mesocorticolimbic dopaminergic network: Functional and regulatory roles. Physiol Rev. 71, 155-234.

Lebedev, M. A., Carmena, J.M., O'Doherty, J.E., Zacksenhouse, M., Henriquez, C.S., Principe, J.C., Nicolelis, M.A., 2005. Cortical ensemble adaptation to represent velocity of an artificial actuator controlled by a brain-machine interface. J Neurosci. 25, 4681-4693.

Lee, K. H., Chang, S.Y., Roberts, D.W., Kim, U., 2004. Neurotransmitter release from high frequency stimulation of the subthalamic nucleus. J Neurosurg. 101, 511-517.

Levy, R., Hazrati, L.N., Herrero, M.T., Vila, M., Hassani, O.K., Mouroux, M., Ruberg, M., Asensi, H., Agid, Y., Feger, J., Obeso, J.A., Parent, A., Hirsch, E.C., 1997. Re-evaluation of the functional anatomy of the basal ganglia in normal and Parkinsonian states. Neurosci. 76, 335-343.

Ljungberg, T., Apicella, P., Schultz, W., 1992. Responses of monkey dopamine neurons during learning of behavioral reactions. J Neurophysiol. 67, 145-163.

Lou, J. S., Bloedel, J.R., 1992. Responses of sagittally aligned Purkinje cells during perturbed locomotion: synchronous activation of climbing fiber inputs. J Neurophysiol. 68, 570-580.

Marciano, F. F., Greene, K.A., 1992. Surgical management of Parkinson's disease. Part I: Paraneural and neural tissue transplantation. Neurol Forum. 3, 1-7.

Mayberg, H. S., Lozano, A.M., Voon, V., McNeely, H.E., Seminowicz, D., Hamani, C., Schwalb, J.M., Kennedy, S.H., 2005. Deep brain stimulation for treatment-resistant depression. Neuron. 45, 651-660.

McAuley, J. H., Marsden, C.D., 2000. Physiological and pathological tremors and rhythmic central motor control. Brain. 123, (8), 1545-1567.

Mendell, L. M., Henneman, E., 1971. Terminals of single Ia fibers: Location, density, and distribution within a pool of 300 homonymous motoneurons. J Neurophysiol. 34, 171-187.

Minsky, M. L., 1963. Steps toward artificial intelligence, in: Feigenbaum, E.A., Feldman, J., (Eds.), Computers and Thought. McGraw-Hill, New York, pp. 406-450.

Montgomery, E. B., Gorman, D.S., Nuessen, J., 1991. Motor initiation versus execution in normal and Parkinson’s disease subjects. Neurology. 41, 1469-1475.

Munakata, T., 1998. Fundamentals of the new artificial intelligence, Springer, New York.

Nicolelis, M. A., Ribeiro, S., 2002. Multielectrode recordings: the next steps. Curr Opin Neurobiol. 12, 602-606.

Obeso, J. A., Guridi, J., Rodriguez-Oroz, M.C., Macias, R., Rodriguez, M., Alvarez, L., Lopez, G., 2000. Functional models of the basal ganglia: Where we are?, in: Lozano, A.M., (Ed.), Movement Disorder Surgery. Prog Neurol Surg, pp. 58-77.

Parent, A., Cicchetti, F., 1998. The current model of basal ganglia organization under scrutiny. Mov Disord. 13, 199-203.

Schultz, W., 1998. Predictive reward signal of dopamine neurons. J Neurophysiol. 80, 1-27.

Sergio, L. E., Kalaska, J.F., 1997. Systematic changes in directional tuning of motor cortex cell activity with hand location in the workspace during generation of static isometric forces in constant spatial directions. J Neurophysiol. 78, 1170-1174.

Shepherd, G. M., 2003. The synaptic organization of the brain, Oxford University Press, Oxford.

Sherrington, C. S., 1947. The integrative action of the nervous system, Yale University Press, New Haven.

Shimansky, Y. P., 2000. Spinal motor control system incorporates an internal model of limb dynamics. Biol Cybern. 83, 379-389.

Thompson, R. F., 1989. Role of the inferior olive in classical conditioning. Exp Brain Res. 17, 347-362.

Ullström, M., Kotaleski, J.H., Tegnér, J., Aurell, E., Grillner, S., Lansner, A., 1998. Activity-dependent modulation of adaptation produces a constant burst proportion in a model of the lamprey spinal locomotor generator. Biol Cybern. 79, (1), 1-14.

Wichmann, T., DeLong, M.R., Vitek, J.L., 2000. Pathophysiological considerations in basal ganglia surgery: role of the basal ganglia in hypokinetic and hyperkinetic movement disorders, in: Lozano, A.M., (Ed.), Movement Disorder Surgery. Prog Neurol Surg, pp. 31-57.

[1] MPTP (1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine) causes the death of dopaminergic neurons and is widely used to produce animal models of parkinsonism.

[2] “When you do not know a thing, to allow that you do not know it - this is knowledge.” Confucius.

[3] The term is an abbreviation for “backwards propagation of errors”.

[4] It is Grossman’s misquote of H. L. Mencken, who is famous for saying, “There is always an easy solution to every human problem”. This misquote is more often used in the scientific literature.

[5] Here are some examples from my own experience that demonstrate how big this distance is. Phase (state) space is often confused with 3D physical space, and any reasoning based on a multidimensional state space is regarded as deranged. An internal model of object behavior is often identified with the innate model known in psychology. Moreover, it is not rare to meet neuroscientists who are convinced that “there are no models in the brain.” Kolmogorov’s theorem, which reveals the neural network computational principle, is quite often regarded as irrelevant to neuroscience. Oversimplified views of the brain are quite common. It is not rare to meet someone who is convinced that actual neurons are just threshold elements, that all of them are interconnected in the brain, and that the brain’s unique qualities are simply the result of the sheer number of connected neurons.