Contents
- Background
- Common learning mechanisms
- Reinforcement contingencies and cause and effect relationships
- Stimulus reinforcement contingency
- No stimulus reinforcement contingency
- Response reinforcement contingency
- No response reinforcement contingency
- References
Background
August 23, 2009
Originating from extensive observations of many types of animals, Charles Darwin (1859) theorized that animals adapt to changes in their environment. He argued that the variation among individual members of a species is the source of new adaptive forms. When conditions change in nature, the individual members of a species with useful variations are more likely to survive. In turn, the survivors are likely to reproduce and pass those favorable variations on to their offspring. The continual challenges to survival posed by the environment repeat this process over and over across generations, with the effect that characteristics that contribute to survival persist and those that do not diminish. In this way the genetic characteristics that contribute to survival are naturally selected by the environment. Evolution is therefore the product of genetic variation and natural selection.
Evolution is not, however, confined to physical or innate characteristics. The capacity to learn is also a product of natural selection. Learning mechanisms enable animals to respond rapidly, precisely, and adaptively to changing conditions in a way that innate response mechanisms, which are far less flexible, cannot.
A general feature of the information processing systems animals share is the capacity for associative learning, which here means the acquisition of knowledge about the relationship between events in an organism’s environment. All animals benefit from being able to learn about the relationship between events. The capacity to learn the relationship between an environmental cue or signal and an outcome, or between one’s own action and an outcome, enables animals to anticipate future events and modify behavior to suit the situation. It is clearly of great advantage, and a substantial influence on behavioral adjustment to the environment, for animals to learn which events predict the occurrence of biologically important events and, on the basis of those predictions, which responses are optimal. In fact, one of Darwin’s theoretical precepts is that whenever a characteristic is found to be either unique to a species or general across species, it may be assumed to play an important function for survival; if it did not, it would not have become species-specific or general.
Common learning mechanisms
Common characteristics can be explained by either common ancestry or by convergent selection pressures. Convergence, a product of convergent selection pressures, is the tendency to grow alike or develop similarities in form or habit in order to cope with similar problems and environmental resources. Often adaptations that share a common function are the product of convergent selection pressures or common environmental constraints. The example Dickinson (1980) offers in his book on animal learning is that all animals that live in the sea are faced with the common constraint of the relative uniformity of the medium in which they move. Faced with this common constraint, many different species have evolved with similar body shapes for the function of efficient swimming.
The important analogy Dickinson makes is that many different species also face common learning problems. The survival of animals often depends on their capacity to detect, learn, and store information about the relationship between events in their environment that are important to them. Survival depends on knowing which environmental events predict injury or assault, what foods to eat or avoid, where to find water, as well as knowing about the events that signal the nonoccurrence of harm or benefit, or events that are uninformative (signal nothing) so that attention can be directed elsewhere. In general, survival, a common selection pressure, often depends on the capacity to learn which sensory inputs signal the occurrence or nonoccurrence of important events and which inputs can be ignored, along with the actions that can cause important events to occur or not occur.
The critical question Dickinson asks is whether the relationship between events has properties that are common to many different species and situations and therefore would likely shape or maintain common learning mechanisms. Dickinson and others (e.g. Testa, 1974) argue that associative learning mechanisms have evolved to enable animals to detect and store information about the real causal relationships in their environment, and that the conditions under which learning takes place are those in which there is a causal relationship between events. He emphasizes that events lying on the causal chain leading to events of value have universal properties that transcend any particular species-specific adaptations. For example, for all species, effects never occur without a cause and never occur before their cause. Causal relations are unidirectional, ensuring that certain events reliably precede other events. Since there is a common selection pressure for all animals to predict the occurrence of events that are important to them, and since all causal events follow the same primary causal laws, the evolutionary adaptation animals must share is the capacity to detect the causes of those important events, because the best predictors of important events are their causes.
Just as the homogeneity of our oceans, along with the laws of fluid mechanics, has contributed to the evolution of similar body types for the function of hydrodynamic efficiency, the universal survival need to predict the occurrence of important events, along with the universal properties of cause and effect relationships, has most likely contributed to the evolution of common learning mechanisms in different animal species for the function of preparatory and instrumental efficiency. Animals that are able to learn about real world cause and effect relationships are more likely to survive and pass that capacity on to their offspring, whereas animals that are unable to learn about such relationships are more likely to perish. Moreover, since cause and effect relationships are universal, the capacity to learn about real world causal relationships should be found in a number of different species, helping them make sense of the world and structure the relationships between events in space and time.
As it turns out, the capacity to associate, that is, to detect, learn, and store information about causal relationships between events in the environment, is common among the higher vertebrates such as mammals and birds, and among some invertebrates. Vertebrates typically absorb information from exposure to the relations among events in their environment. For instance, once-neutral stimuli that have been followed by some motivationally significant event will often be used as signals predicting the significant event when similar events occur in the future. In the real world, much of what an animal must learn about, so that profitable approach or withdrawal can transpire, are the predictive environmental signs of benefit or harm. The learned signs or signals that foretell the occurrence of important events produce anticipatory responses (classical conditioning) in all vertebrate and many invertebrate species (MacPhail 1982), which provide a preparatory advantage that goes beyond the capacity to respond with reflex action (1). The anticipatory responses enable animals to optimize interaction with the anticipated important events (Hollis 1982). Moreover, the informative signals can be used to guide appropriate decisions and responses that cause a desired outcome. Along with the capacity to detect and learn about the predictive relationships between environmental signals and important events, vertebrates have also evolved to learn about and control responses that are instrumental in procuring valuable resources and avoiding injury or assault (instrumental conditioning). It is the capacity for instrumental, goal-directed action that allows animals to control their environment in the service of their needs or desires.
In the real world, where change is common, successful adaptation requires both the ability to predict future events of importance and the ability to learn which responses are instrumental in gaining access to beneficial events or avoiding harmful ones. There is now ample evidence that both mammals and birds possess associative learning mechanisms that are finely tuned to detect variations in the degree of correlation between a signal or cause and an outcome. Equivalently, they possess associative learning mechanisms that are finely tuned to detect variations in the degree of correlation between a stimulus or response and a reinforcer arranged in a reinforcement contingency (2).
Reinforcement contingencies and cause and effect relationships
If animals are capable of detecting the causal relationships in their natural environment, then they must also be capable of detecting the causal relationship between the events arranged in a reinforcement contingency, because to arrange a reinforcement contingency is to arrange a causal relationship. In other words, any animal that can detect causal relationships in its natural environment can also detect the causal relationships arranged in reinforcement contingencies.
In associative conditioning the trainer arranges a correlative or causal relationship between events by arranging a particular contingency between a stimulus or response and reinforcement. The function of the contingency is to enable the subject to detect the correlation or causal relationship between those events. Learning is not observed directly; we cannot see the subject learn. We can only observe a change in behavior after exposure to the contingent relationship and infer, from that change, that the animal has learned something about the relationship. The behavioral changes the trainer observes serve as an index, a form of measurement, indicating that the subject has learned something about the relationship between the contingently arranged events, rather than merely as the learning of new responses. That is, learning changes the subject, and the change in behavior reflects that learning, although learning is not always evinced by a change in behavior. The effects of the contingency are assessed by comparing the responses maintained when the contingency operates with those maintained when it does not (3). If, as a consequence of exposure to a reinforcement contingency, the subject’s behavior has changed, the trainer may infer that learning has occurred.
Stimulus reinforcement contingency
In the simplest case, the contingency is either between a particular stimulus and reinforcement, termed Pavlovian or classical conditioning, or between a particular response and reinforcement, termed instrumental conditioning (4). Under a stimulus reinforcement contingency, or classical contingency, the subject is reinforced following the onset of the to-be-conditioned stimulus (termed a conditional stimulus or CS), and in its absence reinforcement is omitted. The function of reinforcement following the stimulus, together with the omission of reinforcement in its absence, is to enable the subject to detect both the relationship between the stimulus and reinforcement and that between the absence of the stimulus and no reinforcement. Rescorla (1968) showed that simple temporal contiguity, or proximity, between events is not sufficient for producing an association between them. Animals need to learn about the probability of an outcome both in the presence and in the absence of its supposed signal. For example, a certain stimulus would not be a very good cue or signal predicting the presence of, say, a predator unless that stimulus is encountered more often just prior to the predator than in its absence. In this way the subject learns about the causal or informative relationship between events, in this case, that the stimulus signals or predicts an aversive reinforcer, the predator (see note 2c.). The subject acquires knowledge about the relationship between events, and the response is a measure of that knowledge.
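Rescorla’s point can be expressed with a contingency measure that later became standard in the learning literature, ΔP = P(outcome | signal) − P(outcome | no signal). The sketch below is purely illustrative; the trial counts are hypothetical, not data from any experiment.

```python
# Illustrative delta-P contingency measure:
# P(outcome | signal) - P(outcome | no signal).
# All trial counts below are invented, chosen only to show the computation.

def delta_p(outcome_with_signal, signal_trials, outcome_without_signal, no_signal_trials):
    """Return P(outcome | signal) - P(outcome | no signal)."""
    return (outcome_with_signal / signal_trials) - (outcome_without_signal / no_signal_trials)

# A stimulus is a good predator cue only if the predator appears more often
# just after the stimulus than in its absence.
good_cue = delta_p(8, 10, 2, 10)     # predator follows 8/10 signals, 2/10 otherwise
useless_cue = delta_p(5, 10, 5, 10)  # equally likely either way: zero contingency

print(round(good_cue, 2))     # 0.6 -> positive contingency, stimulus is informative
print(round(useless_cue, 2))  # 0.0 -> no contingency, stimulus provides no information
```

A positive ΔP corresponds to the stimulus predicting the outcome, zero ΔP to an uninformative stimulus, and a negative ΔP to a stimulus that predicts the outcome’s absence.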
No stimulus reinforcement contingency
Conversely, if there were no contingency between a particular stimulus and reinforcement, the stimulus would provide no information about the probability of reinforcement, and the subject would likely learn to ignore it. An experimental procedure that has been used to demonstrate the effects of a zero contingency, or zero correlation, between the events to be associated is the “random control” procedure (Rescorla 1967). Suppose, for example, we take a hungry dog and put it in an apparatus with an automatic feeder, in which a bell rings around every minute or so. Now suppose two contrasting experiments built on this basic procedure. In the first (the “random control” procedure), the dog receives small amounts of food from the feeder always, say, five seconds after the bell rings. However, along with the food occurring promptly after the bell, food also occurs at random times more temporally distant from the bell, or before it (e.g. sometimes a minute or two after the bell, or five seconds before it rings). So, over the long run, the probability of the food occurring five seconds after the bell is the same as at any other time (5). In this arrangement, in which there is no reinforcement contingency, the occurrence of the bell cannot predict the occurrence of the food; the bell provides no information about the coming of the food. In the second experiment the food is also made available, but now always and only five seconds after the ringing of the bell, so the occurrence of food is contingent upon, and can be correlated with, the occurrence of the bell (6).
In the first experiment the dog will probably pay attention to the bell the first few times it rings, but will soon learn to ignore it because, over the long run, when there is no contingency (or cause and effect relationship), the dog learns that the occurrence of the bell provides no information about the occurrence of the food. Or to put it informally, the dog attributes the bell-food pairings to chance. Since the food is just as likely to occur in the absence of the bell as when the bell and food are paired the dog has no reason to believe the bell is a signal for the food and thus, the bell provides no information about the occurrence of food. After exposure to uncorrelated presentations of the bell and food, the dog will pay no more attention to the bell than to any other uninformative feature of the apparatus, which would later interfere with the formation of an association between that stimulus and reinforcement (see e.g. Baker & Mackintosh 1977). The opposite will be the case in the second experiment; the dog will probably pay attention to the bell the first time it rings and then continue to pay more and more attention to it instead of less and less because the food only and always promptly follows the bell. That is, the presentation of the food does not occur at other times in the absence of the bell or before the bell.
The critical observation is that in both experiments the food always occurs five seconds after the bell, that is, they both share the same contiguity. However, they differ in the amount of information the bell provides about the probable occurrence of the food. In the first experiment, the food is equally likely to occur promptly after the bell rings as it is at some other time, so the bell provides no information about the occurrence of the food; in the second experiment, the food or reinforcer only and always occurs five seconds after the ringing of the bell, so the bell is a very good source of information predicting the occurrence of the food (7). What distinguishes the two experiments is the reinforcement contingency. In the first experiment there is no reinforcement contingency or causal relationship because the reinforcer occurs, not only after the bell, but also, at other times without the bell or before the bell (effects do not occur without a cause or before their cause); whereas, in the second experiment there is a contingency or causal relationship because the dog only and always gets the food reinforcer promptly after the bell.
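The comparison can be made concrete with a small simulation. The sketch below implements Rescorla’s (1967) truly random control, in which food is programmed entirely independently of the bell (so some bell–food pairings still occur by chance), rather than the exact schedule described above; the session length, bell period, and food rate are arbitrary choices for illustration.

```python
import random

random.seed(1)

STEPS = 100_000        # session length in one-second steps (arbitrary)
BELL_PERIOD = 60       # the bell rings once a minute
bells = list(range(0, STEPS, BELL_PERIOD))

def run(contingent):
    """Return the set of food-delivery times for one simulated session.
    Contingent: food arrives only 5 s after each bell.
    Zero contingency: food arrives with the same fixed probability at every
    step, regardless of the bell (a truly random control)."""
    if contingent:
        return {t + 5 for t in bells}
    return {t for t in range(STEPS) if random.random() < 1 / BELL_PERIOD}

def p_food_soon(ref_times, food, window=10):
    """Fraction of reference moments followed by food within `window` seconds."""
    hits = sum(any(t + d in food for d in range(1, window + 1)) for t in ref_times)
    return hits / len(ref_times)

no_bell = [t + 30 for t in bells]   # probe moments halfway between bells

for label, contingent in [("zero contingency", False), ("contingent", True)]:
    food = run(contingent)
    print(label,
          round(p_food_soon(bells, food), 2),     # P(food soon | bell just rang)
          round(p_food_soon(no_bell, food), 2))   # P(food soon | no bell)
```

With a zero contingency the two probabilities come out roughly equal (around 0.15 each with these settings), so the bell carries no information; with the contingency in force they are 1.0 and 0.0, and the bell is a perfect predictor.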
If associative learning mechanisms evolved to detect real world causal relationships, and if animal learning involves the acquisition of knowledge, then a way to inform subjects of the significance of the to-be-conditioned stimulus or response is to apply the same laws that govern causal relationships to reinforcement contingencies. In a causal relationship an effect never occurs without a cause and never occurs before the cause. Similarly, for a reinforcement contingency to be most effective, the reinforcer (effect) should not occur without the to-be-conditioned stimulus or response (cause) and should not occur before it. If the to-be-conditioned stimulus or response is the signal or cause of the reinforcer, then the reinforcer should not occur without the stimulus or response. Here, regularity and temporal contiguity between events are not everything. If they were the only things arranged during conditioning, the subject would be unable to resolve whether the pairing of events reflected a causal relationship or a chance happening. Regularity and temporal contiguity are important in reinforcement contingencies because they are important indicators of a possible causal relationship between events. But when the possible cause and effect each occur in isolation from one another as often as they occur together (i.e. when there is no reinforcement contingency), the evidence that they are related is canceled out. Looked at from an adaptive or real world perspective, events often occur together that are not reliably or causally related to an outcome or reinforcer. The world is full of chance conjunctions of events.
If animals attributed the occurrence of a valued outcome to every event that happened to precede it, regardless of its regularity or the temporal interval between stimulus or response and outcome, the world would have no causal structure for them; they would not be able to make sense of it and would be unlikely to survive. It is more adaptive for associations to be formed selectively, in favor of better predictors of valued outcomes at the expense of worse ones (Mackintosh 1975, 1977).
Response reinforcement contingency
When the reinforcement contingency is between a response and reinforcement, the subject is reinforced promptly after the to-be-conditioned response if and only if the subject performs the particular target response. If the subject fails to respond correctly, reinforcement is omitted, which is informationally important. In instrumental learning, a problem animals face is discriminating a causal relation between their actions and an outcome from outcomes that occur independently of their behavior. One way to achieve this discrimination is to contrast the likelihood of the outcome on occasions when the target action is performed with occasions when it is not performed and reinforcement is omitted. It is this contingent arrangement of reinforcement that facilitates the detection of causal relationships between events. From this type of contingency animals can learn that the occurrence of reward is not random, that there is a relationship between the correct response and reward and between the absence of that response and no reward. They learn that their actions are instrumental in causing the outcomes. That is, to the extent that reinforcement is more probable when the correct response is performed than when it is not, subjects can learn that they have causal control over the outcome: they cause the outcome by their actions.
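This likelihood contrast can be written, like the stimulus case, as a difference of probabilities: P(outcome | response) − P(outcome | no response). The sketch below uses an invented trial log purely for illustration; the counts are not from any experiment.

```python
# Response-outcome contingency from a hypothetical trial log.
# Each entry is (responded, rewarded); all counts are invented for illustration.
trials = ([(True, True)] * 18 + [(True, False)] * 2 +
          [(False, True)] * 1 + [(False, False)] * 19)

def response_contingency(trials):
    """P(reward | response) - P(reward | no response)."""
    with_resp = [rewarded for responded, rewarded in trials if responded]
    without_resp = [rewarded for responded, rewarded in trials if not responded]
    return sum(with_resp) / len(with_resp) - sum(without_resp) / len(without_resp)

# 18/20 rewarded with the response vs 1/20 without: 0.90 - 0.05 = 0.85,
# so reward tracks the response and the subject has evidence of causal control.
print(round(response_contingency(trials), 2))  # 0.85
```

A value near zero would instead indicate that the outcome occurs independently of behavior, the situation that produces learned irrelevance and, with aversive outcomes, learned helplessness.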
No response reinforcement contingency
An instrumental analogue to learning to ignore uninformative or irrelevant stimuli, termed ‘learned irrelevance’ (Mackintosh, 1973), is a phenomenon Seligman and Maier (1967) termed ‘learned helplessness’. Seligman and Maier found that when dogs were first exposed to a series of inescapable shocks, they later had tremendous difficulty learning to simply jump over a barrier in order to escape or avoid shock. One of their observations was that the dogs’ prior experiences, in which there was absolutely no correlation (contingency) between what the dogs did to escape or avoid shock and the occurrence of shock, interfered with the subsequent detection of a correlation, or probable causal relationship, between their behavior and its consequences. The phenomenon of learned helplessness has been confirmed in a number of other species, including humans (e.g. Hiroto 1974), tested in various experimental situations, establishing that the effect depends on the inescapability of the aversive events to which subjects are initially exposed (Maier & Seligman, 1976). For instance, initial exposure to shock from which animals can escape has little or no detrimental effect on subsequent escape or avoidance performance (see e.g. Maier, 1970; Volpicelli et al., 1983).
In another interesting experiment, Goodkin (1976) found that rats that had initially received free food, without having to earn it (i.e. with no contingency or correlation between responding and food), were subsequently almost as slow to learn to respond in order to escape or avoid shock as animals that had initially received inescapable shock. This suggests that, among other things, learned helplessness is most probably part of a more general phenomenon in which initial exposure to a zero correlation between responding and reinforcement interferes with the subsequent detection of a correlation or contingency between responding and reinforcement. Initial exposure to an appetitive or aversive reinforcer whose onset and termination are uncorrelated with any action of the subject will later interfere with the detection of a correlation between the subject’s behavior and reinforcement. Similarly, a zero correlation between a stimulus and a reinforcer will interfere with the detection of a correlation between the two events when a contingency between them is subsequently introduced. That is, animals are capable of learning not only about the events that predict or cause the occurrence of important events, but also about stimuli or responses that are uncorrelated with them, and conditioning suffers as a result (Baker, 1976; Mackintosh, 1973).
As we can see, conditioning procedures involving the maintenance of reinforcement contingencies facilitate the detection of events that predict or cause the occurrence of additional events of value (the reinforcer). In turn, those predictive events are used as guides to action. When there is no reinforcement contingency, however, animals can learn that events occur independently of one another, which makes those events uninformative, and subsequent conditioning to those events can be retarded. In other words, conditioning occurs selectively; events that are better predictors or probable causes of a significant outcome are attended to, whereas events that are uninformative, less informative, or redundant are ignored (8).
Notes:
(1). The word preparatory is used here to describe the function of the conditioned response. It is not meant to be synonymous with terms such as “preparatory-response” used in other animal learning contexts that are non-Pavlovian.
(2)a. When a reinforcement contingency is in operation, reinforcement depends upon the occurrence of a to-be-conditioned stimulus or response. In if-then terminology, only if event X (the stimulus or response) occurs will event Y (the reinforcer) occur. In the simplest case, a contingent stimulus event is one in which the occurrence of a particular stimulus is the prerequisite for the occurrence of the conditional event, the reinforcer: reinforcement is contingent upon the occurrence of a particular stimulus, and in its absence reinforcement is omitted. A contingent response event is one in which a particular response is the prerequisite for the reinforcer: reinforcement is contingent upon the occurrence of a particular response, and if the subject fails to respond correctly reinforcement is omitted. If the occurrence of event Y (reinforcement) depends upon the occurrence of event X (a stimulus or response), then a reinforcement contingency is said to exist between the events.
(2)b. A stimulus, in the broadest sense, is physical energy in the environment that impinges on an animal’s sensory apparatus: sights, sounds, smells, tastes, and tactile sensations. Environmental stimuli are potential sources of information that animals learn to attend to when they predict an outcome of importance to them, and to ignore when other stimuli are better sources of information predicting that outcome. Although the stimulus energy may activate the sense organs, it is the learned informative significance of the stimulus that helps guide decisions and appropriate action.
(2)c. The events termed reinforcer or reinforcement are events, usually of motivational significance, that serve as consequences or outcomes of either appetitive or aversive value. An outcome, appetitive or aversive, presented conditional upon the occurrence of a particular neutral stimulus or upon the performance of a particular response will usually cause the animal’s behavior to change, that is, to either increase or decrease. For example, a conditional stimulus for food will usually cause an animal to increase behavior, whereas a behavior that is followed by shock will usually cause the animal to stop or decrease that behavior.
Additionally, the omission of an otherwise expected outcome (reinforcer) may change behavior by either increasing or decreasing it. For example, if a particular stimulus is a signal for food and increases behavior, the omission of that expected outcome might decrease the behavior, or another stimulus may signal the omission of food and decrease the behavior. Taking it a step further, one stimulus may be a signal for food and increase behavior, another may be a signal for shock and decrease it, while a third stimulus that signals the omission of shock may cause the behavior to increase.
The change in behavior is only an index that the subject has learned something about the conditional relationship.
An appetitive reinforcer, such as food, and an aversive reinforcer, such as shock, are reinforcers for both classical and instrumental conditioning.
(3). In scientific experiments, this involves a control group to see if behavior would have similarly changed even without exposure to the contingent relationship.
(4). The terms “reinforcement” and “reinforcer” are used here with relevance to both classical and instrumental conditioning. In classical or Pavlovian terminology, the reinforcer is called the unconditional stimulus or US.
(5). Strictly speaking, over the long run there can be no reinforcement contingency between events unless the reinforcer occurs more often in close temporal proximity to the to-be-conditioned event than at other, more temporally distant times. If the reinforcer more often occurs at longer intervals after the to-be-conditioned event, relative to occasions of closer temporal contiguity, there will be no contingency. The critical factor is not temporal contiguity in an absolute sense but relative temporal proximity (see e.g. Rescorla 1967). If the probability of the outcome immediately following the to-be-conditioned event is the same as the probability of the outcome at later times (or in the absence of the to-be-conditioned event, or if the reinforcer occurs before it), then there can be no objective cause and effect relationship and, as such, there is no reinforcement contingency. For the to-be-conditioned event to be a good probable cause of the outcome or effect (reinforcer), the supposed effect should not occur at times more temporally distant from the to-be-conditioned event, in the absence of the event, or before it.
(6). Since cause and effect relationships are unidirectional, always going from cause to effect, and since the function of the reinforcement contingency is to facilitate the detection of a causal relationship, the relationship arranged in the contingency should likewise always go from cause to effect: the to-be-conditioned stimulus or response followed by the reinforcer. And since in causal relationships an effect never occurs without a cause, in a reinforcement contingency the reinforcer should not occur without the to-be-conditioned stimulus or response (see Dickinson 1980; Siegel & Domjan 1971).
(7). Although the pairing or contiguity of two events remains a primary concept for many psychologists and animal trainers alike, the contemporary view of conditioning emphasizes the information that one event provides about another and regards contiguity alone as insufficient for producing conditioning (and sometimes unnecessary, as when no other probable predictor or cause is more closely conjoined in space and time with the reinforcer) (see e.g. Rescorla 1972).
(8). For a good review in which many of the contemporary developments are delineated in depth, read Mackintosh (1983). In this important book Mackintosh examines some of the more traditional theoretical assumptions and offers alternative cognitive explanations that are better supported by a broader range of experimental evidence.
References:
Baker, A. G. (1976). Learned irrelevance and learned helplessness: rats learn that stimuli, reinforcers and responses are uncorrelated. Journal of Experimental Psychology: Animal Behavior Processes, 2, 130-141.
Baker, A. G. and Mackintosh, N. J. (1977). Excitatory and inhibitory conditioning following uncorrelated presentations of CS and UCS. Animal Learning and Behavior, 5, 315-319.
Darwin, C. (1859). On the origin of species. London: J. Murray.
Dickinson, A. (1980). Contemporary animal learning theory. Cambridge University Press.
Goodkin F. (1976). Rats learn the relationship between responding and environmental events: an expansion of the learned helplessness hypothesis. Learning and Motivation, 7, 382-393.
Hiroto, D. S. (1974). Locus of control and learned helplessness. Journal of Experimental Psychology, 102, 187-193.
Hollis, K. L. (1982). Pavlovian conditioning of signal-centered action patterns and autonomic behavior: A biological analysis of function. Advances in the Study of Behavior, 12, 1-64.
Mackintosh, N. J. (1973). Stimulus selection: learning to ignore stimuli that predict no change in reinforcement. In R. A. Hinde and J. Stevenson-Hinde (Eds.) Constraints on Learning. London: Academic Press, 75-96.
Mackintosh, N. J. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276-298.
Mackintosh, N. J. (1977). Conditioning as the perception of causal relations. In R. E. Butts and J. Hintikka (Eds.), Foundational problems in the special sciences. Dordrecht, Netherlands: Reidel, 241-250.
Mackintosh, N. J. (1983). Conditioning and associative learning. Clarendon Press, Oxford.
MacPhail, E. M. (1982). Brain and intelligence in vertebrates. Oxford: Clarendon Press.
Maier, S. F. (1970). Failure to escape traumatic electric shock: incompatible skeletal-motor responses or learned helplessness. Learning and Motivation, 1, 157-169.
Maier, S. F. and Seligman, M. E. P. (1976). Learned helplessness: theory and evidence. Journal of Experimental Psychology: General, 105, 3-46.
Rescorla, R. A. (1967). Pavlovian conditioning and its proper control procedures. Psychological Review, 74, 71-80.
Rescorla, R. A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 65, 55-60.
Rescorla, R. A. (1972). Informational variables in Pavlovian conditioning. In G. H. Bower (Eds.), The psychology of learning and motivation (Vol. 6). New York: Academic Press.
Seligman, M. E. P. and Maier, S. F. (1967). Failure to escape traumatic shock. Journal of Experimental Psychology, 74, 1-9.
Siegel, S., and Domjan, M. (1971). Backward conditioning as an inhibitory procedure. Learning and Motivation, 2, 1-11.
Testa, T. J. (1974). Causal relations and the acquisition of avoidance responses. Psychological Review, 81, 491-505.
Volpicelli, J. R., Ulm, R. R., Altenor, A., and Seligman, M. E. P. (1983). Learned mastery in the rat. Learning and Motivation, 14, 204-222.
Copyright © E Hale 2009.