It has long been known that behavior is affected by its consequences.
We reward and punish people, for example, so that they will behave in different ways. A more specific effect of a consequence
was first studied experimentally by Edward L. Thorndike. In his well-known experiment, a cat enclosed in a box struggled
to escape and eventually moved the latch which opened the door. When repeatedly enclosed in a box, the cat gradually
ceased to do those things which had proved ineffective ("errors") and eventually made the successful response very quickly.
In operant conditioning, behavior is also affected by
its consequences, but the process is not trial-and-error learning. It can best be explained with an example. A
hungry rat is placed in a semi-soundproof box. For several days bits of food are occasionally delivered into a tray
by an automatic dispenser. The rat soon goes to the tray immediately upon hearing the sound of the dispenser.
A small horizontal section of a lever protruding from the wall has been resting in its lowest position, but it is now raised
slightly so that when the rat touches it, it moves downward. In doing so it closes an electric circuit and operates
the food dispenser. Immediately after eating the delivered food the rat begins to press the lever fairly rapidly.
The behavior has been strengthened or reinforced by a single consequence. The rat was not "trying" to do anything
when it first touched the lever and it did not learn from "errors."
To a hungry rat, food is a natural reinforcer, but the reinforcer
in this example is the sound of the food dispenser, which was conditioned as a reinforcer when it was repeatedly followed
by the delivery of food before the lever was pressed. In fact, the sound of that one operation of the dispenser would
have had an observable effect even though no food was delivered on that occasion, but when food no longer follows pressing
the lever, the rat eventually stops pressing. The behavior is said to have been extinguished.
An operant can come under the control of a stimulus. If pressing the
lever is reinforced when a light is on but not when it is off, responses continue to be made in the light but seldom, if at
all, in the dark. The rat has formed a discrimination between light and dark. When one turns on the light,
a response occurs, but that is not a reflex response.
The lever can be pressed with different amounts of force, and if only strong responses are reinforced, the rat presses
more and more forcefully. If only weak responses are reinforced, it eventually responds only very weakly. The
process is called differentiation. A response must first occur for other
reasons before it is reinforced and becomes an operant. It may seem as if a very complex response would never occur
to be reinforced, but complex responses can be shaped by reinforcing their component parts separately and putting them
together in the final form of the operant.
Reinforcement not only shapes the topography of behavior, it maintains it in strength long after an operant
has been formed. Schedules of reinforcement are important in maintaining behavior. If a response has been
reinforced for some time only once every five minutes, for example, the rat soon stops responding immediately after reinforcement
but responds more and more rapidly as the time for the next reinforcement approaches. (That is called a fixed-interval
schedule of reinforcement.) If a response has been reinforced on the average every five minutes but unpredictably, the
rat responds at a steady rate. (That is a variable-interval schedule of reinforcement.) If the average
interval is short, the rate is high; if it is long, the rate is low.
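
The two interval schedules can be written down as simple rules for deciding whether a given response is reinforced. What follows is a minimal sketch in Python, not a description of any laboratory apparatus: the class names, the helper count_reinforcers, the subject that responds once a minute, and the uniform draw used for the variable schedule are all assumptions made for illustration. It uses the usual definition of an interval schedule (the first response after the interval has elapsed is reinforced), and it models only the contingency arranged by the experimenter, not the changes in the rat's rate of responding.

```python
import random


class FixedInterval:
    """Reinforce the first response made at least `interval` time units
    after the previous reinforcement."""

    def __init__(self, interval):
        self.interval = interval
        self.last_reinforced = 0.0

    def respond(self, t):
        """Return True if a response at time t is reinforced."""
        if t - self.last_reinforced >= self.interval:
            self.last_reinforced = t
            return True
        return False


class VariableInterval:
    """Like FixedInterval, but the required wait is redrawn after every
    reinforcement. Drawing uniformly from 0 to 2 * mean_interval is an
    assumption of this sketch; it simply yields the stated average."""

    def __init__(self, mean_interval, rng):
        self.mean_interval = mean_interval
        self.rng = rng
        self.last_reinforced = 0.0
        self.wait = self._draw()

    def _draw(self):
        return self.rng.uniform(0.0, 2.0 * self.mean_interval)

    def respond(self, t):
        """Return True if a response at time t is reinforced."""
        if t - self.last_reinforced >= self.wait:
            self.last_reinforced = t
            self.wait = self._draw()
            return True
        return False


def count_reinforcers(schedule, response_times):
    """Count how many of the given responses are reinforced."""
    return sum(schedule.respond(t) for t in response_times)


# A hypothetical subject responding once a minute for an hour.
minutes = range(1, 61)
fi_count = count_reinforcers(FixedInterval(5), minutes)
vi_count = count_reinforcers(VariableInterval(5, random.Random(0)), minutes)
print(fi_count, vi_count)
```

With one response a minute for an hour, the fixed-interval schedule reinforces the responses at minutes 5, 10, and so on, twelve in all; the variable-interval count depends on the random draws.
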
If a response is reinforced when a given number of responses has been emitted, the rat responds more and more rapidly
as the required number is approached. (That is a fixed-ratio schedule of reinforcement.) The number can be increased
by easy stages up to a very high value; the rat will continue to respond even though a response is only very rarely reinforced.
"Piece-rate pay" in industry is an example of a fixed-ratio schedule, and employers are sometimes tempted to "stretch"
it by increasing the amount of work required for each unit of payment. When reinforcement occurs after an average number
of responses but unpredictably, the schedule is called variable-ratio. It is familiar in gambling devices and
systems which arrange occasional but unpredictable payoffs. The required number of responses can easily be stretched,
and in a gambling enterprise such as a casino the average ratio must be such that the gambler loses in the long run if the
casino is to make a profit.
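
The ratio schedules, and the arithmetic behind the casino's profit, can be sketched in the same spirit. This is only an illustration under stated assumptions: the class names, the function expected_net_per_play, the uniform draw for the variable schedule, and all of the figures (a ratio of 10, a stake of 1 unit, a payoff of 18 units, a mean ratio of 20) are hypothetical and do not come from any experiment or real gambling device.

```python
import random


class FixedRatio:
    """Reinforce every `ratio`-th response."""

    def __init__(self, ratio):
        self.ratio = ratio
        self.count = 0

    def respond(self):
        """Register one response; return True if it is reinforced."""
        self.count += 1
        if self.count >= self.ratio:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce after an unpredictable number of responses.

    The required number is redrawn after each reinforcement, uniformly
    from 1 to 2 * mean_ratio - 1, which averages mean_ratio. The uniform
    draw is an assumption of this sketch.
    """

    def __init__(self, mean_ratio, rng):
        self.mean_ratio = mean_ratio
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        return self.rng.randint(1, 2 * self.mean_ratio - 1)

    def respond(self):
        """Register one response; return True if it is reinforced."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


def expected_net_per_play(mean_ratio, stake, payoff):
    """Gambler's long-run average gain per play on a variable-ratio schedule.

    One play in `mean_ratio` pays `payoff` on average, and every play
    costs `stake`. A negative result means the house profits.
    """
    return payoff / mean_ratio - stake


# Hypothetical figures: 100 responses on a fixed ratio of 10.
fr = FixedRatio(10)
fr_count = sum(fr.respond() for _ in range(100))

# Hypothetical machine: stake 1 unit, payoff 18 units, mean ratio 20.
vr = VariableRatio(20, random.Random(0))
vr_count = sum(vr.respond() for _ in range(10_000))
print(fr_count, vr_count, expected_net_per_play(20, 1.0, 18.0))
```

Here the fixed-ratio subject collects 10 reinforcers for 100 responses, and the hypothetical machine returns about 0.1 unit less than the stake on each play in the long run. The gambler loses unless the payoff exceeds the mean ratio times the stake, which is the condition the casino must avoid.
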
Reinforcers may be positive or negative. A positive reinforcer
reinforces when it is presented; a negative reinforcer reinforces when it is withdrawn. Negative reinforcement is not
punishment. Reinforcers always strengthen behavior; that is what "reinforced" means. Punishment is used to suppress
behavior. It consists of removing a positive reinforcer or presenting a negative one. It often seems to operate
by conditioning negative reinforcers. The punished person henceforth acts in ways which reduce the threat of punishment
and which are incompatible with, and hence take the place of, the behavior punished.
The human species is distinguished by the fact that its vocal responses can be easily conditioned as operants.
There are many kinds of verbal operants because the behavior can be reinforced only through the mediation of other people,
and they do many different things. The reinforcing practices of a given culture compose what is called a language.
The practices are responsible for most of the extraordinary achievements of the human species. Other species acquire
behavior from each other through imitation and modelling (they show each other what to do), but they cannot tell each other
what to do. We acquire most of our behavior with that kind of help. We take advice, heed warnings, observe rules,
and obey laws, and our behavior then comes under the control of consequences which would otherwise not be effective.
Most of our behavior is too complex to have occurred for the first time without such verbal help. By taking advice and
following rules we acquire a much more extensive repertoire than would be possible through a solitary contact with the environment.
Responding because behavior has had reinforcing consequences is
very different from responding by taking advice, following rules, or obeying laws. We do not take advice because of
the particular consequence that will follow; we take it only when taking other advice from similar sources has already had
reinforcing consequences. In general, we are much more strongly inclined to do things if they have had immediate reinforcing
consequences than if we have been merely advised to do them.
The innate behavior studied by ethologists is shaped and maintained by its contribution to the survival of the individual
and species. Operant behavior is shaped and maintained by its consequences for the individual. Both processes
have controversial features. Neither one seems to have any place for a prior plan or purpose. In both, selection
replaces creation. Personal freedom also seems threatened.
It is only the feeling of freedom, however, which is affected. Those who respond because their behavior has had positively
reinforcing consequences usually feel free. They seem to be doing what they want to do. Those who
respond because the reinforcement has been negative and who are therefore avoiding or escaping from punishment are doing what
they have to do and do not feel free. These distinctions do not involve the fact of freedom.
The experimental analysis of operant behavior has led to a technology often called behavior
modification. It usually consists of changing the consequences of behavior, removing consequences which have caused
trouble, or arranging new consequences for behavior which has lacked strength. Historically, people have been controlled
primarily through negative reinforcement; that is, they have been punished when they have not done what is reinforcing to those
who could punish them. Positive reinforcement has been less often used, partly because its effect is slightly deferred,
but it can be as effective as negative reinforcement and has many fewer unwanted byproducts. For example, students who
are punished when they do not study may study, but they may also stay away from school (truancy), vandalize school property,
attack teachers, or stubbornly do nothing. Redesigning school systems so that what students do is more often positively
reinforced can make a great difference.
(For further details, see my The Behavior of Organisms, my Science
and Human Behavior, and Schedules of Reinforcement by C. F. Ferster and me.)
-- B. F. Skinner