Operant conditioning was a breakthrough in learning theory. It is the basis of how rewards and punishments are used to increase or decrease a target behavior. If you are a parent, there is a good chance you have used operant conditioning without even knowing it. This post covers the different types of reinforcement and punishment, the schedules on which they can be administered, and some of the key concepts. We will also touch on social learning theory (which is not a part of operant conditioning, but rather another learning theory).
On the EPPP, operant conditioning falls under the topic of Cognitive-Affective Bases of Behavior, which is weighted at 13%, so you should know this topic fairly well. The information here is best used alongside your own knowledge and what you have already been taught, your previous notes or textbooks, and further research if you need more information on a particular topic. You could also check out the Product Reviews page to see a variety of products to help you pass the EPPP.
Reinforcement and Punishment
Operant conditioning is based on the assumption that we learn from the consequences that reward or punish specific behaviors. The 2 major contributors were B. F. Skinner and E. L. Thorndike. Thorndike created the Law of Effect, which states that behaviors are originally displayed at random and then become either weaker or stronger depending on the reward or punishment that follows.
The terms reinforcement vs punishment and positive vs negative can be confusing if you try to understand them as being either pleasant or unpleasant. To truly understand operant conditioning, you have to set aside what these words mean in everyday language. Reinforcement is an effort to increase a behavior and punishment is an effort to decrease a behavior. Positive means adding something after the behavior and negative means taking something away after the behavior. Either one can be used to increase or decrease a behavior.
Positive Reinforcement: This is a typical reward. After a behavior is performed, something is given to the person or animal that will make them want to perform this behavior again in the future. For example, praising a child for sharing.
Negative Reinforcement: This is typically relief, and involves removing something unpleasant in an effort to increase the behavior. For example, a wife nags her husband to clean the garage until he finally does it. Once the behavior is performed, the nagging stops.
Positive Punishment: This is typically something painful (either emotionally or physically), and involves adding something aversive or unwanted in order to make the behavior less likely to occur again. For example, giving a child a spanking, extra chores, or a scolding are all positive punishment.
Negative Punishment: This is typically the loss of something, and involves removing something of value in an effort to make the behavior less likely to occur again. Examples include time outs, loss of privileges like TV, or penalties in a hockey game. Consider this scenario: a child hits her sister, the father gives the child a time out, and the children do not fight for the rest of the day. There are actually 2 different behaviors that can be identified in this example:
- The child's target behavior of hitting her sister: After the behavior, the father gives a time out, which takes away her privileges (removed = negative). The child is in a less desirable state, so she will most likely not hit her sister in the future. So this is Negative Punishment. It would have been Positive Punishment if the child had been spanked (added = positive).
- The father's target behavior of giving time outs: After the time out, the child no longer hits her sister (removed = negative). The father is in a more desirable state because the kids have stopped fighting, so he will most likely give time outs in the future. So this is Negative Reinforcement.
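If it helps to see the four terms as a simple 2 x 2 lookup, here is a minimal Python sketch. The function name and the "added"/"removed" and "increases"/"decreases" labels are my own shorthand for illustration, not standard terminology, and the sketch only encodes the definitions given above.

```python
def classify_consequence(stimulus_change: str, behavior_effect: str) -> str:
    """Name the operant procedure from two facts about a consequence.

    stimulus_change: "added" or "removed" (what happens after the behavior)
    behavior_effect: "increases" or "decreases" (what happens to the behavior)
    Both label sets are hypothetical shorthand chosen for this sketch.
    """
    sign = {"added": "Positive", "removed": "Negative"}[stimulus_change]
    kind = {"increases": "Reinforcement", "decreases": "Punishment"}[behavior_effect]
    return f"{sign} {kind}"


# The time-out scenario, seen from each person's side.
child = classify_consequence("removed", "decreases")   # privileges removed, hitting decreases
father = classify_consequence("removed", "increases")  # fighting removed, time outs increase
print(child)
print(father)
```

The point of the sketch is that the two questions are independent: first ask whether something was added or removed, then ask whether the behavior goes up or down. Whether the consequence feels pleasant never enters the lookup.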
Schedules of Reinforcement
When using operant conditioning, there are different phases of learning. The Acquisition Phase is when the learning occurs, and Extinction is when the reinforcement is stopped. Not all responses are equal, and Operant Strength is typically determined by the schedule of reinforcement. Reinforcement can occur either continuously or intermittently.
Continuous Reinforcement: This is when a reinforcement is given each and every time a behavior occurs. It is the most effective form of reinforcement during the acquisition phase, but the behavior is easily prone to extinction. For example, if chocolates are given each and every time a child raises their hand in a classroom, they will quickly learn to raise their hand, but as soon as the chocolates are no longer given they are likely to slip back into the old behavior of calling out answers. Continuous reinforcement is also prone to Satiation, which is when the subject no longer wants the chocolates or other reinforcement.
Intermittent Reinforcement: Rather than being reinforced on every occurrence of the behavior, the subject is reinforced only every once in a while. Ideally a subject starts with continuous reinforcement and then moves to intermittent reinforcement for the best results. The intermittent schedule can be based on either the number of behaviors or the amount of time that has passed, and there are 4 possible schedules:
- Fixed Ratio: After a certain unchanging number of responses, a reinforcement is given. For example, paying someone for every 100 envelopes they stuff. This has moderate to high effectiveness, and often results in a pause or break after receiving the reinforcement.
- Variable Ratio: After an unpredictable number of responses, a reinforcement is given. For example, when playing the slot machines you don't know how many times you need to pull the lever, but eventually you will get paid. This is a very effective schedule of reinforcement, and often results in continuous behavior with little pause.
- Fixed Interval: After a set amount of time has elapsed, the first time the behavior occurs it is reinforced and the interval is reset. For example, you can pick up your paycheck only after each 2-week pay period has passed. This typically has limited effectiveness for productivity.
- Variable Interval: After a variable amount of time has passed, the first time the behavior occurs it is reinforced and the interval is reset. If the subject is unaware of the interval, it is moderately effective in reinforcing a behavior. For example, every now and then, after variable amounts of time, a child is rewarded with ice cream for good behavior.
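The difference between counting responses (ratio) and watching the clock (interval) can be easier to remember when it is written out as rules. Below is a rough Python sketch of the 4 schedules. Everything in it is an assumption made for illustration: the function names, the choice to draw variable requirements from a uniform range around a mean, and the pretend subject who responds once per second. It shows only when a reinforcement would be delivered, not how a real subject would behave.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response (the time argument is ignored)."""
    count = 0

    def respond(now):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """Reinforce after an unpredictable number of responses averaging mean_n."""
    count = 0
    required = rng.randint(1, 2 * mean_n - 1)

    def respond(now):
        nonlocal count, required
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean_n - 1)  # draw the next requirement
            return True
        return False

    return respond


def fixed_interval(seconds):
    """Reinforce the first response once `seconds` have elapsed, then reset."""
    start = 0.0

    def respond(now):
        nonlocal start
        if now - start >= seconds:
            start = now
            return True
        return False

    return respond


def variable_interval(mean_seconds, rng):
    """Like fixed_interval, but each wait is drawn at random around a mean."""
    start = 0.0
    required = rng.uniform(0, 2 * mean_seconds)

    def respond(now):
        nonlocal start, required
        if now - start >= required:
            start = now
            required = rng.uniform(0, 2 * mean_seconds)  # draw the next wait
            return True
        return False

    return respond


# A pretend subject responds once per second for 20 seconds.
fr = fixed_ratio(5)
fr_hits = [t for t in range(1, 21) if fr(t)]
fi = fixed_interval(8)
fi_hits = [t for t in range(1, 21) if fi(t)]
print(fr_hits)  # [5, 10, 15, 20]
print(fi_hits)  # [8, 16]

# The variable schedules depend on the random draws, so no fixed output is shown.
vr = variable_ratio(5, random.Random(0))
vr_hits = [t for t in range(1, 21) if vr(t)]
vi = variable_interval(8, random.Random(0))
vi_hits = [t for t in range(1, 21) if vi(t)]
print(vr_hits, vi_hits)
```

Notice that on the ratio schedules responding faster earns reinforcement sooner, while on the interval schedules extra responses before the time is up earn nothing. That is one way to remember why ratio schedules tend to produce higher response rates.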
Important Terms and Concepts
Stimulus Generalization: When a subject emits the target behavior in front of stimuli that are similar to, but not the same as, the stimuli originally used for reinforcement. For example, a child learns to raise their hand in school, and so they raise their hand in all settings with an instructor, such as swimming lessons.
Response Generalization: A behavior that is similar to, but not exactly the same as, the reinforced behavior is emitted in an attempt to be reinforced. For example, after the behavior of sharing toys has been reinforced, a child shares their vegetables with their sibling expecting reinforcement.
Stimulus Control (Discrimination Learning): Often in the real world, target behaviors are reinforced in certain circumstances but not in others, and the subject learns to “discriminate” between these 2 situations. For example, a child whose parents are divorced may know that throwing a tantrum at their mother's house will result in them being able to continue to play video games, but also know that throwing a tantrum at their father's house will result in their video game time being taken away.
Operant Extinction: This refers to withholding the previous reinforcement in order to stop the behavior from continuing to occur. For example, after a child returns from their grandparents' house, where whining was reinforced by getting what they wanted, the parents no longer give the reinforcement in order to stop the target behavior of whining. There is typically a response burst prior to extinction, where the behavior actually gets more intense, but it will eventually be extinguished.
Prompting: This involves cuing the subject as to what behaviors they should perform. For example, a parent may start by cuing with “say please,” then “What's the magic word?” and then simply give a look to cue the use of manners. The process of reducing the prompting is known as fading.
Shaping by Successive Approximation: In an effort to teach a target behavior, the subject is reinforced as their behavior gets closer and closer to the target behavior. It's like playing a game of “hot and cold,” where a person is told they are getting warmer as they move closer to a prize. For example, when a child is learning colors, they may be shown a picture of an orange ball and be reinforced for mimicking the word orange, then reinforced after the parent says “or” and the child finishes the word, and finally reinforced for pointing at the picture and saying orange.
Superstitious Behavior: Typically this starts when a behavior is reinforced accidentally, and the subject now believes that if they continue to display that behavior the reinforcement will occur again. For example, believing that your favorite sports team is more likely to win if you wear your team jersey and watch the game from a specific local pub, while sitting on a specific bar stool.
Chaining: This involves reinforcing a series of behaviors that lead to the desired target behavior, with many mild reinforcements along the way to the major reinforcement. For example, in order to go on a blind date you need to start your car, find a parking spot, find the restaurant you agreed to meet at, and find your date in a crowd of people. If you find out you forgot your wallet, the whole chain stops.
Behavioral Contrast: In a situation where 2 behaviors are equally reinforced, and then one stops being reinforced, the behavior still being reinforced is likely to increase while the other behavior is likely to decrease. For example if you equally like texting Jeff and James, but find that eventually James stops responding to your texts but Jeff continues to respond, your texting behavior with Jeff will increase and will decrease with James.
Social Learning Theory
This is NOT a part of operant conditioning, but rather its own theory, separate from classical and operant conditioning. It is also known as the theory of observational learning, and is based on the idea that not all behavior is learned through reinforcement or conditioning, but rather some learning occurs through observation of others. This theory was highly influenced by Bandura, whose Bobo doll experiments in the early 1960s found that children exposed to violence tended to mimic violent behaviors.
He also believed that behaviors are performed because we have the cognitive ability to anticipate future reinforcement, so behavior is not necessarily based on learning from past behaviors. Observational learning has 4 steps: Attention, Retention, Production, and Motivation. He also developed the concept of Reciprocal Determinism, which describes the interaction between a person, their behavior, and their environment. For example, a student believes that everyone in her class hates her, so she sits in the back of the class and doesn't talk to anyone, which makes everyone believe she is a snob, so no one talks to her, reinforcing her belief that no one in her class likes her.
You may find the information on this site is not enough to help you feel confident about your ability to pass the exam. That is OK, and only you can be the judge of what you need. If this information seems overwhelming to you, it does NOT mean you will fail the exam, but you may require a little more in-depth material than is offered here. That is why there is a Product Reviews page, which will give you a variety of additional options, as well as practice exam questions, which I highly recommend as explained on the Study Tips page.
***We often experience reinforcement naturally or accidentally in our environment. Can you think of any examples of reinforcement? Do you believe in social learning theory, and that we have the ability to learn simply by observing others?