AB Testing – The Right Way
The next topic in my romp through Mindware by Richard Nisbett is testing. The scientific approach is to run experiments to see which hypothesis holds up. The author says that if you can run an experiment, choose it over regression analysis, because experiments get closer to reality. If the results from an experiment and from regression analysis conflict, go with the experiment. The trouble with regression analysis is that it rests on correlations, and a correlation does not necessarily tell you anything about cause.
But for testing to really work, you need 1) many trials, 2) random assignment of variables or people, and 3) independent observations. Remember in the statistics section where the sample size "n" has to be large before the results start to approach the population's average? The same philosophy applies here: you have to run enough tests before you can be confident in the findings. And again, just like in the statistics section, the assignment of people has to be random to eliminate any possible bias. The idea of independent observations is best explained by an example the author used: suppose you have 2 classes, one with 30 students and the other with 25 students. The "n" here is not 55 but 2. Why? Within each class, the students may well affect each other. Somebody could be very disruptive and slow down the whole class, so each student's behavior may depend on the behavior of the other students in the class.
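As a rough illustration of the random-assignment requirement above, here is a minimal Python sketch. The function name `assign_randomly` and the participant labels are made up for this example and do not come from the book; the point is only that chance, not the experimenter, decides who lands in which group.

```python
import random

def assign_randomly(participants, seed=None):
    """Shuffle the participants and split them into two groups, A and B."""
    rng = random.Random(seed)      # a seed makes the shuffle repeatable
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participants, for illustration only.
people = [f"person_{i}" for i in range(10)]
group_a, group_b = assign_randomly(people, seed=42)
print("Group A:", group_a)
print("Group B:", group_b)
```

Note that this handles random assignment only. It does nothing for independence: in the classroom example, you would be shuffling whole classes, not individual students.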
The kind of experiment the author spoke mostly about is AB testing, which has been popular in the web design and development world over the last few years but can be applied in other situations. AB testing generally changes one variable, such as a color, and assigns some people to receive that change. Then you count the responses you are testing for, such as signups for an email newsletter. Again, this works better when you have a large number of cases. With a large "n", even small differences can become statistically significant.
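To see why a large "n" matters, here is a small sketch using a two-proportion z-test, a common way to compare signup rates in an AB test. This is my own illustration, not a method from the book, and the signup counts are invented: the same 10% versus 11% difference is tested first with 200 visitors per version and then with 200,000.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that A and B do not really differ.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented numbers: version A converts at 10%, version B at 11%.
z_small, p_small = two_proportion_z_test(20, 200, 22, 200)
z_large, p_large = two_proportion_z_test(20_000, 200_000, 22_000, 200_000)

print(f"n = 200 per version:     z = {z_small:.2f}, p = {p_small:.3f}")
print(f"n = 200,000 per version: z = {z_large:.2f}, p = {p_large:.3g}")
```

With the small sample, a one-point difference is indistinguishable from noise. With the large sample, the same difference is far beyond what chance would plausibly produce.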
When doing AB testing, you will want to use a "within design" or a "before and after design". It is better to test the placement of swimwear within the same store (in the front of the store versus in the back) than to compare placements across different stores. By testing within one store, you hold constant the variables (such as shoppers' income level, gender, race, etc.) that could change with location. You want to minimize changes in other variables as much as you can so that you are studying one variable only: the placement of swimwear.
Here's a not so good way to do AB testing: there are too many variables here.
- Should you keep your child in the most hygienic environment possible for as long as you can, or should you let him play in the dirt and come into contact with germs?
- Does Head Start really help the disadvantaged?
- Does grief counseling actually promote a full-blown crisis?
- Do Scared Straight programs work?
- What do the findings say about the effectiveness of D.A.R.E.?
- Why is Finland's per capita income rising faster than that of the U.S.? Finland's per capita income is just below the U.S. figure.
- Do experiments show that the long-term unemployed are discriminated against?
"...resource scarcity can have dire consequences for the cognitive functioning of everyone from farmers to CEOs. If you ask people to imagine how they would rejigger their budgets if they suddenly were confronted with the need for an auto repair costing several thousand dollars and then give them an IQ test, you will find that the IQ of poor people takes a big beating." Mindware, p. 184.
So if manufacturing firms are pulling out of rural towns across Middle America, then it's reasonable to speculate that those towns have become resource poor. There are no jobs to be had, and so no money comes in. What kind of impact will this have on residents' reasoning ability when they choose candidates? Will their judgment decline? Will they be able to discern what's best for the average American? Experiments suggest that their ability to make the decisions that are best for themselves will be adversely affected. If you are resource poor, it becomes harder to think clearly.
To close the discussion of experiments, the last thing I want to mention is that during experiments you cannot depend on verbal reports about attitudes or the causes of behavior. We are just not good at being aware of what may be driving us. We may have been subtly influenced by word choice or the order of questions. In experiments, you have to look at actions rather than rely on words.
Finally, you can perform AB experiments on yourself, and the author provides a brief methodology for doing so.
Other thinking tools:
- Be aware that we don’t always know what is going on in our thinking process.
- Take into account the fact that the situation may be driving other people's behavior, or even your own.
- Rely on your unconscious to solve some of your problems, because there are certain kinds of problems that your unconscious is best at solving.
- Use economic tools such as cost-benefit analysis.
- Don’t fall into the sunk cost fallacy. When doing the cost-benefit analysis, think: the rest of your life starts now.
- Also consider your opportunity cost when doing cost-benefit thinking.
- When listening to or reading statistics from others, think of the law of large numbers and random assignment. The larger the number of samples and the more random the assignment or selection, the greater the confidence you can have in the results. This kind of thinking also applies to interviews and to assessing skills (football or theater tryouts, for example).
- Ignore "I know of the man who..." stories.
- Think regression to the mean: anytime someone performs superlatively or really badly the first time you encounter that person, expect the next encounter to be closer to average, meaning a so-so performance after a superlative one or a dramatic improvement after a really bad one. This concept of regression to the mean is very closely related to the law of large numbers.
- Some key ideas about standard deviations for normal distributions: about 68% of the population falls within 1 standard deviation of the mean and about 95% within 2 standard deviations. A score 1 standard deviation above the mean is at about the 84th percentile, while a score 2 standard deviations above the mean is just below the 98th percentile.
- We are not very good at detecting correlations, so watch out.
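The standard deviation figures in the list above can be checked directly from the normal distribution. The short sketch below is my own check, not something from the book. It uses only Python's standard library, and the helper name `normal_cdf` is chosen just for this example.

```python
import math

def normal_cdf(z):
    """Share of a normal population that lies below z standard deviations."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

within_1 = normal_cdf(1) - normal_cdf(-1)  # share within 1 SD of the mean
within_2 = normal_cdf(2) - normal_cdf(-2)  # share within 2 SDs of the mean
pct_1 = normal_cdf(1)                      # percentile rank at +1 SD
pct_2 = normal_cdf(2)                      # percentile rank at +2 SDs

print(f"Within 1 SD: {within_1:.1%}")
print(f"Within 2 SD: {within_2:.1%}")
print(f"+1 SD percentile: {pct_1:.1%}")
print(f"+2 SD percentile: {pct_2:.1%}")
```

The results come out to roughly 68.3%, 95.4%, 84.1%, and 97.7%, which is where the usual rounded figures come from.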