Organizations are increasingly favoring algorithms in an effort to make decision making and judgment more rigorous, be it for setting prices or for selecting candidates to interview for a job. But many professionals remain wary of rule-based decision making, and the interaction between formulas and expert judgment remains a gray area. How empowered should employees be to alter or ignore an algorithmically generated decision? How and when should formulas be adjusted or altered?
Tom Blaser and Linnea Gandhi, two managing directors at the consultancy The Greatest Good (TGG), are enthusiastic champions of algorithms. Their recent Harvard Business Review article, coauthored with Nobel Prize–winning behavioral economist Daniel Kahneman and TGG’s CEO Andrew M. Rosenfield, explains how formulas can, among other improvements, help remediate the hidden scourge of inconsistent human decision making. In an interview with HBR, they discuss how they became so trusting of algorithms and why they advise companies to share their enthusiasm for rule-based decision making.
HBR: You are both cheerleaders for algorithms. Why?
Blaser: When making predictions with data, algorithms tend to be superior to humans. Even for decisions that are traditionally the domain of highly trained experts, decades of studies show that simple statistical models are often an improvement over inconsistent human judgment. We understand that our admiration for algorithms is not widely shared. There is a phenomenon termed “algorithm aversion”: humans are more willing to accept flawed decision making from a human than from a formula. We give other people wide leeway and tolerate errors, but we suddenly become very judgmental if a formula makes a mistake.
Gandhi: Our collaborations with companies can sometimes include helping them overcome this aversion. Performing an audit to measure the inconsistency of the “business-as-usual” approach is often a good first step. Once you recognize that you have a problem, the question becomes whether, and especially how, algorithms can be used to help fix it.
What’s your advice for how experts should interact with algorithms?
Gandhi: The first rule of thumb is to resist the temptation to override algorithms. Professionals will sometimes find that mechanical predictions feel wrong, or at least seem in need of adjustment. That’s quite natural. After all, humans have access to as much information as they can seek out, while models are limited to a programmed set of variables. Humans can flexibly update assumptions and read into nuance, while models are rigid in their interpretations of a fixed set of data. In theory, we can see the big picture.
The evidence, however, points in the opposite direction. Our access to a seemingly infinite set of variables for consideration in any given problem often works to our disadvantage, as we are — consciously or not — subject to the influence of irrelevant factors. For instance, parole decisions by judges become far more lenient following a lunch break, and admissions decisions made on cloudy days focus more on academic attributes.
Blaser: Another problem is that even when we make a good effort to collect all the relevant inputs for a judgment, we are generally not very good at giving those inputs appropriate weight or combining them in a consistent way. In a study of UK magistrates making bail decisions, the magistrates described their process as a relatively complex integration of many pieces of evidence, yet their actual judgments appeared to be driven disproportionately by what amounted to a heuristic based on a few (often idiosyncratic) pieces of data.
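To make the contrast concrete, here is a minimal sketch of how a simple formula combines a fixed set of inputs the same way every time. The variable names and weights are invented for illustration; this is not the model from the magistrates study.

```python
# A minimal sketch of a simple linear scoring rule: a fixed set of inputs,
# fixed weights, and the same combination for every case.
# The variable names and weights are invented for illustration only.

WEIGHTS = {
    "prior_offenses": -0.6,    # more priors lowers the score
    "community_ties": 0.3,     # stronger ties raise the score
    "offense_severity": -0.5,  # a more serious charge lowers the score
}

def bail_score(case: dict) -> float:
    """Combine the same inputs with the same weights, case after case."""
    return sum(weight * case[feature] for feature, weight in WEIGHTS.items())

# Any two reviewers (or the same reviewer before and after lunch) who run
# this on the same case get exactly the same answer.
print(bail_score({"prior_offenses": 2, "community_ties": 1, "offense_severity": 3}))
```

The point is not these particular weights but that the weighting is explicit and is applied identically every time.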
But clearly humans should be empowered to overrule algorithms in some cases, right?
Gandhi: Yes, but again, in our opinion it’s less often than you might think. One clear group of cases is what psychologist Paul Meehl referred to as “broken leg” cases: you should ignore a statistical model telling you that someone is going to a movie tonight if you learn she just broke her leg. A decent model will do better than you at recognizing how demographic and other variables predict movie attendance, but where the model’s data stops, humans have an opportunity to add something.
How can employees know when they actually have a “broken leg” data point that the model has missed?
Blaser: Knowing when you have a novel piece of data requires understanding the basics of how your algorithmic counterpart is wired. When an algorithm is opaque, human adjustments risk double-counting items that are already accounted for by the formula. For example, a loan officer might adjust a credit model’s output based on an applicant’s profession, but if the model already includes income, that adjustment may not add much that is new. Having the right skills on your professional staff matters here: your professionals need not all be statisticians, but users should have a good idea of where the model ends and their judgment begins.
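As a rough illustration of the double-counting risk Blaser describes, the sketch below checks whether the rationale for a manual adjustment overlaps with the model’s own inputs. The feature names, scores, and adjustments are hypothetical, not any real credit model.

```python
# A minimal sketch: apply a manual adjustment to a model's output only when
# the rationale rests on information the model does not already use.
# Feature names, scores, and adjustments are hypothetical.

MODEL_FEATURES = {"income", "debt_to_income", "credit_history_length"}

def apply_adjustment(model_score: float, adjustment: float, rationale: set) -> float:
    """Skip adjustments whose rationale overlaps with the model's own inputs."""
    overlap = rationale & MODEL_FEATURES
    if overlap:
        # e.g., "profession" used as a proxy for income would double-count income
        print(f"Skipping adjustment; already captured by the model via {overlap}")
        return model_score
    return model_score + adjustment

print(apply_adjustment(0.72, 0.05, rationale={"income"}))            # skipped
print(apply_adjustment(0.72, -0.10, rationale={"recent_job_loss"}))  # applied
```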
Gandhi: Our hope is that as algorithms get even smarter and better, people will learn to trust them more because they will feel more discerning, fair, and perhaps even more human. We are also rooting for the opposite trend: to help employees improve their decision making, we need to help them become more rigorous and rule-based. That is to say, we need to structure human judgment so that it functions, at least somewhat, algorithmically.
So how do we do that?
Gandhi: There are a few ways to approach this. One is to create a shared reference set, essentially a group of cases that your people all know well and can use for comparison. A study of potential jurors evaluating personal injury cases found that, while they tended to agree on the severity of a given case, they disagreed widely when mapping that severity to an unbounded financial scale. They lacked a shared reference set of acceptable financial penalties, and thus answers diverged. When people play a role in interpreting and adjusting different algorithmic outputs, training or structured decision aids can help to create a shared reference set and to better “program” human judgment.
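One way to picture a shared reference set is a short list of anchor cases that everyone evaluates against. A rough sketch, with invented cases and dollar amounts:

```python
# A minimal sketch of a shared reference set: a few agreed-upon anchor cases
# map a 1-10 severity rating to a dollar band, so individual awards are
# tied to common anchors. Cases and amounts are invented.

REFERENCE_CASES = [
    (2, (5_000, 15_000)),     # minor injury, full recovery
    (5, (40_000, 80_000)),    # moderate injury, lasting but partial impairment
    (8, (250_000, 500_000)),  # severe, permanent injury
]

def award_band(severity: int) -> tuple:
    """Return the dollar band of the closest anchor case."""
    closest = min(REFERENCE_CASES, key=lambda case: abs(case[0] - severity))
    return closest[1]

print(award_band(6))  # anchored to the moderate-injury band: (40000, 80000)
```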
Another way is to set rules around how much professionals can tinker with algorithms. Managers can allow people to adjust an algorithm’s output, but there should be limits on either the magnitude or frequency of adjustments, as shown in recent research. Of course, even minimal tinkering will introduce some error into the output, but in our opinion the gains in terms of overcoming algorithm aversion are often a worthwhile trade-off.
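In code, such guardrails might look like the following sketch. The caps and quota are made-up numbers for illustration, not a recommendation.

```python
# A minimal sketch of bounded overrides: each adjustment is capped in size,
# and each user gets a limited number of overrides per period.
# The limits and data structures are invented for illustration.

MAX_ADJUSTMENT = 0.05        # cap on the size of any single override
MAX_OVERRIDES_PER_MONTH = 3  # cap on how often a user may override

override_counts = {}  # user -> overrides used this month

def adjusted_output(user: str, model_output: float, requested: float) -> float:
    """Return the model output, nudged by a bounded, rationed human adjustment."""
    used = override_counts.get(user, 0)
    if requested == 0 or used >= MAX_OVERRIDES_PER_MONTH:
        return model_output
    capped = max(-MAX_ADJUSTMENT, min(MAX_ADJUSTMENT, requested))
    override_counts[user] = used + 1
    return model_output + capped

print(adjusted_output("analyst_a", 1.00, 0.20))  # request capped; returns 1.05
```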
Blaser: I’d add one more way, which is that managers have to make sure their systems are really being used as intended. As part of a project at a global financial services organization, we observed professionals who used a pricing tool to help them arrive at quotes for clients. They were supposed to record a variety of facts and subjective judgments in their tool, which would give a suggested price that they had some limited ability to adjust. In practice, they often had a price they thought would close the sale, and they simply backed into what the subjective inputs of the tool needed to be to get there. This made it easy for them to hit certain short-term targets, but it led to frustrated management and lots of questionable data. One practice to consider is to separate those who collect the inputs, those who control the model, and those who use the outputs for business decision making.