The benefits of statistical noise
Statistical noise is nearly universally considered an impediment to sound decision-making. But what seems like noise to those shaping the rules may actually be critically important to those on the receiving end.
Published by Behavioral Scientist
By Ruth Schmidt
August 24, 2020
The year was 1999. Chicago’s public housing was in distress, with neglect and gang activity hastening the decline of already depressed neighborhoods. In response, the city launched the Plan for Transformation to offer relief to residents and rejuvenate the city’s public housing system: residents would be temporarily relocated during demolition, after which the real estate would be repurposed for a mixed-income community. Once the building phase was completed, former residents were to receive vouchers to move back into their safer and less stigmatized old neighborhood.
But a billion dollars and over 20 years later, the jury is still out about the plan’s effectiveness and side effects. While many residents do now live in safer, more established communities, many had to move multiple times before settling, or remain in high-poverty, highly segregated neighborhoods. And the idealized notion of former residents as “moving on up” in a free market system rewarded those who knew how to play the game—like private real estate developers—over those with little practice. Some voices were drowned out.
Chicago’s Plan for Transformation shared the same challenges—cost, time, a diverse set of stakeholders—as many similar large-scale civic initiatives. But it also highlights another equally important issue that’s often hidden in plain sight: informational “noise.”
Noise, defined as extraneous data that intrudes on fair and consistent decision-making, is nearly uniformly considered a negative influence on judgment that can lead experts to reach variable findings in contexts as wide-ranging as medicine, public policy, court decisions, and insurance claims. In fact, Daniel Kahneman himself has suggested that for all the attention to bias, noise in decision-making may actually be an equal-opportunity contributor to irrational judgment.
Kahneman and his colleagues have used the metaphor of a target to explain how both noise and bias result in inaccurate judgments, failing to predictably hit the bull’s-eye in different ways. Where bias looks like a tight cluster of shots that all consistently miss the mark, the erratic judgments caused by noise look like a scattershot combination of precise hits and wild misses.
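For readers who want a more concrete version of the target metaphor, here is a minimal simulation sketch in Python of how biased and noisy judgments differ statistically; the numbers are invented for illustration and are not drawn from Kahneman’s work or this article.

```python
import random

random.seed(0)
true_value = 100  # the bull's-eye: the judgment that would be exactly right

# A biased judge: tightly clustered judgments that all miss in the same direction.
biased_judgments = [true_value + 15 + random.gauss(0, 2) for _ in range(10)]

# A noisy judge: judgments centered on the truth but scattered between near-hits and wild misses.
noisy_judgments = [true_value + random.gauss(0, 15) for _ in range(10)]

def mean(xs):
    return sum(xs) / len(xs)

def spread(xs):
    return max(xs) - min(xs)

print(mean(biased_judgments), spread(biased_judgments))  # off-center average, small spread
print(mean(noisy_judgments), spread(noisy_judgments))    # near-center average, large spread
```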
When decision-making benefits from the consistent application of well-defined rules, eliminating noise can effectively reduce the variability of judgments that arise from a single set of data. Imagine you’re waiting for medical test results. Any effort that increases your physician’s decision-making accuracy by reducing distracting or irrelevant data will likely increase your confidence in their judgment, regardless of whether the eventual diagnosis is the one you want to hear.
But is noise always bad? In some contexts, as in the Plan for Transformation, information that may initially be dismissed as noise can contribute important nuance. Notions of what’s relevant can differ wildly depending on who’s perceiving the data, and over-enthusiastic noise removal can inadvertently over-simplify results by throwing out complexity. In other words, when situations lack a single right answer, trying to reduce unwanted variability by eliminating noise can inadvertently eliminate useful information as well.
Of course, the mental model of a target and bull’s-eye doesn’t always hold true. Chicago’s public housing policy and the following examples—academic publishing and health care—can help us better understand when noise might actually be useful.
Originally introduced in the mid-1950s to help libraries identify which journals were worth the cost of subscription, the Journal Impact Factor (JIF) metric has since evolved into a proxy for academic research quality. JIF grew out of traditional citation counts and is appealingly simple in its formulation: the number of times a journal’s articles from the prior two years were cited in a given year, divided by the number of citable articles the journal published in those two years. Impact factors have increasingly been used to inform academic review, promotion, and tenure decisions in the belief that their quantifiable objectivity is more likely to result in fair and efficient judgments than subjective assessments.
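As a rough illustration of that arithmetic, consider the sketch below; the journal and every number in it are hypothetical, invented for the example rather than drawn from the article.

```python
# Hypothetical illustration of how a journal impact factor is computed.
# All numbers are invented for the example.
citations_in_2020_to_2018_and_2019_articles = 1500  # times the journal's 2018-2019 articles were cited in 2020
citable_articles_published_2018_and_2019 = 400      # articles and reviews the journal published in 2018-2019

impact_factor_2020 = (citations_in_2020_to_2018_and_2019_articles
                      / citable_articles_published_2018_and_2019)
print(impact_factor_2020)  # 3.75 -- each recent article was cited 3.75 times in 2020, on average
```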
But impact factor was never intended to reflect individual contributions or inform promotion or performance reviews, and it has since been recognized that this overreliance on impact factor indicators may inadvertently skew the judgment of both researchers and evaluators. This can convert research careers into a numbers game, incentivizing certain choices and behaviors at the expense of others.
Quality research can take many forms—from methodical replication of existing results, to reports on the state of a field, to new breakthrough ideas—that may not become influential on the same timeline. For example, evidence that novel work is more likely to get cited outside the two-year window (during which those citations affect journal impact scores) can deter elite journals from accepting even compelling boundary-stretching papers. Consequently, researchers—whose job prospects often rely on publishing in high-impact factor journals—may be dissuaded from pursuing cutting-edge research to focus on what’s more likely to be accepted. What’s measured matters: once rules are known they can also be gamed, and when rewards are skewed or boiled down to a formula, behaviors adjust to follow suit.
In addition, far from creating objective measures of quality, impact factor scores may actually exacerbate inequities in academic research. Achieving equity in higher education continues to be a struggle in part because efforts to achieve objectivity through indicators like impact factors often conflate notions of “measurable” and “fair.” But in the same way that higher scores on standardized tests like the SAT or GRE correlate more strongly with family income than with academic success, it’s been suggested that publishing in high-impact factor journals can be more indicative of system savvy or a mutual back-scratching cohort of coauthors than of publication quality.
This hits already underrepresented populations especially hard. While the adoption of impact factor metrics may have started in an attempt to create a level playing field, research indicates that minoritized researchers are often penalized by a lack of mentorship that would normally create early on-ramps to publications, a disproportionately high load of “invisible” service work that distracts from time spent on scholarship, and embedded biases about their research topics that result in fewer grants and publications. To combat these challenges, efforts like the Leiden Manifesto and the San Francisco Declaration on Research Assessment (DORA) have begun to suggest that incorporating more qualitative assessment instruments, like narrative CVs and structured review protocols, can restore useful “noise” that journal impact factors are likely to miss.
Halfway through the 1992 movie Sneakers, the leader of a motley gang of security system analysts, played by Robert Redford, attempts to retrace his whereabouts after having spent the night being driven around in the trunk of a car. It’s hopeless—until Whistler, the team’s blind hacker, interjects, “But what did the road sound like?” Answers suddenly come into crisp focus: the cadence of wheels running over seams in a bridge provides one clue; the sound of a cocktail party, yet another; even the absence of a telltale foghorn is revealed as meaningful.
The heightened tendency to tune out some data as unimportant is a well-known side effect of expertise, which encourages us to become highly attuned to some signals and patterns at the expense of others. Solving the “where was I last night?” puzzle required tactics similar to those used by multidisciplinary medical teams, whose complementary perspectives compensate for a common blind spot: when individual experts perceive data through a familiar lens, they are more likely to dismiss some of that information as noise, or to miss it entirely, on their own.
But judging “what we see” in a stable environment with a history of reliable data differs from judging “what will be” in a complex systems context with multiple social factors. If diagnoses in health care are essentially about revealing an objective—albeit sometimes elusive—truth, the question of what treatment to pursue is often much more complicated, informed by highly personal decisions about length versus quality of life, or one’s appetite for taking more established or riskier courses of action. Even the best evidence-based therapy is based on averages and future projections, where confirmation that we chose the right path unfolds over time: the same strategic choices that look like bull’s-eye problems in retrospect are often highly ambiguous at the actual moment of decision-making. In these cases, rife with independent variables and future unknowns, we can benefit from intentionally inserting noise into decision-making in the form of speculative scenarios that force us to concretely grapple with the potential downstream implications of our choices.
In Chicago, the Plan for Transformation’s focus on safety and housing quality was hardly intended to be punitive. But when notions of what is relevant and important are assumed rather than agreed upon, what seems like noise to those shaping the rules may actually be critically important context to those on the receiving end.
While improved safety was a recognized need, dispersing residents across the city during demolition and rebuilding massively disrupted the informal networks that provided financial and personal support. Residents had long relied on their neighbors’ ability to step in as caregivers or babysitters at short notice; without this, they struggled to maintain flexible work schedules across multiple jobs. Resituated in locations with no established relationships, residents could no longer obtain medication and food on personal credit when funds were low, removing a reliable safety net against health issues and food insecurity. These challenges would have been bad enough if they were temporary, with the promise of a return to normalcy. But qualifying for the mixed-income communities also carried requirements like maintaining a 30-hour workweek, a challenge that was nearly insurmountable without the reliable informal childcare that temporary relocation had virtually erased.
Across many contexts, our enthusiasm for clearing out noise sometimes leads us to throw out useful—even vital—information, especially when top-down approaches dominate bottom-up contributions to define what matters. In assuming that our values and lived experiences are universally shared, we risk insufficiently interrogating how they deviate from those for whom we are designing. Our assumptions of a shared premise simply beg to be made noisier when we are creating solutions for others whose lives, interests, and needs may fundamentally differ from our own.
Inspired by insights from the field of military intelligence, Gregory Treverton described two kinds of problem-solving challenges: puzzles, which can be resolved by doggedly accumulating and digesting the right information to reveal latent truths, and mysteries, which demand speculating about ambiguous or future contingencies to help us make sense of the information we’ve already got.
Solving puzzles is largely an interpretive exercise, where finding the solution consists of organizing the right information in the right way. This is the job of detectives and diagnosticians: in these cases, there is a bull’s-eye to aim for. Eliminating noise can help us arrive at these solutions more effectively and efficiently, in part because there is shared, underlying agreement about which data are relevant to those decisions, and because everyone benefits from less crime, or better health care, when the system works well.
Solving mysteries, on the other hand, relies on how we define, or frame, the nature of the problem itself. Unlike puzzles, mysteries are often situated in an uncertain future, where there may be multiple valid conceptions of what we’re even solving for. This is more the domain of hiring employees and crafting business strategy, where making smart decisions may require us to question our assumptions about what data registers as important. In situations like these, removing noise in the interest of fairly and consistently applying relevant criteria may contribute to a false sense of objective precision that is not as desirable as we may think.
Yet many things in life—academic publishing, health care, and housing policy among them—require addressing individual challenges within the context of complex systems, simultaneously solving for puzzles and mysteries. As behavioral designers, we must compel ourselves to deeply and empathetically understand both the needs of the people we are designing for and the systems in which they operate, and critically question what our legitimate desire for fairness and consistency leaves out. If we don’t, our well-meaning efforts to reduce noise may inadvertently strip away essential signals, causing us to miss patterns, gaps, and perspectives in data that deserve our attention.