1. The Streetlight Effect and Measuring What Matters
It is dark, and a man has lost his keys. He searches for them under a streetlight, and a friend comes over to help. Eventually, the friend asks, “Are you sure you lost your keys here?” The man says, “No. I lost them in the park.” So the friend asks, “Why are we looking here?” The man answers, “Because this is where the light is.”
This story describes a form of observational bias called the streetlight effect, in which we look for things where it’s easy, not where it’s important.
“I would argue that we do this all the time in health care measurement,” says Ari Robicsek, Chief Medical Analytics Officer for Providence St. Joseph Health.
Every hospital measures length of stay, for example, and many use this measure as a surrogate catch-all for quality and efficiency. “But let me ask, who really cares about length of stay?” says Robicsek. “Is it patients? Is that the first metric that comes to mind when a patient is thinking about hospital quality? Is it doctors? Probably not. Is it administrators? Even from an administrative point of view, you’re not going to realize the financial benefit of reduced length of stay, unless at the same time you reduce labor, or you find ways to fill those empty beds with paying customers, which is a much more complex measure than simply looking at length of stay.”
Length of stay may not be a great measure, but if we have to start somewhere with health care measurement, what’s the harm in tracking it? “If we assign resources to working on the wrong problem, those resources aren’t working on the right problem,” warns Robicsek.
Additionally, with length of stay specifically, a big push to get patients out the door risks sending them home before they’re ready — and when that happens, those patients may end up with complications and get readmitted. “We see one streetlight metric, length of stay, giving birth to another streetlight metric, 30-day readmissions, and so on,” he says.
“My modest proposal: We should measure the things that matter,” says Robicsek. “Yes, sometimes that’s going to mean that we need to collect data differently than the way we do today, or, said another way, sometimes we’re going to have to put up some lights in the park.”
2. Balancing Risk Adjustment
Robicsek shares a map showing the distribution of glycemic control in diabetic patients on Chicago’s North Shore, where green is good and red is bad. The map closely matches an income map of the same area.
If we set up a bonus program for primary care doctors where they receive more money if their patients have better glycemic control, it’s easy to guess where most physicians will want to practice. This is why we need risk adjustment.
“Absent good risk adjustment, physicians working in disadvantaged geographies are going to have the worst-looking outcomes,” Robicsek explains. “They’re going to get paid less. The poor get poorer, etc. Absent good risk adjustment, physicians are going to have an incentive to cherry-pick, that is, focus on the patients who are going to make them look good.”
“But with good risk adjustment, we have the opportunity to identify those providers who are outperforming expectations, who are doing a great job with the difficult-to-manage patients, and we can learn from them.”
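One simple way to make this concrete is indirect standardization: compare each physician's observed results with what would be expected given the risk mix of their patients. The Python sketch below is illustrative only and is not the method Robicsek or CMS uses; the physician names, the two risk strata, and all patient counts are hypothetical.

```python
from collections import defaultdict


def make_patients(physician, stratum, n_total, n_controlled):
    """Build hypothetical records: (physician, risk stratum, control achieved)."""
    return [(physician, stratum, i < n_controlled) for i in range(n_total)]


# Hypothetical data: dr_a sees mostly low-risk patients, dr_b mostly high-risk.
records = (
    make_patients("dr_a", "low_risk", 8, 6)
    + make_patients("dr_a", "high_risk", 2, 0)
    + make_patients("dr_b", "low_risk", 2, 2)
    + make_patients("dr_b", "high_risk", 8, 3)
)


def raw_control_rate(records):
    """Unadjusted share of each physician's patients with good control."""
    total = defaultdict(int)
    controlled = defaultdict(int)
    for physician, _, ok in records:
        total[physician] += 1
        controlled[physician] += int(ok)
    return {p: controlled[p] / total[p] for p in total}


def observed_to_expected(records):
    """Observed / expected controlled patients per physician.

    The expected count uses the pooled control rate of each risk stratum,
    so a ratio above 1.0 means the physician outperforms expectations.
    """
    stratum_total = defaultdict(int)
    stratum_controlled = defaultdict(int)
    for _, stratum, ok in records:
        stratum_total[stratum] += 1
        stratum_controlled[stratum] += int(ok)
    stratum_rate = {s: stratum_controlled[s] / stratum_total[s] for s in stratum_total}

    observed = defaultdict(int)
    expected = defaultdict(float)
    for physician, stratum, ok in records:
        observed[physician] += int(ok)
        expected[physician] += stratum_rate[stratum]
    return {p: observed[p] / expected[p] for p in observed}


raw = raw_control_rate(records)
ratios = observed_to_expected(records)
for physician in sorted(raw):
    print(physician, round(raw[physician], 2), round(ratios[physician], 2))
```

In this made-up data, dr_a has the higher raw control rate (0.6 versus 0.5), but dr_b has the higher observed-to-expected ratio (1.25 versus about 0.86), because dr_b's panel is mostly high-risk patients. A bonus program based on the raw rate would reward dr_a; one based on the adjusted ratio would surface dr_b as the provider to learn from.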
Risk adjustment has drawbacks, however, when it is done poorly. The most common problem is a model that does little more than create the illusion that risk adjustment has occurred. “A lot of the risk adjustment models in use are lousy, including some of the ones used by CMS (Centers for Medicare and Medicaid Services),” Robicsek says. “I would argue that those do very little other than creating the patina of fairness, and I would argue when that happens, we’ve probably done more harm than good with risk adjustment.”
Another concern is that sometimes risk adjustment can justify outcome disparities that are amenable to management. A blood-pressure management metric risk-adjusted on race, for example, could remove the incentive for physicians to determine how to manage blood pressure in African-American patients, perversely promoting or entrenching existing inequalities.
“My proposal here: For every new measure that we build, we need to have a conversation about what amount of risk adjustment is enough,” says Robicsek.
3. Measuring to Learn
How much is enough? Enough that we can learn from the measure, Robicsek explains. “So much of the health care measurement that we do is for the purpose of rank-ordering or some form of reward or punishment. I would argue that most of the measurement that we do should be taking into account the fact that, as humans, we’re curious and we’re altruistic. Most of it should be to learn.”
In a graph of total knee replacement at Providence St. Joseph Health, each circle represents one high-volume orthopedic surgeon. All of these surgeons perform many elective primary unilateral total knee replacements, and they all have great outcomes. What differs between them is cost.
Every circle above the line represents a surgeon whose cost per case is high. Every circle below represents a surgeon whose cost per case is low relative to their colleagues.
Are the doctors low on implant costs consistently low across other elements of care? Not necessarily. “In medicine, variation is the state of nature,” says Robicsek. “There are almost no clinicians who are consistently high cost, or consistently low cost across these elements of care. My takeaway from this is that we all have an opportunity to learn from each other.”
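The kind of check behind that observation can be sketched in a few lines of Python: for each element of care, compare every surgeon's cost per case with the peer median, then ask whether anyone is high (or low) on every element. This is only an illustration under stated assumptions; the surgeon labels, the three cost elements, and the dollar figures are hypothetical and are not Providence data.

```python
from statistics import median

# Hypothetical cost per case (USD) by surgeon and element of care.
costs = {
    "surgeon_1": {"implant": 4200, "operating_room": 3100, "pharmacy": 450},
    "surgeon_2": {"implant": 5600, "operating_room": 2700, "pharmacy": 380},
    "surgeon_3": {"implant": 4900, "operating_room": 3500, "pharmacy": 520},
}


def position_vs_peers(costs):
    """Label each surgeon 'high', 'low', or 'median' on each cost element."""
    elements = sorted({e for per_surgeon in costs.values() for e in per_surgeon})
    peer_median = {
        e: median(per_surgeon[e] for per_surgeon in costs.values())
        for e in elements
    }
    positions = {}
    for surgeon, per_surgeon in costs.items():
        positions[surgeon] = {}
        for e in elements:
            if per_surgeon[e] > peer_median[e]:
                positions[surgeon][e] = "high"
            elif per_surgeon[e] < peer_median[e]:
                positions[surgeon][e] = "low"
            else:
                positions[surgeon][e] = "median"
    return positions


def consistently(label, positions):
    """Surgeons who carry the same label on every element of care."""
    return [
        surgeon
        for surgeon, by_element in positions.items()
        if all(value == label for value in by_element.values())
    ]


positions = position_vs_peers(costs)
print(positions)
print("consistently high:", consistently("high", positions))
print("consistently low:", consistently("low", positions))
```

In this invented example, surgeon_2 is high on implant cost but low on operating room and pharmacy cost, and no surgeon is high or low across the board, so each has something to teach the others.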
“My modest proposal: Most of the health care measurement that we do should not be for reward or punishment. It should be to learn.”
4. Whose Patient Is It?
Providence recently evaluated OPPE (Ongoing Professional Practice Evaluation), which the health system uses, and found that it was assigning 40% of hospital patients to the wrong doctor. “Who can blame them?” Robicsek asks. “It’s easy in a hospitalization for a patient to have three different, or five different, attendings of record. How do you know who to assign that patient to?”
“In a world where we’re measuring for reward and punishment, we feel obligated to assign one outcome or one hospitalization to a single clinician, but imagine if we were able to move away from that and we were measuring to learn,” he says. “Then we would have the ability to do things like ignore who the provider was and ask ourselves what specific elements of care, what specific combinations of behaviors, lead to the best outcomes.”
“Or we could recognize that medicine is a team sport,” he adds. “Let’s ask the question, can we tie outcomes to teams rather than to individuals? My modest proposal here: We practice in teams. Let’s recognize that in the way we measure.”
5. Metrics Aren’t Free
“To anyone who has ever said, ‘Let’s just add one more thing to this dashboard’: Metrics are not free.”
“Every time we build the metric, if it is done correctly, somebody needs to build business specs, technical specs. Someone needs to do data governance, coding. Somebody needs to do validation, automation, documentation, visualization, and then somebody needs to maintain the thing moving forward. Easily that’s a cost of $10,000,” says Robicsek.
6. The “Give a Darn” Test for Health Care Measurement
When measuring what matters, how do we know what that is? Robicsek describes a thought experiment where he sits with a small group of physicians considering a metric. “Imagine I told you that you’re doing better than your colleagues on this measure,” he says to them. “Would you feel good about yourself? Imagine I told you you’re doing worse than your colleagues on this measure. Would you feel motivated to change your practice?”
“If the answer to both of those questions is not yes, let’s not build this measure. It’s not worth our time. We’ll go focus on something else.”
“Sitting at the front of this room is my partner in crime, Dr. Caleb Stowell, looking like the cat who ate the canary. He’s showing [the surgeons] the results of the process that I’ve described. They’re measuring to learn. He’s identified a measure that passes the ‘give a darn’ test for them, and some of those surgeons are literally leaning in. I work for a 51-hospital system, but where this change happens, where you win hearts and minds, is in rooms like this.”
Robicsek’s final proposal: Try the “give a darn” test for health care measurement. And note that in many “give a darn” conversations, one metric that comes up as incredibly important to physicians is patient-reported outcomes.
From the NEJM Catalyst event Provider-Driven Data Analytics to Improve Outcomes, held at Cedars-Sinai Medical Center, January 31, 2019.