In 2019, Epic—the company behind the MyChart health care software used by over 190 million patients—developed an exciting new feature: the Opioid Risk Score. Calculated based on an individual’s health records, the risk score aims to help primary care physicians flag patients at risk of developing opioid use disorder (OUD) or experiencing an overdose.
The score, which uses a machine learning approach called cognitive computing, is one of many innovative artificial intelligence tools that strive to reduce drug-related risks. But the first external evaluation of the model, recently published in the Journal of General Internal Medicine, found discouraging—if not dangerous—results.
Comparing Epic’s risk score to reality for over 700,000 patients, the evaluation reports that the model did accurately rule out low-risk patients. But it caught only around 8 percent of the patients who went on to experience an overdose or receive an OUD diagnosis.
The Opioid Risk Score aims to facilitate overdose prevention and enhance primary care physicians’ efforts to detect and treat OUD as early as possible. Since it uses information already available in a patient’s electronic health record (EHR), the Opioid Risk Score could be a lightweight yet powerful addition to physicians’ toolkits.
“Basically, they’re trying to predict if a patient comes into a primary care appointment, what level of risk they are for developing opioid use disorder or having an overdose in the next year,” study co-author Dr. Stephanie Hooker, of HealthPartners Institute, told Filter. “If you know who might be at higher risk, you can appropriately direct the resources to try to prevent overdoses or to treat people who have opioid use disorder.”
To test the model’s accuracy, the researchers pulled patient data from three health care systems—two rural sites in Pennsylvania and along the Minnesota-North Dakota border, plus an urban hospital system in Minneapolis. They compared Epic’s predictions to actual outcomes for 704,764 patients across 92 primary care clinics.
The model struggled to correctly identify high-risk patients, producing a concerning number of false positives and false negatives.
“Most of those people [flagged as high-risk] didn’t go on to have an ‘event’—a diagnosis of opioid use disorder or overdose. That’s the false positive rate,” Hooker explained. “The false negative rate is that most of the people who do go on to have an event are classified as not being at risk.”
Hooker believes the false negative rate is particularly concerning. Over 2,000 patients experienced an overdose or received an OUD diagnosis, but Epic’s system flagged only 185 of them as high-risk.
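For readers who want to see what those numbers mean in confusion-matrix terms, here is a minimal sketch in Python. It assumes an event count of exactly 2,000, since the study group only says “over 2,000,” so the computed sensitivity is approximate; the evaluation itself reports around 8 percent.

```python
# A back-of-the-envelope check on the figures reported above. The exact
# event count ("over 2,000") is approximated as 2,000, so the percentages
# here are illustrative rather than the study's precise results.
events = 2000          # patients who had an overdose or received an OUD diagnosis
true_positives = 185   # of those, the number the model flagged as high-risk

sensitivity = true_positives / events    # share of real events the model caught
missed = events - true_positives         # real events the model failed to flag

print(f"sensitivity: {sensitivity:.1%}")  # ~9%; the study reports around 8 percent
print(f"missed cases: {missed}")          # 1,815 patients under this assumption
```

However the denominator is counted, roughly nine out of every 10 patients who went on to have an event were never flagged.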
The Achilles heel of Epic’s model may be its reliance on EHR data. Integrating with a patient’s medical records makes the design seamless, but health data give only a partial perspective on a person’s life circumstances.
Filter reached out to Epic, but representatives were not able to provide comment by publication time.
“EHR data is somewhat limited in terms of the things that we collect regularly on people. Sometimes the things that we have easily available may not be the most salient predictors,” Hooker said. “We don’t have a lot of data on social situations or other risk factors that might be more related to actually developing a substance use disorder or having an overdose.”
Design choices made by engineers removed from health care settings may also impair the model. For example, the risk score was designed with an intentionally high tolerance for false positives. Physicians should expect only one out of every 10 people flagged as high-risk to experience an adverse outcome.
“I think in clinical practice, that’s too high,” Hooker said. “A clinician wants to know, ‘If I’m going to use this tool, is it actually going to improve my decision-making to an extent that I’m not wasting my time?’”
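The “one out of every 10” tolerance corresponds to what statisticians call positive predictive value. The sketch below uses a hypothetical pool of 1,000 flagged patients (a number invented for illustration, not taken from the study) to show what that design choice means at the point of care.

```python
# Hypothetical illustration: the pool of 1,000 flagged patients is invented
# for this example; only the ~10% positive predictive value reflects the
# design tolerance described above.
flagged = 1000                  # hypothetical number of patients flagged high-risk
ppv = 0.10                      # roughly 1 in 10 flagged patients has a real event

expected_events = round(flagged * ppv)    # flagged patients likely to have an event
false_alarms = flagged - expected_events  # flagged patients who will not

print(f"{expected_events} of {flagged} flagged patients are expected to have an event")
print(f"{false_alarms} are false alarms that still demand a clinician's attention")
```

In other words, for every patient the alert correctly identifies, a clinician must sort through roughly nine who were flagged unnecessarily.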
Another salient issue is that the risk score treats overdose and OUD diagnosis as a combined outcome.
“What that does is kick everybody with existing opioid use disorder out of the model—you can’t predict overdoses in people who have OUD, because [OUD itself is] an outcome,” Hooker said. “So that’s the main issue, I think—that those people are not included in this risk model.”
To improve future performance, Epic could consider developing separate models for OUD and overdose, or reducing the system’s tolerance for false positives. But on a more fundamental level, Hooker believes that health care companies need to think critically about developing AI tools, and test them thoroughly.
“What are the reasons we would actually want to implement something like this, before we do it? And is it going to improve patient care?” she asked. “In this case, I think that the model was just put out there.”
Dr. Nicole Fitzgerald, a postdoctoral research fellow at Columbia University’s Mailman School of Public Health who specializes in the early detection of emerging drug trends and risks, was also alarmed that Hooker’s study was the first real test of Epic’s model.
“Any tool that could affect a patient’s treatment should face a lot more scrutiny,” she told Filter. “The bar needs to be raised in terms of the requirements for external validation before widespread deployment of these models in commercial health systems.”
Fitzgerald also pointed out weaknesses with the data and math that underlie the Opioid Risk Score. The “training data” used to calibrate the model are from 2015 to 2018—a period largely out of sync with the contemporary drug landscape. Additionally, the mathematical formula that turns these data into a score is relatively rudimentary.
“Other machine learning approaches are more adaptive, multisource and real-time,” she explained. “The way that people are using opioids—and the opioids themselves—have changed dramatically over the last couple years, and the opioid overdose crisis has evolved. That’s not captured here.”
Despite the shortcomings of individualized risk predictions, data-driven approaches to harm reduction—including, but not limited to, machine learning and artificial intelligence—still hold promise.
“Right now, the applications showing the most promise are operating at the population and drug supply level—not necessarily at the level of an individual clinical prediction,” Fitzgerald said.
Wastewater testing is one such intervention. It provides a passive, anonymous indicator of local drug consumption that can inform where and how community organizations provide safer use supplies. Machine learning may also supercharge drug checking by speeding up mass spectrometry.
Most importantly, these population-level innovations capitalize on the enormous predictive power of machine learning without introducing as many risks to the individuals and communities who use them.
“When a wastewater alert algorithm produces a false positive, a public health team might just end up deploying some resources slightly earlier than necessary, or distributing naloxone when they might not have needed to, but that’s a net benefit to the community,” Fitzgerald explained. “But when an EHR-embedded risk score produces a false positive, a patient’s clinical care may be impacted. They could be stigmatized … or denied pain treatment. Those are real clinical harms.”
Ultimately, effective implementation of AI boils down to the same principles that already guide harm reduction efforts: weighing the potential benefits and harms, making decisions in consultation with the community, and designing solutions based on evidence rather than stigma.
“There is value in this technology, so I’m hoping we can deploy these tools where they’re most valuable and least harmful,” Fitzgerald concluded.
Photograph (cropped) by Elen Sher on Unsplash