By Benson Hsu, VP, Population Health, Sanford Health
Population health is one of the last frontiers in medicine. One can argue that genomics and other therapeutic advances are still ahead of us, but the reality is that all medical advances ultimately serve the goal of population health: by definition, the overall health of a population.
Over the past several years, there has been an almost incomprehensible burst of scientific literature, lay press, and blog posts on the wonders of artificial intelligence. I would argue that the promise remains unmet. Artificial intelligence has not changed the way clinicians practice day-to-day medicine. It has not revolutionized diagnostics. Most importantly, it has not dramatically improved our health.
To me, the primary reason for this unmet promise is a lack of attention to the fundamentals. At its core, artificial intelligence requires a data set to learn from – a data set that must be (mostly) reliable, consistent, and reproducible.
Without such a data set, we are back to the old adage of “garbage in, garbage out.”
But what about all the terabytes of data being generated by the electronic health records (EHRs) of healthcare systems, hospitals, and clinics – not to mention all the data within standardized claims or from health monitoring devices, from iPhones to Fitbits?
The issue is that this information, at multiple layers, is not yet standardized.
To start, there is no standardization of data collection. There are two layers to this problem. First, we do not consistently collect the same information. One hospital may collect procedural information (time-outs, sterile processes, checklists) in a free-text format, making data extraction almost impossible without natural language processing. Another hospital may collect this information in a checkbox format, making extraction much more manageable. Each hospital decides if and how to collect this information, thereby creating significant collection bias. Second, the information we collect may look consistent on the surface but, due to operational differences, be vastly divergent. One operating room may record start time as the moment the patient enters the room. Another hospital’s operating room may record start time as the moment anesthesia begins its process. Although the values in the time field look consistent, any comparison of operational metrics between these two hospitals will yield little actionable information.
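A small sketch makes the second problem concrete. The hospital names, timestamps, and field layout below are all hypothetical; the point is only that two "start time" fields with identical names can encode different operational events:

```python
from datetime import datetime

# Hypothetical records: both hospitals store a "start_time" field,
# but Hospital A logs the moment the patient enters the room, while
# Hospital B logs the start of the anesthesia process, which begins earlier.
case_a = {"start_time": datetime(2023, 5, 1, 8, 15),  # patient enters room
          "end_time":   datetime(2023, 5, 1, 9, 45)}
case_b = {"start_time": datetime(2023, 5, 1, 8, 0),   # anesthesia begins
          "end_time":   datetime(2023, 5, 1, 9, 45)}

def case_minutes(case):
    """Apparent case duration computed from fields that look identical."""
    return (case["end_time"] - case["start_time"]).total_seconds() / 60

# The same operation appears 15 minutes "slower" at Hospital B purely
# because of how the field is defined, not because of performance.
print(case_minutes(case_a))  # 90.0
print(case_minutes(case_b))  # 105.0
```

Any cross-hospital benchmark built on this field would penalize Hospital B without anyone having done anything differently in the operating room.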
Next, there are no standards for common terms. One typical example is length of stay. Within a hospital, an administrator may assume that a length of stay of 2.4 days reflects the time between when a patient arrives and when the patient leaves the room. Thus, this administrator may institute projects aimed at earlier discharge. As a clinician, you are then pushed toward morning discharges rather than late-afternoon discharges. After several months, the length-of-stay metric barely shifts. Ultimately, one may discover that length of stay is a function of claims submission: each calendar day is counted as an independent integer, so discharging in the morning or the afternoon of the same day has no impact. Without clarity on standard terms, data again become hard to use.
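The arithmetic behind that surprise can be sketched in a few lines. The dates are invented, and the claims-style definition below (count every calendar day touched) is one plausible reading of the integer-day behavior described above:

```python
from datetime import datetime

def los_claims_days(admit, discharge):
    """Claims-style length of stay: each calendar day touched counts
    as one integer day, so the time of day is irrelevant."""
    return (discharge.date() - admit.date()).days + 1

def los_hours(admit, discharge):
    """Timestamp-based stay length, where time of day matters."""
    return (discharge - admit).total_seconds() / 3600

admit   = datetime(2023, 5, 1, 14, 0)
morning = datetime(2023, 5, 3, 9, 0)    # discharge at 9:00 a.m.
evening = datetime(2023, 5, 3, 17, 30)  # discharge at 5:30 p.m.

# Discharging 8.5 hours earlier changes nothing under the claims definition...
print(los_claims_days(admit, morning))  # 3
print(los_claims_days(admit, evening))  # 3

# ...even though the timestamp-based measure the administrator had in
# mind would show a real improvement.
print(los_hours(admit, morning))  # 43.0
print(los_hours(admit, evening))  # 51.5
```

The clinicians' effort was real; the metric simply could not see it, because "length of stay" meant something different to the people measuring it than to the people improving it.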
Lastly, there are no standards for common metrics. With countless payers across the US, many medical societies, and numerous measurement organizations, it is hard to define what counts as a quality outcome. For instance, good diabetes care can be measured by the number of hospitalizations, the highest blood glucose measurements of the past six months, a patient’s hemoglobin A1C, blood pressure parameters, body mass index, or some combination of the above. Depending on the metric chosen, the same population (without any changes) can be classified as either extremely well controlled or extremely poorly controlled. The clearest example is the divergence a hospital may see in star ratings from different organizations. Within the same year, a hospital can be ranked among the top 100 hospitals by one measurement organization and rated one star by another – how do we know which is the “truth”?
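To see how the same cohort can score at both extremes, consider a toy example. The patient values, the A1C cutoff of 8.0, and the zero-hospitalization criterion are all hypothetical illustrations, not real quality-measure definitions:

```python
# Hypothetical cohort: each patient has a hemoglobin A1C value and a
# count of diabetes-related hospitalizations over the past year.
patients = [
    {"a1c": 6.8, "hospitalizations": 2},
    {"a1c": 6.9, "hospitalizations": 3},
    {"a1c": 7.0, "hospitalizations": 2},
    {"a1c": 6.7, "hospitalizations": 4},
]

# Metric 1: share of patients with A1C below 8.0 ("well controlled").
a1c_controlled = sum(p["a1c"] < 8.0 for p in patients) / len(patients)

# Metric 2: share of patients with zero hospitalizations ("well controlled").
hosp_controlled = sum(p["hospitalizations"] == 0 for p in patients) / len(patients)

# The identical population scores perfectly on one metric and
# fails completely on the other.
print(a1c_controlled)   # 1.0  -> 100% well controlled by A1C
print(hosp_controlled)  # 0.0  -> 0% well controlled by hospitalizations
```

Neither number is wrong; they simply answer different questions, which is exactly why divergent star ratings for the same hospital should not surprise us.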
Without a foundation of consistency in collection, consistency in definition, and consistency in metrics, artificial intelligence within healthcare is left with a free-floating mass of inconsistent teaching material. No amount of computing power or programming skill can overcome bad data. On the road to meeting the promise of AI in healthcare to improve population health, we must collectively work to ensure that our teaching data are clear, well defined, and tied to visible outcome metrics – whether quality, efficiency, or even cost.
Without this foundation, AI will always be an unmet promise.