The Shape of Health Data

Jul 13, 2020

While on vacation recently, a fellow health tech aficionado asked me whether I prefer a more abstract or more concrete representation of health data. That's right, action-packed holidays. We talked it over a bit, and after a week of reflection, I'm convinced that the problem is with the question itself. Because the answer is "yes".

Health data, like all data, samples reality. Every data point is someone's or something's observation of reality, with the context of observer, subject, time (point or span), location (both geographic and anatomical site), severity/magnitude/dosage, etc. Usually, a health record includes some of this context. Some is assumed. And some is just missing.

Abstract and Concrete

Most abstractly, this context can be “pushed” into the terminology used to record the observation. SNOMED assumes a default context and provides a compositional grammar to get explicit when needed. Nearly all health observations could be codified using this technique.

At the other end of the spectrum, a model might include discrete attributes for each contextual facet. Or for each level of the facets' hierarchies. Or one might create a separate model for each category of data with attributes unique to that category.

In the middle, leaning towards "concrete", we find things like FHIR where observation has attributes for observer, subject, time, body site, etc. but leaves the laterality of the body site to the terminology.
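To make that middle ground a little more tangible, here is a rough sketch of a FHIR-style observation as a plain Python dict, borrowing the fracture example used later in this post. The element names follow the FHIR Observation resource as I understand it; the references, the date, and the free-text values are made up for illustration, and a real record would carry coded concepts rather than bare text.

```python
import json


def fracture_observation() -> dict:
    """Return a FHIR-style Observation as a plain dict (illustrative values only)."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": "Imaging finding"},               # placeholder, not a real code
        "subject": {"reference": "Patient/example"},        # hypothetical patient
        "performer": [{"reference": "Practitioner/example"}],  # hypothetical radiologist
        "effectiveDateTime": "2020-07-05",                  # assumed date
        "valueCodeableConcept": {"text": "Closed fracture of femoral condyle of femur"},
        # Laterality has no attribute of its own; it rides along inside the body site concept.
        "bodySite": {"text": "Left femur"},
    }


observation = fracture_observation()

# Context facets that get their own discrete attribute in this shape.
DISCRETE_CONTEXT = ["subject", "performer", "effectiveDateTime", "bodySite"]
missing = [name for name in DISCRETE_CONTEXT if name not in observation]

print(json.dumps(observation, indent=2))
```

Observer, subject, time, and body site each get a slot, while "left" only exists as part of what the body site concept means.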

So what's wrong with the question?

In classic consulting form, the initial answer is "it depends": it depends on how the data was originally captured. How it needs to be accessed. Who will access it and why. And on and on... The problem then is that health data must be used aggressively if we're going to use it to really change healthcare. Today's use won't be tomorrow's. Yesterday's collection must be made sense of today. What we need is a strategy to gather what we know and shape and project it as we use it.

This is where FHIR really shines: its model is built by people from across the industry and across the world. A very wide range of use cases has been considered, many have been included, and the result is a model that can, paired with capable terminologies, capture what was observed. Including the alternate identifiers. And the original proprietary codes. And your maiden name. It all fits in an interpretable way.

So what about the "shape and project" part? That's where it gets fun enough to think about on vacation.

Shape and Project

If I hand you some health data, assuming you're somewhat familiar with health data, you can probably make some pretty solid guesses about what you have in hand. "Looks like blood pressure readings." Or "A reading of someone's X-ray. Broken leg. Bummer." As humans we apply what we know to make assumptions, fill in the gaps, and make sense of what we're reading. That broken leg was probably something like:

"closed fracture of the femoral condyle of the left femur"

We laypersons saw "fracture" and "femur" and empathized.

Most abstractly, that fracture is the SNOMED expression:

66926007 |Closed fracture of femoral condyle of femur|:
  272741003 |Laterality| = 7771000 |Left|

with a radiologist, patient, date, etc. It could be codified with a single expression, one string field:

ObservedExpression

Useful perhaps as the canonical form of the observation, but probably not for much else. For an app, an algorithm, or a human to use it, we need something more concrete, like:

Date
Finding
Body Site

Date's self-explanatory. Finding though is a little trickier. Is this a finding of "closed fracture of femoral condyle of femur"? Or just a "closed fracture" leaving the "femoral condyle of femur" to the body site? Maybe "closed fracture" is too detailed for our app - we only understand "fracture". Body site has all the same kinds of issues - what if we only care about injuries to "legs" and "arms"?
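As a rough illustration of moving from the one-string form towards the Date / Finding / Body Site form, here is a minimal sketch that pulls the expression above apart. It is not a real parser for the SNOMED compositional grammar: it assumes a single focus concept, flat "attribute = value" refinements, and terms that contain no colons, commas, or equals signs. The function and field names are mine, and the date is assumed.

```python
import re
from dataclasses import dataclass

# Matches one "id |term|" concept reference.
CONCEPT = re.compile(r"(\d+)\s*\|([^|]*)\|")


@dataclass(frozen=True)
class Concept:
    code: str
    term: str


def _concept(text: str) -> Concept:
    match = CONCEPT.search(text)
    if match is None:
        raise ValueError(f"no concept found in {text!r}")
    return Concept(match.group(1), match.group(2).strip())


def parse_simple_expression(expr: str):
    """Split 'focus : attr = value, ...' into a focus concept and its refinements."""
    focus_part, _, refinement_part = expr.partition(":")
    refinements = {}
    for pair in refinement_part.split(","):
        if not pair.strip():
            continue  # no refinements, or a trailing comma
        attr_text, _, value_text = pair.partition("=")
        refinements[_concept(attr_text)] = _concept(value_text)
    return _concept(focus_part), refinements


EXPRESSION = """66926007 |Closed fracture of femoral condyle of femur|:
  272741003 |Laterality| = 7771000 |Left|"""

focus, refinements = parse_simple_expression(EXPRESSION)
laterality = next(v.term for k, v in refinements.items() if k.term == "Laterality")

record = {
    "date": "2020-07-05",   # assumed; the expression itself carries no date
    "finding": focus.term,  # still includes the body site wording
    "laterality": laterality,
}
print(record)
```

Notice that string handling only gets us so far: "femoral condyle of femur" is still buried in the finding. Splitting it out into a body site takes knowledge from the terminology, not just parsing.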

Assuming we can explain what our app understands, a clever system can apply the knowledge in SNOMED to shape this observation into a "July 5th fracture of the left leg" safely and efficiently.
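Here is a small sketch of what that shaping could look like. The hierarchy below is a hand-written toy, not SNOMED content: real SNOMED concepts can have several parents and body structures are modelled more carefully than "femur is a leg", so treat the single-parent chain and every name in it as an assumption made for the sake of the example.

```python
from typing import Optional

# Toy single-parent hierarchy: child term -> more general term.
TOY_PARENT = {
    "closed fracture": "fracture",
    "fracture": "injury",
    "femoral condyle of femur": "femur",
    "femur": "leg",
    "humerus": "arm",
}


def generalize(term: str, understood: set) -> Optional[str]:
    """Walk up the toy hierarchy until we reach a term the app understands."""
    current = term
    seen = set()
    while current is not None and current not in seen:
        if current in understood:
            return current
        seen.add(current)  # guard against accidental cycles in the toy data
        current = TOY_PARENT.get(current)
    return None


def shape(observation: dict, findings: set, sites: set) -> Optional[str]:
    """Project a detailed observation onto the vocabulary an app declares."""
    finding = generalize(observation["finding"], findings)
    site = generalize(observation["body_site"], sites)
    if finding is None or site is None:
        return None  # the app cannot make sense of this observation
    return f'{observation["date"]} {finding} of the {observation["laterality"]} {site}'


detailed = {
    "date": "July 5th",
    "finding": "closed fracture",
    "body_site": "femoral condyle of femur",
    "laterality": "left",
}

summary = shape(detailed, findings={"fracture"}, sites={"leg", "arm"})
print(summary)  # July 5th fracture of the left leg
```

The app only has to say which findings and body sites it understands; the walk up the hierarchy does the rest, and returns nothing at all rather than guessing when no understood ancestor exists.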

Both And...

So to the original question, should we store abstract and project to concrete? Store concrete and use abstract as a translation midpoint? It depends! Assuming your abstract form is rich enough, it's likely to capture all the information originally collected, allowing "downcoding" to less detailed or less complete shapes. For my money, FHIR's probably a good starting point if you want something that might be more directly usable. But no matter what you choose, to really use health data you must be able to apply knowledge and move between these shapes. And to secure it (your security rules don't mention "femoral condyles"). And to respect the consent given by the patient. And so on.

So "yes", I prefer abstract or concrete representations of health data as long as we're clever enough to use what we know to make sense of them both.

Grenz on Health
