Implementing Data Classification in FHIR
Securing health data, like other data, is an exercise in classification and locality. In my previous article, Securing FHIR, I explained the meaning and implications of these axes for security. In this post, we'll look at implementing classification at scale.
This is going to be a pretty deep technical dive on the internals of FHIR's conformance system. For the uninitiated, I'd suggest some light reading first, perhaps:
An intro to the FHIR standard
Ideas about data shaping
Goals for Classification
Let's recap briefly: classification, the process of determining "what we have" when looking at a resource (e.g. at a FHIR JSON document), is important in securing data. Classification is what allows us to permit access to "lab results" and "pain medications". Problem is: classification of health data is complex and can be computationally expensive to evaluate.
Classification is a process that all systems apply. The simplest is technical: "data in this folder" or "data at this endpoint". We've all set permissions using this kind of classification. What we're after here is deeper: classification by semantics (meaning). We want to be able to actually look at the data and decide whether it meets our definition of something.
In healthcare, classification is primarily by vocabulary. We determine if an allergy is a "medication allergy" by looking at the code. Often this is enough, but many cases require more complexity. Consider "abnormal blood lab result". This requires looking at both the code of the result to determine if it's a "blood lab" and the interpretation to look for "abnormal". SNOMED provides a compositional grammar for exactly this kind of thing.
FHIR, on the other hand, provides a powerful and increasingly well-adopted mechanism for classifying data called "profiles". Profiles describe the attributes and conditions of a concept (e.g. prescription medications) using explicit rules and references to terminologies such as SNOMED and LOINC.
If we can apply these profiles in a cost-effective, scalable way, we have a powerful tool for both describing our data and securing it (which includes things like respecting consent).
It’s also worthwhile to note that these techniques can be used to classify non-FHIR data that is described using the FHIR conformance system (i.e. profiles).
Thinking about Profiles
Profiles, that is, StructureDefinitions in use, can both add content/elements (derivation: specialization) and constrain or "narrow" the content of a resource. Only HL7 can add to standard FHIR resources; everyone can build profiles that constrain them.
When data first arrives in a system, it's usually validated. Assume the base HL7 profiles are always checked, so everything in our system is at least valid FHIR. This means we can search for an Observation resource and be confident the JSON document has the right elements, the JSON types are correct, and the structural constraints are met. In many cases, this base validation represents the bulk of the checks needed.
On top of the HL7 Observation, we defined some local enterprise standards:
Lab Observations have a LOINC code (it's required)
HbA1c Observations have a code from the HbA1c valueset and must use particular units for results
From a validation/classification standpoint, each of these profiles adds a set of constraints that could be expressed as FHIRPath expressions (or potentially other constraint expression languages), and a validator could be built using only a FHIRPath engine. Think of this form of profiles as a sort of abstract syntax tree (AST), although not technically one:
For Lab Observation:
Observation.code.coding.where(system='http://loinc.org').exists()
For HbA1c Observation:
Observation.code.memberOf('http://.../ValueSet/HbA1c')
(Observation.value as Quantity).memberOf('http://.../ValueSet/HbA1cUnits')
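To make this concrete, here's a minimal sketch of profiles reduced to named constraint predicates, assuming the FHIRPath expressions above have been compiled down to plain Python checks against the resource JSON. The profile names, valueset expansions, and code lists are illustrative stand-ins, not real artifacts.

```python
# Stand-ins for locally expanded valuesets (illustrative codes only).
HBA1C_CODES = {"4548-4", "17856-6"}
HBA1C_UNITS = {"%", "mmol/mol"}

def has_loinc_code(obs: dict) -> bool:
    """Observation.code.coding.where(system='http://loinc.org').exists()"""
    return any(c.get("system") == "http://loinc.org"
               for c in obs.get("code", {}).get("coding", []))

def code_in_hba1c_valueset(obs: dict) -> bool:
    """Observation.code.memberOf(HbA1c valueset), against a local expansion."""
    return any(c.get("code") in HBA1C_CODES
               for c in obs.get("code", {}).get("coding", []))

def value_in_hba1c_units(obs: dict) -> bool:
    """(Observation.value as Quantity).memberOf(HbA1c units valueset)."""
    value = obs.get("valueQuantity")
    return value is not None and value.get("code") in HBA1C_UNITS

# Each profile becomes a list of checks layered on top of base validity.
PROFILES = {
    "LabObservation": [has_loinc_code],
    "HbA1cObservation": [code_in_hba1c_valueset, value_in_hba1c_units],
}

def conforms(obs: dict, profile: str) -> bool:
    # Assumes obs has already been validated against the base HL7 Observation.
    return all(check(obs) for check in PROFILES[profile])
```

A real engine would evaluate the FHIRPath directly and resolve valuesets via a terminology service; the shape of the computation is the point here.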
Using this form, we can do some interesting things:
Incremental validation
If we know we already have a valid Observation, we only have to check for the presence of a LOINC code to have a valid "Lab Observation"
Optimizing with search
Because FHIR search is defined with FHIRPath, we can use it to find resources that meet at least some requirements. In the example,
Observation?code=http://loinc.org|
would find "Lab Observations", and
Observation?code:in=http://.../ValueSet/HbA1c
would find "HbA1c Observation" resources. Remaining constraints can be checked using incremental validation.
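The combination of search pre-filtering and incremental validation can be sketched as follows. The mapping from profile to search query is illustrative (the HbA1c valueset URL is made up), and the residual checks represent whatever constraints the search itself can't express.

```python
# Step 1: narrow candidates with FHIR search (queries are illustrative).
PROFILE_SEARCH = {
    # LOINC-coded Observations: system-only token search.
    "LabObservation": "Observation?code=http://loinc.org|",
    # HbA1c: valueset membership via the :in modifier.
    "HbA1cObservation": "Observation?code:in=http://example.org/ValueSet/HbA1c",
}

# Step 2: constraints the search above does NOT cover, checked per resource.
RESIDUAL_CHECKS = {
    "LabObservation": [],
    "HbA1cObservation": [
        lambda obs: (obs.get("valueQuantity") or {}).get("code")
        in {"%", "mmol/mol"},
    ],
}

def classify(profile: str, search_results: list) -> list:
    """Keep only search results that also pass the residual checks."""
    checks = RESIDUAL_CHECKS[profile]
    return [obs for obs in search_results if all(c(obs) for c in checks)]
```

Note the asymmetry: for "Lab Observation" the search alone is sufficient, while "HbA1c Observation" needs one follow-up check per result.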
Unrelated but comparable profiles
If analysis shows that all “X”s are also “Y”s, a valid “X” no longer needs to be validated as a “Y” and can also be returned in searches.
Semantics, not just syntax
Thinking about classification as primarily a semantic problem rather than literal conformance means working to match requestor intent to data semantics. Let's assume that our FHIR store is filled with billions of Observations. They're all from laboratories, but many of them don't conform to our "Lab Observation" profile because they lack LOINC codes. Our terminology server, however, contains mappings between these legacy and proprietary codes and standard LOINC.
If a user is authorized to see and searches for "Lab Observations", what should we return? A literal search would return only those already translated to LOINC, an incomplete but technically (syntactically) correct response. A more clever system however would, perhaps knowing the distribution of code systems in the store and the capabilities of the terminology service beforehand, begin by asking the terminology server for some help.
It might:
Determine that the profile in the criteria requires LOINC codes.
Ask the terminology server what vocabularies (code systems) it can translate to LOINC.
Use that list of vocabularies to query the data store for potential matches.
Apply the mappings to the return set (page by page for performance) to reshape the result to satisfy the target profile, dropping any that can't be mapped to LOINC and therefore can't meet the search criteria.
The very clever server might keep track of which of these kinds of requests are common and start to cache reshaped resources proactively, maybe on write.
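The four steps above can be sketched with an in-memory stand-in for the terminology server. The ConceptMap contents, code systems, and codes are all hypothetical; a real system would call a terminology service operation such as $translate rather than consult local dictionaries.

```python
# Step 2's answer: code systems the terminology server can map to LOINC.
TRANSLATABLE_TO_LOINC = {"http://example.org/legacy-lab"}

# Stand-in ConceptMap: (system, code) -> LOINC code (illustrative).
CONCEPT_MAP = {
    ("http://example.org/legacy-lab", "GLU-S"): "2345-7",
}

def search_store(systems: set, store: list) -> list:
    """Step 3: pull candidates whose code system might map to LOINC."""
    def system_of(obs):
        codings = obs.get("code", {}).get("coding", [{}])
        return codings[0].get("system")
    return [o for o in store if system_of(o) in systems | {"http://loinc.org"}]

def reshape_to_loinc(results: list) -> list:
    """Step 4: add translated LOINC codings; drop what can't be mapped."""
    shaped = []
    for obs in results:
        codings = obs.get("code", {}).get("coding", [])
        if any(c.get("system") == "http://loinc.org" for c in codings):
            shaped.append(obs)          # already LOINC-coded; nothing to do
            continue
        for c in list(codings):
            loinc = CONCEPT_MAP.get((c.get("system"), c.get("code")))
            if loinc:
                # Keep the original coding and add the translation alongside it.
                codings.append({"system": "http://loinc.org", "code": loinc})
                shaped.append(obs)
                break
    return shaped
```

In production the reshape step would run page by page over the result set, as described above, rather than over a full list in memory.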
A more complex (yet common) example
A FHIR Condition records information about a problem, a clinical concern being monitored or managed, among other uses. It includes:
a code - the primary identification of the problem
a body site - the anatomical location of the problem
severity (of the problem)
stage - for formal grading of the problem
Each of these can be expressed independently using medical terminologies, but very commonly the information overlaps. For example, a "headache" is pain (problem) in the head (body site), but might just be coded directly as "headache" (problem) leaving the body site implicitly defined within the code.
SNOMED's guidance on this topic generally trends toward conciseness, omitting redundancy where possible (indicating "headache" in the code and leaving body site blank). That's good for interoperability, since it reduces opportunities for internally conflicting information, but it's rough on users.
What we want is both:
Expressiveness in recording data so professionals can use the most precise and/or most appropriate codes for their needs
Flexibility to express read intent in terms appropriate to the use case (including security/privacy)
To meet both of these, we need terminology services much closer to the data than most systems in the wild today, and we need to be able to performantly "reshape" data from the expressive form of the writer to the direct intent of the reader, probably through some aggressive predictive caching.
In practice:
Store data in the most concise, canonical form as an aid to resolving ambiguity
Cache resources with all available elements populated, extracting body site, severity, stage, etc. from the code
Integrate the terminology server more closely with the data server to achieve the performance described above in the lab example
This means a client can search for all conditions of the "head structure" and receive in response a problem coded simply as "headache" without the client having to explicitly craft a query.
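A minimal sketch of this write-time enrichment: populate Condition.bodySite from the problem code using a precomputed code-to-body-site map. The SNOMED codes shown are illustrative, and a real system would derive the map from SNOMED's relationship data rather than hard-code it.

```python
# Hypothetical mapping derived from SNOMED: problem code -> body site coding.
CODE_TO_BODY_SITE = {
    "25064002": {"system": "http://snomed.info/sct",      # "headache" ->
                 "code": "69536005", "display": "Head structure"},
}

def enrich(condition: dict) -> dict:
    """Return a copy with bodySite filled in from the code, if known."""
    enriched = dict(condition)
    if "bodySite" in enriched:
        return enriched                  # recorded explicitly; leave it alone
    for coding in enriched.get("code", {}).get("coding", []):
        site = CODE_TO_BODY_SITE.get(coding.get("code"))
        if site:
            enriched["bodySite"] = [{"coding": [site]}]
            break
    return enriched

def matches_body_site(condition: dict, site_code: str) -> bool:
    """Would a body-site search find this (enriched) condition?"""
    return any(c.get("code") == site_code
               for bs in condition.get("bodySite", [])
               for c in bs.get("coding", []))
```

With the enriched copy cached, a search on body site "head structure" matches a condition that was recorded only as "headache".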
But...
I'm sure I have many colleagues out there who are super nervous about the kinds of things I'm suggesting. Applying terminology mappings implicitly can get you into trouble; if you've worked in this space long, you've seen it. But not returning data because it's teeeechnically not a match is also problematic. Omissions can harm too.
More importantly, missed opportunities for innovation can harm. If an app or algorithm never comes to market because the technical burden is too high, or if that app isn't available on your EHR platform because its APIs are too narrow, too literal to be reasonably used, there's opportunity cost.
For this reason, I advocate for both high transparency and high flexibility. APIs should be explicit about what they've done, clearly marking original (userSelected) and translated codes, perhaps even carrying the mapping identifier inline as an extension. Conformance should remain clear and binary - the result of "shaping" a resource should be valid by the standard conformance rules. And servers should be clear about what shaping they will or will not apply implicitly.
Implementing classification at scale and making APIs that respect intent means patients can be more confident their wishes are followed. Innovators can focus on their ideas rather than dealing with esoteric EHR formats. And people delivering care today can get a more complete, more trustworthy picture of the patients in their care.
If you enjoyed this, make sure to follow me on Twitter, or over on LinkedIn.