Securing FHIR

Classification and Locality

Jul 31, 2020

At first, securing health data looks a lot like securing any kind of data: role-based or attribute-based access control, probably some Access Control Lists (ACLs), encryption. Normal stuff. Until you realize that the depth of semantics, the meaning, of the data makes this particular domain a whole lot harder.

Early FHIR servers and APIs have enjoyed the simplicity of being early. But as adoption explodes and FHIR becomes the backbone of a new era in health data liquidity, more complete capabilities are becoming urgent.

Classification and Locality

Most simply, protecting data requires one to understand two primary characteristics of the data: its classification and its locality.

Classification, knowing what it is that you have, is the most direct part of securing data. Security rules nearly always start here:

GRANT READ access for LAB RESULTS to MANISH
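As a rough sketch, a rule like this can be held as plain data and checked against whatever class the server assigns to a resource. The `Grant` type and the `classify` and `is_allowed` functions are hypothetical names for illustration, and the classifier is deliberately naive: it assumes a lab result is any Observation whose `category` carries the code `laboratory`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    action: str      # e.g. "READ"
    data_class: str  # e.g. "LAB_RESULTS"
    grantee: str     # e.g. "MANISH"


def classify(resource: dict) -> str:
    """Naive classification: look only at the Observation category code."""
    if resource.get("resourceType") == "Observation":
        for category in resource.get("category", []):
            for coding in category.get("coding", []):
                if coding.get("code") == "laboratory":
                    return "LAB_RESULTS"
    return "UNCLASSIFIED"


def is_allowed(grants, grantee: str, action: str, resource: dict) -> bool:
    """True if some grant covers this grantee, action, and the resource's class."""
    data_class = classify(resource)
    return any(
        g.grantee == grantee and g.action == action and g.data_class == data_class
        for g in grants
    )


# GRANT READ access for LAB RESULTS to MANISH
grants = [Grant("READ", "LAB_RESULTS", "MANISH")]
lab_result = {
    "resourceType": "Observation",
    "category": [{"coding": [{"code": "laboratory"}]}],
}
```

The interesting work hides inside `classify`, which is where the rest of this post spends its time.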

There are dozens, hundreds even, of methods for classifying data via all kinds of schema and terminology systems. You might use a resource or document method like JSON schema or an XSD. Or maybe an expression based approach like XPath. FHIR provides a healthcare specific mechanism via its StructureDefinition and related conformance module.

Locality, the position of the data within the larger graph or ecosystem, can be a little trickier, but thankfully is usually as simple as “who’s this lab result for?” or “who ordered it?” For the graph minded, a SPARQL expression might be useful here. Again, FHIR provides something like this in its compartments (and also in GraphDefinition), so we’ll explore there first.

To limit the scope of the discussion a bit, I’m not going to talk about how we might do LDAP-ish sorts of things like putting users in groups and then groups in groups, etc. I’m going to assume something else gets us to a place where we have a “permission” that we need to evaluate, something like “READ LabResults of CurrentPatients”.

Classification in FHIR

So, this is pretty much my favorite topic in health data. I laid some groundwork earlier talking about the “Shape of Health Data”, alluding to the principled amorphism needed to really use health data. It’s not enough just to look for a particular code or a particular value when deciding “what do I have here”. You have to really understand it.

Profiles (how HL7 refers to StructureDefinitions when you actually use them) provide a nice framework for this. They can do things like describe the attributes of a Patient and, when useful, get really specific about those attributes.

Here, a Patient (which is a “Resource” and also a “DomainResource”) has an identifier of some sort, and a “Clinic Patient” has a very specific “clinicId” and also optionally a “nationalId”. At my clinic, using this profile, a user could expect to see these identifiers on our records.

If you hand me any Patient resource, I could use these characteristics to tell whether I have a “Clinic Patient” or some other kind of Patient.
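A minimal sketch of that check, assuming the profile requires exactly one identifier in a clinic-specific system. The system URI and the function name are made up for illustration; a real profile would be evaluated by a FHIR validator rather than by hand-written code.

```python
# Hypothetical identifier system for the clinic's own patient ids.
CLINIC_ID_SYSTEM = "http://example.org/fhir/clinic-id"


def is_clinic_patient(resource: dict) -> bool:
    """True if the resource is a Patient carrying exactly one clinicId identifier."""
    if resource.get("resourceType") != "Patient":
        return False
    clinic_ids = [
        identifier
        for identifier in resource.get("identifier", [])
        if identifier.get("system") == CLINIC_ID_SYSTEM and identifier.get("value")
    ]
    return len(clinic_ids) == 1


clinic_patient = {
    "resourceType": "Patient",
    "identifier": [{"system": CLINIC_ID_SYSTEM, "value": "C-1001"}],
}
other_patient = {
    "resourceType": "Patient",
    "identifier": [{"system": "http://example.org/fhir/other-id", "value": "X-9"}],
}
```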

Some simple shaping

So, that’s useful and a little interesting. There’s all kinds of folks out there building FHIR profiles (14,000+ of them on Simplifier, and others in the official registry), and the tools and techniques have matured quite a bit. But as-is, profiles are pretty binary: I’m either a “Clinic Patient” or I’m not. Which might be ok in that case. But it’s not very useful in securing data.

Not so long ago, I was told we needed to grant access to our patient data to a group from customer service. Taking privacy seriously, these folks only needed a small part of the patient resource to do their job: just match up an ID coming from an alert system to a phone number they could call to pass along some important info.

While we could’ve just sent the whole resource (with address and email and date of birth, etc.) over to the call center and had the call center app be careful not to leak this other info, our foundational principles told us that we couldn’t delegate that trust to an app. We hold the data, we’re responsible to protect it.

Here’s what we did:

We created a profile for the call center that excluded all the attributes not appropriate for the use case. We used that profile as the contract when the call center app connected to the FHIR data store, shaping a resource to the contract at runtime and creating a security appropriate projection.
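The shaping step might look something like the sketch below. It assumes the call center profile keeps only the identifiers and phone numbers; the allowed-element set and the function name are illustrative and not taken from the real profile.

```python
import copy

# Assumption: the call center contract allows only these top-level elements.
ALLOWED_ELEMENTS = {"resourceType", "id", "identifier", "telecom"}


def shape_for_call_center(patient: dict) -> dict:
    """Project a Patient onto the call center contract, leaving the original untouched."""
    shaped = {
        key: copy.deepcopy(value)
        for key, value in patient.items()
        if key in ALLOWED_ELEMENTS
    }
    # Within telecom, keep phone numbers only (drops email and anything else).
    phones = [t for t in shaped.get("telecom", []) if t.get("system") == "phone"]
    if phones:
        shaped["telecom"] = phones
    else:
        shaped.pop("telecom", None)
    return shaped


full_patient = {
    "resourceType": "Patient",
    "id": "p1",
    "identifier": [{"system": "http://example.org/fhir/clinic-id", "value": "C-1001"}],
    "telecom": [
        {"system": "phone", "value": "555-0100"},
        {"system": "email", "value": "pat@example.org"},
    ],
    "gender": "female",
    "birthDate": "1980-01-01",
    "address": [{"city": "Springfield"}],
}
projection = shape_for_call_center(full_patient)
```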

In classic thumbs up or down fashion, a normal FHIR validator would tell you that the result (on the right in the illustration) is a valid “Patient 4 CallService” resource. Great!

What’s even more fun is that the call center can update the patient’s phone number using this same contract; to the app, this is the complete resource. It simply PUTs the changed resource back and the server, knowing the app’s contract, cleverly PATCHes the changed number into the original resource, leaving email, gender, and date of birth all as they were.
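One way to picture the write path, as a sketch rather than the actual server logic: the server takes from the submitted projection only what the contract lets the app change (here, assumed to be phone numbers) and folds it into the stored resource. The function name is hypothetical.

```python
import copy


def apply_call_center_update(stored: dict, submitted: dict) -> dict:
    """Merge the phone numbers from a submitted projection into the stored Patient."""
    merged = copy.deepcopy(stored)
    # Keep every stored contact point the contract cannot see (email, etc.).
    hidden = [t for t in merged.get("telecom", []) if t.get("system") != "phone"]
    # Take phone entries from the submitted resource; ignore everything else in it.
    phones = [
        copy.deepcopy(t)
        for t in submitted.get("telecom", [])
        if t.get("system") == "phone"
    ]
    merged["telecom"] = hidden + phones
    return merged


stored = {
    "resourceType": "Patient",
    "id": "p1",
    "telecom": [
        {"system": "phone", "value": "555-0100"},
        {"system": "email", "value": "pat@example.org"},
    ],
    "gender": "female",
    "birthDate": "1980-01-01",
}
submitted = {
    "resourceType": "Patient",
    "id": "p1",
    "telecom": [{"system": "phone", "value": "555-0199"}],
    "gender": "unknown",  # outside the contract, so it is ignored
}
updated = apply_call_center_update(stored, submitted)
```

Anything in the submitted body that falls outside the contract is dropped rather than rejected in this sketch; a stricter server might refuse the request instead.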

If this interests you, there’s some older writing about this “Profile Governed API” concept over here.

And a little more complex…

Redacting some fields can be pretty important when securing data, and being able to use the general classification system from FHIR (those profiles) to do it means security is well aligned with the other things profiles do. But what was I saying about meaning being critical to securing health data?

Assume for a moment that you’re working for a diabetes management company that has clients all over the place. Your patients are getting their labs done in clinics and pharmacies and even mailing in samples for testing. You’re responsible for doing some analysis on these tests, specifically on HbA1c levels, so you need to tell the FHIR data hub that holds all this data how to grant access to the Observations in question.

Problem is, not all the labs use the same terminology to code their results. And the units of the results vary. And also, you prefer your results in French even though they came in using a variety of languages.

So some of these are post processing kinds of problems: go find your German to French dictionary and write some transform rules! But at the core is a security problem. Our FHIR hub contains all kinds of lab information, much of it quite sensitive. We care about privacy and the trust our patients have given us, so we want to make sure our researchers get all the HbA1c results and nothing else.

Luckily, we’ve taken the above approach and made it capable of handling this scenario. As before, you define a profile for the HbA1c results you need. Being an interoperability minded fellow, you choose the LOINC standard code (41995-2) to specify your target test. You also specify units of mg/dL and that you prefer French where possible.

Having granted access to you (GRANT READ Obs4Research), the FHIR server now needs to classify what it has to determine if it can show you any particular result. For Observation 75, the initial answer is “no, not an Obs4Research” (that’s not LOINC 41995-2 in mg/dL). But the more interesting question is not “is it an Obs4Research?” but rather, “can I safely make it an Obs4Research?”

You see, as a researcher, you don’t know the ins and outs of all the different labs. You have no idea what a mess lies under that hood. You were declaring an intention: “show me all HbA1c results in mg/dL, in French.” The server has the responsibility of understanding your intent, understanding the data it holds, and evaluating how they match up.

Assuming you can map that local code to the LOINC code (hopefully!), and assuming you know how to convert those units (probably!), and assuming your terminology server loaded up the French name for HbA1c (surely!), you can accurately classify this record as aligned with the intent of the research and include it in the results.
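To make the three assumptions concrete, here is a small sketch of the “can I safely make it an Obs4Research?” question. Everything in the lookup tables is a stand-in: the local code system, the local code, and the French display string are invented, and a real server would ask a terminology service and a units library instead of consulting dictionaries. The content given to Observation 75 is also made up.

```python
from typing import Optional

LOINC = "http://loinc.org"
TARGET_LOINC = "41995-2"

# Hypothetical concept map from one lab's local code to LOINC.
LOCAL_TO_LOINC = {("http://example.org/lab-a/codes", "HBA1C-MASS"): TARGET_LOINC}
# Multipliers that convert a mass concentration to mg/dL.
TO_MG_PER_DL = {"mg/dL": 1.0, "g/L": 100.0, "g/dL": 1000.0}
# Placeholder French display name; a terminology server would supply the real one.
FRENCH_DISPLAY = {TARGET_LOINC: "Hémoglobine A1c"}


def to_loinc(obs: dict) -> Optional[str]:
    """Return the LOINC code for the observation if it is coded or mappable."""
    for coding in obs.get("code", {}).get("coding", []):
        key = (coding.get("system"), coding.get("code"))
        if key == (LOINC, TARGET_LOINC):
            return TARGET_LOINC
        if key in LOCAL_TO_LOINC:
            return LOCAL_TO_LOINC[key]
    return None


def as_obs4research(obs: dict) -> Optional[dict]:
    """Return a normalized copy if the observation can safely be classified, else None."""
    if to_loinc(obs) != TARGET_LOINC:
        return None
    quantity = obs.get("valueQuantity", {})
    factor = TO_MG_PER_DL.get(quantity.get("unit"))
    if factor is None or "value" not in quantity:
        return None
    return {
        "resourceType": "Observation",
        "id": obs.get("id"),
        "code": {"coding": [{"system": LOINC, "code": TARGET_LOINC,
                             "display": FRENCH_DISPLAY[TARGET_LOINC]}]},
        "valueQuantity": {"value": quantity["value"] * factor, "unit": "mg/dL"},
    }


observation_75 = {
    "resourceType": "Observation",
    "id": "75",
    "code": {"coding": [{"system": "http://example.org/lab-a/codes", "code": "HBA1C-MASS"}]},
    "valueQuantity": {"value": 0.5, "unit": "g/L"},
}
normalized = as_obs4research(observation_75)
```

If any one of the three lookups fails, the function returns `None` and the record stays hidden, which is the safe default for a security decision.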

I know there’s a whole lot of hand waviness to that. I’ve written more on how exactly this could work here.

Locality

Now that we can evaluate what we have (including asking if we’re clever enough to understand intent), we still need to make sure that what we have belongs to, was authored by, is about, or otherwise is related to the requestor. In other words, the locality of the data in relation to the big picture.

This is basically a graph traversal problem: does the data in question have a path (the right kind of path) to the requestor? For example:

As a patient, I want to read my records.
As a provider, I want access to patients I have or soon will be seeing.
As a call center agent, I need access to patients covered by the organization I represent.
As an app on the Play Store, I need access to data for users who have signed up for my service.

For the patient, this is pretty straightforward in FHIR: all FHIR resources related to a patient have a direct reference to the patient.

The only trickiness is that I, a person, might be multiple patients from a recordkeeping standpoint. But still, not too bad.
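For the patient case, a sketch of the one-hop check could look like this. It assumes the link lives in a `subject` or `patient` element, which covers many but not all FHIR resource types, and it takes the set of patient ids that resolve to the same person as an input; the function name is made up.

```python
def belongs_to_person(resource: dict, patient_ids: set) -> bool:
    """True if the resource is, or directly references, one of the person's Patient records."""
    if resource.get("resourceType") == "Patient":
        return resource.get("id") in patient_ids
    for element in ("subject", "patient"):
        reference = resource.get(element, {}).get("reference", "")
        if reference.startswith("Patient/"):
            if reference.split("/", 1)[1] in patient_ids:
                return True
    return False


# One person known under two Patient records.
my_patient_ids = {"p1", "p7"}
my_observation = {"resourceType": "Observation", "subject": {"reference": "Patient/p7"}}
someone_elses = {"resourceType": "Observation", "subject": {"reference": "Patient/p2"}}
```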

The provider case is harder: “have seen” or “will see” can take a few paths. “Have seen” is hopefully logged via an encounter. Maybe an appointment. “Will see” should be an appointment.

You’ll have to decide if any kind of participation in the encounter counts. Also, if you’re covering for a colleague on vacation…well, that’s a whole other thing. But still, a pretty finite path.

Our call center colleague is starting to get interesting. Now we’re traversing the agent relationship with the organization, the path from the organization to the patient (via a coverage or contract probably), and then thru the patient to the data. Whew.

The last one’s the best. As an app, I don’t really have any inherent relationship with the patient. I have rights only because the patient (or a delegate) has consented to my access. So app via a consent record to the patient and on to the data.

So, from direct, one-hop connections to rather complex paths. But probably not too many kinds of paths. While the classes of data that actors might access may vary widely, the fundamental types of actors are relatively few: patients, consented delegates like family and apps, medical professionals, service reps. They’re generally “in” either because there’s a contract/policy/law that gives them access, because they’re delivering care, or because the patient has explicitly given them access.
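Since the kinds of paths are few, they can be written down as short lists of relationships and walked one hop at a time. The sketch below is illustrative only: the edge labels, the actor types, and the ids (including `App/a1`, which is not a FHIR resource type) are all invented.

```python
# Edges keyed by (node, relation) -> list of target nodes.
EDGES = {
    ("Practitioner/dr-a", "participant-in"): ["Encounter/e1"],
    ("Encounter/e1", "subject"): ["Patient/p1"],
    ("App/a1", "grantee-of"): ["Consent/c1"],
    ("Consent/c1", "patient"): ["Patient/p1"],
}

# The "right kind of path" for each actor type, as sequences of relations.
ALLOWED_PATHS = {
    "provider": [["participant-in", "subject"]],
    "app": [["grantee-of", "patient"]],
}


def reachable(edges: dict, start: str, relations: list) -> set:
    """Follow the given relations in order and return the nodes reached at the end."""
    frontier = {start}
    for relation in relations:
        frontier = {
            target
            for node in frontier
            for target in edges.get((node, relation), [])
        }
        if not frontier:
            break
    return frontier


def can_reach_patient(edges: dict, actor_type: str, actor: str, patient: str) -> bool:
    """True if any allowed path for this actor type leads from the actor to the patient."""
    return any(
        patient in reachable(edges, actor, path)
        for path in ALLOWED_PATHS.get(actor_type, [])
    )
```

Note that an app cannot borrow the provider path: `can_reach_patient(EDGES, "provider", "App/a1", "Patient/p1")` is false because the app has no `participant-in` edge.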

Evaluating Locality

Because the kinds of paths are relatively few, and many of the paths overlap (all the above examples connect to the patient and then on to the rest of the data), evaluation of the locality of data as it’s written may be feasible. In FHIR, some servers take this tactic when evaluating compartments, and what I’ve described above could, if you squint, look a whole lot like compartments.
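A toy version of write-time evaluation, under the same simplifying assumption that the patient link sits in a `subject` or `patient` element. The `CompartmentIndex` class is a hypothetical illustration, not how any particular FHIR server implements compartments.

```python
from collections import defaultdict


class CompartmentIndex:
    """Records, at write time, which resources fall in which patient's compartment."""

    def __init__(self):
        self._members = defaultdict(set)

    def on_write(self, resource: dict) -> None:
        """Index the resource under every patient it directly references."""
        key = f"{resource['resourceType']}/{resource['id']}"
        for element in ("subject", "patient"):
            reference = resource.get(element, {}).get("reference", "")
            if reference.startswith("Patient/"):
                self._members[reference].add(key)

    def in_compartment(self, patient_ref: str, resource_key: str) -> bool:
        """Cheap read-time check against the precomputed membership."""
        return resource_key in self._members.get(patient_ref, set())


index = CompartmentIndex()
index.on_write({"resourceType": "Observation", "id": "75",
                "subject": {"reference": "Patient/p1"}})
```

The trade-off is the usual one for precomputation: reads become a set lookup, but the index has to be kept in step with every update and delete.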

Like classification, there’s more to say on how to actually make this work. So…coming soon!

All Together Now

Holding health data is a big responsibility. Your customers are trusting you to keep it private and secure. Normal security policies are table stakes but can’t really link the data you have to the intent of the policy or consent that allows access.

Using the model and structure of FHIR along with the power of profiles and standard terminologies allows systems to move beyond the literal towards a working respect for patients’ wishes.

Much of this content was part of a presentation I made to the FHIR Meetup in July (thanks Pavel and Co!). If this stuff interests you, subscribe here, follow me on Twitter, or over on LinkedIn.

Grenz on Health
