After building a number of Longitudinal (or Comprehensive) Health (or Person or Patient or Human) Record systems, collectively "LHRs", I've concluded there are a couple of keys to success:
LHRs answer questions
LHRs help users navigate trust in data
We'll start with the first: what does "answer questions" mean? It means giving an answer that is immediately useful - like when Alexa answers the question on the spot vs. saying "hmmm...here's something I found on the web". It's the difference between a provider opening the PDF equivalent of a box of patient notes and seeing a clear, concise condition summary. Answering the question means that a machine, without any of the nuance or knowledge of a human, can continue its workflow unaided.
I don't mean that an LHR needs a specific kind of interface. Natural language is great, and if you're in a company of any size, somebody somewhere is working on a voice product. It might be a web service interface of some sort that can be called in real time, and you should probably be thinking about this. But it can also be a SQL query interface into a table that holds the answers discretely.
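As a minimal sketch of that last option - assuming a hypothetical answers table whose name, schema, and codes are all illustrative, not a prescribed design:

```python
import sqlite3

# A hypothetical "answers" table: one row per (patient, concept) holding a
# discrete, already-reconciled answer. Names and schema are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE answers (
        patient_id    TEXT NOT NULL,
        concept_code  TEXT NOT NULL,   -- e.g. SNOMED CT 184099003 (date of birth)
        value         TEXT NOT NULL,
        determined_at TEXT NOT NULL,   -- when this answer was produced
        PRIMARY KEY (patient_id, concept_code)
    )
""")
conn.execute(
    "INSERT INTO answers VALUES (?, ?, ?, ?)",
    ("patient-123", "184099003", "1980-04-02", "2024-01-15T10:30:00Z"),
)

# The consumer's entire job: ask the question, get the answer.
row = conn.execute(
    "SELECT value FROM answers WHERE patient_id = ? AND concept_code = ?",
    ("patient-123", "184099003"),
).fetchone()
print(row[0])  # 1980-04-02
```

The point isn't the storage technology; it's that the caller receives an answer it can act on without further interpretation.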
In any case, valuable LHRs don't hand their users pieces of a puzzle, leaving results open-ended in terms of time, trust, or meaning. They answer the question.
Categories of Questions
There are at least these kinds of questions to consider when building an LHR:
Single discrete: one answer. Example: date of birth.
Time variant discrete: one answer at a point in time; equivalent to single discrete at the current moment. Example: the patient's BMI.
Historical boolean: has this ever happened? Example: ever diagnosed with diabetes?
Discrete list: all real-world events ever for this concept. Example: HbA1c history.
Curated list: a reconciled list. Example: medication list, current conditions, allergies, etc.
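For concreteness, here's a minimal sketch of these categories as an enum a question might declare, so downstream machinery knows what shape of answer to expect; the names are assumptions, not a prescribed model:

```python
from enum import Enum, auto

class QuestionCategory(Enum):
    SINGLE_DISCRETE = auto()        # date of birth
    TIME_VARIANT_DISCRETE = auto()  # BMI as of a given moment
    HISTORICAL_BOOLEAN = auto()     # ever diagnosed with diabetes?
    DISCRETE_LIST = auto()          # full HbA1c history
    CURATED_LIST = auto()           # reconciled medication list
```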
Discerning the signal of real-world events from the noise of reported data is the common thread. Each category requires a different kind of "question answering machine", and within each category, different topics will require different approaches. For example, reconciling a list of current medications requires pharmaceutical expertise that the birth date question does not.
This means LHRs have a logic-scaling challenge. You'll need a strategy that allows many teams to contribute to the question answering capabilities of the LHR platform within a common question intake/expression framework, sketched below.
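One way to get that scaling property is a thin common core plus a registry that independent teams plug into. A minimal sketch, where every name (Question, MACHINES, the decorator) is assumed for illustration:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass(frozen=True)
class Question:
    concept_code: str   # e.g. a SNOMED CT code identifying the ask
    patient_id: str

# Registry: each team contributes a machine for the concepts it owns.
MACHINES: Dict[str, Callable[[Question], Any]] = {}

def machine(concept_code: str):
    """Decorator a team uses to register its question answering machine."""
    def register(fn: Callable[[Question], Any]) -> Callable[[Question], Any]:
        MACHINES[concept_code] = fn
        return fn
    return register

@machine("184099003")  # SNOMED CT: date of birth
def answer_birth_date(q: Question) -> str:
    # Stand-in for this team's real logic and data access.
    return "1980-04-02"

def answer(q: Question) -> Any:
    """Common intake: route the question to whichever machine owns it."""
    return MACHINES[q.concept_code](q)

print(answer(Question("184099003", "patient-123")))  # 1980-04-02
```

The common core stays small; the fleet of machines behind it can grow team by team.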
Start with Meaning
The first problem in all these categories is to zero in on the meaning of the question: what exactly are you asking? I like to start with simple questions because they're easiest to define precisely. Date of birth, for instance, is pretty straightforward:
It's SNOMED CT code 184099003: Date of birth (observable entity)
It's LOINC code 21112-8: Birth date
It's PID-7 in HL7 v2
It's Patient.birthDate in FHIR
Etc. This is as easy as it gets, and we didn't even talk about all the tables and databases in your environments. Luckily, each of these is directly equivalent in meaning, and data attached to any of them would be a good candidate to answer this question. Most questions aren't this simple.
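A sketch of what "directly equivalent" can look like in practice: a tiny concept map that lets the answering machinery treat all of these identifiers as the same ask. The codes come from the list above; the structure and names are illustrative.

```python
# All the ways "date of birth" shows up, mapped to one canonical concept.
BIRTH_DATE_EQUIVALENTS = {
    ("SNOMED CT", "184099003"),    # Date of birth (observable entity)
    ("LOINC", "21112-8"),          # Birth date
    ("HL7v2", "PID-7"),            # Date/Time of Birth field
    ("FHIR", "Patient.birthDate"),
}

def is_birth_date(system: str, code: str) -> bool:
    """Would data attached to this identifier answer the birth date question?"""
    return (system, code) in BIRTH_DATE_EQUIVALENTS
```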
Without going down the knowledge engineering rabbit hole (today!), I'll say that when building an LHR with any ambition, you're going to need some ontology chops, both in terms of human expertise and technology. You need a system that can reason over your knowledge. You can't answer "has the patient reported pain?" if you don't know the many, many ways pain can be recorded.
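As a toy illustration of that reasoning, assuming a hypothetical is-a hierarchy (real systems would load this from SNOMED CT; the concept labels here are stand-ins): expand "pain" into all of its descendant concepts so the question matches every way pain might have been recorded.

```python
# Toy is-a hierarchy: parent concept -> child concepts.
IS_A = {
    "pain": ["chest pain", "headache"],
    "chest pain": ["pleuritic chest pain"],
    "headache": ["migraine"],
}

def descendants(concept: str) -> set:
    """All concepts that 'is-a' the given concept, transitively."""
    found = {concept}
    stack = [concept]
    while stack:
        for child in IS_A.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

# "Has the patient reported pain?" must match any of these recorded concepts.
print(descendants("pain"))
```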
SNOMED is becoming the go-to for expressing the concepts about which an LHR might be asked. It won't cover everything, but it's powerful and extensible. Most likely, though, you're going to need a multi-vocabulary approach (probably including LOINC and RxNorm on the question expression side), and you'll need the ability to define new concepts as needs arise.
Add Context
With the base concept in hand, we need to layer on some context. SNOMED defines some default contexts:
Negation: We're talking about things that happened (vs. didn't happen, have been ruled out, etc.)
Subject: It happened/applies to the patient (not a relative or someone else)
Time: It happened now (whenever "now" is)
On the question side, these become part of the ask (birth date for the patient, most recent tetanus vaccination, etc.). Context is often carried outside of the coded meaning. Amazon Comprehend Medical, for example, includes entities/traits for negation and time.
You'll need to decide how to express the full context of the question and how to identify/annotate context in the data. Again, SNOMED is pretty good at the expression part, but you'll need some machinery to take advantage of it.
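Here's a minimal sketch of carrying that context explicitly, with assumed field names: both the question and each annotated data point get the same context structure, so matching is an explicit comparison rather than an accident.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Context:
    """SNOMED's default context axes, made explicit. Field names assumed."""
    negated: bool = False                  # finding present vs. absent/ruled out
    subject: str = "patient"               # patient vs. relative or someone else
    effective_time: Optional[str] = None   # when it happened/applied

# The question carries context...
asked = Context(negated=False, subject="patient")

# ...and so does each annotated data point (e.g. from an NLP pipeline that
# emits negation and time traits, as Amazon Comprehend Medical does).
found = Context(negated=True, subject="patient", effective_time="2019-06-01")

# "Pneumonia, ruled out" must not answer "does the patient have pneumonia?"
print(asked.negated == found.negated)  # False: contexts don't match
```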
Think Big, Start Small
The problem space is clearly pretty big here, connecting the complexity of meaning end-to-end between the question and the pile of data holding the answer. It's essential to start small. Too often I see teams picking the wrong MVP or "thin slice" when confronted with this complexity. Starting small is valuable because it allows us to experiment, testing the biggest unknown. The goal isn't to conceive of a problem we know we can solve. It's to probe the limits.
A good example could be the birth date question. Let's assume we have a few sources of data we want to consider: the official registration system, a couple different EHRs sending clinical notes, and an app the patient uses directly. Some good experiments might be:
Organization: can we scale the needed support (legal, consent, stewardship, etc) to access this data for our purposes?
Meaning: can we identify and annotate where "birth date" appears within the data in a scalable way?
Access: can we query data where it lives at question time or do we need to move it?
Logic: do we have a scalable way of capturing and executing the unique logic needed for this answer (i.e. can we build the first of many question answering machines)?
On the other hand, a less useful start might be: Can we build an ETL job with static rules and static sources to populate a "birth date" table? I'm confident you can do that.
So pick a small part of the big problem and test that.
What's next
To recap: LHRs answer questions, and we need a good framework for precisely capturing the ask so that we can engage a fleet of question answering machines to respond. This fleet will be as diverse as the questions we ask, so we'll need a way to manage them. And getting started on all this depends on us finding the right experiments and learning as we go.
In my next post, I'll talk a little more about those question answering machines and the data that holds the secrets we seek.