An Open Clinical Terminology?

This is fantastic stuff and is exactly what I had hoped SNOMED would be doing - putting out good and usable, permissively-licensed open source tooling to make it easier to work with. The Dockerization of the stack also really helps for those who don’t want to sully their development machine with Java badness :wink: Top marks for the docker-compose.yml which makes getting the whole stack up and running super easy.

I’m still surprised that no better way has been developed for uploading the SNOMED CT files - i.e. could this step not be managed with some kind of SNOMED ‘package manager’, or even using a Git server and URLs? I appreciate it’s only a single manual step, but manual steps are the enemy of continuous integration and other forms of automation.


Yes, absolutely. For now, we’re trying to walk before we can run and have been focussed on making sure the main functionality is good to go. This terminology server is also the one that we will use for authoring/managing the terminology, so it has a rich feature set, most of which is not needed by most users.

However, we’ve had requests for this from developers in other countries, so our plan is to develop the functionality to allow devs/users to ‘request’ an edition/extension which will then be retrieved and imported without any other manual intervention, with the correct version being imported (a snapshot version or a delta if there is an existing version in the terminology server).
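As a rough sketch of the 'snapshot or delta' decision described above (this is not how Snowstorm implements it; `plan_import`, `ImportPlan` and the edition label are made-up names for illustration, and versions are assumed to be `YYYYMMDD` strings so that plain string comparison orders them):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImportPlan:
    """Hypothetical description of what a 'request an edition' call would import."""
    edition: str
    release_type: str            # "SNAPSHOT" or "DELTA"
    from_version: Optional[str]  # None when nothing is installed yet
    to_version: str


def plan_import(edition: str,
                requested_version: str,
                installed_version: Optional[str] = None) -> Optional[ImportPlan]:
    """Decide between a snapshot and a delta import.

    Assumption: versions are fixed-length YYYYMMDD strings, so comparing
    them as strings is the same as comparing them as dates.
    """
    if installed_version is None:
        # Nothing in the server yet: import a full snapshot.
        return ImportPlan(edition, "SNAPSHOT", None, requested_version)
    if installed_version == requested_version:
        # Already up to date: nothing to do.
        return None
    if requested_version < installed_version:
        raise ValueError("requested version is older than the installed one")
    # An older version exists: only the changes since then are needed.
    return ImportPlan(edition, "DELTA", installed_version, requested_version)
```

Calling `plan_import("UK", "20191001", installed_version="20190401")` would give a delta plan, while leaving out `installed_version` gives a snapshot plan.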

Hopefully later this year, after we’ve completed full FHIR compliance and other things on our shopping list.


We got the ICD-10 toolchain going on Git (and the foundations of the UK ICD-10 tools are as a result arguably better than the WHO ones), but doing this for SNOMED CT is a harder problem. That is not because of the design of Git (the object model of Git is a great design for lots of collaborative works), but because of the underlying limitations of the “file system” as a database. We got away with it on ICD-10 because it involved on the order of 10^4 nodes. SNOMED CT is an order of magnitude bigger at 400k nodes for just the core graph, and while *nix file systems are probably OK with that, NTFS and Windows really start to choke on that many files (not to mention the multiplied overhead of all the virus checking and scanning tools most IT departments stick on Windows).

It’s a great ambition though. Part of the problem with the (hooray!) defunct IHTSDO Workbench was that it tried to solve the version control problem with a 30-year-old version control design (internally it was designed like RCS - only without the convenience of abstracting the version control layer away). Those models were originally designed for version control of single objects, and things like CVS are just hacks on top to orchestrate the companion versioning of multiple objects, unlike models such as Git, which treat a revision as a single composite object.

One of the things I seriously looked at when working on those tools was backend libraries for Git that used something other than a raw file system for storage, and the state of the art in that space may be a lot more mature now.
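To make the idea concrete, here is a toy sketch of a Git-like content-addressed store that keeps its objects in a database table instead of one file per object. It is only an illustration of the model: it does not use Git's real object format or any real Git backend library, and `ObjectStore`, `commit` and the concept IDs are hypothetical names.

```python
import hashlib
import json
import sqlite3
from typing import Dict, Optional


class ObjectStore:
    """Content-addressed objects held in SQLite rather than as loose files."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE objects (id TEXT PRIMARY KEY, body BLOB NOT NULL)"
        )

    def put(self, kind: str, payload) -> str:
        # The object's ID is the hash of its content, so identical
        # content is only ever stored once.
        body = json.dumps({"kind": kind, "payload": payload},
                          sort_keys=True).encode("utf-8")
        oid = hashlib.sha256(body).hexdigest()
        self.db.execute("INSERT OR IGNORE INTO objects VALUES (?, ?)", (oid, body))
        return oid

    def get(self, oid: str) -> dict:
        row = self.db.execute(
            "SELECT body FROM objects WHERE id = ?", (oid,)
        ).fetchone()
        if row is None:
            raise KeyError(oid)
        return json.loads(row[0])

    def count(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM objects").fetchone()[0]


def commit(store: ObjectStore,
           concepts: Dict[str, str],
           parent: Optional[str] = None) -> str:
    """Record one revision of the whole terminology as a single object."""
    # One blob per concept, one tree listing every blob, one commit on top.
    tree = {cid: store.put("blob", term) for cid, term in concepts.items()}
    tree_id = store.put("tree", tree)
    return store.put("commit", {"tree": tree_id, "parent": parent})
```

Two successive calls to `commit` that differ in one concept share every unchanged blob, and each revision is identified by a single ID covering the whole set, which is the property RCS-style per-object versioning lacks. With 400k concepts this is 400k rows rather than 400k files.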

I had the notion that the systems for authoring and distribution should be not that different and Git seemed like a good choice to stand that on.

The other core design decision that was a problem in those tools was trying to make a generic terminology model that was itself a generalization that would support SNOMED CT. That led to a great deal of complexity that sat squarely on top of what was already a complex metamodel designed to be very general. I note that Snowstorm is billed as a “SNOMED CT terminology server” and not “a terminology server” and I presume (hopefully) that means it’s not trying to be a general terminology server.


The main problem with open sourcing is its ever-changing nature, which makes it impossible to stay consistent and comparable over time.

SNOMED is consistent and abhors change without reason, and its decisions are accepted and used by all its users.
I can’t see that happening in the open source world…

I don’t think you’ve fully understood open source, judging by this comment. Are you involved directly in any open source projects? There is still centralised control of any open source project. It isn’t a complete free-for-all.

I would disagree with this quite strongly. Ironically, I am this week involved in some discussions at national UK level about serious and potentially breaking changes that SNOMED International are apparently imposing, which will have huge deleterious impact on our install-base of UK GP Systems. (more on this soon)

Yes, the UK has representation at SNOMED International; however, this tends to be an expert terminologist rep, not a clinical rep. The proposed changes, while working towards an ontologically perfect terminology, may significantly undermine the actual primary purpose of the terminology (i.e. recording clinical care).

I’m not sure that open source would be a better solution than changing the governance structure of SNOMED International.

My experience has generally been that “changing the governance structure” of large international organisations is non-trivial.

"Hi SNOMED International, Marcus here. Yes that one. I’m new here and I’m not a career terminologist, but I’m just wondering if you’d mind changing your entire governance structure for me? Aiming to be more community-oriented, consensual, and clinically-relevant. You know, like an open source project?



They hung up - can you believe that?"


I have a long answer and a short one. The short one is that I believe that codifying medicine is killing the essence of medicine. William Osler summed up that essence as: “Medicine is a science of uncertainty and an art of probability.” During the 50 years I have worked in hospitals all over the world, I have always enjoyed a good letter to amice. For the last 20 years I have had to accept, at best, nonsensical telegram-style referral letters; mostly the patient was transferred with a one-word referral, like ‘headache’, ‘gallbladder’ or ‘acute abdomen’. Often the single word had no close relationship to the patient or the complaint, and there was no mention of context. Dr Thornley just wrote a blog about the “Demise of Medicine”. Is it a disaster waiting to happen?

Surprisingly, I do like ontologies, because if designed well they represent a question/decision/answer at each step. The problem is the uncertainty we have to deal with. So the idea of diagnosis-related reimbursement of doctors is absurd, and yet it is entertained by insurance companies and political parties everywhere in the world. ICD-10/11 is very popular for this purpose, even though it is designed for classifying causes of death, not daily practice. No surprise that in over 40% of death certificates, pathologists cannot relate the text in the certificate to reality; diagnostics in real life are even worse. Relating ICPC and ICD-10/11 is impossible, and is an example of the serious gap between the GP bubbles and those of specialists. My long-held conclusion is that we have developed natural language over 1,000,000 years, and 60 years of codifying has been a very nice experiment, which now kills the essence of medicine: dealing with uncertainty.

The right diagnosis can be pinpointed in 80% of cases by good Medical History Taking. We need natural language for that and smart questions. This is how doctors (should) think, based on symptoms and signs the patient can report and show. That is called communication. The gathered data should be collected into a casebook with a pattern every doctor should have learned in medical school. In my own experience nowadays, visiting a doctor means an encounter with a person who is glued to a square screen, does not introduce ‘it’self, and often turns out to be a nurse or aide. NIH informs patients that a large part of medical care is provided by nurses. In the UK this has been ‘codified’.

As an anesthesiologist I deal with easy work, which if it goes wrong has serious consequences. This is the typical business model for insurance revenues, dealing with incidents with high impact. That has triggered me into my quest for intelligent, smart and efficient medicine. IT has a lot to offer, because smart, intelligent and dynamic questionnaires are able to extract very useful data from patients, and those data can be translated into structured and patterned information doctors like. This is the essence of my current work, MediPrepare Open Source Project. Every doctor now could create Expert Medical Systems with our tools by creating questionnaires for all 130+ specialties and incorporate their expert knowledge. The data can be translated into valuable information leading up to a differential diagnostic path which can be started by the patient.

So, I believe that for many years to come the route of natural language in medicine will be superior to communicating in digitized codes. Eventually smart computers will be able to dissect our natural language to the point that we can let them communicate by digits. For now, codifying medicine is killing patients and doctors. In the USA the third leading cause of death is medical mishaps… Maybe we need meaningful IT, created by close cooperation of doctor and programmer, as was used in the Caduceus Project at Pittsburgh University Hospitals around 1980.

My 5 cents, Hans


Hans, thank you for so eloquently stating an opinion I arrived at after close to a decade of maintaining healthcare code system tools, and which I carry forward into my opinions about EHR software - it’s all about communication, and the endpoints that really matter are the humans.

If you’re going to communicate with the limited endpoints of an API to elicit a service, sure, codify your inputs. But to me, the vast bulk of the drive to codify and structure health data is from the (noble, but also potentially profitable) desire to mine it for data, rather than the desire to serve an individual patient better.

As you say, getting the right code from ICD-10, which in the UK is around 14,000 codes, is hard enough. Having your insurance payment depend on having chosen the right code is harsh.

I hear ICD-11 is far more complex. But probably a game of tic-tac-toe next to using SNOMED CT for the same purpose.

Do any secondary care systems code as you type?

So when you type ‘patient has asthma’, it automatically prompts you to select a code for asthma?

I’ve not seen it. I’d always had the impression that doctors would code items because it made it easier for them to drill into the medical record at a later time (in primary care). As a side effect it enabled reporting. [However, in other sectors, coding seems to be done primarily for reporting, not care.]

I’ve got so much to say on this I’ve started a new thread so as not to hijack this one.

I am glad that in Australia SNOMED has much less of a focus and impact on the system.
Instead we have the ICD and DRG (diagnosis-related group) systems, and SNOMED is just an internal thing within hospital systems.

However, the positive side of having better terminology and codification is that it incentivises more specific and complex diagnoses and gives less reward to a non-specific, minimal-documentation approach to diagnosis.

In my opinion, mining for data can better be done by natural language clues. That way one could easily incorporate adjectives such as severe, thunderclap-like etc. That is the beauty of modern IT: one can let it learn. My wife makes use of that in the analysis of millions of emails about fraud at the Fraud Help Desk.

Yes, I believe we need a worldwide standard medical vocabulary. However, it is dangerous to use ICPC with 150 codes, ICD-11 with 55,000 codes, or SNOMED with 170,000 codes, because these codes do not represent the context of diseases in real patients, do not provide phenotypical clues, and have new diseases constantly added. I have tried to use ontologies, because each step represents a question. I have given up, because ontologies can’t deal well with synonyms yet.

Yes, I try to make use of standardized expressions in questionnaire design. But the main focus is to incorporate Expert Knowledge, not codes. I have never seen a patient’s life saved by a code…

describes the DBC system in the Netherlands. The assumption is that a diagnosis has already been made on entry into healthcare.
A disaster for all parties in my opinion. This has caused a lot of problems. See f.e.

Having thought about this a bit more, I think we need to separate the two parts:

  1. Namespace - a single immutable label that refers to a single immutable concept for ever. This should be like a character set, e.g. ASCII or Unicode. The code means the character. That’s all. No logic and no ontology at this stage. This means that (apart from probably needing to expand the namespace every so often) the code will always mean that character.

  2. Everything else - the hierarchy, ontology, combinatorial logic etc is added as a further layer or set of layers on top of 1). Doesn’t have to be managed by the same people who manage the namespace. There can be a multitude of different hierarchies, ontologies, classifications etc (as indeed there already are). But this would work in the same way as French and English having their own, perfectly internally consistent rules, despite using the same character set. As long as you have a common character set, though, you can build tools that work for both.
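A minimal sketch of that two-layer split, just to show the shape of it. Everything here is hypothetical (`Namespace`, `Hierarchy`, and the codes and labels are invented for illustration and are not real SNOMED CT or ICD identifiers):

```python
from typing import Dict, Set


class Namespace:
    """Layer 1: an append-only map from code to label. No logic, no ontology."""

    def __init__(self) -> None:
        self._labels: Dict[str, str] = {}

    def register(self, code: str, label: str) -> None:
        # A code can be added once and never redefined.
        if code in self._labels:
            raise ValueError(f"code {code!r} is already assigned")
        self._labels[code] = label

    def label(self, code: str) -> str:
        return self._labels[code]

    def __contains__(self, code: str) -> bool:
        return code in self._labels


class Hierarchy:
    """Layer 2: one of many possible structures built over the same codes."""

    def __init__(self, name: str, namespace: Namespace) -> None:
        self.name = name
        self._namespace = namespace
        self._parents: Dict[str, Set[str]] = {}

    def add_parent(self, child: str, parent: str) -> None:
        # A hierarchy may only refer to codes the namespace already defines.
        for code in (child, parent):
            if code not in self._namespace:
                raise KeyError(f"unknown code {code!r}")
        self._parents.setdefault(child, set()).add(parent)

    def ancestors(self, code: str) -> Set[str]:
        # Walk parent links transitively.
        seen: Set[str] = set()
        stack = list(self._parents.get(code, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._parents.get(current, ()))
        return seen
```

Two groups could then maintain, say, a clinical `Hierarchy` and a reimbursement `Hierarchy` over the same `Namespace` and disagree about where a concept sits, without either of them being able to change what a code means.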