Semantics in action: a guide for representing clinical data elements with SNOMED CT

Abstract

Background

Clinical data is abundant, but meaningful reuse remains lacking. Semantic representation using SNOMED CT can improve research, public health, and quality of care. However, the lack of applied guidelines to industrialise the process hinders sustainability and reproducibility. This work describes a guide for semantic representation of data elements with SNOMED CT, addressing challenges encountered during its application. The representation of the institutional data warehouse started with the guidelines proposed by SNOMED International and other groups. However, the application at large scale of manual expert-driven representation led to the development of additional rules.

Results

An eight-rule step-by-step guide was developed iteratively through focus groups. Continuously refined by usage and growing coverage, the rules are tested in practice to ensure they achieve the desired outcome. All rules prioritize maintaining semantic accuracy, which is the main goal of our strategy. They are divided into four groups, which apply to understanding the data correctly (Context) and to using SNOMED CT properly (Single concepts first, Approved post-coordination, Extending post-coordination).

Conclusions

This work provides a practical framework for semantic representation using SNOMED CT, enabling greater accuracy and consistency by promoting a common method. While addressing challenges of large-scale implementation, the guide supports the drive from data centric models to a semantic centric approach, leveraging interoperability and more effective reuse of clinical data.

Background

Semantic interoperability in healthcare is held back by the incapacity to generate both semantically rich and easily computer-processable data. Data is often represented with use-case specific models, without explicit metadata representing its semantics. This inevitably leads to continuous low levels of interoperability. Attempts at building a global terminology have led to very broad coverage and important challenges at keeping global coherency [1].

Hospital-wide semantic representation is a major challenge, as a precise, contextual, and consistent representation of clinical information is necessary. At Geneva University Hospitals, a large hospital consortium, the institutional data warehouse (DWH) is a NoSQL database. It holds data generated by operational activities in near real time and is populated from multiple source systems. Over 150′000 data elements are stored in collections with varying ad hoc structures which often don’t accurately portray their intended meaning. The collection names, variables, and their assigned value sets are often unclear [2]. The most deterministic approach is to represent the meaning of data as it is acquired. Data models, however, most often represent data as it will be used. Additionally, data models often do not refer explicitly to a strong semantic ontology, limiting proper reuse of clinical data [3]. When representing data elements, one must first understand them, as slight variations can change the meaning drastically. Clinical forms, for example, contain information surrounding the label, which is often lost when storing data. Considering the data element’s acquisition context requires human interpretation by accessing the patient record. Since meaning goes beyond the label of a data element, a proper semantic description often cannot be done at that level.

Tools to support automatic semantic annotations have been shown to be lacking [4, 5]. The structure of clinical databases is not sufficient to allow these processes without information loss, and manual correction and verification is necessary [6, 7]. Therefore, though time- and resource-consuming, manual representation can become a real asset to leverage data accurately and consistently.

SNOMED Clinical Terms (SCT) is one of the most comprehensive explicit formal representations of meanings in healthcare [8, 9]. It contains more than 360,000 concepts, three million relations [10], and a compositional grammar allowing for post-coordination (PC) [11]. PC limits combinatorial explosion while allowing the representation of an almost infinite number of concepts [12]. As there are often multiple ways to express the same thing, a consistent approach is important to limit redundancy [13]. As a unique contender to leverage semantic interoperability in healthcare, SCT’s adoption has been growing steadily [14,15,16,17,18,19], but its implementation has been slow [20, 21]. This requires a clear, reproducible guide to reach maximal efficiency, and a learning process to become familiar with using SCT. Existing guidelines, which are discussed in further detail in Sect. "Comparison to existing guidelines", do not cover all bases.

The main contribution of this work is a set of pragmatic rules for semantic representation of clinical data with SCT. It summarises the work of the team dedicated to this process, with various levels of seniority, clinical knowledge, and experience with SCT. We describe the elaboration of new rules through an iterative process and demonstrate practical examples. We hope that by sharing this process and its outcome, the necessity for human intervention and manual representation can be demonstrated. Furthermore, a common, DWH-agnostic, approach to semantic representation can lead to homogenous outcomes in heterogenous settings. This has the potential to contribute a significant improvement to semantic interoperability, particularly in multi-institutional settings. A semantically centred approach to leveraging real world clinical data can better support care processes, research and public health [22].

Methods

The guide presented in this work results from a long-term semantic representation effort with the goal of bringing semantics to the institutional DWH. As this process grew organically in scale and complexity, rules were gradually defined in focus groups. The creation of the guide depends on two core elements: a framework to teach and harmonise semantic competency within the team, and a creation cycle allowing the curation of a set of rules.

Competency building

The process to teach the use of SCT is based on three pillars designed to be followed in order. First, newcomers follow the basic training courses and documentation made openly available and curated by SNOMED International (SI) [23]. These are sufficient for one-dimensional representation such as a patient problem list value set [24]. The second pillar is made of internal documentation and practical work. These give trainees a deeper understanding of tasks focused on real data and local specificities, and the ability to semantically represent data. Finally, more advanced SI courses, such as the Authoring Course [25], are proposed for core members. These allow the person to discuss rules with the team and participate in the rule creation cycle described hereafter. Newcomers are supervised closely by a more experienced team member and move on to more complex tasks when deemed sufficiently proficient.

This framework is designed to ensure production of a harmonised, stable guide by ensuring every participant in the discussion has a common background and vision.

Rule creation cycle

As more complex data elements are encountered in the process, PC becomes unavoidable. When items such as medical forms are added, SI’s rules become too restrictive, so task-specific rules must be agreed upon to maintain coherence. The creation cycle acts as a framework necessary to avoid losing track of an increasing number of internal rules. Indeed, the work started with a very open approach to semantic representation. It then evolved somewhat erratically over time, until the first set of rules emerged organically, and the harmonisation process started.

The cycle revolves around weekly focus groups comprising team members who participate in the semantic representation effort. During focus groups, problematic situations and potential solutions are discussed. The groups are made up of a varying number of people (2–5 depending on the team’s setup at that time) with heterogenous backgrounds (clinical, billing, bioinformatics, IT, students). The more experienced members all have proficient experience with SNOMED CT, including certification given by SNOMED International such as the Authoring Certification. Topics are defined based on the questions or problems each team member is currently working on. When an issue is identified, existing guidelines are thoroughly assessed to determine if they can solve the problem, and applied if that is the case. If the scope of existing guidelines is not sufficient, a new rule is necessary. For this reason, existing guidelines and ours were not formally compared on the same cases. When a consensus is reached through debate, the new rule is put into practice and evaluated for coherence with previous rules. This ensures the intended meaning is maintained and unambiguous. If satisfactory, it is added to the list; otherwise it is re-discussed as needed. Earlier work is also reviewed with new rules to ensure they comply with previous versions. This process is repeated (Fig. 1) until rules are sufficiently tested, and constant evolution has led to the current guide.

Fig. 1

Cycle describing the method of elaboration of the guide

To accommodate the constraints of a relatively small soft-funded team with a high-turnover rate and limited time and resources, prioritization follows the Pareto principle [26, 27]. Most frequently instantiated collections are represented first, and only the data elements which account for around 80% of instances are represented. For example, data collections with high instances and clinical value such as the patient problem list and laboratory procedures were among the first done. The data elements in these collections are then ordered by count from largest to smallest, and manual representation is done until at least 80% of instances are covered. As work is done in parallel on separate tasks, no inter-annotator agreement studies are carried out. Specific problems are discussed during focus groups, and only unanimously agreed-upon outcomes are validated.
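The prioritization described above can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual tooling: the function name, the element names, and the instance counts are all hypothetical, and the 80% threshold is taken from the text.

```python
from typing import Dict, List


def select_for_representation(counts: Dict[str, int], target: float = 0.8) -> List[str]:
    """Return the most frequent data elements that together cover
    at least `target` of all instances in a collection."""
    total = sum(counts.values())
    if total == 0:
        return []
    selected: List[str] = []
    covered = 0
    # Largest counts first; ties are broken by name so the result is deterministic.
    for name, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered >= target * total:
            break
        selected.append(name)
        covered += count
    return selected


# Hypothetical instance counts for one collection (assumption, not real data).
example_counts = {"elem_a": 700, "elem_b": 150, "elem_c": 100, "elem_d": 30, "elem_e": 20}
selected = select_for_representation(example_counts)
print(selected)
```

In this toy collection, two of the five elements already cover 85% of instances, so the remaining three would be left for later.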

Results

The competency building framework and rule creation cycle have been applied for the past 4 years [2], resulting in the following set of rules. They apply first to fully understanding the data element’s meaning, then to representing that meaning accurately (Fig. 2). The rules are grouped into four categories: 1) Context, 2) Single concepts first, 3) Approved post-coordination, and 4) Extending post-coordination, which each contain two subrules. Semantic accuracy is the foundation of our strategy and the intended outcome of the guide’s application. This principle extends to all the following rules, and examples of how semantic inaccuracy can lead to meaning loss are given throughout. Most of the rules apply to more complex tasks, and so mainly concern post-coordination. We follow an order of priorities, grouped into the four categories mentioned above, to use the simplest and fewest codes possible and to ensure consistency. A quick guide summarizing the rules and a detailed example of the application of the guide are available in Appendix A.

Fig. 2

A summary of the rules established and their intended sequence. After understanding the context (steps 1 and 2), the corresponding SCT expression is derived from one of the following steps depending on the complexity of the data element

Context

Context encompasses both the clinical acquisition context, i.e., patient and medical service characteristics, and the structural context, i.e., how fields are structured in a clinical form. Both are equally important to consider. When an unclear data element is encountered, context is needed to define it before representation.

Structural context

When representing fields from a clinical form, the following situation was encountered in the DWH: a form called “PF_0886”, with an id of “FORMIDOC_90” and label of “suites” (roughly translated to “what happens next”). Without further investigation, this field could be incorrectly represented as “308273005 |Follow-up status (finding)|”. However, after opening the EHR and finding the correct form (Obstetrics service—Pregnancy record) at the correct section (Post Partum), it can be correctly understood and represented (Fig. 3).

Fig. 3

Complete understanding achieved through visualisation of the data element’s structural context (translated from French)

Sometimes a data element encountered seems simple to represent, but is ambiguous or lacking detail, and context is needed to disambiguate it. The label “breast”, for example, can have different meanings when located in two different forms, once as a checkbox and once as part of a drop list. It is easy to represent both as “76752008 |Breast structure (body structure)|”, but that would be incorrect (Fig. 4). Although both are part of an “oncological history” section, one is about the patient, the other about the patient’s family.

Fig. 4

The term “breast” has different meanings depending on context in the electronic health record (translated from French)

Clinical acquisition context

The environment in which data is acquired plays an important role in the meaning attached to it. One example is the language used by the caregiver and the system. Most of our database is in French, but the most complete version of SCT is the International Edition, which only includes English descriptions. SCT is described as language-independent but was created by an English-speaking team, and therefore has an English-biased concept segmentation. If concepts are sufficiently defined, their relations should unambiguously convey the intended meaning. However, considering the large number of primitive concepts with little or no stated attribute relations, errors remain possible. Specific care must be taken to avoid semantic shifts to properly represent French expressions in English concepts while conserving meaning. Even words which are lexically identical can be semantically opposites, and just translating the labels of data elements can lead to major errors. For example, in English, nyctalopia is defined as the inability to see in low light, or night-blindness [28]. In French, it is the opposite, as nyctalopie is the ability to see in low light [29]. Abbreviations can also mean different things, such as BAV, which in French can mean “Bloc auriculo-ventriculaire” (233917008 |Atrioventricular block (disorder)|) or “baisse de l’acuité visuelle” (13164000 |Reduced visual acuity (finding)|). The temptation to use the first code which comes to mind must be resisted and the context understood first.
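One way to make the dependency on acquisition context explicit is to key abbreviation lookups on both the label and the context, and to return nothing when the context is unknown. The sketch below uses the two readings of "BAV" given above; the service names "cardiology" and "ophthalmology" and the function name are assumptions chosen for illustration.

```python
from typing import Dict, Optional, Tuple

# (abbreviation, acquisition context) -> SCT concept.
# The context keys are hypothetical; the concepts are the two readings of "BAV" cited in the text.
ABBREVIATIONS: Dict[Tuple[str, str], str] = {
    ("BAV", "cardiology"): "233917008 |Atrioventricular block (disorder)|",
    ("BAV", "ophthalmology"): "13164000 |Reduced visual acuity (finding)|",
}


def resolve_abbreviation(label: str, service: Optional[str]) -> Optional[str]:
    """Return a concept only when the context disambiguates the label.
    None means the element must be reviewed manually."""
    if service is None:
        return None
    return ABBREVIATIONS.get((label.upper(), service.lower()))


print(resolve_abbreviation("BAV", "Cardiology"))
print(resolve_abbreviation("BAV", None))
```

Returning None rather than a default reflects the rule that the first code which comes to mind should not be used without context.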

Single concepts first

It is important to be as precise as possible by choosing the most granular code, preferably with single concepts first. Granularity increases as one descends a hierarchy, as children are refinements of their parents [30, 31]. There are different ways to ensure use of single codes first to avoid unnecessary PC.

Exploration

The first is by exploring SCT’s poly-hierarchy and inferring attributes from parent concepts. One concept often has multiple parents, to which it is linked by “is a” relationships [30], and multiple descendants which are linked to it by the same relationship. By exploring the parents, children of neighbouring branches, and semantically similar terms (“tree-walk” [12]), the most appropriate concept to use can be located [32]. When deciding which concept to use, choosing the correct hierarchy, and paying close attention to the synonyms, descriptions, and relationships is very important. For example, take the French term “thrombaphérèse”. When searching for the equivalent concept with “thrombapheresis” or “thrombopheresis”, no results appear in SNOMED CT’s browser. However, by searching for a similar term such as “plasmapheresis”, then exploring its parents and its children, the appropriate term of “plateletpheresis” can be found (Fig. 5).

Fig. 5

Exploring the SNOMED CT hierarchy to find the appropriate concept by (1) finding a similar term, then (2) exploring its surrounding concepts
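The exploration step can be pictured as collecting the parents and siblings of a known, similar concept. The sketch below runs on a toy "is a" graph that uses labels only; it is a hypothetical subset built for this example, not an extract of SNOMED CT, and the function name is an assumption.

```python
from typing import Dict, List, Set

# Toy "is a" graph (child -> parents). Hypothetical subset for illustration only.
PARENTS: Dict[str, List[str]] = {
    "plasmapheresis": ["apheresis"],
    "plateletpheresis": ["apheresis"],
    "leukapheresis": ["apheresis"],
    "apheresis": ["procedure"],
}


def neighbours(concept: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the parents of `concept` plus its siblings,
    i.e. the other children of those parents."""
    own_parents = set(parents.get(concept, []))
    result = set(own_parents)
    for child, child_parents in parents.items():
        if child != concept and own_parents & set(child_parents):
            result.add(child)
    return result


# Start from the similar term and list candidates for manual review.
candidates = sorted(neighbours("plasmapheresis", PARENTS))
print(candidates)
```

The candidate list still has to be reviewed by a person, who checks synonyms, descriptions, and relationships before choosing a concept.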

When there is no sufficiently precise single code, using a more general concept should be avoided as it is preferable to avoid any information loss.

Inference

As stated above, inference is already included in SCT with stated and inferred concept definitions [33]. Using clinical knowledge and common sense, inference is also possible by the annotator when certain details are inferred from the context with a high level of confidence. This allows the use of a single code which appears less precise but is complete. When representing the concept “fracture of the tibial tuberosity”, there are multiple options available (Fig. 6). The pre-coordinated expressions available have additional information on fracture morphology, so PC seems the best option at first. In clinical settings however, the distinction between open and closed fractures is generally only made when the former is true. Therefore, if a fracture is not explicitly described as open, it can be considered closed, and a single code chosen to represent it without losing any information. The opposite applies to wounds, which are considered open by default.

Fig. 6

Inferring information to avoid unnecessary post-coordination
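The default assumptions described above (fractures closed unless stated otherwise, wounds open unless stated otherwise) can be written down as an explicit table so that they are applied uniformly. This is a minimal sketch: the category names, the function name, and the simple word matching are assumptions, and real labels would need more careful handling.

```python
from typing import Dict, Optional

# Defaults stated in the text: fractures are closed, wounds are open, unless said otherwise.
DEFAULT_QUALIFIER: Dict[str, str] = {"fracture": "closed", "wound": "open"}


def infer_open_closed(kind: str, label: str) -> Optional[str]:
    """Return the explicit qualifier found in the label if there is one,
    otherwise the default for this kind of element (None if no default exists)."""
    words = label.lower().split()
    for qualifier in ("open", "closed"):
        if qualifier in words:
            return qualifier
    return DEFAULT_QUALIFIER.get(kind)


print(infer_open_closed("fracture", "fracture of the tibial tuberosity"))
print(infer_open_closed("fracture", "open fracture of the tibial tuberosity"))
```

An explicit statement in the source data always takes precedence over the default, which matches the clinical convention described above.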

Approved PC

When post-coordination becomes unavoidable, different styles mean there are different ways to express similar meanings [34, 35]. While this cannot be prevented, rules are necessary to maintain consistency as much as possible.

Multiple focus expressions

First use multiple focus expressions, which are usually limited to concepts from the same top-level hierarchy [36]. When representing the clinical concept “post-infectious reactionary bacterial arthritis”, two SCT concepts encompass almost all the required information: “12913305 |Post-bacterial arthropathy (disorder)|”, and “239783001 |Post-infective arthritis (disorder)|”. They are neither equivalent nor complete however, as information on inflammatory morphology and bacterial causality, respectively, is missing. PC is therefore required. Combining the two gives a fully defined term and allows representation with a high level of precision while avoiding complex PC (Fig. 7). When semantically equivalent representations exist, always choose the simplest one.

Fig. 7

Different but semantically equivalent ways of using post-coordination
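In SNOMED CT compositional grammar, several focus concepts are joined with a plus sign. The helper below builds such an expression from the two concepts discussed above, copied exactly as they appear in the text; the function name is an assumption and the helper performs no validation against a terminology server.

```python
def combine_focus_concepts(*concepts: str) -> str:
    """Join focus concepts with '+', dropping exact duplicates
    while keeping the order in which they were given."""
    unique = list(dict.fromkeys(c.strip() for c in concepts if c.strip()))
    if not unique:
        raise ValueError("at least one focus concept is required")
    return " + ".join(unique)


expression = combine_focus_concepts(
    "12913305 |Post-bacterial arthropathy (disorder)|",
    "239783001 |Post-infective arthritis (disorder)|",
)
print(expression)
```

Whether the resulting expression classifies as intended still has to be checked with a classifier; the string only shows the syntactic form.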

Refinements

The next step is to use PC with refinements defined by SI, which becomes complex with the addition of attribute groups and nested refinements [37]. This must be done in compliance with SI’s rules, but will not be detailed specifically as sufficient guidelines exist from SI and others. Referring to the MRCM and Editorial Guide is generally sufficient for proficient refined PC [38, 39].

Extending PC

Despite SCT’s extensiveness [40, 41], challenges inevitably arise when no pre-coordinated expressions or approved post-coordination fit. In this case, extending SCT’s compositional grammar and editorial guidance is the only solution. Such changes modify the MRCM and must be added to the institution’s classifier to avoid classification issues, which can be a complex process. Different examples are shown in Table 1.

Table 1 Attribute extensions and use of unapproved attributes

Domain and range extensions

The first way is by using approved attributes in situations that resemble what they have been approved for. This can be done by extending an attribute’s domain, range, or both. Since these are approved attributes, maintaining coherency with current rules is mandatory, which means that definitions of attributes cannot be changed. For the attribute “255234002 |After (attribute)|”, “<< 71388002 |Procedure (procedure)|” has been added to the domain, but the range and definition remain unchanged, only expanding the scope.

New attributes

As a last resort, unapproved attributes should be used sparingly, as they may disappear or change in future SCT versions. They are useful for refining hierarchies which have no approved attributes, such as “410607006 |Organism (organism)|”, and for representing scalar values such as sizes [42]. To ensure consistent use, a complete set of domain, range, cardinalities, and definition must be validated. These new attributes, if correctly classified, can then be submitted to the national SNOMED centre for further approval if necessary. As always, but especially for new attributes, all expressions are reviewed extensively during focus groups to ensure semantic equivalence with the source data and a uniform approach.
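The requirement that every new attribute carries a complete set of domain, range, cardinality, and definition can be enforced with a small record type. The sketch below is an assumption about how such a check might look: the class name, the attribute name "Has size (attribute)", and the range and definition strings are hypothetical, and only the Organism domain is taken from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocalAttribute:
    """A locally defined attribute awaiting validation in a focus group."""
    name: str
    domain: str        # expression constraint describing where the attribute may be used
    range: str         # expression constraint describing the allowed values
    cardinality: str   # for example "0..1"
    definition: str    # human-readable definition agreed on by the team

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        return [key for key, value in vars(self).items() if not str(value).strip()]


# Hypothetical draft with the cardinality not yet agreed on.
draft = LocalAttribute(
    name="Has size (attribute)",
    domain="<< 410607006 |Organism (organism)|",
    range="hypothetical range constraint",
    cardinality="",
    definition="Hypothetical definition text",
)
print(draft.missing_fields())
```

A draft with any missing field would be sent back to the focus group rather than used in expressions.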

Discussion

As of November 2024, 97′930 data elements are represented with 41′345 distinct SCT expressions, including 18′175 distinct PC expressions. These numbers demonstrate two things: using SCT allows redundancy to be handled efficiently, and post-coordination is inevitable to properly represent meaning. A snapshot of the representation results demonstrating the Pareto principle is given in Fig. 8. This work shows that large-scale manual encoding of categorical content is feasible and sustainable, even with a limited budget. Having a guide to follow leads to fewer errors and more consistent representation [43].

Fig. 8

Pareto principle in practice, on average 23% of data elements represent 86% of instances for the examples given

Comparison to existing guidelines

This work is based on guidelines from both SI [23], whose grammar rules and scope are maintained wherever possible, and existing guidelines by other groups [41, 44,45,46,47,48,49,50]. Both were reviewed thoroughly and applied, when possible, but a full review or comparative evaluation is beyond the scope of this article.

To our knowledge, only Hojen et al. published guidelines on the practical use of SCT [44] across entire EHRs, focusing mainly on chosen hierarchies and PC. We intend to add to this work by exploring other problem areas. Other guidelines focus on specific clinical settings or conditions, such as diabetes [45, 46], palliative care [47], or the patient problem list [48]. This clearly impacts the development of these guidelines as rules defined for a specific context do not need to apply elsewhere. Common rules exist between these publications (Table 2), but no single one is included in all, highlighting the different methods applied.

Table 2 A comparative view of existing guidelines

Most authors agree on the importance of consistent and precise representation. Interestingly, authors mentioning PC often agree that current SI rules are too restrictive, and apply varying methods to counter that, using unapproved attributes [47] or combinations [44, 48]. Which hierarchies to select is the most recurring theme considering its importance to semantic accuracy, whether approached in detail [44] or broadly [45,46,47]. Clinical context is discussed in detail in only one other work [47], and is otherwise only mentioned in reference to the situation with explicit context hierarchy [45, 46]. Other related works focus on resolving specific issues, such as choosing between two hierarchies [49] or inter-coder agreement [34, 35, 50]. However, these often rather describe a method for selecting what to represent. Others mention similar principles such as context, exploration, or minimizing codes, but are not focused solely on SCT or discuss annotation of narrative text [32, 50, 51]. Others yet create local extensions with new attribute relationships to represent their needs [41].

In comparison to most of the guidelines, which are tailor-made for a specific clinical setting or disease, we have opted for general guidelines which can be applied to most settings. The iterative creation process, along with the heterogenous data elements encountered, means that many possible scenarios have been covered. This, coupled with the step-by-step approach, gives us hope that these guidelines can be reproduced in different settings without too much difficulty.

Limitations

This document is not exhaustive, and some aspects, such as handling of inactive terms, are not addressed as a solution for this issue has not been implemented. A possibility would be to regularly run a check against the latest version to identify outdated codes, and apply established methods [47, 48] using historical relationships to find replacements. Likewise, existing guidelines which are sufficient, such as choosing hierarchies, are not repeated as focus is on expanding or developing rules.
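The regular check suggested above could start from something as simple as extracting concept identifiers from stored expressions and comparing them with the set of identifiers active in the latest release. The sketch below assumes that set has already been loaded; the function name and the identifier 999999999 are hypothetical, and finding replacements through historical relationships is not covered.

```python
import re
from typing import Dict, Iterable, Set

# SCT identifiers are digit strings of 6 to 18 digits.
SCTID = re.compile(r"\b\d{6,18}\b")
# Terms are written between pipes; they are removed so digits inside a term are not mistaken for ids.
TERM = re.compile(r"\|[^|]*\|")


def find_inactive(expressions: Iterable[str], active_ids: Set[str]) -> Dict[str, Set[str]]:
    """Map each expression to the identifiers it uses that are not in `active_ids`."""
    flagged: Dict[str, Set[str]] = {}
    for expression in expressions:
        ids = set(SCTID.findall(TERM.sub("", expression)))
        inactive = ids - active_ids
        if inactive:
            flagged[expression] = inactive
    return flagged


# Hypothetical active set and stored expressions.
active = {"233917008", "13164000"}
stored = [
    "233917008 |Atrioventricular block (disorder)|",
    "999999999 |Hypothetical retired concept|",
]
flagged = find_inactive(stored, active)
print(flagged)
```

Flagged expressions would then be reviewed manually, in line with the rest of the guide.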

This guide has only been applied within the authors’ environment, meaning it might not be immediately applicable in other settings. This is the subject of future work described in Sect. "Future work".

One significant limitation is that, despite repeated evaluation of application of the guidelines, no formal quantitative evaluation was performed, for various reasons. First, the guide evolved organically with the work done by the team and was initially only intended for internal use. Secondly, as there are, by design, different ways of expressing things in SCT, the goal of the guide is not to have exact matching of outcomes, but rather to ensure that semantic accuracy of representation is maintained. Therefore, two different expressions can both be correct, and this would not appear in a formal inter-annotator agreement.

The use of unapproved attributes also limits interoperability due to non-compliance with the MRCM. Steps are taken to limit this, such as checks for logical equivalence with precoordinated concepts and ensuring correct classification [52].

Rules can sometimes contradict each other, as priorities for representation can vary between reproducibility and expressivity, or between what is best for machines and what is best for humans [53]. Therefore following this guide step-by-step will not always be possible, and steps may need to be skipped at times.

This work focuses exclusively on the use of SCT for semantic representation and does not approach topics such as standards used to structure, format, and exchange data. This is not an evaluation of the structure, strengths, or weaknesses of SCT, nor is it a comparative work between standards. The subjects of multi-terminology representation, semantic graphs, automatic annotation tasks, and use-case specific models are out of scope of this work.

Future work

Some important topics which are being worked on include negation and uncertainty, dealt with using new attributes. However, this has been only partly resolved, and is subject to change. These vast topics are the focus of ongoing work.

Incorporating these guidelines for manual representation to the work done on automatic or semi-automatic annotation using natural language processing (NLP) by other members of our team is also being explored but is still in preliminary stages.

As PC support is not trivial and SCT is not the only standard used in the DWH, a system to handle PC and other standards has been created using graph database technology. We believe graph databases to have huge potential for bridging between data and semantics, to facilitate querying of heterogenous data and exploring semantic connections which were not otherwise apparent. It also allows for the possibility of adding further representations with data models or other terminologies. The result, a new semantic graph which can represent data in various formats such as OMOP, FHIR, and i2b2, is the focus of a subsequent article.

Finally, talks are ongoing with other institutions to implement this strategy in different environments, which would help in demonstrating its generalizability.

Conclusion

When using SCT to represent data elements, a structured method is necessary. While existing guidelines are adequate at first, over time they become insufficient to represent a large proportion of terms adequately. Frequently encountered issues are analysed during focus groups, and solutions reached by consensus. Through trial-and-error, rules are established for general use of SCT. By adhering to a method for representation, semantic accuracy can be maintained, and use of time and resources optimised. This scalable and adaptable guide can be applied to various settings, leading to improved semantic interoperability for secondary usage of clinical data.

Data availability

No datasets were generated or analysed during the current study.

Abbreviations

SCT:

SNOMED Clinical Terms

SI:

SNOMED International

PC:

Post-coordination

DWH:

Data warehouse

MRCM:

Machine readable concept model

References

  1. Cornet R, Chute CG. Health Concept and Knowledge Management: Twenty-five Years of Evolution. Yearb Med Inform. 2016;25(S 01):S32–41.

  2. Gaudet-Blavignac C, Ehrsam J, Turbe H, Keszthelyi D, Zaghir J, Lovis C. Deep SNOMED CT Enabled Large Clinical Database About COVID-19. In: Séroussi B, Weber P, Dhombres F, Grouin C, Liebe JD, Pelayo S, et al., editors. Studies in Health Technology and Informatics. IOS Press; 2022. Available from: https://ebooks.iospress.nl/doi/10.3233/SHTI220466 . Cited 2024 Feb 28.

  3. Meystre SM, Lovis C, Bürkle T, Tognola G, Budrionis A, Lehmann CU. Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress. Yearb Med Inform. 2017;26(01):38–52.

  4. Gaudet-Blavignac C, Foufi V, Bjelogrlic M, Lovis C. Use of the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) for Processing Free Text in Health Care: Systematic Scoping Review. J Med Internet Res. 2021;23(1): e24594.

  5. Rossander A, Lindsköld L, Ranerup A, Karlsson D. A State-of-the Art Review of SNOMED CT Terminology Binding and Recommendations for Practice and Research. Methods Inf Med. 2021;60(S 02):e76–88.

  6. Gaudet-Blavignac C, Foufi V, Wehrli E, Lovis C. Automatic annotation of French medical narratives with SNOMED CT concepts. Swiss Med Informatics. 2018; Available from: https://doi.emh.ch/smi.34.00418. Cited 2024 Feb 21.

  7. Ruch P, Gobeill J, Lovis C, Geissbühler A. Automatic medical encoding with SNOMED categories. BMC Med Inform Decis Mak. 2008;8(S1):S6.

  8. Benson T, Grieve G. SNOMED CT. In: Principles of Health Interoperability. Cham: Springer International Publishing; 2016. p. 155–72. (Health Information Technology Standards). Available from: http://link.springer.com/10.1007/978-3-319-30370-3_9 . Cited 2023 Jan 30.

  9. Millar J. The Need for a Global Language - SNOMED CT Introduction. Stud Health Technol Inform. 2016;225:683–5.

  10. SNOMEDCT Release Statistics 2024–02–01. Available from: https://browser.ihtsdotools.org/qa/#/SNOMEDCT/general-release-statistics. Cited 2024 Feb 21.

  11. Compositional Grammar - Specification and Guide - Compositional Grammar - SNOMED Confluence. Available from: https://confluence.ihtsdotools.org/display/DOCSCG. Cited 2023 Feb 6.

  12. Cimino JJ. Desiderata for controlled medical vocabularies in the twenty-first century. Methods Inf Med. 1998;37(4–5):394–403.

  13. Designing an Introspective, Multipurpose, Controlled Medical Vocabulary. Proc Annu Symp Comput Appl Med Care. 1989;513–8.

  14. assessct_final_brochure.pdf. Available from: https://assess-ct.eu/fileadmin/assess_ct/final_brochure/assessct_final_brochure.pdf. Cited 2024 Feb 21.

  15. f6405a1c.pdf. Available from: https://www.digitalhealthnews.eu/images/stories/pdf/f6405a1c.pdf. Cited 2023 Jan 23.

  16. 150504_recommandations_I_semantique_et_metadonnees_F.pdf. Available from: https://www.e-health-suisse.ch/fileadmin/user_upload/Dokumente/2015/F/150504_recommandations_I_semantique_et_metadonnees_F.pdf. Cited 2023 Jan 23.

  17. Gaudet-Blavignac C, Raisaro JL, Touré V, Österle S, Crameri K, Lovis C. A National, Semantic-Driven, Three-Pillar Strategy to Enable Health Data Secondary Usage Interoperability for Research Within the Swiss Personalized Health Network: Methodological Study. JMIR Med Inform. 2021;9(6): e27591.

  18. eHAction_CSS-Document_open-consultation.pdf. Available from: http://ehaction.eu/wp-content/uploads/2019/08/eHAction_CSS-Document_open-consultation.pdf. Cited 2023 Jan 23.

  19. Finding a common language for healthcare data: key progress made towards interoperability in German medicine | Medical Informatics Initiative. Available from: https://www.medizininformatik-initiative.de/en/finding-common-language-healthcare-data-key-progress-made-towards-interoperability-german-medicine. Cited 2023 Feb 17.

  20. Lee D, de Keizer N, Lau F, Cornet R. Literature review of SNOMED CT use. J Am Med Inform Assoc. 2014;21(e1):e11–9.

  21. Lee D, Cornet R, Lau F, de Keizer N. A survey of SNOMED CT implementations. J Biomed Inform. 2013;46(1):87–96.

  22. Kruse CS, Goswamy R, Raval Y, Marawi S. Challenges and Opportunities of Big Data in Health Care: A Systematic Review. JMIR Med Inform. 2016;4(4):e38.

  23. SNOMED CT Document Library - SNOMED CT Document Library - SNOMED Confluence. Available from: https://confluence.ihtsdotools.org/display/DOC/SNOMED+CT+Document+Library. Cited 2023 Feb 2.

  24. Gaudet-Blavignac C, Rudaz A, Lovis C. Building a Shared, Scalable, and Sustainable Source for the Problem-Oriented Medical Record: Developmental Study. JMIR Med Inform. 2021;9(10):e29174.

  25. SNOMED CT Authoring Level 1 Course. Available from: https://courses.ihtsdotools.org/product?catalog=AL1. Cited 2024 Feb 1.

  26. Alkiayat M. A Practical Guide to Creating a Pareto Chart as a Quality Improvement Tool. Glob J Qual Saf Healthc. 2021;4(2):83–4.

  27. Wright A, Bates DW. Distribution of Problems, Medications and Lab Results in Electronic Health Records: The Pareto Principle at Work. Appl Clin Inform. 2010;1(1):32.

  28. Definition of NYCTALOPIA. Available from: https://www.merriam-webster.com/dictionary/nyctalopia. Cited 2023 Feb 3.

  29. Larousse É. Définitions : nyctalopie - Dictionnaire de français Larousse. Available from: https://www.larousse.fr/dictionnaires/francais/nyctalopie/55304. Cited 2023 Feb 3.

  30. subtype relationship - SNOMED CT Glossary - SNOMED Confluence. Available from: https://confluence.ihtsdotools.org/display/DOCGLOSS/subtype+relationship. Cited 2023 Aug 25.

  31. 6.2 Subsumption - Data Analytics with SNOMED CT - SNOMED Confluence. Available from: https://confluence.ihtsdotools.org/display/DOCANLYT/6.2+Subsumption. Cited 2023 Feb 8.

  32. Luo YF, Sun W, Rumshisky A. MCN: A comprehensive corpus for medical concept normalization. J Biomed Inform. 2019;92:103132.

  33. 2.3.1 Stated and Inferred Concept Definitions - Release File Specification - SNOMED Confluence. Available from: https://confluence.ihtsdotools.org/display/DOCRELFMT/2.3.1+Stated+and+Inferred+Concept+Definitions. Cited 2023 Mar 9.

  34. Chiang MF, Hwang JC, Yu AC, Casper DS, Cimino JJ, Starren JB. Reliability of SNOMED-CT coding by three physicians using two terminology browsers. AMIA Annu Symp Proc. 2006;2006:131–5.

  35. Andrews JE, Richesson RL, Krischer J. Variation of SNOMED CT Coding of Clinical Research Concepts among Coding Experts. J Am Med Inform Assoc. 2007;14(4):497–506.

  36. 6.2 Multiple Focus Concepts - Compositional Grammar - SNOMED Confluence. Available from: https://confluence.ihtsdotools.org/display/DOCSCG/6.2+Multiple+Focus+Concepts. Cited 2023 Feb 6.

  37. 6.4 Expressions With Attribute Groups - Compositional Grammar - SNOMED Confluence. Available from: https://confluence.ihtsdotools.org/display/DOCSCG/6.4+Expressions+With+Attribute+Groups. Cited 2024 Feb 1.

  38. SNOMED CT MRCM Maintenance Tool. Available from: https://browser.ihtsdotools.org/mrcm/?branch=MAIN%2F2024-01-01. Cited 2024 Feb 21.

  39. SNOMED CT Editorial Guide - SNOMED CT Editorial Guide - SNOMED Confluence. Available from: https://confluence.ihtsdotools.org/display/DOCEG/SNOMED+CT+Editorial+Guide. Cited 2024 Feb 21.

  40. Campbell JR, Xu J, Fung KW. Can SNOMED CT fulfill the vision of a compositional terminology? Analyzing the use case for Problem List. AMIA Annu Symp Proc. 2011;2011:181–8.

  41. Green JM, Wilcke JR, Abbott J, Rees LP. Development and Evaluation of Methods for Structured Recording of Heart Murmur Findings Using SNOMED-CT® Post-Coordination. J Am Med Inform Assoc. 2006;13(3):321–33.

  42. Ehrsam J, Gaudet-Blavignac C, Lovis C. Scalar Values in SNOMED CT: A Proposed Extension. In: Mantas J, Hasman A, Demiris G, Saranto K, Marschollek M, Arvanitis TN, et al., editors. Studies in Health Technology and Informatics. IOS Press; 2024. Available from: https://ebooks.iospress.nl/doi/10.3233/SHTI240665. Cited 2024 Oct 2.

  43. Navas H, Lopez Osornio A, Gambarte L, Elías Leguizamón G, Wasserman S, Orrego N, et al. Implementing rules to improve the quality of concept post-coordination with SNOMED CT. Stud Health Technol Inform. 2010;160(Pt 2):1045–9.

  44. Randorff Højen A, Rosenbeck GK. SNOMED CT Implementation: Mapping Guidelines Facilitating Reuse of Data. Methods Inf Med. 2012;51(06):529–38.

  45. El-Sappagh S, Elmogy M, Riad AM, Zaghloul H, Badria F. A proposed SNOMED CT ontology-based encoding methodology for diabetes diagnosis case-base. In: 2014 9th International Conference on Computer Engineering & Systems (ICCES). Cairo, Egypt: IEEE; 2014. p. 184–91. Available from: http://ieeexplore.ieee.org/document/7030954/. Cited 2023 Jan 17.

  46. El-Sappagh S, Elmogy M. An encoding methodology for medical knowledge using SNOMED CT ontology. Journal of King Saud University - Computer and Information Sciences. 2016;28(3):311–29.

  47. Lee DH, Lau FY, Quan H. A method for encoding clinical datasets with SNOMED CT. BMC Med Inform Decis Mak. 2010;10(1):53.

  48. Lau FY, Simkus R, Lee D. A Methodology for Encoding Problem Lists with SNOMED CT in General Practice. In: Proceedings of the Third International Conference on Knowledge Representation in Medicine, Phoenix, Arizona, USA, May 31st - June 2nd, 2008. 2008. Available from: http://ceur-ws.org/Vol-410/Paper17.pdf.

  49. Rasmussen AR, Rosenbeck K. SNOMED CT implementation: implications of choosing clinical findings or observable entities. Stud Health Technol Inform. 2011;169:809–13.

  50. Miñarro-Giménez JA, Martínez-Costa C, Karlsson D, Schulz S, Gøeg KR. Qualitative analysis of manual annotations of clinical text with SNOMED CT. Lovis C, editor. PLoS ONE. 2018;13(12):e0209547.

  51. Yada S, Aramaki E, Tanaka R, Cheng F, Kurohashi S. Medical/Clinical Text Annotation Guidelines.

  52. Rector A. Lexically suggest, logically define: Quality assurance of the use of qualifiers and expected results of post-coordination in SNOMED CT. Journal of Biomedical Informatics. 2012.

  53. Rosenbloom ST, Miller RA, Johnson KB, Elkin PL, Brown SH. Interface Terminologies: Facilitating Direct Entry of Clinical Data into Electronic Health Record Systems. J Am Med Inform Assoc. 2006;13(3):277–88.

Acknowledgements

Not applicable.

Funding

Open access funding provided by University of Geneva. Funding by the Swiss Personalized Health Network initiative partly contributed to enabling our work. The funding source had no input or influence in the decision to publish this article, and was not involved in the writing, editing or submission procedures.

Author information

Contributions

JE: conceptualization, data curation, methodology, visualization, writing—original draft, writing—review and editing. CGB: conceptualization, data curation, methodology, supervision, writing—review and editing. MM: data curation, methodology, writing—review and editing. MB: data curation, methodology, writing—review and editing. CL: conceptualization, funding acquisition, supervision, validation, project administration, writing—review and editing. All authors reviewed the manuscript.

Corresponding author

Correspondence to Julien Ehrsam.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix

Quick guide for representation

figure a

Semantic accuracy

There are multiple ways to say the same thing in SNOMED CT (SCT), and different encoders will inevitably use different concepts to represent the same data element. However, the meaning can and must be maintained, and opting for semantic equivalence rather than exact matching allows a concept to be represented in more than one way while retaining its meaning.

Structural context

The information surrounding a data element in a form, or its location within the database, can help determine its meaning.

Clinical acquisition context

The intent of the healthcare professional who used the term is difficult to determine exactly, but the representation they had of that element must be deduced from the context.

Exploration

Exploring hierarchies by searching the parents of similar terms, as well as the children of neighboring branches, can help locate the most appropriate concept.
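As a minimal sketch of this exploration step, the snippet below walks a toy is-a graph to list the ancestors of a candidate concept and its neighbours (other children of the same parents). The concept names and edges are invented for illustration and are not taken from SNOMED CT; a real implementation would query a terminology server or the release files instead.

```python
from collections import defaultdict

# Toy is-a edges (child, parent). Names are illustrative placeholders,
# not real SNOMED CT concepts or relationships.
IS_A = [
    ("eye finding", "clinical finding"),
    ("vision finding", "eye finding"),
    ("eyelid finding", "eye finding"),
    ("night blindness", "vision finding"),
    ("day blindness", "vision finding"),
]

parents = defaultdict(set)
children = defaultdict(set)
for child, parent in IS_A:
    parents[child].add(parent)
    children[parent].add(child)


def ancestors(concept):
    """Return every concept reachable by following is-a edges upwards."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def neighbours(concept):
    """Return the other children of the concept's direct parents."""
    found = set()
    for parent in parents.get(concept, ()):
        found |= children[parent]
    return found - {concept}


print(sorted(ancestors("night blindness")))
print(sorted(neighbours("night blindness")))
```

Listing neighbours alongside ancestors mirrors the advice above: a better-fitting concept is often a sibling of the first search hit rather than the hit itself.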

Inference

Some implied information can be added if it does not modify the semantic accuracy of the concept.

Multi focus expressions

Post-coordination should be done as simply as possible, first with simple associative post-coordination, combining concepts from the same hierarchy.
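The following sketch illustrates one way this rule could be checked in code. It joins focus concepts with the "+" operator of the SNOMED CT compositional grammar and refuses combinations that span more than one hierarchy. The identifiers, terms, and the `CONCEPTS` lookup are hypothetical placeholders, not real SNOMED CT content.

```python
# Toy lookup: concept id -> (term, top-level hierarchy).
# All identifiers and terms are invented placeholders.
CONCEPTS = {
    "1001": ("Toy finding A", "Clinical finding"),
    "1002": ("Toy finding B", "Clinical finding"),
    "2001": ("Toy procedure", "Procedure"),
}


def multi_focus(*concept_ids):
    """Build a multi-focus expression from concepts of a single hierarchy."""
    if len(concept_ids) < 2:
        raise ValueError("a multi-focus expression needs at least two concepts")
    hierarchies = {CONCEPTS[cid][1] for cid in concept_ids}
    if len(hierarchies) != 1:
        raise ValueError(
            f"focus concepts span several hierarchies: {sorted(hierarchies)}"
        )
    return " + ".join(f"{cid} |{CONCEPTS[cid][0]}|" for cid in concept_ids)


print(multi_focus("1001", "1002"))
```

Calling `multi_focus("1001", "2001")` raises a `ValueError`, because the two placeholders belong to different hierarchies.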

Refinements

Follow SCT’s guides for using attribute relationships to further refine pre-coordinated concepts.
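A refinement can be sketched as a focus concept followed by an attribute and a value, written `focus : attribute = value` in compositional grammar. The example below assumes a drastically simplified stand-in for the MRCM, in which each attribute has exactly one allowed domain and one allowed range hierarchy. All identifiers, terms, and rules are invented placeholders.

```python
# Toy data: identifiers, terms, and rules are invented placeholders,
# not real SNOMED CT content or the real MRCM.
CONCEPTS = {
    "1001": ("Toy fracture finding", "Clinical finding"),
    "2001": ("Toy procedure", "Procedure"),
    "3001": ("Toy bone structure", "Body structure"),
}
ATTRIBUTES = {
    # attribute id: (term, allowed domain hierarchy, allowed range hierarchy)
    "9001": ("Toy finding site", "Clinical finding", "Body structure"),
}


def refine(focus, attribute, value):
    """Return 'focus : attribute = value' if the toy rules allow it."""
    if attribute not in ATTRIBUTES:
        raise ValueError(f"attribute {attribute} has no rule")
    attr_term, domain, value_range = ATTRIBUTES[attribute]
    focus_term, focus_hierarchy = CONCEPTS[focus]
    value_term, value_hierarchy = CONCEPTS[value]
    if focus_hierarchy != domain:
        raise ValueError(f"{focus} is outside the domain '{domain}'")
    if value_hierarchy != value_range:
        raise ValueError(f"{value} is outside the range '{value_range}'")
    return (
        f"{focus} |{focus_term}| : "
        f"{attribute} |{attr_term}| = {value} |{value_term}|"
    )


print(refine("1001", "9001", "3001"))
```

Validating before building the string keeps invalid expressions from entering the data warehouse. The real MRCM is considerably richer (cardinality, grouping, content type), which this sketch leaves out.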

Domain and range extensions

Expand on SCT’s MRCM by extending the domains and ranges of attributes already approved by SNOMED International (SI).

New attributes

Adding complete MRCM rules for previously unapproved attributes allows refinement of previously inaccessible hierarchies or concepts, greatly expanding the scope of representation.
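The two extension rules above can be illustrated with a small, hypothetical rule table. In this sketch, `ToyMrcm`, its methods, the identifiers, and the hierarchy labels are all assumptions made for the example. They do not reproduce the real MRCM format or any extension described in this work. The point is only to show the difference between widening an existing rule and registering a new one, and to keep track of which rules are local.

```python
from dataclasses import dataclass


@dataclass
class AttributeRule:
    term: str
    domains: set
    ranges: set
    origin: str  # "international" or "local"


class ToyMrcm:
    """Hypothetical, much simplified stand-in for MRCM attribute rules."""

    def __init__(self):
        self.rules = {}

    def add_attribute(self, attr_id, term, domains, ranges, origin="local"):
        """Register a complete rule for an attribute that has none yet."""
        if attr_id in self.rules:
            raise ValueError(f"attribute {attr_id} already has a rule")
        self.rules[attr_id] = AttributeRule(term, set(domains), set(ranges), origin)

    def extend(self, attr_id, domains=(), ranges=()):
        """Widen the domain and/or range of an existing attribute."""
        rule = self.rules[attr_id]  # raises KeyError for unknown attributes
        rule.domains |= set(domains)
        rule.ranges |= set(ranges)

    def allows(self, focus_hierarchy, attr_id, value_hierarchy):
        rule = self.rules.get(attr_id)
        return (
            rule is not None
            and focus_hierarchy in rule.domains
            and value_hierarchy in rule.ranges
        )


mrcm = ToyMrcm()
mrcm.add_attribute("9001", "Toy finding site", {"Clinical finding"},
                   {"Body structure"}, origin="international")

# Domain extension: the same attribute becomes usable from another hierarchy.
print(mrcm.allows("Situation", "9001", "Body structure"))  # False
mrcm.extend("9001", domains={"Situation"})
print(mrcm.allows("Situation", "9001", "Body structure"))  # True

# New attribute: a complete local rule for a previously unavailable attribute.
mrcm.add_attribute("9901", "Toy local attribute", {"Observable entity"},
                   {"Qualifier value"})
print(mrcm.allows("Observable entity", "9901", "Qualifier value"))  # True
```

Recording the `origin` of each rule is a design choice of this sketch: it makes locally extended expressions easy to find if the international model later changes.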

Example of implementation

figure b

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Ehrsam, J., Gaudet-Blavignac, C., Mattei, M. et al. Semantics in action: a guide for representing clinical data elements with SNOMED CT. J Biomed Semant 16, 7 (2025). https://doi.org/10.1186/s13326-025-00326-5

  • DOI: https://doi.org/10.1186/s13326-025-00326-5

Keywords