Fig. 2 | Journal of Biomedical Semantics

Fig. 2

From: Chemical entity normalization for successful translational development of Alzheimer’s disease and dementia therapeutics

ChEBI entity normalization pipeline. 286,484 PubMed abstracts were queried with the keywords ‘Alzheimer’ and ‘Dementia’, resulting in 56,553 chemical mentions. Using chemical entity database resources (ChEBI ontology, PubChem), a hierarchical dictionary-based method generated ChEBI entity candidates. These candidates were disambiguated in a sentence-pair classification task, in which they were ranked by cosine similarity. We developed two models for this task: (1) the pretrained PubMedBERT and (2) PubMedBERT with continued pretraining on ChEBI converted into natural language. The maximum cosine score between the original named entity and the candidate was retained. Our method was validated on our annotated gold standard dataset and compared to the MeSH-normalized TaggerOne mentions.
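
The ranking step in the caption (score each candidate by cosine similarity to the mention, then keep the maximum) can be illustrated with a minimal sketch. It is not the authors' implementation: the vectors are toy numbers standing in for model embeddings, and the names `cosine_similarity`, `rank_candidates`, and the `CANDIDATE_*` identifiers are hypothetical placeholders introduced only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A zero vector has no direction; treat its similarity as 0.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(mention_vec, candidate_vecs):
    """Return (candidate_id, score) pairs sorted by descending cosine score."""
    scored = [
        (cand_id, cosine_similarity(mention_vec, vec))
        for cand_id, vec in candidate_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumption): in the described pipeline these would come
# from a language model such as PubMedBERT, which is not loaded here.
mention = [0.9, 0.1, 0.3]
candidates = {
    "CANDIDATE_A": [0.8, 0.2, 0.4],
    "CANDIDATE_B": [0.1, 0.9, 0.0],
    "CANDIDATE_C": [0.0, 0.0, 0.0],
}

ranked = rank_candidates(mention, candidates)
# Retain the candidate with the maximum cosine score.
best_id, best_score = ranked[0]
print(best_id, round(best_score, 3))
```

Sorting the full list rather than taking only the maximum keeps the lower-ranked candidates available for inspection, which may help when comparing against a gold standard.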
