Fig. 3
From: Using predicate and provenance information from a knowledge graph for drug efficacy screening

The most important features for a cross-validation experiment. The 20 most important features when the random forest is trained on the complete feature set are shown. The importance measures, calculated with the standard feature importance function of the random forest algorithm, have been normalized. The colours indicate whether each feature is a predicate, provenance, or overlap feature. While knowledge sources such as SemMedDB contain information about relationships between many types of entities, we only used the protein-protein interaction (PPI) subsets of these datasets.
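
As an illustration of the ranking described in the caption, the following minimal sketch normalizes a set of raw importance scores and keeps the highest-ranked features. It is not the authors' code: the helper name `top_features`, the feature names, and the scores are hypothetical, and the normalization (dividing by the sum so the scores add up to 1) is an assumption, since the caption does not state which normalization was used.

```python
def top_features(importances, k=20):
    """Return the k highest-scoring features, with scores rescaled to sum to 1.

    `importances` maps a feature name to its raw importance score.
    Normalizing by the sum is an assumption made for this sketch.
    """
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must contain at least one positive score")
    normalized = {name: score / total for name, score in importances.items()}
    # Sort by normalized score, largest first, and keep the first k entries.
    ranked = sorted(normalized.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical raw scores for one feature of each type named in the caption.
raw = {
    "predicate:INTERACTS_WITH": 6.0,
    "provenance:SemMedDB": 3.0,
    "overlap:shared_neighbours": 1.0,
}
top = top_features(raw, k=2)
```

In practice the raw scores would come from the trained random forest; here they are hard-coded so that the sketch stays self-contained.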