Fig. 3
From: Dynamic Retrieval Augmented Generation of Ontologies using Artificial Intelligence (DRAGON-AI)

Performance gap vs. confidence level. Evaluators who lacked confidence in their assessment (lower confidence level) were more likely to assign an LLM-generated definition a score comparable to that of a human-curated one. As evaluator confidence increases, the evaluator is more likely to rank the LLM-generated definition lower than the human-curated one.