Journal of Biomedical Semantics

Table 4 Comparison of versions 3.5 and 4 of the ChatGPT language models across varying runs and threshold settings, illustrating the impact on the low and high accuracy metrics

From: Explanatory argumentation in natural language for correct and incorrect medical diagnoses

LM	Run	Threshold	Low accuracy	High accuracy
v-3.5	3	20%	0.64	0.73
v-3.5	3	50%	0.79	0.82
v-3.5	5	20%	0.64	0.79
v-3.5	5	50%	0.78	0.84
v-4	3	20%	0.63	0.72
v-4	3	50%	0.8	0.82
v-4	5	20%	0.63	0.73
v-4	5	50%	0.8	0.81

ISSN: 2041-1480
