Table 4 Comparison of versions 3.5 and 4 of the ChatGPT language model across different numbers of runs and threshold settings, showing the effect on the low and high accuracy metrics

From: Explanatory argumentation in natural language for correct and incorrect medical diagnoses

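| Language model | Runs | Threshold | Low accuracy | High accuracy |
|----------------|------|-----------|--------------|---------------|
| v-3.5          | 3    | 20%       | 0.64         | 0.73          |
| v-3.5          | 3    | 50%       | 0.79         | 0.82          |
| v-3.5          | 5    | 20%       | 0.64         | 0.79          |
| v-3.5          | 5    | 50%       | 0.78         | 0.84          |
| v-4            | 3    | 20%       | 0.63         | 0.72          |
| v-4            | 3    | 50%       | 0.8          | 0.82          |
| v-4            | 5    | 20%       | 0.63         | 0.73          |
| v-4            | 5    | 50%       | 0.8          | 0.81          |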
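The caption refers to the effect of the threshold on both metrics. The sketch below makes that effect explicit. It transcribes the rows of Table 4 and computes, for each model, the average change in low and high accuracy when the threshold rises from 20% to 50%. The helper name `threshold_effect` is hypothetical, and so is the choice to weight the 3-run and 5-run settings equally. This is an illustrative summary under those assumptions, not the authors' analysis.

```python
from statistics import mean

# Rows transcribed from Table 4: (model, runs, threshold_pct, low_acc, high_acc)
ROWS = [
    ("v-3.5", 3, 20, 0.64, 0.73),
    ("v-3.5", 3, 50, 0.79, 0.82),
    ("v-3.5", 5, 20, 0.64, 0.79),
    ("v-3.5", 5, 50, 0.78, 0.84),
    ("v-4", 3, 20, 0.63, 0.72),
    ("v-4", 3, 50, 0.80, 0.82),
    ("v-4", 5, 20, 0.63, 0.73),
    ("v-4", 5, 50, 0.80, 0.81),
]


def threshold_effect(model: str) -> dict[str, float]:
    """Hypothetical helper: mean change in each metric when the threshold is
    raised from 20% to 50%, averaged with equal weight over the run settings
    (3 and 5) reported for the given model."""
    # Index rows by (model, runs, threshold) for direct lookup.
    by_key = {(m, r, t): (lo, hi) for m, r, t, lo, hi in ROWS}
    runs = sorted({r for m, r, _, _, _ in ROWS if m == model})
    low_deltas = [by_key[(model, r, 50)][0] - by_key[(model, r, 20)][0] for r in runs]
    high_deltas = [by_key[(model, r, 50)][1] - by_key[(model, r, 20)][1] for r in runs]
    return {"low": mean(low_deltas), "high": mean(high_deltas)}


if __name__ == "__main__":
    for name in ("v-3.5", "v-4"):
        effect = threshold_effect(name)
        print(f"{name}: low {effect['low']:+.3f}, high {effect['high']:+.3f}")
```

Weighting the two run settings equally is only one option. Weighting by number of runs, or reporting each run setting separately, would give slightly different summaries of the same table.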