Peer-reviewed veterinary case report
Performance of large language models versus clinicians and novices in veterinary theriogenology decision support.
- Journal: Journal of the American Veterinary Medical Association
- Year: 2026
- Authors: Okur, Damla Tuğçe et al.
- Affiliation: Faculty of Veterinary Medicine
Abstract
OBJECTIVE: To compare the clinical decision-support performance of 2 large language models (LLMs), ChatGPT-5 and ChatGPT-5 Thinking, with that of experienced clinicians and novices in veterinary theriogenology.
METHODS: Fifteen standardized obstetric and gynecologic scenarios were independently evaluated by 2 expert clinicians, 2 novice veterinarians, and both LLMs under matched, cold-start conditions. A blinded expert panel assessed the responses using a 5-point global quality score.
RESULTS: ChatGPT-5 Thinking achieved the highest overall quality ratings, followed by ChatGPT-5 and the expert clinicians. Novice veterinarians received the lowest scores. Responses generated by the LLMs were generally more consistent and complete than those of the human readers.
CONCLUSIONS: Within the constraints of a simulated scenario design, LLMs, particularly ChatGPT-5 Thinking, provided clinically appropriate guidance that exceeded novice performance and approached that of expert clinicians. These findings support the potential role of LLMs as adjunct decision-support tools in time-sensitive obstetric and gynecologic cases.
CLINICAL RELEVANCE: LLMs may assist clinicians and trainees in managing reproductive emergencies by offering rapid, structured, guideline-aligned recommendations. Further evaluation in real clinical settings is warranted.
Original publication: https://pubmed.ncbi.nlm.nih.gov/41689958/