Peer-reviewed veterinary case report
Limits of veterinary AI in reading dog abdominal X-rays from general
By Ma, Doris et al. · Published in Journal of the American Veterinary Medical Association · 2026 · Murdoch University, Australia
PetCaseFinder translated the abstract of this peer-reviewed paper into plain English so pet owners can read it. We do not publish original research; every detail traces back to the citation above.
Original publication title: Pilot study: external validation of commercial veterinary radiology artificial intelligence services shows deficiencies in interpretation of general practice-sourced canine abdominal radiographs.
- Species: dog
Plain-English summary
A pilot study tested how well six commercial artificial intelligence (AI) platforms could interpret abdominal X-rays of dogs seen in general practice, using cases whose diagnoses had already been confirmed by surgery, necropsy, CT, ultrasound, cytology, or response to treatment. Although the AI systems showed some ability to identify health problems, their accuracy was mostly low to moderate, and critical conditions such as small intestinal obstruction were often missed (detected in only 23% to 69% of affected cases, depending on the platform). Across the 53 cases, performance varied between platforms, and none appeared reliable enough for clinical use in its current form. Larger independent tests and better performance are needed before these AI tools can be safely used to help diagnose pets.
Abstract
OBJECTIVE: To evaluate the diagnostic performance of commercial veterinary radiology AI platforms on general practice canine abdominal radiographs with confirmed diagnoses.
METHODS: For this pilot study, canine abdominal radiographs with definitive diagnoses were collected and submitted to 6 AI platforms between September and December 2024. Confirmation of diagnosis was obtained with surgery, necropsy, CT, ultrasound, cytology, or treatment response when appropriate.
RESULTS: 53 cases were selected and submitted to AI platforms. After platform rejections, 307 evaluations were available for analysis. When differentiating cases with pathology (51 of 53) and without pathology (2 of 53), platform performance was variable and mostly low to moderate, including mean accuracy (70% to 90%), balanced accuracy (60% to 65%), and Matthews correlation coefficient (-0.08 to 0.43). Across all platforms, classification of radiographic findings (labels) showed low sensitivity (28% to 78%), F1 score (28% to 51%), and positive predictive value (25% to 54%) due to frequent missed diagnoses. Matthews correlation coefficient was higher (0.16 to 0.45), as it was less impacted by label misclassification. Small intestinal obstruction, a critical finding, was often not identified, with a sensitivity of 23% to 69%.
CONCLUSIONS: Diagnostic performance varied between the 6 AI platforms tested and was overall low to moderate for this small sample. Even the best-performing algorithm had notable limitations, and none appeared suitable for clinical use in their current form.
CLINICAL RELEVANCE: Further independent external validations on a larger scale and performance gains are needed before AI platforms can be safely integrated into clinical practice.
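The abstract reports accuracy of 70% to 90% alongside much lower balanced accuracy and Matthews correlation coefficient (MCC). The sketch below shows how that gap can arise when almost every case has pathology (51 of 53). It uses the standard textbook definitions of these metrics. The function name `binary_metrics` and both sets of confusion-matrix counts are hypothetical, chosen only for illustration, and are not data from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""

    def ratio(num, den):
        # Return 0.0 when the denominator is zero (a common convention).
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # share of diseased cases that were flagged
    specificity = ratio(tn, tn + fp)  # share of normal cases that were cleared
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "ppv": ratio(tp, tp + fp),  # positive predictive value (precision)
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# HYPOTHETICAL counts (not from the paper), using the same 51-to-2 class split.
# A platform that labels every radiograph "abnormal":
always_positive = binary_metrics(tp=51, fp=2, tn=0, fn=0)
# A platform that flags 40 of 51 abnormal cases and 1 of 2 normal cases:
hypothetical = binary_metrics(tp=40, fp=1, tn=1, fn=11)

for name, metrics in (("always_positive", always_positive),
                      ("hypothetical", hypothetical)):
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

With these made-up counts, the always-abnormal platform scores about 96% accuracy but only 50% balanced accuracy and an MCC of 0, which is why metrics that account for both classes are more informative than raw accuracy on a sample this imbalanced.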
Original publication on PubMed: https://pubmed.ncbi.nlm.nih.gov/41861469/