Peer-reviewed veterinary case report
Performance of a targeted enriched metagenomics approach to infer strains in milk.
- Journal: Frontiers in Veterinary Science
- Year: 2026
- Authors: Biesheuvel, Marit M et al.
- Affiliation: Faculty of Veterinary Medicine · Canada
Abstract
Strain variation plays a key role in the microbial epidemiology of […], yet its true diversity remains incompletely characterized, partly because of the limitations of culture-based methods. This study evaluated the suitability of a targeted enrichment (TE) shotgun sequencing approach to detect and classify strains in milk metagenomic samples. As a proof of concept, the accuracy of this approach was assessed using milk-derived strains. A total of 620 whole-genome sequences were downloaded from NCBI, of which 162 (26.1%) originated from milk samples. Genomes were grouped into Genomically Clustered Sequence Variants (GSVs) using MashTree and TreeCluster to enable strain-level classification. To simulate TE sequencing data, genomes from different milk-associated GSVs were randomly selected and fragmented into 150-bp reads. Mock milk samples were generated by sampling reads with replacement from these genomes. Sequencing depth was modeled with a Poisson distribution, and mixed-strain DNA samples were simulated by including 1, 3, 6, or 9 GSVs per sample. Enrichment proportions were set at 0.3, 0.5, 0.7, and 0.9. Two classification tools, Kraken2 and Themisto/mSWEEP, were evaluated for their ability to detect and classify the simulated TE reads. Themisto/mSWEEP consistently outperformed Kraken2, achieving an average read classification accuracy of 84.9% compared with 1.4% for Kraken2. Sensitivity for Themisto/mSWEEP was 100% with a single spiked GSV and declined slightly to 97.0% with nine GSVs, whereas Kraken2 achieved sensitivities of only 17.3% and 4.7%, respectively. Positive predictive value (PPV) was likewise higher for Themisto/mSWEEP: 98% vs. 4.7% for Kraken2 with a single GSV, and 65.5% vs. 10% with nine GSVs. The trends diverged, however: Kraken2's PPV increased slightly with additional GSVs, whereas Themisto/mSWEEP's PPV decreased. Both methods maintained high specificity and negative predictive value (>91%) across all scenarios. Enrichment proportion had no measurable effect on performance.
Overall, Themisto/mSWEEP demonstrated superior accuracy for GSV-level identification of strains. Enrichment to at least 30% of total reads was sufficient to recover strain-level data. Further work is needed to assess the biological relevance and practical applications of these genomic clusters.
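The mock-sample design summarized in the abstract can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' pipeline: genomes are tiled into non-overlapping 150-bp reads, the mean sequencing depth (`mean_depth`) is an arbitrary hypothetical value, spiked GSVs are assumed to be equally abundant, the enrichment proportion is treated as the probability that a read comes from a spiked GSV, and non-target reads are stood in for by random sequences labeled `background`. The helper names (`fragment`, `poisson`, `random_read`, `simulate_sample`) and the toy genomes are hypothetical.

```python
import random
from collections import Counter

READ_LEN = 150  # read length stated in the abstract


def fragment(genome: str, read_len: int = READ_LEN) -> list[str]:
    """Tile a genome into non-overlapping reads (assumption: a short tail is dropped)."""
    return [genome[i:i + read_len] for i in range(0, len(genome) - read_len + 1, read_len)]


def poisson(lam: float, rng: random.Random) -> int:
    """Exact Poisson(lam) draw: count arrivals of a rate-lam process in unit time.

    The stdlib `random` module has no Poisson sampler, so exponential
    inter-arrival times are summed until their total exceeds 1.0.
    """
    k = 0
    t = rng.expovariate(lam)
    while t <= 1.0:
        k += 1
        t += rng.expovariate(lam)
    return k


def random_read(rng: random.Random, length: int = READ_LEN) -> str:
    """Random ACGT sequence; hypothetical stand-in for non-target DNA."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def simulate_sample(gsv_genomes: dict[str, str], n_gsvs: int, enrichment: float,
                    mean_depth: float, rng: random.Random) -> list[tuple[str, str]]:
    """Return (true_label, read) pairs for one hypothetical mock milk sample."""
    spiked = rng.sample(sorted(gsv_genomes), n_gsvs)       # GSVs present in this sample
    pools = {g: fragment(gsv_genomes[g]) for g in spiked}  # per-GSV read pools
    depth = poisson(mean_depth, rng)                       # total reads, Poisson-distributed
    reads = []
    for _ in range(depth):
        if rng.random() < enrichment:                      # on-target read
            gsv = rng.choice(spiked)                       # assumption: equal GSV abundance
            reads.append((gsv, rng.choice(pools[gsv])))    # sampling with replacement
        else:
            reads.append(("background", random_read(rng)))
    return reads


# Toy example: nine hypothetical GSVs, each a random 3-kb "genome".
rng = random.Random(42)
toy_genomes = {f"GSV{i}": random_read(rng, 3000) for i in range(1, 10)}
sample = simulate_sample(toy_genomes, n_gsvs=3, enrichment=0.7, mean_depth=500, rng=rng)
print(Counter(label for label, _ in sample))
```

Looping `n_gsvs` over 1, 3, 6, and 9 and `enrichment` over 0.3, 0.5, 0.7, and 0.9 would produce the same grid of scenarios the abstract describes; the labeled reads could then be passed to a classifier and scored against their known origin.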
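The detection metrics reported above (sensitivity, PPV, specificity, and NPV) follow from a GSV-level confusion matrix. The sketch below assumes, without confirmation from the source, that each candidate GSV in each mock sample counts as one presence/absence call and that read classification accuracy is the fraction of reads assigned to their true label. `GsvConfusion` and `read_accuracy` are hypothetical names, and the two toy samples are invented for illustration, not study data.

```python
from dataclasses import dataclass


@dataclass
class GsvConfusion:
    """Accumulates GSV-level presence/absence calls across mock samples (hypothetical helper)."""
    tp: int = 0
    fp: int = 0
    fn: int = 0
    tn: int = 0

    def add_sample(self, truth: set[str], called: set[str], universe: set[str]) -> None:
        # Each candidate GSV in `universe` is one presence/absence decision per sample.
        self.tp += len(truth & called)             # spiked and detected
        self.fp += len(called - truth)             # detected but not spiked
        self.fn += len(truth - called)             # spiked but missed
        self.tn += len(universe - truth - called)  # neither spiked nor detected

    def metrics(self) -> dict[str, float]:
        def ratio(num: int, den: int) -> float:
            return num / den if den else float("nan")
        return {
            "sensitivity": ratio(self.tp, self.tp + self.fn),
            "ppv": ratio(self.tp, self.tp + self.fp),
            "specificity": ratio(self.tn, self.tn + self.fp),
            "npv": ratio(self.tn, self.tn + self.fn),
        }


def read_accuracy(true_labels: list[str], assigned_labels: list[str]) -> float:
    """Fraction of reads whose assigned label matches the simulated truth."""
    if len(true_labels) != len(assigned_labels):
        raise ValueError("label lists must have equal length")
    return sum(t == a for t, a in zip(true_labels, assigned_labels)) / len(true_labels)


universe = {f"GSV{i}" for i in range(1, 11)}
conf = GsvConfusion()
conf.add_sample(truth={"GSV1"}, called={"GSV1", "GSV2"}, universe=universe)
conf.add_sample(truth={"GSV3", "GSV4", "GSV5"}, called={"GSV3", "GSV4"}, universe=universe)
print(conf.metrics())
# {'sensitivity': 0.75, 'ppv': 0.75, 'specificity': 0.9375, 'npv': 0.9375}
print(read_accuracy(["GSV1", "GSV1", "GSV2", "background"],
                    ["GSV1", "GSV2", "GSV2", "unclassified"]))
# 0.5
```

Pooling counts across samples before dividing (micro-averaging) is only one option; averaging per-sample metrics would give different numbers, and the abstract does not state which approach was used.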
Original publication: https://pubmed.ncbi.nlm.nih.gov/41929272/