A dataset for geographical origin identification of tobacco leaves from multiple countries using near-infrared spectroscopy and chemometric analysis.
- Year: 2026
- Authors: Chen H et al.
- Affiliation: Zhengzhou Tobacco Research Institute of CNTC · China
Abstract
This dataset comprises 347 complete spectra of tobacco leaves, acquired using an Antaris II Fourier Transform Near-Infrared spectrometer equipped with an integrating sphere diffuse reflectance sampling system. The samples were collected from six countries: Argentina, Brazil, Zimbabwe, the United States, Tanzania, and Zambia. All samples were dried using an FD240 oven (Binder GmbH, Germany), ground using a ZM200 grinder (Retsch GmbH, Germany), and sieved through a 0.250 mm mesh screen. NIR spectra were acquired with the following parameters: a spectral range of 4000-10,000 cm⁻¹, a resolution of 8 cm⁻¹, and 64 scans. The dataset was partitioned into a training set (70%) and a validation set (30%) using stratified sampling. Six preprocessing techniques were applied: Savitzky-Golay (SG) smoothing, Multiplicative Scatter Correction (MSC), Standard Normal Variate (SNV), First Derivative (1D), Second Derivative (2D), and Mean Centering. Partial Least Squares (PLS) regression was utilized to establish predictive models for 13 chemical indicators: total alkaloids, reducing sugars, total sugars, total nitrogen, K, Cl, pH, starch, neochlorogenic acid, chlorogenic acid, cryptochlorogenic acid, scopoletin, and rutin. The Random Forest (RF) algorithm was employed to create the origin classification model. Ultimately, through the selection of appropriate spectral preprocessing methods, the quantitative prediction models established using this dataset all achieved coefficients of determination (R²) exceeding 0.83, demonstrating robust predictive performance. Furthermore, the geographical origin classification models yielded an overall validation accuracy greater than 0.8, indicating strong classification performance. Consequently, this dataset is confirmed to be accurate and reliable, capable of providing foundational data for other researchers to construct near-infrared (NIR) spectral databases and develop NIR prediction models.
The developed models exhibit excellent predictive capabilities and can be utilized in practical applications as an alternative to classical chemical analysis methods for these chemical indicators.
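As a rough illustration of two of the steps named in the abstract, the sketch below applies Standard Normal Variate (SNV) correction to individual spectra and performs a 70/30 stratified split by country label. It is not the authors' code: the function names `snv` and `stratified_split`, the toy spectra, the label counts, the random seed, and the use of the sample standard deviation are all assumptions made for this example, and only the Python standard library is used.

```python
import random
import statistics
from collections import defaultdict


def snv(spectrum):
    """Standard Normal Variate: center one spectrum and scale it by its own std."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)  # assumption: sample standard deviation (n - 1)
    return [(value - mean) / std for value in spectrum]


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, validation_indices), splitting each class separately."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, validation = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        validation.extend(indices[cut:])
    return sorted(train), sorted(validation)


# Toy values (hypothetical, not taken from the dataset): the second
# "spectrum" is the first one multiplied by 2, mimicking a scatter effect.
spectra = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
corrected = [snv(s) for s in spectra]

# Hypothetical labels: 10 samples from each of two origins.
labels = ["Brazil"] * 10 + ["Zambia"] * 10
train_idx, val_idx = stratified_split(labels)
```

Because SNV normalizes each spectrum by its own mean and standard deviation, the two toy spectra become numerically identical after correction, which is the multiplicative-scatter behavior the technique targets. Splitting per class keeps each origin's share the same in the training and validation sets.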
Original publication: https://europepmc.org/article/MED/41561893