Ternikar C. R., Gomez Cécile, Dutta D., Kumar D. N. (2025). Nearest neighbor versus regression approach: effect of performance measures, calibration set size, and sampling method on soil organic carbon prediction using VNIR lab spectroscopy. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 18, 25583-25604. ISSN 1939-1404.
Document title
Nearest neighbor versus regression approach: effect of performance measures, calibration set size, and sampling method on soil organic carbon prediction using VNIR lab spectroscopy
Ternikar C. R., Gomez Cécile, Dutta D., Kumar D. N.
Source
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025, 18, 25583-25604. ISSN 1939-1404
Soil organic carbon (SOC) plays a critical role in soil health, agricultural productivity, and ecosystem functioning, making accurate SOC estimation essential for sustainable land management and climate change mitigation. Visible and near-infrared (VNIR) spectroscopy has emerged as a promising, nondestructive, and cost-effective method for SOC estimation. This study evaluates the performance of nine nearest neighbor (NN) models and the partial least squares regression (PLSR) model in estimating SOC from the global open soil spectral library data. Detailed error analyses and the use of mean absolute error (MAE) as a performance metric revealed differences in model performance that traditional metrics such as R², RMSE, and the ratio of performance to deviation fail to capture on their own. Error correlation analysis further indicated that o_plsd (optimized partial least squares distance, one of the NN models) and PLSR provide structurally independent insights, while certain pairs of NN models (pcad - plsd and o_plsd - o_pcad) yield redundant information. Among the ten models tested, the o_plsd model outperformed PLSR by leveraging local data density, exhibiting a lower MAE (1.79% versus 2.36%), but it was more sensitive to reductions in calibration set size. In contrast, PLSR demonstrated better generalizability, with less sensitivity to calibration set size but relatively higher sensitivity to the choice of sampling method. Future research should focus on strategies to improve the computational efficiency of NN models. The findings highlight the importance of performance metric selection and calibration strategy in large-scale SOC modeling. These results have practical implications for improving SOC prediction models and designing efficient hybrid approaches for large, heterogeneous soil datasets.
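The abstract's point that MAE can separate models that look identical under R², RMSE, and the ratio of performance to deviation (RPD) can be illustrated with a minimal sketch. The function name `regression_metrics` and all numbers below are hypothetical and are not taken from the study; RPD is computed here as the sample standard deviation of the observations divided by RMSE, which is one common convention and is assumed rather than confirmed by the source.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, RMSE, R^2 and RPD for paired observed/predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    # Assumed convention: RPD = sample SD of observations / RMSE.
    rpd = math.sqrt(ss_tot / (n - 1)) / rmse
    return {"mae": mae, "rmse": rmse, "r2": r2, "rpd": rpd}


# Hypothetical SOC values (%), chosen only to illustrate the effect.
observed = [1.0, 2.0, 3.0, 4.0]
pred_a = [2.0, 3.0, 4.0, 5.0]   # every prediction off by 1
pred_b = [1.0, 2.0, 3.0, 6.0]   # three exact predictions, one off by 2

a = regression_metrics(observed, pred_a)
b = regression_metrics(observed, pred_b)
print(a)
print(b)
```

Both hypothetical models have the same residual sum of squares, so RMSE, R², and RPD coincide, while MAE differs (1.0 versus 0.5) because it weights a single large error less heavily than RMSE does.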
Classification scheme
Basic sciences / Analysis and research techniques [020]; Soil science [068]