Using vis-NIRS and Machine Learning methods to diagnose sugarcane
soil chemical properties
Diego A. Delgadillo-Duran^a, Cesar A. Vargas-García^a, Viviana M.
Varón-Ramírez^a, Francisco Calderón^b, Andrea C. Montenegro^a,
Paula H. Reyes-Herrera^a
^a Corporación Colombiana de Investigación Agropecuaria, CI
Tibaitatá, Bogotá, Colombia
^b School of Engineering, Pontificia Universidad Javeriana, Bogotá,
Colombia
Abstract
Knowing soil chemical properties can be decisive for crop
management and total yield. Traditional property estimation
approaches are time-consuming and require complex lab setups, which
keeps farmers from promptly taking steps towards optimal practices
in their crops. Property estimation from spectral signals
(vis-NIRS) has emerged as a low-cost, non-invasive, and
non-destructive alternative. Current approaches use mathematical
and statistical techniques and do not use a machine learning
framework. Here we propose both regression and classification with
machine learning techniques to assess performance in predicting
common soil properties (pH, soil organic matter, Ca, Na, K, and Mg)
and inferring their categories, evaluated with the most common
metrics. In sugarcane soils, we use regression to estimate
properties and classification to assess the status of each soil
property, and we report the direct relation between spectral bands
and direct measurements of certain properties. In both cases, we
achieved performance similar to that reported in the literature for
similar setups.
Keywords: Vis-NIR, Soil properties, Machine learning
1. Introduction
As the population grows, the demand for food continues to
increase, while unsustainable practices reduce the area of arable
soil. Soils are dynamic systems that change in response to
different natural and anthropogenic activities. Soil health must be
a priority, particularly in agricultural practices, to increase
productivity without degrading the soil. It is essential to monitor
soil quality through physicochemical analyses to provide a specific
assessment oriented towards sustainability [1].
Email addresses: [email protected] (Diego A. Delgadillo-Duran),
[email protected] (Paula H. Reyes-Herrera)
Preprint submitted to Catena January 22, 2021
arXiv:2012.12995v2 [cs.LG] 20 Jan 2021
Laboratory analysis is widely used to determine soil properties;
it relies on traditional chemical analyses that are expensive,
time-consuming, and generate environmental contamination due to the
number of chemical reagents used [2]. There is currently a growing
demand for immediate results. The search for alternatives to
conventional laboratory analysis has made the NIRS technique a
potential candidate. Visible and near-infrared reflectance
spectroscopy (vis-NIRS) has emerged as a precision agriculture
technique to monitor soil physicochemical characteristics in the
field and the laboratory. This non-destructive analysis method is
used because it is cost-effective, gives rapid results, and can
infer multiple components simultaneously from a single spectrum.
Also, it does not require chemical agents in the analysis
procedure; thus, it is not harmful to the environment.
vis-NIRS is a method based on the absorption of light by
different materials in the visible and near-infrared region
(400-2500 nm) of the electromagnetic spectrum [3][4]. Materials
absorb specific frequencies when irradiated with visible-NIR light.
Absorption occurs when the incoming light frequency corresponds to
the molecular vibration frequency of a constituent in the sample. A
detector monitors the portion of the light that is reflected and
decomposes it into its components at different frequencies of the
spectrum with the corresponding magnitudes.
Nevertheless, processing raw NIRS data requires (1) using
advanced mathematical and statistical analysis to provide
information on what substance is present in the sample and how
much, and (2) performing good calibrations that guarantee the
values obtained and the associated accuracy. Soil vis-NIR spectra
are largely nonspecific because of the overlapping absorption of
soil constituents. Complex absorption patterns generated from soil
constituents and quartz need to be mathematically extracted from
the spectra [5]. The most frequently used method to estimate
chemical properties from vis-NIRS is Partial Least Squares
Regression (PLSR), but it hides the nonlinear relationships between
the spectrum and the soil constituents [6].
The use of machine learning (ML) in soil science has increased in
the last decade [7], also impacting the use of infrared spectral
data to infer soil properties [8][9]. Recent studies [10][11][7][6]
adopt ML approaches (SVM, neural networks, random forest, and
Cubist) to estimate organic carbon and organic matter, cation
exchange capacity, pH, clay content, and nitrogen from vis-NIR
spectra of fresh and processed samples. However, soil properties
depend on the soil-forming factors and processes in a specific
region, and the performance of ML approaches depends on the
training set. Therefore, models trained with data from other
locations are not easily transferred.
Colombia is a country with a diversity of soils. At a 1:100,000
scale, the IGAC institute has identified in Colombia 11 of the 12
soil orders of the USDA classification [12]. Previous studies in
Colombia used NIRS in an oxisol to predict total carbon and total
nitrogen and incorporated these predictions into maps using
geostatistical techniques in a region of about 5100 hectares [13].
Later, NIRS was also found useful to predict clay content in the
same study area [14]. However, we are not aware of any study
exploring ML and vis-NIRS in the country.
In this study, we apply vis-NIRS and ML approaches to Colombian
soil samples from sugarcane-for-panela crops with a two-fold
purpose. First, to evaluate the capacity of ML approaches to
estimate six chemical properties: pH, organic matter (OM), calcium
(Ca), magnesium (Mg), sodium (Na), and potassium (K) content. We
compare the selected ML model for each property with two scenarios
that simulate traditional chemometric techniques: (1) using the
band with the highest regression coefficient(s) and (2) Partial
Least Squares Regression (PLSR) [15][16]. Second, to estimate the
values of soil properties as a first step towards tuning a
recommendation. Moreover, we use ML classification to infer
categories for soil properties to see whether this is a viable
alternative.
2. Materials and methods
2.1. Data
2.1.1. Study area and sample collection procedure
We used a data set derived from a previous study in the Hoya del
río Suárez region in Colombia (coordinates: 73°22’ - 73°39’ West
longitude and 5°53’ - 6°10’ North latitude). This region covers an
area of about 470 km2. Entisols, inceptisols, and vertisols
characterize this region, according to the soil survey [12]. The
area has two principal crops: sugarcane for the panela
agro-industry and grasslands.
Sampling took place during 2015 and 2016; samples were taken from
the surface to a depth of 20 cm. Each sample point (Figure 1)
represents four sub-samples collected and mixed; the sampled area
corresponds to a regular grid of 700 meters.
Figure 1: Sampling area for 653 points.
2.1.2. Chemical measurements
We dried the samples and analyzed them with classic laboratory
methods. We measured pH with a pH meter in a 1:2.5 soil-water
suspension (NTC 5264, 2008) and organic matter (OM) with Walkley
and Black’s wet digestion method. We used the ammonium acetate
extraction method to measure exchangeable cations (Ca+2, K+, Mg+2,
and Na+) following NTC 5349 - 2008 [17].
2.2. Methods
We transform the spectrum to obtain informative features and use
five ML regression models from scikit-learn [18] in Python and
classification models from the Statistics and Machine Learning
Toolbox in MATLAB. The coefficient of determination R2 and the
correlation coefficient ρ helped us to select the model. For some
properties, none of the ML regression models is promising
(ρ > 0.6); we did not proceed in this case because we consider that
ML regression is not suitable for the property.
In all cases, we trained an ML classifier to infer categories for
soil properties to see whether this is a viable alternative. We
selected the classification model using accuracy and then used a
grid search over the penalties in the confusion matrix to look for
a model that handles the class imbalance. Finally, we performed
feature selection to identify the wavelengths that have the
greatest effect on each property model.
Features and Preprocessing: The vis-NIR spectra for each sample
cover the range between 400 and 2491 nm in steps of 8.5 nm (a
vector of 247 elements). We took each data point as a feature and
applied transformations to the spectra: the first derivative (D1),
the second derivative (D2), and the Fast Fourier Transform (FFT).
We standardized each feature over the whole sample dataset to
ensure zero mean and unit variance [19], and concatenated the
feature sets, resulting in 247 x 4 = 988 features for every sample.
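The feature construction described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the text does not say how the derivatives were computed or how the complex FFT output was turned into 247 real features, so the sketch uses finite differences (`np.gradient`) and the FFT magnitude. The function name `build_features` and the random spectra are hypothetical.

```python
import numpy as np

def build_features(spectra):
    """Stack spectrum, D1, D2 and FFT magnitude, each standardized per feature.

    spectra: array of shape (n_samples, n_bands).
    Assumptions (not stated in the text): finite-difference derivatives
    and the magnitude of the full FFT as the Fourier features.
    """
    d1 = np.gradient(spectra, axis=1)              # first derivative along wavelength
    d2 = np.gradient(d1, axis=1)                   # second derivative
    fft_mag = np.abs(np.fft.fft(spectra, axis=1))  # same length as the spectrum

    scaled = []
    for block in (spectra, d1, d2, fft_mag):
        mean = block.mean(axis=0)
        std = block.std(axis=0)
        std[std == 0] = 1.0                        # guard against constant columns
        scaled.append((block - mean) / std)        # zero mean, unit variance
    return np.hstack(scaled)

# Wavelength grid from the text: 400 to 2491 nm in steps of 8.5 nm.
wavelengths = 400.0 + 8.5 * np.arange(247)

# Hypothetical random spectra standing in for the 653 soil samples.
rng = np.random.default_rng(0)
spectra = rng.random((653, wavelengths.size))
X = build_features(spectra)
print(wavelengths[-1], X.shape)
```

With 247 bands and four blocks, the resulting matrix has the 988 columns mentioned in the text.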
2.2.1. ML regression models
The dataset contains 653 samples and 988 features for six soil
properties. First, we randomly split this dataset into 70% for
training (to evaluate models and adjust parameters) and 30%
reserved for validation only.
We evaluated four regression models from [18]: (1) linear
regression (LR), (2) support vector regression (SVR) with a linear
kernel, (3) LASSO tuned by cross-validation, and (4) a multilayer
perceptron neural network.
Cross-validation: We selected the best model using 5-fold
cross-validation on the 70% defined for training, with the 988
features. We based the selection on the distributions of the
correlation (ρ) and determination (R2) coefficients and of the mean
squared error (MSE).
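A minimal sketch of this selection step with scikit-learn is given below. It assumes synthetic data in place of the soil features, and the hyperparameters (hidden layer size, iteration limit, random seeds) are illustrative choices, not the settings used in the study. Only R2 and MSE are scored here; the correlation ρ would need a custom scorer.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import KFold, cross_validate, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Hypothetical synthetic data standing in for the 653 x 988 feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 30))
y = X[:, :3] @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.normal(size=150)

# 70% for training and model selection, 30% held out.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# The four model families named in the text; settings are assumptions.
models = {
    "LR": LinearRegression(),
    "SVR (linear kernel)": SVR(kernel="linear"),
    "LASSO (CV-tuned)": LassoCV(cv=5),
    "MLP": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_validate(
        model, X_train, y_train, cv=cv,
        scoring=("r2", "neg_mean_squared_error"),
    )
    results[name] = {
        "r2": scores["test_r2"].mean(),
        "mse": -scores["test_neg_mean_squared_error"].mean(),
    }
    print(f"{name}: R2={results[name]['r2']:.3f}  MSE={results[name]['mse']:.3f}")
```

The held-out 30% (`X_test`, `y_test`) is deliberately left untouched in this loop, so that it can serve for the final comparison only.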
Comparison of ML regression to chemometric approaches: We used
the 30% test set to compare the selected model against (1) a
regression on the band with the highest correlation coefficient
with the target label, and (2) Partial Least Squares with six
components from [18].
2.2.2. ML classifiers
Classes: We defined the target classes for the properties
according to soil fertility requirements for sugarcane-for-panela
crops: K (Low: 0.4), Na (Acceptable: 1), pH (acidity correction:
7.3), Mg (Low: < 1.5, Medium: 3-5, High: > 5), Ca (Low: 5), OM
(Low: 5). However, this definition led to imbalanced classes.
Classification Models: Thanks to the data pre-processing
described earlier and the practicality of modern ML tools, we were
able to make an initial selection and evaluation of 24 ML models.
These 24 classifiers are divided into six groups: (1) three based
on binary trees, (2) linear and quadratic discriminants, (3) Naive
Bayes and Kernel Naive Bayes, (4) Support Vector Machines (SVM)
with six different kernel configurations, (5) KNN with six
different distance metrics, and (6) five with ensemble-based
architectures.
Cross-validation: Due to the imbalance of the classes, we opted
to perform 5-fold cross-validation and selected the best-performing
ML model from the 24 available in the toolbox. Cross-validation
gives us an estimate of the predictive accuracy of the final model
trained with all the data. It requires multiple fits but makes
efficient use of all the data, so it is recommended for small data
sets [20].
Misclassification cost grid search: Also, to choose the
classifier configuration in the face of class imbalance, we apply a
penalty to misclassifications during training. This cost applies to
all Type I and Type II errors in the confusion matrix; by default,
ML models assign a cost of one to every error and zero to the
diagonal of the confusion matrix. We use a grid search to select
the best-performing combination of misclassification costs.
For pH, OM, Ca, Mg, and K, we use a grid search over 6 parameters
corresponding to all Type I and II errors in a three-class
confusion matrix; each parameter varied over gs = {1, 2, ..., 7},
for a total of 7^6 = 117649 different cross-validation experiments.
For Na, we optimized the two misclassification costs and enlarged
the grid to gs = {1, 2, ..., 150}, for a total of 150^2 = 22500
cross-validations. Finally, to evaluate the performance, we use the
Matthews correlation coefficient as our preferred metric due to our
dataset’s imbalance [21].
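The grid sizes quoted above follow from enumerating every combination of costs, and the Matthews correlation coefficient (MCC) can be computed directly from a confusion matrix. The sketch below shows both, using the standard multiclass generalization of MCC; whether the study used exactly this form is an assumption. The example counts are those of the pH confusion matrix in Figure 2 (rows are true classes, columns are predicted classes).

```python
import math
from itertools import product

def mcc(confusion):
    """Multiclass Matthews correlation coefficient (rows = true, cols = predicted)."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    true_counts = [sum(confusion[i]) for i in range(k)]
    pred_counts = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    num = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    den = math.sqrt(total ** 2 - sum(p * p for p in pred_counts)) * \
          math.sqrt(total ** 2 - sum(t * t for t in true_counts))
    return num / den if den else 0.0

# Six off-diagonal costs, each in {1, ..., 7}: one experiment per combination.
n_three_class = sum(1 for _ in product(range(1, 8), repeat=6))
# Two off-diagonal costs, each in {1, ..., 150}.
n_two_class = sum(1 for _ in product(range(1, 151), repeat=2))
print(n_three_class, n_two_class)

# Counts from the pH confusion matrix (High, Medium, Low).
ph_confusion = [[348, 62, 4],
                [13, 104, 21],
                [4, 31, 66]]
ph_accuracy = sum(ph_confusion[i][i] for i in range(3)) / 653
ph_mcc = mcc(ph_confusion)
print(round(ph_accuracy, 3), round(ph_mcc, 3))
```

Unlike accuracy, MCC uses the row and column totals of the matrix, so a classifier that mostly predicts the largest class is penalized.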
2.2.3. Feature ranking
Finally, we propose a feature ranking approach to unveil the
relationship between each band of the spectrum and the properties.
First, we obtained and normalized the correlation coefficient and
the LASSO ranking between each spectral band (and its
transformations, such as the first and second derivatives) and the
target label in the training set. We then added the coefficients
for each band (over the spectrum and the first and second
derivatives) to obtain a single value per band, similar to the
traditional chemometric approach.
[Figure 2 (image not reproduced). Regression panels: A. pH, B. OM, C. Ca, D. Mg, each plotting Predicted against True values (one panel on Log10 axes); legend: Reference, All features, Best feature, PLSR with 6 components. Classification panels: E. pH, F. OM, G. Ca, H. K, I. Mg, J. Na, each a confusion matrix of True against Predicted classes (High, Medium, Low; one matrix has only Low and Medium).]
Figure 2: Regression results: A. pH, B. Organic matter, C. Ca, D.
Mg. For each property, we present the result with the best ML model
(red) together with the results that simulate chemometric
techniques: the regression with the band with the highest
correlation (blue) and PLSR (green). Classification results:
confusion matrix for each property: E. pH, F. OM, G. Ca, H. K,
I. Mg, J. Na.
3. Results
Figure 2 shows the prediction results using both regression and
classifiers. We get the best pH estimates using an SVR regressor,
with a correlation between true and predicted values in the test
set of ρ = 0.90 (R2 = 0.80). When using the feature that best
correlates with pH (see Figure 3), we get ρ = 0.70 (R2 = 0.48)
using linear regression. SVR performed slightly better than PLSR
(ρ = 0.87 and R2 = 0.75). Both SVR and PLSR significantly improved
the accuracy of the pH estimates over the best-feature regressor,
as shown by the non-overlapping 95% confidence intervals.
Table 1: Model comparison for each property: correlation (ρtest)
and determination (R2test) coefficients, and MSE in the test set.
The models presented are the best result of the ML regression and
the two approaches that simulate chemometric techniques: (1) linear
regression with the highest correlated band and (2) PLSR.

Property  Model  Model detail        ρtest  R2test  MSEtest
pH        SVR    All bands           0.898  0.802   0.270
          LR     Band D1 at 621      0.694  0.479   0.709
          PLSR   6 components        0.865  0.745   0.347
OM        LASSO  Selected by model   0.620  0.372   6.364
          LR     Band D1 at 1913     0.471  0.220   7.907
          PLSR   6 components        0.611  0.359   6.495
Ca        LASSO  Selected by model   0.746  0.541   70.371
          LR     Band D1 at 612.5    0.545  0.280   110.480
          PLSR   6 components        0.691  0.424   88.284
Mg        LASSO  Selected by model   0.649  0.415   0.246
          LR     Band D1 at 621      0.454  0.193   0.340
          PLSR   6 components        0.634  0.399   0.253
K         SVR    All bands           0.478  0.167   0.022
          SVR    Band D2 at 493.5    0.173  0.022   0.026
          PLSR   6 components        0.360  0.019   0.026
Na        LASSO  Selected by model   0.253  0.060   0.053
          LR     Band D2 at 1751.5   0.145  0.013   0.055
          PLSR   6 components        0.276  0.056   0.053
We obtained OM estimates with a correlation of ρ = 0.62 with the
ground-truth values (R2 = 0.37) using the LASSO regressor. Compared
to the best-feature regressor (ρ = 0.47, R2 = 0.22), LASSO and PLSR
significantly improved the estimates (non-overlapping 95%
confidence intervals). Ca estimates using LASSO correlated with the
true values at ρ = 0.75 (R2 = 0.54), a significant increase in
accuracy compared with the best-feature regressor. LASSO also
slightly improved on the PLSR estimates, although not
significantly. Mg estimates using LASSO and PLSR were similar
(ρ = 0.65, R2 = 0.42) and significantly different from those of the
regressor based on the best-correlated feature. We tested several
regression models on the remaining soil properties (K and Na),
obtaining a correlation ρ below 0.5. Table 1 summarises the
regression results, presenting the best ML regressor and the
results simulating chemometric approaches.
The performance of the best ML classifiers, based on labels
pre-defined by experts, is shown in Figure 2. Averaged over all
properties, we obtain an accuracy of 74%. For pH, the accuracy for
each label is over 65%, and every label represents at least 15% of
the samples. However, for OM, K, and Na, some labels were
under-represented (5% or less of the training dataset), and the
accuracy for them decreased to 30-40%. It is worth noting that, for
K, labels are predicted correctly most of the time, except for
Medium levels. For Na, the Low label is correctly assigned 99% of
the time; however, the under-represented Medium values are
mislabeled more than 60% of the time.
Finally, Figure 3.A shows the feature ranking for each property,
highlighting the regions with distinct absorption (towards red).
The visible 450-670 nm range contains highly ranked features for
all properties. pH and Ca have similar feature ranking heatmaps,
with the highest-ranked bands around 600 nm. Mg has a highly
correlated range at 600-670 nm, while K has a highly ranked area
near 500 nm. Na, instead, presents highly ranked features between
2100 and 2400 nm.
[Figure 3 (image not reproduced). Panel A: heatmap of the feature-ranking score (colour scale 2 to 4) by wavelength (400-2500 nm) for the properties Ca, K, Mg, Na, OM, and pH. Panel B: absorption band positions by wavelength (400-2500 nm) for the groups Aliphatics, Alkyl asymmetric-symmetric doublet, Amides, Amine, Aromatics, Carbohydrates, Carbonate, Carboxylic acids, Goethite, Haematite, Hydroxyl, Illite, Kaolin doublet, Methyls, Phenolics, Polysaccharides, Smectite, and Water.]
Figure 3: A. Bands with the highest correlation for each
property. B. NIRS spectra and bands with relative peak positions
for the absorption of soil constituents.
4. Discussion
Classifiers, misclassification cost and oversampling
We proposed an alternative tool for property estimation from soil
samples by recasting the regression as a classification problem. We
labeled conventional test results depending on the property to be
estimated, using standard mappings from the literature of real
property values into qualitative classes. For our dataset, such
mappings resulted in unbalanced classes, some with few samples
(classes with less than 5% of the samples). The surveyed ML models
classified the test samples mostly into the label with the largest
training set. We marginally improved the classification of the
under-represented labels by introducing weighted metrics in the
cross-validation stage. We also tested oversampling approaches to
synthetically balance our training dataset; however, weighted
metrics outperformed that approach. These classifiers can be used
as a qualitative assessment tool that might help in optimal
sampling design for further expensive conventional lab tests or in
the design of initial intervention plans.
vis-NIRS to diagnose soil condition
In sugarcane soils, Viscarra Rossel et al. [4] used vis-NIRS to
predict soil properties and moved towards a soil fertility index.
They used 184 soil samples and PLSR to estimate soil properties, in
addition to 17 terrain attributes, to derive the index. Awiti et
al. [22] used vis-NIR in an odds logistic model to classify soil
into good, average, and poor condition. The use of vis-NIRS and ML
is a rapid strategy that offers the possibility of diagnosing soil
conditions. This study is the first step in evaluating the
performance of vis-NIRS, ML regressors, and classifiers, but we
look forward to moving into soil diagnosis.
Bands with highest correlations and chemical hypotheses
Regions with highly ranked features are centered near 500, 600,
1400, 1700, 1900, 2200, and 2400 nm (Figure 3). For pH (H2O),
absorptions near 500 and 600 nm are primarily associated with
minerals containing hematite and goethite [23, 5], while those near
600 nm result from chromophores and the darkness of organic matter.
In the vis-NIR range, the overtones and combination bands due to
organic matter result from the stretching and bending of CO, CH,
and NH groups [24]. The band around 1400 nm is linked to the
vibration of OH and residual water in organic matter [25]. On the
other hand, the wavelengths at 1700 and 1930 nm are assigned to the
groups (C-H) and (C=O), which correspond to aromatic asymmetric
alkyl-symmetric doublet and carboxylic acids, respectively [26].
These bands have been identified as important for organic matter
calibration [5]. The band near 2200 nm can be attributed to
metal-OH bend plus O-H stretch combinations of several clay
minerals, among them illitic types [27], organic compounds, and
carbonate. The wavelength at 2350 nm is related to Mg-OH [28].
Finally, in Figure 3 the region between 500 and 600 nm has a high
correlation with the chemical parameters analyzed, which could be
related both to the dissolution mechanisms of iron oxides within
the soils, particularly within the rhizosphere (protonation,
reduction, complexation) [29], and to the reactions of organic
matter (humic and fulvic acids) with cations (Ca+2, K+, Mg+2, and
Na+) [30, 31, 32, 33, 34]. Although there is no direct association
between the properties and the NIRS, highly ranked features could
be associated with property components.
5. Conclusions
ML regressors using a combination of the spectra, their first and
second derivatives, and FFT features as input were the best models
for pH, OM, Ca, and Mg soil content. Although the estimation
performance is close to that reported in the literature, it is
critical to increase the number of samples, adding soil samples
with extreme values to enhance prediction power. ML classifiers are
a feasible strategy when ML regressors perform poorly. Also, ML
classifiers can be used as a qualitative assessment tool for
optimal sampling design.
The feature ranking approach gives the researcher insight into
the bands that highly correlate with each property. It is essential
to understand what is behind ML approaches; thus, feature ranking
is the first step in getting back to the data.
6. Data availability upon acceptance
The filtered datasets and scripts are archived on GitHub
(available upon acceptance).
7. Acknowledgments
Special thanks to Oscar Daniel Torres Rodríguez and Andrés Felipe
Mariño Guerra for a preliminary study. We are also grateful to the
AGROSAVIA project 243, Recomendaciones técnicas preliminares de
manejo de suelos en ladera para el sistema de producción de caña
panelera en la HRS, which obtained the data used in this study.
References
[1] E. K. Bünemann, G. Bongiorno, Z. Bai, R. E. Creamer, G. De
Deyn, R. de Goede, L. Fleskens, V. Geissen, T. W. Kuyper, P. Mäder,
et al., Soil quality–a critical review, Soil Biology and
Biochemistry 120 (2018) 105–125.
[2] M. R. Nanni, J. A. M. Demattê, Spectral Reflectance
Methodology in Comparison to Traditional Soil Analysis, Soil
Science Society of America Journal 70 (2006) 393–407. URL:
http://doi.wiley.com/10.2136/sssaj2003.0285.
doi:10.2136/sssaj2003.0285.
[3] J. C. Cañasveras, V. Barrón, M. C. del Campillo, R. A.
Viscarra Rossel, Espectroscopía de reflectancia: Una herramienta
para predecir las propiedades del suelo relacionadas con la
clorosis férrica, Spanish Journal of Agricultural Research 10
(2012) 1133–1142. doi:10.5424/sjar/2012104-681-11.
[4] R. Viscarra Rossel, R. Rizzo, J. Demattê, T. Behrens,
Spatial modeling of a soil fertility index using
visible–near-infrared spectra and terrain attributes, Soil Science
Society of America Journal 74 (2010) 1293–1300.
[5] B. Stenberg, R. A. Viscarra Rossel, A. M. Mouazen, J.
Wetterlind, Chapter five - Visible and near infrared spectroscopy
in soil science, in: D. L. Sparks (Ed.), Advances in Agronomy,
volume 107, Academic Press, 2010, pp. 163–215. URL:
http://www.sciencedirect.com/science/article/pii/S0065211310070057.
doi:10.1016/S0065-2113(10)07005-7.
[6] M. Yang, D. Xu, S. Chen, H. Li, Z. Shi, Evaluation of
machine learning approaches to predict soil organic matter and pH
using vis-NIR spectra, Sensors (Switzerland) 19 (2019).
doi:10.3390/s19020263.
[7] J. Padarian, B. Minasny, A. B. McBratney, Machine learning
and soil sciences: A review aided by machine learning tools, Soil 6
(2020) 35–52.
[8] J. Ding, A. Yang, J. Wang, V. Sagan, D. Yu,
Machine-learning-based quantitative estimation of soil organic
carbon content by vis/NIR spectroscopy, PeerJ 6 (2018) e5714.
[9] M. Yang, D. Xu, S. Chen, H. Li, Z. Shi, Evaluation of
machine learning approaches to predict soil organic matter and pH
using vis-NIR spectra, Sensors 19 (2019) 263.
[10] A. Morellos, X.-E. Pantazi, D. Moshou, T. Alexandridis, R.
Whetton, G. Tziotzios, J. Wiebensohn, R. Bill, A. M. Mouazen,
Machine learning based prediction of soil total nitrogen, organic
carbon and moisture content by using vis-NIR spectroscopy,
Biosystems Engineering 152 (2016) 104–116.
[11] S. Nawar, A. Mouazen, On-line vis-NIR spectroscopy
prediction of soil organic carbon using machine learning, Soil and
Tillage Research 190 (2019) 120–127.
[12] IGAC, SUELOS Y TIERRAS DE COLOMBIA, 3 ed., Instituto
Geográfico Agustín Codazzi, 2015.
[13] J. H. Camacho-Tamayo, Y. Rubiano S, M. d. P. Hurtado S,
Near-infrared (NIR) diffuse reflectance spectroscopy for the
prediction of carbon and nitrogen in an oxisol, Agronomia
Colombiana 32 (2014) 86–94.
[14] J. H. Camacho-Tamayo, N. M. Forero-Cabrera, L.
Ramírez-López, Y. Rubiano, Near-infrared spectroscopic assessment
of soil texture in an oxisol of the eastern plains of Colombia,
Colombia Forestal 20 (2017) 5–18.
[15] D. Cozzolino, A. Morón, The potential of near-infrared
reflectance spectroscopy to analyse soil chemical and physical
characteristics, Journal of Agricultural Science 140 (2003) 65–71.
doi:10.1017/S0021859602002836.
[16] R. Zornoza, C. Guerrero, J. Mataix-Solera, K. Scow, V.
Arcenegui, J. Mataix-Beneyto, Near infrared spectroscopy for
determination of various physical, chemical and biochemical
properties in Mediterranean soils, Soil Biology and Biochemistry 40
(2008) 1923–1930.
[17] S. B. Aguiar Herrera, Bases técnicas para el
establecimiento y manejo del cultivo de caña en el departamento de
Casanare, 1 ed., Corpoica, 2001.
[18] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B.
Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V.
Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M.
Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python,
Journal of Machine Learning Research 12 (2011) 2825–2830.
[19] P. Juszczak, D. M. J. Tax, R. P. W. Duin, Feature scaling
in support vector data description, 2002.
[20] S. J. Russell, P. Norvig, et al., Artificial intelligence:
a modern approach, Pearson, 2010.
[21] D. Chicco, G. Jurman, The advantages of the Matthews
correlation coefficient (MCC) over F1 score and accuracy in binary
classification evaluation, BMC Genomics 21 (2020) 6. URL:
https://doi.org/10.1186/s12864-019-6413-7.
doi:10.1186/s12864-019-6413-7.
[22] A. O. Awiti, M. G. Walsh, K. D. Shepherd, J. Kinyamario,
Soil condition classification using infrared spectroscopy: A
proposition for assessment of soil condition along a tropical
forest-cropland chronosequence, Geoderma 143 (2008) 73–84.
[23] R. V. Morris, H. V. Lauer, C. A. Lawson, E. K. Gibson, G.
A. Nace, C. Stewart, Spectral and other physicochemical properties
of submicron powders of hematite (alpha-Fe2O3), maghemite
(gamma-Fe2O3), magnetite (Fe3O4), goethite (alpha-FeOOH) and
lepidocrocite (gamma-FeOOH), Journal of Geophysical Research 90
(1985) 3126–3144. doi:10.1029/JB090iB04p03126.
[24] E. Ben-Dor, J. R. Irons, G. F. Epema, Soil reflectance, in:
Remote Sensing for the Earth Sciences, volume 3 of Manual of Remote
Sensing, Wiley, New York, 1999, pp. 111–188.
[25] R. Reda, T. Saffaj, B. Ilham, O. Saidi, K. Issam, L.
Brahim, E. M. El Hadrami, A comparative study between a new method
and other machine learning algorithms for soil organic carbon and
total nitrogen prediction using near infrared spectroscopy,
Chemometrics and Intelligent Laboratory Systems 195 (2019).
doi:10.1016/j.chemolab.2019.103873.
[26] R. V. Rossel, T. Behrens, Using data mining to model and
interpret soil diffuse reflectance spectra, Geoderma 158 (2010)
46–54.
[27] R. N. Clark, T. V. King, M. Klejwa, G. A. Swayze, N. Vergo,
High spectral resolution reflectance spectroscopy of minerals,
Journal of Geophysical Research 95 (1990) 12653–12680. URL:
https://agupubs.onlinelibrary.wiley.com/doi/10.1029/JB095iB08p12653.
doi:10.1029/jb095ib08p12653.
[28] Q. Fang, H. Hong, L. Zhao, S. Kukolich, K. Yin, C. Wang,
Visible and near-infrared reflectance spectroscopy for
investigating soil mineralogy: A review, Journal of Spectroscopy
(2018). URL: http://hdl.handle.net/10150/628358.
doi:10.1155/2018/3168974.
[29] U. Schwertmann, Solubility and dissolution of iron oxides,
Plant and Soil 130 (1991) 1–25.
[30] M. Ali, W. Mindari, Effect of humic acid on soil chemical
and physical characteristics of embankment, MATEC Web of
Conferences (2015). doi:10.1051/conf/2016.
[31] H. R. Sindelar, M. T. Brown, T. H. Boyer, Effects of
natural organic matter on calcium and phosphorus co-precipitation,
Chemosphere 138 (2015) 218–224.
doi:10.1016/j.chemosphere.2015.05.008.
[32] F. L. Wang, P. M. Huang, Effects of organic matter on the
rate of potassium adsorption by soils, Canadian Journal of Soil
Science (2001). URL: www.nrcresearchpress.com.
[33] M. Yan, Y. Lu, Y. Gao, M. F. Benedetti, G. V. Korshin,
In-Situ Investigation of Interactions between Magnesium Ion and
Natural Organic Matter, Environmental Science and Technology 49
(2015) 8323–8329. doi:10.1021/acs.est.5b00003.
[34] S. Droge, K. U. Goss, Effect of sodium and calcium cations
on the ion-exchange affinity of organic cations for soil organic
matter, Environmental Science and Technology 46 (2012) 5894–5901.
doi:10.1021/es204449r.
[35] R. N. Clark, T. V. King, M. Klejwa, G. A. Swayze, N. Vergo,
High spectral resolution reflectance spectroscopy of minerals,
Journal of Geophysical Research 95 (1990).
doi:10.1029/jb095ib08p12653.
[36] E. Suess, Interaction of organic compounds with calcium
carbonate-II. Organo-carbonate association in Recent sediments,
Geochimica et Cosmochimica Acta 37 (1973) 2435–2447.
[37] C. Pasquini, Near infrared spectroscopy: Fundamentals,
practical aspects and analytical applications, 2003.
doi:10.1590/S0103-50532003000200006.
[38] A. Niemöller, D. Behmer, Use of Near Infrared Spectroscopy
in the Food Industry, Nondestructive Testing of Food Quality (2008)
67–118. doi:10.1002/9780470388310.ch4.
[39] B. S. Bansod, N. Kamboj, Measurement of soil attributes
using NIR spectroscopy: A review, International Journal of Advance
Research in Science and Engineering (2016) 601–606.
[40] T. Udelhoven, C. Emmerling, T. Jarmer, Quantitative
analysis of soil chemical properties with diffuse reflectance
spectrometry and partial least-square regression: A feasibility
study, Plant and Soil 251 (2003) 319–329.
doi:10.1023/A:1023008322682.
[41] H. U. Rehman, M. Knadel, L. Wollesen de Jonge, E. Arthur,
Predicting soil cation exchange capacity for variable soil types
with visible near infrared spectra, in: EGU General Assembly
Conference Abstracts, 2018, p. 3595.
[42] A. P. Leone, G. Leone, N. Leone, C. Galeone, E. Grilli, N.
Orefice, V. Ancona, Capability of Diffuse Reflectance Spectroscopy
to Predict Soil Water Retention and Related Soil, Water
(Switzerland) 11 (2019) 1–16.
[43] Y. Ulusoy, Y. Tekin, Z. Tümsavaş, A. M. Mouazen,
Prediction of soil cation exchange capacity using visible and near
infrared spectroscopy, Biosystems Engineering 152 (2016) 79–93.
doi:10.1016/j.biosystemseng.2016.03.005.
[44] J. Padarian, B. Minasny, A. McBratney, Transfer learning to
localise a continental soil vis-NIR calibration model, Geoderma 340
(2019) 279–288.
[45] R. V. Rossel, T. Behrens, E. Ben-Dor, D. Brown, J.
Demattê, K. D. Shepherd, Z. Shi, B. Stenberg, A. Stevens, V.
Adamchuk, et al., A global spectral library to characterize the
world’s soil, Earth-Science Reviews 155 (2016) 198–230.