Calibration
This tab implements the multivariate regression and calibration using aerosol FT-IR spectra described in the following manuscripts:
Dillner and Takahama, Atmos. Meas. Tech., 2015a;
Dillner and Takahama, Atmos. Meas. Tech., 2015b; and
Reggente, Dillner, and Takahama, Atmos. Meas. Tech., 2016.
It is currently implemented for PLS1 and PLS2 using the pls package (Mevik and Wehrens, J. Stat. Softw., 2007).
Inputs before computation:
Response file (y)
File containing the response values (target variables).
Case file
File containing the list of samples to be used in the calibration and test.
Variable(s)
The user can choose the target variable(s) from a list of the column names in the uploaded response file. Multiple selections are allowed. If the PLS type parameter is set to PLS1 (default), AIRSpec computes one calibration model for each variable. If it is set to PLS2, AIRSpec computes a single calibration model for the whole matrix of variables.
Spectra type
The user can choose which spectra to use: the uploaded spectra (Home tab) or the baseline corrected spectra (Baseline correction tab).
Show optional inputs (optional) checkbox:
- Wavenumber: the user can upload a file containing the indices of the wavenumbers to select (the file should contain integers, or a list of integers, corresponding to each variable). For example, index 1 corresponds to the leftmost wavenumber. If no file is uploaded, AIRSpec uses all wavenumbers by default.
- Minimum detection limit: the user can upload a file containing the minimum detection limit (MDL) for each variable. AIRSpec will remove samples below the MDL.
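The effect of the two optional inputs can be pictured with a short sketch. It is written in Python purely for illustration (AIRSpec itself relies on the R pls package), and every name and value in it (`select_wavenumbers`, `drop_below_mdl`, the sample data) is hypothetical. It assumes 1-based indices, with index 1 being the leftmost wavenumber, and it assumes a sample is dropped when its response value is strictly below the MDL.

```python
def select_wavenumbers(spectrum, indices=None):
    """Keep only the absorbances at the given 1-based positions.

    With no index list, all wavenumbers are kept (the default behaviour).
    """
    if indices is None:
        return list(spectrum)
    return [spectrum[i - 1] for i in indices]


def drop_below_mdl(responses, mdl):
    """Remove samples whose response value is below the MDL.

    Assumption: the comparison is strict, so a value equal to the MDL is kept.
    """
    return {name: value for name, value in responses.items() if value >= mdl}


# Hypothetical spectrum with five wavenumbers, ordered left to right.
spectrum = [0.10, 0.12, 0.15, 0.11, 0.09]
selected = select_wavenumbers(spectrum, indices=[1, 3, 5])
all_kept = select_wavenumbers(spectrum)

# Hypothetical response values for three samples and an MDL of 0.1.
responses = {"s1": 0.8, "s2": 0.05, "s3": 1.4}
kept = drop_below_mdl(responses, mdl=0.1)

print(selected)
print(kept)
```

In this sketch, sample `s2` is removed because 0.05 is below the MDL, and only the first, third, and fifth absorbances are retained.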
Show/change default parameters checkbox:
- Max number of PLS components: the user can choose the upper limit of the number of latent variables (default is 60).
- Number of segments for CV: the user can choose the number of folds (segments) in the k-fold cross-validation (default is 10). Note that the number of segments must be less than or equal to the number of samples in the calibration data set.
- PLS Method: the user can choose the fitting algorithm (default is the orthogonal scores algorithm, oscorespls).
- PLS type: the user can choose between PLS1 (default) and PLS2. With PLS1, AIRSpec computes one calibration model for each variable. With PLS2, AIRSpec computes a single calibration model for the whole matrix of variables (more than one variable must be selected).
- Parameters optimization: the user can choose among cross-validation (CV, default), leave-one-out (LOO), and none. With CV, k-fold cross-validation is performed; the number and type of segments are specified in the fields Number of segments for CV (see above) and Segment type (see below). With LOO, leave-one-out cross-validation is performed and the input Number of segments for CV is ignored. In both of these cases, the calibration set is used for both "training" and "validation," and the test set is kept completely separate from the model-building process. With none, in contrast, the test set is used for "validation." Be aware that in this case the number of components (LVs) is chosen according to the test set.
- Segment type: the user can choose how to generate the CV segments (folds). Interleaved is the default: the first segment will contain the indices 1, length.seg+1, 2*length.seg+1, …, (k-1)*length.seg+1, and so on, where length.seg is the ceiling of the number of calibration samples divided by the number of segments (k). With random, the indices are allocated to segments in random order. With consecutive, the first segment will contain the first length.seg indices, and so on.
- order response variable: the user can choose to order the calibration samples according to the concentrations.
- Figure units: micrograms, micromoles or micrograms per cubic meter.
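To make the three segment types more concrete, the following Python sketch splits sample indices into k segments. It is an illustration only, not the code AIRSpec runs (AIRSpec uses the R pls package), and the function name `cv_segments` and its arguments are hypothetical. Two assumptions are worth stating loudly: "interleaved" is sketched here as a round-robin assignment (sample 1 to segment 1, sample 2 to segment 2, and so on), which may not reproduce the exact index pattern generated by pls; and when the samples do not divide evenly, the trailing segments simply hold one index fewer.

```python
import math
import random


def cv_segments(n_samples, k, segment_type="interleaved", seed=None):
    """Split the 1-based indices 1..n_samples into k cross-validation segments."""
    if not 1 <= k <= n_samples:
        raise ValueError("the number of segments must not exceed the number of samples")
    indices = list(range(1, n_samples + 1))

    if segment_type == "interleaved":
        # Assumption: round-robin assignment with stride k.
        return [indices[j::k] for j in range(k)]

    if segment_type == "random":
        # Shuffle first, then cut into consecutive blocks.
        random.Random(seed).shuffle(indices)
    elif segment_type != "consecutive":
        raise ValueError(f"unknown segment type: {segment_type}")

    length_seg = math.ceil(n_samples / k)
    n_short = k * length_seg - n_samples  # trailing segments with one index fewer
    sizes = [length_seg] * (k - n_short) + [length_seg - 1] * n_short

    segments, start = [], 0
    for size in sizes:
        segments.append(indices[start:start + size])
        start += size
    return segments


consecutive = cv_segments(10, 4, "consecutive")
interleaved = cv_segments(10, 4, "interleaved")
shuffled = cv_segments(10, 4, "random", seed=1)
print(consecutive)
print(interleaved)
```

With 10 samples and 4 segments, length.seg is 3, so the sketch produces two segments of three indices and two of two. Asking for more segments than samples raises an error, mirroring the constraint on Number of segments for CV.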
Compute
Click the Compute button to start the computation with the settings described above. At the end of the calculation, AIRSpec produces a figure with several plots, one row per variable. From left to right, the first panel shows the RMSE in cross-validation against the number of components (latent variables); the dotted vertical line marks the number of components selected according to the minimum RMSE. The second and third panels show scatter plots of predicted against observed (or reference) values for the calibration and test datasets.
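The selection rule behind the dotted vertical line can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not AIRSpec's implementation: the names `rmse` and `select_ncomp` are hypothetical, the RMSE values are invented, the first entry of the curve is assumed to correspond to one component, and ties are assumed to resolve to the smaller number of components.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def select_ncomp(rmse_by_ncomp):
    """Return the number of components with the smallest RMSE.

    Assumption: rmse_by_ncomp[0] belongs to a model with one component.
    min() returns the first minimum, so ties go to the fewest components.
    """
    best = min(range(len(rmse_by_ncomp)), key=lambda i: rmse_by_ncomp[i])
    return best + 1


# Invented cross-validation RMSE curve for 1 to 5 components.
rmse_curve = [0.90, 0.50, 0.30, 0.35, 0.40]
ncomp = select_ncomp(rmse_curve)
example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(ncomp)
```

Here the curve reaches its minimum at the third entry, so three components would be marked by the vertical line.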
Inputs after computation:
Download results
Click to download the following files:
- fits.rds: file containing the fitted model for each variable. The file contains all the model specifications (e.g., regression coefficients, scores, loadings, loading weights, fitted values, residuals, and so on).
- fitsplot.pdf: figure with several plots, one row per variable. From left to right, the first panel shows the RMSE in cross-validation against the number of components (latent variables); the dotted vertical line marks the number of components selected according to the minimum RMSE. The second and third panels show scatter plots of predicted against observed (or reference) values for the calibration and test sets.
- input_mvr_user.json: for the sake of reproducibility, this file contains all the input parameters used in the computation of the calibration model:
  - param: list of pls parameters:
    - method: pls fitting algorithm (see PLS Method above).
    - ncomp: upper limit of the number of latent variables (see Max number of PLS components above).
    - validation: method for Parameters optimization (see above).
    - segments: number of folds (segments) used in the cross-validation (see Number of segments for CV above).
    - segment.type: method to generate the CV segments (see Segment type above).
  - responsefile: name of the file that contains the list of the samples with the variable concentrations.
  - plstype: type of PLS used (see PLS type above).
  - variables: list of the variables computed (see Variable(s) above).
  - casefile: name of the file that contains the list of the samples used for calibration and test.
  - specfile: name of the file that contains the spectra matrix used (see Spectra type above).
  - mdlfile: name of the file that contains the MDL values (see Minimum detection limit above).
  - wavenumfile: name of the file that contains the indices of the selected wavenumbers (see Wavenumber above).
  - reordercalib: true if the calibration samples are reordered according to the concentrations (see order response variable above).
  - units: name of the units used in the figures (see Figure units above).
  - responsefile_reader: responsefile type.
  - specfile_reader: specfile type.
  - casefile_reader: casefile type.
- pred.rda: file containing the predicted values for each variable and the whole set of components.
- prediction_table.csv: file containing the table with the predicted and observed values (for each variable) for the optimal number of components (LVs).
- rmsep.rda: file containing the RMSE values obtained in cross-validation.
- stats_table.csv: file containing the statistics of the fitted models.
- [variable]_VIPscores.pdf: file containing the plot of the Variable Importance in Projection (VIP) scores, as described in Chong and Jun, Chemometr. Intell. Lab., 2005, for each variable.
- [variable]_EVx.pdf: file containing the plot of the explained variation in X (EVjk(X)) for each LV.
- [variable]_EVy.pdf: file containing the plot of the explained variation in Y (EVjk(Y)) for each LV.
- [variable]__coefficients.pdf: file containing the plot of the regression coefficients.
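Because input_mvr_user.json records the settings of a run, it can be inspected with any JSON reader. The Python sketch below is illustrative only. The key names follow the list above, but the nesting, the value spellings (for example "CV" or "micrograms"), and the variable names "OC" and "EC" are assumptions made for the example, so a real file may differ. The JSON text is embedded in the script to keep it self-contained; in practice the downloaded file would be opened and passed to `json.load` instead.

```python
import json

# Hypothetical content; a real input_mvr_user.json may be laid out differently.
example = """
{
  "param": {
    "method": "oscorespls",
    "ncomp": 60,
    "validation": "CV",
    "segments": 10,
    "segment.type": "interleaved"
  },
  "plstype": "PLS1",
  "variables": ["OC", "EC"],
  "reordercalib": false,
  "units": "micrograms"
}
"""

settings = json.loads(example)
param = settings["param"]

# Build a one-line summary of how the calibration was configured.
summary = (
    f'{settings["plstype"]} via {param["method"]}: '
    f'up to {param["ncomp"]} components, '
    f'{param["validation"]} with {param["segments"]} '
    f'{param["segment.type"]} segments'
)
print(summary)
```

Keeping such a summary next to the downloaded results makes it easier to tell apart runs that differ only in their parameters.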