AIRSpec - Tools for Aerosol InfraRed Spectroscopy

Welcome

AIRSpec is an online facility for using the chemometric tools that we have developed for FT-IR spectra processing and analysis of atmospheric aerosol.

AIRSpec is built with Shiny, the open-source web application framework for R, and is currently hosted at EPFL. The tools are provided through an interactive, browser-based web interface.

The available chemometric tools are accessible through the navigation tabs above.

The site is academically developed and managed by Dr. Matteo Reggente and Prof. Satoshi Takahama. If you use results or developments from these chemometric tools (either via the web interface or source code), please cite the manuscripts above. A manuscript documenting this software can be found here.

You can download the template files (required and optional) used as inputs for the chemometric tools by clicking "Download templates" in the sidebar on this page.

To start, please upload an IR spectra file.

Baseline correction

This tab implements the smoothing splines baseline correction method of Kuzmiakova, Dillner, and Takahama, Atmos. Meas. Tech., 2016.

Inputs before computation:

Sample list file (optional)

List of samples to be processed. If no list is uploaded, AIRSpec baseline-corrects all the samples in the uploaded spectra file.

Sequence of EDF values

AIRSpec computes a baseline for each EDF (effective degrees of freedom) value listed in this field and for each selected segment (see below).

Non-negativity constraint

Checkbox to require that the absorbance of the baseline-corrected spectra be non-negative in the region 3500–2300 cm-1, with a tolerance of 10^-3. This constraint is useful when correcting laboratory standards.
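
As an illustration of what the constraint tests, the Python sketch below checks whether a finished baseline-corrected spectrum stays at or above -10^-3 absorbance units everywhere in 3500–2300 cm-1. It assumes the tolerance means absorbance may dip to -10^-3 but not below; the helper name and arguments are hypothetical, and it only checks a result, it does not reproduce how AIRSpec imposes the constraint during the baseline computation.

```python
def is_nonnegative(wavenum, absorbance, lo=2300.0, hi=3500.0, tol=1e-3):
    """Return True if absorbance >= -tol at every wavenumber in [lo, hi] cm-1.

    Hypothetical helper: assumes the tolerance is a lower bound of -tol.
    Points outside [lo, hi] are ignored.
    """
    if len(wavenum) != len(absorbance):
        raise ValueError("wavenum and absorbance must have the same length")
    return all(a >= -tol for w, a in zip(wavenum, absorbance) if lo <= w <= hi)
```

For example, a spectrum with -0.0005 at 2500 cm-1 passes, while one with -0.002 in the same region fails.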

Compute

Click the Compute button to start the computation with the settings described above. At the end of the calculation, AIRSpec plots the baseline-corrected spectra of the samples using the default parameters (see below).

Inputs after computation:

Segment 1 EDF and Segment 2 EDF

Desired EDF values for the baseline-corrected spectra. The spectra with the selected EDF values are plotted on the right. For each segment, the default is the EDF value that minimizes the median negative absorbance fraction (NAF; for details see Section 2.3.2 in Kuzmiakova, Dillner, and Takahama, Atmos. Meas. Tech., 2016), where the median is taken over all baseline-corrected samples for each EDF.
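
The default-selection rule can be sketched in Python as: compute the NAF of every sample for each EDF, take the median across samples, and keep the EDF with the smallest median. This sketch assumes NAF is the summed magnitude of negative absorbances divided by the summed magnitude of all absorbances in the segment; that definition is our assumption, so consult Section 2.3.2 of Kuzmiakova, Dillner, and Takahama (2016) for the exact one. Function names are hypothetical.

```python
from statistics import median


def naf(absorbance):
    """Negative absorbance fraction of one spectrum (assumed definition):
    summed magnitude of negative values / summed magnitude of all values."""
    total = sum(abs(a) for a in absorbance)
    if total == 0:
        return 0.0
    return sum(-a for a in absorbance if a < 0) / total


def default_edf(spectra_by_edf):
    """Return (best EDF, {EDF: median NAF}) for one segment.

    spectra_by_edf maps an EDF value to a list of baseline-corrected
    spectra (one list of absorbances per sample).
    """
    median_naf = {edf: median([naf(s) for s in spectra])
                  for edf, spectra in spectra_by_edf.items()}
    # Smallest median NAF wins; on ties, the first EDF listed is kept.
    return min(median_naf, key=median_naf.get), median_naf
```

For instance, `default_edf({6: [[1, -1, 2], [2, 2]], 8: [[1, -0.25], [1, 1]]})` selects EDF 8, whose median NAF (0.1) is lower than that of EDF 6 (0.125).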

Stitch method

Method to join the baseline-corrected segments together in the overlapping background regions between 2000 and 1820 cm-1. The background regions are chosen where analyte absorption is not expected.

  • diff excludes the overlapping region.
  • mean uses the mean absorbance (default).
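
The two stitch options can be illustrated with a short Python sketch. It assumes both segments are sampled on the same wavenumber grid, so the overlap is simply the set of wavenumbers present in both, and it reads "diff excludes the overlapping region" as dropping those wavenumbers from the stitched spectrum; both points are assumptions, and the function name is hypothetical.

```python
def stitch(seg1, seg2, method="mean"):
    """Join two baseline-corrected segments given as {wavenumber: absorbance}.

    Assumes a shared wavenumber grid, so the overlap is the common keys.
      mean: keep the overlap, using the mean of the two absorbances
      diff: drop the overlapping wavenumbers entirely
    Returns (wavenumber, absorbance) pairs in decreasing wavenumber order.
    """
    overlap = seg1.keys() & seg2.keys()
    joined = {}
    for seg in (seg1, seg2):
        for w, a in seg.items():
            if w not in overlap:
                joined[w] = a
    if method == "mean":
        for w in overlap:
            joined[w] = (seg1[w] + seg2[w]) / 2
    elif method != "diff":
        raise ValueError(f"unknown stitch method: {method}")
    return sorted(joined.items(), reverse=True)
```

With mean the stitched spectrum is continuous across the overlap; with diff (as interpreted here) it has a gap there.
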
Download results

Click to download the following files:

  • input_ssb_user.json: for reproducibility, this file contains all the input parameters used to compute the baseline correction.
    • specfile: name of the file that contains the spectra.
    • samplelistfile: name of the file that contains the list of the samples computed.
    • param: list of EDF values that are used in the baseline correction.
    • whichsegment: list of the segments that are baseline-corrected.
    • specfile_reader: type of the specfile.
    • samplelistfile_reader: type of the samplelistfile.
  • ssb_selected.json: for reproducibility, this file contains:
    • selected: the EDF parameter choices for each segment.
    • stitch: stitching method selected.
    • inputfile: name of the file that contains all the input parameters used in the computation of the baseline correction (./input_ssb_user.json).
  • segm1_agg_criteria_table.csv and segm2ext_agg_criteria_table.csv contain the summary statistics of the aggregated NAF and aggregated total normalized absolute blank absorbance (if blank samples are provided).
  • segm1_agg_criteria.pdf and segm2ext_agg_criteria.pdf display the summary statistics of the aggregated NAF and aggregated total normalized absolute blank absorbance (if blank samples are provided). Black lines show the aggregated median, red dashed lines the aggregated mean, and gray regions the interquartile range (first to third quartile).
  • segm1_baseline_param.csv and segm2ext_baseline_param.csv contain the wavenumber bounds of the computed smoothing spline and the effective EDF of each baseline-corrected spectrum.
  • segm1_baseline.rds and segm2ext_baseline.rds contain the baseline matrix (load with readRDS) of each segment. Each row represents a single baseline, and the sample names are provided in the rownames attribute; wavenumbers are saved in the "wavenum" attribute.
  • segm1_baselinedb.sqlite and segm2ext_baselinedb.sqlite are SQLite databases that contain the baselines generated for each spectrum and parameter, along with any additional parameters estimated internally. The user can inspect the contents of these files with DB Browser for SQLite.
  • segm1_blankabs.csv and segm2ext_blankabs.csv (if blanks are provided) contain the aggregated total normalized absolute blank absorbance for each sample and EDF.
  • segm1_naf.csv and segm2ext_naf.csv contain the NAF for each sample and EDF.
  • segm1_spec.rds and segm2ext_spec.rds are the baseline-corrected absorbance spectra matrix (load with readRDS) of each segment. Each row represents a single spectrum, and the sample names are provided in the rownames attribute; wavenumbers are saved in the "wavenum" attribute.
  • spectra_baselined.csv and spectra_baselined.rds contain the baseline-corrected absorbance spectra matrix of the stitched segments. In the .csv file, the first column contains the wavenumber values and each subsequent column contains one spectrum (with sample names as column labels). In the .rds file (load with readRDS), each row represents a single spectrum, and the sample names are provided in the rownames attribute; wavenumbers are saved in the "wavenum" attribute.
  • spectra_plots.pdf contains the spectra plots for each sample. Lines of different colors correspond to spectra computed with different EDF values.
  • baseline_plots_selected.pdf contains the selected spectra plots. Blue lines represent the corrected spectra using the spline baseline (red lines). The yellow lines represent the spectra corrected using a linear baseline.
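
For users who want to post-process the stitched spectra outside R, the .csv layout described above can be read with Python's standard csv module. This is a minimal sketch; the function name is hypothetical, and it ignores the label of the first header cell, since only its position (the wavenumber column) is documented. The .rds files still need R's readRDS.

```python
import csv


def read_baselined_csv(lines):
    """Parse spectra_baselined.csv from an open file or any iterable of lines.

    Layout: first column = wavenumbers, each further column = one spectrum,
    with sample names in the header row.
    Returns (wavenumbers, {sample_name: [absorbances]}).
    """
    reader = csv.reader(lines)
    header = next(reader)
    samples = header[1:]                 # header[0] labels the wavenumber column
    wavenum = []
    spectra = {name: [] for name in samples}
    for row in reader:
        if not row:
            continue                     # tolerate blank lines
        wavenum.append(float(row[0]))
        for name, value in zip(samples, row[1:]):
            spectra[name].append(float(value))
    return wavenum, spectra
```

Typical use: `with open("spectra_baselined.csv", newline="") as f: wavenum, spectra = read_baselined_csv(f)`.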

Tab: parameter selection

The top panel shows the spectra plots for each sample and EDF; lines of different colors correspond to different EDF values. The bottom panel shows the aggregated median NAF as a function of the EDF values for each segment.

Peak fitting

This tab implements the multi-peak fitting algorithm of Takahama, Johnson, and Russell, Aerosol Sci. Tech., 2013, with additional modifications for extensibility.

Inputs before computation:

Spectra type

The user can choose which spectra to use: the uploaded spectra (Home tab) or the baseline corrected spectra (Baseline correction tab).

Bond Sequence

List of peaks or profiles to be fitted in sequence. The default sequence is:

  • carbonylCO+amineNH
  • carboxylicCOH
  • ammoniumNH
  • alcoholCOH2
  • alkaneCH4
  • alkeneCH+aromaticCH
Sample list file (optional)

List of samples to be processed. If no list is uploaded, AIRSpec fits all the samples in the selected spectra matrix.

Compute

Click the Compute button to start the computation with the settings described above. At the end of the calculation, AIRSpec plots the fitted peaks for each spectrum.

Inputs after computation:

Download results

Click to download the following files:

  • fitpars.csv contains the parameters of the fitted peaks.
  • input_mpf_user.json: for reproducibility, this file contains all the input parameters used to run the multi-peak fitting algorithm.
    • specfile: name of the file that contains the spectra.
    • casefile: name of the file that contains the list of the samples computed.
    • peaksequence: list of peaks or profiles to be fitted in sequence.
    • specfile_reader: type of the specfile.
    • casefile_reader: type of the casefile.
  • moles.csv contains the estimated micromoles of functional groups in each sample.
  • pkareas.csv contains the total areas of the fitted peaks for each functional group and each sample.
  • specfits.pdf contains the plots of the fitted peaks for each spectrum.

Calibration

This tab implements multivariate regression and calibration using aerosol FT-IR spectra as described in the following manuscripts: Dillner and Takahama, Atmos. Meas. Tech., 2015a; Dillner and Takahama, Atmos. Meas. Tech., 2015b; and Reggente, Dillner, and Takahama, Atmos. Meas. Tech., 2016. PLS1 and PLS2 are currently implemented using the pls package for R (Mevik and Wehrens, J. Stat. Softw., 2007).

Inputs before computation:

Response file (y)

File containing the response values (target variables).

Case file

File containing the list of samples to be used in the calibration and test sets.

Variable(s)

The user can choose the target variable(s) from a list built from the column names of the uploaded response file. Multiple choices are allowed. If the PLS type parameter is set to PLS1 (default), AIRSpec computes one calibration model for each variable. If it is set to PLS2, AIRSpec computes a single calibration model for the whole matrix of variables.

Spectra type

The user can choose which spectra to use: the uploaded spectra (Home tab) or the baseline corrected spectra (Baseline correction tab).

Show optional inputs (optional) checkbox:
  • Wavenumber: the user can upload a file containing indices for selecting wavenumbers (the file should contain an integer or a list of integers for each variable). For example, index 1 corresponds to the leftmost wavenumber. If no file is uploaded, AIRSpec uses all wavenumbers by default.
  • Minimum detection limit: the user can upload a file containing the minimum detection limit (MDL) for each variable. AIRSpec will remove samples below the MDL.
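
The two optional inputs amount to simple filters, sketched below in Python with hypothetical helper names. The index file uses 1-based positions counted from the leftmost wavenumber; for the MDL filter we assume that a sample exactly at the MDL is kept and only samples strictly below it are removed, which is an assumption rather than documented behavior.

```python
def select_wavenumbers(spectrum, indices):
    """Keep absorbances at 1-based column indices (1 = leftmost wavenumber)."""
    if any(i < 1 or i > len(spectrum) for i in indices):
        raise IndexError("indices must lie between 1 and the number of wavenumbers")
    return [spectrum[i - 1] for i in indices]


def drop_below_mdl(concentrations, mdl):
    """Remove samples whose concentration is below the MDL (assumed: strictly below).

    concentrations maps sample name -> measured value for one variable.
    """
    return {name: c for name, c in concentrations.items() if c >= mdl}
```
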
Show/change default parameters checkbox:
  • Max number of PLS components: the user can choose the upper limit of the number of latent variables (default is 60).
  • Number of segments for CV: the user can choose the number of folds (segments) in the k-fold cross-validation (default is 10). Note that the number of segments must be less than or equal to the number of samples in the calibration data set.
  • PLS Method: the user can choose the fitting algorithm (default is the orthogonal scores algorithm, oscorespls).
  • PLS type: the user can choose between PLS1 (default) and PLS2. With PLS1, AIRSpec computes one calibration model for each variable. With PLS2, AIRSpec computes a single calibration model for the whole matrix of variables (at least two variables must be selected).
  • Parameters optimization: the user can choose between cross-validation (CV, default), leave-one-out (LOO), and none. With CV, k-fold cross-validation is performed; the number and type of cross-validation segments are specified in the fields Number of segments for CV and Segment type (see below). With LOO, leave-one-out cross-validation is performed and the input Number of segments for CV is ignored. In both of these cases, the calibration set is used for both "training" and "validation," and the test set is kept completely separate from the model building process. With none, in contrast, the test set is used for "validation." Be aware that in this case the number of components (LVs) is chosen according to the test set.
  • Segment type: the user can choose how to generate the CV segments (folds). Here k is the number of segments and length.seg is the ceiling of the number of calibration samples divided by k. Interleaved is the default and assigns every k-th sample to the same segment: the first segment contains the indices 1, k+1, 2*k+1, …, and so on. With random, the indices are allocated to segments in random order. With consecutive, the first segment contains the first length.seg indices, and so on.
  • order response variable: the user can choose to order the calibration samples according to the concentrations.
  • Figure units: micrograms, micromoles or micrograms per cubic meter.
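
To make the three segment types concrete, here is a minimal Python sketch that builds k folds from 1-based calibration-sample indices. It is not AIRSpec's or the pls package's code: the function name `cv_segments` is hypothetical, spreading the leftover samples so that the last consecutive folds hold one index fewer is our assumption, and the random case depends on the seed, so it will not reproduce AIRSpec's folds.

```python
import math
import random


def cv_segments(n, k, seg_type="interleaved", seed=None):
    """Split calibration-sample indices 1..n into k cross-validation folds.

    Sketch only. length_seg = ceil(n / k), as in the text above.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n: at most one segment per sample")
    length_seg = math.ceil(n / k)
    indices = list(range(1, n + 1))            # 1-based, as in R
    if seg_type == "random":
        random.Random(seed).shuffle(indices)   # random order, then dealt like interleaved
        seg_type = "interleaved"
    if seg_type == "interleaved":
        # Fold j takes every k-th index starting from position j.
        return [indices[j::k] for j in range(k)]
    if seg_type == "consecutive":
        # Contiguous blocks: the first `complete` folds hold length_seg
        # indices and the remaining folds one fewer, so all n are used.
        complete = k - (k * length_seg - n)
        folds, start = [], 0
        for j in range(k):
            size = length_seg if j < complete else length_seg - 1
            folds.append(indices[start:start + size])
            start += size
        return folds
    raise ValueError(f"unknown segment type: {seg_type}")
```

For 10 calibration samples and 3 segments, interleaved gives [1, 4, 7, 10], [2, 5, 8], [3, 6, 9], while consecutive gives [1, 2, 3, 4], [5, 6, 7], [8, 9, 10].
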
Compute

Click the Compute button to start the computation with the settings described above. At the end of the calculation, AIRSpec produces a figure with one row of panels per variable. In each row, the first (leftmost) panel shows the cross-validated RMSE against the number of components (latent variables), with a dotted vertical line at the number of components selected by minimum RMSE. The second and third panels show scatter plots of predicted against observed (reference) values for the calibration and test sets.
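
The selection rule shown by the dotted line can be sketched in Python: compute the RMSE for each candidate number of components and keep the number with the smallest value. The helper names are hypothetical, and resolving ties toward fewer components is an assumption of this sketch.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    sq = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(sq / len(observed))


def select_ncomp(cv_rmse):
    """Number of components (1-based) with the smallest cross-validated RMSE.

    cv_rmse[j] is the RMSE obtained with j + 1 latent variables; on ties,
    the smaller number of components is returned.
    """
    return min(range(len(cv_rmse)), key=cv_rmse.__getitem__) + 1
```

For example, `select_ncomp([0.9, 0.5, 0.4, 0.45])` returns 3.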

Inputs after computation:

Download results

Click to download the following files:

  • fits.rds contains the fitted model for each variable, with all the model specifications (e.g., regression coefficients, scores, loadings, loading weights, fitted values, residuals, and so on).
  • fitsplot.pdf contains the figure produced after computation, with one row per variable: from left to right, the cross-validated RMSE against the number of components (latent variables), with a dotted vertical line at the number of components selected by minimum RMSE, followed by scatter plots of predicted against observed (reference) values for the calibration and test sets.
  • input_mvr_user.json: for reproducibility, this file contains all the input parameters used to compute the calibration model.
    • param: list of pls parameters:
      • method: pls fitting algorithm (see PLS Method above).
      • ncomp: upper limit of the number of latent variables (see Max number of PLS components above).
      • validation: method used for parameter optimization (see Parameters optimization above).
      • segments: number of folds (segments) used in the cross-validation (see Number of segments for CV above).
      • segment.type: method used to generate the CV segments (see Segment type above).
    • responsefile: name of the file that contains the list of the samples with the variable concentrations.
    • plstype: type of PLS used (see PLS type above).
    • variables: list of variables computed (see Variable(s) above).
    • casefile: name of the file that contains the list of the samples used for calibration and test.
    • specfile: name of the file that contains the spectra matrix used (see Spectra type above).
    • mdlfile: name of the file that contains the MDL values (see Minimum detection limit above).
    • wavenumfile: name of the file that contains the indices of the selected wavenumber (see Wavenumber above).
    • reordercalib: true if the calibration samples are reordered according to the concentrations (see order response variable above).
    • units: name of the units used in the figures (see Figure units above).
    • responsefile_reader: responsefile type.
    • specfile_reader: specfile type.
    • casefile_reader: casefile type.
  • pred.rda contains the predicted values for each variable and for every number of components.
  • prediction_table.csv contains the table of predicted and observed values for each variable at the optimal number of components (LVs).
  • rmsep.rda contains the RMSE values obtained in cross-validation.
  • stats_table.csv contains the statistics of the fitted models.
  • [variable]_VIPscores.pdf contains, for each variable, the plot of the Variable Importance in Projection (VIP) scores as described in Chong and Jun, Chemometr. Intell. Lab., 2005.
  • [variable]_EVx.pdf contains the plot of the explained variation in X (EVjk(X)) for each LV.
  • [variable]_EVy.pdf contains the plot of the explained variation in Y (EVjk(Y)) for each LV.
  • [variable]__coefficients.pdf contains the plot of the regression coefficients.
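
For reference, the VIP formula of Chong and Jun (2005) for a single-response (PLS1) model can be sketched in Python. Each component a is weighted by SS_a = q_a^2 t_a't_a, the part of the response variation it explains, and VIP_j = sqrt(p * sum_a SS_a (w_ja/||w_a||)^2 / sum_a SS_a). This is an illustrative sketch with plain lists, not AIRSpec's R code; the inputs correspond to the loading weights, scores, and Y-loadings of a fitted model, but how to extract them from fits.rds is not assumed here.

```python
import math


def vip_scores(W, T, q):
    """VIP score of each X variable for a single-response PLS model.

    W: p x A loading weights (one row per wavenumber)
    T: n x A scores (one row per sample)
    q: length-A Y-loadings
    """
    p, A = len(W), len(q)
    ss = [q[a] ** 2 * sum(row[a] ** 2 for row in T) for a in range(A)]
    norms2 = [sum(W[j][a] ** 2 for j in range(p)) for a in range(A)]
    total = sum(ss)
    return [
        math.sqrt(p * sum(ss[a] * W[j][a] ** 2 / norms2[a] for a in range(A)) / total)
        for j in range(p)
    ]
```

By construction the squared VIP scores average to 1 across the p variables, which is why VIP > 1 is often used as a rough threshold for influential wavenumbers.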