For example, Golotvin et al. [5] proposed identifying noise points by comparing the intensity range of a small neighborhood with the standard deviation of the noise regions, which is estimated by dividing the spectrum into 32 sections and taking the minimum of the standard deviations of these sections.
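The section-based noise estimate described above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the function name `min_section_std`, the use of the sample standard deviation, and the handling of leftover points are all assumptions.

```python
import statistics


def min_section_std(spectrum, n_sections=32):
    """Estimate the noise level as the smallest per-section standard deviation.

    The spectrum is cut into `n_sections` equal-width sections; points left
    over at the end (when the length is not a multiple of n_sections) are
    ignored. This is a hypothetical helper written for illustration.
    """
    n = len(spectrum)
    if n < 2 * n_sections:
        raise ValueError("need at least two points per section")
    size = n // n_sections
    section_stds = [
        statistics.stdev(spectrum[k * size:(k + 1) * size])
        for k in range(n_sections)
    ]
    return min(section_stds)
```

Because the result is a minimum over 32 noisy estimates, it tends to fall below the true noise standard deviation, and a section of overlapping low signals with little variation can be selected in place of a genuine noise section.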
We observed that this method occasionally identifies low-intensity signal points in metabolomics spectra as noise, because such points may overlap with each other and show a reduced standard deviation; as a result, these signal points are offset to zero after baseline correction. The noise standard deviation estimate is also statistically biased toward values smaller than the true one, because it is taken as a minimum over sections, which adds further inaccuracy to the detection of noise data points. As an alternative to the existing noise detection and interpolation approaches, we developed a new baseline correction method based on a penalized parametric smoothing model.
This method fits a curve that follows the bottom envelope of the spectrum and does not need explicit identification of the noise data points. The primary motivation is that we model the baseline as a smooth curve of arbitrary form that goes through the noise region, instead of as linked pieces of selected noise points. We describe the key features of this model by a score function and construct the optimal baseline as the curve that maximizes this function.
In addition, we present a more accurate estimation of the noise variance by LOWESS (locally weighted scatterplot smoothing) regression and use it to determine the model parameters. The fundamental model behind our method is that the spectrum can be represented as. An estimated baseline should (1) be smooth, but not necessarily flat; and (2) run through the middle of the data in segments where there is no signal.
Based on these features, we construct the following score function: The optimal baseline curve b_0 should maximize the score function F(b). F(b) has three components. The smoothness penalty is a discrete form of the integral of the squared second-order derivative, which is small for linear segments and large where the radius of curvature is small.
The negativity penalty is designed to be nonzero only when the baseline point is above the data point, by using the Heaviside step function g(b_i - y_i). It counteracts the uptrend of the first term and forces the baseline to run through the middle of the data. By maximizing this function, the baseline is pushed up toward the spectrum without exceeding the zero-signal level, and is forced to be as smooth as possible where it links across peak regions. The negativity penalty parameter B is determined by the condition that the baseline should run through the center of the noise region.
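The three components can be made concrete with a small sketch. The score function itself is not reproduced in this text, so the form used here is an assumption: an uptrend term sum(b_i), a smoothness penalty A * sum of squared second differences, and a negativity penalty B * sum of (b_i - y_i)^2 counted only where b_i > y_i. The function name `score` is hypothetical.

```python
def score(b, y, A, B):
    """Evaluate an assumed form of the score function F(b).

    F(b) = sum(b_i)
           - A * sum((b_{i+1} - 2*b_i + b_{i-1})**2)
           - B * sum((b_i - y_i)**2 for points where b_i > y_i)

    The exact form is an assumption made for illustration.
    """
    if len(b) != len(y):
        raise ValueError("baseline and spectrum must have the same length")
    uptrend = sum(b)
    smoothness = sum(
        (b[i + 1] - 2.0 * b[i] + b[i - 1]) ** 2 for i in range(1, len(b) - 1)
    )
    # The Heaviside step g(b_i - y_i) switches the penalty on only above the data.
    negativity = sum((bi - yi) ** 2 for bi, yi in zip(b, y) if bi > yi)
    return uptrend - A * smoothness - B * negativity
```

Under this form, a flat baseline lying on the data scores higher than one lying far below it (the uptrend term) or one lying above it (the negativity term), and a kinked baseline is penalized by the smoothness term.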
The expectation value of the negativity term can be calculated from the probability density function (PDF) of the noise, P(y). Substituting into equation 8, we have. Multiplying the spectrum by a constant does not affect the optimal baseline found by maximizing this score function.
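One way to carry out this calculation is sketched below. It rests on assumptions that are not stated in this text: the noise is Gaussian with standard deviation \sigma, the first term of the score function is \sum_i b_i, and the negativity term is B \sum_i (b_i - y_i)^2 g(b_i - y_i). The baseline is taken to be flat and centered in a noise-only region, so the smoothness term does not contribute.

```latex
% Assumptions: noise ~ N(0, sigma^2); negativity term B * sum_i (b_i - y_i)^2 g(b_i - y_i)
\begin{align*}
\mathrm{E}\!\left[\frac{\partial F}{\partial b_i}\right]
  &= 1 - 2B\,\mathrm{E}\big[(b_i - y_i)\,g(b_i - y_i)\big] = 0 \\
\mathrm{E}\big[(b_i - y_i)\,g(b_i - y_i)\big]
  &= \int_{-\infty}^{b_i} (b_i - y)\,P(y)\,dy
   = \int_{0}^{\infty} \frac{t}{\sqrt{2\pi}\,\sigma}\, e^{-t^2/(2\sigma^2)}\,dt
   = \frac{\sigma}{\sqrt{2\pi}} \\
1 &= 2B\,\frac{\sigma}{\sqrt{2\pi}}
  \quad\Longrightarrow\quad
  B = \frac{\sqrt{2\pi}}{2\sigma} \approx \frac{1.25}{\sigma}
\end{align*}
```

Under these assumptions B scales as 1/\sigma, which is consistent with the requirement that multiplying the spectrum by a constant leaves the optimal baseline unchanged.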
In addition, the smoothness penalty should be robust to the abscissa resolution. For example, if we keep only the data points with odd indices, so that the chemical shift interval is doubled, the baseline curve should not be affected. Therefore C is inversely proportional to the fourth power of the abscissa resolution dx.
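The fourth-power dependence can be checked numerically. The sketch below assumes that the smoothness penalty is a sum of squared second differences and that it is weighed against a term of the form sum(b_i); the test curve and the helper names are made up for illustration.

```python
import math


def roughness(b):
    """Sum of squared second differences of the sequence b."""
    return sum((b[i + 1] - 2.0 * b[i] + b[i - 1]) ** 2 for i in range(1, len(b) - 1))


def roughness_per_uptrend(b):
    """Size of the roughness sum relative to the uptrend term sum(b)."""
    return roughness(b) / sum(b)


# A smooth, strictly positive test curve sampled at dx = 0.001.
x = [i * 0.001 for i in range(4001)]
full = [2.0 + math.sin(v) for v in x]

# Keeping every other point doubles the sampling interval.
half = full[::2]

ratio = roughness_per_uptrend(half) / roughness_per_uptrend(full)
print(round(ratio, 2))  # close to 2**4 = 16
```

Doubling dx makes the roughness sum about 16 times larger relative to the uptrend term, so the smoothness coefficient has to shrink by the same factor to leave the fitted baseline unchanged.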
The baseline curve is insensitive to small changes of A and B, unless their orders of magnitude change. We divide the spectrum into small regions and compute the variance and mean intensity within each region. Figure 1 plots the variances versus mean values with a region size of 32 data points, corresponding to 0. The red line in Figure 1 represents the fitted regression line. It has a quadratic form as expressed in equation.

Figure 1: Variances versus mean intensities of sampled bins of 1D NMR metabolomics spectra.
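The binning step and a quadratic fit can be sketched as follows. This is an illustration only: it uses an ordinary least-squares polynomial fit from NumPy as a stand-in for the regression used in the paper, and the function names are hypothetical.

```python
import numpy as np


def binned_mean_variance(spectrum, bin_size=32):
    """Return per-bin mean intensities and sample variances.

    Points left over after the last full bin are ignored.
    """
    y = np.asarray(spectrum, dtype=float)
    n_bins = y.size // bin_size
    bins = y[: n_bins * bin_size].reshape(n_bins, bin_size)
    return bins.mean(axis=1), bins.var(axis=1, ddof=1)


def fit_quadratic(means, variances):
    """Fit variance = c0 + c1 * mean + c2 * mean**2 by least squares."""
    # np.polyfit returns coefficients from the highest power down.
    c2, c1, c0 = np.polyfit(means, variances, deg=2)
    return c0, c1, c2
```

With real spectra, `fit_quadratic(*binned_mean_variance(spectrum))` gives the three coefficients of the fitted curve; under this sketch the intercept c0 can be read as the variance at zero mean intensity.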
In Figure 1, the bin size was set to 32 data points, corresponding to 0.

After determining the parameters, we maximize the function F(b) to find the baseline b_0, according to equation 3. This partial derivative equation expands into a linear system whose solution is b_0. The numerical implementation of solving this linear system is attached in the appendix.
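A minimal sketch of such a solver is given below. It is not the appendix implementation. It assumes the score function F(b) = sum(b_i) - A * sum of squared second differences - B * sum of (b_i - y_i)^2 over points where b_i > y_i. Because the set of points with b_i > y_i depends on b, the sketch re-solves the linear system until that set stops changing; the function name, the dense matrices, and the iteration cap are all choices made for illustration.

```python
import numpy as np


def penalized_baseline(y, A, B, max_iter=50):
    """Maximize the assumed score function by repeated linear solves.

    Setting dF/db = 0 with the active set held fixed gives
        (2*A*D^T D + 2*B*G) b = 1 + 2*B*G y,
    where D is the second-difference operator and G is a diagonal 0/1
    matrix marking the points where the baseline lies above the data.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)  # shape (n - 2, n)
    P = 2.0 * A * (D.T @ D)
    active = np.ones(n, dtype=bool)  # start with the penalty on everywhere
    b = y.copy()
    for _ in range(max_iter):
        g = active.astype(float)
        b = np.linalg.solve(P + 2.0 * B * np.diag(g), 1.0 + 2.0 * B * g * y)
        new_active = b > y
        if np.array_equal(new_active, active):
            break  # the active set is stable, so b is stationary
        active = new_active
    return b
```

For long spectra a banded or sparse solver would be the natural replacement for the dense `np.linalg.solve` call, since the system matrix has only five nonzero diagonals.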
Based on this penalized smoothing model, we test the baseline correction method on simple 1D NMR spectra and complex metabolomics spectra. Figure 2A shows the original spectrum with apparent baseline distortions. This distorted baseline is detected by the penalized smoothing method in Figure 2B. In Figure 2C , this baseline curve is subtracted from the spectrum and the distortion is corrected.
The optimal baseline found by our baseline model fits the distortion curve well. The small peak at 2 ppm in the spectrum is correctly presented after baseline subtraction.

Figure 2: Baseline correction by the penalized parametric smoothing method. (A) Original spectrum with baseline distortions. (B) Detected baseline curve by the penalized parametric smoothing method. (C) Corrected spectrum after baseline subtraction.

We also test this method on more complicated metabolomics spectra collected from tissue samples of red abalone.
The data are from a study of environmental stresses on the development of a bacterial infection among red abalones (Haliotis rufescens) [12, 13]. The dataset includes 65 1D proton NMR spectra with data points in each spectrum. In our test, the penalized smoothing method correctly detected and removed the distorted baseline for all 65 spectra.
Figure 3 shows the baseline correction result for one example of the test spectra using the penalized smoothing method. In Figure 3A, the peaks of metabolites aggregate and form continuous peak regions. The lack of noise points in these regions creates large gaps in baseline construction.
As demonstrated in Figure 3B and Figure 3C, the baseline distortion is correctly detected and removed.

The default baseline correction wavelengths are preconfigured for each of the included assay types and cannot be changed.
The default baseline correction wavelength for the UV-Vis app is nm. However, it is suggested that the optimal wavelength be empirically determined for each sample type. Use the Baseline Correction feature, accessed from the Overflow menu, to change selections.
A baseline correction wavelength is required for the Standard Curve app methods but not for Formula Methods. The optimal baseline correction wavelength should be empirically determined for each method and should take into account the sample type and reagents to be measured.
A general recommendation is to use nm for UV-only wavelength ranges and nm for methods that include the Vis wavelength range. Microbial cell culture OD values are measurements of light scattering. The DS software does not use a baseline normalization unless the user selects one.
A baseline correction wavelength is required for all Kinetics app methods. As in all custom methods, the optimal baseline correction wavelength should be empirically determined.
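The effect of a baseline correction wavelength can be sketched as follows, assuming the correction is a simple subtraction of the absorbance read at the baseline wavelength from every point of the spectrum. The function name and the numbers in the example are made up for illustration and are not instrument defaults.

```python
def baseline_correct(spectrum, baseline_wavelength):
    """Subtract the absorbance at the baseline wavelength from every reading.

    `spectrum` maps wavelength (nm) to absorbance. Both the data layout and
    the subtraction rule are assumptions made for this sketch.
    """
    if baseline_wavelength not in spectrum:
        raise KeyError(f"no reading at {baseline_wavelength} nm")
    offset = spectrum[baseline_wavelength]
    return {wavelength: a - offset for wavelength, a in spectrum.items()}


# Toy readings with a constant offset; wavelengths are arbitrary placeholders.
raw = {260.0: 0.5, 280.0: 0.25, 400.0: 0.125}
corrected = baseline_correct(raw, baseline_wavelength=400.0)
```

In this toy example the offset of 0.125 is removed from every reading, so the corrected absorbance at the baseline wavelength itself becomes zero.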
Potential Baseline Offset
Figure 1 illustrates how an incorrect absorbance reading can be obtained from a sample that is not baseline corrected.

Baseline Wavelength Selection
The optimal wavelength to use for baseline correction is one at which neither the sample buffer nor the molecule of interest absorbs.

This proposed IAsLS method was successfully applied to practical Raman spectral data, and the results in the paper indicate that the baseline of Raman spectra can be automatically subtracted.
He, W. Zhang, L. Liu, Y. Huang, J. Xie, P. Wu and C. Du, Anal. Methods, 6.