The added utility of nonlinear methods compared to linear methods in rescaling soil moisture products
Introduction
Soil moisture is one of the key variables in many geophysical science applications (e.g., those dealing with climate, hydrology, water resources, or agriculture; Lawrence and Hornberger, 2007) owing to its memory (Han et al., 2014) and role in water and energy exchange between land and the atmosphere (Koster et al., 2004). Hence, an accurate estimation of soil moisture is critical for many applications (Dorigo et al., 2012). Different soil moisture time series for the same location and same time period can be retrieved via different platforms (e.g., hydrological models, in situ observations, and remote sensing). It is often desirable to merge these different datasets to obtain more accurate estimates (Anderson et al., 2012, Yilmaz et al., 2012). However, due to the limitations of these platforms (e.g., satellites can monitor only the top few centimeters at relatively coarse resolutions, in situ point observations have limited spatial representativeness, and models have different parameterizations; Koster et al., 2009), these datasets have systematic differences in their horizontal, temporal, and/or vertical supports (Dirmeyer et al., 2004, Koster et al., 2009). As a result, soil moisture values obtained from various platforms often need to be rescaled before they can be meaningfully validated, merged, or used in different applications (Dirmeyer et al., 2004, Reichle and Koster, 2005, Reichle et al., 2008, Yilmaz and Crow, 2013, Yin et al., 2014, Su and Ryu, 2015).
Many different methods have been proposed to handle these systematic differences between soil moisture products, where an unscaled original product Y is rescaled to the space of a reference product X. However, the performances of these methods depend on many factors, including sampling errors, the degree to which the rescaling methods' underlying assumptions are met, and the goal of the rescaling efforts. Examples of such goals include minimizing the variability of the difference between the rescaled product (Y∗) and X via a first-order linear regression (REG1), matching the total variability of a dataset Y to that of an arbitrary reference dataset X (VAR), matching the cumulative distribution function of Y to that of X (CDF), and matching only the signal variability of Y to that of X (here, “signal” refers to the true variability of a dataset, where the total variability is composed of true signal variability and noise variability components) using triple collocation analysis (TCA: Hain et al., 2011, Miralles et al., 2011, Parinussa et al., 2011, Scipal et al., 2008, Stoffelen, 1998, Zwieback et al., 2012).
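As an illustration of the CDF-matching goal listed above, the following is a minimal sketch that maps each value of Y to the value of X at the same empirical quantile. The function name `cdf_match`, the rank-based plotting position, and the synthetic data are assumptions made for this example only; they are not taken from the study, which may implement CDF matching differently.

```python
import numpy as np

def cdf_match(y, x):
    """Map each value of y to the value of x at the same empirical quantile."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Rank of every y value (0 = smallest), turned into a probability in (0, 1).
    ranks = np.argsort(np.argsort(y))
    probs = (ranks + 0.5) / y.size
    # Read the reference distribution at those probabilities.
    return np.quantile(x, probs)

# Synthetic (hypothetical) data: a reference X and a differently distributed Y.
rng = np.random.default_rng(0)
x = 0.4 * rng.beta(2.0, 5.0, size=500)
y = 0.1 + 0.5 * rng.beta(5.0, 2.0, size=500)
y_star = cdf_match(y, x)

print("mean of X :", x.mean(), " mean of Y*:", y_star.mean())
print("std of X  :", x.std(), " std of Y* :", y_star.std())
```

Because the mapping is monotonic, the temporal ordering of Y is preserved while its distribution is replaced by that of X, which is why CDF matching can absorb nonlinear differences that a single linear coefficient cannot.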
Once the rescaling method is selected for implementation in a specific application, this method can be implemented using different strategies (Yilmaz et al., 2016). For example, a dataset can be rescaled by using a single coefficient for the entire time series, separate rescaling coefficients for each month, or separate coefficients for the anomaly and seasonality components. Such rescaling strategies affect the accuracy statistics of Y∗, even though, by definition, a particular rescaling method is selected to be the optimum method for a particular application (here, the optimum method refers to the method that results in the best statistic of interest, among other methods). To give a more specific example, consider the relative accuracies of X and Y or the difference between the signal-variability-to-noise-variability ratios (Gruber et al., 2016) of X (SNRX) and Y (SNRY). In general, the relative variations of SNRX and SNRY are expected to impact the overall performance of the rescaling methods through the use of various rescaling strategies (Yilmaz et al., 2016) for many applications (e.g., the creation of homogeneous time series and data assimilation). For example, if SNRX >> SNRY, it is better to rescale Y strongly to X (e.g., by rescaling the seasonality and anomaly components separately using two different rescaling coefficients or rescaling datasets for each month separately using 12 different rescaling coefficients). By contrast, if SNRY > SNRX, it is better to weakly rescale Y to X (e.g., by rescaling the entire time series at once and using a single rescaling coefficient). Hence, the performance of any rescaling method (e.g., REG1, VAR, TCA, and CDF) could vary depending on the aggressiveness with which the rescaling strategy is implemented (e.g., weak or strong; Yilmaz et al., 2016).
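The difference between a weak and a strong rescaling strategy can be sketched as follows, using variance matching (VAR) as the underlying method. This is only an illustrative sketch: the function names, the 360-day synthetic calendar, and the synthetic X and Y series are hypothetical and are not taken from the study.

```python
import numpy as np

def var_rescale(y, x):
    """Match the mean and total variability of y to those of x (VAR)."""
    c_y = np.std(x) / np.std(y)
    return np.mean(x) + c_y * (y - np.mean(y))

def rescale_whole(y, x):
    """Weak strategy: one rescaling coefficient for the entire time series."""
    return var_rescale(y, x)

def rescale_monthly(y, x, months):
    """Strong strategy: a separate rescaling coefficient for each month."""
    y_star = np.empty_like(y, dtype=float)
    for m in np.unique(months):
        sel = months == m
        y_star[sel] = var_rescale(y[sel], x[sel])
    return y_star

# Synthetic (hypothetical) daily series on a simplified 360-day calendar.
rng = np.random.default_rng(1)
days = np.arange(3 * 360)
months = (days // 30) % 12 + 1
season = np.sin(2 * np.pi * days / 360.0)
x = 0.25 + 0.08 * season + 0.02 * rng.standard_normal(days.size)
y = 0.40 + 0.03 * season + 0.05 * rng.standard_normal(days.size)

weak = rescale_whole(y, x)
strong = rescale_monthly(y, x, months)
print("RMSD weak  :", np.sqrt(np.mean((weak - x) ** 2)))
print("RMSD strong:", np.sqrt(np.mean((strong - x) ** 2)))
```

The monthly variant forces the rescaled product to reproduce the monthly means and variances of X, so it inherits more of the characteristics of X, including its errors; this is the sense in which a strong strategy is expected to pay off only when X is the more accurate product.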
Both the rescaling method selection (Yilmaz and Crow, 2013) and degree of aggressiveness implemented (Yilmaz et al., 2016) can impact the optimality of the Y∗ statistics. Here, the question arises whether intercomparisons of rescaling methods make sense without taking SNR variations into consideration. Yilmaz et al. (2016) investigated the impact of SNR variations using only a particular rescaling method (VAR). Hence, before conclusions can be drawn with high confidence, a sensitivity study that comprehensively investigates the impact of SNR variations on the performances of various rescaling methods is still required. However, in the absence of such evidence, it is plausible that SNR variations will impact various rescaling methods similarly, though the actual degree of improvement via stronger/weaker rescaling strategies may depend on the particular rescaling method. Accordingly, a universally optimum rescaling method that fits all applications may not exist; the optimality of a rescaling method is largely application specific, particularly if the underlying assumptions inherent to its own methodology are not met. Hence, studies investigating the relative performances of different rescaling methods (both linear and nonlinear) may still contribute to the efforts on the topic of optimal rescaling methods, even without explicitly considering SNR variations.
Satellite-based soil moisture data are often validated using station-based watershed average data (Jackson et al., 2010, Jackson et al., 2012), which have considerably higher local nonlinearity due to the soil moisture dynamics (Crow and Wood, 2002). The spatial support difference between station- and remote sensing-based products (i.e., point vs. areal average) is another source of nonlinear relations between different products. In a recent study, Zwieback et al. (2016) introduced nonparametric CDF and used two new parametric methods to extend TCA to investigate the impact of nonlinear relations on the error statistics obtained via TCA. That study particularly stresses the existing quadratic relations (e.g., the saturation of sensitivity of a product with respect to the sensitivity of another product) between the actual signal components of different soil moisture products, which may lead to nonlinear relations. Zwieback et al. (2016) also provided an extensive discussion on the existence of nonlinear relations between soil moisture products. It is, therefore, plausible that such existing nonlinear relations between datasets may not be captured using linear methods, and the use of nonlinear methods may be necessary. However, the variety of nonlinear methods used to rescale soil moisture datasets remains very limited, and there is still room to investigate the performance of such nonlinear methods.
Among the rescaling methods used in soil moisture studies, CDF (Drusch et al., 2005, Reichle and Koster, 2004, Yin et al., 2015, Zwieback et al., 2016) has received particular attention. Other methods, based on VAR (Crow et al., 2005, Draper et al., 2009, Su et al., 2013), REG1 (Brocca et al., 2013, Crow and Zhan, 2007, Crow, 2007), TCA (Yilmaz and Crow, 2013), quadratic polynomials (Zwieback et al., 2016), copula (Leroux et al., 2014), and wavelets (Su and Ryu, 2015), have also been implemented to reduce the systematic differences between soil moisture time series. However, a comprehensive intercomparison of the performances of these methods in a soil moisture rescaling study has not yet been performed.
The above-listed methodologies have been explicitly used in soil moisture rescaling studies, whereas many other methods have not. For example, multiple linear regressions using quadratic equations (REG2) and lagged observations (REGL) have previously been used in a soil moisture TCA framework (Crow et al., 2015, Su et al., 2014, Zwieback et al., 2016), but quadratic equations and lagged observations together (REGL2) have not. Among the many machine learning methodologies, ANN methods (Rochester et al., 1956) have been used to retrieve soil moisture via microwave measurements (Notarnicola et al., 2008, Paloscia et al., 2008, Prigent et al., 2005, Rodriguez-Fernandez et al., 2015) and SVM methods (Cortes and Vapnik, 1995) have been used to predict soil moisture (Gill et al., 2006) in the root zone using data assimilation techniques (Liu et al., 2010). Other methods that can be used to relate the different datasets, such as the nonlinear regression methods GEN (Koza, 1994) and MARS (Friedman, 1991), have not been used in soil moisture-related studies. To our knowledge, none of these methods (REG2, REGL, REGL2, MARS, GEN, SVM, and ANN) have previously been explicitly used to rescale soil moisture datasets.
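A regression that combines quadratic terms and lagged observations can be sketched as an ordinary least-squares problem. The block below is a minimal sketch under stated assumptions: the exact predictor sets (current value, its square, the previous value, and its square), the one-step lag, the function names, and the synthetic autocorrelated series are all hypothetical choices for illustration, and the time series is assumed to be regularly sampled without gaps.

```python
import numpy as np

def design(y, quadratic=False, lag=False):
    """Build the predictor matrix; the first sample is dropped so that all
    variants use the same time steps (a one-step lag needs a predecessor)."""
    y = np.asarray(y, dtype=float)
    cols = [np.ones(y.size - 1), y[1:]]      # intercept and current value
    if quadratic:
        cols.append(y[1:] ** 2)              # quadratic term
    if lag:
        cols.append(y[:-1])                  # previous value
        if quadratic:
            cols.append(y[:-1] ** 2)         # quadratic term of previous value
    return np.column_stack(cols)

def fit_rescale(y, x, quadratic=False, lag=False):
    """Least-squares fit of x on the chosen predictors; returns rescaled y."""
    a = design(y, quadratic, lag)
    coef, *_ = np.linalg.lstsq(a, np.asarray(x, dtype=float)[1:], rcond=None)
    return a @ coef

# Synthetic (hypothetical) autocorrelated signal observed by X and Y.
rng = np.random.default_rng(2)
n = 1000
truth = np.zeros(n)
for t in range(1, n):
    truth[t] = 0.9 * truth[t - 1] + rng.standard_normal()
x = 0.25 + 0.03 * truth + 0.01 * rng.standard_normal(n)
y = 0.30 + 0.05 * truth - 0.004 * truth ** 2 + 0.05 * rng.standard_normal(n)

variants = {"REG1": (False, False), "REG2": (True, False),
            "REGL": (False, True), "REGL2": (True, True)}
rss = {name: float(np.sum((fit_rescale(y, x, q, l) - x[1:]) ** 2))
       for name, (q, l) in variants.items()}
for name, value in rss.items():
    print(name, value)
```

Because the predictor sets are nested, the in-sample residual sum of squares cannot increase from REG1 to REGL2; whether the extra terms help on independent validation data is a separate question that requires a held-out set.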
Soil moisture has a high temporal memory (i.e., autocorrelation), and consecutively retrieved soil moisture observations have high dependence, implying that previously retrieved soil moisture observations could arguably be viewed as a slightly degraded version of the current values. This property is very valuable for satellite-based soil moisture retrievals; lagged soil moisture products could be used as independent observations, given that past observations are quasi-independently obtained from current observations. This dependence has been utilized by many recent studies (Crow et al., 2015, Su et al., 2014, Zwieback et al., 2013), particularly those focusing on soil moisture TCA methods, which require three independent products. Exploiting the same information source, lagged variables are inherently used by some ANN types in building robust relations between the input and output layers. Although many other methods (e.g., multiple linear regression, MARS, GEN, copula, and SVM) could also benefit from such information in the framework of rescaling soil moisture variables, such an effort has not been made to date.
VAR, REG1, TCA, and CDF have unique solutions and are widely implemented in soil moisture rescaling studies. The optimality of linear rescaling methods (VAR, REG1, and TCA) in the context of data assimilation has been investigated both analytically and numerically by Yilmaz and Crow (2014), and some remedies are available for these methods when the underlying assumptions are not met (Crow and Yilmaz, 2014, Su et al., 2014). However, because the implementations of nonlinear rescaling methods remain limited in the context of rescaling soil moisture time series, the performance of these nonlinear methods relative to that of linear methods remains largely unexplored. Therefore, there is still room to investigate the performances of nonlinear methods relative to those of linear methods to better understand the degree of existing nonlinearity in soil moisture products, even though the degree of existing nonlinearity and the degree to which these nonlinear relations can be captured drive the actual difference between the performances of the nonlinear and linear rescaling methodologies.
This study is the first to use a number of methods (REG2, REGL, REGL2, ANN, SVM, GEN, and MARS) and their lagged types to explicitly rescale soil moisture observations. This study also includes the first comprehensive comparison of the performances of linear methods (REG1, REG2, REGL, REGL2, VAR, TCA, and MARS) as well as nonlinear methods (CDF, copula, ANN, SVM, and GEN) in rescaling soil moisture datasets. Through these intercomparisons, this study comprehensively analyzes the added utility of lagged observations in a soil moisture rescaling framework. This study is particularly relevant for the efforts to create a homogeneous time series in the framework of global soil moisture dataset validation (Leroux et al., 2014) and trend analysis (Dorigo et al., 2012), contributes to the efforts to better understand the optimality of different rescaling methodologies (Yilmaz and Crow, 2013, Yilmaz et al., 2016), and adds to the efforts to identify the degree of the existing nonlinearity in soil moisture products.
Section snippets
First-order linear regression
Linear rescaling methods have been widely used to rescale soil moisture time series to reduce their inconsistency (Brocca et al., 2013, Crow et al., 2005, Crow and Zhan, 2007). Overall, linear rescaling methods are implemented by considering the most general linear relation between a reference dataset (X) and an original unscaled dataset (Y) in the form of: $Y^{*} = \mu_X + c_Y (Y - \mu_Y)$, where Y∗ is the rescaled version of Y; μX and μY are time averages of X and Y, respectively; and cY is a scalar rescaling factor
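The general linear relation described above, assumed here to take the form Y∗ = μX + cY (Y − μY), can be sketched together with three commonly used choices of the rescaling factor: the ratio of standard deviations for VAR, the least-squares slope of X on Y for REG1, and the ratio of covariances with a third product Z for TCA. The function names, the third product Z, and the synthetic error model (a common signal with independent errors) are assumptions of this sketch and are not taken from the study.

```python
import numpy as np

def linear_rescale(y, x, c_y):
    """Apply Y* = mean(X) + c_Y * (Y - mean(Y))."""
    return np.mean(x) + c_y * (y - np.mean(y))

def c_var(y, x):
    """VAR: match the total variability of Y to that of X."""
    return np.std(x, ddof=1) / np.std(y, ddof=1)

def c_reg1(y, x):
    """REG1: slope that minimizes the variance of (Y* - X)."""
    return np.cov(x, y)[0, 1] / np.var(y, ddof=1)

def c_tca(y, x, z):
    """TCA: match only the signal variability, using a third product Z
    whose errors are assumed independent of those of X and Y."""
    return np.cov(x, z)[0, 1] / np.cov(y, z)[0, 1]

# Synthetic (hypothetical) products sharing one signal with independent errors.
rng = np.random.default_rng(3)
n = 20000
signal = rng.standard_normal(n)
x = 0.2 + 1.0 * signal + 0.3 * rng.standard_normal(n)
y = 0.5 + 2.0 * signal + 1.0 * rng.standard_normal(n)
z = 0.8 * signal + 0.5 * rng.standard_normal(n)

factors = {"VAR": c_var(y, x), "REG1": c_reg1(y, x), "TCA": c_tca(y, x, z)}
rescaled = {name: linear_rescale(y, x, c) for name, c in factors.items()}
for name, c in factors.items():
    print(name, "c_Y =", round(float(c), 3))
```

In this synthetic setup the ratio of signal sensitivities is 0.5, which the TCA factor approximates, whereas the VAR and REG1 factors also depend on the noise levels of X and Y; this illustrates why the three methods answer different rescaling goals.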
Datasets
The remote sensing-based Land Parameter Retrieval Method (LPRM) soil moisture datasets (Owe et al., 2001, Owe et al., 2008) used in this study utilize the Advanced Microwave Scanning Radiometer – Earth Observing System (AMSR-E) X-band and C-band observations. These datasets, covering 2002 to 2009, were acquired from the Vrije Universiteit Amsterdam (personal communication with Robert Parinussa, 2013). LPRM uses three parameters (soil moisture, vegetation water content, and soil or canopy
Added utility of rescaling methods
In this study, the LPRM soil moisture values are rescaled to watershed average datasets using linear (VAR, TCA, REG1, REG2, REGL, REGL2 and MARS) and nonlinear (CDF, GEN, SVM, ANN, and copula) methods, where ANN has four types (MLP, RBF, ELMAN, and JORDAN) and copula has five types (NORMAL, CLAYTON, GUMBEL, FRANK, and JOE). Additionally, 12 lagged types are also considered (MARSL, GENL, SVML, MLPL, RBFL, ELMANL, JORDANL, NORMALL, CLAYTONL, GUMBELL, FRANKL, and JOEL). Overall, 31 different
Results and discussion
The statistics of the LPRM and watershed average soil moisture datasets are analyzed (Table 4) prior to evaluating the results of the rescaling experiment. On average, there are 1600 days where the LPRM and watershed average data are mutually available between June 2002 and July 2009. Two different experiments are conducted using two different training datasets, and validation datasets are used to check the consistency of the results. On average, 1200 of the available data points are used for
Conclusions
In this study, LPRM soil moisture datasets are rescaled to station-based datasets over four USDA ARS watersheds to reduce the systematic differences between datasets. The rescaled datasets are validated by using independent data that are not used in the training part. This study is the first to perform a comprehensive comparison of the performances of various linear (VAR, TCA, REG1, REG2, REGL, REGL2, and MARS) and nonlinear (CDF, GEN, SVM, ANN, and copula) methods (31 methods in total); the first
Acknowledgments
The authors would like to thank three anonymous reviewers for their constructive comments. The authors would also like to thank the International Soil Moisture Network for the USDA ARS station-based soil moisture datasets, Vrije Universiteit Amsterdam (Robert Parinussa, personal communication) for the LPRM datasets, and NASA for the GLDAS datasets (downloaded from http://mirador.gsfc.nasa.gov). This research was supported by the EU Marie Curie Seventh Framework Programme FP7-PEOPLE-2013-CIG
References (102)
- et al.
Evaluation of remotely sensed and modelled soil moisture products using global ground-based in situ observations
Remote Sens. Environ.
(2012)
- et al.
Soil moisture estimation through ASCAT and AMSR-E sensors: an intercomparison and validation study across Europe
Remote Sens. Environ.
(2011)
- et al.
Temporal stability of surface soil moisture in the Little Washita River watershed and its applications in satellite soil moisture product validation
J. Hydrol.
(2006)
- et al.
Temporal persistence and stability of surface soil moisture in a semi-arid watershed
Remote Sens. Environ.
(2008)
- et al.
Bankruptcy forecasting: a hybrid approach using Fuzzy c-means clustering and Multivariate Adaptive Regression Splines (MARS)
Expert Syst. Appl.
(2011)
- et al.
An evaluation of AMSR–E derived soil moisture over Australia
Remote Sens. Environ.
(2009)
Finding structure in time
Cogn. Sci.
(1990)
- et al.
Recent advances in (soil moisture) triple collocation analysis
Int. J. Appl. Earth Obs. Geoinf.
(2016)
- et al.
Critical comparative analysis, validation and interpretation of SVM and PLS regression models in a QSAR study on HIV-1 protease inhibitors
Chemom. Intell. Lab. Syst.
(2009)
Serial Order: A Parallel Distributed Processing Approach
(1997)
Estimation of river flow by artificial neural networks and identification of input vectors susceptible to producing unreliable flow estimates
J. Hydrol.
Data assimilation using support vector machines and ensemble Kalman filter for multi-layer soil moisture prediction
Water Sci. Eng.
Trend-preserving blending of passive and active microwave soil moisture retrievals
Remote Sens. Environ.
Remote monitoring of soil moisture using passive microwave-based techniques — theoretical basis and overview of selected algorithms for AMSR-E
Remote Sens. Environ.
Inter-comparison of microwave satellite soil moisture retrievals over the Murrumbidgee Basin, southeast Australia
Remote Sens. Environ.
Conditional copula-based spatial-temporal drought characteristics analysis – case study over Turkey
Water
Towards an integrated soil moisture drought monitor for East Africa
Hydrol. Earth Syst. Sci.
Neural Networks in R Using the Stuttgart Neural Network Simulator: RSNNS
J. Stat. Softw.
Scaling and filtering approaches for the use of satellite soil moisture observations
Neural networks for nonlinear dynamic system modelling and identification
Int. J. Control.
A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence
Biometrika
Support-vector networks
Mach. Learn.
A novel method for quantifying value in spaceborne soil moisture retrievals
J. Hydrometeorol.
The value of coarse-scale soil moisture observations for regional surface energy balance modeling
J. Hydrometeorol.
The Auto-Tuned Land Data Assimilation System (ATLAS)
Water Resour. Res.
Continental-scale evaluation of remotely sensed soil moisture products
IEEE Geosci. Remote Sens. Lett.
Relevance of time-varying and time-invariant retrieval error sources on the utility of spaceborne soil moisture products
Geophys. Res. Lett.
On the utility of land surface models for agricultural drought monitoring
Hydrol. Earth Syst. Sci.
Optimal averaging of soil moisture predictions from ensemble land surface model simulations
Water Resour. Res.
Comparison, validation, and transferability of eight multiyear global soil wetness products
J. Hydrometeorol.
The International Soil Moisture Network: a data hosting facility for global in situ soil moisture measurements
Hydrol. Earth Syst. Sci.
Evaluating global trends (1988–2010) in harmonized multi-satellite surface soil moisture
Geophys. Res. Lett.
Initializing numerical weather prediction models with satellite-derived surface soil moisture: data assimilation experiments with ECMWF's Integrated Forecast System and the TMI soil moisture data set
J. Geophys. Res.
Observation operators for the direct assimilation of TRMM microwave imager retrieved soil moisture
Geophys. Res. Lett.
Implementation of Noah land surface model advances in the National Centers for Environmental Prediction operational mesoscale Eta model
J. Geophys. Res.
R genetic programming framework
Elliptical copulas: applicability and limitations
Statistics & Probability Letters
Multivariate adaptive regression splines
Ann. Stat.
Frank's family of bivariate distributions
Biometrika
Everything you always wanted to know about copula modeling but were afraid to ask
J. Hydrol. Eng.
Soil moisture prediction using support vector machines
J. Am. Water Resour. Assoc.
Distributions des valeurs extrêmes en plusieurs dimensions
An intercomparison of available soil moisture estimates from thermal infrared and passive microwave remote sensing and land surface modeling
J. Geophys. Res. Atmos.
Benchmarking a soil moisture data assimilation system for agricultural drought monitoring
J. Hydrometeorol.
The elements of statistical learning
Copula: multivariate dependence with copulas
Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions
IEEE Trans. Neural Netw.
Validation of advanced microwave scanning radiometer soil moisture products
IEEE Trans. Geosci. Remote Sens.
Validation of Soil Moisture and Ocean Salinity (SMOS) soil moisture over watershed networks in the U.S.
IEEE Trans. Geosci. Remote Sens.
Multivariate Models and Multivariate Dependence Concepts