1. Introduction
Precipitation estimates from satellite data have been widely used in land and water management studies at various scales. Although rain gauges provide accurate point-based measurements, these cannot easily be extrapolated to produce accurate basin-scale maps, especially when gauges are unevenly distributed or in ungauged basins [1,2]. Remotely sensed precipitation datasets have been developed to overcome these limitations, and a series of rainfall datasets are now available at both regional and global scales [3,4,5,6]. For example, the Tropical Rainfall Measuring Mission (TRMM) multi-satellite precipitation analysis merges microwave data from multiple satellite estimates with the monthly accumulated rain gauge analysis [7]. The consistency between TRMM and monthly gauge precipitation has been confirmed worldwide [8,9]. However, the spatial resolution of TRMM is 25 km, which may fail to capture detailed precipitation patterns in small watersheds. Such satellite products therefore still need further improvement with respect to their coarse spatial resolution and uncertainties [10,11,12,13].
Downscaling is an effective way to obtain fine-resolution precipitation for essential research on ecology [14,15], the hydrological cycle and water budgets [16,17,18,19], discharge simulation [20], and grid cell-based soil erosion by water [21,22,23]. Scholars have conducted extensive work on downscaling TRMM data (Table A1). The core idea is to establish the relationship between precipitation and environmental variables, and then to use finer-resolution environmental variables as input to downscale remotely sensed precipitation from coarse to fine resolution.
Finding suitable environmental variables is a critical step in building a downscaling model. The environmental factors used for downscaling precipitation can be divided into dynamic variables (e.g., vegetation indices, which change frequently in space and time) and static variables (e.g., topography and geolocation, which remain nearly constant over time). For example, the Normalized Difference Vegetation Index (NDVI), elevation, longitude, and latitude are widely used in TRMM downscaling (Table A1). NDVI is the most commonly used dynamic vegetation index in TRMM downscaling because of its positive correlation with precipitation [24,25]. However, when precipitation exceeds a certain level, NDVI may saturate and respond to precipitation with a lag of up to three months, in which case the NDVI-precipitation relationship gradually weakens [10,26,27,28,29]. Alternatively, a few studies used the Enhanced Vegetation Index (EVI) to overcome the limitations of NDVI [10,20,30], but saturation was still observed. A more sensitive dynamic factor is therefore needed for downscaling. The Leaf Area Index (LAI) is more sensitive to dynamic changes in vegetation conditions [31,32] and may better describe the relationship between precipitation and vegetation. However, LAI has so far rarely been used in TRMM downscaling. In addition, little attention has been paid to estimating the importance of vegetation indices in the downscaling process, which hinders our understanding of feature selection.
Using an appropriate downscaling method to explore the relationship between precipitation and environmental variables is another critical step in building downscaling models. To date, many methods have been developed for TRMM downscaling. For instance, multiple linear regression is applicable in regions where the spatial relationships between precipitation and environmental factors are consistent [33,34], whereas machine learning approaches are suitable for complicated relationships between precipitation and land surface characteristics [35]. Table A1 summarizes the models adopted by previous studies on downscaling precipitation products. Because the relationship between environmental variables and precipitation varies with region and time, the accuracy of downscaled precipitation can change dramatically with the choice of regression model. Different regression models should therefore be compared to identify the optimal downscaling strategy.
Furthermore, the majority of downscaling studies were conducted offline, which means the required data (e.g., precipitation, NDVI, and elevation) had to be downloaded to a local computer for processing. This approach is time-consuming and cost-inefficient when multiple algorithms and input variables must be tested before deciding on the best downscaling approach. A better TRMM downscaling framework is therefore needed.
Recently, more cloud-based platforms have been made freely available to the public, such as Google Earth Engine (GEE), a cloud-based platform with built-in functions for planetary-scale geospatial analysis and a multi-petabyte catalog of public Earth observation data [36,37]. GEE is widely used in several fields, for instance, crop and crop yield mapping [38,39,40,41], burned area mapping [42], vegetation and land use mapping [43,44,45], and actual evapotranspiration estimation [46,47]. However, few studies have used GEE to downscale precipitation data. Google Colaboratory (Colab) is a free cloud service from Google Research that provides free access to Google cloud services and graphics processing units (GPUs) [48]. In the Colab environment, GEE and machine learning can easily be integrated to process geospatial data [49,50]. Combining GEE and Colab may significantly improve downscaling efficiency and may reduce uncertainties derived from resampling and re-projecting data.
To address the aforementioned limitations of previous downscaling efforts, the main purpose of this study is to build a flexible, operational, and efficient TRMM downscaling framework based on GEE, the Colab environment, and machine learning techniques. Four objectives are expected to be accomplished: (1) compare the performance of different machine learning algorithms in simulating annual TRMM precipitation; (2) quantify the importance of variables in annual TRMM downscaling; (3) downscale annual TRMM from 25 km to 1 km and disaggregate the downscaled 1 km TRMM into monthly precipitation maps; and (4) identify the most sensitive variable or composite approach for producing monthly TRMM precipitation maps. Section 2 describes the study area, datasets, machine learning algorithms, and the proposed framework. Sections 3 and 4 present and discuss the results for the upstream area of the Great Mekong region. Conclusions are provided in Section 5.
2. Materials and Methods
2.1. Study Area
The upstream area of the Great Mekong region, which has diverse vegetation and climate patterns and complex terrain, was selected as our study area. It extends from 95°50′18″E to 106°11′34″E and from 18°43′01″N to 29°13′20″N, and covers approximately 692,379 km² shared among China, Myanmar, and Laos (Figure 1a). Elevation decreases dramatically from 6494 m in the northwest to 81 m in the southeast, with north-south oriented mountains and valleys [51,52] (Figure 1b). Precipitation in this region is strongly affected by complex seasonal monsoons [53], such as the southwest monsoon from the Indian Ocean and the Bay of Bengal, resulting in an extremely uneven spatial and temporal distribution of precipitation. The average annual precipitation is 1494 mm yr−1, with a maximum of 3311 mm yr−1 in northern Myanmar and a minimum of 527 mm yr−1 in northwest Yunnan, China (Figure 1c). The temporal distribution is also extremely uneven: based on records from 2015 to 2018, the monthly average precipitation is approximately 122 mm month−1, with a minimum of 9 mm month−1 in February and a maximum of 301 mm month−1 in August (Figure 1d).
2.2. Data
This study used remotely sensed precipitation (the version 7 TRMM 3B43 dataset), vegetation indices (NDVI, EVI, and LAI), the MCD12Q1 land cover dataset, and the SRTM digital elevation model (DEM). All these datasets are available in the Google Earth Engine data catalog: https://developers.google.com/earth-engine/datasets. In addition, this study used monthly observed precipitation data from 17 weather stations provided by the China Meteorological Data Service Centre.
2.2.1. Precipitation
The Tropical Rainfall Measuring Mission, a joint project of NASA and JAXA, was launched in 1997. The TRMM multi-satellite precipitation analysis (TMPA) was developed by combining several available satellite precipitation estimates with available precipitation gauge analyses [7]. One of the TMPA products is the TRMM 3B43 monthly dataset with a 0.25° resolution. It is one of the most popular satellite-based precipitation datasets and has been widely used as source data for precipitation downscaling [1,10,24,33,54]. This study uses the TRMM 3B43 version 7 dataset, referred to as TRMM in subsequent sections.
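As a small illustration of handling this dataset, note that the Earth Engine catalog entry for TRMM 3B43 V7 (collection `TRMM/3B43V7`, band `precipitation`) reports precipitation as a mean rate in mm/hr rather than a monthly total; this unit should be confirmed against the catalog entry before use. Under that assumption, monthly and annual totals follow from multiplying each monthly rate by the number of hours in the month and summing. The sketch below is plain Python with hypothetical helper names and is not the authors' code.

```python
import calendar


def monthly_total_mm(rate_mm_per_hr, year, month):
    """Convert a mean monthly rate in mm/hr into a monthly total in mm."""
    days_in_month = calendar.monthrange(year, month)[1]
    return rate_mm_per_hr * 24 * days_in_month


def annual_total_mm(monthly_rates, year):
    """Sum 12 monthly totals; monthly_rates[0] is January."""
    if len(monthly_rates) != 12:
        raise ValueError("expected 12 monthly rates")
    return sum(
        monthly_total_mm(rate, year, month)
        for month, rate in enumerate(monthly_rates, start=1)
    )
```

For example, a constant rate of 0.17 mm/hr throughout 2018 corresponds to about 1489 mm (0.17 × 24 × 365). In GEE the same conversion would be applied to each monthly image before summing; the sketch keeps only the arithmetic so that it runs offline.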
2.2.2. Vegetation
The vegetation indices used in this study are NDVI, EVI, and LAI. NDVI and EVI are taken from the MOD13A2 version 6 dataset, a 16-day composite of low-cloud or cloud-free observations at 1 km spatial resolution. Many studies (Table A1) have used vegetation indices (e.g., NDVI and EVI) as fundamental factors for downscaling precipitation because of their positive correlation with precipitation. LAI is taken from the MCD15A3H version 6 level 4 dataset, a 4-day composite with a 500 m pixel size.
2.2.3. Land Cover
The MCD12Q1 version 6 product provides annual global land cover at 500 m spatial resolution. MCD12Q1 is derived from supervised classifications of MODIS Terra and Aqua reflectance data and comes in five different classification schemes. The International Geosphere-Biosphere Programme (IGBP) classification scheme, which contains 17 land cover classes, was adopted in this study because of its broad application [29,55].
2.2.4. Elevation
Previous studies have shown that elevation has a stronger impact on precipitation where the topography is not flat [10,54,56,57]. Because the Shuttle Radar Topography Mission (SRTM, version 4) digital elevation dataset provides consistent, high-quality elevation data at 90 m spatial resolution [58], this study adopts SRTM to investigate the impact of topography on precipitation patterns.
2.2.5. Rain Gauge
Because comparison with observed precipitation is crucial for assessing the downscaled precipitation dataset, monthly records from 17 meteorological stations (Figure 1a) of the China Meteorological Data Service Centre are used to validate the downscaled precipitation data. Most of the rain gauges are located in the central and eastern parts of China. The observation period is from January to December 2018. Overall, the study area is sparsely gauged, with only 17 rain gauges available.
2.3. Machine Learning Algorithms
Three machine learning algorithms from the scikit-learn library in Python [59], namely the Gradient Boosting Regressor (GBR) [60], Support Vector Regression (SVR) [61], and the Artificial Neural Network (ANN) [62], were used in this study to simulate the complicated relationship between TRMM precipitation and environmental factors for TRMM downscaling. GBR is an ensemble learning algorithm that uses boosting to minimize the model loss by adding weak learners in a stage-wise fashion. In each iteration, a regression tree is fitted to the negative gradient of the given loss function and added to the model, which reduces the loss [63]. The final GBR output is the ensemble of all the regression trees. SVR is based on optimization theory: it maps the input variables into an m-dimensional feature space and fits a hyperplane with a maximal margin, which is derived by solving a quadratic programming problem [61]. The ANN interconnects processing units, called neurons or nodes, into a network that can construct complex relationships between different sets of variables [64]. The ANN architecture consists of an input layer, at least one hidden layer, and an output layer, each consisting of several neurons. ANNs have been successfully applied to downscale precipitation data [54,57,65]. In this study, the input layer has four nodes, equal to the number of independent variables (one vegetation index, i.e., NDVI, EVI, or LAI, plus elevation, longitude, and latitude). The output layer has a single node, the dependent variable (here, predicted TRMM) (Figure 2). Each hidden layer has 9 nodes, calculated with the formula proposed by Hecht-Nielsen [66]: number of nodes = number of predictors × 2 + 1.
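The three regressors can be instantiated with scikit-learn roughly as follows. This is a minimal sketch, assuming that the ANN corresponds to scikit-learn's `MLPRegressor` with a single hidden layer. The RBF kernel, iteration limit, and seed are illustrative placeholders that the grid search would later replace, and `hecht_nielsen_nodes` and `build_models` are hypothetical helper names.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Predictors: elevation, longitude, latitude, and one vegetation index
N_PREDICTORS = 4


def hecht_nielsen_nodes(n_predictors):
    """Hidden-layer size from the rule quoted in the text: 2 * n + 1."""
    return 2 * n_predictors + 1


def build_models(seed=0):
    """Untuned estimators; hyper-parameters are set later by grid search."""
    return {
        # Regression trees added stage-wise, each fitted to the negative gradient
        "GBR": GradientBoostingRegressor(random_state=seed),
        # Epsilon-insensitive support vector regression with an RBF kernel
        "SVR": SVR(kernel="rbf"),
        # One hidden layer sized by the Hecht-Nielsen rule (9 nodes for 4 inputs)
        "ANN": MLPRegressor(
            hidden_layer_sizes=(hecht_nielsen_nodes(N_PREDICTORS),),
            max_iter=2000,
            random_state=seed,
        ),
    }
```

Each estimator exposes the same `fit`/`predict` interface, which is what allows the framework to swap algorithms without changing the rest of the pipeline.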
2.4. Downscaling Framework
The downscaling process can be expressed as P = F(Xi) + ε, where P is the downscaled precipitation, Xi are the environmental variables, and ε is the residual. The general approach is to establish a function (F) between precipitation and the environmental variables at coarse resolution, and then use the fine-resolution environmental variables as inputs to F to predict precipitation at fine resolution. According to Immerzeel et al. [68], the relationship between the predictors and TRMM precipitation is stable across coarse and fine resolutions, with only small changes in the coefficients. A model built at coarse resolution can therefore be applied at fine resolution.
Following the principle of the downscaling process and the objectives of this study, an innovative downscaling framework (Figure 3) was designed by integrating GEE and three machine learning approaches in Google Colab. In this framework, the three machine learning algorithms are used to establish the relationship between precipitation and four environmental variables, namely elevation, longitude, latitude, and one vegetation index (NDVI, EVI, or LAI). Cross-validation is adopted to select the best downscaling algorithm. The best relationship between precipitation and the environmental variables established at coarse resolution (25 km) was applied to predict TRMM precipitation at 1 km resolution, using 1 km environmental variables as input. Except for the monthly rain gauge data, all datasets, including TRMM, elevation, latitude, longitude, NDVI, EVI, and LAI, are processed online by GEE in Colab, which avoids downloading data.
2.4.1. Data Preparation and Pre-Processing
Four environmental variables at spatial resolutions of 25 km and 1 km were prepared in this study, including three static variables (elevation, longitude, and latitude) and one dynamic variable (NDVI, EVI, or LAI). To eliminate atmospheric and cloud cover effects, the maximum value composite was used to generate the monthly dynamic variables. The annual composite was then generated by averaging the monthly values [25,27]. The 25 km environmental variables were generated by resampling the 1 km variables using the nearest neighbor technique, with each 25 km pixel aggregated as the average of all 1 km pixels within it [12]. Pixels with negative vegetation index values were masked out of both the dependent and independent variables because of their negative impact on the construction of the downscaling model [10,12,29]. The same applies to the urban and built-up, permanent snow and ice, and water body classes of the MCD12Q1 land cover dataset. Because the environmental variables differ in their ranges and units (NDVI/EVI range from −1 to +1; LAI from 0.1 to 10; elevation from 81 m to 6494 m; longitude from 95° to 106°; latitude from 18° to 29°), the StandardScaler of scikit-learn was used to standardize the variables by their means and standard deviations, eliminating the effects of different scales [69].
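The pre-processing steps can be sketched with NumPy and scikit-learn on in-memory arrays. The sketch rests on three assumptions:

* The 1 km and 25 km grids nest exactly by an integer factor. This is a simplification; in GEE the aggregation is done through reprojection.
* The excluded classes are IGBP codes 13 (urban and built-up), 15 (permanent snow and ice), and 17 (water bodies) of the MCD12Q1 `LC_Type1` band.
* The scaler is fitted on the 25 km predictors and reused at 1 km.

All function names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Assumed IGBP (MCD12Q1 LC_Type1) codes to exclude:
# 13 = urban and built-up, 15 = permanent snow and ice, 17 = water bodies
EXCLUDED_IGBP = (13, 15, 17)


def annual_composite(composites_by_month):
    """Monthly maximum value composite, then the annual mean of the 12 maxima.

    composites_by_month: 12 arrays shaped (n_composites, rows, cols).
    """
    monthly_max = np.stack([np.nanmax(c, axis=0) for c in composites_by_month])
    return np.nanmean(monthly_max, axis=0)


def mask_invalid(vi, landcover):
    """Float copy of vi with negative values and excluded classes set to NaN."""
    out = np.asarray(vi, dtype=float).copy()
    out[(out < 0) | np.isin(landcover, EXCLUDED_IGBP)] = np.nan
    return out


def block_mean(fine, factor):
    """Average non-overlapping factor x factor blocks, ignoring NaN pixels."""
    rows, cols = fine.shape[0] // factor, fine.shape[1] // factor
    trimmed = fine[: rows * factor, : cols * factor]
    return np.nanmean(trimmed.reshape(rows, factor, cols, factor), axis=(1, 3))


def standardize(X_coarse, X_fine):
    """Fit the scaler on the coarse predictors and apply it to both grids."""
    scaler = StandardScaler().fit(X_coarse)
    return scaler.transform(X_coarse), scaler.transform(X_fine)
```

Fitting the scaler only on the coarse data keeps the fine-resolution inputs in the same standardized space that the model was trained in.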
2.4.2. Hyper-Parameter Optimization
The hyper-parameters of machine learning algorithms play a pivotal role in their performance. In this framework, the scikit-learn GridSearchCV algorithm with a cross-validation splitting strategy (GSCV) [59] is used to identify the best hyper-parameter values for each combination of machine learning algorithm and vegetation index [12,35,70,71]. The 25 km pixels were divided into two groups. The first group, comprising 90% of the pixels, is used to train and test each algorithm to determine the best hyper-parameters. For this purpose, a 10-fold cross-validation strategy (CV = 10) is used to confirm the best hyper-parameters, which construct the optimal prediction model (OPM). The remaining 10% of the 25 km pixels are used later to validate the OPM in simulating TRMM precipitation. The best OPM among the three surveyed models was adopted to estimate the contribution of the predictor variables in the downscaling model and to downscale TRMM precipitation from 25 km to 1 km grids.
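The split and search described above can be sketched as follows. The helper `tune` and the example grid `ANN_GRID` are hypothetical. The study does not list the hyper-parameter values it searched, so the grid is only a placeholder, and RMSE is assumed as the selection score.

```python
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical grid for sklearn.neural_network.MLPRegressor; the searched
# values are not reported in the text.
ANN_GRID = {"alpha": [1e-4, 1e-3, 1e-2], "learning_rate_init": [1e-3, 1e-2]}


def tune(estimator, param_grid, X, y, seed=0):
    """Grid search on 90 % of the 25 km pixels, validation on the other 10 %."""
    X_fit, X_hold, y_fit, y_hold = train_test_split(
        X, y, test_size=0.1, random_state=seed
    )
    # 10-fold cross-validation; scikit-learn maximizes scores, hence negative RMSE
    search = GridSearchCV(
        estimator, param_grid, cv=10, scoring="neg_root_mean_squared_error"
    )
    search.fit(X_fit, y_fit)
    # With the default refit=True, best_estimator_ is refitted on all 90 %: the OPM
    opm = search.best_estimator_
    # Return the OPM, its parameters, and its R^2 on the held-out 10 %
    return opm, search.best_params_, opm.score(X_hold, y_hold)
```

The same function can be called once per algorithm and vegetation index, which mirrors the per-combination tuning described in the text.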
2.4.3. Generation of 1 km TRMM Product
Six steps are required to generate the final TRMM downscaling product at 1 km resolution [33,68]. Step 1: use the best OPM (established in Section 2.4.2) to predict TRMM at 25 km from the 25 km environmental variables (Predicted (25 km)). Because of the limitations of the regression model, part of the precipitation cannot be explained by it [27,33,68]. Step 2: compute the residual precipitation at 25 km as ΔResidual (25 km) = TRMM (25 km) − Predicted (25 km). Step 3: resample the precipitation residual from 25 km to 1 km (ΔResidual (1 km)) using the spline algorithm [49], which works well for regularly spaced data [35,54,57,70]. Step 4: use the same OPM as in Step 1 to predict TRMM at 1 km from the 1 km environmental variables (Predicted (1 km)). Step 5: generate the final 1 km TRMM product as TRMM (1 km) = Predicted (1 km) + ΔResidual (1 km). Step 6: disaggregate the annual 1 km downscaled TRMM into monthly precipitation maps following Duan and Bastiaanssen [27].
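Steps 1 to 5 can be sketched as a single function. This is an illustration, not the study's implementation. It assumes:

* a fitted scikit-learn-style model;
* predictors flattened in row-major order for each grid;
* a residual grid without missing values;
* an integer ratio between the two grids.

SciPy's cubic-spline `scipy.ndimage.zoom` stands in for the spline algorithm cited in the text.

```python
import numpy as np
from scipy.ndimage import zoom


def downscale_with_residuals(model, X_coarse, trmm_coarse, X_fine, fine_shape):
    """Steps 1-5: regression prediction plus spline-resampled residuals.

    X_coarse, X_fine: (n_pixels, n_features) arrays in row-major grid order.
    trmm_coarse: 2-D TRMM grid at coarse resolution, without NaN values.
    fine_shape: (rows, cols) of the fine grid.
    """
    coarse_shape = trmm_coarse.shape
    # Step 1: predict precipitation on the coarse grid
    pred_coarse = model.predict(X_coarse).reshape(coarse_shape)
    # Step 2: residual = TRMM minus prediction on the coarse grid
    resid_coarse = trmm_coarse - pred_coarse
    # Step 3: cubic-spline resampling of the residual to the fine grid
    factors = (fine_shape[0] / coarse_shape[0], fine_shape[1] / coarse_shape[1])
    resid_fine = zoom(resid_coarse, factors, order=3, mode="nearest")
    if resid_fine.shape != tuple(fine_shape):
        raise ValueError("resampled residual does not match fine_shape")
    # Step 4: predict on the fine grid with the same model
    pred_fine = model.predict(X_fine).reshape(fine_shape)
    # Step 5: add the resampled residual back
    return pred_fine + resid_fine
```

Masked (NaN) residual cells would need to be filled before the spline step, for example with a neighborhood mean. Otherwise the spline prefilter spreads missing values across the grid.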
2.4.4. Assessment Indices
Three assessment indices (Equations (1)−(3)) were used to compare model performance [10,30]: the coefficient of determination (R2), the Root Mean Square Error (RMSE), and the Mean Absolute Error (MAE).
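$$R^2 = \frac{\left[\sum_{i=1}^{n}(S_i-\bar{S})(P_i-\bar{P})\right]^2}{\sum_{i=1}^{n}(S_i-\bar{S})^2\,\sum_{i=1}^{n}(P_i-\bar{P})^2} \tag{1}$$
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}(S_i-P_i)^2}{n}} \tag{2}$$
$$\mathrm{MAE} = \frac{\sum_{i=1}^{n}\left|S_i-P_i\right|}{n} \tag{3}$$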
where S is the original TRMM precipitation, P is the simulated TRMM precipitation, S̄ and P̄ are their means, and n is the number of samples. R2 measures the strength of the relationship between the original and simulated precipitation [25], MAE is used as a bias indicator, and RMSE describes the accuracy of each machine learning algorithm [27]. In general, the higher the R2 and the lower the RMSE and MAE, the better the model. In addition, an analysis of variance (ANOVA) test was applied to compare the performance of the investigated models in simulating TRMM precipitation [72]. The rain gauge data are used as ground “truth” to validate the final downscaled results; in this case, S is the observed precipitation and P is the downscaled TRMM precipitation.
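Equations (1) to (3) and the ANOVA comparison translate directly into NumPy and SciPy. In this sketch R2 is computed as the squared Pearson correlation between S and P. The ANOVA is assumed to compare groups of per-pixel absolute errors of the models; the text does not state which samples entered the test, so that choice is an assumption.

```python
import numpy as np
from scipy.stats import f_oneway


def r2(s, p):
    """Squared Pearson correlation between reference s and simulation p."""
    s, p = np.asarray(s, float), np.asarray(p, float)
    ds, dp = s - s.mean(), p - p.mean()
    return float((ds @ dp) ** 2 / ((ds @ ds) * (dp @ dp)))


def rmse(s, p):
    """Root mean square error."""
    d = np.asarray(s, float) - np.asarray(p, float)
    return float(np.sqrt(np.mean(d ** 2)))


def mae(s, p):
    """Mean absolute error."""
    d = np.asarray(s, float) - np.asarray(p, float)
    return float(np.mean(np.abs(d)))


def anova_p(*abs_errors_by_model):
    """One-way ANOVA p-value across groups, e.g. absolute errors of each model."""
    return float(f_oneway(*abs_errors_by_model).pvalue)
```

A p-value below 0.05 from `anova_p` would indicate that at least one model's error distribution differs significantly from the others.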
3. Results
3.1. The Optimal Prediction Model
Table 1 shows the validation results of the TRMM precipitation simulated by the different optimal prediction models. In general, the annual TRMM precipitation predicted by the three machine learning methods shows good consistency with the original TRMM data. ANN produced the highest R2 (0.977 to 0.984) and the lowest RMSE (71 to 85 mm year−1) and MAE (51 to 56 mm year−1) in simulating annual TRMM precipitation, followed by GBR and SVR. The ANOVA analysis of the three algorithms (Table A2) revealed statistically significant differences in their performance in simulating TRMM precipitation (p-values < 0.05). Hence, ANN was employed for annual TRMM precipitation downscaling for the year 2018 in the following analysis.
3.2. Variable Importance
The importance of the input predictors for the ANN model (Figure 4) was determined with the scikit-learn “permutation importance” algorithm [73]. Latitude and longitude have the highest importance scores (34–50%), followed by elevation (9–25%), whereas the vegetation indices (EVI, NDVI, and LAI) contribute the least (2–8%). This may be because latitude and longitude, as significant geolocation predictors, play a dominant role in downscaling TRMM precipitation in the upstream area of the Great Mekong region. Among the three vegetation indices, EVI contributes most to the downscaling model, followed by NDVI and LAI.
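The following sketch shows how permutation importances might be converted to the percentage scores reported above. It assumes the percentages are the mean permutation importances rescaled to sum to 100 after clipping negative values. `importance_percent` is a hypothetical helper, while `sklearn.inspection.permutation_importance` is the scikit-learn function itself.

```python
import numpy as np
from sklearn.inspection import permutation_importance


def importance_percent(model, X, y, names, n_repeats=10, seed=0):
    """Permutation importance of a fitted model, rescaled to percentages.

    Negative mean importances (pure noise) are clipped to zero first.
    """
    result = permutation_importance(
        model, X, y, n_repeats=n_repeats, random_state=seed
    )
    imp = np.clip(result.importances_mean, 0.0, None)
    total = imp.sum()
    if total == 0:
        return dict.fromkeys(names, 0.0)
    return dict(zip(names, 100.0 * imp / total))
```

Because permutation importance measures the drop in score when one predictor is shuffled, correlated predictors such as latitude and elevation can share or mask each other's importance.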
3.3. Annual Downscaled Products
Figure 5 presents the original TRMM precipitation and the precipitation predicted by the ANN algorithm with each of the three vegetation indices for 2018. The predicted results share the same spatial pattern as the original TRMM precipitation. Their means are very close (a: 1558.12 mm year−1; b: 1556.46 mm year−1; c: 1556.36 mm year−1; d: 1555.31 mm year−1), but their spatial ranges differ slightly. This further confirms the good performance of ANN in simulating TRMM precipitation in 2018.
Figure 6 shows the residual precipitation maps at the coarse and fine resolutions. The residual maps represent the amount of precipitation that cannot be explained by the regression model. Negative values indicate areas where the effect of the independent variables is higher than expected (overestimation of the predicted precipitation), whereas positive values indicate areas where it is lower than expected (underestimation of the predicted precipitation).
Figure 7 presents maps of the residual contribution (RC) relative to the original TRMM and to the downscaled precipitation before residual correction. RC values above 0.5 (50%) would depict areas where the regression model is ineffective and the downscaled result is mostly inherited from the residuals; no such areas were found in the three predicted results. In this study, the RC maps are classified into four classes (<0.05, 0.05–0.10, 0.10–0.20, and >0.20), and the share of each class (number of cells in the class / total number of cells × 100) was calculated. Figure 7 reveals a minor contribution of residuals to the downscaling model. Cells with a residual contribution of less than 0.05 cover most of the study area: more than 83% of the total area at the coarse resolution and more than 93% at the fine resolution before residual correction.
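The classification of RC maps into four classes can be sketched as follows. The RC definition used here (absolute residual divided by the reference precipitation) is an assumption, since the text does not give a formula. The class edges follow the text, and a value exactly on an edge is assigned to the upper class.

```python
import numpy as np

RC_EDGES = (0.05, 0.10, 0.20)
RC_CLASSES = ("<0.05", "0.05-0.10", "0.10-0.20", ">0.20")


def residual_contribution(residual, precipitation):
    """Assumed definition: |residual| as a fraction of reference precipitation."""
    return np.abs(residual) / precipitation


def class_shares(rc):
    """Percentage of valid (finite) cells falling in each RC class."""
    valid = np.asarray(rc, float)
    valid = valid[np.isfinite(valid)]
    idx = np.digitize(valid, RC_EDGES)  # 0..3, one index per class
    counts = np.bincount(idx, minlength=len(RC_CLASSES))
    return dict(zip(RC_CLASSES, 100.0 * counts / valid.size))
```

Excluding non-finite cells keeps masked pixels out of the denominator, so the shares refer only to the area actually modeled.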
Figure 8 presents the fine-resolution predicted precipitation before and after residual correction. In general, the downscaled results before and after residual correction have spatial distribution patterns similar to those of the original TRMM (Figure 5a), but with much more spatial variation and local detail. Before residual correction, the NDVI-predicted precipitation map shows slightly higher spatial variation than the maps predicted with EVI and LAI. After the residual precipitation is added, the spatial variation of the NDVI-predicted precipitation becomes similar to that of the other products (EVI and LAI). We compared the downscaled results after residual correction with the observed precipitation:

* NDVI-downscaled product: R2 = 0.91, RMSE = 290 mm year−1, MAE = 239 mm year−1.
* EVI-downscaled product: R2 = 0.89, RMSE = 200 mm year−1, MAE = 181 mm year−1.
* LAI-downscaled product: R2 = 0.91, RMSE = 202 mm year−1, MAE = 179 mm year−1.
* Original TRMM precipitation: R2 = 0.79, RMSE = 350 mm year−1, MAE = 265 mm year−1.
3.4. Monthly Downscaled Products
The monthly downscaled TRMM products are generated by disaggregating the annual downscaled products for 2018. Figure 9 presents the validation metrics of the monthly downscaled products and of the original TRMM precipitation against the observed precipitation. In general, the three downscaled products returned a higher R2 and lower RMSE and MAE than the original TRMM precipitation. The LAI-downscaled results performed best (R2 = 0.89, RMSE = 39 mm month−1, MAE = 27 mm month−1), followed by EVI, with the NDVI-downscaled product ranking last.
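The disaggregation of an annual 1 km map into monthly maps can be illustrated with a fraction-based sketch. Each month's share of the annual TRMM total is computed per coarse pixel and then applied to the downscaled annual value. This is an assumed reading of the approach of Duan and Bastiaanssen [27], and the nearest-neighbor upsampling of the fractions is a simplification. The function names are hypothetical.

```python
import numpy as np


def monthly_fractions(monthly_coarse):
    """Share of each month in the annual total per coarse pixel: (12, r, c)."""
    monthly_coarse = np.asarray(monthly_coarse, float)
    return monthly_coarse / monthly_coarse.sum(axis=0, keepdims=True)


def disaggregate(annual_fine, monthly_coarse, factor):
    """Split a fine annual map into 12 monthly maps.

    Fractions are computed on the coarse grid and repeated onto the fine
    grid (nearest-neighbour upsampling, an assumption of this sketch).
    """
    frac = monthly_fractions(monthly_coarse)
    frac_fine = frac.repeat(factor, axis=1).repeat(factor, axis=2)
    return frac_fine * annual_fine[np.newaxis, :, :]
```

Because the 12 fractions of each pixel sum to one, the monthly maps add up exactly to the downscaled annual map.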
Three options were considered for selecting which vegetation-index product to use in each month: (1) the best overall product in all months (e.g., LAI); (2) the ensemble mean or median of the three products in each month; and (3) a combination of the best-performing product in each month. Figure A1 shows that the last option outperforms the others (R2 = 0.90, RMSE = 37 mm month−1, MAE = 24 mm month−1). Therefore, the product that returned the highest R2 and the lowest RMSE and MAE in each month was chosen to create the best combination of downscaling results for 2018. Specifically, the EVI-downscaled product is adopted in February, November, and December; the NDVI-downscaled product in January, March, May, and September; and the LAI-downscaled product in the remaining months. Compared with the original monthly TRMM data at 25 km spatial resolution (Figure 10), the downscaled maps at 1 km spatial resolution (Figure 11) present a similar overall precipitation pattern with more local detail.
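The month-by-month selection can be expressed as a small ranking function. When R2, RMSE, and MAE disagree, the text does not state which metric takes precedence. This sketch assumes R2 first, with RMSE and then MAE as tie-breakers. The function name and the input layout are hypothetical.

```python
def best_product_per_month(metrics):
    """Pick, per month, the product with the highest R2.

    Ties are broken by lower RMSE, then lower MAE (an assumed ordering).
    metrics: {month: {product: (r2, rmse, mae)}}
    """
    return {
        month: max(
            products,
            key=lambda name: (
                products[name][0],
                -products[name][1],
                -products[name][2],
            ),
        )
        for month, products in metrics.items()
    }
```

Feeding the monthly validation metrics of the NDVI, EVI, and LAI products into this function yields the month-to-product mapping used to assemble the combined monthly series.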
4. Discussion
Accurate precipitation data at high spatiotemporal resolution play an important role in land and water management. Downscaling coarse precipitation is an effective way to obtain finer-resolution precipitation estimates for further essential environmental studies at the basin scale. In this study, we proposed a downscaling framework for TRMM precipitation products that integrates GEE and Colab. Three machine learning algorithms (GBR, SVR, and ANN) were investigated to simulate TRMM precipitation. The best-performing algorithm was used to derive annual precipitation at 1 km resolution over the upstream area of the Great Mekong region. Three vegetation indices (NDVI, EVI, and LAI) were compared in the annual downscaling of TRMM and in producing monthly TRMM maps through disaggregation.
4.1. Results Compared to Previous Studies
Among the three algorithms implemented in this study, ANN performed best in simulating annual TRMM precipitation, followed by GBR, while SVR ranked last. This result is supported by Xu et al. [57], who found that ANN performs well in TRMM downscaling. However, opinions differ on the performance of different machine learning methods. For example, Jing et al. [35] found that random forest performs better than classification and regression trees (CART) and k-nearest neighbors (KNN). Because only one year of data was used in this study, our results serve only as an example demonstrating the applicability of the proposed framework for downscaling TRMM precipitation. Considering that the relationship between the explanatory variables and precipitation is complex, the performance of different machine learning methods should be compared in practical applications.
The final annual downscaled maps (Figure 8d–f) show spatial precipitation patterns similar to the original precipitation (Figure 5a), with many local details. They also returned higher accuracies than the original TRMM when compared against the observed precipitation (Figure 9). The best-performing downscaled product for each month was selected to compose the monthly 1 km downscaled TRMM (Figure 11). Compared with the original monthly TRMM precipitation (Figure 10), these maps have similar overall spatial distributions but a higher resolution, and thus display more detailed precipitation patterns. These results indicate that the downscaling in this study improves not only the spatial resolution but also the accuracy of TRMM precipitation. These findings are consistent with previous results [24,57,68].
The effectiveness of our framework is also supported by the residual maps. In general, negative residual values (e.g., greener areas) indicate that the vegetation in these areas may have an additional water source (e.g., irrigated areas) or may be less sensitive to precipitation (e.g., evergreen forest with deep root systems). Positive residual values, on the other hand, may characterize vegetation that is less green than expected (e.g., sparse vegetation). The LAI-based model returned the highest residual magnitudes, followed by EVI, but these higher residuals are not dominant in the study area (Figure 6c). Figure 7 indicates that residuals contribute little to the three predicted precipitation products.
4.2. Importance of Each Predictor and the Role of Vegetation Indices in TRMM Downscaling
The input-variable importance assigned by the ANN model indicated that latitude and longitude play a dominant role in downscaling TRMM precipitation in the study region. This result is consistent with previous findings [29,57,74] that latitude and longitude may significantly affect precipitation and its spatial distribution. The higher importance score of elevation compared with all vegetation indices may be attributed to the orographic uplift effect of mountains on precipitation [10], whereby altitude affects climate parameters that in turn influence the precipitation rate [10,54,56]. In addition, the three vegetation indices differ in their role in the downscaling model: EVI contributed most to the prediction model, followed by NDVI, while LAI had the lowest importance.
In general, the LAI dataset performed slightly better than EVI, followed by NDVI, in the annual downscaling model. This may be because LAI is more strongly correlated with precipitation (R2 = 0.47) than EVI (R2 = 0.45) and NDVI (R2 = 0.34), as shown in Figure A2. Another possible reason is the finer original spatiotemporal resolution of LAI (500 m; 4 days) compared with NDVI and EVI (1 km; 16 days). Furthermore, several studies [10,26,27,28,29] reported that NDVI is prone to saturation when precipitation exceeds a certain threshold. It is worth mentioning that LAI contributed less to the prediction model in terms of variable importance (Figure 4), yet it performed best in both the annual and monthly downscaled products. This indicates that a higher importance score does not guarantee better performance of a variable in precipitation downscaling. This study therefore recommends LAI as a way to overcome the limitations of both EVI and NDVI, as it may mitigate the saturation effect.
4.3. Advantages and Disadvantages
The downscaling framework proposed in this study makes full use of the powerful data processing capabilities of Google Earth Engine and the online computing capabilities of Google Cloud. It enables a deep coupling between online processing of remote sensing data and machine learning approaches. It requires neither downloading data nor installing software, and it is not limited by personal computing devices. The framework allows easy comparison of different machine learning methods and can select the optimal downscaling method and parameters based on specific regional characteristics. However, the framework has some limitations. For example, it is limited by Google Drive storage (15 GB) and the Colab runtime lifetime (12 h). In addition, the RAM and GPUs available in Colab vary over time to accommodate fluctuations in demand and the overall growth in concurrent users.
5. Conclusions
Accurate estimation of precipitation is vital for land and water management applications at the basin scale. The main merit of this framework is that it delivers an easy-to-follow and accurate method for statistically downscaling TRMM precipitation. It combines different sources of remote sensing data with machine learning methods in an easily accessible, free processing environment. Python and GEE were used via Colab to implement the proposed downscaling procedures, which saves time and storage space in data downloading, data format conversion, and data analysis. Three machine learning algorithms (GBR, SVR, and ANN) and auxiliary variables (elevation, latitude, longitude, and either NDVI, EVI, or LAI) were used to describe the relationship between precipitation and the geospatial environmental variables. Our results reveal that (1) the ANN-based regression model gave significantly better statistical metrics in simulating TRMM precipitation; (2) the most sensitive vegetation index for downscaling TRMM was LAI, followed by EVI; and (3) geolocation and elevation play an essential role in the downscaling model over the study area. The main conclusion of this study is that TRMM precipitation, a key input parameter in several essential studies [14,15,16,17,18,19,20,21,22,23], can be accurately downscaled using free-of-charge cloud computing. With this framework, TRMM precipitation can be downscaled in a timely, efficient, and operational manner. The concept can be applied to other areas for similar purposes, and the framework is flexible enough to integrate more input variables and machine learning algorithms.
| Static predictors | Dynamic predictor | R2 (ANN) | R2 (GBR) | R2 (SVR) | RMSE (ANN) | RMSE (GBR) | RMSE (SVR) | MAE (ANN) | MAE (GBR) | MAE (SVR) |
|---|---|---|---|---|---|---|---|---|---|---|
| Elevation, Longitude, Latitude | NDVI | 0.983 | 0.971 | 0.953 | 73 | 95 | 121 | 54 | 69 | 77 |
|  | EVI | 0.984 | 0.975 | 0.959 | 71 | 89 | 114 | 51 | 66 | 76 |
|  | LAI | 0.977 | 0.97 | 0.965 | 85 | 96 | 105 | 56 | 67 | 72 |
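The table reports R2, RMSE, and MAE for each algorithm and predictor set. As a minimal sketch of how such metrics are computed from paired reference values (for example, TRMM at 25 km or gauge observations) and model estimates, the function below assumes R2 is the coefficient of determination, 1 − SS_res/SS_tot; some downscaling studies instead report the squared Pearson correlation, which can differ. The function name and the sample values are hypothetical.

```python
import math


def r2_rmse_mae(observed, predicted):
    """Return (R2, RMSE, MAE) for paired reference and estimated precipitation.

    R2 is the coefficient of determination, 1 - SS_res / SS_tot (an assumption;
    the squared Pearson correlation is a common alternative).
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired values of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Hypothetical annual totals (mm) for four grid cells.
observed = [800.0, 1200.0, 1500.0, 950.0]
predicted = [780.0, 1250.0, 1450.0, 1000.0]
r2, rmse, mae = r2_rmse_mae(observed, predicted)
print(f"R2={r2:.3f} RMSE={rmse:.1f} MAE={mae:.1f}")  # R2=0.972 RMSE=44.4 MAE=42.5
```

RMSE penalizes large errors more heavily than MAE, which is why RMSE is always at least as large as MAE for the same residuals.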
Author Contributions
A.E. was responsible for the experimental design, manuscript preparation, and the Jupyter notebook for data processing via Colab. H.Z. contributed to the conceptual design and to editing and reviewing the manuscript. B.W. contributed to the final review of the manuscript, funding acquisition, and project administration. N.Z. contributed to the structural design, editing, and reviewing. F.T., M.Z., W.Z., N.Y., Z.C., Z.S., X.W., and Y.L. gave useful comments that improved the paper. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Key R&D Program of China (2016YFA0600304, 2016YFA0600301), the National Natural Science Foundation of China (41561144013, 41601464, and 41761144064), the Key R&D Project of the Chinese Academy of Sciences (KFZD-SW-316), and the environmental protection project of China Three Gorges Corporation.
Acknowledgments
We thank the China Meteorological Data Service Centre (CMDC) for providing the rain gauge precipitation data. We also thank the Tropical Rainfall Measuring Mission (TRMM), the Moderate Resolution Imaging Spectroradiometer (MODIS) mission, and the Shuttle Radar Topography Mission (SRTM) for their data support. We would also like to thank all the staff members of the Python Software Foundation, Google Earth Engine, and Google Colab teams. Finally, we express our great appreciation to the anonymous reviewers and editors, whose comments have significantly improved the quality of the article.
Conflicts of Interest
The authors declare no conflict of interest, including no potential conflict of interest with China Three Gorges Corporation.
Source Code
Because the source code has been registered with the national copyright authority of the People's Republic of China (Certificate No: 5037633, Registration No: 2020SR0158937) and relies on openly accessible sources, it can be shared upon request by emailing the corresponding author.
Appendix A
Table A1. Summary of relevant studies on monthly downscaling of TRMM precipitation data.
| Reference | Predictors | Residual correction | Regression model | R2 | RMSE (mm month-1) | MAE (mm month-1) |
|---|---|---|---|---|---|---|
| [34] * | DEM, aspect, roughness, humidity, temperature | Spline | MLR | 0.58 | 19.99 | -- |
|  |  |  | TRMM | 0.58 | 39.45 | -- |
| [24] | NDVI, DEM | Area-to-point Kriging | MLR | -- | 20.03 | 13.03 |
|  |  | Ordinary Kriging |  | -- | 24.81 | 17.73 |
| [57] | DEM, Long, Lat | Spline | ANN | 0.936 | 40.56 | -- |
|  |  |  | MF | 0.934 | 41.20 | -- |
| [74] | NDVI, LST, DEM, slope, Long, Lat | Ordinary Kriging | GWRK | 0.95 | 25 | 16 |
|  |  |  | TRMM | 0.95 | 30 | 19 |
| [75] | EVI, DEM, aspect, slope, Long, Lat | Bilinear | RF | 0.78 | 25 | 14 |
|  |  |  | TRMM | 0.73 | 31 | 16 |
| [54] | NDVI, VWSI, albedo, DEM | Spline | MLR | 0.47 | 54 | -- |
|  |  |  | ANN | 0.60 | 59 | -- |
|  |  |  | TRMM | -- | 37 | -- |
| [76] | NDVI, DEM, LST | -- | SVM | 0.75 | 29.90 | -- |
|  |  |  | RF | 0.82 | 26.10 | -- |
| [35] | NDVI, DEM, LST | Spline | MLR | 0.46 | 27 | 14 |
|  |  |  | kNN | 0.71 | 17 | 12 |
|  |  |  | CART | 0.70 | 18 | 12 |
|  |  |  | SVM | 0.73 | 16 | 11 |
|  |  |  | RF | 0.74 | 16 | 11 |
| [29] ** | NDVI, DEM, slope, Long, Lat | Kriging | GWRK | 0.91 | 22.2 | 13.5 |
|  |  |  |  | 0.84 | 7.50 | 4.8 |
|  |  |  |  | 0.80 | 30.5 | 22.2 |
|  |  |  | TRMM | 0.88 | 26.5 | 13.7 |
|  |  |  |  | -- | 5.10 | 3.8 |
|  |  |  |  | 0.69 | 37.1 | 23.7 |
| [25] *** | NDVI, DEM | Bilinear | Exponential | 0.74 | 24 | -- |
|  |  |  |  | 0.60 | 25 | -- |
|  |  |  | GWR | 0.67 | 32 | -- |
|  |  |  |  | 0.42 | 20 | -- |
|  |  |  | MLR | 0.80 | 22 | -- |
|  |  |  |  | 0.26 | 15 | -- |
|  |  |  | QPP | 0.89 | 16 | -- |
|  |  |  |  | 0.45 | 11 | -- |
|  |  |  | TRMM | 0.94 | 11 | -- |
|  |  |  |  | 0.64 | 9 | -- |
| [77] | DEM, Long, Lat, TRMM-1 km | -- | GWR | 0.87 | 32.92 | 18.19 |
|  |  | Ordinary Kriging | GWRK | 0.89 | 31.11 | 17.05 |
|  |  |  | TRMM downscaled by ATPK to 1 km (TRMM-1 km) | 0.76 | 46.14 | 26.44 |
|  |  |  | TRMM | 0.72 | 49.63 | 28.66 |
| [30] | EVI, DEM | -- | GWR | -- | -- | -- |
|  | NDVI, DEM |  |  | 0.86 | 35 | 23 |
|  |  |  | TRMM | 0.85 | 38 | 26 |
Note: Geographically Weighted Regression Kriging (GWRK); Geographically Weighted Regression (GWR); Area-to-point Kriging (ATPK); Optimal Subset Regression (OSR); Multiple Linear Regression (MLR); Artificial Neural Network (ANN); Multi-fractal approach (MF); Vegetation Water Supply Index (VWSI); Land Surface Temperature (LST); Random Forest (RF); Quadratic Parabolic Profile (QPP); Classification and Regression Trees (CART); k-Nearest Neighbors (kNN); Support Vector Machine (SVM); Cubist is a spatial data mining algorithm; Longitude (Long); Latitude (Lat). * R2 and RMSE values represent the mean of six events; ** performance for monthly (upper row), dry-season (middle row), and wet-season (lower row) data; *** performance using national stations (upper row) and regional stations (lower row).
Appendix B
Table A2. Summary of the ANOVA test comparing the performance of the investigated algorithms in simulating TRMM precipitation for the year 2018.
| Metric | Source of Variation | SS | df | MS | F | p-Value | F crit |
|---|---|---|---|---|---|---|---|
| R2 | Between Groups | 0.0008 | 2 | 0.00038 | 20 | 0.002 | 5.14 |
|  | Within Groups | 0.0001 | 6 | 0.00002 |  |  |  |
|  | Total | 0.0009 | 8 |  |  |  |  |
| RMSE | Between Groups | 2069 | 2 | 1035 | 22 | 0.002 | 5.14 |
|  | Within Groups | 286 | 6 | 48 |  |  |  |
|  | Total | 2355 | 8 |  |  |  |  |
| MAE | Between Groups | 728 | 2 | 364 | 72 | 0.0001 | 5.14 |
|  | Within Groups | 30 | 6 | 5 |  |  |  |
|  | Total | 758 | 8 |  |  |  |  |
Note: SS: sum of squares; df: degrees of freedom; MS: mean square; F: F test statistic; p-value: probability value (significance level 0.05); F crit: critical value of F at the 0.05 significance level.
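Table A2 has 2 between-group and 6 within-group degrees of freedom, which is consistent with three groups (ANN, GBR, SVR) of three values each. Assuming each group holds one algorithm's R2 for the NDVI, EVI, and LAI predictor sets from the results table following Section 5, a one-way ANOVA can be recomputed as sketched below. With those values it gives F ≈ 19.75 and p ≈ 0.002, in line with the R2 row of Table A2; the RMSE and MAE rows are reproduced only approximately from the rounded tabulated values. The p-value uses the closed-form F-distribution tail that holds only when the between-group df equals 2; for the general case, `scipy.stats.f_oneway` returns both the F statistic and the p-value. The helper names are hypothetical.

```python
def one_way_anova(groups):
    """One-way ANOVA: return SS_between, SS_within, df_between, df_within, and F."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


def f_upper_tail_df1_2(f_stat, df_between, df_within):
    """P(F > f_stat) in closed form; valid only for df_between == 2."""
    if df_between != 2:
        raise ValueError("closed form holds only for two between-group degrees of freedom")
    return (1.0 + 2.0 * f_stat / df_within) ** (-df_within / 2.0)


# Assumed grouping: each algorithm's R2 for the NDVI, EVI, and LAI predictor sets.
r2_by_algorithm = {
    "ANN": [0.983, 0.984, 0.977],
    "GBR": [0.971, 0.975, 0.970],
    "SVR": [0.953, 0.959, 0.965],
}
ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(list(r2_by_algorithm.values()))
p_value = f_upper_tail_df1_2(f_stat, df_b, df_w)
print(f"SS_between={ss_b:.4f} SS_within={ss_w:.4f} df=({df_b}, {df_w}) F={f_stat:.2f} p={p_value:.4f}")
```

Because F exceeds the critical value of 5.14, the null hypothesis of equal mean R2 across the three algorithms is rejected at the 0.05 level.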
Appendix C
Figure A1. Monthly validation results of the ensemble mean (a), the ensemble median (b), and the best-performing (c) monthly downscaled TRMM precipitation obtained by disaggregation using the ANN algorithm with the NDVI, EVI, and LAI vegetation indices, 2018.
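The monthly maps are derived from the annual downscaled precipitation by disaggregation, but the exact formula is not given in this section. A common proportional approach, assumed in the sketch below, gives each 1 km pixel the same monthly fractions as the 25 km TRMM cell that contains it, so the months sum back to the downscaled annual total while keeping the coarse cell's seasonal pattern. The helper name and the sample values are hypothetical.

```python
def disaggregate_to_months(annual_fine, monthly_coarse):
    """Split a downscaled annual total into 12 months (hypothetical helper).

    annual_fine: downscaled annual precipitation of one 1 km pixel (mm).
    monthly_coarse: the 12 monthly TRMM totals (mm) of the 25 km cell containing it.
    Each month receives the same fraction of the annual total as in the coarse cell.
    """
    if len(monthly_coarse) != 12:
        raise ValueError("expected 12 monthly values")
    annual_coarse = sum(monthly_coarse)
    if annual_coarse <= 0:
        # No monthly fractions can be formed for a completely dry cell; return zeros.
        return [0.0] * 12
    return [annual_fine * month / annual_coarse for month in monthly_coarse]


monthly_trmm = [10, 20, 30, 40, 80, 150, 200, 180, 120, 60, 20, 10]  # sums to 920 mm
monthly_fine = disaggregate_to_months(1104.0, monthly_trmm)
print(monthly_fine)  # each month is scaled by 1104 / 920 = 1.2
```

Applied pixel by pixel, this step needs only the annual downscaled map and the monthly TRMM grids, which keeps the machine learning fit at the annual scale.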
Appendix D
Figure A2. Mean annual values of NDVI, EVI, and LAI versus the original TRMM for the year 2018.
© 2020. This work is licensed under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Accurate precipitation data at high spatiotemporal resolution are critical for land and water management at the basin scale. We propose a downscaling framework for Tropical Rainfall Measuring Mission (TRMM) precipitation products that integrates Google Earth Engine (GEE) and Google Colaboratory (Colab). Three machine learning methods, Gradient Boosting Regressor (GBR), Support Vector Regressor (SVR), and Artificial Neural Network (ANN), were compared within the framework. Three vegetation indices (Normalized Difference Vegetation Index, NDVI; Enhanced Vegetation Index, EVI; Leaf Area Index, LAI), topography, and geolocation were selected as geospatial predictors to perform the downscaling. The framework can automatically optimize the models' parameters, estimate feature importance, and downscale the TRMM product to 1 km. The spatial downscaling of TRMM from 25 km to 1 km was achieved using the relationships between annual precipitation and an annually averaged vegetation index. Monthly precipitation maps were then derived from the annual downscaled precipitation by disaggregation. According to validation in the Great Mekong upstream region, ANN yielded the best performance in simulating annual TRMM precipitation. The most sensitive vegetation index for downscaling TRMM was LAI, followed by EVI. Compared with existing downscaling methods, the proposed framework can be run online for any given region, using a wide range of machine learning tools and environmental variables, to generate a precipitation product with high spatiotemporal resolution.
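The core step described in the abstract, fitting a regression between annual TRMM precipitation and annually averaged predictors at 25 km and then applying the fitted relationship to the same predictors at 1 km, can be sketched as follows. Ordinary least squares stands in here for GBR, SVR, or ANN as a deliberate simplification; scikit-learn's `GradientBoostingRegressor`, `SVR`, and `MLPRegressor` follow the same fit-then-predict pattern. The sketch assumes the fine-resolution predictors have already been aggregated to the TRMM grid for fitting, omits any residual correction, and uses hypothetical helper names and predictor values.

```python
def fit_linear(X, y):
    """Ordinary least squares with an intercept, solved from the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    m = [ata[i] + [aty[i]] for i in range(p)]  # augmented matrix [A^T A | A^T y]
    for col in range(p):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, p), key=lambda k: abs(m[k][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        # Gauss-Jordan step: eliminate this column from every other row.
        for k in range(p):
            if k != col:
                factor = m[k][col] / m[col][col]
                m[k] = [a - factor * b for a, b in zip(m[k], m[col])]
    return [m[i][p] / m[i][i] for i in range(p)]


def predict_linear(coef, X):
    """Apply the intercept and slopes to each predictor row."""
    return [coef[0] + sum(c * x for c, x in zip(coef[1:], row)) for row in X]


# Hypothetical 25 km cells: [annual mean LAI, mean elevation (m)] and annual TRMM (mm).
coarse_X = [[1.0, 1000.0], [2.0, 1500.0], [1.5, 3000.0], [3.0, 800.0], [2.5, 2000.0]]
coarse_y = [100.0 + 500.0 * lai + 0.2 * elev for lai, elev in coarse_X]
coef = fit_linear(coarse_X, coarse_y)

# Hypothetical 1 km pixels carrying the same predictors at fine resolution.
fine_X = [[1.2, 1100.0], [2.8, 900.0]]
fine_pred = predict_linear(coef, fine_X)
print(fine_pred)  # approximately [920.0, 1680.0]
```

The design assumption behind this pattern is scale invariance: the relationship learned between coarse precipitation and coarse predictors is assumed to hold at 1 km, which is why downscaling studies often add a residual-correction step (see Table A1).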