Naturally occurring arsenic contamination of groundwater affects aquifers around the world. Arsenic is odorless and tasteless, and even trace amounts can present a hazard to human health. Since arsenic is relatively difficult to detect, it is not routinely measured in groundwater quality analyses. In order to better assess the global extent of arsenic groundwater contamination, we have used random forest modeling to create a prediction map of areas where naturally occurring arsenic concentrations in groundwater are likely to exceed the WHO’s guideline of 10 µg/L for drinking water. We will present this map as well as the modeling procedures behind it.
Random forest modeling is a highly effective statistical learning method based on an ensemble of decision trees, and it has seen much use in classification problems for environmental applications. In this case, we have trained and tested a random forest model on ~120,000 known groundwater arsenic measurements, testing more than 20 geological, soil, climate and hydrological parameters taken from the latest publicly available global datasets. Specifically, 80% of the arsenic concentration dataset was used for model training, while the other 20% was retained for testing and verification. A total of 1001 individual trees were grown, each on a bootstrap sample of the training dataset (bootstrap aggregating, or bagging), with a random subset of variables made available at each branch split. Doing so helps reduce variance while maintaining low bias in the random forest model. The final prediction model comprises the average votes of these 1001 trees for the predictor variables used.
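The procedure described above (80/20 split, bagging, a random subset of variables at each split, averaged tree votes) can be illustrated with a minimal sketch in plain Python. This is not the code used for the study: the data are synthetic, the four predictors and the formula generating the concentrations are invented for illustration, the names `grow_tree`, `tree_vote` and `forest_probability` are hypothetical, and 51 trees are grown instead of 1001 to keep the run short.

```python
import random

THRESHOLD_UG_L = 10.0  # WHO drinking-water guideline cited in the text


def gini(pos, n):
    """Gini impurity of a node holding n samples, pos of them positive."""
    p = pos / n
    return 2.0 * p * (1.0 - p)


def grow_tree(X, y, rng, n_try, depth=0, max_depth=6, min_size=5):
    """Grow one tree; only n_try randomly chosen variables are tried per split."""
    n = len(y)
    if depth >= max_depth or n < min_size or len(set(y)) == 1:
        return sum(y) / n  # leaf: fraction of samples above the threshold
    best = None
    for f in rng.sample(range(len(X[0])), n_try):
        order = sorted(range(n), key=lambda i: X[i][f])
        pos_left, pos_all = 0, sum(y)
        for k in range(1, n):  # the k smallest values go to the left branch
            pos_left += y[order[k - 1]]
            if X[order[k - 1]][f] == X[order[k]][f]:
                continue  # cannot split between identical values
            score = (k * gini(pos_left, k)
                     + (n - k) * gini(pos_all - pos_left, n - k))
            if best is None or score < best[0]:
                best = (score, f, X[order[k]][f])
    if best is None:  # every candidate variable was constant in this node
        return sum(y) / n
    _, f, t = best
    left = [i for i in range(n) if X[i][f] < t]
    right = [i for i in range(n) if X[i][f] >= t]
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left],
                      rng, n_try, depth + 1, max_depth, min_size),
            grow_tree([X[i] for i in right], [y[i] for i in right],
                      rng, n_try, depth + 1, max_depth, min_size))


def tree_vote(tree, row):
    """Follow the splits down to a leaf and return that tree's 0/1 vote."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] < t else right
    return 1 if tree > 0.5 else 0


def forest_probability(forest, row):
    """Average vote of all trees, read as the probability of exceedance."""
    return sum(tree_vote(tree, row) for tree in forest) / len(forest)


rng = random.Random(0)

# Synthetic stand-in data (an assumption, not the study's measurements):
# four made-up predictors in [0, 1]; only the first two drive the concentration.
X, y = [], []
for _ in range(400):
    row = [rng.random() for _ in range(4)]
    conc = 15.0 * row[0] + 10.0 * row[1] + rng.gauss(0.0, 1.0)  # "µg/L"
    X.append(row)
    y.append(1 if conc > THRESHOLD_UG_L else 0)

# 80% of the samples for training, 20% held back for testing.
idx = list(range(len(y)))
rng.shuffle(idx)
cut = int(0.8 * len(idx))
train, test = idx[:cut], idx[cut:]

N_TREES = 51  # odd, so a majority vote cannot tie (the abstract uses 1001)
forest = []
for _ in range(N_TREES):
    boot = [rng.choice(train) for _ in train]  # bootstrap sample (bagging)
    forest.append(grow_tree([X[i] for i in boot], [y[i] for i in boot],
                            rng, n_try=2))

probs = [forest_probability(forest, X[i]) for i in test]
accuracy = sum((p > 0.5) == (y[i] == 1) for p, i in zip(probs, test)) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Each value in `probs` is the share of trees voting for exceedance at one held-out sample. Evaluating the same quantity for every cell of a gridded set of predictors would give a probability surface of the kind a prediction map displays.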
Our highly accurate, 1-km resolution random forest model represents the most sophisticated characterization of arsenic contamination in aquifers around the world. In particular, it utilizes considerably more and higher-resolution data than does the 2006 Amini et al. global groundwater arsenic model. The new map can be used by water managers and other government and non-government agencies to prioritize areas for groundwater quality testing. Furthermore, it works toward Sustainable Development Goal (SDG) 6 with regard to the provision of safe drinking water.