Worldwide agricultural use of chemical or mineral fertilizers totalled 110 Mt of nitrogen (N) in 2016, corresponding to 69 kg N per hectare of cropland (arable land and permanent crops) (FAO, 2018). The excessive use of nitrogen-containing fertilizers and manures is one of the main sources of nitrate contamination of groundwater. WHO (2011) and the EU Water Framework Directive (2000) consider groundwater polluted when the nitrate concentration equals or exceeds the guideline value of 50 mg/L. Machine learning algorithms (MLAs) have been increasingly used to predict nitrate concentration in groundwater because they can recognize patterns between nitrate concentration and different features, learning from data without an imposed physical model. For the induction of an MLA, one can use all available features or select a smaller subset of them, removing redundant or spurious features. Many approaches can be used to evaluate the importance of features related to groundwater pollution caused by nitrates. Feature selection (FS) is a process that selects a subset of the original features, optimizing the feature space according to a given criterion. FS contributes to a better understanding of nitrate pollution of groundwater by focusing on the relevant data and improving MLA performance. Different approaches to FS exist, such as wrapper and embedded methods. Wrapper-based algorithms select a subset of relevant features based on the performance of a given learning method as the feature space is either enlarged or reduced. Within wrapper methods, different types of sequential search can be applied to feed the MLA; four were evaluated here: sequential backward selection (SBS), sequential forward selection (SFS), sequential forward floating selection (SFFS) and sequential backward floating selection (SBFS). Embedded algorithms, on the other hand, perform variable selection using internal measures of performance during the training of the algorithm.
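To make the floating search concrete, the following is a minimal sketch of sequential forward floating selection written against a generic scoring function. It is an illustration under stated assumptions, not the implementation used in this study: the names `sffs`, `score` and `k_max` are hypothetical, and in a wrapper setting `score` would typically return a resampled performance estimate of the learning method trained on the candidate subset. Here it is replaced by a toy function so that the example runs on its own.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Tuple


def sffs(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    k_max: int,
) -> Tuple[float, FrozenSet[str]]:
    """Sequential forward floating selection (illustrative sketch).

    `score` is a hypothetical callable returning a value to maximize for a
    given feature subset (e.g. 1 - misclassification error of a learner).
    """
    selected: FrozenSet[str] = frozenset()
    best: Dict[int, Tuple[float, FrozenSet[str]]] = {}  # subset size -> best seen

    while len(selected) < k_max:
        # Forward step: add the single feature that maximizes the score.
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        add = max(candidates, key=lambda f: score(selected | {f}))
        selected = selected | {add}
        s = score(selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, selected)

        # Floating step: drop features while doing so strictly beats the
        # best subset previously recorded for the smaller size.
        while len(selected) > 2:
            drop = max(selected, key=lambda f: score(selected - {f}))
            reduced = selected - {drop}
            s_reduced = score(reduced)
            if s_reduced > best[len(reduced)][0]:
                best[len(reduced)] = (s_reduced, reduced)
                selected = reduced
            else:
                break

    # Return the best subset over all sizes, preferring smaller subsets on ties.
    return max(best.values(), key=lambda t: (t[0], -len(t[1])))


# Toy score (an assumption for illustration): two informative features,
# with a small penalty for every feature included.
INFORMATIVE = frozenset({"a", "c"})


def toy_score(subset: FrozenSet[str]) -> float:
    return len(subset & INFORMATIVE) - 0.1 * len(subset)


best_score, best_subset = sffs(["a", "b", "c", "d"], toy_score, k_max=4)
print(sorted(best_subset), round(best_score, 2))
```

SFS corresponds to the same loop without the floating step, and the backward variants start from the full feature set and remove features instead of adding them.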
Random forest (RF) for classification was used as the learning method, with a bootstrap routine incorporated into the wrapper and embedded methods to evaluate the generalization of the prediction model. A database of 20 features was used, composed of hydrogeological and hydrological features, driving forces (sectors of activity that may produce a series of pressures, as either point or non-point sources) and remotely sensed variables (Normalized Difference Vegetation Index, NDVI). Nitrate concentrations from 110 wells were used as the target feature. The SFFS RF wrapper outperformed the other methods (mean misclassification error = 0.12, Area Under the ROC Curve = 0.92) while selecting only three features: industries and facilities rated according to their production capacity and total nitrogen emissions to water within a 3 km buffer, livestock farms rated by manure production within a 5 km buffer, and cumulated NDVI for the post-maximum month, used as a proxy for vegetation productivity and crop yield.
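As an illustration of how a bootstrap routine can yield a mean misclassification error and a mean Area Under the ROC Curve, the sketch below trains on each bootstrap sample and evaluates on the out-of-bag observations. The out-of-bag design, the 0.5 decision threshold and the names `bootstrap_evaluate` and `fit_predict` are assumptions made for this example; the abstract does not specify these details. `fit_predict` is a hypothetical callable that would wrap the chosen classifier (for instance an RF) and return a score for the positive class, here taken to be nitrate at or above 50 mg/L.

```python
import random
from typing import Callable, List, Sequence, Tuple


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_evaluate(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    fit_predict: Callable[[list, list, list], List[float]],
    n_rounds: int = 100,
    threshold: float = 0.5,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (mean misclassification error, mean AUC) over bootstrap rounds."""
    rng = random.Random(seed)
    n = len(y)
    errors, aucs = [], []
    for _ in range(n_rounds):
        # Draw n indices with replacement; unseen indices form the test set.
        in_bag = [rng.randrange(n) for _ in range(n)]
        in_bag_set = set(in_bag)
        oob = [i for i in range(n) if i not in in_bag_set]
        y_oob = [y[i] for i in oob]
        if len(set(y_oob)) < 2:
            continue  # AUC is undefined unless both classes are present
        scores = fit_predict(
            [X[i] for i in in_bag], [y[i] for i in in_bag], [X[i] for i in oob]
        )
        preds = [int(s >= threshold) for s in scores]
        errors.append(sum(p != t for p, t in zip(preds, y_oob)) / len(y_oob))
        aucs.append(auc(y_oob, scores))
    return sum(errors) / len(errors), sum(aucs) / len(aucs)


# Toy data and a trivial stand-in "model" (both assumptions): the single
# feature is used directly as the score, so the classes separate perfectly.
X_toy = [[i / 20] for i in range(20)]
y_toy = [int(row[0] >= 0.5) for row in X_toy]


def identity_model(X_train, y_train, X_test):
    return [row[0] for row in X_test]


mean_error, mean_auc = bootstrap_evaluate(X_toy, y_toy, identity_model)
print(mean_error, mean_auc)
```

In a wrapper, a function of this kind could serve as the criterion that the sequential search maximizes or minimizes for each candidate feature subset.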