Accurate crop yield forecasting is important for stakeholders across the agri-food chain, including farmers, agronomists, commodity traders, and policymakers. Timely, reliable forecasts support informed decision-making and better planning in the face of agriculture's inherent uncertainties. Machine learning can be used to predict crop yields, guide crop selection, and inform actions during the growing season. Because crop yield depends on numerous factors, identifying the most significant variables is essential for improving prediction accuracy. Feature selection algorithms retain the most relevant features, which can improve model performance while reducing computation time. This is especially useful in agriculture, where yield is affected by factors such as land use, water management, fertilizer application, and weather conditions. In this study, the Random Forest (RF) algorithm was applied to three datasets, both to build a regression model and to perform feature selection. A multiple linear regression model was then built using the selected features.
The models' forecasting performance was assessed using three statistical metrics: Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and Mean Absolute Deviation (MAD). Comparing the multiple linear regression model built on RF-selected features with the RF regression model showed that RF regression delivered the most accurate forecasts across all three datasets.
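The three error metrics named above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes MAD denotes the mean absolute deviation of the forecast errors (the mean of |actual − forecast|, as is common in forecasting) and that MAPE is reported in percent. The function names `rmse`, `mape`, `mad`, and `_check` are hypothetical.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], forecast: Sequence[float]) -> None:
    """Reject empty or mismatched inputs (hypothetical helper)."""
    if len(actual) == 0 or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root Mean Square Error: square root of the mean squared forecast error."""
    _check(actual, forecast)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent; undefined if any actual value is 0."""
    _check(actual, forecast)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mad(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Absolute Deviation of forecast errors (assumed: mean |actual - forecast|)."""
    _check(actual, forecast)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


if __name__ == "__main__":
    # Hypothetical observed and forecast yields (e.g. t/ha), for illustration only.
    observed = [3.0, 4.0, 5.0]
    predicted = [2.5, 4.0, 6.0]
    print(rmse(observed, predicted), mape(observed, predicted), mad(observed, predicted))
```

RMSE penalizes large errors more heavily than MAD because errors are squared before averaging, while MAPE expresses error relative to the observed yield, which makes it easier to compare across datasets with different yield levels.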