Authors: Khushbu Rajput, Bhavesh Jain
Abstract: Agriculture is vital to food security and economic development, especially in developing nations. Accurate crop yield estimation is essential for efficient agricultural planning and management, particularly as the effects of climate change intensify. Crop yield depends on many factors, including climate variability, soil type, and nutrient availability. Conventional estimation techniques, which rely on average values and traditional knowledge, are unreliable because they cannot capture the complex interactions among these factors. This paper proposes a machine learning framework for crop yield prediction that incorporates both climatic and soil variables. The input variables are the climatic variables rainfall, temperature, and humidity, and the soil variables soil pH and the essential nutrients nitrogen (N), phosphorus (P), and potassium (K). Three supervised machine learning algorithms (Linear Regression, Random Forest, and Gradient Boosting) are applied and compared to assess their predictive capability. Linear Regression serves as a baseline, while the ensemble methods are used to capture non-linear relationships in agricultural data. Model performance is measured with standard regression evaluation metrics, including Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and the coefficient of determination (R²). The experimental results show that the ensemble models outperform the baseline in both prediction accuracy and generalization ability. These results confirm that combining climatic and soil properties improves crop yield prediction.
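The evaluation criteria named in the abstract (MAE, RMSE, and R²) can be computed directly from observed and predicted yields, which makes the model comparison easy to reproduce. The following is a minimal, dependency-free Python sketch of those metrics plus a small helper that scores several models on the same held-out yields. The function names (`mae`, `rmse`, `r_squared`, `compare_models`), the model labels, and the sample yield values are hypothetical and for illustration only. This is not the authors' implementation or data.

```python
import math
from typing import Dict, Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    """Validate that observed and predicted yields align; return the sample count."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Mean Absolute Error: average absolute residual, in the same units as yield.
    n = _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Root Mean Square Error: squares residuals, so large misses weigh more than in MAE.
    n = _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # R^2 = 1 - SS_res / SS_tot: share of yield variance explained by the model.
    _check(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed yields are equal")
    return 1.0 - ss_res / ss_tot


def compare_models(
    y_true: Sequence[float], predictions: Dict[str, Sequence[float]]
) -> Dict[str, Dict[str, float]]:
    # Score every model's predictions against the same observed yields.
    return {
        name: {
            "MAE": mae(y_true, y_pred),
            "RMSE": rmse(y_true, y_pred),
            "R2": r_squared(y_true, y_pred),
        }
        for name, y_pred in predictions.items()
    }


if __name__ == "__main__":
    # Hypothetical observed yields and model predictions (illustrative only).
    observed = [3.0, 4.0, 5.0]
    preds = {"baseline": [3.5, 3.5, 4.5], "ensemble": [2.5, 4.0, 5.5]}
    for name, scores in compare_models(observed, preds).items():
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

Lower MAE and RMSE and a higher R² indicate a better fit. In practice, scikit-learn's `sklearn.metrics` module provides equivalent functions (`mean_absolute_error`, `mean_squared_error`, `r2_score`), which would be the natural choice alongside its `LinearRegression`, `RandomForestRegressor`, and `GradientBoostingRegressor` estimators.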