Cristian Lopez
This project shows how machine learning can reduce simulated drilling costs by $5.58 million CAD by optimizing the Rate of Penetration (ROP). Drilling is a major expense in the oil and gas industry, so gains in ROP translate directly into savings. Traditional ROP prediction models, such as the Maurer and Bourgoyne and Young (B&Y) models, have substantial limitations, with mean errors of 134.0% and 48.6%, respectively. This work introduces a machine learning framework that predicts ROP more accurately and provides a clear method for optimizing controllable drilling parameters to maximize efficiency and reduce costs. The analysis focuses on four Flemish Pass exploration wells, all drilled using 12.25-inch PDC drill bits.
The project followed a structured workflow, transforming raw well data into an actionable, optimized drilling plan.
The dataset was compiled from four wells (G-92, J-31A, K67Z, C-78) in the Flemish Pass region. It included a comprehensive set of drilling parameters and geological information. The data was cleaned, normalized, and prepared for model training.
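The cleaning and normalization step can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the column layout, the rule that rows with non-positive ROP are discarded, and the choice of min-max scaling are all assumptions, and the sample values are invented.

```python
import numpy as np

def clean_and_scale(data):
    """Drop incomplete or non-physical rows, then min-max scale each column.

    `data` is a 2-D float array whose last column is ROP (an assumed layout).
    """
    data = np.asarray(data, dtype=float)
    complete = ~np.isnan(data).any(axis=1)   # no missing sensor readings
    drilling = data[:, -1] > 0               # keep rows where the bit is advancing
    kept = data[complete & drilling]
    lo, hi = kept.min(axis=0), kept.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid dividing by zero on constant columns
    return (kept - lo) / span, lo, hi

# Columns: depth (m), SWOB, ROP. Illustrative values only.
raw = np.array([
    [1000.0, 12.0, 30.0],
    [1010.0, np.nan, 32.0],   # missing SWOB reading, dropped
    [1020.0, 14.0, 0.0],      # not drilling, dropped
    [1030.0, 16.0, 40.0],
])
scaled, lo, hi = clean_and_scale(raw)
```

Returning `lo` and `hi` alongside the scaled array lets the same scaling be reapplied to a new well, so that the model sees inputs on the scale it was trained on.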
A detailed feature analysis was conducted using Spearman Correlation and Permutation Feature Importance. This step was crucial for identifying the most influential parameters on ROP. Key predictors identified include Depth, Standpipe Pressure (SPPA), and Surface Torque (STOR).
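A sketch of how these two analyses might be run with SciPy and scikit-learn is shown below. The data is synthetic and the toy relationship between the channels and ROP is an assumption made purely for illustration; only the two techniques themselves come from the project.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Synthetic stand-ins for logged channels (not the project's data).
depth = rng.uniform(1000, 3000, n)
sppa = rng.uniform(10, 30, n)
noise_channel = rng.normal(size=n)
X = np.column_stack([depth, sppa, noise_channel])
names = ["Depth", "SPPA", "Noise"]
# Assumed toy relationship: ROP falls with depth and rises with SPPA.
rop = 80 - 0.02 * depth + 0.5 * sppa + rng.normal(scale=1.0, size=n)

# Spearman rank correlation of each feature with ROP.
for name, column in zip(names, X.T):
    rho, _ = spearmanr(column, rop)
    print(f"{name}: rho = {rho:+.2f}")

# Permutation importance: how much the test score drops when one
# feature's values are shuffled, averaged over several shuffles.
X_train, X_test, y_train, y_test = train_test_split(X, rop, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranking = [names[i] for i in np.argsort(result.importances_mean)[::-1]]
print(ranking)
```

The two measures answer different questions: Spearman correlation describes a monotonic association in the raw data, while permutation importance describes how much a fitted model relies on a feature.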
Advanced regression models, including Gradient Boosting and XGBoost, were developed to predict ROP. These models were chosen for their high accuracy and ability to capture complex, non-linear relationships in the data. The final Gradient Boosting Regressor (GBR) model achieved a coefficient of determination (R²) of 0.85 on test data.
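A minimal sketch of fitting and scoring a gradient boosting regressor with scikit-learn follows. The features, the toy response surface, and the hyperparameters are assumptions chosen for illustration and are not the settings used in the project.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 600
swob = rng.uniform(5, 25, n)     # hypothetical weight on bit
rpm = rng.uniform(60, 180, n)    # hypothetical rotary speed
# Assumed non-linear toy response with an interaction term plus noise.
rop = 0.02 * swob * rpm + 5 * np.sin(swob / 4) + rng.normal(scale=2.0, size=n)
X = np.column_stack([swob, rpm])

X_train, X_test, y_train, y_test = train_test_split(
    X, rop, test_size=0.25, random_state=42
)
model = GradientBoostingRegressor(
    n_estimators=300, learning_rate=0.05, max_depth=3, random_state=42
)
model.fit(X_train, y_train)

# R^2 on held-out rows: the share of ROP variance explained by the model.
r2 = r2_score(y_test, model.predict(X_test))
print(f"test R^2 = {r2:.2f}")
```

A random row split like this one tends to be optimistic for well data, because neighbouring depth samples are correlated. Holding out an entire well is a stricter check.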
Multi-Objective Particle Swarm Optimization (MOPSO), a metaheuristic search algorithm, was applied to the trained machine learning model to identify the optimal combination of controllable parameters, such as RPM and Surface Weight on Bit (SWOB). MOPSO searched for solutions that maximize the predicted ROP while also optimizing other objectives, such as minimizing energy consumption, across different depths and geological conditions.
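The idea behind MOPSO can be sketched in a few dozen lines of NumPy. This is a deliberately simplified variant and not the project's implementation: the `objectives` function is a toy stand-in for the trained model, leaders are drawn uniformly at random from the archive, and the archive is capped by random truncation rather than the grid or crowding schemes used in full MOPSO.

```python
import numpy as np

def objectives(x):
    """Toy stand-in for the trained model: returns (-ROP, energy) to minimize.

    x[:, 0] is SWOB and x[:, 1] is RPM, both scaled to [0, 1] (assumed).
    """
    swob, rpm = x[:, 0], x[:, 1]
    rop = 60 * np.sqrt(swob) * (0.3 + 0.7 * rpm)
    energy = 10 + 40 * swob**2 + 30 * rpm**2
    return np.column_stack([-rop, energy])

def non_dominated(f):
    """Boolean mask of rows of f that no other row dominates (minimization)."""
    keep = np.ones(len(f), dtype=bool)
    for i in range(len(f)):
        dominates_i = np.all(f <= f[i], axis=1) & np.any(f < f[i], axis=1)
        keep[i] = not dominates_i.any()
    return keep

def mopso(n_particles=40, n_iter=60, w=0.5, c1=1.5, c2=1.5, max_archive=100, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0, 1, (n_particles, 2))
    vel = np.zeros_like(pos)
    pbest, pbest_f = pos.copy(), objectives(pos)
    mask = non_dominated(pbest_f)
    archive, archive_f = pbest[mask], pbest_f[mask]
    for _ in range(n_iter):
        # Each particle follows a leader drawn at random from the archive.
        leaders = archive[rng.integers(len(archive), size=n_particles)]
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (leaders - pos)
        pos = np.clip(pos + vel, 0, 1)
        f = objectives(pos)
        # Update a personal best only when the new point dominates the old one.
        better = np.all(f <= pbest_f, axis=1) & np.any(f < pbest_f, axis=1)
        pbest[better], pbest_f[better] = pos[better], f[better]
        # Merge new points into the archive and keep only non-dominated ones.
        merged, merged_f = np.vstack([archive, pos]), np.vstack([archive_f, f])
        mask = non_dominated(merged_f)
        archive, archive_f = merged[mask], merged_f[mask]
        if len(archive) > max_archive:   # simplification: random truncation
            pick = rng.choice(len(archive), size=max_archive, replace=False)
            archive, archive_f = archive[pick], archive_f[pick]
    return archive, archive_f

archive, archive_f = mopso()
print(f"{len(archive)} non-dominated settings found")
```

The returned archive approximates the Pareto front: each row is a parameter setting for which ROP cannot be raised without also raising the energy term.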
The analysis of feature importance revealed that Depth, Standpipe Pressure (SPPA), and STOR are the most influential predictors of ROP. This is consistent with the physics of drilling, as:
Figure 1: Permutation feature importance highlights the key drivers of ROP.
The Spearman Correlation Heat Map clearly illustrates the relationships between variables in drilling operations. It shows strong positive correlations between ROP and key controllable parameters like Surface Weight on Bit (SWOB) and Equivalent Circulating Density (ECD). This is expected, as increasing SWOB directly increases the force on the drill bit, accelerating the drilling process. ECD also supports faster drilling by maintaining wellbore pressure, ensuring stability, and aiding in hole cleaning.
Figure 2: Spearman heatmap showing the relationships between drilling parameters.
The scatter plot comparing predicted and actual ROP values illustrates the model's accuracy. An R² of 0.85 means that 85% of the variation in actual ROP is explained by the model's predictions. The close clustering of data points along the "Perfect Prediction" line highlights the model's reliability in forecasting drilling performance.
Figure 3: Test predictions vs. actual values, showing a high R² of 0.85.
The optimization trade-off analysis for the claystone section identified the following optimal parameters to maximize ROP:
These parameters achieve a predicted ROP of 72.00 m/hr with a Mechanical Specific Energy (MSE) of 31890.33 psi. The result balances drilling speed against energy use: from the trade-off curve, the MOPSO algorithm selected the point nearest the "knee", which represents the most balanced trade-off. Note: TFLO is the total flow rate of all active pumps (L/min).
Figure 4: Optimization trade-off analysis for the claystone section.
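One common way to locate the knee of such a trade-off curve is to pick the point farthest from the straight line joining the two extreme solutions. The sketch below assumes that heuristic; the project may define the knee differently, and the front used here is hypothetical.

```python
import numpy as np

def knee_index(rop, mse):
    """Index of the Pareto point farthest from the line joining the two extremes."""
    rop, mse = np.asarray(rop, float), np.asarray(mse, float)
    # Scale both objectives to [0, 1] so neither unit dominates the distance.
    x = (rop - rop.min()) / (rop.max() - rop.min())
    y = (mse - mse.min()) / (mse.max() - mse.min())
    order = np.argsort(x)
    x0, y0 = x[order[0]], y[order[0]]
    x1, y1 = x[order[-1]], y[order[-1]]
    # Perpendicular distance from each point to the chord (x0, y0)-(x1, y1).
    numerator = np.abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0)
    return int(np.argmax(numerator / np.hypot(x1 - x0, y1 - y0)))

# Hypothetical front: higher ROP costs more energy, steeply at the high end.
rop = np.array([40.0, 55.0, 65.0, 70.0, 75.0, 76.0])       # m/hr
mse = np.array([20000.0, 22000.0, 25000.0, 30000.0, 45000.0, 60000.0])  # psi
k = knee_index(rop, mse)
print(f"knee at ROP = {rop[k]} m/hr, MSE = {mse[k]} psi")
```

Beyond the knee, each additional metre per hour costs disproportionately more energy, which is why it is a natural default operating point.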
For the limestone section, a different set of optimal parameters was found, reflecting the change in rock properties:
These settings result in a predicted ROP of 46.11 m/hr and an MSE of 31055.80 psi. This highlights the model's ability to adapt its recommendations to different geological layers, a key requirement for effective real-world application.
Figure 5: Optimization trade-off analysis for the limestone section.
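Both sections report MSE in psi. The write-up does not state which formula was used; the sketch below assumes Teale's widely used definition in field units, with the bit area taken from the 12.25-inch bit diameter. The operating point is hypothetical.

```python
import math

def mechanical_specific_energy(wob_lbf, rpm, torque_ftlbf, rop_ft_hr,
                               bit_diameter_in=12.25):
    """Teale's MSE in psi: WOB/A + 120*pi*N*T / (A*ROP), in field units."""
    area = math.pi / 4 * bit_diameter_in**2          # bit cross-section, in^2
    thrust_term = wob_lbf / area                      # axial work per unit volume
    rotary_term = 120 * math.pi * rpm * torque_ftlbf / (area * rop_ft_hr)
    return thrust_term + rotary_term

# Hypothetical operating point, not taken from the wells in this project.
mse_psi = mechanical_specific_energy(
    wob_lbf=30000, rpm=120, torque_ftlbf=8000, rop_ft_hr=150
)
print(f"MSE = {mse_psi:.0f} psi")
```

Because ROP appears in the denominator of the rotary term, drilling faster at the same torque and RPM lowers MSE. An ROP given in m/hr would need converting to ft/hr (1 m = 3.28084 ft) before using this form.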
By applying the ML-optimized parameters, the simulated drilling cost was reduced from $16.96M to $11.38M, achieving a total saving of $5.58M CAD. This highlights the immense financial benefit of adopting a data-driven optimization strategy.
Figure 6: Drilling cost comparison before and after ML optimization.
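Using only the two simulated cost figures reported above, the saving and its relative size work out as:

```latex
\begin{align*}
\text{Saving} &= 16.96 - 11.38 = 5.58 \ \text{million CAD} \\
\text{Relative reduction} &= \frac{5.58}{16.96} \approx 0.329 = 32.9\%
\end{align*}
```

That is, the optimized plan costs roughly one third less than the baseline in the simulation.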
The performance of the GBR model was benchmarked against established industry standards. When predicting a new well, it achieved an error of 28.3%. In comparison, traditional models were significantly less accurate: the B&Y model had a mean prediction error of 48.6% while Maurer’s model showed a much higher error of 134.0%. This stark contrast highlights the limitations of older formulas and the clear advantage of the machine learning approach. Note that
Figure 7: The ML model (GBR) shows a much lower prediction error compared to traditional engineering models.
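The write-up does not spell out how the mean prediction error is computed. A common choice, assumed here, is the mean absolute percentage error; the values below are hypothetical.

```python
import numpy as np

def mean_percentage_error(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, float)
    predicted = np.asarray(predicted, float)
    return float(np.mean(np.abs(predicted - actual) / np.abs(actual)) * 100)

actual = np.array([20.0, 40.0, 50.0])      # hypothetical measured ROP, m/hr
predicted = np.array([25.0, 30.0, 55.0])   # hypothetical model output, m/hr
error = mean_percentage_error(actual, predicted)
print(f"mean error = {error:.1f}%")
```

Under this definition an error above 100% is possible: it means the predictions are, on average, off by more than the measured value itself, for example when a model predicts more than double the actual ROP.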
The optimization process resulted in a significantly improved ROP. The optimized ROP (shown in red) is not only higher on average but also more stable than the original, highly variable ROP (shown in blue). This leads to faster, more predictable, and safer drilling operations.
Figure 8: Comparison of original ROP against the smoother, higher optimized ROP across different geological layers.
This project demonstrates that combining a Gradient Boosting model with Multi-Objective Particle Swarm Optimization significantly improves drilling operations. By turning historical data into a real-time predictive tool for optimal drilling parameters, it achieves multi-million-dollar cost savings while boosting operational stability and efficiency. The scalable framework can be adapted to various geological settings, making it a valuable asset for modern oil, gas, and geothermal exploration.
The complete source code for this project, including data analysis notebooks and model implementation, is available on GitHub.