Digitalisation, in its various forms of artificial intelligence, machine learning and big data analytics, is slated to add unprecedented efficiencies that will transform the oil and gas landscape. Data-based machine learning and artificial intelligence techniques play a growing role in streamlining error-prone, labour-intensive, repetitive tasks.
Data-based machine learning and artificial intelligence models are only as good as the data used to train and validate them. Accordingly, data acquisition and readiness for analysis are a critical part of the digitalisation journey. This paper discusses the application of the statistical method of principal component analysis (PCA) to systematically assure data quality and representation, and thereby improve the design and function of machine learning and artificial intelligence models so that they generate the levels of operational intelligence required.
The analysis work performed shows that PCA is a useful quantitative technique for extracting, in a controlled manner, representative data to feed the machine learning models and systems that drive automation. Additionally, the paper highlights the application of PCA to condense data for the benefit of the data interpreter or the big data machine learning and artificial intelligence engineer; it is evident that PCA can increase the correlation between the principal components and the variables of interest to unveil hidden trends.
Moreover, while digital initiatives such as automation augment safety practices, new and unintended classes of safety breaches may be encountered. Accordingly, the paper highlights the importance of digitalising safely using embedded protection controls.
This work shows that combining legacy data with intelligence extracted using digital analytics can transcend boundaries and move towards an intelligent decision-making framework.
All rock formations emit natural gamma radiation. However, the quantity of natural gamma radiation varies between rock lithologies. The measurement while drilling (MWD) gamma ray log and shale shaker formation cuttings are the classic tools used by the mud logger to measure gamma ray for a given formation. The gamma ray log is used for lithology identification, which is essential in reservoir description and characterisation because the formation lithology controls the petrophysical properties of the rock. In today’s digital oilfield, large volumes of noise-contaminated gamma ray data are measured and recorded. To utilise this data effectively, statistical machine learning techniques, including PCA, are used to assure data quality for digital data analytics and intelligent models.
Likewise, the porosity of a hydrocarbon-bearing rock formation is a key parameter for estimating the quality of reservoir rock and its overall storage capacity. The logging while drilling (LWD) neutron porosity log is used to estimate the formation porosity. As the drilling tools penetrate the formation, large volumes of neutron porosity data are logged and processed to quantify the amount of pore space in the reservoir rock. In conjunction with the bulk density log, the neutron porosity log is used to detect the presence of hydrocarbons and the type of fluid, Figure 1. As the raw data are often noisy and overwhelming in volume, PCA can be applied to analyse the data quantitatively and reject uncorrelated data, flexibly extracting the data of interest to the interpreter or the machine learning and artificial intelligence modeller.
Classical statistical methods and artificial neural networks (ANN) are among the common methods used to enhance quantitative interpretation of big data and to preprocess it for digital analytics and machine learning models. ANN has been widely developed and applied in the oil and gas industry due to its efficiency and user-friendly features (Benaouda et al., 1999; Qi and Carr, 2006; Wang and Carr, 2012). Among the statistical approaches, PCA is gaining popularity because it can condense raw data from multiple, varied sources into fewer, more controllable representative components that convey meaningful insights, which can be acted upon to gain new levels of operational intelligence. It also provides a means to reject noisy components of the data that are undesired or uncorrelated by using only a subset of the components during the data reconstruction process. In processing big data, such as MWD and LWD data, PCA has the advantage of reducing high-frequency noise that could mislead the interpreter or the big data machine learning and artificial intelligence engineer. Another major advantage of PCA is the ability to flexibly shrink the input data that feed the digital analytics and algorithms that drive automation (Guo et al., 2009).
Often, PCA uses three principal components. The first principal component has the largest data variance compared to the other principal components. The data variance captured decreases as the principal component order increases. Generally, the first component represents more than 50% of the data variance.
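The variance ordering described above can be illustrated with a minimal sketch in Python. It assumes two standardised variables, for which the correlation matrix [[1, r], [r, 1]] has the eigenvalues 1 + |r| and 1 - |r|. The synthetic logs, the `latent` driver and all function names are illustrative assumptions; they are not the dataset or the software used in this paper.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def explained_variance_ratios(x, y):
    """Variance share of each principal component of two standardised variables."""
    r = pearson(x, y)
    # Eigenvalues of the 2x2 correlation matrix, already in descending order.
    eigenvalues = [1.0 + abs(r), 1.0 - abs(r)]
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


# Synthetic, hypothetical logs: both respond to one latent driver plus noise.
rng = random.Random(0)
latent = [rng.random() for _ in range(500)]
gamma_ray = [20.0 + 100.0 * v + rng.gauss(0.0, 8.0) for v in latent]
neutron_porosity = [0.30 - 0.20 * v + rng.gauss(0.0, 0.02) for v in latent]

ratios = explained_variance_ratios(gamma_ray, neutron_porosity)
print("Variance share per component:", [round(v, 3) for v in ratios])
```

For two standardised variables the first component always carries at least half of the total variance; with more variables, its share depends on how strongly the logs are correlated.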
PCA is a classical statistical method that can be used to represent big data with fewer principal components. A common example of PCA in the oil and gas industry is the representation of the infinite number of formation in-situ stresses by three principal in-situ stress components: the minimum horizontal principal in-situ stress, the maximum horizontal principal in-situ stress and the overburden (vertical) in-situ stress. Reducing the infinite in-situ stresses to three principal components is helpful in many applications. For instance, it is used in selecting the wellbore azimuth stress plane direction to generate favourable induced flow paths for the hydrocarbon fluid, increasing well production while preventing operational problems such as formation washout or breakout and borehole instability, Figure 2 (Al-Ghazal et al., 2013).
This and other examples show the effectiveness of the PCA technique in gleaning insightful intelligence that improves operational performance (Lima and Marfurt, 2018).
This paper highlights the application of PCA to assure data quality and readiness in the interest of the interpreter or the big data machine learning and artificial intelligence engineer. The paper also sheds light on the importance of health, safety, security and environmental (HSSE) aspects, so that the oilfield is digitalised safely as more instrumentation and automation are added.
There are several reasons to condense the data collected. The main reasons are as follows: first, remove data noise; second, increase data correlation; third, focus on a given section or variable of the data; fourth, convenience for the data interpreter; and finally, assure and improve data quality.
Data quality remains a pressing issue for oil and gas operations. Several factors compromise the quality of data, including faulty sensors, failure to calibrate instruments periodically, harsh weather conditions and the location of the digital system on the facility. It is critical to address this issue because bad data can undercut the value of analytics (Feblowitz, 2013).
PCA is a practical filter that retains the components of the data with the largest variance, maintaining good representation of the data while selectively removing outliers to reduce uncertainty. PCA can also be utilised as a preprocessor to filter data before they are used for machine learning and artificial intelligence modelling, Figure 3. This can increase the accuracy and reliability of machine learning models, as data quality is a major prerequisite for the success of a data-based machine learning technique (Al-Ghazal, 2018). This is of critical importance because machine learning techniques and applications rely heavily on training data.
Overall, PCA is a classic statistical method to re-engineer the data to be more favourable to machine learning and artificial intelligence techniques that are built around operational efficiency and market competitiveness.
PCA was applied on a gamma ray and neutron porosity dataset to remove noise, Figure 4. The exercise was carried out in order to decontaminate the data, and thereby enhance the accuracy of machine learning algorithm training and modelling.
When PCA is implemented using the first principal component on the gamma ray and neutron porosity dataset, it is evident that PCA removes contaminated data while preserving the feature data of interest, Table 1.
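The idea of keeping only the first principal component can be sketched as follows. This is a minimal illustration under stated assumptions, not the workflow used to produce Table 1: it standardises two logs, projects them onto the first eigenvector of their 2x2 correlation matrix and maps the scores back to log units. The synthetic data and the names `standardise` and `reconstruct_from_pc1` are hypothetical.

```python
import math
import random


def standardise(values):
    """Return z-scores plus the mean and standard deviation used."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values], mean, std


def reconstruct_from_pc1(x, y):
    """Project two logs onto their first principal component and map back."""
    zx, mean_x, std_x = standardise(x)
    zy, mean_y, std_y = standardise(y)
    r = sum(a * b for a, b in zip(zx, zy)) / len(zx)  # correlation of z-scores
    sign = 1.0 if r >= 0 else -1.0
    # First eigenvector of the correlation matrix [[1, r], [r, 1]].
    w_x, w_y = 1.0 / math.sqrt(2.0), sign / math.sqrt(2.0)
    scores = [w_x * a + w_y * b for a, b in zip(zx, zy)]
    x_hat = [mean_x + std_x * w_x * s for s in scores]
    y_hat = [mean_y + std_y * w_y * s for s in scores]
    # Share of the standardised variance discarded with the second component.
    rejected = sum((a - w_x * s) ** 2 + (b - w_y * s) ** 2
                   for a, b, s in zip(zx, zy, scores)) / (2.0 * len(zx))
    return x_hat, y_hat, r, rejected


# Synthetic, hypothetical logs sharing one latent driver plus sensor noise.
rng = random.Random(0)
latent = [rng.random() for _ in range(500)]
gamma_ray = [20.0 + 100.0 * v + rng.gauss(0.0, 8.0) for v in latent]
neutron_porosity = [0.30 - 0.20 * v + rng.gauss(0.0, 0.02) for v in latent]

gr_hat, np_hat, r, rejected = reconstruct_from_pc1(gamma_ray, neutron_porosity)
print(f"correlation between logs: {r:.3f}, variance share rejected: {rejected:.3f}")
```

The rejected share equals (1 - |r|) / 2 in this two-variable case, so the more strongly the two logs are correlated, the less variance is discarded with the second component.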
PCA, using a few principal components, can capture the maximal information from a dataset because the principal components are mutually orthogonal and are assigned to represent the largest variance in the data. However, in some instances, the assigned principal components may not capture the data variance to the extent required. In such cases, the principal components are rotated until the required data variance is achieved.
Rotated PCA has been carried out on the gamma ray and neutron porosity dataset. The resulting correlation coefficients, after rotating the first principal component towards gamma ray, are shown in Table 2.
Table 1 shows that PCA using the original first principal component yields a correlation coefficient of 0.948 with gamma ray and a correlation coefficient of -0.947 with neutron porosity. Rotating the first principal component towards gamma ray increases the weighting of gamma ray and ultimately yields a higher correlation coefficient for gamma ray of 0.962. After the first principal component is shifted towards gamma ray, the weighting of porosity decreases, resulting in a correlation coefficient of smaller magnitude, -0.901, Table 2. The correlations for the subject variables demonstrate the power of PCA to condense the data so that they become more correlated and user-friendly while excluding noisy, misleading outliers.
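The effect of rotating the first principal component towards one variable can be sketched numerically. The example below is an illustrative assumption, not the rotation procedure behind Table 2, and it does not reproduce the tabulated values: it treats the component as a unit axis in the plane of the two standardised logs, rotates that axis by a hypothetical 10 degrees towards the gamma ray axis and recomputes the correlation of each log with the scores along the axis.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardise(values):
    """Return the z-scores of a sequence."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def axis_correlations(x, y, angle_deg):
    """Correlation of x and y with the scores along a unit axis.

    The angle is measured from the axis of x in standardised space.
    """
    zx, zy = standardise(x), standardise(y)
    theta = math.radians(angle_deg)
    scores = [math.cos(theta) * a + math.sin(theta) * b for a, b in zip(zx, zy)]
    return pearson(scores, x), pearson(scores, y)


# Synthetic, hypothetical logs; not the dataset analysed in this paper.
rng = random.Random(1)
latent = [rng.random() for _ in range(500)]
gamma_ray = [20.0 + 100.0 * v + rng.gauss(0.0, 15.0) for v in latent]
neutron_porosity = [0.30 - 0.20 * v + rng.gauss(0.0, 0.03) for v in latent]

r = pearson(gamma_ray, neutron_porosity)
pc1_angle = 45.0 if r >= 0 else -45.0  # first eigenvector of [[1, r], [r, 1]]
rotation_deg = 10.0                    # hypothetical rotation towards gamma ray
rotated_angle = pc1_angle - math.copysign(rotation_deg, pc1_angle)

gr_pc1, np_pc1 = axis_correlations(gamma_ray, neutron_porosity, pc1_angle)
gr_rot, np_rot = axis_correlations(gamma_ray, neutron_porosity, rotated_angle)
print(f"original PC1: gamma ray {gr_pc1:.3f}, neutron porosity {np_pc1:.3f}")
print(f"rotated axis: gamma ray {gr_rot:.3f}, neutron porosity {np_rot:.3f}")
```

In this sketch the unrotated component correlates equally strongly with both standardised logs, and rotation trades correlation with neutron porosity for correlation with gamma ray, which is the same qualitative behaviour reported in Tables 1 and 2.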
The assessment work performed shows that PCA improves data-based machine learning modelling by selectively excluding data, re-engineering the dataset so that intelligence can be gleaned from it more practically. In addition, it is sometimes more advantageous to rotate the principal components to capture more data of interest. In this example, PCA was used to extract representative data in a controlled manner for machine learning and artificial intelligence modelling towards intelligent operations.
Additionally, rotating the principal components is useful to ensure that hidden feature data are not overlooked. The principal components are rotated until desirable correlations are achieved. The rotation is often towards the variables of interest to increase their weightings.
The intelligence gathered from data analysis can be implemented in advisory mode or in closed-loop automation mode. In a closed-loop drilling system, health, safety, security and environmental (HSSE) aspects become more critical; the user must therefore retain the privileges to take over control of the drilling process in case of system malfunction (Wallace et al., 2015). It is also important to maintain the drilling skills and capability of the workforce on location, even where autodrillers take over tasks, to ensure a competent response when the closed-loop system fails. This can include simulated training and certification to retain competences (Thorogood, 2013). For the safe success of automated processes, it is of paramount importance to add redundant, reliable sensors that feed the system continuously. It is also necessary to differentiate clearly between the sensors that retrieve data for analysis and the sensors that retrieve data used to perform process control and execute actions automatically (de Wardt, 2013). The distinction between the two types of data helps prevent unforeseen errors caused by confusion where they overlap.
In remote control operating centres that run onshore and offshore facilities from a distance, it is critical to check the situational awareness of the operators who control the operational processes. Salehi (2018) demonstrates the use of eye-tracking technology to track human situational awareness in oil and gas operations. The technology demonstration was carried out using a drilling simulator at the University of Oklahoma. The technology helps detect the cognitive status of the worker and sends an alert when image processing algorithms identify mental fatigue.
To support workforce safety and environmental protection, the deployment of a closed-loop drilling system shall include a redesign of rig instruments, including their function, location and requirement for safety instrumented systems (Aljubran et al., 2018). The design shall include redundant layers of protection to validate data and prevent errors while allowing human intervention to regain control as necessary (Al-Ghazal and AlJubran, 2017).
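One way to picture a redundant layer of protection with human override is the minimal sketch below. It is an assumption made for illustration only, not a design taken from the cited works: the function names, the median voting rule and the tolerance value are hypothetical. The automated mode is kept only while a majority of redundant sensor readings agree, and the driller can always force manual control.

```python
from statistics import median


def validate_redundant_readings(readings, tolerance):
    """Return (value, trusted) for a set of redundant sensor readings.

    The value is the median reading. It is trusted only when at least two
    readings, and more than half of them, lie within the tolerance of it.
    """
    if not readings:
        return None, False
    centre = median(readings)
    agreeing = [r for r in readings if abs(r - centre) <= tolerance]
    trusted = len(agreeing) >= 2 and len(agreeing) > len(readings) / 2
    return centre, trusted


def select_control_mode(readings, tolerance, driller_override=False):
    """Choose the control mode; the driller can always take over."""
    if driller_override:
        return "manual"
    _, trusted = validate_redundant_readings(readings, tolerance)
    return "automatic" if trusted else "manual"


# Hypothetical readings from three redundant sensors.
print(select_control_mode([10.0, 10.2, 55.0], tolerance=0.5))  # one outlier
print(select_control_mode([10.0, 30.0, 55.0], tolerance=0.5))  # no agreement
print(select_control_mode([10.0, 10.1, 10.2], tolerance=0.5, driller_override=True))
```

Falling back to manual control whenever the readings cannot be validated is a deliberately conservative choice in this sketch; a real system would follow its own safety instrumented system requirements.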
A demilitarized zone (DMZ) network is always recommended for secure system deployment patterns. The most important function of the DMZ network is to enforce termination of network traffic within the DMZ. The intent is to avoid a single point of failure that results in potential breach of the control system. Cybersecurity deployment patterns proposed by OSIsoft for plant information (PI) system are useful to review when designing data and information flow and control architectures. Figure 5 shows a deployment pattern, pattern 1, which enables cybersecurity for the network architecture without compromising operational efficiency, allowing for a two-way data flow (Owen, 2013).
As digitalisation and instrumented components develop in the oilfield, it is critical to have functioning procedures and verification policies in place to maximise value and mitigate the implications of poorly performing digital systems. In addition, new digital roles and responsibilities will emerge in the oilfield for intelligent users of intelligent systems.
Data quality and readiness for analysis are the foundation on which the success of data-based machine learning and artificial intelligence models relies.
PCA is a flexible tool to focus analysis on useful data while excluding contaminated and uncorrelated data.
The PCA work performed demonstrates the power of PCA to assure and improve data quality and data correlation.
In some cases, rotating the principal component increases the correlation of a given variable with the principal component. This is achieved by increasing the weighting of the variable as the principal component is rotated towards it.
The insights gleaned from PCA-based machine learning and artificial intelligence models add competitive advantages and uncover invisible relationships.
While digital initiatives such as automation augment safety practices, new and unintended classes of safety breaches may be encountered. Accordingly, HSSE implications should be assessed and controlled continuously.
Sustainable closed-loop automated drilling systems offer the driller the capability to regain control of the system. Such systems offer a good, dynamic balance between the tasks performed by the control system and those performed by the driller.
Closed-loop drilling system deployment design shall include redundant layers of protection to validate data and prevent errors while allowing human intervention to regain control as necessary.
It is necessary to differentiate clearly between the sensors that retrieve data for analysis and the sensors that retrieve data used to perform process control and execute actions automatically. The distinction between the two types of data helps prevent unforeseen errors caused by confusion where they overlap.
As digitalisation and instrumented components develop in the oilfield, it is critical to have functioning procedures and verification policies in place to maximise value and mitigate the implications of poorly performing digital systems. In addition, new digital roles and responsibilities will emerge in the oilfield for intelligent users of intelligent systems.
The power of legacy-based data and digital analytics transcends barriers to access valuable information that can be acted upon.
Al-Ghazal, M.A. “The Value of Digital Data Analytics,” Oil & Gas Vision 14: 10-11, 2018.
Al-Ghazal, M.A. and AlJubran, M.J. “Upstream Operations: Cybersecurity and Generation Y,” Saudi Aramco Journal of Technology, Summer 2017, Online Content.
Al-Ghazal, M.A., Al-Ghurairi, F.A. and Al-Zaid, M.R. “Overview of Open Hole Multistage Fracturing in the Southern Area Gas Fields: Application and Outcomes,” Saudi Aramco Ghawar Gas Production Engineering Division Internal Documentation, March 2013.
Aljubran, M., Al-Ghazal, M. and Vedpathak, V. “Integrated Cybersecurity for Modern Information Control Models in Oil and Gas Operations,” SPE 190582-MS presented at the SPE International Conference on Health, Safety, Security, Environment, and Social Responsibility, 16-18 April 2018, Abu Dhabi, United Arab Emirates.
Benaouda, D., Wadge, G., Whitmarsh, R.B., Rothwell, R.G. and MacLeod, C. “Inferring the Lithology of Borehole Rocks by Applying Neural Network Classifiers to Downhole Logs: An Example from the Ocean Drilling Program,” Geophysical Journal International 136(2): 477–491, 1999.
De Wardt, J.P. “Industry Analogies for Successful Implementation of Drilling Systems Automation and Real Time Operating Centers,” SPE 163412-MS presented at the SPE/IADC Drilling Conference and Exhibition, 5-7 October 2013, Amsterdam, The Netherlands.
Feblowitz, J. “Analytics in Oil and Gas: The Big Deal about Big Data,” SPE 163717-MS presented at the SPE Digital Energy Conference and Exhibition, 5-7 March 2013, The Woodlands, Texas, USA.
Guo, H., Marfurt, K.J. and Liu, J. “Principal Component Spectral Analysis,” Geophysics 74: 35-43, July-August 2009.
Lima, R.A. and Marfurt, K.J. “Principal Component Analysis and K-means Analysis of Airborne Gamma Ray Spectrometry Surveys,” SEG International Exposition and 88th Annual Meeting: 2277-2281, 2018.
Owen, B. “Recommended Deployment Patterns,” talk presented at the OSIsoft Users Conference, San Francisco, California, April 16-19, 2013.
Qi, L. and Carr, T.R. “Neural Network Prediction of Carbonate Lithofacies from Well Logs, Big Bow and Sand Arroyo Creek Fields, Southwest Kansas,” Computers & Geosciences 32(7): 947–964, 2006.
Salehi, S. “Cognitive Study of Human Factors for Safe Drilling Operations: Eye-Tracking Technology,” Society of Petroleum Engineers, www.spe.org, September 2018.
Thorogood, J. “Automation in Drilling: Future Evolution and Lessons from Aviation,” SPE Drilling & Completion: 194-202, 2013.
Wallace, S.P., Hegde, C.M. and Gray, K.E. “A System for Real-Time Drilling Performance Optimization and Automation Based on Statistical Learning Methods,” SPE 176804-MS presented at the SPE Middle East Intelligent Oil & Gas Conference and Exhibition, 15-16 September 2015, Abu Dhabi, United Arab Emirates.
Wang, G. and Carr, T.R. “Marcellus Shale Lithofacies Prediction by Multiclass Neural Network Classification in the Appalachian Basin,” Mathematical Geosciences 44(8): 975–1004, 2012.
Technical Paper authored by: Mohammed A. Al-Ghazal and Viranchi Vedpathak