I’m still riding the tide of new information that swept over me at this year’s EAGE (European Association of Geoscientists and Engineers) conference in Paris last month. This wave of change was obvious to those who looked, but many still didn’t see it for what it is: the future. I’m talking about machine learning (ML) and all those other very contemporary techniques that sit in the nebulous world of computing unicorns and rainbows, like AI and deep learning.
Three motley crews...
When it comes to machine learning, I found three distinct crews at EAGE: the actively engaged, the sceptics, and the ambivalent. Those actively engaging in ML are exactly the kinds of (relatively) early adopters that participated in the excellent Subsurface Hackathon, hosted by Total the weekend before EAGE. Vocal sceptics of these new techniques, I would argue, are resistant because of the perceived disruption to software sales, rather than an active aversion to the techniques themselves. Those who are plain ambivalent about machine learning are the same people ignoring the data sea change: carrying on as they were until somebody hands over a better widget.
Within this last group I include EAGE themselves; their long-standing inertia in this area finally gave way to a workshop and a couple of sessions on machine learning this year.
Yet, with still a chunk of the expo floor and technical sessions given over to high-performance computing (HPC) drag racing and algorithm juicing for the inversion modellers, isn’t it time to embrace the new ML computing paradigm too? Maybe next year ...
Less is more
Why should we care? Well, in 2017 the oil industry is all about doing more with less. Oil employs some very smart people, but in a lot of subsurface workflows, we’re asking them to use their faculties and judgement to answer some very basic questions: “How similar is this thing to this thing?” (correlation between wells), and “Where have I seen this before?” (rock characteristic, or facies, associations). The first is a statistics question, the second a data mining one. To my mind, the data mining question is the more economically significant.
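To make the first question concrete, here is a minimal sketch of “How similar is this thing to this thing?” as a statistics problem: slide one well log past another and keep the depth shift with the highest Pearson correlation. Everything in it is an assumption for illustration only. The logs are synthetic, and the names `pearson`, `best_lag`, `well_a` and `well_b` are hypothetical rather than taken from any real tool or data set.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def best_lag(log_a, log_b, max_lag):
    """Return (lag, r): the sample shift of log_b that best matches log_a."""
    best = (0, -1.0)
    for lag in range(-max_lag, max_lag + 1):
        # Trim both logs to the interval where they overlap at this shift.
        if lag >= 0:
            a, b = log_a[: len(log_a) - lag], log_b[lag:]
        else:
            a, b = log_a[-lag:], log_b[: len(log_b) + lag]
        r = pearson(a, b)
        if r > best[1]:
            best = (lag, r)
    return best


# Synthetic, gamma-ray-style log (made-up values, not real data).
well_a = [50 + 30 * math.sin(i / 4.0) + 10 * math.sin(i / 1.5) for i in range(200)]
# Assume well B sees the same pattern five samples deeper.
well_b = [well_a[0]] * 5 + well_a[:-5]

lag, r = best_lag(well_a, well_b, max_lag=20)
print(f"best shift: {lag} samples, correlation: {r:.3f}")
```

Real logs are noisy, stretched and squeezed between wells, so a rigid shift like this is only the simplest possible starting point. It does show how mechanical the underlying question is.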
"Machine learning is not ‘science-y’ enough"
But first, the similarity thing. My colleague Jane has just blogged about the potential for, and the limitations of, machine learning in the subsurface world. The technique is still very much in its infancy, because there is a lot of resistance to the approach: Machine learning is not ‘science-y’ enough. The demographics of the domain data (i.e. its statistical texture) are poorly understood, and the data is typically either locked away in application projects or of dubious quality. All of this impedes training machine learning algorithms. We are currently incapable of training our ML algorithms to the extent that Google et al. have. They can identify cats, cars, trees, people (so self-driving cars don’t hit them), and have a good understanding of how to assemble the components of their ML capabilities.
Hell or high water?
However, if you look at the reports from the hackathon, you’ll see that a team of near-strangers trained a machine learning algorithm to identify and classify direct reflections and multiples in synthetic shot gathers in less than a weekend. They did all of that using open-source tools.
This exemplifies the work of those using machine learning techniques on seismic and petrophysical data to answer “How similar is this thing to this thing?” without actually having to do something as mundane as clicking a mouse button hundreds of times to digitise geological features.
We were training algorithms to extract features for us because we (i.e. the data scientists) could assemble the low-level parts ourselves, letting the computer do the boring stuff.
Sharing is caring ...
Now here’s the catch: When you have a lot (a lot!) of data, you have a statistically reliable data set. This is where the oil industry has the opportunity to turn the tide. If we are to use machine learning as a tool in our industry, then we need to learn what approaches work best and train our algorithms on basin-scale data sets.
Even better (to answer the second question of “Where have I seen it before?”) would be to train on data sets from many basins. This is a commercial and political challenge, rather than a technical one. Only the largest operators have anything approaching the volumes of seismic data and well logs that could make true machine learning a possibility. Some of the national repositories (UK Oil and Gas Authority/CDA, Norwegian Petroleum Directorate) possess large and diverse data sets that could serve this purpose on an industrial scale.
How can this become a viable business model? I’m not for a minute suggesting that all of the high-cost, high-value offshore data be provisioned for free and open use. What I am suggesting is that the industry does what it’s good at: playing the cooperation/competition game.
"To achieve the scale of data demanded by machine learning to succeed, we require leadership"
The tide of data will lift all ships, and the bigger industry players should cooperate by creating massive training datasets behind a firewall and letting people work on them, either to develop the tools that could later become competitive enablers or for immediate business insight through prospecting and licensing. This could even be monetised by government agencies if they charged by the CPU clock cycle: entirely viable on public cloud infrastructure today. To achieve the scale of data demanded by machine learning to succeed, we require leadership.
The UK Oil and Gas Authority just announced free access to a significant amount of offshore data, which sets the bar high, and I look forward to seeing what learnings — both technical and economic — will arise.
On that second question of “Where have I seen this before?”: once we’ve worked out how to use machine learning to quantify what “this” is when it comes to seismic facies, well log properties, etc., the pressing question in the North Sea sector becomes where bypassed pay exists, based on an ML view of what a producing formation looks like. The answer then needs to feed into the UK Continental Shelf decommissioning strategy whilst we still have the infrastructure to exploit reserves.
Ageing infrastructure: We will soon lose the means to extract North Sea resources
It is clear that as an industry we have the data, the people, and the tools. The issue we have is that we are still treading water when it comes to putting our resources to use in an effective way.