
Lead Story: Using Python to Improve Your Geologic Interpretations

A Bottom Hole Temperature Workflow

BY MATTHEW W. BAUER, P.G.

How much does your team spend in time and money gathering datasets, fixing errors, and relating multiple data sources before you can interpret? Are there conventional workflows that are too time-intensive to use over regional areas? What about datasets where the variable and sample counts are so large it’s hard to wrap your mind around them? Have you tried to understand the economics and risk of a project without having absolute inputs? Adding python programming to your workflows can help. It isn’t a magic bullet, so understanding what it can and can’t do is important. Workflow automation and machine learning can’t replace the domain expertise, abstract thought, or creativity of a good geologist. That said, coding literacy provides large benefits to earth scientists by letting them acquire and utilize large datasets efficiently. I also argue that those benefits, especially the time savings in accessing and cleaning data, can start to be realized well before the traditional “10,000-hour” learning threshold. So, what packages should you start learning?


The python library pandas allows us to efficiently open a wider range of file formats, at larger sizes, than Excel can handle. Finding and fixing errors in datasets with boolean filters and regular expressions can eliminate the days of sifting through an Excel workbook fixing issues by hand. The ease and speed with which higher-level math can be applied to these multidimensional arrays can replace the multiple columns and complex nested functions required in Excel. Merging datasets in pandas greatly simplifies the sometimes tedious vlookup process. The fuzzy string matching capabilities of the library fuzzywuzzy, combined with pandas, can make looking up missing API numbers from well names with slight variations in spelling a thing of the past. The python libraries lasio and welly allow us to open, interact with, and save LAS logs. Limitations on time and data budgets can limit the number of well logs, tops, and downhole tests that we base our interpretations on and, potentially, increase a project’s risk. Automating the process of accessing public data, also known as scraping, with the libraries requests, selenium, and urllib can increase the amount of data we can access and vastly improve our understanding of natural systems.
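For instance, a fuzzy name-to-API lookup of the kind described above can be sketched in a few lines. The well names, API numbers, column names, and score cutoff below are made-up placeholders for illustration, not data from this article.

#A minimal sketch of fuzzy-matching well names to API numbers.
#All names, APIs, and the score cutoff are illustrative placeholders.
import pandas as pd
from fuzzywuzzy import process

#Hypothetical master table of well names and API numbers
master = pd.DataFrame({'WellName': ['STATE 16-33 1H', 'FEDERAL 12-4 2'],
                       'API': ['05123111110000', '05123222220000']})

#Hypothetical log headers with slightly different spellings
logs = pd.DataFrame({'WellName': ['State #16-33 1-H', 'Federal 12-4 No. 2']})

def best_api(name, master=master, min_score=85):
    #Return the API of the closest master well name, or None below the cutoff
    match, score = process.extractOne(name, master.WellName.tolist())
    if score >= min_score:
        return master.loc[master.WellName == match, 'API'].iloc[0]
    return None

logs['API'] = logs.WellName.apply(best_api)
print(logs)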

Unfamiliar with the python packages mentioned so far? Does this all seem foreign? If you have the aptitude and drive, I want to help. I contribute to some of these projects, but all of the following are the cream of my recommendations for where to get started.

• General Python Course (Free): Dr. Chuck’s Python for Everybody
• General Python Course (Free): MIT 6.0001
• O & G Applied Python (Free): HackersVillage
• Community (Free): The Software Underground
• Geostatistics (Free): Dr. Michael Pyrcz, aka GeostatsGuy
• If I had to pick one blog: Agile*
• Continuing Education: Practical Python for Earth Scientists (RMAG, Mines, GSA)
• Graduate Certificate: Mines Earth Resource – Data Science

Bottom Hole Temperatures

Let us take a look at an example of an automated workflow for LAS logs. Whether you’re mapping geothermal prospects or building maturity models, bottom hole temperature (BHT) data is essential. Some G&G software packages allow parsing of LAS file headers; otherwise you are left reviewing one file at a time. Parsing can be automated, which makes the processing of approximately 64k files on a solid-state drive for BHT data achievable overnight. First, we will import the libraries that we will be using and set some variables for working in a jupyter notebook. Do you want to follow along with the code in this notebook? You can find it here (https://youtu.be/KUuoFRye37w) and instructions to install python and open a jupyter notebook here (https://github.com/Rocks-n-Code/PythonCourse/tree/master/BHT).

#Import libraries
import pandas as pd     #library for working with dataframes
import geopandas as gpd #library for working with geospatial data
import lasio            #library for working with LAS files
import glob             #library for finding files
import os               #library for interacting with files
import re               #library for regular expressions

#Set options
pd.set_option('display.max_columns', None)
%matplotlib notebook

With the library glob we can search for our files with the file path and a wildcard character of "*". For Windows users, once we have the list of files then we’ll make the slash direction uniform. Wonder where I acquired such a large set of digital well logs? I’ve previously scraped these logs from the COGCC website with python.

#Find your LAS files
las_path = "D:/CO/CO_LAS/*.las"
files = [x.replace('\\','/') for x in glob.glob(las_path)]
print(len(files),'LAS files found.')
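As for the scraping itself, at its simplest it amounts to requesting a file and writing it to disk. A minimal sketch with requests is below; the URL and file name are placeholders, not an actual COGCC address.

#Minimal scraping sketch: download one LAS file and save it locally.
#The URL below is a placeholder, not a real COGCC address.
import requests

url = 'https://example.com/documents/123456.las'  #placeholder URL
r = requests.get(url, timeout=60)
r.raise_for_status()  #stop if the request failed

with open('D:/CO/CO_LAS/123456.las', 'wb') as f:
    f.write(r.content)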

Next, we will loop through those files and use the library lasio to open each file. We will then pull the LAS file header parameters into a temporary pandas dataframe, filter to aliases of temperature readings in the header info, and collect the highest reading. We can also collect the maximum measured depth from each log. We’ll add this to, or concatenate it with, our primary dataframe, which we’ll save out occasionally and once we are finished.

temp_alias = ['BHT','BHT:1','BHT:2','BHT:3','BHT:4',
              'MRT','MRT:1','MRT:2','MRT2:1',
              'MRT1:1','MRT2:2','MRT1:2','MRT:3',
              'MRT2','MRT3','MRT4','MRT1','MRT 192',
              'MAXRECTEMP','BOTTEMP','BOTTOMHOLETEMP',
              'TEMP','TEMPERATURE','MAXTEMP',
              'BHTEMP','BHTEMP:1','BHTEMP:2','BHTEMP_SRC']

celcius_units = ['degC','DEGC']

depth_alias = ['DEPT','DEPTH','M_DEPTH','DPTH','DEPT:1','MD','DEPTH:1',
               'DEPTH_HOLE','BDEP','DMEA','TOTAL_DEPTH','TVD','DEP',
               'TDEP','DEPTMEAS','DEPT_PNN','DEPT_CBL','"DEPTH"',
               'TVD:1','DEP:1']

#Make empty list and dataframe
err_files = []
bht_df = pd.DataFrame()

#Place file paths that cause hangups here
hangups = []

#Saveout counter
i = 0


#Loop through files
for file in files:
    if file in hangups:
        continue
    i += 1
    try:
        las = lasio.read(file)
        params = {'mnemonic' : [x.mnemonic for x in las.params],
                  'unit' : [x.unit for x in las.params],
                  'value' : [x.value for x in las.params],
                  'descr' : [x.descr for x in las.params]}

        temp = pd.DataFrame(params)
        temp['API'] = file.split('/')[-1].split('_')[0]
        temp['file'] = file.split('/')[-1]

        #Pull Maximum Depth
        depths = [x.mnemonic for x in las.curves if x.mnemonic in depth_alias]
        max_depth = max([las[x].max() for x in depths])
        temp['MaxMD'] = max_depth

        #Filter to just temperature aliases
        temp = temp[temp.mnemonic.isin(temp_alias)]
        bht_df = pd.concat([bht_df,temp],ignore_index=True)

    except Exception as e:
        print(file,e)
        err_files.append(file)

    if i > 100:
        i = 0
        print(round(files.index(file)/len(files)*100,2),'% complete')
        bht_df.to_csv('bht_df.csv',index=False)

bht_df.to_csv('bht_df.csv',index=False)

Once we have the BHT data parsed from the LAS files, we need to clean it up to make it usable. The values contain non-number characters such as "°" or " F" that we will need to remove. Regular expressions, or re, can do this easily in a single line of code. Regular expressions do this by expanding our search capabilities to ranges of characters or even patterns of characters. In this case, we’ll use re to search for anything that isn’t a number or a decimal place and remove those characters.

pre_count = bht_df.shape[0]

#Drop Null Values
bht_df = bht_df[bht_df.value.notnull()]

#Remove non-number characters other than "."
bht_df.value = bht_df.value.apply(lambda x: re.sub("[^0-9.]","",x))

#Drop empty values
bht_df = bht_df[bht_df.value != '']

#Drop multiple "." Tool IP addresses?
bht_df = bht_df[bht_df.value.apply(lambda x: str(x).count('.') <= 1)]

We can now change the variable type from strings into float numbers. Once the data is actually numbers we can convert the Celsius values to Fahrenheit and change the unit label. Doing this with where allows us to convert units in select locations by using a boolean that is only false where we want to change the data in that column.

bht_df['value'] = bht_df['value'].astype(float)

for C_col in ['DEGC','degC']:
    bht_df['value'] = bht_df['value'].where(bht_df['unit'] != C_col,
                                            other=bht_df['value'].apply(lambda x: (9/5)*x + 32))
    bht_df['unit'] = bht_df['unit'].where(bht_df['unit'] != C_col,
                                          other='degF')

#Drop Bad Values
bht_df = bht_df[bht_df.value >= 0]
bht_df = bht_df[bht_df.value < 500]

print(pre_count - bht_df.shape[0],'of',pre_count,'rows dropped.')

Merging Datasets

We still need to add location and TVD information to the BHT data. To accomplish this we are going to use geopandas to open a well spot shapefile from COGCC and manipulate it with the same syntax as pandas. We will use this shapefile for the spatial locations for each well and merge it to our bht_df dataframe.

wells = gpd.read_file('./WellSpot/Wells.shp')
wells.head()

To merge depth and location data from the well geodataframe to the bht_df dataframe we will need a common column. To achieve this we’ll format the API number in bht_df to create a new column, “API_Label”, that is the same format as wells. Applying a lambda function makes this easy and efficient.

#Make API_Label
bht_df['API_Label'] = bht_df.API.apply(lambda x: x[:2] + '-' + x[2:5] + '-' + x[5:10])

#Merge
bht_df = bht_df.merge(wells[['API_Label','Max_MD','Max_TVD','geometry']],
                      how='left',
                      on='API_Label')

Plotting Data for QA

Pandas allows us to easily visualize our data in a scatter plot and colorize it with a property of our data. This makes it easier to identify potentially erroneous data points. We will calculate the count of each unique value using value_counts, then plot, setting the color to that property.

#Creating Counts column
vcounts = bht_df.value.value_counts()
values = vcounts.keys().tolist()
counts = vcounts.tolist()
counts_dict = dict(zip(values, counts))
bht_df['count'] = bht_df.value.apply(lambda x: counts_dict[x])

#Plotting Values with TVD
bht_df[bht_df.value < 400].plot.scatter(x='value',
                                        y='Max_TVD',
                                        c='count',
                                        colormap='viridis')


Note the values that are not following the depth trend. Whether defaults or "pencil-whipped", the values do not appear to be correct, so let’s remove them. We can do this easily with ~, which means the opposite of, applied to the isin boolean, which checks for the presence of a value in a list.

#Removing suspect values
bad_values = [212,211.99986,200,150,70,32,0]
bht_df = bht_df[~bht_df['value'].isin(bad_values)]

#Plot data
bht_df.plot.scatter(x='value',
                    y='Max_TVD',
                    c='count',
                    colormap='viridis')


FIGURE 1: BHT data in the state of Colorado. Final product from the workflow within this article.



Saving Data

How you’d like to use this data will dictate the file output type. Let’s look at three of the most common.

CSV

CSVs allow us a lot of flexibility with their small size and the wide range of programs that can open them. If you’re a Petrel user you may need tab-delimited import data, so let’s save out that format as well.

#Save out to csv
bht_df.to_csv("CO_BHT.csv", index=False)

#Save out to tab-delimited
bht_df.to_csv("CO_BHT_tab.csv", index=False, sep="\t")

Excel

While I may rip on Excel, don’t think that it doesn’t have its place in data analysis. It is the standard software for working with data in most industries, makes nice plots, and has some decent tools built-in. Because of that, Excel is an excellent choice for sharing work with other people.

bht_df.to_excel("CO_BHT.xlsx", index=False)

GIS

The power of preserving & being able to reference data’s spatial location cannot be overstated. Geopandas allows us to convert to a geodataframe and save easily. It also makes managing CRS a breeze.

#Make the GeoDataFrame
gdf = gpd.GeoDataFrame(bht_df, geometry='geometry', crs=wells.crs)

#Save GeoDataFrame
gdf.to_file('bht_gdf.shp')

#Change CRS
gdf.to_crs({'init':'EPSG:4326'}, inplace=True)

#Save Out Copy
gdf.to_file('bht_gdf_WGS84.shp')

Conclusion

In this workflow, we’ve shown how to use python to parse information out of the headers of a large number of LAS files. While not yet corrected BHT, you can already see regional trends. With that, you can identify areas of interest and make corrections from wells we have identified with BHT data parsed from public LAS files. This workflow can be easily adapted for automating other time-intensive tasks. Combined with top information (check out COGCCpy) we can select a formation of interest, calculate properties such as Vshale, maximum resistivity, Sw, or SHmax and map it over an area. Curve properties gathered with a spatial selection can be used to normalize curves without washing out basin-wide trends. Pay flag curves can be added to the log files not just with curve values, but curve derivatives, providing a non-biased guide to identify areas of progradation and improved potential for TOC preservation.
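As a rough illustration of one such adaptation, a simple gamma-ray Vshale calculation over a single formation interval might look like the sketch below. The file name, the 'GR' mnemonic, the interval tops, and the clean and shale baselines are assumed placeholders rather than values from this workflow.

#A rough sketch of a gamma-ray Vshale calculation over an interval;
#the file name, mnemonic, tops, and baselines are assumed placeholders.
import lasio
import numpy as np

las = lasio.read('example.las')        #placeholder file name
df = las.df()                          #curves as a dataframe indexed by depth

top, base = 7500, 7650                 #hypothetical formation interval, ft
interval = df.loc[top:base]

gr_clean, gr_shale = 20, 150           #assumed clean and shale GR baselines, API units
vsh = (interval['GR'] - gr_clean) / (gr_shale - gr_clean)
vsh = vsh.clip(0, 1)                   #linear Vshale, clipped to [0, 1]

print('Average Vshale:', round(vsh.mean(), 2))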

If you’d like this code in a Jupyter notebook, the data it produced, python lessons, or other workflows, visit my GitHub.

The code contained within this article is covered by an MIT license.
