Mining With Machine Learning
Mikael Artursson, Minalyze Pty Ltd, Australia, considers why machine learning should be used in the mining industry.
Machine learning (ML), artificial intelligence, and deep learning are all buzzwords that have been mentioned frequently in the mining industry over the past few years. A question that often comes up in discussions about data in the industry is: “Can ML do this?”. Very often, the answer is yes, but there is no point introducing ML just for the sake of it. The first step is to define the problem that one would like to solve. Why should ML be considered in the first place? In general, ML can be set up to solve a well-defined problem in a fast, objective, and cost-efficient way. ML could in principle be applied to almost any problem, but this comes with certain prerequisites, and it does not mean that ML is the best practical solution for every issue.
Solving a real issue
The issue that this article focuses on is a well-known one: the classification of rock types when logging lithologies. Any geologist who has logged drill core will be able to relate to it. Logging and classifying rock types is time consuming, subjective and iterative. Its subjective nature, in particular, means that results vary with the person, their educational background and, of course, their previous experience. A classic saying is that if four geologists look at a piece of rock, there are at least five answers to what type of rock it is. The problem can escalate quickly: there is turnover of people in the industry, and new staff need to be calibrated to the task anew, which takes time and affects both the quality and the speed of logging.
Attempts to solve this particular issue are not new; there have been several, both past and present, but success generally comes down to what data is available to solve the problem.
Getting the ones and zeroes
The approach
To embark on the ML journey, the primary prerequisite is data, and lots of it. To complicate things further, the data should preferably be consistent, representative and, most importantly, well understood and classified. Here, consistent means that the data is acquired in the same way and to the same quality. Take photographs, for example: each should be taken with the same light source, the same focus, and in the same format and resolution.
One approach to this task has been to use photographs of the rock. However, much of the information that determines a rock's class is a function of both visual features (e.g. texture and colour) and compositional features (e.g. geochemistry or mineralogy), which may be hard to discern from one type of data alone.
Another approach has been to use compositional data, but this has generally been based on too few sample points, or on a spatial resolution too coarse to avoid sampling across lithological boundaries; short intervals with distinct features are then diluted and missed. A further issue with depth-related data is that it is hard to match a given measurement accurately to the position on the rock it came from.
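The dilution effect can be made concrete with a length-weighted composite. The sketch below uses hypothetical depths and concentrations (not real data): a 0.5 m interval with a distinct signature disappears into a 2 m sample, and a sample straddling a contact reports a value that matches neither rock.

```python
# Hypothetical high-resolution log: (from_m, to_m, value), value in wt%.
# Numbers are illustrative only.
fine_log = [
    (10.0, 10.5, 2.0),
    (10.5, 11.0, 2.0),
    (11.0, 11.5, 12.0),  # short interval with a distinct composition
    (11.5, 12.0, 2.0),
]


def composite(intervals, start, end):
    """Length-weighted average of `intervals` over the depth range [start, end)."""
    weighted_sum = 0.0
    total_length = 0.0
    for top, bottom, value in intervals:
        overlap = min(bottom, end) - max(top, start)
        if overlap > 0:
            weighted_sum += overlap * value
            total_length += overlap
    if total_length == 0:
        raise ValueError("no data overlaps the requested depth range")
    return weighted_sum / total_length


print(composite(fine_log, 10.0, 12.0))  # 4.5  (2 m sample dilutes the 12.0 interval)
print(composite(fine_log, 11.0, 11.5))  # 12.0 (high-resolution interval keeps it)
print(composite(fine_log, 10.5, 11.5))  # 7.0  (sample straddles a contact)
```

The coarser the sample, the closer the reported value drifts towards the background, which is why short features need high-resolution intervals to be seen at all.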
The approach that this article focuses on combines a wider set of data covering both the visual and compositional aspects of the rock, acquired at high resolution, standardised to a consistent quality, and correctly registered to depth.
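How a scanner registers measurements to depth is instrument-specific and is not described in this article. The sketch below only illustrates the last step, assuming each measurement already has a downhole depth: attaching it to the right interval in a hypothetical from/to interval table with a binary search.

```python
from bisect import bisect_right

# Hypothetical interval table for one hole: (from_m, to_m, interval_id).
# Assumed sorted, contiguous and non-overlapping.
intervals = [
    (100.0, 101.0, "A"),
    (101.0, 102.0, "B"),
    (102.0, 103.5, "C"),
]
tops = [top for top, _, _ in intervals]


def interval_for_depth(depth_m):
    """Return the id of the interval containing depth_m, or None if outside all intervals."""
    i = bisect_right(tops, depth_m) - 1  # last interval whose top <= depth_m
    if i < 0:
        return None
    top, bottom, interval_id = intervals[i]
    return interval_id if top <= depth_m < bottom else None


for depth in (99.5, 100.0, 101.7, 103.49, 103.5):
    print(depth, interval_for_depth(depth))
```

Intervals are treated as half-open (top inclusive, bottom exclusive), so a measurement exactly on a boundary is assigned to the deeper interval and never counted twice.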
In 2014, a new core analysis and digitalisation instrument called the Minalyzer Core Scanner was released onto the market, defining a new category of industrial, high-throughput instruments. It was designed to address many of the data acquisition issues discussed above (Figure 1).
It is a proprietary, patented system that acquires high-resolution photography alongside LiDAR, which maps the topography of the core tray and sample in 3D. This enables detailed, fast and non-destructive elemental analysis of core samples by X-ray fluorescence (XRF), with results available directly on site or online within hours, rather than weeks or months.
When it was released, ML in this context was not widely discussed and was still largely at the concept stage. Over time, the data has proven to work very well as a base for ML, primarily because of its consistency.
It is important that instrument manufacturers and data providers engage with and understand common ML algorithms and workflows, so that they can prepare the data to be fit for purpose for the ML exercise and deliver it seamlessly to the algorithm. Data collection and preparation often account for up to 80% of a data scientist's time in reaching a result from ML.1
The essential operation for any numerical data is to make sure it is ‘clean’. Clean data in this context means data that contains no odd characters and has been normalised. For a photograph, cleaning the dataset might mean cropping the image to a consistent resolution and size, and filling non-retrievable areas with a consistent colour, to name but a few examples. An example of a core tray divided into several intervals with associated geochemistry can be seen in Figure 2.
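For numerical data, cleaning typically means turning raw assay strings into numbers and putting each variable on a common scale. The sketch below is one possible version under stated assumptions: below-detection values such as `<1.0` are replaced with half the detection limit (a common convention, not necessarily the one used here), unrecoverable entries become `None` (the numeric counterpart of filling missing image areas with a consistent colour), and min-max scaling is used as the normalisation (z-scores are another common choice).

```python
import re

# Matches below-detection-limit entries such as "<1.0" or "< 0.5".
BELOW_DL = re.compile(r"^<\s*(\d+(?:\.\d+)?)$")
MISSING = {"", "-", "NA", "n/a"}


def clean_value(raw):
    """Convert a raw assay string to a float, or None if the value is not retrievable."""
    text = raw.strip()
    if text in MISSING:
        return None
    match = BELOW_DL.match(text)
    if match:
        # Assumption: replace below-detection values with half the detection limit.
        return float(match.group(1)) / 2
    text = re.sub(r"[^0-9.\-]", "", text)  # strip stray characters such as units
    return float(text) if text else None


def min_max(values):
    """Scale present values to [0, 1]; missing values stay None."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    return [None if v is None else (0.0 if span == 0 else (v - lo) / span)
            for v in values]


# Hypothetical raw Fe column as it might arrive from different sources.
raw_fe = ["10.5", " 5.5 %", "<1.0", "-", "20.5 wt%"]
fe = [clean_value(r) for r in raw_fe]
print(fe)           # [10.5, 5.5, 0.5, None, 20.5]
print(min_max(fe))  # [0.5, 0.25, 0.0, None, 1.0]
```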
The ground truth
Figure 1. Minalyzer Core Scanner.
Figure 2. Core tray with intervals.
To train an ML model, a geological log needs to be produced that classifies each high-resolution interval along a set of drill holes. These classifications need to be based on the interval geochemistry derived from the XRF elemental analysis, as well as on the corresponding core photograph of each interval. Once trained, the model can generate an automated log (Figure 3). In this example, the high-resolution interval geochemistry and core photography of a new hole containing the same or similar rock types were presented to the deep learning neural network trained as the ML model. The different rock types are visualised as different colours along the X-axis (downhole). For each interval, the probability of each rock type is displayed on the Y-axis, with the probabilities summing to 100%.
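The network itself is not shown here. Assuming it outputs, for each interval, a probability per rock type (as displayed in Figure 3), turning that output into a first-pass log can be as simple as taking the most probable class per interval and merging adjacent intervals with the same label. Rock names and probabilities below are hypothetical.

```python
# Hypothetical per-interval model output: (from_m, to_m, {rock_type: probability}).
predictions = [
    (0.0, 1.0, {"basalt": 0.80, "dolerite": 0.15, "shale": 0.05}),
    (1.0, 2.0, {"basalt": 0.60, "dolerite": 0.30, "shale": 0.10}),
    (2.0, 3.0, {"basalt": 0.20, "dolerite": 0.70, "shale": 0.10}),
    (3.0, 4.0, {"basalt": 0.10, "dolerite": 0.20, "shale": 0.70}),
    (4.0, 5.0, {"basalt": 0.05, "dolerite": 0.15, "shale": 0.80}),
]


def first_pass_log(preds, tol=1e-6):
    """Pick the most probable rock type per interval and merge contiguous runs."""
    log = []
    for top, bottom, probs in preds:
        if abs(sum(probs.values()) - 1.0) > tol:
            raise ValueError(f"probabilities at {top}-{bottom} m do not sum to 1")
        label = max(probs, key=probs.get)
        if log and log[-1][2] == label and log[-1][1] == top:
            log[-1] = (log[-1][0], bottom, label)  # extend the current unit
        else:
            log.append((top, bottom, label))       # start a new unit
    return log


print(first_pass_log(predictions))
# [(0.0, 2.0, 'basalt'), (2.0, 3.0, 'dolerite'), (3.0, 5.0, 'shale')]
```

Keeping the full probability table alongside the merged log lets a geologist see where the model was confident and where it was not.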
Training an ML model is clearly one of the most important steps in reaching a good and relevant result. It is a slow process, and rightly so, especially when establishing the so-called ‘ground truth’. The ground truth is established when a geologist classifies each distinct sample, and it is the quality of this exercise that determines the outcome of the training and the future prediction of rock types in unseen drill holes.
Another factor that slows training is the hardware required, coupled with the number of ‘epochs’, or passes through the training data. The old saying ‘practice makes perfect’ applies to ML as well: the more epochs used in training, the better the model becomes, but only up to a point. Beyond that limit, too many training iterations, or too many input variables, lead to overtraining (overfitting), where the model fits the training data closely but generalises poorly to new drill holes.
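One common way to find that limit is early stopping: track the loss on held-out data (for example, drill holes not used for training) after each epoch and stop once it has not improved for a set number of epochs. The article does not say which stopping rule was used; the sketch below is a generic version with a hypothetical loss history.

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) given per-epoch validation losses.

    Training stops once the loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Hypothetical validation losses: improving, then rising as the model overfits.
history = [0.90, 0.62, 0.48, 0.41, 0.39, 0.40, 0.43, 0.47, 0.52]
print(early_stopping(history))  # (4, 7)
```

The weights saved at the best epoch (here epoch 4), not those from the final epoch, would then be used for prediction.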
A good rule of thumb that applies to any teaching is ‘if the teacher is confused, the students will be confused as well’. If the training dataset is not classified correctly, the algorithm will be confused, which in turn affects the final prediction.
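One way to check whether the ‘teacher’ is consistent before training is to have two geologists classify the same intervals and measure their agreement. Cohen's kappa, a standard agreement statistic, is used below as an illustration; the article does not prescribe it, and the labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Probability that both raters pick the same class purely by chance.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:  # both raters used one identical class throughout
        return 1.0
    return (observed - expected) / (1 - expected)


geologist_a = ["basalt", "basalt", "dolerite", "shale", "shale", "basalt", "dolerite", "shale"]
geologist_b = ["basalt", "dolerite", "dolerite", "shale", "basalt", "basalt", "dolerite", "shale"]
print(round(cohens_kappa(geologist_a, geologist_b), 3))  # 0.628
```

The raters agree on 6 of 8 intervals (75%), but once chance agreement is removed the kappa is noticeably lower, a hint that the ground truth needs calibration before it is used for training.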
The main benefit of a trained algorithm is that it can generate a first-pass lithological log, fully automated and at speed: the example log in Figure 3 took approximately 50 seconds to generate, as opposed to a week by traditional means.
Subjectivity remains an important factor to keep in mind when evaluating an automated log, such as one generated by an ML algorithm. As an example, say the prediction in the Figure 3 log reached an accuracy of 79% against the geologist's log. One might say that 21% of the downhole classification is wrong. Others, however, would argue that in many of the intervals deemed wrong, the algorithm is likely to be correct and the human-generated lithological log is not. The truth is probably somewhere in between. One contributing factor is likely the resolution, or level of detail, at which a geologist has logged the core, but human subjectivity undoubtedly plays a part as well.
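The resolution effect can be shown with a small example. In the hypothetical case below, a geologist logged 5 m of core as a single unit, while the model, working on 1 m intervals, picks out a 1 m dyke inside it. Measured against the coarse human log, the model scores only 80%, even though the ‘error’ may simply be detail the log did not record. Comparing labels at each interval's midpoint is an assumption made for simplicity.

```python
def label_at(log, depth_m):
    """Label of the log unit containing depth_m (units are half-open [from, to))."""
    for top, bottom, label in log:
        if top <= depth_m < bottom:
            return label
    return None


def agreement(model_intervals, reference_log):
    """Length-weighted fraction of model intervals matching the reference log at their midpoint."""
    matched = total = 0.0
    for top, bottom, label in model_intervals:
        length = bottom - top
        total += length
        if label_at(reference_log, (top + bottom) / 2) == label:
            matched += length
    return matched / total


# Hypothetical logs.
human_log = [(10.0, 15.0, "granite")]  # logged as one coarse unit
model_log = [
    (10.0, 11.0, "granite"),
    (11.0, 12.0, "granite"),
    (12.0, 13.0, "dolerite"),  # a thin dyke the human log did not separate
    (13.0, 14.0, "granite"),
    (14.0, 15.0, "granite"),
]
print(agreement(model_log, human_log))  # 0.8
```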
Integrating into the workflow
The most important part of adopting ML is fully implementing the algorithm into existing workflows. It is a difficult transition to make, mainly because of people's natural reluctance to change, but it returns significant rewards.
There is also another aspect to consider: the workflow itself and how the solution fits into it. ML must be implemented in such a way that the overall throughput and flow of core processing is not negatively affected. Data acquisition, followed by running the algorithm and the subsequent manual logging based on the first-pass automatic log, must therefore work in balance. Depending on the number of drill rigs operating on site, the number of data acquisition instruments needs to be balanced accordingly. The algorithm itself is unlikely ever to be the bottleneck.
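Balancing instruments against drill rigs is simple arithmetic once daily metres are known. All figures below (rigs, metres per rig, scanner and review capacity) are hypothetical placeholders, not instrument specifications. The sketch sizes the scanner fleet and then finds the slowest stage in the chain, which sets the overall throughput.

```python
import math


def scanners_needed(rigs, metres_per_rig_per_day, scanner_metres_per_day):
    """Smallest number of scanners whose combined capacity covers daily drilling."""
    daily_drilling = rigs * metres_per_rig_per_day
    return math.ceil(daily_drilling / scanner_metres_per_day)


def bottleneck(stage_capacity):
    """Name of the stage with the lowest capacity (metres of core per day)."""
    return min(stage_capacity, key=stage_capacity.get)


# Hypothetical site: 6 rigs at 150 m/day each, one scanner handles 400 m/day.
daily_drilling = 6 * 150
n_scanners = scanners_needed(6, 150, 400)
print(n_scanners)  # 3

stages = {
    "scanning": n_scanners * 400,  # 1200 m/day
    "ML first-pass log": 50_000,   # hypothetical, effectively unconstrained
    "geologist review": 1_000,     # hypothetical
}
print(bottleneck(stages))                       # geologist review
print(min(stages.values()) >= daily_drilling)   # True: every stage keeps up
```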
Benefits
Implementing an ML-based automatic logging algorithm for lithological logging at a mining operation has several benefits.
The first benefit is that, before the algorithm is developed and the underlying knowledge of the deposit is built up, data acquisition will likely have to be improved, both in the way and the speed at which data is captured and in the quality and density of the data, which makes it meaningful for the algorithm. The users of the data will also find that the increased quality and standardisation aid them in any manual processes.
The second benefit is increased knowledge of the deposit and of which input variables are important for fingerprinting the different rock types, since part of the journey entails more detailed logging and classification of rock types for training purposes. Moreover, any confusion between rock types makes it more apparent where further, more detailed analysis should focus to better distinguish between them.
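Where that confusion lies can be read from a comparison of ground-truth and predicted labels. The sketch below counts misclassified intervals per pair of rock types, ignoring direction, so the most frequently confused pairs come out on top; the labels are hypothetical. A full confusion matrix (for example from a library such as scikit-learn) gives the same information with direction preserved.

```python
from collections import Counter


def most_confused(true_labels, predicted_labels, top=3):
    """Count misclassifications per unordered pair of rock types, most frequent first."""
    pairs = Counter()
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth != predicted:
            pairs[tuple(sorted((truth, predicted)))] += 1
    return pairs.most_common(top)


# Hypothetical ground truth and predictions for eight intervals.
true_labels = ["basalt", "basalt", "dolerite", "dolerite", "shale", "shale", "basalt", "dolerite"]
predicted = ["basalt", "dolerite", "basalt", "dolerite", "shale", "siltstone", "dolerite", "dolerite"]
print(most_confused(true_labels, predicted))
# [(('basalt', 'dolerite'), 3), (('shale', 'siltstone'), 1)]
```

Here basalt and dolerite would be the first candidates for additional analysis aimed at separating them.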
The third benefit is the most obvious and is the goal of the exercise itself: making lithological logging faster, more objective, and less iterative. ML makes the task faster by using a computer to process large amounts of data through a trained algorithm; more objective because the decisions are based on trained and validated data; and less iterative because the drill core is less likely to need re-logging after subsequent analysis.
Additional benefits include lower operational sensitivity to staff turnover, since training new staff can be faster and based on more standardised logging. The logs could also be used to improve the selection of samples for further analysis, saving time and money on sample preparation and, ultimately, on the assay budget itself.
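The article does not specify how the logs would guide sample selection. One possible approach, sketched below under that assumption, is to flag intervals where the model's most likely rock type has a low probability, since these are where additional analysis is most likely to change the interpretation. The threshold and probabilities are hypothetical.

```python
def flag_for_analysis(predictions, threshold=0.6):
    """Return (from_m, to_m) of intervals whose top class probability is below threshold."""
    flagged = []
    for top, bottom, probs in predictions:
        if max(probs.values()) < threshold:
            flagged.append((top, bottom))
    return flagged


# Hypothetical per-interval model output.
predictions = [
    (0.0, 1.0, {"basalt": 0.90, "dolerite": 0.10}),
    (1.0, 2.0, {"basalt": 0.45, "dolerite": 0.55}),
    (2.0, 3.0, {"shale": 0.50, "siltstone": 0.30, "dolerite": 0.20}),
    (3.0, 4.0, {"shale": 0.85, "siltstone": 0.15}),
]
print(flag_for_analysis(predictions))  # [(1.0, 2.0), (2.0, 3.0)]
```

Confidently classified intervals could then be sampled more sparsely, concentrating the assay budget where it adds the most information.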
Conclusion
Considering the benefits mentioned in this article, the original question of 'why should ML be used in the mining industry?' changes to 'why not use ML in the mining industry?'. If one has the data, why not put it to good use?
References
1. ‘Why data preparation is an important part of data science?’, ProjectPro, (2021) https://www.dezyre.com/article/why-data-preparation-is-animportant-part-of-data-science/242, [Accessed 30 June 2021].
Figure 3. Automated lithological log.