Ripple-Down Rules: The Alternative to Machine Learning
Paul Compton
The University of NSW, Sydney, Australia
Byeong Ho Kang
The University of Tasmania, Hobart, Australia
First edition published 2021 by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
© 2021 Paul Compton, Byeong Ho Kang
CRC Press is an imprint of Taylor & Francis Group, LLC
The right of Paul Compton and Byeong Ho Kang to be identified as authors of this work has been asserted by them in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.
Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.
Library of Congress Cataloging‑in‑Publication Data
Names: Compton, Paul, 1944- author. | Kang, Byeong-Ho, author.
Title: Ripple-down rules : the alternative to machine learning / Paul Compton, The University of NSW, Sydney, Australia, Byeong Ho Kang, The University of Tasmania, Hobart, Australia.
Description: First edition. | Boca Raton : CRC Press, 2021. | Includes bibliographical references and index.
Identifiers: LCCN 2020047019 | ISBN 9780367644321 (paperback) | ISBN 9780367647667 (hardback) | ISBN 9781003126157 (ebook)
Subjects: LCSH: Expert systems (Computer science) | Ripple down rules (Machine learning)
Classification: LCC QA76.76.E95 C653 2021 | DDC 006.3/3–dc23
LC record available at https://lccn.loc.gov/2020047019
ISBN: 978-0-367-64766-7 (hbk)
ISBN: 978-0-367-64432-1 (pbk)
ISBN: 978-1-003-12615-7 (ebk)
Typeset in Minion Pro by SPi Global, India
Preface
Artificial Intelligence (AI) once again seems to hold out the promise of doing extraordinary things, particularly through the magic of machine learning. But to do extraordinary things an AI system needs to have a lot of knowledge. For example:
• Self-driving cars have to understand what their vision system, and their other sensors, are telling them about the world and they have to know what to do in all the circumstances they will encounter. A huge amount of knowledge about the world and how to interact with it is fundamental to being able to drive well.
• If a medical AI system which provides advice to GPs about laboratory results is going to provide detailed patient-specific advice at the level of a consultant chemical pathologist, it has to know not only about the whole range of diagnoses in relation to laboratory results but also how these are affected by drugs, patient history and so on – and how the results affect ongoing patient management.
The current focus is on seeking to acquire such knowledge through machine learning, but as will be discussed, datasets of sufficient quality to learn such detailed knowledge are problematic. Alternatively, one can seek to obtain knowledge from an expert through a process of knowledge elicitation and engineering. The difficulties in doing this will also be discussed, but Ed Feigenbaum’s comment from over 30 years ago still applies.
“The problem of knowledge acquisition is the critical bottleneck problem in artificial intelligence.” (Feigenbaum 1984)
One approach to the problem of acquiring knowledge from people is known as Ripple-Down Rules (RDR) and is the focus of this book.
RDR has had significant practical success. For example, IBM investigated using RDR to data cleanse Indian street address data, apparently a difficult problem (Dani et al. 2010). Their RDR method outperformed the machine learning methods and the commercial system they also investigated, and the two main researchers received an IBM award granted if a piece of research leads to more than $10M of new business. Pacific Knowledge Systems (PKS)1 provides RDR technology for chemical pathology laboratories to provide patient-specific interpretations of lab results. Worldwide, there are over 800 PKS RDR knowledge bases deployed, ranging from hundreds of rules to over 10,000. These are developed by pathologists themselves after a couple of days’ training. They build a system while it is in use, adding a rule whenever they notice that one of the patient reports they monitor as part of their duties has been given an inappropriate interpretation by the evolving knowledge base. It takes on average a couple of minutes for them to add and validate a new rule. To our knowledge, no other knowledge-based system technology has achieved this level of empowering users. Occasionally domain experts do build knowledge bases themselves using other technologies – but essentially by becoming knowledge engineers, whereas with RDR, rule building is a minor extension to their normal duties.
Given results like this, one would expect substantial industry uptake of RDR technology and there are at least eight companies using RDR, but all but one of these had some personal connection with another RDR project elsewhere. Why has this personal connection and direct experience been needed?
We suspect that the reasons are firstly that RDR can seem quite counterintuitive. As discussed in Chapter 2, it is based on a different philosophical understanding from most other AI on what knowledge is. In particular, a key principle in every AI textbook and university AI course in the last 40–50 years has been that knowledge and reasoning have to be separate. This is particularly the case for knowledge-based systems with their separation of the knowledge base and the inference engine. In contrast, RDR explicitly rejects this separation and the knowledge base itself specifies the order of evaluation of rules. Perhaps even more counter-intuitive to an engineer is that there is no requirement to structure the knowledge; it is just added over time as cases require rules. In this book we make the further seemingly counter-intuitive suggestion that building an RDR system is probably going to be cheaper and perform better than a
machine-learning-based system if the data labels for the training data are based on human assessment – hence the title of this book.
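To give a concrete picture of what this looks like in practice, the following is a minimal sketch of our own of a single-classification RDR knowledge base, not the Excel_RDR implementation described later, and with invented thyroid thresholds and labels. Each rule carries its own exception and fallback branches, so the structure of the rules themselves fixes the order of evaluation, and a correction is made by attaching a new rule at the point where the wrong conclusion was given:

```python
# Minimal single-classification RDR sketch (illustrative only; the structure
# and the thyroid thresholds/labels are our own assumptions, not Excel_RDR's).

class Rule:
    def __init__(self, condition, conclusion):
        self.condition = condition    # function: case (dict) -> bool
        self.conclusion = conclusion  # conclusion given when the condition holds
        self.except_rule = None       # tried when this rule fires but was found wrong
        self.else_rule = None         # tried when this rule does not fire

    def classify(self, case, default=None):
        if self.condition(case):
            if self.except_rule:                       # an exception may refine us
                refined = self.except_rule.classify(case, default=None)
                if refined is not None:
                    return refined
            return self.conclusion
        if self.else_rule:                             # fall through to the next rule
            return self.else_rule.classify(case, default)
        return default

# The knowledge base is just rules added over time, as cases required them.
root = Rule(lambda c: c["TSH"] < 0.1 and c["FT4"] > 25, "primary hyperthyroidism")
root.else_rule = Rule(lambda c: c["TSH"] > 10 and c["FT4"] < 9, "primary hypothyroidism")
# A later case shows the first rule is wrong when the patient is on thyroxine,
# so a correction is attached under that rule rather than editing it.
root.except_rule = Rule(lambda c: c.get("on_thyroxine", False), "over-replacement")

print(root.classify({"TSH": 0.02, "FT4": 30, "on_thyroxine": True}, default="normal"))
# prints: over-replacement
```

Chapters 3–7 work through the real method in detail; the point of the sketch is only that nothing outside the rule structure itself decides which rule is tried next.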
Professor Ashwin Srinivasan from BITS Pilani worked on PEIRS, the first large deployed RDR system, and introduced RDR to IBM when he worked for IBM research. He has repeatedly suggested to us that what is needed is not more academic papers, but an RDR manual so that people could experience for themselves how effective RDR is. This book is the response to Ashwin’s suggestion.
The book is structured as follows:
Chapters 1 and 2 discuss why an RDR approach is needed, covering the problems with both machine learning and knowledge acquisition from experts. Readers who want to get straight into the technology might want to skip these chapters, but since RDR is based on different philosophical assumptions about knowledge, these chapters may be important for appreciating why RDR is the way it is.
Chapters 3–7 are essentially the manual for various types of RDR. Some readers will want to move quickly to using the open-source Excel_RDR tools to understand RDR, but on the other hand the worked examples in these chapters should provide sufficient detail to understand exactly how RDR works.
Chapter 8 provides some implementation advice, particularly relating to validation.
Chapter 9 revisits machine learning and how RDR can be used with machine learning or as an alternative.
Appendices 1 and 2 outline the various applications where RDR has been used in industry or demonstrated in research.
Currently the Excel_RDR software used in Chapters 5 to 7 and the associated manual can be downloaded only from http://www.cse.unsw.edu.au/~compton/RDR_software.html. It will also become available on general download sites.
Finally, we might note that RDR has another perhaps more political difference from other AI technology. We live in an age where people are very concerned that AI is going to intervene increasingly in their lives. RDR is not so much an AI, an artificial intelligence, but an IA or IB, an intelligence amplifier or broadcaster. RDR systems are built and completely controlled by their user to do exactly what the user wants them to do and can be endlessly adapted as the user requires. Essentially, what
RDR does is empower users to do their job better, by taking over the more boring, repeated exercise of their expertise. Knowledge-based systems have always held out this hope – but RDR does this in a way that empowers the user, rather than replacing them.
NOTE
1. https://pks.com.au. Paul Compton was a co-founder of PKS and, until 2019, a minority shareholder.
Problems with Machine Learning and Knowledge Acquisition
1.1 INTRODUCTION
Ripple-Down Rules (RDR) are intended for problems where there is insufficient data for machine learning and suitable data is too costly to obtain. On the other hand, RDR avoids the major problems that arise in building systems by acquiring knowledge from domain experts. There are various types of RDR, and this book presents three of them. Although RDR is a knowledge acquisition method, acquiring knowledge from people, it is perhaps more like machine learning than conventional knowledge engineering.
In the 1970s and early 80s there were huge expectations about what could be achieved with expert or knowledge-based systems based on acquiring knowledge from domain experts. Despite the considerable achievements with expert systems, they turned out to be much more difficult to build than expected, resulting in disillusionment and a major downturn in the new AI industry. We are now in a new phase of huge expectations about machine learning, particularly deep learning. A 2018 Deloitte survey found that 63% of the companies surveyed were using machine learning in their businesses, with 50% using deep learning (Loucks, Davenport, and Schatsky 2018). The same survey in 2019 shows a small increase in the use of machine learning over 2018, but also that 97% of respondents plan to use machine learning and 95% deep learning in the next year (Ammanath, Hupfer, and Jarvis 2019). It appears that machine learning is
a magic new technology, whereas in fact the history of machine learning goes back to the early 1950s, when the first neural network machine was developed based on ideas from the 1940s. The first convolutional neural networks, a major form of deep learning, were developed in the late 1970s. Although this is the first book on Ripple-Down Rules, RDR also has some history. An RDR approach to address the maintenance challenges with GARVAN-ES1, a medical expert system, was first proposed in 1988 (Compton and Jansen 1988), only three years after GARVAN-ES1 was first reported (Horn et al. 1985) and two years after GARVAN-ES1 was reported as one of the first four medical expert systems to go into clinical use (Buchanan 1986). The reason for an RDR book now is to present a fall-back technology as industry becomes increasingly aware of the challenges in providing data good enough for machine learning to produce the systems it wants. We will first look briefly at the limitations and problems with machine learning and knowledge acquisition.
1.2 MACHINE LEARNING
Despite the extraordinary results that machine learning has produced, a key issue is whether there is sufficient reliably labelled data to learn the concepts required. Despite this being the era of big data, providing adequate appropriate data is not straightforward. If we take medicine as an example: a 2019 investigation into machine learning methods for medical diagnosis identified 17 benchmark datasets (Jha et al. 2019). Each of these has at most a few hundred cases and a few classes, with one dataset having 24 classes. This sort of data does not represent the precision of human clinical decision making. We will later discuss knowledge bases in chemical pathology which are used to provide expert pathologist advice to clinicians on interpreting patient results. Some of these knowledge bases provide hundreds, and some even thousands, of different conclusions. Perhaps Jha et al.’s investigation into machine learning in medicine did not uncover all the datasets available, but machine learning would fall far short of being able to provide hundreds of different classifications from the datasets they did identify.
Hospitals receive funding largely based on the discharge codes assigned to patients. A major review of previous studies of discharge coding accuracy found the median accuracy to be 83.2% (Burns et al. 2011). More recent studies in more specialised, and probably more difficult, domains show even lower accuracy (Ewings, Konofaos, and Wallace 2017, Korb et al. 2016). Chavis provides an informal discussion of the problems with
accurate coding (Chavis 2010). No doubt discharge coding has difficulties, but given that hospital funding relies on it, and hospitals are motivated to get it right, it appears unlikely that large databases with sufficient accuracy to be used by machine learning for more challenging problems are going to be available any time soon.
At the other end of the scale we have had all sorts of extraordinary claims about how IBM’s Watson was going to transform medicine by being able to learn from all published medical findings. It was the ultimate claim: that given the massive amount of information in medical journal articles and implicit in other data, machine learning should be able to extract the knowledge embedded in these data resources. Despite Watson’s success playing Jeopardy, this has not really translated to medicine (Strickland 2019). For example, in a study of colon cancer treatment advice in Korea, Watson’s recommended treatment only agreed with the multidisciplinary team’s primary treatment recommendations 41.5% of the time, but it did agree 87.7% of the time on treatments that could be considered (Choi et al. 2019). It was suggested that the discordance in the recommended treatments was because of different medical circumstances between the Korean Gachon Gil Medical Centre and the Sloan Kettering Cancer Centre. This further highlights a central challenge for machine learning: that what is learned is totally dependent on the quality and relevance of the data available. There is also the question of how much human effort goes into developing a machine learning system. In IBM’s collaboration with WellPoint Inc., 14,700 hours of nurse-clinician training were used as well as massive amounts of data (Doyle-Lindrud 2015). The details of what this training involved are not available, but 6–7 person-years is a very large effort on top of the machine learning involved. This collaboration led to the lung cancer program at the Sloan Kettering Cancer Centre using Watson; however, recent reports of this application indicate that for the system used at the Sloan Kettering, Watson was in fact trained on only hundreds of synthetic cases developed by one or two doctors and its recommendations were biased because of this training (Bennett 2018). Data on the time taken to develop these synthetic cases does not seem to be available. If one scans the Watson medical literature, the majority of the publications are about the potential of the approach, rather than results. There is no doubt that the Watson approach has huge potential and will eventually achieve great things, but it is also clear that the results so far have depended on a lot more than just applying learning to data – and have a long way to go to match expert human judgement.
This central issue of data quality was identified in IBM’s 2012 Global Technology Outlook Report (IBM Research 2012), which named “Managing Uncertain Data at Scale” as a key challenge for analytics and learning. A particular issue is the accuracy of the label or classification applied to the data, as shown in the discharge coding example above. If a label attached to a case is produced automatically, it is likely to be produced consistently and the dataset is likely to be highly useful. For example, if data on the actual outcome from an industrial process is available as well as data from sensors used in the process, then the data should be excellent for learning. In fact, one of the earliest successful applications of machine learning was for a Westinghouse fuel sintering process where a decision tree algorithm discovered the parameter settings to produce better pellets, boosting Westinghouse’s 1984 income by over $10M per year (Langley and Simon 1995). Apparently, the system outperformed engineers in predicting problems. The ideal application for machine learning is not only when there is a large number of cases, but where the label or classification attached to the case is independent of human judgement; e.g. the borrower did actually default on their loan repayment, regardless of any human assessment.
Human biases in making judgements are well known (Kahneman, Slovic, and Tversky 1982), but we are also inconsistent in applying labels to data. In a project where expert doctors had to assess the likelihood of kickback payments from pathology companies to GPs, the experts tended to be a bit inconsistent in their estimates of the likelihood of kickback. However, if they were constrained to identify differentiating features to justify a different judgement about a case, they became more consistent (Wang et al. 1996). It so happened that RDR were used to ensure they selected differentiating features, but the point for the discussion here is simply that it is difficult for people to be consistent in subtle judgements. As will be discussed, human judgement is always made in a context and may vary with the context.
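As a rough illustration of that constraint (a sketch only, with invented attribute names, not the system used in the kickback study), the expert can be limited to building a new rule from the features on which the new case differs from the stored “cornerstone” case that the existing rule was built from:

```python
# Illustrative sketch only: constrain a new rule to features that differentiate
# the new case from the cornerstone case behind the existing rule.
# The attribute names and values below are invented for the example.

def difference_list(new_case, cornerstone):
    """Attributes (and the new case's values) on which the two cases differ."""
    return {attr: value
            for attr, value in new_case.items()
            if cornerstone.get(attr) != value}

cornerstone = {"requests_per_month": 40, "single_referrer_share": 0.2, "bulk_billed": True}
new_case    = {"requests_per_month": 400, "single_referrer_share": 0.9, "bulk_billed": True}

# The expert may only justify a different judgement using these attributes:
print(difference_list(new_case, cornerstone))
# prints: {'requests_per_month': 400, 'single_referrer_share': 0.9}
```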
One approach to human labelling of large datasets is crowdsourcing, but careful attention has to be paid to quality. Workers can have different levels of expertise or may be careless or even malicious, so there have to be ways of aggregating answers to minimise errors (Li et al. 2016). But clearly crowdsourcing has great value in areas such as labelling and annotating images, and when deep learning techniques are applied to such datasets extremely good results are achieved, which could not be achieved any other way.
An underlying question is: how much data, and what sort of data, is needed for a machine learning algorithm to do as well as human judgement? An IBM project on data cleansing for Indian street address data provides an example of this issue (Dani et al. 2010). Apparently, it is very difficult to get a clean dataset for Indian street addresses. The methods used in this study were RDR, a decision tree learner, a conditional random field learner and a commercial system. The version of RDR used, with different knowledge bases for each of the address fields and the use of dictionaries, was more sophisticated (or rather more specialised) than the RDR systems described in this book. The systems developed were trained on Mumbai data and then tested on other Mumbai data and data from all of India.
TABLE 1.1 Precision of various methods to learn how to clean Indian street address data. This table has been constructed from data presented in (Dani et al. 2010).
Method | No. of training examples | Mumbai test data | All India test data
As seen in Table 1.1, all the methods except the commercial system performed comparably when tested on data similar to the data on which they were trained, with a precision of 75–80%. However, when tested on data from all of India, although all methods degrade, the RDR method degrades much less than the statistical methods. The issue is not so much the use of RDR, but that a method using human knowledge, based on years of lived experience, is likely to do better than purely statistical methods – unless they have a lot of data to learn from.
If more training data had been available than the 600 cases used, no doubt the machine learning methods would have done better, but the question arises: where does this data come from? To create training data, the correct address has to be matched with the ill-formed addresses and presumably people do this matching. If this matching could have been automated there would have been no need for this research, which led to commercial application. If people have to be used to do the labelling, then why not get the same people to write rules as they go? This is a central motivation for an RDR approach and if data from Pacific Knowledge
Systems customers (see Appendix 1) is typical, it will take them only a couple of minutes or less to write a rule. This leads to the perhaps counterintuitive suggestion that if data for machine learning has to be labelled by people, then there may well be less human effort required in building an RDR knowledge base than in labelling sufficient cases for a machine learner to produce a knowledge base of the same quality. This recalls Achim Hoffmann’s paper on the general limitations of machine learning (Hoffmann 1990). Hoffmann argued that people have to do the same work whether they write a complex program or provide a learner with sufficient data; that is, you can’t expect magic, you either write the program or provide sufficient data in which the information needed is embedded – there is no free lunch. Of course, there are shortcuts if, for example, data can be labelled automatically, as in the Westinghouse example. On the other hand, we should expect that there will always be some cases so rare that it is almost impossible to get sufficient examples. But a human expert will know exactly what conclusion or label should apply to such data in its context, and why.
All of this discussion leads to the conclusion that despite the power of machine learning and its wide application, providing sufficient high-quality data for a learner can be a difficult problem and it may perhaps be simpler to incorporate human knowledge into a program – an expert or knowledge-based system. But it was precisely because of the difficulties in incorporating human knowledge into knowledge-based systems that machine learning has come so much to the fore!
What we have been discussing is supervised learning: machine learning where the data includes the label that is to be learned. This is where learning from domain experts is relevant as they have the expertise to assign labels to case data in their domain, e.g. in medical diagnosis. There are also other forms of machine learning, with the furthest from supervised learning being unsupervised learning where the data does not have any labels. Obviously, you can’t learn labels if the data doesn’t have labels, but you can learn how to cluster or simplify data in some way, which can be very useful. If we assume there are patterns in the data rather than it being completely random, an autoencoder can be used to learn a reduced encoding of the data according to some appropriate criteria. For example, back-propagation deep learning can learn a mapping between 2D images and 3D human poses (Hong et al. 2015). What is being learned with these types of learners are overall patterns in the data rather than a label or classification for a particular case or data instance.
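As a toy illustration of the autoencoder idea (a sketch under our own assumptions, not the method of Hong et al.), a small network can be trained to reproduce its input through a narrow hidden layer, so that the hidden layer ends up holding a reduced encoding of the data:

```python
# Toy autoencoder sketch: learn to reproduce 8-dimensional input through a
# 2-unit hidden layer, so the hidden layer holds a reduced encoding.
# The data here is synthetic and the network is deliberately tiny.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))            # data really lies on 2 dimensions
mixing = rng.normal(size=(2, 8))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 8))

autoencoder = MLPRegressor(hidden_layer_sizes=(2,), activation="identity",
                           max_iter=5000, random_state=0)
autoencoder.fit(X, X)                         # target equals input: reconstruct
reconstruction = autoencoder.predict(X)
print("mean reconstruction error:", np.mean((X - reconstruction) ** 2))
```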
But even if we can provide all the high-quality data needed for supervised machine learning, there remains one last issue that is of increasing importance for both supervised and unsupervised learning. The power of deep learning comes from the ability of the neural network layers to discover implicit features. But these “features” are not part of human language, so how can a deep learner explain and justify its decisions? One obvious area of concern is policing. If you are using deep learning to identify possible criminal intent, do you end up just targeting the marginalised? If you use deep learning for credit scoring, do you again end up automatically giving a lower score to the more marginalised? These are critical problems, and since 2018 the European Union General Data Protection Regulation has required that AI or other systems should be able to explain their decisions, and that these decisions can be challenged. Perhaps purely algorithmic approaches will be able to produce the required explanations, but it seems more likely that some combination with human knowledge will be required.
1.3 KNOWLEDGE ACQUISITION
Machine learning has become so prominent largely because of the difficulty of incorporating human knowledge into a knowledge base. The phrase “the knowledge engineering bottleneck” has been used since the 1980s and was probably introduced by Ed Feigenbaum, as it is also referred to as “the Feigenbaum bottleneck”. It was assumed that since rules were modular and independent of the inference engine, acquiring knowledge should be a simple matter of domain experts providing rules. This has never been the case; Bobrow et al. (Bobrow, Mittal, and Stefik 1986), in their survey of three well-known early systems, R1, Pride and the Dipmeter Advisor, concluded:
Expert Knowledge Has to Be Acquired Incrementally and Tested. Expert knowledge is not acquired all at once: The process of building an expert system spans several months and sometimes several years. In the course of this development, it is typical to expand and reformulate the knowledge base many times. In the beginning, this is because choosing the terminology and ways of factoring the knowledge base is subject to so much experimentation. In the middle phases, cases at the limits of the systems capabilities often expose the need to reconsider basic categories and organization. Approaches viable for a small knowledge base
and simple test cases may prove impractical as larger problems are attempted. … Toy programs for a small demonstration can be built quickly – often in just a few months using current technology. However, for large-scale systems with knowledge spanning a wide domain, the time needed to develop a system that can be put in the field can be measured in years, not months, and in tens of worker-years, not worker-months.
Matthew Fox in “AI and expert system myths, legends and facts” (Fox 1990) identified the difficulty as follows:
LEGEND: AI systems are easy to maintain. Using rules as a programming language provides programmers with a high degree of program decomposability; that is, rules are separate knowledge chunks that uniquely define the context of their applicability. To the extent that we use them in this manner, we can add or remove rules independently of other rules in the system, thereby simplifying maintenance. Building rule-based systems differs from this ideal. Various problem-solving methods (including iteration) require that rules implementing these methods have knowledge of other rules, which breaks the independence assumption and makes the rule base harder to maintain. The much-heralded XCON system has reached its maintainability limit (about 10,000 rules). The complexity of rule interactions at this level exceeds maintainer abilities.
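A toy sketch of the kind of interaction Fox describes (invented rules, not taken from XCON): two rules that look independent stop behaving as expected once a third, apparently unrelated rule is added, because all the rules implicitly compete over the same working memory:

```python
# Toy forward-chaining sketch with invented rules: rule_a and rule_b look
# independent, but adding rule_c changes the facts rule_b reacts to, so the
# knowledge base now produces contradictory conclusions even though rule_b
# itself was never edited.

def run(rules, facts):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            if condition(facts) and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rule_a = (lambda f: "order_large" in f, "needs_extra_memory")
rule_b = (lambda f: "needs_extra_memory" not in f, "standard_config")

print(run([rule_a, rule_b], {"order_small"}))
# concludes 'standard_config', as intended

rule_c = (lambda f: "order_small" in f, "needs_extra_memory")  # added later

print(run([rule_a, rule_b, rule_c], {"order_small"}))
# now contains both 'standard_config' and 'needs_extra_memory', which contradict,
# because rule_b fired before rule_c asserted the fact rule_b depends on
```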
Zacharias’ survey of modern rule-system developers (Zacharias 2008) came to similar conclusions 18 years later. 64 of the 76 respondents answered most questions with reference to the largest knowledge base they had worked on in the last five years. The respondents had over 6.6 years’ experience developing knowledge-based systems and used a range of rule technologies. 60% of the respondents indicated that their knowledge bases frequently failed to give the correct result and 34% indicated that incorrect results were sometimes given. The biggest need was identified as debugging tools to correct such errors – confirming the observation of Bobrow et al. 22 years, and Fox 18 years, earlier – that it is tedious and messy to build a sophisticated knowledge base and one needs to painstakingly test and fix rule interactions. Elsewhere Zacharias wrote:
The One Rule Fallacy: Because one rule is relatively simple and because the interaction between rules is handled automatically by the inference engine, it is often assumed that a rule base as a whole is automatically simple to create. … However, it is an illusion to assume that rules created in isolation will work together automatically in all situations. Rules have to be tested in their interaction with other rules on as many diverse problems as possible to have a chance for them to work in novel situations.
(Zacharias 2009)
In the 1990s the dominant approach to improving knowledge engineering was to move away from the notion of obtaining knowledge and rather consider the domain expert and knowledge engineer as collaborating in building a problem-solving model for the domain. The best-known example of this approach is probably CommonKADS (Schreiber et al. 1994, Schreiber et al. 1999, Speel et al. 2001), essentially a comprehensive software engineering framework for building knowledge-based systems. Despite the obvious value in such systematic approaches, the same problem of debugging and maintenance remains. As the CommonKADS authors themselves note:
Although methodologies such as CommonKADS support the knowledge acquisition process in a number of ways (e.g. by providing modelling constructs and template models) experience shows that conceptual modelling remains a difficult and time-consuming activity.
(Speel et al. 2001)
Despite these clear statements that knowledge-base debugging and maintenance is and has always been a major problem, there seem to be few case studies documenting maintenance problems. Wagner, in a longitudinal survey of 311 published expert system case studies, did not find anything on maintenance (Wagner 2017). This failure to report on maintenance problems is perhaps because researchers tend to write up successful expert system developments early on. The only two long-term maintenance reports we are aware of are the reports on XCON, one of the landmark developments in knowledge-based systems, and the much smaller
GARVAN-ES1, which nevertheless was reviewed as one of the first four medical expert systems to go into routine clinical use (Buchanan 1986).
XCON was used to configure DEC VAX computers against customer requirements and was the outstanding expert system in industry use in the early years of expert systems. The initial system was developed at Carnegie-Mellon University (CMU), but deploying XCON at DEC involved over a year of training for DEC engineers in how to maintain XCON, then with about 1,000 rules. These maintenance demands meant engineers were unable to also maintain other expert systems introduced from CMU to DEC (Polit 1984). XCON eventually had 6,500 rules, with 50% changed every year as new products and versions were introduced – a major maintenance challenge (Soloway, Bachant, and Jensen 1987). Apparently 40 programmer/knowledge engineers were required to maintain XCON (Sviokla 1990) and, as noted by Fox, the limit of maintainability for XCON was probably about 10,000 rules (Fox 1990). XCON was built using essentially the same technology as modern expert systems are based on, the OPS RETE architecture (Forgy and McDermott 1977).
GARVAN-ES1 was a small medical expert system providing interpretative comments for thyroid laboratory reports (Horn et al. 1985). The purpose of a comment appended to a report of thyroid test results was to advise the referring GP on the clinical interpretation of the results. As of 1989, GARVAN-ES1 had 276 rules, but it also allowed disjunctions and there were 262 disjunctions, suggesting about 500–600 rules if disjunctions were disallowed, and this approximates later experimental rebuilds. GARVAN-ES1 was put into clinical use after a standard evaluation on unseen clinical data showing experts agreed with its conclusions 96% of the time. However, since the performance of expert systems in actual clinical practice was such an unknown (GARVAN-ES1 being one of the first four medical expert systems in clinical use (Buchanan 1986)), all reports produced by the system were checked by one of the endocrine registrars and the rules updated whenever a registrar was not happy1 with the comment made by GARVAN-ES1. The rules were constantly edited and updated to provide the required comments, except where the change requested seemed to the knowledge engineer to be too minor and this was confirmed with the Garvan Institute’s senior endocrinologist.
The GARVAN-ES1 maintenance experience is illustrated in Figure 1.1, which shows the changing size of the knowledge base over four years of maintenance. The knowledge base size is shown in kilobytes rather than numbers of rules, as rules contained disjunctions. Over four years the knowledge base doubled in size while the accuracy went from 96% to 99.7%. Perhaps a lot of these changes were not strictly necessary, and the registrars involved wanted a level of clinical detail in the comments that was not really needed; nevertheless, whether the changes were trivial or important, they were made to enable the system to do what the users expected it to do.
FIGURE 1.1 The increasing size of the GARVAN-ES1 knowledge base (redrawn from Compton and Jansen 1990).
Another view of these changes is shown in Figure 1.2. Every time a rule was changed the case that prompted the change was stored as a “cornerstone” case – that is, a case for which rules have been added or changed.
FIGURE 1.2 The number of cornerstone cases for each conclusion (redrawn from Compton et al. 1988).
Figure 1.2 shows the number of cornerstone cases that had been seen for the 60 different interpretations given (59 comments plus no comment as the default for normal results). The number of cases for each interpretation is shown only for the 15 interpretations with the greatest number of cases, and the average number of cornerstone cases is shown for interpretations 16–60. The interpretation “toxic” is classic primary hyperthyroidism, while “hypothyroid” is classic primary hypothyroidism. The key diagnostic features of primary hyperthyroidism are trivial and well known: elevated thyroid hormones and suppressed thyroid stimulating hormone; conversely, the key diagnostic features of primary hypothyroidism are elevated thyroid stimulating hormone and low thyroid hormones. Despite the apparent clarity of the reasons for these diagnoses, Figure 1.2 shows that during the evolution of the GARVAN-ES1 knowledge base, errors were made on 21 toxic (primary hyperthyroidism) cases and 17 primary hypothyroidism cases, which required rules to be corrected. As well, 32 normal cases were misdiagnosed as the knowledge base evolved. Superficially, these are very surprising results and provide a classic example of the so-called knowledge engineering bottleneck – experts don’t readily tell the knowledge engineer everything that the knowledge base needs to know. This isn’t a lack of expertise on the expert’s part; the expert is perfectly capable of correctly interpreting any case they are presented with, but this is quite different from providing rules which will cover other unseen cases.
We believe that the problems in knowledge engineering, ultimately, are not related to any failure of experts to report on their mental processes or the knowledge they use, but are because our expectations and assumptions about knowledge are mistaken – we have the wrong philosophical assumptions as will be discussed in Chapter 2.
In the discussion above we have assumed that the domain expert can articulate their knowledge or explain their decision about a case in terms that can be easily communicated to a computer. For example, the expert might refer to: age > 75, temperature is increasing, email subject heading contains the word ‘crisis’ etc. All of these can be fairly readily coded for a computer. The knowledge acquisition problem we have discussed is the difficulty in getting an expert to provide sufficient and appropriate knowledge of this type to develop a truly expert system.
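For example (a sketch with invented field names and case data), each of those conditions maps directly onto a simple executable test over a case record:

```python
# The kinds of conditions mentioned above, written as simple executable tests
# over a case record. Field names and the example case are invented.

def age_over_75(case):
    return case["age"] > 75

def temperature_increasing(case):
    temps = case["temperature_readings"]          # ordered oldest to newest
    return all(a < b for a, b in zip(temps, temps[1:]))

def subject_mentions_crisis(case):
    return "crisis" in case["email_subject"].lower()

case = {"age": 81,
        "temperature_readings": [37.1, 37.6, 38.2],
        "email_subject": "Budget crisis meeting"}

print(age_over_75(case), temperature_increasing(case), subject_mentions_crisis(case))
# prints: True True True
```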
There is a quite different problem: in some domains experts simply do not provide their knowledge in a way that can be coded. For example, a radiologist may look at a lung X-ray and justify their decision by referring