Treatment of Uncertainties in Reliability Assessment of Safety Instrumented Systems


Master Thesis

Treatment of Uncertainties in Reliability Assessment of Safety Instrumented Systems Astrid Folkvord Janbu NTNU

2009



PREFACE

This master thesis was written at the Department of Production and Quality Engineering at NTNU during the spring of 2009. The thesis is part of the 5th year of the master program in Reliability, Availability, Maintainability and Safety (RAMS). The project treats the topic “Treatment of Uncertainties in Reliability Assessment of Safety Instrumented Systems” and was written in co-operation with Det Norske Veritas (DNV). The thesis is based on a literature study described in the project thesis and a case study of a Safety Instrumented System (SIS).

It is assumed that the reader of this thesis has taken an introductory course in system reliability theory or has similar knowledge.

I want to thank my supervisor at NTNU, Prof. Marvin Rausand, for teaching me how to think thoroughly through different problematic topics. I would also like to thank my other supervisor at NTNU, Ph.D. Mary Ann Lundteigen, for good advice during this semester. At DNV, I would like to thank senior consultant Marius Lande for his patience and help throughout the case study.

Trondheim, 5th June 2009
Astrid Folkvord Janbu


ABSTRACT

Safety instrumented systems (SIS) are employed to control and mitigate the risk to personnel, environment and assets in many industries and in everyday life. Because of the main purpose of a SIS and its degree of independence of human actions, reliability is of high importance. Reliability assessments of SIS provide an important basis for decision making and are performed as part of compliance studies in order to document whether or not a SIS meets stated safety requirements. Unfortunately, several aspects of a reliability assessment cause uncertainty in the results. Uncertainty in reliability assessments reduces the confidence in the results, increases the risk of making wrong decisions and should therefore be communicated to the decision maker.

The main objective of this master thesis was to study the reliability assessment procedures that are used when developing compliance reports and to examine how a representation of the uncertainties in the results may be implemented in compliance studies as a decision support. A literature study of uncertainty assessments was carried out in order to identify the main sources of uncertainty in reliability assessments and techniques for quantifying their effects. It was also investigated how to present the results from uncertainty assessments in compliance reports. Further, reliability and uncertainty assessments of a case study were performed with fault tree analysis (FTA) and simulation in order to analyse the uncertainty associated with the results. The findings were discussed.

The literature study showed that there are three main sources of uncertainty in reliability assessments: completeness uncertainty, model uncertainty and data uncertainty. The broadly accepted standard for design and operation of SIS, IEC 61508, does not explicitly treat the subject of uncertainty. Nevertheless, it indicates doubts about the validity of the results of a reliability assessment through its architectural constraints and its suggested 70 % upper confidence limit as a conservative approximation of failure rates. Uncertainty assessment techniques may be used to quantify and evaluate the effects of uncertainty in reliability assessments. It was found that both sensitivity analysis and importance measures are readily applicable to the main sources of uncertainty, while uncertainty propagation is limited to the treatment of data uncertainty.

Compliance studies are usually carried out during the design phase. Detailed and relevant information may not be in place during early design, which causes completeness uncertainty. The use of generic data causes data uncertainty due to inhomogeneity in the data samples, lack of relevance and the modelling of failure data. The system designer may also not be familiar with the future characteristics of the system to be developed. Hence, model uncertainty arises when less suited models are used. The level of uncertainty is higher during the early phases of a lifecycle and for new technology, due to the lack of experience and knowledge about the system. Because of these sources of uncertainty, the predicted reliability described in the compliance report may therefore be quite different from the field reliability.

The reliability assessments of the case study gave some interesting findings. The simulation was expected to give the most realistic results due to the random sampling from assumed lifetime distributions. It was therefore interesting to see that the unavailability predicted by simulation was significantly closer to the result from the conservative FTA approach developed by Lundteigen and Rausand than to the result from the cut set approximation used by CARA FaultTree. In light of the experiences from the reliability assessments, it was concluded that FTA is a sufficiently good model for reliability assessments of SIS and that the conservative approximation is recommended, since it compensates for some of the uncertainty involved. Simulation was found to be an unnecessarily advanced and less suited modelling tool for PFD calculations. The low failure rates of a low demand SIS require a high number of runs in order to stabilize the results, which makes the simulation a time-consuming process.

The case study showed that sensitivity analysis and importance measures are often sufficient for uncertainty assessment in a compliance study, since they identify which sources of uncertainty are critical to the results. These techniques do not quantify the level of uncertainty associated with the results, only the effects of changes in the input. The information provided may nevertheless be used as input for a qualitative uncertainty evaluation, and is also valuable for reliability improvement during later phases of the lifecycle and for identification of efficient maintenance strategies. When investigating the level of data uncertainty in the results, uncertainty propagation should be applied. Its results are not as intuitively understood as those from sensitivity analyses and importance measures, and hence require competence in interpreting the statistical quantities from the analysis.

An important experience regarding uncertainty propagation arose from the case study. Uncertainty propagation should always be performed by using deterministic models or analytical methods. When performing uncertainty propagation for a simulation model, the number of runs increases by a factor equal to the number of uncertainty simulation runs, which makes the analysis very time consuming.

The high level protection system did not comply with the SIL 3 requirement for the combined loop, or with the SIL 2 requirement for the single loops. Uncertainty assessments were also performed to see if acceptance could still be recommended, but none of the results from these assessments indicated that the high level system should be SIL 3 verified.

The results from a compliance study should not be seen as a definite property of the system, and this should be communicated to the decision maker. A qualitative uncertainty evaluation of the results should therefore be included as a part of the compliance report. The need to feel safe should be nuanced by the fact that SIL compliance gives no guarantee about the future failure behaviour of the system. Instead of interpreting uncertainty as a necessary evil, one may take advantage of it by presenting a more realistic result, and in this way raise awareness of the risks involved in the decision process. By being informed about the uncertainties, one may also reduce them more easily.


TABLE OF CONTENTS

Preface
Abstract
Table of Contents
Table of Figures
List of Tables
1 Introduction
  1.1 Objective
  1.2 Limitation and scope
  1.3 Structure
2 Reliability of Safety Instrumented Systems
  2.1 General
  2.2 IEC 61508
    2.2.1 Safety integrity
3 Uncertainty in Reliability Assessments of Safety Instrumented Systems
  3.1 Contributions to uncertainty
    3.1.1 Model uncertainty
    3.1.2 Data uncertainty
    3.1.3 Completeness uncertainty
    3.1.4 Remarks
    3.1.5 Terminology
  3.2 Uncertainty representation
4 Different Perspectives on Uncertainty
  4.1 Categorization of uncertainties
  4.2 Interpretations of uncertainty
    4.2.1 Realist interpretation
    4.2.2 Subjective interpretation
  4.3 Summary
5 Approaches for Uncertainty Assessments
  5.1 General
  5.2 Sensitivity analysis
  5.3 Importance measures
    5.3.1 Birnbaum’s measure
    5.3.2 Improvement potential measure
    5.3.3 The criticality importance measure
  5.4 Uncertainty propagation
    5.4.1 Monte Carlo sampling
    5.4.2 Latin Hypercube sampling (LHS)
  5.5 Sensitivity vs. uncertainty
6 Uncertainty Representation in Compliance Studies
  6.1 Safety lifecycle
    6.1.1 Analysis
    6.1.2 Realization
    6.1.3 Operation
    6.1.4 Documentation
  6.2 Uncertainty in a lifecycle perspective
    6.2.1 Completeness uncertainty
    6.2.2 Model uncertainty
    6.2.3 Data uncertainty
    6.2.4 New Technology
  6.3 Integrated decision making
7 High Level Protection System
  7.1 System description
    7.1.1 Flare KO drum
    7.1.2 1st and 2nd stage separator
    7.1.3 Degasser
  7.2 Base case
8 Reliability Assessment of High Level Protection System
  8.1 Scope and scenario
    8.1.1 Limitations and assumptions
  8.2 Data collection
  8.3 Model selection
9 Fault Tree Analysis
  9.1 CARA FaultTree
  9.2 PFD calculations
  9.3 Conservative calculation of fault tree
  9.4 Sensitivity analysis and importance measures
  9.5 Uncertainty propagation
10 Simulation
  10.1 ExtendSim software
  10.2 PFD calculations
  10.3 Sensitivity analysis and importance measures
  10.4 Uncertainty propagation
11 Discussion
  11.1 Comparison of results
    11.1.1 Problems with the uncertainty propagation in Extend
  11.2 Completeness uncertainty
  11.3 Model uncertainty
  11.4 Data uncertainty
  11.5 Experiences from the case study
12 Conclusions
  12.1 Further work
13 Bibliography
Appendix A Data dossier
Appendix B Conservative FTA
Appendix C Preliminary study
  Project Objectives
  Work approach
  Activity plan
  Work Breakdown Structure

TABLE OF FIGURES

Figure 1 Simplified model of a safety instrumented system (SIS)
Figure 2 SIL requirements, adapted from compliance report (2008)
Figure 3 Contributing factors to uncertainty in reliability assessment of safety instrumented systems
Figure 4 Framework for uncertainty assessments, adapted from de Rocquigny, Devictor and Tarantola (2008)
Figure 5 Birnbaum’s measure (Rausand and Høyland 2004)
Figure 6 Improvement potential
Figure 7 Uncertainty propagation (NASA 2002)
Figure 8 Safety lifecycle (IEC 61508 1997)
Figure 9 Relationship between FSA, validation and verification (OLF 070 2004)
Figure 10 Documentation hierarchy
Figure 11 Lifecycle from a producer perspective (Murthy, Østerås and Rausand 2007)
Figure 12 Overview of data related to PFD, adapted from (Lundteigen 2008)
Figure 13 Integrated decision making for hardware safety integrity
Figure 14 PSD and ESD High level protection system for topside plant (Compliance report 2008)
Figure 15 Base case of high level protection system for reliability assessments (Compliance report 2008)
Figure 16 Fault tree of combined loop for typical vessel
Figure 17 Fault tree of combined loop for typical vessel, implicit CCF modelling
Figure 18 Uncertainty propagation through CARA FaultTree
Figure 19 Extend blocks
Figure 20 High level protection system modelled in Extend
Figure 21 Extend model for uncertainty propagation
Figure 22 Uncertainty propagation plot for Extend simulation

LIST OF TABLES

Table 1 Failure classification for Safety Instrumented Systems (SIS) (IEC 61508 1997)
Table 2 Safety Integrity Levels (IEC 61508 1997)
Table 3 Notions of uncertainty and associated representations, adapted from Flage, Aven and Zio (2009)
Table 4 Summary of uncertainty assessment methods
Table 5 Classification of technology (DNV 2001)
Table 6 Printout from OREDA (OREDA 2009)
Table 7 Comparison of selected models (Janbu 2008)
Table 8 Importance measures for combined loop typical, FTA
Table 9 OREDA taxonomy (OREDA 2002)
Table 10 Extend results for combined loop
Table 11 Importance measures for combined loop typical, simulation
Table 12 Parameters for gamma distribution
Table 13 Comparison of results
Table 14 Architectural Constraints on Type A and B systems (IEC 61508 1997)


1 INTRODUCTION

Safety Instrumented Systems (SIS) are crucial for controlling and mitigating risk in many industries and in everyday life. Their independence of human actions makes reliability highly important. Reliability assessments of SIS are performed to evaluate the reliability and to provide a basis for decision making regarding design, safety, economic stakes and legal requirements.

Unfortunately, several aspects of a reliability assessment process cause uncertainty in the final results. The main contributions to uncertainty are due to modelling, data and incompleteness in assessments. Underlying factors such as time pressure, competence and system complexity also affect the level of uncertainty. Uncertainty in reliability assessments reduces the validity of the results and thus increases the risk of making wrong decisions. Effort should therefore be made to minimize the uncertainty (Janbu 2008).

Uncertainty assessment techniques may be used for estimation and evaluation of uncertainty in probabilistic assessments. The assessments provide valuable information for further uncertainty reduction and for reliability engineering through design and development. In practice, the use of such techniques varies considerably between industries (Rausand and Øien 2004).

Compliance reports are issued to document whether or not the results from reliability assessments of SIS meet the stated requirements. However, a reliability assessment does not provide any truth about the future reliability, only a prediction. The communication of this fact to the decision maker should be improved. Awareness of uncertainties sharpens the focus on both risk and uncertainty reduction. A new framework for reliability assessments should be established to improve the basis for making decisions in a more realistic manner, where uncertainties are reflected and integrated into compliance reports as a decision support.

1.1 Objective

The project thesis (Janbu 2008) identified the main contributions to uncertainty in reliability assessments and how these are related to the result of the assessment. It was further found that uncertainty evaluation in reliability assessments of safety instrumented systems is of high importance in order to ensure that decision making is based on the right foundation.

The main objective of this thesis is to study the reliability assessment procedures that are used when developing compliance reports and, based on the findings from Janbu (2008), examine how a representation of the uncertainties in the results may be implemented in compliance studies as a decision support. The main objective may be divided into the following sub-objectives:

1. Become familiar with a specified SIS (case study) and outline when and how the compliance report should be developed.
2. Identify issues (sources of uncertainty) in the development of the compliance reports that may influence the uncertainty of the results.
3. Discuss various approaches for uncertainty assessments and how the results of such analyses may be implemented in compliance studies.
4. Apply different models for reliability assessments to a specific safety instrumented system. Compare the results and discuss the differences and the uncertainty associated with the results.

1.2 Limitation and scope

This master thesis is written in co-operation with DNV Energy. The information for the case study is provided by DNV Energy and is thus strictly confidential. The information presented in this thesis has been made anonymous so that the report can be read by anyone.

The master thesis is restricted to low demand SIS. Such systems are barriers against initiating events with low frequencies (at most once a year and at most twice the proof-test frequency) whose consequences are generally associated with high energy. Reliability assessments of low demand systems often contain more statistical uncertainty in the estimation of parameters, because the events are rare. The results reflect this uncertainty. Assessments of these systems are therefore more challenging with regard to uncertainty management.

The project is based on a relative-frequency interpretation of probability. This stands in particular contrast to the predictive epistemic approach, in which only events and observable quantities are assigned probabilities, so that discussing uncertainty in probabilistic results becomes contradictory. Hence, several of the topics in this thesis will be in conflict with the predictive epistemic approach to probability.

The treatment of uncertainty is also limited to quantitative treatment in a probabilistic framework. This means that the uncertainty is expressed in terms of probability distributions.

It is assumed that the reader is familiar with reliability analysis and safety instrumented systems. Basic concepts and material from the project thesis and from introductory courses in system reliability theory are therefore not explained or reproduced here. A glossary for this thesis may be found in the doctoral thesis “Safety instrumented systems in the oil and gas industry” by Mary Ann Lundteigen (2009).
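
As an illustration of what expressing uncertainty in terms of probability distributions can mean in practice, the following is a minimal sketch and not the model used in the case study. It assumes a single component whose dangerous undetected failure rate is described by a gamma distribution, and it uses the common textbook approximation PFD ≈ λ·τ/2 for a proof-test interval τ. The function name `propagate_pfd` and all numerical values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def propagate_pfd(shape, scale, tau_hours, n_samples=10_000, seed=1):
    """Propagate a gamma-distributed failure rate through PFD ~= lambda * tau / 2.

    shape, scale : hypothetical gamma parameters for the dangerous
                   undetected failure rate (per hour)
    tau_hours    : proof-test interval in hours
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Sample a failure rate, map it to a PFD value, and sort for percentiles.
    pfd = sorted(
        rng.gammavariate(shape, scale) * tau_hours / 2.0
        for _ in range(n_samples)
    )
    return {
        "mean": statistics.fmean(pfd),
        "p05": pfd[int(0.05 * n_samples)],  # empirical 5th percentile
        "p95": pfd[int(0.95 * n_samples)],  # empirical 95th percentile
    }


if __name__ == "__main__":
    # Illustrative numbers only: mean rate 2e-6 per hour, yearly proof test.
    result = propagate_pfd(shape=2.0, scale=1.0e-6, tau_hours=8760)
    for name, value in result.items():
        print(f"{name}: {value:.2e}")
```

The spread between the 5th and 95th percentiles, rather than the mean alone, is the kind of quantity a probabilistic treatment of data uncertainty would report alongside a point estimate.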

1.3 Structure

This report answers the sub-objectives given in section 1.1.

Chapter 2 presents basic concepts and aspects related to the reliability of SIS and IEC 61508.

Chapter 3 gives an introduction to how uncertainties are involved in reliability assessments. The chapter presents the main findings from the project thesis “Uncertainty in Reliability Assessment of Safety Instrumented Systems” (Janbu 2008).

Uncertainty is described more thoroughly in Chapter 4, where theoretical discussions regarding different interpretations are presented.

Uncertainty assessment techniques are then treated in Chapter 5. Sensitivity analysis, importance measures and uncertainty propagation are presented, described and linked to the treatment of the main contributions to uncertainty in reliability assessments.

Chapter 6 discusses when and how compliance reports are developed and relates uncertainties to this process. The chapter also discusses how to implement the results from an uncertainty assessment in the compliance reports.

Chapter 7 describes the case study used for this thesis, which is a high level protection system for a topside process plant.

Chapter 8 discusses the scope, model and data used for the reliability assessment of the case study. The chapter also discusses the possible uncertainty related to these elements.

Chapter 9 presents the fault tree analysis of the case study, where CARA FaultTree is used to perform the calculations. Unavailability results for the base case are presented, together with a conservative FTA approximation. Results from sensitivity analyses, importance measures and uncertainty propagation are also presented.

Chapter 10 presents the simulation analysis of the case study system, where Extend is used as the software tool for the simulations. The unavailability of the base case is calculated. Results from sensitivity analyses, importance measures and uncertainty propagation are also presented.

Chapter 11 discusses the results from Chapters 9 and 10 and the uncertainty related to the results. The conclusions from this master thesis are then presented in Chapter 12.


2 RELIABILITY OF SAFETY INSTRUMENTED SYSTEMS

Safety Instrumented Systems (SIS) provide an independent protection layer whose main objective is to mitigate the risk to personnel, environment and assets. Given this purpose, the reliability and safety of a SIS are of high importance and may be verified through compliance reports. This chapter presents important concepts and aspects related to SIS and their reliability.

2.1 General

A SIS is used to mitigate risks associated with the operation of a specified hazardous system, by detecting the onset of a hazardous event or reducing its consequences. The specified hazardous system is referred to as the equipment under control. It should be considered the source of the hazard and can vary from a single component to an entire plant. The equipment under control is protected by safety instrumented functions (SIF) in a SIS, or by other suitable protection measures that control the hazard. What distinguishes a SIS from other safety systems is its ability to evaluate signals by means of instrumentation and thereby decide to carry out a barrier function upon a demand, independent of human actions. Figure 1 shows a simplified model of a SIS.

Figure 1 Simplified model of a safety instrumented system (SIS)

A SIS is composed of three main elements: input elements for detection, logic solvers for evaluation and decision, and final elements for action if needed. Input elements may be gas or fire detectors, a logic solver may be a computer and a final element may be a safety valve. The SIS itself may consist of several SIF.

The most important reliability measure for a SIF is the Probability of Failure on Demand (PFD). This measure quantifies the safety unavailability due to random hardware failures and denotes the probability that a SIF will fail to respond adequately upon a demand, a so-called dangerous failure.

A SIS failure may be classified with regard to three aspects: cause, effect and detectability. Table 1 shows the IEC 61508 definitions of these failure classification criteria.

Table 1 Failure classification for Safety Instrumented Systems (SIS) (IEC 61508 1997)

Causes
  Random hardware failures: Failure, occurring at a random time, which results from one or more of the possible degradation mechanisms in the hardware.
  Systematic failures: Failure, related in a deterministic way to a certain cause, which can only be eliminated by a modification of the design or of the manufacturing process, operational procedures, documentation or other relevant factors.

Effects
  Dangerous failure: Failure which has the potential to put the safety-related system in a hazardous or fail-to-function state.
  Safe failure: Failure which does not have the potential to put the safety-related system in a hazardous or fail-to-function state.

Detectability
  Detected failure: In relation with hardware, detected by the diagnostic tests, proof tests, operator intervention or through normal operation.
  Undetected failure: In relation with hardware, undetected by the diagnostic tests, proof tests, operator intervention or through normal operation.

2.2 IEC 61508

IEC 61508 "Functional safety of electrical/electronic/programmable electronic (E/E/PE) safety-related systems" is a broadly accepted standard for design and operation of SIS. In the standard, a SIS is referred to as an E/E/PE safety-related system. The standard is generic, which means that it is applicable to all types of industries. The oil and gas industry often uses the standard IEC 61511 “Functional safety – Safety instrumented systems for the process industry” instead. This is an application-specific standard for the process industry with more specific requirements.

IEC 61508 requires both a quantitative and a qualitative safety and reliability assessment in order to demonstrate compliance with the requirements given by the standard. There are two types of safety requirements (Lundteigen 2008):

• Functional safety requirements describe what the safety function shall perform
• Safety integrity requirements describe how well the safety function shall perform

2.2.1 Safety integrity

Safety integrity is an important concept in IEC 61508 and is defined as the “probability of a safety-related system satisfactorily performing the required functions under all stated conditions within a stated period of time” (IEC 61508 1997). Safety integrity can thus be interpreted as reliability. Safety integrity may be divided into four requirement levels called safety integrity levels (SIL). A SIL is defined as “discrete level (one out of a possible four) for specifying the safety integrity requirements of the safety functions to be allocated to the E/E/PE safety related systems...” (IEC 61508 1997). In order to document compliance with the standard, a reliability assessment of the SIS must document that the calculated PFD satisfies the quantitative hardware requirement, as shown in Table 2.

IEC 61508 has additional requirements for hardware verification besides the PFD requirements. These are called the architectural constraints and are semi-quantitative requirements expressed in terms of the safe failure fraction (SFF), system type (A or B) and hardware fault tolerance (HWFT). The SFF is the fraction of failures defined to be “safe” among all failures. A “safe” failure is a failure that does not cause loss of the safety function, or a failure that is immediately detected and corrected. Type A systems are systems with low complexity, such that the failure modes and the behaviour under fault conditions can be determined. Type B systems are complex systems, typically programmable units. For these systems, a complete overview of failure modes and effects is not possible to achieve. The HWFT is the number of failures that can be tolerated before loss of the safety function. A 1oo3 voting logic, for example, needs only one of the three components to function in order to work as a safety barrier. Hence, the system tolerates two failures before loss of the safety function, and the HWFT equals 2.
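The two architectural quantities described above can be sketched in a few lines. This is a minimal illustration, not a calculation taken from the thesis: the function names and the failure rates in the usage lines are hypothetical, and the SFF is computed by counting dangerous detected failures together with safe failures, which matches the description of a “safe” failure given above.

```python
def hardware_fault_tolerance(k: int, n: int) -> int:
    """HWFT of a k-out-of-n (koon) voted group: failures tolerated before loss of function."""
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    return n - k


def safe_failure_fraction(lambda_s: float, lambda_dd: float, lambda_du: float) -> float:
    """SFF = (safe + dangerous detected) / (all failures), expressed with failure rates."""
    total = lambda_s + lambda_dd + lambda_du
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return (lambda_s + lambda_dd) / total


# Hypothetical example values (failures per hour), chosen only for illustration.
print(hardware_fault_tolerance(1, 3))           # 1oo3 voting
print(safe_failure_fraction(2e-6, 1e-6, 1e-6))  # safe, dangerous detected, dangerous undetected
```

For the 1oo3 example in the text, the first call returns 2, in agreement with the HWFT stated above.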


Table 2 Safety Integrity Levels (IEC 61508 1997)

Safety Integrity Level (SIL)    Low Demand Mode of Operation (average probability of failure to perform its design function on demand)
4                               ≥ 10^-5 to < 10^-4
3                               ≥ 10^-4 to < 10^-3
2                               ≥ 10^-3 to < 10^-2
1                               ≥ 10^-2 to < 10^-1
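The bands in Table 2 translate directly into a lookup. The sketch below is only an illustration of how a calculated average PFD maps to a SIL band; the function name is hypothetical, and returning None for values outside the tabulated range is an assumption made here, since the table itself does not say how such values should be handled.

```python
from typing import Optional

# (SIL, lower bound inclusive, upper bound exclusive), as listed in Table 2.
SIL_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]


def sil_from_pfd(pfd_avg: float) -> Optional[int]:
    """Return the SIL band for a low demand PFD, or None if outside Table 2."""
    for sil, lower, upper in SIL_BANDS:
        if lower <= pfd_avg < upper:
            return sil
    return None  # assumption: values outside the table are left unclassified


print(sil_from_pfd(5e-3))
```

Note that this covers only the quantitative hardware requirement; the architectural constraints and the software and systematic requirements must be checked separately.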

In addition to the requirements for hardware, one must also show compliance with the software and systematic safety requirements. The software requirement is a qualitative requirement which expresses the level of functional safety management and the quality assurance program required for software development, testing and integration. This involves techniques for control and avoidance of systematic failures in the software. Avoidance and control are also the main focus of the other qualitative requirements, called the systematic safety integrity requirements. Like the software requirements, these are expressed in terms of the adequacy of the management of functional safety and the required quality assurance program (Compliance report 2008). Figure 2 shows the SIL requirements from IEC 61508.

Figure 2 SIL requirements, adapted from compliance report (2008)


3 UNCERTAINTY IN RELIABILITY ASSESSMENTS OF SAFETY INSTRUMENTED SYSTEMS

A reliability assessment is an important basis for decision making. Uncertainty in reliability assessments reduces the validity of the results and hence increases the risk of making wrong decisions. One should therefore strive to minimize it. This chapter presents the main conclusions from the project thesis report “Uncertainty in Reliability Assessment of Safety Instrumented Systems” (Janbu 2008).

3.1 Contributions to uncertainty

Uncertainty can be defined as something “not definitely ascertainable or fixed” (Webster 1989). Uncertainty in reliability assessments thus reduces our confidence in the results. It is therefore important that decision makers are aware of how uncertainties enter the assessment process, so that the uncertainties can be taken into consideration as decision support.

Figure 3 Contributing factors to uncertainty in reliability assessment of safety instrumented systems

Uncertainty in reliability assessments arises from limitations in perfectly reflecting the real-life system and its environment. Three direct factors affect the results and hence the associated level of uncertainty: the model, the data and the completeness of the assessment (Drouin, et al. 2009). Figure 3 shows the underlying and direct factors in a reliability assessment that affect the numerical value of the PFD and thus the uncertainty.

3.1.1 Model uncertainty

Reliability assessments use both an architectural model and one or more reliability models to represent the system’s characteristics. The architectural models represent the logical/functional structure of the system and are purely deterministic. Reliability models are applied to the architectural structure, which gives the system model probabilistic properties.

A model can be described as an analyst’s attempt to represent a system (Parry 1996). The model used for the assessment is therefore strongly dependent on the properties of the system and on the analyst’s competence. The analyst has to manage the trade-off between the need to simplify and the need for accuracy. There are also other underlying factors behind the choice of model. Regulations, standards, guidelines and internal company policies may require or recommend specific types of models. Further, the model also depends on the life cycle phase in which the assessment is carried out, since the level of detail required in an assessment often increases with time. The level of detail or suitability of a model is also restricted by the time, approximation formulas and software solutions available.

The choice of model forces the analyst into a system structure that is only more or less in accordance with the real-life system. The model uncertainty depends on the validity of the model assumptions. Because the natural variability in the real-life system cannot be fully included, a model will at best be an approximation (NASA 2002). Model uncertainty will therefore always exist to some degree.

3.1.2 Data uncertainty

Data uncertainty, also called parameter uncertainty, is another main source of uncertainty in reliability assessments. Since SIS are highly reliable systems, they fail rarely and produce small amounts of failure data. This is especially the case for new technology at an early stage in the life cycle, where the system properties are to some degree unknown. Lack of data causes statistical uncertainty in the estimated parameters, which is reflected in the final results. Reliability models often use assumptions to overcome data shortcomings. Lack of data also places heavy reliance on the analyst’s judgements, which may significantly affect the results (NASA 2002).

Generic databases are established to provide data for reliability assessments, but they also introduce uncertainty due to lack of relevance. Differences in plant-specific conditions, such as operational environment, maintenance procedures and collection methods, together with rapidly changing technology, may result in data that are not relevant for the specific system under evaluation. Competence is thus needed to reduce data uncertainty by ensuring good collection methods and selection of relevant data for the assessment.

3.1.3 Completeness uncertainty

Another main source of uncertainty is incompleteness in the assessment. The missing elements are either known but not included in the assessment, or not known and hence not included (Drouin, et al. 2009). The known uncertainties may be due to omission of factors, like failure modes, that are assumed to be negligible for the assessment’s results or that fall outside the scope. The uncertainty related to an incomplete level of detail or scope may be reduced by including the missing factors or by using conservative estimates.

The unknown completeness uncertainties may stem from lack of agreement on how to address certain effects, like effects from organizational factors or failure mechanisms. Another aspect of the unknown completeness uncertainty is lack of knowledge, like the exclusion of unknown failure modes. Truly unknown uncertainties are hard to reduce because they are not visible to the analyst. These unknown uncertainties are also an argument for always using conservative estimates.


3.1.4 Remarks

The contributing factors to uncertainty in reliability assessments are closely linked and can be hard to separate. One example is uncertainty in reliability data due to poor statistical modelling: what should be counted as data uncertainty and what as model uncertainty in this case? Another example is lack of understanding of some system property: is the system too complex, or does the analyst lack competence? The separation is a challenge for both the underlying and the direct factors.

3.1.5 Terminology

An important aspect in this thesis is the treatment of the concepts model, method, tool and technique. The concepts are often used in the same context, so a clarification of what each of them comprises is necessary.

A model is a simplification of “the world” and can be either architectural or mathematical. An architectural model is a structural model which shows the logic/relations between the different model elements. A mathematical model is a model described in terms of mathematical operators, constants and variables. “Model” is easily confused with “method”, which is a systematic and logically arranged description of how to do something. Take a mathematical expression as an example. In one way it is a model, since it simplifies the world in terms of mathematics. On the other hand, it is a logical description of how to calculate something. In this thesis the expression “model” is preferred in such cases. The expression “method” is still used where it is suited, as for an algorithm. A technique is regarded as the same as a method. A tool is an aid for applying a model, method or technique.

In the literature, the concept reliability analysis usually covers the systematic approach for describing and/or calculating reliability, while reliability assessment covers the overall process of reliability analysis and reliability evaluation. Since this thesis treats reliability analysis in the context of compliance studies, it refers to reliability assessments rather than analyses. There may be some sentences where reliability analysis would strictly be the correct term, but assessment is preferred in order not to confuse the reader.

3.2 Uncertainty representation

Compliance with IEC 61508 has become a mark of quality assurance within many industries. The standard does not explicitly treat uncertainty in quantitative reliability assessments of SIS, but it still signals doubts about the validity of the results. The intention of the architectural constraints (AC) is to ensure that the SIF is activated if a failure occurs. The AC prevent SIS designers and system integrators from selecting a design based solely on the quantitative safety assessment and the PFD, by requiring a certain level of redundancy based on the relationship between the SIL, HWFT and SFF. The constraints may therefore be interpreted as a limitation on the hardware architecture and as mistrust in the PFD (Lundteigen 2009).

Further, IEC 61508 indicates mistrust in the quality of the data used for the assessment. In Part 2, Annex C, IEC 61508 requires a single-sided confidence interval of at least 70% to be used when evaluating the failure rate (IEC 61508 1997). The reason is that the data material may not be large enough to achieve statistical accuracy, so data uncertainty may be present. Using the upper limit of a 70% confidence interval reduces to 0.3 the probability that the real failure rate (lambda) is greater than the upper limit. Hence, the use of an upper limit failure rate is a conservative approximation.

There are several methods for identification and quantification of the different sources of uncertainty. In most cases, the focus is on data uncertainty related to the model parameters rather than on the uncertainty related to the model itself. Quantitative uncertainty assessment techniques, like uncertainty propagation, sensitivity analysis and importance measures, generally address data uncertainty. Model uncertainty may also be examined with sensitivity studies and importance measures, but it is not thereby quantified. Quantification of model uncertainty is still a research subject, and consists of developing epistemic models for model assumptions (NASA 2002). Uncertainty from incompleteness has only recently been recognized as a topic and has not yet received much attention. Conservative calculation approaches are increasingly being developed, which allows completeness uncertainty to be taken into account.

Quantitative uncertainty assessment in reliability assessments has been treated quite differently within different industries. The Norwegian offshore industry has never explicitly included uncertainty in reliability and risk assessments. It has been argued that the results themselves contain so much uncertainty that it would be meaningless to express uncertainty about uncertainty. The offshore industry has instead handled uncertainty by using conservative estimates, arguing that this keeps the results “on the right side” within a confidence interval for a best estimate (Rausand and Øien 2004). Uncertainty is treated far more comprehensively within the nuclear industry, where uncertainty assessments have always been explicitly included in reliability assessments or Probabilistic Safety Assessments (PSA, named PRA in the U.S.). This has led to systematic work on developing new methods for identifying and reducing the uncertainties involved in the assessments. Reliability assessments of nuclear power plants have also been required to be public, which has resulted in open criticism and new requirements for improvements (Rausand and Øien 2004).
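The 70% upper limit discussed in this section can be illustrated with a small calculation. The sketch below is an assumption-laden example, not the procedure prescribed by IEC 61508: it assumes a constant failure rate with failures arriving as a homogeneous Poisson process over a fixed accumulated observation time, in which case the one-sided upper limit is the rate at which observing n or fewer failures has probability 1 minus the confidence level. This is equivalent to the usual chi-square expression with 2n + 2 degrees of freedom. The function names and the data in the usage line are hypothetical.

```python
import math


def poisson_cdf(n: int, mean: float) -> float:
    """P(N <= n) for a Poisson variable with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n + 1):
        term *= mean / k
        total += term
    return total


def upper_failure_rate(n_failures: int, hours: float, confidence: float = 0.70) -> float:
    """One-sided upper confidence limit for a constant failure rate (per hour)."""
    target = 1.0 - confidence
    low, high = 0.0, 1.0
    # The Poisson CDF decreases with the mean: expand until the root is bracketed.
    while poisson_cdf(n_failures, high) > target:
        high *= 2.0
    for _ in range(200):  # bisection on the expected number of failures
        mid = 0.5 * (low + high)
        if poisson_cdf(n_failures, mid) > target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high) / hours


# Hypothetical data: 2 failures observed over 1e6 accumulated operating hours.
point_estimate = 2 / 1e6
print(point_estimate, upper_failure_rate(2, 1e6))
```

With few observed failures the upper limit lies well above the point estimate, which is what makes its use conservative.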


4 DIFFERENT PERSPECTIVES ON UNCERTAINTY

Reliability assessments express the uncertainty about the failure behaviour of a system. Uncertainty can be measured and described by its mathematical language, probability. Though the mathematical theories behind probability are well accepted, the interpretation of uncertainty is not. This chapter presents known perspectives and interpretations of uncertainty.

4.1 Categorization of uncertainties

Uncertainties are categorized in order to evaluate the different contributions, like model, data and completeness uncertainty. Further, the uncertainties may be categorized into two basic types, aleatory and epistemic uncertainty (Parry 1996).

Aleatory derives from the Latin word alea, the rolling of dice. It describes something random that is not predictable. Aleatory uncertainty can therefore be defined as “uncertainty arising from or associated with the inherent, irreducible, natural randomness of a system or process” (nature.com 2008). Reliability assessments deal with many processes and systems that involve aleatory uncertainty. It is, for example, impossible to predict exactly on which demand a Safety Instrumented System (SIS) will fail to respond. This is due to variability in the system that cannot be eliminated, because inherent randomness causes events with stochastic properties. This is also why aleatory uncertainty is often referred to as “stochastic uncertainty” (Mosleh, et al. 1995).

Some processes or system behaviour are nevertheless predictable to a certain degree. Our predictions are based on our knowledge of the phenomena at hand. That is, if we know how a certain phenomenon behaves, it is easier to predict the outcome. In order to predict, a model is needed. The properties that define the model, like parameters and initial conditions, are based on available knowledge. Uncertainty then arises from the limitations in exactly assessing these properties (Kiureghian and Ditlevsen 2009). This type of uncertainty is called epistemic uncertainty and can be defined as “lack of knowledge about the performance of a system” (Aven 2003). As opposed to aleatory uncertainty, epistemic uncertainty can be reduced by gaining more information.

Durga Rao et al. stress the importance of separating aleatory and epistemic uncertainties because of the different effects the two types may have on the system model. For mathematical reasons, when both types of uncertainty are mixed, it is impossible to see what proportion of the uncertainty can be attributed to aleatory and to epistemic uncertainty respectively (Rao, et al. 2007). A goal within uncertainty assessment is to reduce the total uncertainty of the model as much as possible in order to give confidence in the results. Since epistemic uncertainty is the only reducible uncertainty, it is crucial to classify the uncertainties correctly in order to achieve reduction where possible. It is also important to classify uncertainty for efficiency reasons: if all of the uncertainty is due to variability (aleatory uncertainty), collecting more data would be a waste of time and resources.

History has shown that uncertainty that earlier was classified as aleatory is now classified as epistemic. Several researchers have used this as an argument for doubting whether aleatory uncertainty exists at all. The development of new technology has made it possible to obtain valuable new information for understanding phenomena which were assumed to be caused by natural variation. O’Hagan and Oakley argue that “…this variation (uncertainty) would be eliminated (or at least reduced) if only we were able to recognize and to specify within the model some more conditions” (O'Hagan and Oakley 2004). This discussion raises the question of where the limit lies between our lack of ability to understand and true randomness, or whether such a limit exists at all.

At present, with available technology and resources, it is practically impossible to achieve complete knowledge about every system or process within reasonable time. The pragmatic course is then to separate the uncertainties that can be reduced from those that are unlikely to be reduced in the near future (Kiureghian and Ditlevsen 2009). In the words of Kiureghian and Ditlevsen: “…there is a degree of subjectivity in the selection of models and categorization of uncertainties in engineering problem solving. This is inevitable. It constitutes the “art” of engineering. Ironically, this is the element of engineering that most distinguishes the quality of its practice.”
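The distinction drawn in this section can be made concrete with a small simulation. The sketch below is purely illustrative and rests on assumptions that are not taken from the thesis: the dangerous undetected failure rate is given a lognormal distribution (hypothetical median 1e-6 per hour, error factor 3) to represent epistemic uncertainty, the average PFD of a single component is approximated by the failure rate times the proof-test interval divided by two, and the outcome of each demand is then drawn at random to represent aleatory uncertainty.

```python
import math
import random
import statistics


def sample_pfd(rng, median_rate, error_factor, test_interval_h):
    """Draw one PFD value from an uncertain (epistemic) failure rate."""
    sigma = math.log(error_factor) / 1.645  # error factor = 95th percentile / median
    rate = rng.lognormvariate(math.log(median_rate), sigma)
    return rate * test_interval_h / 2.0  # assumed single-component approximation


def epistemic_percentiles(n_samples=2000, seed=7):
    """5th, 50th and 95th percentiles of the PFD under the assumed rate distribution."""
    rng = random.Random(seed)
    pfds = [sample_pfd(rng, 1e-6, 3.0, 8760.0) for _ in range(n_samples)]
    cuts = statistics.quantiles(pfds, n=20)
    return cuts[0], cuts[9], cuts[18]


def aleatory_outcomes(pfd, n_demands=10, seed=7):
    """Simulate demands for a fixed PFD: True means the SIS fails on that demand."""
    rng = random.Random(seed)
    return [rng.random() < pfd for _ in range(n_demands)]


p05, p50, p95 = epistemic_percentiles()
print(p05, p50, p95)           # spread caused by lack of knowledge about the rate
print(aleatory_outcomes(p50))  # randomness that remains even for a known PFD
```

More data would narrow the interval between the percentiles, but even with a perfectly known PFD it would remain impossible to say on which demand a failure occurs.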

4.2 Interpretations of uncertainty

Uncertainty can be defined as “lack of knowledge about the performance of a system” (Aven 2003). Reliability assessments express the uncertainty about future events, often in terms of probabilities. Applying probability is an admission of our lack of knowledge, because it states the uncertainty related to the unknown events. This is also why probability is so widely used: it makes it possible to quantify uncertainty with mathematical expressions. The mathematical theories behind probability are widely accepted, but how we interpret probability is not. This is an important issue when reliability assessments are used as decision support, since the results may be understood differently depending on the point of view.

4.2.1 Realist interpretation

In reliability assessments it is common to interpret failure rates or the PFD as a characteristic of the system at hand. This is in line with the realist interpretation, which sees probability as a measure of a property, just like any other physical property (Watson 1993). Several standards, laws and regulations, including IEC 61508, are well suited to this way of interpreting probability. For example, in order to comply with IEC 61508, a SIS designer has to show that the probability that the designed SIS experiences a dangerous undetected failure is no higher than the limit for the required SIL. Such requirements make no sense if reliability is not assumed to be an objective, measurable property of the world. This interpretation is in conflict with knowledge beyond what is objectively measurable, for example expert judgement. There are three well-known philosophies which give meaning to the realist interpretation: the classical, the relative-frequency and the a priori theories.

Classical

The classical interpretation is the oldest interpretation and originates from probability in games of chance (Bedford and Cooke 2001). The theory is based on the assumption that every outcome is equally likely to occur. The probability of an event is then equal to the number of outcomes that satisfy the criteria for the event divided by the total number of possible outcomes. The theory is suited for problems with equally likely outcomes, like the rolling of a die. The classical interpretation is therefore less convenient for reliability assessments, because the outcomes, like one failure versus zero failures within a time interval, are in most cases not equally likely to occur.

Relative frequency

The relative-frequency theory defines the probability of an event as the proportion of times the event occurs during a long series of repetitions of independent and identical trials. This can be difficult to apply in practice, especially for problems where events rarely occur, like failures of safety instrumented systems. The observation time must be sufficiently long for the relative frequency to reflect the “true” probability. Collection of data for such systems is therefore very expensive and time-consuming. Generic databases help the analyst by providing data collected from several similar systems. These data are not plant-specific, and one can therefore question how far the trials can depart from being identical before the quality of the data is significantly affected.

A priori

Developed in parallel with the relative-frequency theory, the a priori theories incorporate the other interpretation of probability: a measure of degree of belief (Watson 1993). In this philosophy, a priori knowledge refers to prior knowledge about a population, rather than knowledge estimated from recent observation. The a priori theory may be seen as both objective and subjective. It is objective in that a priori knowledge should be based on available data and not be a personal opinion. It is also subjective, since the evaluation is based on what one person believes, which may vary and is not a definite answer. Hence, a priori knowledge marks the boundary between the Bayesian and frequentist approaches to statistics.
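The practical difficulty with the relative-frequency definition for rare events can be illustrated by simulation. This is a minimal sketch with hypothetical numbers: a “true” failure probability per trial of 0.001 is assumed, and independent, identical trials are simulated with a fixed random seed.

```python
import random


def relative_frequency(p_true: float, n_trials: int, seed: int = 1) -> float:
    """Fraction of simulated independent, identical trials in which the event occurs."""
    rng = random.Random(seed)
    hits = sum(rng.random() < p_true for _ in range(n_trials))
    return hits / n_trials


# Assumed "true" probability of failure per trial; chosen only for illustration.
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(0.001, n))
```

With few trials the estimate is often zero or far from the assumed value, and it settles only after a very large number of trials. This mirrors why plant-specific data for highly reliable systems are expensive to collect.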

4.2.2 Subjective interpretation

The subjective interpretation of probability defines probability as a degree of belief, which means that the same event can have different probabilities depending on which person is asked. It follows that the uncertainty which the subjective probability represents is purely epistemic, since it is entirely knowledge-based. Subjective probability lacks the objectivity that is required in scientific problem solving or in analyses of severe problems like reliability assessments. It can, however, often be used in combination with other approaches when there is a lack of quality data. Bayesian updating with expert judgement is an example of this.

Predictive epistemic

The predictive epistemic approach is a fully subjective approach to risk and reliability assessments and defines probability as a measure of uncertainty. In contrast to the realist interpretation, where the focus is on estimating the true statistical quantities, like the PFD and failure rates, the predictive epistemic approach focuses on predicting observable quantities (Aven 2007). Observable quantities are quantities that are unknown at the time of the assessment but that occur (or may occur) in the future, for example a failure on demand or a number of fatalities. In this approach, uncertainty can only exist in relation to observable quantities (future events). Hence, it is meaningless in this approach to talk about uncertainties in a statistical quantity like the PFD. The uncertainty is also assumed to be purely epistemic. That is, the uncertainty arises only from our limitations in predicting future events. Even though the framework is recently developed, it has created a lot of enthusiasm in the risk and reliability engineering community.
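The Bayesian updating with expert judgement mentioned in this section can be sketched as follows. The example assumes a conjugate gamma prior for a constant failure rate with Poisson-distributed failure counts, which is a common textbook choice and not a method taken from the thesis. The prior parameters, which stand in for the expert judgement, and the observed data are hypothetical.

```python
def gamma_update(alpha: float, beta: float, n_failures: int, hours: float):
    """Update a Gamma(alpha, beta) prior on a failure rate with observed failure data.

    alpha is the shape and beta the rate parameter (in hours); the posterior is
    Gamma(alpha + n_failures, beta + hours) under the assumed Poisson model.
    """
    return alpha + n_failures, beta + hours


# Hypothetical expert judgement: prior mean alpha / beta = 1e-6 failures per hour.
prior_alpha, prior_beta = 1.0, 1e6

# Hypothetical plant data: 1 failure in 3e6 accumulated operating hours.
post_alpha, post_beta = gamma_update(prior_alpha, prior_beta, 1, 3e6)

print(prior_alpha / prior_beta)  # prior mean failure rate
print(post_alpha / post_beta)    # posterior mean failure rate
```

The posterior mean lies between the expert's prior mean and the rate suggested by the data alone, and the weight shifts towards the data as the observation time grows.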


4.3 Summary

According to the relative-frequency interpretation, it is due to aleatory uncertainty that we cannot predict whether the event A will occur in the next trial. As more trials are performed, the fraction of successes tends towards the “true” probability, P(A). P(A) therefore reflects the aleatory uncertainty about the occurrence of the event A. Our lack of knowledge about the “true” probability, P(A), is due to epistemic uncertainty and can be reduced by collecting more data (Flage, Aven and Zio 2009).

This report is limited to the probabilistic framework, which is named for its use of probability density functions to describe uncertainty. Alternatives to the probabilistic framework include imprecise/interval probability, evidence theory, fuzzy probability and possibility theory. For more information about alternative uncertainty representations, the reader is referred to the article by Flage, Aven and Zio (2009). How uncertainty is understood determines how it is represented, see Table 3.

Table 3 Notions of uncertainty and associated representations, adapted from Flage, Aven and Zio (2009)

| Notions of uncertainty | Representation |
|---|---|
| Randomness (aleatory uncertainty) | Probability (relative frequency-based) |
| Lack of knowledge (epistemic uncertainty) | Probability (epistemic-based), Evidence theory |
| Indeterminacy | Imprecise/interval probability |
| Lack of precision (imprecision, vagueness, ambiguity, fuzziness) | Fuzzy probability, Possibility theory |

The realist interpretation is the most common way of understanding probability in reliability assessments. It is objective and thus uncontroversial as a basis for decision making. However, it is in conflict with an important part of reliability assessments: the subjective interpretation and hence the use of expert judgement. This contradiction has been discussed since the origins of probability theory. The need for an objective interpretation results in understanding probability as purely statistical, based on the stochastic laws of chance processes. But because objective assessments do not include knowledge, a need for an epistemological interpretation arises. Here, probability is defined as a degree of belief, which is in great contrast to the objective view (Watson 1993).

IEC 61508 and other standards or regulations often require demonstration of compliance according to a quantitative criterion. Such requirements are built upon a realist interpretation where probabilities are objective and measurable as a true property of the system under evaluation; otherwise such requirements would not make sense. The predictive epistemic approach is fully subjective and regards probability only as an uncertainty measure for (possible) observable quantities. In this framework it is therefore meaningless to discuss uncertainties in the PFD estimate. This is often problematic, since there are several aspects in a reliability assessment that give reasonable doubt as to whether the right answer has been achieved.


5 APPROACHES FOR UNCERTAINTY ASSESSMENTS

Risk and reliability assessments will to some degree involve uncertainties due to the nature of the assessment processes. Uncertainty assessment is a useful remedy when making critical decisions under uncertainty. This chapter presents well-known quantitative uncertainty assessment techniques and discusses their application areas in reliability assessments.

5.1 General

Randomness of the physical processes modelled in reliability assessments leads to the use of probabilistic models. Depending on the scenario, model assumptions and parameters are set based on the available knowledge about the behaviour of the system at hand. Uncertainty is associated with these conditions, and hence probabilistic models are also used to represent our state of knowledge regarding the numerical values of the parameters and the validity of the model assumptions (NASA 2002). It is important that the uncertainties in the natural variability of physical processes (aleatory uncertainty) and the uncertainties in our knowledge of these processes (epistemic uncertainty) are sufficiently accounted for in the decision making process.

The main difference between a reliability assessment and an uncertainty assessment is that a reliability assessment expresses the aleatory uncertainty about the future failure behaviour of a system, while an uncertainty assessment mainly expresses the epistemic uncertainty about the information (model output) which the reliability assessment provides, that is, the uncertainty in the system model prediction. Generally, there are three main techniques used for quantifying the effect that uncertain model input has on the model output:

• Sensitivity analysis: methods for analyzing how the variation (uncertainty) in the model output can be apportioned to different sources of variation in the model input
• Importance measures (sensitivity coefficients): methods used to identify the dominant contributors to failures in a system model
• Uncertainty propagation (uncertainty analysis): methods for analyzing how an input uncertainty transforms onto the model output

All these techniques are well suited for describing data uncertainty due to their application to numerical values. However, they can also describe model uncertainty by changing the assumptions and structure of the model and observing how this affects the level of uncertainty. Further, uncertainties due to incompleteness may be described through sensitivity analysis or conservative approaches, where the model and/or data are updated with the new scope.

As discussed in Chapter 3, there are several sources of uncertainty in a reliability assessment. Generally, the process of a reliability assessment can be described as shown in the stippled frame in Figure 4. Here, the model inputs are assumed to consist of both uncertain and fixed inputs. The system model with the given input, like failure rates, calculates the model output, which may be a result like the PFD. The input uncertainty may be the assumed uncertainty distribution of the uncertain input, like the assumed uncertainty distribution for a failure rate. The output uncertainty is the uncertainty distribution for the estimated results, like the results from uncertainty propagation. It is also often of interest to study quantities of the uncertainty distribution, called quantities of interest. These may be the variance of the uncertainty distribution, the minimum and maximum values from a simulation, etc.

If a decision is to be made after the uncertainty assessment, a predefined criterion has to be set. The criterion may be linked either to the model output, like the PFD, or to the output uncertainty, like confidence bounds which the model output should satisfy. If the decision criteria are not met, the feedback actions should use the results from the assessments to reduce the dominant contributors to uncertainty. This could be done through design changes, expert evaluations, extended data collection etc., depending on which type of uncertainty is dominating.

Figure 4 Framework for uncertainty assessments, adapted from de Rocquigny, Devictor and Tarantola (2008)

5.2 Sensitivity analysis

Sensitivity can be defined as the degree of response of an output to a change in the input. An input may be a model element like a numerical parameter value for a component or a model assumption. A sensitivity analysis is “...the study of how the variation (uncertainty) in the output of a mathematical model can be apportioned, qualitatively or quantitatively, to different sources of variation in the input of a model” (Saltelli, et al. 2008). Sensitivity analyses applied in reliability assessments study how sensitive the reliability is with respect to changes in the input parameters or model assumptions of the system model. Such analyses are performed to provide information about mainly two things (NASA 2002):

- An indication of which inputs in the analysis cause the largest change in reliability when their values are changed
- An identification of the components whose data quality the analysis is sensitive to, and of those it is not

If the relative change in the model output (result) is large compared to the change in input, we say that the system is sensitive to the input element that was changed. It is important that only one element is changed at a time, while the others remain fixed at a baseline value, in order to make the results comparable.

Notice that sensitivity analyses do not describe the level of uncertainty related to uncertain inputs or elements, only the effect of their changes. Elements with high sensitivity are therefore not necessarily associated with great uncertainty. However, if an uncertain input is found to have a great impact on the result, its uncertainty is also expected to spread onto the result with the same degree of impact. Hence, inputs with high sensitivity should be further investigated with uncertainty analysis. Elements with low sensitivity, on the other hand, should not be allocated resources for further analysis since their impact on the results is not significant. Sensitivity analyses can therefore be used as a tool for identifying which sources of uncertainty weigh most on the conclusions of a reliability assessment, and thereby for allocating resources more efficiently.

Further, sensitivity analyses can be applied as a quality assurance tool for the robustness of the reliability modelling and hence for avoiding model uncertainty. The modelling is conditional on the validity of its assumptions. The assumptions may change the architectural model or the data used for the assessment, and can have a high influence on the final results. Model assumptions are in fact often used to overcome data uncertainty due to shortcomings of data. In order to compensate for lacking information, great weight is put on the analyst’s judgment (NASA 2002). The impact of such assumptions can also be examined with sensitivity analysis in order to ensure that the simplifications/limitations made will not significantly affect the results in case of high model uncertainty.

As discussed in this section, sensitivity analysis is well suited for investigating the possible effects that both input data and model uncertainty may have on the results. Completeness uncertainty may also be investigated with this technique, by including or excluding possibly relevant elements like failure modes and then evaluating whether they are significant for the results or not.
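The one-at-a-time procedure described in this section can be illustrated with a minimal Python sketch. The series model, where the PFD of each element is approximated by λ·τ/2, the component names and all failure rates are assumptions made for illustration only; they are not taken from the case study.

```python
def pfd_series(rates, test_interval_h):
    """Approximate average PFD of independent 1oo1 elements in series.

    Assumption: each element contributes lambda_DU * tau / 2.
    """
    return sum(lam * test_interval_h / 2.0 for lam in rates.values())


def one_at_a_time(model, baseline, rel_change=0.10, **model_args):
    """Change one input at a time and keep the others at the baseline value.

    Returns, per input, the relative change in output divided by the
    relative change in input.
    """
    base_out = model(baseline, **model_args)
    sensitivity = {}
    for name in baseline:
        perturbed = dict(baseline)
        perturbed[name] = baseline[name] * (1.0 + rel_change)
        out = model(perturbed, **model_args)
        sensitivity[name] = ((out - base_out) / base_out) / rel_change
    return sensitivity


# Hypothetical dangerous undetected failure rates (per hour).
baseline = {"sensor": 3.0e-7, "logic": 1.0e-7, "valve": 2.0e-6}
sens = one_at_a_time(pfd_series, baseline, test_interval_h=8760.0)

for name, value in sorted(sens.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

With these assumed numbers the valve dominates the ranking, so the valve failure rate would be the first candidate for a subsequent uncertainty analysis. The sketch says nothing about how uncertain each failure rate is.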

5.3 Importance measures

A sensitivity analysis provides information about how the variation in the reliability can be attributed to the input related to the components in the model. Some components in a system are more important for the system reliability than others. Consider two components. If they are modelled in a parallel structure, the lifetime of the system is equal to the longest lifetime of the two components. If they are modelled as a series structure, the lifetime of the system is equal to the shortest lifetime of the two components. The series structure must therefore consist of components with higher reliability in order to achieve the same system reliability as the parallel structure. This example shows that component importance depends on both the reliability of the components (input) and the system structure (model) built to achieve a function, and that the same components may have different importance depending on the structure.

For larger system models it is more difficult to understand the importance of each component without further analysis. It is thus often of great interest to rank components with regard to certain quantitative sensitivity coefficients called importance measures. The different importance measures may be used to rank the relative values of the components with regard to improvement potential, contribution to unavailability (blaming) and importance for achievement of the system function. The information generated by the ranking process may be used for redesign, allocation of redundancy, risk mitigation, risk-based decision making, allocation of resources etc. Importance measures serve as a key part of sensitivity analyses and aid in debugging the model, but they also provide valuable information about the results after the model is considered to be complete. Sometimes the relative reliability between components is more important knowledge than the overall reliability itself, since it might pinpoint vulnerabilities in the system that need to be addressed (NASA 2002). The following quantitative importance measures are introduced in this section and used later in the sensitivity analyses in Chapters 9 and 10.

5.3.1 Birnbaum’s measure

This measure describes the reliability importance of a component i as

$$I^{B}(i \mid t) = \frac{\partial h(\mathbf{p}(t))}{\partial p_i(t)}, \quad \text{for } i = 1, 2, \ldots, n$$

which is the partial derivative of the system reliability with respect to the component reliability, $p_i(t)$. The letter h is here used to express that the components are independent. We see that a large value of Birnbaum’s measure occurs when a small change in the reliability of component i results in a large change in the overall system reliability, and hence the system is sensitive to component i. We recognize this as classical sensitivity analysis. Through pivotal decomposition, when the components are independent, Birnbaum’s measure can further be written as

$$I^{B}(i \mid t) = \frac{\partial h(\mathbf{p}(t))}{\partial p_i(t)} = h(1_i, \mathbf{p}(t)) - h(0_i, \mathbf{p}(t)).$$

Birnbaum’s measure can thus be written as the difference in system reliability between the situation when component i is considered perfect, $h(1_i, \mathbf{p}(t))$, and the situation when the component has failed, $h(0_i, \mathbf{p}(t))$. This is illustrated in Figure 5. The slope of the line is equal to Birnbaum’s measure, since the system reliability is plotted as a function of $p_i(t)$, which here increases by one unit. Note that Birnbaum’s measure depends only on the system structure and the reliability of the other components, not on the actual reliability of component i. This may be considered a weakness of Birnbaum’s measure, since it provides an importance measure for a specific component that is independent of the reliability of that component.

A component i is said to be critical for the system if the other components in the system are in such a state that the system functions only if component i functions. Rausand and Høyland showed that Birnbaum’s measure can, based on this definition of a critical component, be defined as “... the probability that the system is in such a state at time t that component i is critical for the system”. For the deduction of this definition, the reader is referred to Rausand and Høyland (2004).

A component with a high value of Birnbaum’s measure is a component to which the system reliability is vulnerable. A failure of this component is thus critical for the overall reliability. This is not the same as saying that the component is likely to cause a system failure (such probabilities are often called blaming measures). Take for example a parallel structure of two single (1oo1) components, where a CCF of the two components in parallel is modelled as a component in series with the parallel structure. The CCF “component” will score a high value according to Birnbaum’s measure, since the change in system reliability is large because the system fails if the CCF occurs, $h(0_{CCF}, \mathbf{p}(t)) = 0$. But given that the system has failed, the CCF may not be the most likely to blame, since the probability of a CCF may be lower than the probability of failure of the parallel structure. Birnbaum’s measure evaluates the effect of a failure on the system reliability rather than the likelihood that a failure is the cause of a system failure (blaming). Two well-known blaming measures are the improvement potential measure and the criticality importance measure.

Figure 5 Birnbaum’s measure (Rausand and Høyland 2004)
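The CCF example in section 5.3.1 can be checked numerically with a short Python sketch based on the pivotal decomposition form of the measure, h(1i, p(t)) − h(0i, p(t)). The structure function follows the example in the text, while the component names and reliability values are hypothetical.

```python
def system_reliability(p):
    """Two components A and B in parallel, in series with a CCF 'component'.

    p maps component name to its reliability at a given time t.
    Components are assumed independent.
    """
    parallel = 1.0 - (1.0 - p["A"]) * (1.0 - p["B"])
    return p["CCF"] * parallel


def birnbaum(h, p, i):
    """Birnbaum's measure by pivotal decomposition: h(1_i, p) - h(0_i, p)."""
    perfect = dict(p, **{i: 1.0})
    failed = dict(p, **{i: 0.0})
    return h(perfect) - h(failed)


# Hypothetical component reliabilities at time t.
p = {"A": 0.95, "B": 0.95, "CCF": 0.999}
ib = {i: birnbaum(system_reliability, p, i) for i in p}

for name, value in ib.items():
    print(f"I_B({name}) = {value:.5f}")
```

With these assumed values the CCF “component” obtains a Birnbaum measure of about 0.9975, while each of the parallel components obtains about 0.05. The value for the CCF does not change if its own reliability is changed, which illustrates that the measure is independent of the reliability of the component being ranked.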

5.3.2 Improvement potential measure

The improvement potential is an extension of Birnbaum’s measure and has received its name because it measures the improvement in system reliability obtained by assuming that component i is a perfect component, such that $p_i(t) = 1$. The improvement is the difference between $h(1_i, \mathbf{p}(t))$ and $h(\mathbf{p}(t))$ and is denoted $I^{IP}(i \mid t)$ for a component i at time t:

$$I^{IP}(i \mid t) = h(1_i, \mathbf{p}(t)) - h(\mathbf{p}(t)), \quad \text{for } i = 1, 2, \ldots, n.$$

This formula is shown graphically in Figure 6.

Figure 6 Improvement potential

Since the ranking according to this measure indicates which components are most important to improve with respect to the overall system reliability, it can also be interpreted as a ranking of which components are most likely to blame if a failure occurs. In reality, it is impossible to achieve perfect reliability for a component. It may therefore be more relevant to calculate the improvement potential for a more credible value which reflects the achievable reliability, denoted $p_i^{(n)}(t)$. The credible improvement potential (CIP) is then

$$I^{CIP}(i \mid t) = h(p_i^{(n)}(t), \mathbf{p}(t)) - h(\mathbf{p}(t)).$$

5.3.3 The criticality importance measure

The probability that the system is in such a state that component i is critical is equal to Birnbaum’s measure. Note that the definition of a critical component is a description of the states of the other components in the system, not of the critical component itself. If component i is critical and fails, the system will also fail. We can turn the question around and ask: given that the system is failed at time t, what is the probability that component i is critical for the system and is failed at time t? This probability is called the criticality importance measure. The criticality importance measure is also related to Birnbaum’s measure and is defined as “... the probability that component i is critical for the system and is failed at time t, when we know that the system is failed at the time” (Rausand and Høyland 2004). This is mathematically formulated as

$$I^{CR}(i \mid t) = \frac{I^{B}(i \mid t) \cdot q_i(t)}{Q_0(t)},$$

where $q_i(t)$ and $Q_0(t)$ are the fault tree notation for the unavailability of component i (basic event) and of the system, respectively. In other words, the criticality importance measure describes the probability that component i caused the system failure, when we know that the system is failed at time t. In practice, this measure is frequently used to prioritize maintenance tasks, because a repair of the blamed component will make the system function again. Such prioritizing of maintenance tasks is a great time-saver, especially for large, complex systems.
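The two blaming measures can be compared in a small Python sketch. It uses the same hypothetical structure as the CCF example in section 5.3.1 (two components in parallel, in series with a CCF “component”). All reliability values, including the credible improved reliability for component A, are assumptions for illustration; q_i = 1 − p_i and Q_0 = 1 − h(p) are used for the unavailabilities.

```python
def h(p):
    """System reliability: A and B in parallel, in series with CCF."""
    return p["CCF"] * (1.0 - (1.0 - p["A"]) * (1.0 - p["B"]))


def with_value(p, i, value):
    """Copy of p where the reliability of component i is replaced by value."""
    return dict(p, **{i: value})


def improvement_potential(p, i, improved=1.0):
    """I_IP for improved = 1, credible improvement potential otherwise."""
    return h(with_value(p, i, improved)) - h(p)


def criticality_importance(p, i):
    """I_CR = I_B * q_i / Q_0, with I_B from pivotal decomposition."""
    i_b = h(with_value(p, i, 1.0)) - h(with_value(p, i, 0.0))
    q_i = 1.0 - p[i]
    q_0 = 1.0 - h(p)
    return i_b * q_i / q_0


# Hypothetical component reliabilities at time t.
p = {"A": 0.95, "B": 0.95, "CCF": 0.999}

iip = {i: improvement_potential(p, i) for i in p}
icr = {i: criticality_importance(p, i) for i in p}
# Hypothetical credible (achievable) reliability for component A.
cip_a = improvement_potential(p, "A", improved=0.99)

for i in p:
    print(f"{i}: I_IP = {iip[i]:.6f}, I_CR = {icr[i]:.3f}")
print(f"I_CIP(A) = {cip_a:.6f}")
```

With these assumed values the criticality importance is about 0.71 for each of the parallel components and about 0.29 for the CCF. The CCF has the highest Birnbaum measure but is not the most likely to blame, which agrees with the discussion in section 5.3.1.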

5.4 Uncertainty propagation

While sensitivity analysis and importance measures study the effects that changes in numerical input values or model assumptions have on the output, uncertainty propagation studies how the uncertainty related to the input parameters spreads onto the output of the model (de Rocquigny, Devictor and Tarantola 2008). Since this report is limited to the probabilistic framework for treatment of uncertainty, the fuzzy set approach to uncertainty propagation is not presented here. The probabilistic approach is also the most commonly used.

As discussed in section 3.1.2, there are many aspects that may lead to data uncertainty, also called parameter uncertainty. The epistemic uncertainty related to the input parameters, λ, is not reflected in the calculated reliability. Uncertainty propagation is therefore a useful remedy for analysing the level of uncertainty associated with the results. The probabilistic framework for uncertainty propagation follows a two-step process (NASA 2002):

• First, assign a probability density function (pdf) to each of the random (uncertain) input parameters. The pdf reflects the state of knowledge and represents the epistemic uncertainty related to the parameter. The pdf can be selected from different distributions, depending on which properties are best suited for the component or system it represents. In reliability assessments, the lognormal or gamma distribution is usually used as pdf for data uncertainty.
• Then, generate a pdf for the output function by combining the input pdfs.

The combined pdf then reflects the uncertainty associated with the estimated reliability, but it should still be carefully interpreted because the combined distribution reflects only a portion of the uncertainty, namely the data uncertainty. In addition, the confidence in the combined pdf depends on the validity of the model assumptions of the system model and on the distributions selected for the input parameters (NASA 2002). Figure 7 shows the relation between the uncertain parameters, λ, the uncertain events like the unavailability of components, x, and the reliability of the system as a function of x, R = h(x1, x2, ...).

Three main techniques are used to propagate uncertainty: simulation, moment propagation and discrete probability distribution (NASA 2002). Due to the development of computer tools, simulation has become the most common technique, and this section is therefore limited to simulation techniques. With computer software tools it is possible to simulate the data uncertainty through a system model a large number of times, sampling the input pdfs in each run and letting the statistics of the simulated outputs define the properties of the combined pdf, like fit, mean, variance, percentiles and other quantities of interest (de Rocquigny, Devictor and Tarantola 2008). The simulation tool repeats scenarios either a number of times or over a defined period of time. Simulation is a useful tool when modelling the future behaviour of systems with low failure frequencies, since the real-life system produces small amounts of data and the data uncertainty may thus be large.

Figure 7 Uncertainty propagation (NASA 2002)
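The two-step process can be sketched in Python as follows. The sketch is only an illustration under stated assumptions: the failure rates are given lognormal uncertainty distributions specified by a median and an error factor (here taken as the ratio between the 95th percentile and the median), the system model is a series structure where each element contributes λ·τ/2 to the PFD, and all numerical values are hypothetical.

```python
import math
import random
import statistics


def sample_rate(median, error_factor, rng):
    """Step 1: draw a failure rate from its assumed lognormal uncertainty pdf."""
    sigma = math.log(error_factor) / 1.645  # 95th percentile = median * EF
    return rng.lognormvariate(math.log(median), sigma)


def propagate(inputs, test_interval_h, n_runs, seed=1):
    """Step 2: combine the input pdfs through the model by simulation."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_runs):
        rates = [sample_rate(m, ef, rng) for m, ef in inputs.values()]
        outputs.append(sum(lam * test_interval_h / 2.0 for lam in rates))
    return sorted(outputs)


# Hypothetical inputs: (median failure rate per hour, error factor).
inputs = {"sensor": (3.0e-7, 3.0), "logic": (1.0e-7, 3.0), "valve": (2.0e-6, 3.0)}
tau = 8760.0

pfd = propagate(inputs, tau, n_runs=10000)
point_estimate = sum(m * tau / 2.0 for m, _ in inputs.values())
mean_pfd = statistics.mean(pfd)
p05, p50, p95 = (pfd[int(q * len(pfd))] for q in (0.05, 0.50, 0.95))

print(f"point estimate from medians: {point_estimate:.2e}")
print(f"mean: {mean_pfd:.2e}  5%: {p05:.2e}  50%: {p50:.2e}  95%: {p95:.2e}")
```

The percentiles are examples of quantities of interest from the output uncertainty distribution. Because the assumed input distributions are skewed to the right, the mean of the simulated PFD is higher than the point estimate calculated from the medians.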


The simulation maps the distributions of the input parameters into a combined distribution by using sampling techniques. The two best-known sampling techniques in software tools at present are Monte Carlo sampling and Latin Hypercube sampling (LHS).

5.4.1 Monte Carlo sampling

Monte Carlo sampling is completely random; the method generates random failure times from each component’s failure distribution. Consider the survival function, R(t), of a desired distribution, and let T be a failure time from this distribution, so that 0 < R(T) < 1. The values of R(T) are then uniformly distributed over the interval between 0 and 1. Monte Carlo sampling generates a uniformly distributed random number, U, between 0 and 1, and lets U represent R(T). This means that U = R(T) and $T = R^{-1}(U)$. When the random variable U takes values between 0 and 1, the inverse survival function of U thus generates randomly distributed failure times from the predefined probability distribution. This technique is valid for any uniform random number U, 0 < U < 1. The procedure is repeated until the desired number of simulated failure times, T, is reached.

The same methodology can be applied to different types of distributions. The flow network can therefore consist of components with different probability distributions and different parameters, which makes it a flexible technique. Often, two-phase Monte Carlo sampling is necessary, especially when simulating uncertainty at different levels, like for parameters and events as illustrated in Figure 7. Two-phase Monte Carlo sampling is built upon the same principles as standard Monte Carlo sampling. The difference is that two-phase sampling samples in two loops, an inner loop and an outer loop, where the samples from the inner loop are used as input for the outer loop.

A problem with Monte Carlo sampling is that the samples are more likely to be drawn from areas of the distribution where the probability of occurrence is higher. Extreme values from the tails of the distributions are then likely to be insufficiently represented in the samples. In order to solve this problem, a high number of repetitions is needed, which may be quite extensive and time consuming. This issue is problematic for reliability models that employ skewed probability distributions like the lognormal or gamma distribution, where the right tail may be long (Morgan and Henrion 1990).
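The relation T = R⁻¹(U) can be illustrated with a minimal Python sketch for two distributions whose survival functions can be inverted analytically: the exponential distribution, R(t) = exp(−λt), and the Weibull distribution, R(t) = exp(−(t/η)^β). The parameter values are hypothetical and chosen for illustration only.

```python
import math
import random


def exponential_failure_time(lam, u):
    """Inverse survival function of R(t) = exp(-lam * t)."""
    return -math.log(u) / lam


def weibull_failure_time(eta, beta, u):
    """Inverse survival function of R(t) = exp(-(t / eta) ** beta)."""
    return eta * (-math.log(u)) ** (1.0 / beta)


def simulate(inverse_survival, n, seed=1):
    """Generate n failure times by letting U represent R(T)."""
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], which avoids taking log(0).
    return [inverse_survival(1.0 - rng.random()) for _ in range(n)]


# Hypothetical parameters: failure rate 1e-4 per hour; eta = 1e4 hours, beta = 2.
exp_times = simulate(lambda u: exponential_failure_time(1.0e-4, u), 20000)
wei_times = simulate(lambda u: weibull_failure_time(1.0e4, 2.0, u), 20000)

mean_exp = sum(exp_times) / len(exp_times)
print(f"mean simulated exponential failure time: {mean_exp:.0f} hours")
```

For the exponential case the mean of the simulated failure times should be close to 1/λ = 10 000 hours, which gives a simple check that the sampling reproduces the intended distribution.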

5.4.2 Latin Hypercube sampling (LHS)

Latin Hypercube sampling (LHS) was developed in order to solve the problem with the sampling of extreme values in Monte Carlo sampling. In order to ensure that the whole range of a distribution is represented in a sample, LHS uses a stratified, also called layered, sampling method. The cumulative distribution function for an input parameter is divided into n intervals of equal probability, where n is the number of simulations to be run. In contrast to Monte Carlo sampling, LHS samples a random value from the input pdf from within each interval, without replacement. The random value within an interval is generated in the same way as in Monte Carlo sampling. The main difference is the layered sampling, where each interval is selected only once. In this way, the coverage of the distribution domain is uniform and more representative, and hence a smaller number of samples is needed.
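The layered sampling can be sketched in Python for a single input parameter. The sketch divides the interval between 0 and 1 on the cumulative probability scale into n intervals, draws one value in each interval, and maps the values through the inverse cumulative distribution function of an assumed lognormal distribution. The median and σ used are hypothetical, and the clamping constant is only a numerical safeguard.

```python
import math
import random
from statistics import NormalDist


def lhs_uniform(n, rng):
    """One random value from each of n equal-probability intervals of (0, 1)."""
    values = [(k + rng.random()) / n for k in range(n)]
    rng.shuffle(values)  # random order, each interval still used exactly once
    return values


def lhs_lognormal(n, median, sigma, seed=1):
    """Latin Hypercube sample of size n from a lognormal distribution."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    sample = []
    for u in lhs_uniform(n, rng):
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
        sample.append(median * math.exp(sigma * std_normal.inv_cdf(u)))
    return sample


# Hypothetical uncertainty distribution for a failure rate (per hour).
n = 10
u_values = lhs_uniform(n, random.Random(1))
rates = lhs_lognormal(n, median=2.0e-6, sigma=0.67)

print(sorted(math.floor(u * n) for u in u_values))  # interval index of each value
print(f"smallest rate: {min(rates):.2e}, largest rate: {max(rates):.2e}")
```

Even with only ten samples, one value falls below the 10th percentile and one above the 90th percentile of the assumed distribution, so both tails are represented. With Monte Carlo sampling this is not guaranteed for such a small sample.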


5.5 Sensitivity vs. uncertainty

It is common to mix up the expressions “uncertainty” and “sensitivity”. Generally speaking, both uncertainty and sensitivity analyses investigate the robustness of a study that includes mathematical modelling. It should be noticed, however, that a sensitivity analysis is not the same as an uncertainty analysis, also called uncertainty propagation. This is because the sensitivity analysis does not express the uncertainty related to the uncertain inputs, only the effect of changes in them. Based on the sensitivity study, the analyst may, however, understand which model inputs or assumptions may be crucial for the level of uncertainty in the assessment, based on their importance. Hence, a sensitivity study is well suited as a basis for an uncertainty analysis. While a sensitivity analysis identifies which sources of uncertainty weigh most on the study's conclusions, an uncertainty analysis is the only technique that actually describes the level of uncertainty related to the conclusions.

The different techniques vary in suitability for describing the different sources of uncertainty: model uncertainty, data uncertainty and completeness uncertainty. As an end to this chapter, Table 4 presents how the different techniques represent the different uncertainty contributors.

Table 4 Summary of uncertainty assessment methods

| Source/Method | Importance ranking / Sensitivity analyses | Uncertainty propagation |
|---|---|---|
| Data uncertainty | Well suited as a basis for uncertainty propagation. The results of these techniques can be used as a recommendation for which inputs should be further analyzed. Does not express the level of data uncertainty. | Well suited. The technique shows how the data uncertainty spreads onto the result and is well suited for reflecting the associated data uncertainty in the conclusions. |
| Model uncertainty | Well suited for evaluating the effects of model assumptions on the results. Can be used as a basis for evaluating whether the validity of assumptions should be investigated more closely. | Rather use sensitivity analysis. |
| Completeness uncertainty | Suited for investigating the effects of increasing or decreasing the scope. May analyze the effects of including possibly relevant failure modes. | Rather use sensitivity analysis, importance ranking or conservative approximations. |

6 UNCERTAINTY REPRESENTATION IN COMPLIANCE STUDIES

A compliance study is the documented verification process of functional safety for a SIS. The results are foundations for important decisions with regard to design, safety, economic stakes and legal requirements. The information provided by the reliability assessments is associated with uncertainty due to the process of the assessment. The level of uncertainty in the results should be presented to the decision maker. This chapter presents the process of documentation and compliance studies through a lifecycle and a framework for uncertainty representation.

6.1 Safety lifecycle

Compliance with IEC 61508 has become a form of quality assurance within many industries. IEC 61508 is not mandatory, but is often given as a “legal” requirement by authorities. The standard is frequently used to specify and verify safety requirements through the lifecycle of a product. Compliance reports are the documentation of SIL verification for a product, and are therefore of high importance for both the manufacturer and the customer. Reliability verification for a safety instrumented system is a complex process linked to the different phases of the product’s lifecycle. The need for a systematic activity plan for specification and achievement of safety requirements resulted in the safety lifecycle of IEC 61508, shown in Figure 8.

Figure 8 Safety lifecycle (IEC 61508 1997)


The safety lifecycle has 16 phases, each of them described in detail in IEC 61508-1, section 7. Roughly speaking, we can divide them into the following three groups:

• Phases 1-5 address analysis
• Phases 6-13 address realization
• Phases 14-16 address operation

6.1.1 Analysis

A concept and scope are selected based on an understanding of the equipment under control and its environment. A functional safety management plan then defines the responsibilities, organisation and management planning for the SIS development. Hazard and risk analyses are performed in order to identify the safety functions necessary for prevention and control of the risk associated with the equipment under control. Thus, we say that IEC 61508 is a risk-based approach. The overall safety requirements are defined and documented in a safety requirement specification (SRS) report. The requirements are allocated to the technology and functions, and a preliminary design is established.

6.1.2 Realization

A compliance study is then carried out in the beginning of the realization phase, where the reliability calculations of safety functions are performed in order to evaluate whether the requirements from the SRS are met. At this time, the detailed design has just started and the data available for reliability assessment vary a lot. OLF 070 recommends the use of generic data for compliance studies, but this is not always possible, mainly for three reasons: the newness of the technology, the progress of the design (vendors may already have been selected before the compliance study is executed) and the technology available on the market. Generic data do not exist for new technology due to the lack of field experience. The SIS integrator (manufacturer) may also already have decided which supplier to use for the different components and thus have vendor data available. In some cases, the possible suppliers for a certain technology are limited. Hence, the SIS integrator is forced to use, or prefers, a specific supplier and thus knows which vendor data to use from earlier experience.

The SRS may be updated based on the results from the compliance study. During realization, the detailed design is updated with possible new vendor data and more specific requirements from the SRS. A functional safety assessment (FSA) should be carried out during realization in order to verify the hardware, software and integrated system against the specified requirements. Detailed plans are developed for maintenance, operation, safety validation, installation and commissioning. Vendors for components are selected and have to deliver a safety analysis report in which it shall be documented that the equipment to be delivered satisfies the requirements from the SRS. Installation and commissioning are executed according to plans. Overall validation is then performed before operation begins. During the whole realization phase, the focus should be on revealing failures and avoiding introducing them.


6.1.3 Operation Operation should be performed according to the plans. Here too, the focus should be on revealing failures and avoiding the introduction of new ones. This phase is extremely important for the collection of field data. The data may be used as input to generic databases, but also as input for verifying that the SIS has achieved the required SIL. Necessary modifications are performed and then iterated back to the relevant phase, until decommissioning.

6.1.4 Documentation The documentation through the safety lifecycle is vital for achieving functional safety in a proper manner. IEC 61508 is seen as a large and complex standard, and several reports have been published on how to interpret and apply it. Among these is OLF 070, the guideline for application of IEC 61508 and IEC 61511 in the Norwegian petroleum industry. The guideline stresses the difference between verification and validation. According to the IEC standards, verification is an independent check for each phase in the safety lifecycle in order to demonstrate that the deliverables meet the requirements. Validation is seen as an extension of verification, where the check covers several phases, not only one. The quality assurance of the safety requirements specification is not defined as validation, but as a FSA. The difference between the concepts is shown in Figure 9.

Figure 9 Relationship between FSA, validation and verification (OLF 070 2004)

During the whole lifecycle process, FSAs are typically carried out through audits. The FSA may be performed by an independent third party in order to ensure that the process of achieving functional safety for the SIS during the safety lifecycle is on the right track. The compliance reports document whether the safety integrity requirements are met or not. The FSA, on the other hand, checks whether the overall functional safety for the SIS is achieved through the specification and fulfilment of requirements during the safety lifecycle. IEC 61508 does not provide an explicit method for performing a FSA, only a framework. The necessary documentation through the lifecycle, and the party responsible for it, is shown in Figure 10.


Figure 10 Documentation hierarchy

Due to the IEC’s interpretation of validation and verification, a compliance study may be seen as both verification and validation. For example, the results from the reliability assessments are checked against the quantitative hardware requirements as stated in phase 5. But these requirements were already stated in the initial SRS from phase 4, thus the hardware compliance is also a validation. A compliance study is performed by the SIS integrator, who has to document that the safety requirements from the SRS are fulfilled for the integrated system. This means showing compliance with all the SIL requirements, as shown in Figure 2:
1. The calculated PFD shall be less than the maximum accepted PFD for the required SIL (checked through reliability assessments compared against the SIL requirements for hardware in IEC 61508‐1, tables 2 and 3).
2. The required HWFT shall be achieved (checked through the tables in IEC 61508‐2, 7.4.3.1.2).
3. Software requirements shall be fulfilled (fulfilment of the requirements in IEC 61508‐3, usually covered by the instrumentation supplier).
4. Systematic failures shall be avoided and controlled (the integrator must be able to document low technological uncertainty and a QA system. The QA system is usually documented through conformance to ISO 9001:2000. For new technology, avoidance shall be documented as described in IEC 61508‐2, clauses 7.4.4 and 7.4.5 and Annexes A and B. Detailed completion of the tables in Annexes A and B is regarded as compliance with the requirements).
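The first requirement in the list above, the check of a calculated PFD against the required SIL, can be illustrated with a minimal sketch. The bands are the low demand mode intervals for the average PFD given in IEC 61508‐1. The function names and the example PFD value are assumptions made for illustration only and are not taken from the case study.

```python
# Low-demand-mode bands for the average PFD (IEC 61508-1).
# Each SIL covers the half-open interval [lower, upper).
SIL_BANDS = {
    1: (1e-2, 1e-1),
    2: (1e-3, 1e-2),
    3: (1e-4, 1e-3),
    4: (1e-5, 1e-4),
}


def achieved_sil(pfd_avg):
    """Return the SIL band that pfd_avg falls into, or 0 if it is 1e-1 or higher."""
    if pfd_avg < 1e-5:
        return 4  # better than the SIL 4 band; capped at 4
    for sil, (lower, upper) in SIL_BANDS.items():
        if lower <= pfd_avg < upper:
            return sil
    return 0


def complies(pfd_avg, required_sil):
    """Hardware PFD check: the predicted PFD must be below the band's upper limit."""
    return pfd_avg < SIL_BANDS[required_sil][1]


# Hypothetical predicted PFD, for illustration only.
predicted_pfd = 4.2e-3
print(achieved_sil(predicted_pfd))
print(complies(predicted_pfd, required_sil=2))
print(complies(predicted_pfd, required_sil=3))
```

A check of this kind covers only the quantitative requirement. The HWFT, software and systematic failure requirements have to be documented separately.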

6.2 Uncertainty in a lifecycle perspective The level of uncertainty associated with the results from a compliance study may depend on where in the lifecycle the study is executed. A lifecycle from a producer’s perspective is therefore presented as a basis for this discussion. The lifecycle is taken from Murthy et al. (2007) and illustrated in Figure 11. It consists of 8 phases, which are located within three development stages and three detail levels. Each phase is described in detail in Murthy et al. (2007), but a short presentation follows below. Stage I is the pre‐development stage and includes phase 1, 2 and 3. A product is to be developed based on a need that arises from an identified business or technology gap between the current situation and a potential future. At business level a concept is selected, and reliability requirements are set and allocated at system and component level. A great challenge during the early period of stage I is that


the system components and reliability models are not selected yet. Hence, the necessary data for the product development are not known at this point of time. Stage II is the development stage and includes phase 4 and 5. This stage is mainly used for new technology since proven technology usually is sufficiently developed and verified. The specifications derived from the pre‐development stage are tested in a closed environment, similar to future field environment, both at component and system level. For new technology, this stage will be used to review the design in phase 2 and 3. Stage III is the post‐development stage and includes phase 6, 7 and 8. Phase 6 deals with the processes from production until commissioning, which involves production, construction and overall installation and commissioning planning. Phase 7 treats the logistics, installation, operation and disposal of the product. Phase 8 is an evaluation phase of the product at a business level, where actual reliability and improvement potential are presented (Janbu 2008).

Figure 11 Lifecycle from a producer perspective (Murthy, Østerås and Rausand 2007)

Three important PFD measures are discussed in relation to the lifecycle model: the required PFD, the predicted PFD and the estimated PFD. The required PFD is the requirement which arises from the SRS. This value is a result of the reliability allocation to the SIF. The compliance report has to document whether the SIS, during early design, satisfies the required PFD by performing reliability calculations with either generic or vendor data. The calculated PFD from the reliability assessment is called the predicted PFD. The estimated PFD is obtained during operation, and is the PFD calculated with the data collected in the field. For an overview of these concepts, see Figure 12. The requirements are set, allocated and thereafter documented in the SRS in phase 1. A compliance study should then be executed after the preliminary design in phase 2. Thus a compliance study is often performed during phase 2 or in the beginning of the detailed design in phase 3. Early in stage I the product is still roughly planned. The components, reliability model and data to be used may still be unknown to the SIS integrator. The results and thus the predicted PFD from a


reliability assessment at this stage are therefore more uncertain than at a later stage in the lifecycle due to the lack of knowledge. The development stage (stage II) often results in a redesign due to failures or improvement potential identified by the laboratory testing and design review of the detailed design from phase 2 and 3. If redesign is necessary, the producer has to go back to phase 2 and 3.

Figure 12 Overview of data related to PFD, adapted from (Lundteigen 2008)

6.2.1 Completeness uncertainty An important aspect of a reliability assessment is the analyst’s knowledge about the system properties, which is especially limited during the early phases of a lifecycle and for new technology. The information available will be crucial for the quality and coverage of the assessment. The analyst must be able to define a clear scope and identify scenarios, consequences and associated data. Some of the system characteristics, like hidden failure modes, may not be accounted for in reliability assessments. This increases the uncertainty level due to the incomplete scope and may lead to a non‐conservative estimate of the reliability. Relevant system characteristics that are not accounted for may nevertheless be discovered during later phases through redesign, functional testing and operation. Completeness uncertainty is thus often reduced during the lifecycle because of the updates of the scope.

6.2.2 Model uncertainty The analyst must also understand the system behaviour in order to select the best suited model for the assessment. When a compliance study is performed in phase 2 or 3, the information needed to select a suitable model may not be available. The calculation models used could therefore be less suited to the properties of the final developed system. The better a model reflects the real system, the lower the model uncertainty. After detailed design in phase 3, the SIS integrator will have a new foundation for model selection, where system characteristics are defined.


6.2.3 Data uncertainty Data uncertainty decreases during the lifecycle phases due to the constant updates with new data, especially during operation and field experience. In stage I, generic data are used in phase 1 and in the beginning of phase 2. Between phase 2 and 3, vendors for equipment are selected and hence vendor data are also available. During stage II, data may be updated with test data. A SIS is often subjected to accelerated testing in order to compensate for the low failure frequencies. Hence, these data may be biased due to the unnatural testing environment. The best data are achieved when the SIS is placed in its field environment, as during operation in phase 7. Bayesian updating is well suited for combining new data with generic data, such that data uncertainty may be reduced the more proven a SIS is. In the case of lack of data, as for new technology, expert judgement is frequently used.
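The Bayesian updating mentioned above can be sketched with the conjugate gamma-Poisson model, which is a common choice for constant failure rates. The choice of this model is an assumption made here; the text does not prescribe a particular prior. The prior parameters and the field observations in the example are hypothetical.

```python
def gamma_update(alpha, beta, failures, exposure_hours):
    """Update a Gamma(alpha, beta) prior on a constant failure rate.

    With a Poisson likelihood for the number of failures observed during
    exposure_hours, the posterior is Gamma(alpha + failures, beta + exposure_hours).
    """
    return alpha + failures, beta + exposure_hours


def gamma_mean_sd(alpha, beta):
    """Mean and standard deviation of a Gamma(alpha, beta) distribution (rate form)."""
    return alpha / beta, alpha ** 0.5 / beta


# Hypothetical generic prior: mean 2.0e-6 per hour with sd equal to the mean.
prior = (1.0, 5.0e5)
# Hypothetical field data: 1 failure observed in 2.0e6 component-hours.
posterior = gamma_update(*prior, failures=1, exposure_hours=2.0e6)

for label, params in (("prior", prior), ("posterior", posterior)):
    mean, sd = gamma_mean_sd(*params)
    print(f"{label}: mean = {mean:.2e} per hour, sd = {sd:.2e} per hour")
```

In this sketch the ratio of standard deviation to mean decreases after the update, which reflects the reduction in data uncertainty as field experience accumulates.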

6.2.4 New Technology New technology is especially vulnerable to uncertainties in reliability assessments due to the lack of knowledge and the risk of omitting important aspects during product development. Det Norske Veritas (DNV) has produced a recommended practice for qualification procedures for new technology, DNV‐RP‐A203. The main objective of the recommended practice is to “…provide a systematic approach to the qualification of new technology, ensuring that the technology functions reliable within specified limits” (DNV 2001). The procedure is applicable for components, equipment and assemblies defined as new technology, but also in cases where new technology is to be integrated into a larger system comprised of proven technology. DNV‐RP‐A203 describes the level of technical uncertainty related to the technology used, where the technology is classified according to its newness and the newness of its application area, see Table 5.
Table 5 Classification of technology (DNV 2001)

Application Area    Technology
                    Proven    Limited field history    New or unproven
Known               1         2                        3
New                 2         3                        4

This classification implies the following:
1) No new technical uncertainties.
2) New technical uncertainties.
3) New technical challenges.
4) Demanding new technical challenges.
The classification scheme presented in Table 5 may be used as a basis for a qualitative determination of the level of uncertainty in the available information. Uncertainty in the model input will also be reflected in the model output. Hence, if technology is classified within category 2, 3 or 4, this should


be taken into consideration when setting the scope and evaluating the results. Such uncertainties may be accounted for by using conservative approximations, performing uncertainty propagation for the available data, and using expert judgement where data are limited or not available.
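The classification in Table 5 can be expressed as a simple lookup, as sketched below. The dictionary reproduces the table; the function name and the rule of flagging categories 2 to 4 for closer treatment of uncertainty are illustrative assumptions that follow the recommendation above.

```python
# Table 5 (DNV-RP-A203) as a lookup: (application area, technology status) -> category.
TECHNOLOGY_CLASS = {
    ("known", "proven"): 1,
    ("known", "limited field history"): 2,
    ("known", "new or unproven"): 3,
    ("new", "proven"): 2,
    ("new", "limited field history"): 3,
    ("new", "new or unproven"): 4,
}


def classify_technology(application_area, technology_status):
    """Return (category, flag); the flag is True for categories 2, 3 and 4."""
    key = (application_area.lower(), technology_status.lower())
    category = TECHNOLOGY_CLASS[key]
    return category, category >= 2


print(classify_technology("Known", "Proven"))
print(classify_technology("New", "Limited field history"))
```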

6.3 Integrated decision making As described in section 6.2, the uncertainty is at its highest level early in the lifecycle. It should be noted that this is also when most compliance studies are executed. The uncertainties associated with the results should be reported to the decision maker such that he or she is aware of the risks related to the decision. A framework for integrating uncertainty into the basis for decision making is presented in Figure 13. The dotted line covers what is treated in this thesis and applied to the case study. The framework is motivated by the report “Guidance on the Treatment of Uncertainties Associated with PRAs in Risk‐Informed Decision Making” by the United States Nuclear Regulatory Commission (2009).

Figure 13 Integrated decision making for hardware safety integrity

For compliance studies, the whole framework presented is the SIS integrator’s responsibility. In practice, reliability assessments are often handled by consultancy companies, either to bring in competence or to obtain an independent evaluation. Definition of decision forms the problem to be assessed. In compliance studies, the decision for hardware safety integrity is whether the SIS complies with the required SIL according to the procedures in IEC 61508. The requirements then have to be assessed and later documented in a SRS. STEP 1 involves initiating a reliability assessment, where the scope has to be defined, models have to be selected and data collected. The scope has to include all significant contributors to unavailability. The level of detail should also be such that the assessment is sufficiently documented. STEP 1 for the case study is presented in Chapter 8. STEP 2 is the execution of the reliability assessments as described in STEP 1. Probabilistic assessment is mentioned here because of the subsequent quantitative uncertainty assessments in STEP 3.


For the case study this is presented in Chapter 9 (fault tree analysis) and 10 (simulation). For further information about reliability assessments, the reader is referred to Janbu (2008). STEP 3 comprises the uncertainty assessments for model, data and completeness uncertainty, both quantitative and qualitative assessments (judgements). Relevant methods are described in chapter 5, and performed for the case study in chapter 9 and 10. STEP 4 is the basis for decision making. The conclusions drawn from the uncertainty assessment in STEP 3 should be included in the compliance report in order to state the uncertainties related to the results. A qualitative evaluation of the level of uncertainty associated with the predicted reliability should be presented. The decision maker may not have the competence needed to interpret quantitative results from an uncertainty assessment. A presentation of only these may therefore be less valuable. The qualitative uncertainty evaluation may be presented as a separate chapter before the conclusions. As a part of this evaluation, the compliance report should include a process description of the reliability assessment. Further, the following information from the uncertainty assessments should be presented:
• Sensitivity analysis: The information from a sensitivity analysis should provide an overall evaluation of the significance of modelling assumptions, scope and data that are suspected to be uncertain. The reasons for suspicion should also be documented. The results may be presented in a table including the original assumption (base case), the alternative assumption, and the change in the numerical results. The difference in numerical value between the two cases gives a direct indication of the possible significance.
• Importance measures: The results should include an overview of the system features that are the dominating contributors to unavailability, and thus risk. Different measures should be used in order to reflect several important aspects of the system regarding reliability. A qualitative evaluation of the numerical values should be given.
• Uncertainty propagation: The statistical quantities from an uncertainty propagation analysis, and their scope, may be difficult to interpret. It is therefore important that a qualitative evaluation of the results is presented, where the conclusions drawn from the plot, the min and max interval and the standard deviation are explained.

Based on this information, an overall evaluation of the three main sources of uncertainty should be discussed. Further, it should be emphasized which uncertainties are critical to the results and which are not (Drouin, et al. 2009). The accuracy of the results should be at such a level that the decision maker may distinguish risk‐significant elements from those of less importance. The compliance report should in the conclusion recommend compliance or not with the requirements from the SRS, based on the results from both the reliability and uncertainty assessments. The confidence in the recommendation should also be described in light of the findings from STEP 2 and 3. If the uncertainty assessment leads to a different recommendation regarding compliance than the reliability assessment does, this should be argued for with results from the uncertainty assessment and documented in the report.
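The kind of statistical quantities that an uncertainty propagation produces can be illustrated with a minimal Monte Carlo sketch. It assumes a single component with PFD approximated by lambda * tau / 2 and a lognormally distributed failure rate described by a median and an error factor (95th percentile divided by the median). The distribution, the input values and the function names are assumptions for illustration and do not reproduce the simulation used in the case study.

```python
import math
import random
import statistics


def propagate_pfd(median_rate, error_factor, test_interval_hours,
                  n_samples=20000, seed=1):
    """Sample lambda from a lognormal and return sorted samples of lambda * tau / 2."""
    rng = random.Random(seed)
    mu = math.log(median_rate)
    sigma = math.log(error_factor) / 1.645  # error factor = 95th percentile / median
    samples = [
        rng.lognormvariate(mu, sigma) * test_interval_hours / 2.0
        for _ in range(n_samples)
    ]
    samples.sort()
    return samples


def percentile(sorted_samples, p):
    """Simple empirical percentile of an already sorted list (0 <= p <= 1)."""
    index = min(len(sorted_samples) - 1, int(p * len(sorted_samples)))
    return sorted_samples[index]


# Hypothetical input: median rate 1.0e-6 per hour, error factor 3, annual test.
pfd = propagate_pfd(1.0e-6, 3.0, 8760.0)
summary = {
    "mean": statistics.fmean(pfd),
    "sd": statistics.stdev(pfd),
    "p05": percentile(pfd, 0.05),
    "p50": percentile(pfd, 0.50),
    "p95": percentile(pfd, 0.95),
}
for name, value in summary.items():
    print(f"{name}: {value:.2e}")
```

In a compliance report, a summary of this kind would be accompanied by a qualitative statement, for example whether the upper percentile still lies within the PFD band of the required SIL.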


If the base case results from STEP 2 do not meet the requirements from the SRS, but acceptance is still recommended based on findings in STEP 3, then the following should be documented (Drouin, et al. 2009):
• Identification of possible significant conservatism in the assessment through screening of elements like model assumptions, data selection and scope.
• Justification of compensation for conservatism.
• Justification of why any limitations of applicability are proposed.
• Assessment of the confidence in the recommendation.

In the opposite case, where the requirements are met but acceptance is not recommended, the analyst should do the following:
• Identify possible critical uncertainties related to the results (STEP 3) and use conservative approximations for the sources of uncertainty.
• Justify the conservatism.
• Assess the confidence in the recommendation.

STEP 4 for the case study is presented in chapter 11. The decision maker then makes a decision based on the results and recommendations from STEP 4. The uncertainty in the results is thereby integrated into the basis for the decision making, hence the name integrated decision making.


7 HIGH LEVEL PROTECTION SYSTEM In order to evaluate and interpret the results of a reliability assessment of a SIS with regard to uncertainty, a case study is used as an example. The selected case is the high level protection safety instrumented systems for prevention of overfilling/over pressurization of four process vessels on a FPSO. This chapter presents the main functions and properties of the systems and discusses some challenges in modelling the case.

7.1 System description A simplified drawing of the topside plant with the associated high level protection system is shown in Figure 14. The blue line illustrates the process shutdown (PSD) safety functions and the green line the emergency shutdown (ESD) safety functions. The information about the high level protection systems is provided by DNV Energy and is classified as confidential. Hence, the system and its tags have been made anonymous.

Figure 14 PSD and ESD High level protection system for topside plant (Compliance report 2008)

The floating production, storage and offloading (FPSO) vessel has a topside plant designed for separation and stabilization of the produced crude from the field (Compliance report 2008). The process plant consists of four vessels: 1st stage separator, 2nd stage separator, degasser and knock out (KO) drum. The crude first arrives at the 1st stage separator, where oil is separated from gas, water, sand etc. The pressure drop at the separator inlet encourages the gas to flash off. Water and sand, due to their higher density, settle out below the oil and are kept behind the weir. The oil then flows over the weir and is pumped further to the 2nd stage separator. Produced water is sent to the degasser for water treatment. The process may be controlled by monitoring level, pressure and flow rate.


The oil flows further from the 1st stage separator to the 2nd stage separator, where the flow usually is driven by pressure reduction. The oil from the 1st stage separator still contains about 5‐10% water. The oil is further cleaned in the 2nd stage separator, where it is separated to a finer quality and brought to the right temperature and pressure. Gas is separated out and routed to the KO drum and the oil is sent to storage tanks. Water remaining in the oil after this process is about 2% or less. The water portion usually increases with field time, especially if water injection is used (Vedvik 2004). If overpressure occurs in the 1st or 2nd stage separator, pressure safety valves (PSV) route gas to the flare KO drum. The produced water behind the weir flows to the degasser, where gas is removed and sent to the flare KO drum, which routes the gas to the flare. Gas at this process plant is treated as waste, since there is not enough of it in the crude for production, and it will not be re‐injected into the field. All gas is therefore sent to the KO drum for flaring. The intention of the flare KO drum is to avoid liquid hydrocarbons entering the flare stack. The routing of gas from the separators will cause condensate in the KO drum. This condensate will therefore be pumped back to the 2nd stage separator during normal circumstances. A scenario for this topside plant is the overfilling of liquids in the flare KO drum, where the liquids flow into the flare. This would result in a possibility of burning hydrocarbons falling down on the FPSO and a build‐up of backpressure within the flare and vent system (Compliance report 2008). In order to prevent overfilling, each vessel is equipped with safety instrumented systems (SIS). The purpose of the flare KO drum is to prevent liquid from entering the flare stack, and hence the KO drum has no relief system for liquid. 
According to ISO 10418, for a pressure vessel where the relief system is not designed for liquid, two levels of protection for overfilling are required. The same argument is valid for the separators and degasser and thus, all four vessels each have two layers of protection. These are the high level shutdown through the PSD logic system and the high level shutdown through the ESD logic system. In order for the PSD and the ESD system to be counted as two different barriers, it is important that they are independent of each other. The main purpose of these systems is to, given a demand, close the inlet stream to the plant. The PSD and ESD system consist of more valves and solenoids than shown on Figure 14, but since the main purpose of the high level protection systems for each of the four vessels is to close the inlet stream to the plant, the most important valves are shown. A closer description of the SIS for each vessel is given in section 7.1.1, 7.1.2 and 7.1.3.

7.1.1 Flare KO drum The Flare KO drum is protected with two safety instrumented systems, the PSD and the ESD systems. In order to ensure independence between these systems, different instruments and measuring principles are used. The PSD system receives input from the level switch indicator transmitter, LSIT‐5, a magnetic‐rod transmitter, while the ESD system receives input from LSIT‐4, a radar‐level transmitter. The PSD‐ and ESD logic system use different data nodes, PSD node and ESD node, for signal interpretations and logic solvers.


The PSD system trips the XY solenoid on the XV‐3, and the PSD solenoid on the ESD valve EV‐1. The ESD system is of a higher safety rank than the PSD system and hence the ESD system, given a demand, trips the ESD valve EV‐1 and initiates a total process shutdown through the PSD system.

7.1.2 1st and 2nd stage separator The 1st and 2nd stage separators are, like the flare KO drum, protected with two safety instrumented systems, the PSD and ESD systems. They also have the same requirements for independence. For the 1st stage separator, the PSD system receives input from LSIT‐7, a differential pressure transmitter, while the ESD system receives input from LSIT‐6, a magnetic follower. The 2nd stage separator has a PSD system which receives input from LSIT‐14, a differential pressure transmitter, and an ESD system which receives input from LSIT‐13, a magnetic follower. Due to the main purpose of the high level protection systems, the two instrumented loops, the PSD and the ESD system, and the responding valves are the same as for the flare KO drum.

7.1.3 Degasser The degasser is, like the other vessels, protected with two safety instrumented systems, the PSD and ESD systems, with the same requirement of independence. The PSD system receives input from the level switch indicator transmitter, LSIT‐10, a differential pressure transmitter, while the ESD system receives input from LSIT‐9, a radar‐level transmitter. The PSD system trips the XY solenoid on XV‐8. The ESD system shuts the EV‐11 by using the ESD solenoid valve and also carries out a total process shutdown through the PSD system.

7.2 Base case Figure 15 shows how the base case, the combined loop, is modelled in a reliability block diagram structure. The node modules are treated as one module in the assessments. The colours indicate the following: green marks the initiators in the SIS, yellow the logic elements and red the final elements.

Figure 15 Base case of high level protection system for reliability assessments (Compliance report 2008)


8 RELIABILITY ASSESSMENT OF HIGH LEVEL PROTECTION SYSTEM A reliability assessment of the case study is performed in order to evaluate the associated uncertainty. This chapter presents the foundation for the assessment.

8.1 Scope and scenario The scope of this reliability assessment is limited to the PSD and ESD systems for the topside plant as illustrated in Figure 14. In this case study the assessment is limited to the scenario where a vessel is overfilled/over pressurized. High level generally occurs either due to failure of pumps and valves or due to a blocked outlet route. Overfilling only occurs if neither the PSD nor the ESD system responds to the high level alarm. Each vessel has two instrumented loops, the high level shutdown through the PSD system and the high level shutdown through the ESD system. The real threat to the process plant is overfilling/over pressurization of a vessel rather than failure of either the PSD or the ESD system. Overfilling only occurs if the overall high level protection system, the combined PSD and ESD system, fails. Hence, it is of greater interest to study the reliability of the combined instrumented loop than that of the single instrumented loops. That is, the base case illustrated in Figure 15. Even though the combined instrumented loops are quite similar for each of the vessels, the consequence of overfilling is not. From a risk perspective, overfilling of the flare KO drum is the most severe case, because liquid hydrocarbons in the flare stack may cause burning rain on the topside plant. The severity of overfilling of the other vessels also depends on the vessel. Both the 1st and 2nd stage separators contain hydrocarbons, and hence overfilling of these vessels is critical for safety. The degasser contains mostly water, and overfilling of it is therefore not as severe as overfilling of the other vessels. It should nevertheless be noted that overfilling and over pressurization is in itself a critical and uncontrolled situation, and thus overfilling of the degasser should not be seen as harmless. Reliability and risk assessments differ in their way of treating the consequences of failures. 
While reliability assessments treat all technical failures, risk assessments embrace those failures which may cause consequences to human lives, economy or environment. The case study presented in this thesis is a reliability assessment, not a risk assessment. Nevertheless, one should be aware of how the consequences depend on which vessel is overfilled.

8.1.1 Limitations and assumptions The assessment is limited to evaluating the unavailability during operational time. This is because it is assumed that the SIS will not fail during stops. It is further assumed that a failure will not be discovered before a functional test, which is performed once a year; see the data dossier in Appendix A. If a failure has occurred during the year, it will be repaired immediately once it is revealed. Mean time to repair is therefore assumed to be negligible. The reliability modelling of the combined loop is valid for all four vessels treated in this report due to the similarity between the safety instrumented systems protecting each vessel. One vessel can therefore be modelled as a typical case, representative for all four vessels. The only technical difference between the high level protection systems for the four vessels is the type of level


transmitters used for respectively the PSD and ESD systems. According to documentation, the difference in failure rates for these level transmitters has a minor effect on the total reliability for the combined loops (Compliance report 2008). Identical failure rates for the level transmitters may therefore be assumed for both the PSD system and for the ESD systems. The environmental conditions inside the different vessels vary. The level transmitters in the 1st stage separator are exposed to a tougher environment than the level transmitters located in the KO drum. This may affect the failure rates since the vulnerability and thus the reliability of the level transmitters depend on the physical conditions. The failure rate for a level transmitter, even of the same type, may vary depending on the location. It may be vessel specific. An assumption of identical failure rates for a SIS located at different vessels may therefore not be valid. Since the level transmitters are exposed to such various environments, the data used for level transmitters for a typical vessel are assumed to be uncertain and should therefore be investigated further. The other components, the node, solenoid and valve, are assumed to have quite similar environment since they are not located inside the vessels and are therefore assumed to be independent of which vessel the components control.
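The effect of the assumptions above can be illustrated with a small sensitivity sketch. With a yearly functional test and negligible repair time, the average PFD of a single component is commonly approximated by lambda_DU * tau / 2, and for components in series the rates are summed. The sketch models one instrumented loop only, not the combined loop in Figure 15, and all failure rates are hypothetical values chosen for illustration; they are not the case study data.

```python
TEST_INTERVAL_HOURS = 8760.0  # functional test once a year


def channel_pfd(du_rates_per_hour, test_interval_hours=TEST_INTERVAL_HOURS):
    """Approximate average PFD of components in series: sum(lambda_DU) * tau / 2."""
    return sum(du_rates_per_hour.values()) * test_interval_hours / 2.0


# Hypothetical dangerous undetected failure rates (per hour); not the case study data.
base_case = {
    "level transmitter": 0.6e-6,
    "logic node": 0.2e-6,
    "solenoid": 0.8e-6,
    "valve": 2.0e-6,
}

# Alternative assumption: transmitter located in a harsher vessel, rate doubled.
harsh_vessel = {**base_case, "level transmitter": 1.2e-6}

pfd_base = channel_pfd(base_case)
pfd_alt = channel_pfd(harsh_vessel)
change = (pfd_alt - pfd_base) / pfd_base

print(f"base case PFD:   {pfd_base:.2e}")
print(f"alternative PFD: {pfd_alt:.2e}")
print(f"relative change: {change:.1%}")
```

Presenting the base case, the alternative assumption and the resulting change side by side is one way of documenting whether a vessel specific transmitter failure rate matters for the conclusion.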

8.2 Data collection The case study compliance report uses vendor data, which are confidential. The data used here are therefore mostly collected from OLF 070, Table A.1 “Applied failure rates (topside equipment)”. The data presented in this table are mostly gathered from OREDA or the PDS Data Handbook. The data for the level transmitters are collected directly from OREDA. The data used for this reliability assessment are presented in a data dossier in appendix A. OREDA is a generic reliability database which provides reliability data for input to reliability assessments, maintenance planning, risk assessments etc. OREDA has been gathering data since 1981 for different equipment, installations and operating conditions (OREDA, Offshore Reliability Data 4th edition 2002). OREDA is frequently used as a data source in reliability assessments in the Norwegian oil and gas industry. If the analyst is not familiar with the scope of the data presented, he or she might select data which are not appropriate for the assessment and hence introduce large amounts of data and completeness uncertainty. See Table 6 for a typical presentation of reliability data in OREDA. The taxonomy number explains from which item in a system hierarchy the data are collected. Population gives the number of items from which data have been collected, and installations gives the number of locations. The time aspect is important when collecting data since it affects the value of the failure rate. The critical failure rate for the compressor in Table 6 is roughly halved if calendar time is selected as the time frame instead of operational time. This is because failures usually occur more frequently during operation than during stops. All the failures presented for the compressor occurred during operation. It is important that the time frame is clearly defined in the scope in order to avoid such confusion. 
The failure rates are presented with a mean value (the OREDA estimator), an associated 90 % confidence interval, a standard deviation and the maximum likelihood estimator (MLE), which is equal to n/τ, i.e. the number of failures divided by the aggregated time in service. A common misunderstanding is that the mean is the average failure rate for all data collected. The mean presented in OREDA is the average of the installation specific means.

Several datasets are said to form a homogeneous sample if the statistical properties of any one of the datasets are valid for all of them. The data used in OREDA are collected from several installations. Similar components within the same installation may be exposed to different environmental and operational conditions, and the external conditions differ even more between installations (Rausand and Høyland 2004). Hence, the data collected will not be a homogeneous sample.

The OREDA MLE is the average failure rate defined as the number of failures per time unit. This measure is valid under the assumption that the collected data are from a homogeneous sample. If the data collected from several installations satisfy the criteria for a homogeneous sample, the mean value in OREDA would be equal to the MLE. A big difference between these two values indicates the opposite: the samples are inhomogeneous. The data samples collected for the compressors in Table 6 are clearly inhomogeneous, given the large difference between the mean and the MLE. The standard deviation describes the dispersion between the samples. A high standard deviation may therefore indicate inhomogeneous samples (Rausand and Høyland 2004). The standard deviation in Table 6 confirms the suspicion about inhomogeneous samples for the compressors. The 90 % confidence interval applies to the mean value. It may be interpreted such that the average λ from an installation has a 90 % probability of lying within the confidence bounds.

Table 6 Printout from OREDA (OREDA 2009)

Taxonomy no 1.1.1.1.1

Item: Machinery / Compressors / Centrifugal / Electric Motor Driven / (100-1000) kW
Population: 5. Installations: 2. No of demands: none listed.
Aggregated time in service (10^6 hours): calendar time (*) 0.1248, operational time (†) 0.0832.
Failure rates are given per 10^6 hours. Active repair time is given in hours and repair in manhours.

| Failure mode | Time | No of fail. | Lower | Mean | Upper | SD | MLE | Active rep. hrs | Repair min | Repair mean | Repair max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Critical | * | 23 | 1.31 | 217.71 | 827.93 | 304.49 | 184.33 | 10.0 | 0.5 | 24.3 | 186.3 |
| Critical | † | 23 | 2.02 | 471.18 | 1806.90 | 665.33 | 276.36 | | | | |
| Failed to start | * | 1 | 0.94 | 8.42 | 22.20 | 7.02 | 8.01 | | 13.0 | 13.0 | 13.0 |
| Failed to start | † | 1 | 0.29 | 17.10 | 61.58 | 22.41 | 12.02 | | | | |
| Fail while running | * | 14 | 0.97 | 132.09 | 499.13 | 183.39 | 112.20 | 10.0 | 0.5 | 24.0 | 186.3 |
| Fail while running | † | 14 | 1.28 | 285.42 | 1093.54 | 402.61 | 168.22 | | | | |
| Unknown | * | 1 | 0.94 | 8.42 | 22.20 | 7.02 | 8.01 | | 11.4 | 11.4 | 11.4 |
| Unknown | † | 1 | 0.29 | 17.10 | 61.58 | 22.41 | 12.02 | | | | |
| Vibration | * | 7 | 0.71 | 65.50 | 243.34 | 89.14 | 56.10 | | 0.5 | 28.5 | 117.5 |
| Vibration | † | 7 | 0.70 | 140.94 | 538.64 | 198.24 | 84.11 | | | | |

The data uncertainty increases with the degree of inhomogeneity, due to the variation caused by external factors such as operational, environmental and physical aspects. When generic data from inhomogeneous samples are used, the effects of these external factors and the variation between samples become part of the reliability evaluation of a system whose plant specific conditions differ from those reflected in the data. Inhomogeneous data samples are therefore unfortunate in reliability assessments.
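The relation MLE = n/τ can be checked against the critical failure row of the Table 6 printout. The sketch below is only an illustration: it assumes that the aggregated times in service are given in 10^6 hours, and the function and variable names are chosen here, not taken from OREDA. The OREDA mean values cannot be reproduced this way, since they are averages of installation specific means.

```python
# Values read from the Table 6 printout (critical failures, compressors).
# Assumption: times are in 10^6 hours, so rates come out per 10^6 hours.
N_CRITICAL_FAILURES = 23
CALENDAR_TIME = 0.1248
OPERATIONAL_TIME = 0.0832


def mle_failure_rate(n_failures, time_in_service):
    """Maximum likelihood estimate n / tau for a homogeneous sample."""
    if time_in_service <= 0:
        raise ValueError("time in service must be positive")
    return n_failures / time_in_service


mle_calendar = mle_failure_rate(N_CRITICAL_FAILURES, CALENDAR_TIME)
mle_operational = mle_failure_rate(N_CRITICAL_FAILURES, OPERATIONAL_TIME)

print(f"MLE, calendar time:    {mle_calendar:.2f} per 10^6 hours")
print(f"MLE, operational time: {mle_operational:.2f} per 10^6 hours")
print(f"calendar / operational: {mle_calendar / mle_operational:.3f}")
```

The same number of failures divided by a longer calendar time gives a lower rate, which is why the time frame has to be stated together with any failure rate taken from the database.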

8.3 Model selection

The selection of architectural model is a vital step in a reliability assessment because it forces the analyst into a mindset and model structure that will affect the rest of the assessment. A given model will also be more or less suited for the system at hand, depending on the system characteristics. Janbu (2008) performed an evaluation of the most recognized models for quantitative reliability assessments. The results are presented in Table 7.

The high level protection system is assumed to have a static behaviour since components are assumed to either function or not. Both reliability block diagram (RBD) and fault tree analysis (FTA) are well suited for this type of system. FTA is intuitively more directed towards failures than RBD through its causal analysis, because the failure described in the TOP event is the starting point of the analysis. An RBD, on the other hand, describes the successes needed in the system to achieve the desired function, and hence is a success oriented network. Markov analysis may be used for the high level protection system, but may easily become too complex and difficult to understand. FTA and RBD are more appropriate with regard to suitability and complexity of the assessment.

Simulation suits most kinds of modelling issues, even the most complex. For small systems it might be an unnecessarily advanced tool compared to what is required to perform a satisfactory assessment. Simulation is nevertheless a flexible method and has a large advantage in its suitability for uncertainty propagation and other uncertainty analysis techniques.

Table 7 Comparison of selected models (Janbu 2008)

| Model | Properties | Advantages | Disadvantages |
|---|---|---|---|
| Reliability block diagram | Functional blocks in sequential structure; binary analysis; static system behaviour; success oriented network | Easier to identify how to achieve a function; the structure may be used as a basis for other analysis models | Easy to forget a function; not intuitively understood; only two possible states: functioning or failed |
| Fault tree analysis | Events and causes in logic tree structure; binary analysis; static system behaviour; failure oriented network | Easier to identify failures; logical structure | More comprehensive than RBD; only two possible states: functioning or failed |
| Markov analysis | States and transitions in a state transition diagram; multiple state analysis; dynamic system behaviour; state oriented network | Able to model more than two states | Large models for a high number of components and states; often difficult to comprehend |
| Simulation | System operation simulated in a software program; multiple state analysis; dynamic system behaviour; process oriented network | Simple to apply; able to model more than two states; produces results that are hard to solve analytically | Results depend on number of simulations; cannot simulate systems with static components; reliability data generated from a believed distribution |

Both the PSD and the ESD function, as illustrated in Figure 15, are based on a series structure in which every single element has 1oo1 logic. This means that a common cause failure (CCF) can only occur between PSD and ESD elements, due to the similarity between the PSD and ESD. The elements of the combined loop that may fail due to a CCF are the input elements, the logic or the output elements. The CCFs are modelled with a standard β-factor model, where the β values used are presented in the data dossier.
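As an illustration of the standard β-factor model, the sketch below splits a dangerous undetected failure rate into an independent part, (1 - β)·λ, and a common cause part, β·λ, and converts both into average PFD contributions with the usual λτ/2 approximation. The numerical values are hypothetical and are not taken from the data dossier; the function names are chosen for this example only.

```python
def beta_factor_split(lambda_du, beta):
    """Split a failure rate into an independent and a common cause part."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be between 0 and 1")
    lambda_independent = (1.0 - beta) * lambda_du
    lambda_ccf = beta * lambda_du
    return lambda_independent, lambda_ccf


def average_pfd(failure_rate, test_interval):
    """Average PFD of a single periodically tested element, lambda * tau / 2."""
    return failure_rate * test_interval / 2.0


# Hypothetical input values, for illustration only.
LAMBDA_DU = 2.0e-6   # dangerous undetected failure rate per hour
BETA = 0.05          # fraction of failures that are common cause
TAU = 8760.0         # test interval in hours

lambda_ind, lambda_ccf = beta_factor_split(LAMBDA_DU, BETA)
pfd_ind = average_pfd(lambda_ind, TAU)
pfd_ccf = average_pfd(lambda_ccf, TAU)

print(f"independent part: lambda = {lambda_ind:.3e}, PFD = {pfd_ind:.3e}")
print(f"common cause part: lambda = {lambda_ccf:.3e}, PFD = {pfd_ccf:.3e}")
```

In a redundant loop the common cause part acts like a single element in series with the parallel structure, which is why it can dominate the result even for small values of β.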


9 FAULT TREE ANALYSIS

A fault tree is a logical structure and a failure oriented network. FTA is frequently used to model SIS in reliability assessments in compliance reports. This chapter presents the FTA of the high level protection system, together with an uncertainty assessment.

9.1 CARA FaultTree

There are several software tools for performing fault tree analyses. The assessment presented in this chapter uses CARA FaultTree, a software tool for constructing and analyzing fault trees. The program is owned by Exprosoft and is a leading tool for FTA in the Norwegian reliability engineering environment. CARA FaultTree presents the results of the assessment in a report, where the level of detail is optional. The program has two user levels: standard and expert. The main difference between the two user levels is the additional uncertainty analysis at expert level, which is suited for uncertainty propagation analyses. The calculation set-up is also more flexible at expert level.

9.2 PFD calculations

The event “overfilling of vessel given a demand” is selected as the TOP event. The fault tree is constructed as illustrated in Figure 16. All the components and subsystems have 1oo1 (1-out-of-1) logic, and hence only common cause failures between the PSD and ESD systems are explicitly modelled. Since the FTA is limited to a technical system, the basic events are failures of system components. Hence, the basic events are referred to as components.

Figure 16 Fault tree of combined loop for typical vessel


For each basic event i, the average PFD of the single component i is calculated in CARA FaultTree by the approximation formula found in Rausand and Høyland (2003):

$$q_i(t) \approx \frac{\lambda_{DU,i}\,\tau_i}{2}$$

where $\lambda_{DU,i}$ is the dangerous undetected failure rate and $\tau_i$ is the test interval of component i. A fault tree with m minimal cut sets may be modelled as a series structure of the m minimal cut parallel structures. The PFD of a minimal cut set j with $m_j$ independent components can be written as

$$Q_{MC_j}(t) \approx \prod_{i=1}^{m_j} q_i(t) \qquad (*)$$

The PFD of the system can then be approximated by a conservative upper bound approximation:

$$Q_0(t) \approx 1 - \prod_{j=1}^{m} \left(1 - Q_{MC_j}(t)\right)$$

CARA FaultTree and several other software tools for FTA use this calculation approach when estimating the PFD for the TOP event. Rausand and Høyland (2003) showed that this approximation is good when the $q_i(t)$ are small. The probability of the TOP event “overfilling of vessel given a demand” is calculated by CARA FaultTree and is equal to

$$Q_0(t) = 4{,}1357 \cdot 10^{-3}$$

This estimate is treated as the base case result for the rest of the fault tree analyses.
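The three formulas above can be chained in a few lines of code. The following sketch uses a small hypothetical fault tree with three components and two minimal cut sets; the failure rates, the cut sets and the function names are assumptions made for the example and do not represent the case study or the internals of CARA FaultTree.

```python
from math import prod


def component_pfd(lambda_du, tau):
    """Average PFD of one component: q_i = lambda_DU,i * tau_i / 2."""
    return lambda_du * tau / 2.0


def cut_set_pfd(component_pfds):
    """PFD of a minimal cut set with independent components (formula *)."""
    return prod(component_pfds)


def top_event_upper_bound(cut_set_pfds):
    """Upper bound approximation: Q0 = 1 - prod(1 - Q_MCj)."""
    return 1.0 - prod(1.0 - q for q in cut_set_pfds)


TAU = 8760.0  # test interval in hours

# Hypothetical failure rates per hour, for illustration only.
rates = {"A": 1.0e-6, "B": 2.0e-6, "C": 5.0e-7}
q = {name: component_pfd(lam, TAU) for name, lam in rates.items()}

# Hypothetical minimal cut sets: {A, B} and {C}.
cut_sets = [("A", "B"), ("C",)]
q_mc = [cut_set_pfd([q[name] for name in cut_set]) for cut_set in cut_sets]
q0 = top_event_upper_bound(q_mc)

print(f"cut set PFDs: {[f'{value:.3e}' for value in q_mc]}")
print(f"TOP event PFD (upper bound): {q0:.4e}")
```

For small cut set probabilities the upper bound is close to the plain sum of the cut set PFDs, which is a quick way to sanity check a software result by hand.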

9.3 Conservative calculation of fault tree

A problem with the calculation approach used in CARA FaultTree, as presented in section 9.2, is that the formula for the unavailability of a minimal cut set (see formula *) is a product of averages. According to Schwarz's inequality, the product of averages is not equal to the average of the product. This means that the actual unavailability is greater than the calculated unavailability of a minimal cut set. Non-conservative approximations are very unfortunate in risk and reliability assessments because they underestimate the risk.

A methodology for quantitative fault tree analysis, presented in an article by Lundteigen and Rausand (2008), consists only of formulas that ensure conservative calculations. This approach models CCF implicitly, see Figure 17. The fault tree is only used for identification of minimal cut sets. For each cut set $MC_j$, one must determine which components are dependent on or independent of each other, and place the dependent ones in a suitable common cause component group. Dependent components are those that will fail simultaneously upon a CCF due to the same root cause. Dependent components will thus be in a common cause component group $CG_{j,i}$, for cut sets j = 1, 2, …, m and groups i = 1, 2, …, $r_j$, where $r_j$ is the number of common cause component groups in minimal cut set j. The components in a minimal cut set that are excluded from the common cause component groups are assumed to be independent (Lundteigen and Rausand 2008).

The following minimal cut sets were identified from the fault tree for the combined loop with implicit modelling of CCF, shown in Figure 17:


{PSD LT,ESD LT}, {PSD LT,ESD node}, {PSD LT,ESD sol}, {PSD LT,EV}, {PSD node,ESD LT}, {PSD node,ESD node}, {PSD node,ESD sol}, {PSD node,EV}, {PSD sol,ESD LT}, {PSD sol,ESD node}, {PSD sol,ESD sol}, {PSD sol,EV}, {XV,ESD LT}, {XV,ESD node}, {XV,ESD sol}, {XV,EV}

Figure 17 Fault tree of combined loop for typical vessel, implicit CCF modelling

There are four possible formulas to use (see the article), depending on the conditions:

1) independent components
2) identical and dependent components (β-factor model applied)
3) non-identical and dependent components (β-factor model applied)
4) more complex minimal cuts

When the average $PFD_{MC_j}$ has been calculated for each cut set, the system unavailability can be found by a conservative upper bound approximation:

$$Q_0(t) \approx 1 - \prod_{j=1}^{m} \left(1 - Q_{MC_j}(t)\right)$$

The calculations are documented in Appendix B. The PFD for the conservative FTA was found to be

$$Q_0(t) = 5{,}1436 \cdot 10^{-3}$$
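The difference between a product of averages and an average of a product, which motivates this section, can be illustrated with a simple worked case. It is assumed here, for illustration only, that a minimal cut set consists of two independent components with the same failure rate λ, tested at the same time with interval τ, and that the unavailability of each component within the interval is approximated by q(t) ≈ λt:

```latex
\begin{align*}
\text{Product of averages:}\quad
  & \left(\frac{1}{\tau}\int_0^{\tau} \lambda t \, dt\right)^{2}
    = \left(\frac{\lambda\tau}{2}\right)^{2} = \frac{(\lambda\tau)^{2}}{4} \\
\text{Average of the product:}\quad
  & \frac{1}{\tau}\int_0^{\tau} (\lambda t)^{2} \, dt = \frac{(\lambda\tau)^{2}}{3}
\end{align*}
```

Under these assumptions the product of averages is only 3/4 of the average of the product, i.e. the cut set unavailability is underestimated by 25 %.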

9.4 Sensitivity analysis and importance measures

A sensitivity analysis was performed to evaluate the significance of the model and data characteristics. The gap between the failure rates for the level transmitters, as described in Appendix A, was assumed to be significant for the results. This would mean that the PSD and ESD systems should each be modelled with their specific failure rate value. This assumption was tested by using equal failure rates for the level transmitters. The value from OREDA, $\lambda = 4{,}65 \cdot 10^{-6}\ \text{hours}^{-1}$, is the average value and was thus used as both the PSD and the ESD failure rate for the level transmitters. Equal failure rates with data from OREDA gave a system unavailability equal to

$Q_0(t) = 4{,}1373 \cdot 10^{-3}$ (0,039 % increase from base case result)

The sensitivity analysis with equal OREDA failure rates indicates that the gap between the PSD and ESD failure rates is insignificant for the results. The value itself, however, may not be insignificant. Another failure rate for level transmitters is found in OLF 070 and is equal to $\lambda = 6{,}00 \cdot 10^{-7}\ \text{hours}^{-1}$. A sensitivity analysis was performed to check the difference between the use of the OREDA and OLF 070 failure rates (equal failure rates for PSD and ESD were used due to the results above):

$Q_0(t) = 2{,}1962 \cdot 10^{-3}$ (46,90 % decrease from base case result)

The results show that two data sources can give quite different results, and the choice of data source may be decisive for the decision to be taken.

CARA FaultTree has built-in algorithms for calculating importance measures. The CARA FaultTree report gave the following printout of selected importance measures for the combined loop base case with explicit modelling of common cause failures:

Table 8 Importance measures for combined loop typical, FTA

| Component | Birnbaum's measure | Component | Criticality importance measure | Component | Improvement potential measure |
|---|---|---|---|---|---|
| CCF LT | 9,96E-01 | ESD node | 2,83E-01 | ESD node | 1,17E-03 |
| CCF sol | 9,96E-01 | PSD node | 2,71E-01 | PSD node | 1,12E-03 |
| CCF node | 9,96E-01 | PSD LT | 2,68E-01 | PSD LT | 1,11E-03 |
| CCF valve | 9,96E-01 | ESD LT | 2,48E-01 | ESD LT | 1,03E-03 |
| ESD node | 5,43E-02 | EV | 1,14E-01 | EV | 4,73E-04 |
| ESD LT | 5,43E-02 | XV | 1,09E-01 | XV | 4,53E-04 |
| EV | 5,43E-02 | CCF LT | 9,81E-02 | CCF LT | 4,06E-04 |
| ESD sol | 5,43E-02 | CCF sol | 9,49E-02 | CCF sol | 3,93E-04 |
| PSD node | 5,20E-02 | CCF node | 5,27E-02 | CCF node | 2,18E-04 |
| PSD LT | 5,20E-02 | ESD sol | 5,16E-02 | ESD sol | 2,13E-04 |
| XV | 5,20E-02 | PSD sol | 4,94E-02 | PSD sol | 2,04E-04 |
| PSD sol | 5,20E-02 | CCF valve | 4,22E-02 | CCF valve | 1,74E-04 |

Birnbaum's measure ranks components according to how sensitive the system is to a change in their reliability, which depends on the component's failure data and the location of the component in the system structure. This implies that the components or basic events to which the system reliability is vulnerable, like CCFs, will have a high ranking. Birnbaum's measure for the FTA ranks the CCF basic events first. These may thus be interpreted as the events to which the system reliability is most vulnerable, which is indeed the case since they are common cause failures. Remember from section 5.3.1 that Birnbaum's measure can be written as $h(1_i, p(t)) - h(0_i, p(t))$ for a component i. A numerical value of $9{,}96 \cdot 10^{-1}$, as for the CCF basic events, indicates that the average system reliability is approximately $9{,}96 \cdot 10^{-1}$, since Birnbaum's measure reduces to $h(1_i, p(t))$ when $h(0_i, p(t)) = 0$, as is assumed for a CCF. In practice not all CCFs cause system failure, but in this fault tree they do.

After the CCF events follow the ESD components and then the PSD components. PSD and ESD are modelled similarly in the system structure, but differ in their reliability data for the level transmitter. Birnbaum's measure for a component is a statement about the components other than the one being analysed. Since the reliability data for the node, solenoid and valve are assumed to be the same for the PSD and the ESD system, Birnbaum's measure for the components in the parallel structure will depend on the level transmitter failure rate of the opposite safety system. The branch in the parallel structure with the lowest reliability will therefore give the highest ranking for the opposite branch, due to the largest gap in $h(1_i, p(t)) - h(0_i, p(t))$. Since the PSD level transmitter failure rate is higher than the ESD level transmitter failure rate, the ESD components are ranked higher.

The improvement potential measures the possible improvement in system reliability obtained by assuming that component i is a perfect component, such that $p_i(t) = 1$. The potential is then the difference between $h(1_i, p(t))$ and $h(p(t))$. The improvement potential measure for the FTA ranks the ESD node first, followed by the PSD node, PSD LT and ESD LT. We see that the overall system reliability would be improved by $1{,}17 \cdot 10^{-3}$ if the ESD node were a perfect component. The ranking of this measure indicates which components are most important to improve with respect to the overall system reliability. The ranking can thus be interpreted as showing which of the components are most likely to blame if a failure occurs. Some software tools, like Miriam Regina, therefore call this measure blaming.

The criticality measure describes the probability that component i caused the system failure, given that the system is failed at time t. This is also a type of blaming measure, and we get the same ranking for the criticality importance measure as for the improvement potential, but with other numerical values. We see that if the system has failed, the ESD node is assumed, with a probability of almost p = 0.3, to have caused the failure.

The CCF basic events, to which the system reliability is vulnerable according to Birnbaum's measure, score low on the ranking lists of the blaming measures. This is due to their low failure frequencies. The criticality of the CCFs is high, but the likelihood of their occurrence is low.
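The three importance measures can be reproduced for a simplified version of the combined loop. The sketch below assumes a structure where the system functions if no CCF event occurs and at least one of the PSD and ESD branches functions. The component unavailabilities are hypothetical (the data dossier values are not used), and the criticality importance is taken as Birnbaum's measure multiplied by (1 - p_i) / (1 - h(p)), following the standard textbook definition.

```python
from math import prod

PSD = ["PSD LT", "PSD node", "PSD sol", "XV"]
ESD = ["ESD LT", "ESD node", "ESD sol", "EV"]
CCF = ["CCF LT", "CCF node", "CCF sol", "CCF valve"]


def system_reliability(p):
    """h(p): no CCF event occurs AND (PSD branch OR ESD branch functions)."""
    no_ccf = prod(p[name] for name in CCF)
    psd = prod(p[name] for name in PSD)
    esd = prod(p[name] for name in ESD)
    return no_ccf * (1.0 - (1.0 - psd) * (1.0 - esd))


def with_value(p, name, value):
    """Copy of p where the reliability of one component is replaced."""
    changed = dict(p)
    changed[name] = value
    return changed


def importance_measures(p):
    """Return {name: (Birnbaum, criticality, improvement potential)}."""
    h = system_reliability(p)
    result = {}
    for name in p:
        h1 = system_reliability(with_value(p, name, 1.0))
        h0 = system_reliability(with_value(p, name, 0.0))
        birnbaum = h1 - h0
        criticality = birnbaum * (1.0 - p[name]) / (1.0 - h)
        improvement = h1 - h
        result[name] = (birnbaum, criticality, improvement)
    return result


# Hypothetical component unavailabilities, for illustration only.
q = {
    "PSD LT": 2.0e-2, "PSD node": 2.0e-2, "PSD sol": 4.0e-3, "XV": 9.0e-3,
    "ESD LT": 1.5e-2, "ESD node": 2.0e-2, "ESD sol": 4.0e-3, "EV": 9.0e-3,
    "CCF LT": 3.0e-4, "CCF node": 3.0e-4, "CCF sol": 3.0e-4, "CCF valve": 3.0e-4,
}
p = {name: 1.0 - value for name, value in q.items()}
measures = importance_measures(p)

for name, (b, c, ip) in sorted(measures.items(), key=lambda item: -item[1][0]):
    print(f"{name:10s} Birnbaum={b:.3e} criticality={c:.3e} improvement={ip:.3e}")
```

With these assumed values the CCF events get a Birnbaum measure close to 1, and the branch opposite to the least reliable level transmitter is ranked higher, in line with the reasoning above. The improvement potential equals Birnbaum's measure multiplied by the component unavailability, which explains why the two blaming measures give the same ranking.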

9.5 Uncertainty propagation

The expert level in CARA FaultTree allows the user to perform an uncertainty analysis, which is the same as uncertainty propagation. The variation in the numerical value of λ is modelled by a lognormal distribution in CARA FaultTree. The distribution is defined by introducing an error factor, k, which describes the uncertainty related to the median value, $\lambda_{median}$. The error factor k should be chosen such that the true parameter value, with a probability of 90 %, lies between $\lambda_{median}/k$ and $k \cdot \lambda_{median}$ (Rausand and Høyland 2004). That is,

$$P\left(\frac{\lambda_{median}}{k} < \Lambda < k \cdot \lambda_{median}\right) = 0.90$$

The median of a variable is the value such that, for a large data sample, about 50 % of the data are greater and 50 % are smaller (Sydvest 1999). The median and the mean of a lognormal distribution are not equal, because the lognormal distribution is skewed to the right. The mean will always be greater than the median, $\lambda_{median} < \lambda_{mean}$, since the long right tail of the distribution pulls the mean upwards. Selecting the median value is often difficult since it is usually not presented in reliability data sources. CARA FaultTree solves this problem by weighting the mean such that it represents a possible median value.

It is assumed that the parameter values for the level transmitters are uncertain, for the reasons presented in sections 3.1.2 and 8.2. The sensitivity analysis for different failure rates in section 9.4 indicates that this uncertainty may have a significant impact on the results. The level transmitters are thus the only components to be assigned a lognormal distribution in the fault tree. The error factor k may be found by evaluating confidence intervals in a generic database, like OREDA, see the printout in Table 9 below:

Table 9 OREDA taxonomy (OREDA 2002)

| Taxonomy | Lower | Mean | Upper | SD | n/tau |
|---|---|---|---|---|---|
| 4.2.2 Level transmitter | 1,48E-06 | 4,65E-06 | 9,26E-06 | 2,44E-06 | 4,30E-06 |

The taxonomy is for a standard level transmitter, for critical failure modes during operational time. The confidence bounds presented form a 90 % confidence interval for the mean value. It should be noticed that the interval is presented for the mean, not the median, but it is here assumed that the variation is approximately the same. Hence, we use this variation to estimate the error factor k:

$$\lambda_{lower} = \lambda_{mean} / k: \quad 1{,}48 \cdot 10^{-6} = 4{,}65 \cdot 10^{-6} / k \;\Rightarrow\; k = 3{,}14$$

$$\lambda_{upper} = \lambda_{mean} \cdot k: \quad 9{,}26 \cdot 10^{-6} = 4{,}65 \cdot 10^{-6} \cdot k \;\Rightarrow\; k = 1{,}99$$

A conservative value of k = 3 is therefore selected as error factor for the CCF, ESD and PSD level transmitters. The error factors are implemented in the fault tree.

CARA FaultTree uses Monte Carlo simulation to approximate the distribution of $Q_0(t)$, which is now a random variable due to the random input parameters for the CCF, ESD and PSD level transmitters. For each run, parameter values for the level transmitters are generated from the lognormal distribution, and a $Q_0(t)$ for the TOP event is calculated. The distribution of $Q_0(t)$ is shown in Figure 18:


Figure 18 Uncertainty propagation through CARA FaultTree

The printout from the CARA FaultTree report is shown below:

Simulation results for Q0(t), with 1000 runs:
Mean = 0,00407673
Var = 8,98473e-007
St.dev. = 0,000947878
Minimum = 0,00251559
Maximum = 0,00811804

The range of the results is $5{,}60 \cdot 10^{-3}$. The maximum value is above the conservative approximation obtained in section 9.3. The histogram indicates that the distribution of $Q_0(t)$ is skewed to the right. The unavailability of a component, $q_i(t)$, with a lognormally distributed parameter will also be lognormally distributed, since the lognormal family is closed under multiplication by the deterministic factors in the calculation formulas. The skewness of the $q_i(t)$ carries over to the results, and we therefore also obtain a skewed distribution of $Q_0(t)$. The standard deviation is a measure of the spread of the values of a stochastic variable, in this case the system unavailability, the PFD. The results from the FTA show that the standard deviation is approximately $1{,}0 \cdot 10^{-3}$. The smaller the standard deviation, the more certain we are about the value of the stochastic variable under the given conditions.
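The principle of the uncertainty propagation in this section can be sketched with a few lines of Monte Carlo code. The sketch is not a reproduction of CARA FaultTree: the conversion from mean to median uses the lognormal relation median = mean · exp(-σ²/2), which is an assumption about how the weighting could be done, and the top event is a hypothetical two-branch loop with an assumed unavailability for the rest of each branch. Only the OREDA mean and the error factor k = 3 are taken from the text.

```python
import math
import random
import statistics

Z95 = statistics.NormalDist().inv_cdf(0.95)  # 95th percentile of N(0, 1)


def lognormal_from_mean(mean, error_factor):
    """Return (mu, sigma) of ln(lambda) for a lognormal with the given mean
    and a 90 % interval [median / k, median * k]."""
    sigma = math.log(error_factor) / Z95
    median = mean * math.exp(-0.5 * sigma ** 2)
    return math.log(median), sigma


def toy_top_event(lam_lt, tau, q_rest):
    """Hypothetical loop: a branch fails if its level transmitter or the rest
    of the branch fails, and both (identical) branches must fail."""
    q_lt = lam_lt * tau / 2.0
    q_branch = 1.0 - (1.0 - q_lt) * (1.0 - q_rest)
    return q_branch * q_branch


MEAN_LAMBDA = 4.65e-6   # OREDA mean for level transmitters (Table 9), per hour
K = 3.0                 # error factor selected in the text
TAU = 8760.0            # test interval in hours
Q_REST = 0.03           # hypothetical unavailability of the rest of a branch
RUNS = 20000

rng = random.Random(2009)
mu, sigma = lognormal_from_mean(MEAN_LAMBDA, K)
lambdas = [rng.lognormvariate(mu, sigma) for _ in range(RUNS)]
q0_samples = [toy_top_event(lam, TAU, Q_REST) for lam in lambdas]

median = math.exp(mu)
inside = sum(median / K < lam < median * K for lam in lambdas) / RUNS
print(f"share of lambda within [median/k, median*k]: {inside:.3f}")
print(f"Q0 mean   = {statistics.fmean(q0_samples):.3e}")
print(f"Q0 st.dev = {statistics.stdev(q0_samples):.3e}")
```

About 90 % of the sampled failure rates fall within the interval defined by the error factor, and the resulting sample of Q0 values can be summarised with the same statistics as in the printout above.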


10 SIMULATION

Simulation is a powerful tool for analysing phenomena with high complexity, due to its flexibility. This chapter presents the reliability assessment results for the high level protection system obtained by simulation.

10.1 ExtendSim software

The simulation software used for this assessment is ExtendSim7, developed by Imagine That. DNV Energy employs this simulation software for RAM analyses. The simulation is based on Monte Carlo sampling techniques. An advantage of ExtendSim is the modelling with graphical components, which makes the model easy to understand intuitively. Dynamic data can be linked to external sources, which is suitable when modelling uncertainty in numerical parameter values. ExtendSim also has built-in databases for complex models for several application areas, from project planning to reliability assessments (Imagine That Inc. 2009). In addition, the ExtendSim customer may upload their own libraries. The author was given access to DNV's own confidential reliability library, which is not for public use.

10.2 PFD calculations

The different function blocks used in this assessment are illustrated in Figure 19. The histogram block receives the input values and presents them as a histogram. The case study block is used as the link between an Excel file and Extend. A base case is established in Excel and linked to the Extend model. The Excel file eases the identification and modification of data for predefined blocks, and may also be used for receiving data from the Extend model. The component blocks are used when modelling components with failure data. In this block one may select several properties linked to the reliability, like mean time to repair, lifetime distribution, etc. The function block may be used either for predefined functions, such as max or min of the inputs, or for user specified functions. The sets block is applied for calculating importance measures. Input Random Number generates random numbers according to a selected distribution or user specified function. Mean and variance calculates statistical properties of the received input values, like mean and confidence intervals. The Executive block runs the simulation in discrete time steps.

Figure 19 Extend blocks

A model of the high level protection system is built in ExtendSim as shown in Figure 20. The components, with id tag below the block symbol, are modelled as binary: they either function or not. This also means that they are given one out of two possible values. In this case, the value 1 is used if the component functions, and 0 if it is in a failed state. As for the fault tree, no repair is assumed within the test interval of 8760 hours. The simulation is performed for independent test intervals a specified number of times. The PFD is then the average unavailability of the system over the simulation runs.

The state variable X(t) denotes the state of a safety system at time t, where X(t) = 0 is the state where the safety system is not able to work as a safety barrier and X(t) = 1 is the state where the safety system works as a safety barrier (Rausand and Høyland 2004). The state of a component i is denoted $X_i(t)$. The function blocks decide whether the input components are modelled as a series or a parallel structure. The function “min” corresponds to a series structure, since the first component failure leads to failure of the whole structure, X(t) = 0. The function “max” corresponds to a parallel structure, since the structure functions until the last component fails, that is, until $X_i(t) = 0$ for all the n components in the structure, i = 1, 2, …, n.
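The min and max logic can be illustrated outside ExtendSim with a small Monte Carlo sketch. It assumes exponentially distributed lifetimes, no repair within the test interval, two series branches in parallel and one common cause event in series with them. Because no repair is assumed, taking the minimum or maximum of the component states is equivalent to taking the minimum or maximum of the failure times. The failure rates are hypothetical and chosen high enough for the estimate to stabilise with few runs; none of the names below are ExtendSim functions.

```python
import random

TAU = 8760.0  # test interval in hours, no repair within the interval


def simulate_pfd(rates_psd, rates_esd, rates_ccf, runs, seed=0):
    """Estimate the PFD as the mean time-weighted downtime per test interval."""
    rng = random.Random(seed)
    total_downtime = 0.0
    for _ in range(runs):
        # Series structure ("min"): the branch fails at the first failure.
        t_psd = min(rng.expovariate(lam) for lam in rates_psd)
        t_esd = min(rng.expovariate(lam) for lam in rates_esd)
        # Parallel structure ("max"): the loop fails when both branches failed.
        t_loop = max(t_psd, t_esd)
        # Common cause failures are in series with the combined loop.
        t_ccf = min(rng.expovariate(lam) for lam in rates_ccf)
        t_system = min(t_loop, t_ccf)
        # Fraction of the test interval spent in the failed state.
        total_downtime += max(0.0, TAU - t_system) / TAU
    return total_downtime / runs


# Hypothetical failure rates per hour, for illustration only.
RATES_PSD = [5.0e-6, 5.0e-6, 2.0e-6]
RATES_ESD = [5.0e-6, 5.0e-6, 2.0e-6]
RATES_CCF = [2.0e-7]

pfd_estimate = simulate_pfd(RATES_PSD, RATES_ESD, RATES_CCF, runs=50000)
print(f"estimated PFD: {pfd_estimate:.3e}")
```

With the low failure rates of real SIS components, most runs contain no failure at all, which is why a much higher number of runs is needed before the estimate settles.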

Figure 20 High level protection system modelled in Extend

The PSD and ESD structures are each modelled as a series structure, and the combined loop is modelled as a parallel structure. The common cause failures are modelled explicitly as a series structure in series with the combined loop. This is due to the assumption that if a CCF occurs, the system will also fail. The function block “Results” receives the minimum state variable from each branch, since a failure of one of these branches will cause system failure, X(t) = 0. The system state variable in “Results” is time weighted such that the mean uptime of the system during a test interval can be calculated. The function “Downtime” then calculates the downtime as “Downtime” = 1 - (output from “Results”). The “Stat” block calculates the mean and confidence bounds for the downtime during a test interval. The failure data used are collected from the data dossier in Appendix A and updated in the linked Excel sheet.

Simulation generates random numbers from probability distributions. It is therefore important to run the model a sufficient number of times to achieve convergence and thus stable results. NASA (2002) presented the following five step technique for increasing the precision of simulation results, which was also used for the simulation in this report:

1. Set the number of iterations to at least 5000 and run the simulation.
2. Record the statistics for: a. mean; b. standard deviation; c. 5th percentile; d. median (50th percentile); and e. 95th percentile.
3. Perform additional simulations by increasing the number of iterations by increments of at least 1000.
4. Monitor the change in the above statistics.
5. Stop if the average change for each statistic (in two consecutive simulations) is less than 1.5 %.

The system analysed is comprised of components with very low failure rates. A high number of runs is therefore needed to stabilise the output. It was found that about 750 000 runs were necessary to stabilise the results. Due to the high number of runs needed, step 3 was performed roughly, by increasing the number of runs by 100 000 each time. The change in the estimated PFD was then 0,76 %. The printout from Excel is converted into unavailability by setting unavailability = 1 - estimated uptime. The results are listed in Table 10:

Table 10 Extend results for combined loop

| Block | Unavailability |
|---|---|
| CCF | 1,2124E-03 |
| PSD | 5,4050E-02 |
| ESD | 5,2085E-02 |
| parallel | 3,7158E-03 |
| Results | 4,9235E-03 |
| 95 % confidence interval, lower limit | 4,8066E-03 |
| 95 % confidence interval, upper limit | 5,0404E-03 |


We see from this that the estimated PFD for the base case, the combined loop, is

$Q_0(t) = 4{,}9235 \cdot 10^{-3}$

This value is treated as the base case result throughout this chapter. It should also be noticed that the PSD block has the highest unavailability, followed by the ESD block, the parallel structure and finally the CCF block, which has the lowest.
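The five step stopping rule used in this section can be sketched as a loop around any simulation model. In the sketch below, the stopping criterion is interpreted as a relative change below 1.5 % for every statistic between two consecutive simulations, which is an assumption about how step 5 is meant to be read. The `toy_simulation` function is a hypothetical stand-in that returns lognormal outputs; it is not the ExtendSim model.

```python
import random
import statistics


def summary(samples):
    """Statistics listed in step 2 of the procedure."""
    cuts = statistics.quantiles(samples, n=20)  # cut points at 5 %, ..., 95 %
    return {
        "mean": statistics.fmean(samples),
        "stdev": statistics.stdev(samples),
        "p05": cuts[0],
        "median": cuts[9],
        "p95": cuts[18],
    }


def run_until_stable(simulate, start=5000, step=1000, tolerance=0.015,
                     max_iterations=200000):
    """Increase the number of iterations until every statistic changes by
    less than `tolerance` (relative) between two consecutive simulations.
    Assumes that all statistics are non-zero."""
    n = start
    previous = summary(simulate(n))
    while n + step <= max_iterations:
        n += step
        current = summary(simulate(n))
        changes = [abs(current[key] - previous[key]) / abs(previous[key])
                   for key in current]
        if max(changes) < tolerance:
            return n, current
        previous = current
    raise RuntimeError("statistics did not stabilise within max_iterations")


def toy_simulation(n):
    """Hypothetical stand-in for the real model: n lognormal outputs."""
    rng = random.Random(n)  # a fresh, reproducible stream for each run size
    return [rng.lognormvariate(-5.5, 0.25) for _ in range(n)]


n_used, stats = run_until_stable(toy_simulation)
print(f"stopped after {n_used} iterations")
print({key: f"{value:.3e}" for key, value in stats.items()})
```

For a model where most runs give zero downtime, the lower percentiles are zero and a relative change cannot be computed for them; in such cases the rule has to be applied to the statistics that are non-zero, such as the mean.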

10.3 Sensitivity analysis and importance measures

As for the FTA, a sensitivity analysis was performed to see whether there was a significant difference from the base case results when using equal failure rates for the PSD and ESD systems. The failure rate used was gathered from OREDA and was equal to $\lambda = 4{,}65 \cdot 10^{-6}\ \text{hours}^{-1}$, see Appendix A.

$Q_0(t) = 4{,}9042 \cdot 10^{-3}$ (0,392 % decrease from the base case result)

Then, a sensitivity analysis was performed for the OLF 070 failure rate, $\lambda = 6{,}00 \cdot 10^{-7}\ \text{hours}^{-1}$, see Appendix A. The estimated unavailability for the base case with the OLF 070 failure rate is

$Q_0(t) = 2{,}5856 \cdot 10^{-3}$ (47,48 % decrease from base case result)

As for the FTA, the OLF 070 failure rate approximately halves the numerical value from the base case.

The block “Sets” calculates different importance measures and is linked to the output value from “Results”, which takes the value 0 or 1, see Figure 20. The calculations are done by continuously comparing the states of the different blocks and monitoring what a change in reliability for one block means for the system reliability. The two importance measures chosen are among those selected from CARA FaultTree, in order to make the results more comparable. A printout in Excel with importance measures for the base case is shown in Table 11:

Table 11 Importance measures for combined loop typical, simulation

Birnbaum’s measure           Improvement potential
CCF sol      9,95E-01        ESD node     1,52E-03
CCF LT       9,95E-01        PSD node     1,51E-03
CCF node     9,95E-01        PSD LT       1,47E-03
CCF valve    9,95E-01        ESD LT       1,37E-03
ESD LT       7,24E-02        EV           6,02E-04
ESD sol      7,14E-02        XV           5,67E-04
ESD node     7,05E-02        CCF sol      4,36E-04
PSD LT       6,97E-02        CCF LT       4,10E-04
PSD node     6,95E-02        ESD sol      2,89E-04
EV           6,89E-02        PSD sol      2,49E-04
XV           6,55E-02        CCF node     2,26E-04
PSD sol      6,30E-02        CCF valve    1,67E-04

According to the Birnbaum’s measure results from the simulation, the CCF LT, solenoid, node and valve are the most important components. This is almost the same as the FTA results, except that CCF LT and solenoid have swapped rank. Simulation does not give the same results as the FTA because the formula for Q0(t) is stochastic rather than deterministic when simulation is applied. Hence, Birnbaum’s measure is also stochastic when using simulation. FTA gives the same ranking for the same data every time because its formulas are deterministic. A high number of runs ensures that the simulation results converge towards a limit, but some randomness from the Monte Carlo sampling will remain. We therefore get small differences in the ranking of components that are similar in data and location in the system structure. The interpretation of the results can be found in section 9.4.

The improvement potential has the same top ranked components as the FTA. This indicates that the same components are considered the most important to improve with regard to the overall system reliability. For the same reasons as described for Birnbaum’s measure above, the ranking of components is somewhat stochastic because of the Monte Carlo sampling. The interpretation of the results can be found in section 9.4.
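The relation between the two importance measures can be illustrated with a small deterministic sketch. It assumes a simplified structure in which the system fails on demand if any CCF event occurs or if both the PSD loop and the ESD loop fail, with independent basic events. The structure, the function names and all unavailabilities below are hypothetical illustration values, not the case study model or its data.

```python
from math import prod

# Hypothetical basic-event unavailabilities (NOT the case study data).
LOOPS = {
    "PSD": {"PSD LT": 0.020, "PSD node": 0.022, "PSD sol": 0.004, "XV": 0.009},
    "ESD": {"ESD LT": 0.019, "ESD node": 0.022, "ESD sol": 0.004, "EV": 0.009},
}
CCF = {"CCF LT": 4.0e-4, "CCF node": 2.0e-4, "CCF sol": 4.0e-4, "CCF valve": 2.0e-4}


def system_unavailability(q):
    """Q0 for the assumed structure: any CCF event OR (PSD loop AND ESD loop fail)."""
    # Each loop is a series structure: it fails if any of its components fails.
    loop_fail = [1.0 - prod(1.0 - q[name] for name in loop) for loop in LOOPS.values()]
    no_ccf = prod(1.0 - q[name] for name in CCF)
    return 1.0 - no_ccf * (1.0 - prod(loop_fail))


def importance(q):
    """Return {event: (Birnbaum's measure, improvement potential)}."""
    base = system_unavailability(q)
    measures = {}
    for name in q:
        failed = system_unavailability({**q, name: 1.0})   # event certain to occur
        perfect = system_unavailability({**q, name: 0.0})  # event never occurs
        measures[name] = (failed - perfect, base - perfect)
    return measures


q_all = {**LOOPS["PSD"], **LOOPS["ESD"], **CCF}
measures = importance(q_all)
for name, (birnbaum, potential) in sorted(measures.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:10s} Birnbaum {birnbaum:.3e}   improvement potential {potential:.3e}")
```

With independent basic events the improvement potential equals Birnbaum’s measure multiplied by the unavailability of the event itself. This is why events with a Birnbaum’s measure close to 1 can still have a small improvement potential when their own probability of occurrence is low.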

10.4 Uncertainty propagation

The uncertainty propagation is performed by generating values for λ from a gamma distribution and then using the generated values as input for the mean time to failure (MTTF). The model is updated with blocks for random input values and a histogram for plotting the results, see Figure 21 below. The “Rnd1” block generates a failure rate from a defined gamma distribution. The function block “MTTF” transforms the failure rate into an MTTF. Since the lifetime is assumed to be exponentially distributed, the MTTF is found by MTTF = 1/λ. The MTTF is used as input parameter to “Rnd2”, and a failure time for the level transmitters can then be drawn from an exponential distribution with the generated MTTF.


Figure 21 Extend model for uncertainty propagation

The pdf for the gamma distribution can be written as

$$ f(\lambda, \alpha, \beta) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\,\lambda^{\alpha-1}\,e^{-\lambda/\beta}, $$

where α is the shape parameter and β the scale parameter. The mean is E[λ] = μ = α·β and the variance is Var[λ] = σ² = α·β². G. Rausand (2005) showed that integrating the product of the two pdfs, the gamma distribution for the parameter λ and the exponential distribution for the lifetime, over λ gives the marginal distribution of the lifetime t. This is the lifetime distribution we get when repeatedly generating a value λ from the gamma distribution and then generating a value from the exponential distribution (G. Rausand 2005). The marginal distribution of t in this case is found to be a special version of the Pareto distribution:

$$ f_T(t) = \frac{\alpha\,\beta^{-\alpha}}{(\beta^{-1} + t)^{\alpha+1}} $$

Instead of drawing values from a gamma distribution and then from an exponential distribution, we could draw directly from the marginal distribution and get the same results. However, because of the built-in distributions in Extend, it is more traceable to use the gamma and exponential distributions. The parameters can be found from

1) E[λ] = μ = α·β
2) Var[λ] = σ² = α·β²

From 2) we get $\beta = \sigma / \sqrt{\alpha}$. Putting this into 1) gives

$$ \mu = \sqrt{\alpha}\,\sigma \;\Rightarrow\; \alpha = \left(\frac{\mu}{\sigma}\right)^{2} $$
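The parameter fit and the two-stage draw can be sketched in a few lines of Python. This is only an illustration: the mean and standard deviation are hypothetical round numbers, not the OREDA values, and the function names are invented for this example. The survival function (1 + βt)^(−α) used for comparison follows from integrating the marginal pdf above from t to infinity.

```python
import random


def gamma_parameters(mean, std):
    """Method of moments for a gamma distribution with scale parameter beta."""
    alpha = (mean / std) ** 2   # from mu = sqrt(alpha) * sigma
    beta = std ** 2 / mean      # equivalent to sigma / sqrt(alpha)
    return alpha, beta


def draw_lifetime(alpha, beta, rng):
    """Two-stage draw: failure rate from the gamma, lifetime from the exponential."""
    failure_rate = rng.gammavariate(alpha, beta)  # second argument is the scale
    return rng.expovariate(failure_rate)          # mean lifetime is 1 / failure_rate


def marginal_survival(t, alpha, beta):
    """P(T > t) obtained by integrating the marginal pdf from t to infinity."""
    return (1.0 + beta * t) ** (-alpha)


# Hypothetical inputs, chosen only to give round parameter values.
MEAN_RATE = 5.0e-6   # failures per hour
STD_RATE = 2.5e-6    # failures per hour
alpha, beta = gamma_parameters(MEAN_RATE, STD_RATE)

rng = random.Random(2009)
horizon = 100_000.0  # hours
n_draws = 200_000
survivors = sum(1 for _ in range(n_draws) if draw_lifetime(alpha, beta, rng) > horizon)
empirical = survivors / n_draws
analytic = marginal_survival(horizon, alpha, beta)

print(f"alpha = {alpha:.2f}, beta = {beta:.3e}")
print(f"P(T > {horizon:.0f} h): two-stage {empirical:.4f}, marginal {analytic:.4f}")
```

The two probabilities should agree up to sampling noise, which illustrates that drawing from the gamma and then the exponential distribution is equivalent to drawing directly from the marginal distribution.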

The mean and standard deviation are found in OREDA, see Appendix A and Table 9 in section 9.5. The standard deviation and mean are the values from OREDA multiplied by the given beta-factor. This gives the following parameters for the gamma distribution for the level transmitters:

Table 12 Parameters for gamma distribution

Parameter    PSD         ESD         CCF
Shape α      4,08        3,21        3,63
Scale β      1,21E-06    1,36E-06    2,56E-08

The statistical quantities from this analysis are listed below, and the results from the uncertainty propagation runs are plotted in a histogram, which is shown in Figure 22.

Q0(t)                    = 4,8155 ∙ 10⁻³
Lower confidence bound   = 4,6998 ∙ 10⁻³
Upper confidence bound   = 4,9313 ∙ 10⁻³
Standard deviation       = 1,1490 ∙ 10⁻³

Figure 22 Uncertainty propagation plot for Extend simulation


The results have decreased slightly compared to the base case results presented in section 10.2. This may be due to the skewness of the gamma distribution with the given parameters.


11 DISCUSSION

This chapter discusses the uncertainty in reliability assessments in light of the results from the case study. The results are compared and each contribution to uncertainty is discussed.

11.1 Comparison of results

The main results from chapters 9 and 10 are presented in Table 13.

Table 13 Comparison of results

Estimated Probability of Failure on Demand (PFD)
                                    FTA                                Simulation
Base case                           Q0(t) = 4,1357 ∙ 10⁻³              Q0(t) = 4,9235 ∙ 10⁻³
Conservative                        Q0(t) = 5,1436 ∙ 10⁻³

Sensitivity analyses for base case
Similar failure rates               Q0(t) = 4,1373 ∙ 10⁻³              Q0(t) = 4,9042 ∙ 10⁻³
OLF similar failure rates           Q0(t) = 2,1962 ∙ 10⁻³              Q0(t) = 2,5856 ∙ 10⁻³

Uncertainty propagation for base case
Mean                                Q0(t) = 4,07673 ∙ 10⁻³             Q0(t) = 4,8155 ∙ 10⁻³
Standard deviation                  9,47878 ∙ 10⁻⁴                     1,1490 ∙ 10⁻³
95% confidence interval                                                [4,6998 ∙ 10⁻³, 4,9313 ∙ 10⁻³]
Min, max values from simulation     [2,5156 ∙ 10⁻³, 8,1180 ∙ 10⁻³]

The results are ranked as expected. The base case results show that the calculations performed in CARA FaultTree give the lowest unavailability, which is due to the non-conservative assumption of independent basic events. Simulation should give the most realistic estimate of the system unavailability since no approximations are used; the lifetimes are drawn directly from the assumed lifetime distributions. The unavailability from simulation is higher than the unavailability estimated by CARA FaultTree and lower than the unavailability estimated by the conservative approximation formulas. It is interesting to notice that the simulation result is significantly closer to the conservative approximation (0,2201 ∙ 10⁻³ difference) than to the unavailability estimated by CARA FaultTree (0,7878 ∙ 10⁻³ difference). These results indicate that the conservative approximation formulas are good, and not too conservative.

Another interesting aspect of the base case study is that none of the results gave compliance with the SIL 3 requirement for the combined loop. Even though the fault tree model of the base case used in this report was exactly the same as the one used in the confidential compliance report, the results were not equal because different data were used. The PSD and ESD single loops should also satisfy a SIL 2 requirement. The simulation results from Table 10 show that this requirement was not met for either the PSD or the ESD system. The data uncertainty problem is further discussed in section 11.4.

The architectural constraints for the case study have so far not been treated. The architectural constraints tables from IEC 61508-2, 7.4.3.1.2, for type A and B systems are shown below.

Table 14 Architectural Constraints on Type A and B systems (IEC 61508 1997)

                         Type A Systems                Type B Systems
Safe Failure Fraction    Hardware Fault Tolerance (HWFT)
(SFF)                    0        1        2           0              1        2
< 60 %                   SIL 1    SIL 2    SIL 3       Not allowed    SIL 1    SIL 2
60 % - 90 %              SIL 2    SIL 3    SIL 4       SIL 1          SIL 2    SIL 3
90 % - 99 %              SIL 3    SIL 4    SIL 4       SIL 2          SIL 3    SIL 4
> 99 %                   SIL 3    SIL 4    SIL 4       SIL 3          SIL 4    SIL 4
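As an illustration, Table 14 can be encoded as a small lookup that returns the highest SIL that may be claimed for a subsystem. The function and variable names are hypothetical, and the treatment of values lying exactly on a band boundary (60 %, 90 %, 99 %) is an assumption of this sketch.

```python
# Highest SIL that can be claimed, per SFF band (rows) and HWFT 0, 1, 2 (columns).
# Values transcribed from Table 14; None stands for "Not allowed".
ARCHITECTURAL_CONSTRAINTS = {
    "A": [(1, 2, 3), (2, 3, 4), (3, 4, 4), (3, 4, 4)],
    "B": [(None, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 4)],
}


def sff_band(sff_percent):
    """Row index in Table 14. Boundary values go to the higher band (assumption)."""
    if sff_percent < 60:
        return 0
    if sff_percent < 90:
        return 1
    if sff_percent < 99:
        return 2
    return 3


def max_sil(system_type, sff_percent, hwft):
    """Highest claimable SIL for a type 'A' or 'B' subsystem, or None if not allowed."""
    if hwft not in (0, 1, 2):
        raise ValueError("Table 14 only covers HWFT 0, 1 and 2")
    return ARCHITECTURAL_CONSTRAINTS[system_type][sff_band(sff_percent)][hwft]


# A 1oo1 architecture has HWFT = 0.
print(max_sil("A", 75, 0))  # type A, SFF between 60 % and 90 %
print(max_sil("B", 75, 0))  # type B, same SFF
print(max_sil("B", 50, 0))  # type B, SFF below 60 %
```

For a 1oo1 subsystem the lookup shows directly that SIL 2 can only be claimed with an SFF of at least 60 % for type A and at least 90 % for type B.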

The compliance report classified all the nodes (instrumentation) and some of the level transmitters (radar for example) as type B systems. The PSD and ESD systems should each satisfy a SIL 2 requirement, and their subsystems (level transmitters, node, solenoid valve and valve) are designed in a 1oo1 architecture, thus HWFT = 0. According to Table 14 they should have an SFF of at least 60 % if they are type A systems and an SFF of at least 90 % if they are type B systems. According to the data in the data dossier in Appendix A, neither the node nor the level transmitters, which are assumed to be type B systems, satisfy these criteria. The high level protection system does therefore not comply with the semi-quantitative requirements either.

The sensitivity analyses showed that the difference in numerical values for the PSD and ESD level transmitters as modelled in the base case was insignificant for the results. An assumption of equal failure rates, as for the rest of the functions in the system, would be sufficient. The sensitivity analyses also showed that if the failure rate from OLF 070 was used, the unavailability would be reduced by approximately 50 % compared to the base case. The OLF 070 failure rate could have been selected as the base case failure rate for the level transmitters in these reliability assessments. The results from the sensitivity analyses therefore show that the selection of data can easily be the difference between compliance and non-compliance.

Birnbaum’s measure ranks the CCF basic events first, both for the FTA and the simulation model. According to Birnbaum’s measure, the system reliability is most vulnerable to the top ranked events, which means that the occurrence of these events causes the greatest change in system reliability. Since they are CCFs, this seems logical. Birnbaum’s measure for the deterministic FTA further ranked the ESD and the PSD components with quite similar results. Birnbaum’s measure for the simulation model, where Q0(t) is stochastic, ranked the PSD and ESD components in a mixed order. This is due to the similar results between the loops and the randomness from Monte Carlo simulation.

The ESD node, PSD node, PSD LT and ESD LT are ranked highest for the improvement potential measure, both for the FTA and the simulation. The rank order of this measure indicates which components are most important to improve with respect to the overall system reliability. Both analysis techniques therefore see these components as the most likely components to blame if a failure occurs. Some software tools, like Miriam Regina, therefore call this measure blaming. Another blaming measure is the criticality measure, which was only calculated for the FTA. The criticality measure ranked the components in the same order as the improvement potential measure, but with other numerical results. We see that the CCF basic events, which the system reliability is vulnerable to according to Birnbaum’s measure, score low on the blaming measures’ ranking lists. This is due to the low failure rates for CCF. The criticality of CCFs for the high level system is high, but the likelihood of their occurrence is low.

Uncertainty propagation in CARA FaultTree was performed correctly and gave a lower unavailability than the base case. It may be theoretically proven, due to the skew lognormal distribution for the failure rates, that for a system of non-repairable components the mean value of Q0(t) (as estimated by the uncertainty analysis) is always below the value of Q0(t) obtained by computation using only the mean values of the failure rates (Sydvest 1999). The plot indicated a skew distribution of Q0(t), which is due to the skew lognormally distributed failure rates. The unavailability results from the uncertainty propagation spanned a range of 5,60 · 10⁻³, but the lowest value still did not comply with the SIL 3 requirement. The standard deviation was found to be approximately 1 · 10⁻³. We can expect that most of the results due to data uncertainty alone will be within mean ± 2 standard deviations, that is a range of approximately 4 · 10⁻³.

The uncertainty propagation in Extend turned out to be a bit problematic, and is further discussed in section 11.1.1 below.
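The mean ± 2 standard deviations argument can be checked against the SIL bands with a short sketch. The PFD bands are the low demand mode bands of IEC 61508, and the numbers are the FTA uncertainty propagation results from Table 13. The function name and the convention of returning 0 when no SIL is achieved are choices made for this illustration only.

```python
def sil_from_pfd(pfd):
    """SIL band for an average PFD in low demand mode (0 means no SIL achieved)."""
    if pfd >= 1e-1:
        return 0
    if pfd >= 1e-2:
        return 1
    if pfd >= 1e-3:
        return 2
    if pfd >= 1e-4:
        return 3
    return 4


# FTA uncertainty propagation results from Table 13.
MEAN = 4.07673e-3
STD = 9.47878e-4
MIN_VALUE, MAX_VALUE = 2.5156e-3, 8.1180e-3

low = MEAN - 2 * STD
high = MEAN + 2 * STD

print(f"mean - 2 sd = {low:.4e}  -> SIL {sil_from_pfd(low)}")
print(f"mean + 2 sd = {high:.4e}  -> SIL {sil_from_pfd(high)}")
print(f"width of the interval = {high - low:.2e}")
print(f"min value {MIN_VALUE:.4e} -> SIL {sil_from_pfd(MIN_VALUE)}")
```

Both ends of the interval, as well as the lowest value observed in the propagation, stay above the SIL 3 limit of 10⁻³, which is consistent with the conclusion that the data uncertainty alone does not bring the combined loop into compliance.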

11.1.1 Problems with the uncertainty propagation in Extend

The uncertainty propagation in Extend was performed according to the description in section 10.4 and modelled as illustrated in Figure 21. It was later found that this way of performing uncertainty propagation is not right. The reason is that the simulation process itself produces uncertainty due to the random Monte Carlo sampling. It is therefore impossible to sort out from the results which uncertainty is caused by the data and which is caused by the simulation. Since we do not know how these uncertainties interfere with each other, it is hard to draw any valid conclusion from these results. Many analysts today still use the incorrect assessment of the uncertainty distribution, as performed in this case study.

It is nevertheless interesting to notice that the results from uncertainty propagation, both for the FTA (performed correctly) and the simulation, are quite similar relative to the base case results. Both mean values from the propagation are lower than the corresponding base case values. The standard deviations are also quite similar. The standard deviation for simulation is higher than the standard deviation from FTA. This is as expected, since the standard deviation from simulation also includes the uncertainty from the Monte Carlo simulation.

If uncertainty propagation should be performed with Extend, one would have to
1. Simulate a set of failure rates from the uncertainty distributions for each parameter
2. Simulate the system unavailability with the simulated data set a sufficient number of times to obtain stable results
3. Repeat steps 1) and 2) a number of times in order to get the uncertainty distribution for the data uncertainty (the spread of stable results with stochastic failure rates). This number is often selected to be about 1000 runs.

The problem with performing this procedure in Extend is that the low failure frequencies and the system structure demand about 750 000 runs for each system simulation in order to get stable results. This would demand 750 000 · 1000 = 750 000 000 runs in order to get the uncertainty distribution for the data. This is a very time consuming and unnecessarily complicated process.

Uncertainty propagation is far easier in CARA FaultTree. This is because the calculation formulas for Q0(t) are deterministic, and thus a predefined set of failure rates gives only one answer. CARA FaultTree only uses Monte Carlo sampling when simulating a set of failure rates for the components assumed to have uncertain data. This is done 1000 times, and then an uncertainty distribution for Q0(t) is estimated. Helge Hellebust showed that a mean and a standard deviation for an uncertainty distribution for Q0(t) in a fault tree may be calculated using only analytical methods (Hellebust 1989). Analytical methods for uncertainty propagation are often less time consuming. In this case study, uncertainty propagation is found to be poorly suited for a simulation model.
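The difference between the nested procedure in steps 1 to 3 and propagation through a deterministic formula can be sketched as follows. The sketch is a strong simplification: it uses a single non-repairable component with unavailability 1 − exp(−λt) instead of the system model, and the gamma parameters, time horizon, run counts and function names are all hypothetical.

```python
import math
import random
import statistics


def simulated_unavailability(failure_rate, t, n_inner, rng):
    """Step 2: fraction of simulated lifetimes that end before time t."""
    failures = sum(1 for _ in range(n_inner) if rng.expovariate(failure_rate) < t)
    return failures / n_inner


def exact_unavailability(failure_rate, t):
    """Deterministic counterpart for one non-repairable component."""
    return 1.0 - math.exp(-failure_rate * t)


def propagate(alpha, beta, t, n_outer, seed, n_inner=None):
    """Outer loop over sampled failure rates; inner simulation only if n_inner is set."""
    rate_rng = random.Random(seed)       # used for step 1
    life_rng = random.Random(seed + 1)   # used for step 2
    results = []
    for _ in range(n_outer):             # step 3: repeat steps 1 and 2
        failure_rate = rate_rng.gammavariate(alpha, beta)  # step 1
        if n_inner is None:
            results.append(exact_unavailability(failure_rate, t))
        else:
            results.append(simulated_unavailability(failure_rate, t, n_inner, life_rng))
    return results


ALPHA, BETA, T = 4.0, 1.25e-6, 8760.0   # hypothetical values
deterministic = propagate(ALPHA, BETA, T, n_outer=200, seed=1)
nested = propagate(ALPHA, BETA, T, n_outer=200, seed=1, n_inner=500)

print("std, deterministic formula:", statistics.stdev(deterministic))
print("std, nested simulation:    ", statistics.stdev(nested))

# Run count quoted in the text for the full nested procedure.
runs_quoted = 750_000 * 1_000
print("runs needed in the nested procedure:", runs_quoted)
```

Both calls use the same sampled failure rates, so any difference between the two result sets comes from the sampling noise of the inner simulation. With too few inner runs this noise is mixed into the spread that is supposed to represent the data uncertainty, which is the problem described above.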

11.2 Completeness uncertainty

As described in section 6.2.4, lack of knowledge or experience of the technical system should be taken into consideration when performing reliability assessments, since it may cause relevant failure modes to be overlooked. Since the data used for these assessments are generic, it is difficult to trace whether relevant failure modes have been excluded. It should be noticed, though, that the data sheets in OREDA often contain a failure mode classification called “unknown”, see Table 6. This gives reason to believe that there may often be critical failure modes that are not accounted for in reliability assessments. The only OREDA data used in the reliability assessments were for the level transmitters from taxonomy 4.2.2, where no critical failure modes were classified as unknown. The rest of the data were found in OLF 070, where failure modes are not documented since the data are mostly collected from other databases. It is difficult to estimate the unknown type of completeness uncertainty, like unknown failure modes. We have to trust that the assessments include the most important and significant aspects of the system, but the conservative approximation may compensate for possible forgotten or unknown elements.


Further, the scope of the assessment defines what should be included in or excluded from the assessment. Reliability databases, like OREDA, have clear boundary definitions. The incompleteness in the data due to scope definition is therefore assumed to be insignificant.

The blaming measures rank the PSD and ESD nodes and level transmitters highest. These components are also classified as type B components, which are considered to be the most uncertain components with regard to completeness uncertainty. Since the components that are most likely to cause system failure are also associated with completeness uncertainty, effort should be made to improve the reliability of these components.

The reliability assessment modelling comprised only one typical vessel. The level of detail could have been higher, with modelling and exact data for all four PSD and ESD systems. Some completeness uncertainty may therefore exist due to the level of detail in the assessment. The sensitivity analyses for similar failure rates indicated that the small difference in failure rates between the PSD and ESD systems was insignificant for the results. Hence, the completeness uncertainty due to the modelling of a typical vessel instead of all four seems to be very small. This uncertainty may also be interpreted as a model uncertainty.

This report has not treated the effects of human errors. This may be seen as an unknown completeness uncertainty which may be significant. Human involvement is common during maintenance, modifications and testing, and should be included in the reliability assessments. The effect of human involvement and human errors is a relatively new research area which usually receives little attention. Many vendors claim that they deliver systems with an MTTF of many thousands of years. Still, many operators experience critical failures. There are obviously some aspects affecting the reliability of SIS that should be further investigated. The subject of human errors is one of them.

11.3 Model uncertainty

Both the FTA and the simulation assumed static behaviour for the high level protection system. This seems to be a sufficient assumption since the system either functions or not upon a demand. The results discussed in section 11.1 show that the model uncertainty may be quite large. When applying different architectural and reliability models to the same data set, the unavailability results vary from 4,1357 ∙ 10⁻³ to 5,1436 ∙ 10⁻³, a difference of more than 10⁻³. It is also important to notice that the simulation model, which is assumed to give the most realistic prediction, is closer to the result from the conservative FTA model than to the result from CARA FaultTree.

The ESD level transmitter had a higher reliability than the PSD level transmitter due to the safety ranking of the safety systems. It was assumed that the difference in failure rates between the PSD and ESD level transmitters was significant for the system reliability. This assumption was shown to be wrong by the sensitivity analyses. Both the PSD and ESD loops could therefore have been modelled with the same data without any noticeable change in the results.

11.4 Data uncertainty

As discussed in section 8.1.1, the environmental conditions inside the vessels vary, and thus the reliability of a combined loop may depend somewhat on which vessel it is controlling. The main difference in component reliability due to environment is for the level transmitters, since they are the only part of the safety systems located inside the vessels. The failure rate for the level transmitters may therefore be uncertain when modelled for a typical vessel with generic data. This uncertainty is unknown since it can only be measured through vessel specific data, which are not available in this case.

The reliability assessments in this thesis have used generic data. Many compliance reports use vendor data for the equipment, which usually give lower failure rates than generic data do. This may be due to several reasons. The vendor data are often collected from newer technology than the generic data are based on. Thus, the reliability of the same type of equipment may have been improved. But vendor data may also have been collected during laboratory testing only, or from limited field experience. If so, important factors like human involvement, field environment, etc. are not reflected, and the vendor data may thus underestimate the failure rate. Large gaps between vendor and generic data contribute to data uncertainty. If the difference is significant, expert judgement should be used to weight the data.

Two data sources were used in this assessment, OREDA and OLF 070. It is often difficult for an analyst to know which data to use. The sensitivity study showed that the two different failure rates for the level transmitters resulted in significantly different estimates of the unavailability. This indicates that the level of data uncertainty is quite high and may be decisive for the decision that is made. An interesting observation, though, is that for the OREDA data for level transmitters, presented in Table 9, the mean and MLE are quite similar and the standard deviation is not significantly high. This can be seen as an indication of homogeneity between the samples. The data from OREDA may therefore be of good quality. The data uncertainty in this report’s assessments is rather caused by confusion about which data to use.

The numerical value of the failure rate for the level transmitters is somewhat uncertain for the reasons above. Uncertainty propagation was used and revealed that the base case for FTA had a range between the min and max unavailability results of approximately 6 ∙ 10⁻³, where the min value was still higher than the SIL 3 requirement. In reality, the data uncertainty is probably greater, because the uncertainty propagation was only applied to the level transmitters; the other components were given no uncertainty distribution.

11.5 Experiences from the case study

The case study gave some important results and experiences.

The high level protection system did not comply with the SIL 3 requirement for the combined loop, or the SIL 2 requirement for the single loops. Uncertainty assessments were also performed to see if acceptance could still be recommended, but none of the results indicate that the high level system should be verified as SIL 3.

FTA is far easier to use for reliability assessments than simulation. Simulation for reliability assessments of SIS easily becomes time consuming due to the low failure rates, and thus the high number of runs needed to get stable results. Due to the random sampling it is also impossible to verify the results achieved from a simulation model. For this case study, FTA would be sufficient for analysing the unavailability. But there are cases where simulation is preferred, such as multistate systems, modelling of complex maintenance strategies and other dynamic system properties.

The conservative approach is recommended for systems where FTA is applicable, because the approximation compensates for some of the uncertainty involved. Also, the approximation does not appear to be unnecessarily conservative. An alternative to the conservative approximation may be to use an upper limit failure rate from a 70 % confidence interval, as suggested in IEC 61508 and described in section 3.2. The 70 % confidence level for an upper limit value is not theoretically justified, and may thus be interpreted as somewhat random. It might be that implementing conservative failure rates in software tools like CARA FaultTree is easier than using the conservative approach developed by Lundteigen and Rausand. But the use of such “random” limits can also be seen as less objective. The conservative approximation used in this thesis is based on mathematical theory and may be proven to be conservative for the calculated unavailability.

Sensitivity analyses and importance measures, together with a qualitative uncertainty evaluation of the reliability assessment, usually provide sufficient information about the total level of uncertainty. Importance measures also give valuable information for redesign and maintenance strategies. Redesign may be a result of a compliance report if requirements are not met. Importance measures identify the bottlenecks and risk contributors, so that reliability improvement during redesign may be done more efficiently.

In cases where the data are suspected to have a high level of uncertainty, uncertainty propagation is recommended. Uncertainty propagation should be avoided for simulation models, due to the high number of runs needed. Deterministic models should be preferred when applying Monte Carlo simulation to estimate the uncertainty distribution of Q0(t).

The results from uncertainty propagation may be difficult for a decision maker to interpret. It is more valuable to present the results as a qualitative evaluation. The decision maker may not have the required competence to understand the meaning and scope of statistical quantities.


12 CONCLUSIONS The main objective of this master thesis was to study the reliability assessment procedures that are used when developing compliance reports and, based on the findings from Janbu (2008), examine how a representation of the uncertainties in the results may be implemented in compliance studies as a decision support. The main objective was further divided into four sub objectives, which this report tries to give an adequate answer to. The first sub objective was to become familiar with a SIS through a case study and outline when and how compliance reports should be developed. The system familiarization was documented in Chapter 7 and the process of compliance reports was described in Chapter 6. The next sub objective was to identify issues in the development of compliance reports that may influence the uncertainty of the results. This sub objective was partly answered in the project thesis, but due to its importance in this thesis it was further described in Chapter 3. Sub objective number 3 was to discuss the various approaches for uncertainty assessments and how to implement the information from such assessments into compliance studies. The first part of the sub objective is documented in Chapter 5. How to implement results from uncertainty assessments into the basis for decision making is treated in Chapter 6.3 and in the discussion in Chapter 11. The last sub objective, number 4, was to perform reliability assessments for the case study, compare the results and discuss the level of uncertainty in the results. This objective was a lot of work and is documented for the FTA in Chapter 9, the simulation in Chapter 11 and discussed in Chapter 11. Reliability assessments of SIS provide valuable information to the decision maker regarding design and safety. Compliance reports are the documentation of whether a SIS meets the required SIL or not, whereas the required PFD has to be estimated by a reliability assessment. 
Due to the process of reliability assessments, uncertainty is introduced and reduces the confidence in the results. Not only does uncertainty lower the quality of the assessment, it also increases the risk of making wrong decisions. This report distinguishes between three main sources to uncertainty; model�, data� and completeness uncertainty. The compliance studies are usually executed during the design phase. At this stage, detailed and relevant information may not be in place, which causes completeness uncertainty. Generic databases are used where data uncertainty often arise due to confusion of what data to use, lack of relevance and the modelling of the data. Models are selected without complete knowledge about the system characteristics and thus the lack of representing the real life system cause model uncertainty. The level of uncertainty is higher during early phases of system development, therefore the predicted PFD provided by the compliance reports may not reflect the true PFD discovered during field experience. The uncertainties involved in the estimated PFD are seldom communicated to the decision maker. Hence; a misinterpretation that the predicted reliability is for certain according to requirements, frequently arise. IEC 61508 does not explicitly treat the subject of uncertainty, but indicates doubts about the validity of the results through the AC and suggested 70 % upper limit confidence interval for failure rates. The results from the case study of the high level protection system were ranked as expected, but still gave some interesting indications. The use of simulation in Extend was unnecessarily complex for a reliability assessment of SIS. SIS usually have very low failure rates which make the simulations needed for achieving stable results, a time consuming process. It is impossible to verify that the 64

Master Thesis Astrid Folkvord Janbu


Treatment of Uncertainties in Reliability Assessment of Safety Instrumented Systems

NTNU

results are correct due to the random sampling, but if performed correctly we can expect that the estimates provided by simulation are the most realistic. In light of this, it was therefore interesting to see that the predicted unavailability achieved from simulation was closer to the conservative FTA approach developed by Lundteigen and Rausand than the cut set approximation used by CARA FaultTree. The conclusions from the case study are that FTA is a sufficiently good model for reliability assessments of SIS and that the conservative approximation is recommended since it compensates for some of the uncertainty involved. The uncertainty assessments revealed that sensitivity analyses and importance measures often provide sufficient information about the identified sources to uncertainty in a compliance study since they manage to point out which of the sources that are critical to the results or not. Sensitivity analysis identifies the significance of scope, data selection, modelling structure and model assumptions. Importance measures may describe the component importance with regard to improvement potential, contribution to unavailability (blaming) and achievement of system function, information which also is valuable during redesign and identification of efficient maintenance strategies during operation. Uncertainty propagation is a useful method when investigating the level of uncertainty in the data, but the results are not as easy to interpret as the results from sensitivity analyses and importance measures. The uncertainty propagation may therefore be a limited remedy, depending on the analyst’s competence. It was also discovered that uncertainty propagation should not be performed for simulation models due to the extremely high number of runs needed to estimate the uncertainty distribution. A thumb rule should be to always use deterministic models when performing uncertainty propagation or use analytical methods. 
The decision maker should always be given an evaluation of the uncertainty associated with the results. The evaluation should describe the level of confidence in the results qualitatively rather than present the results of the uncertainty assessments, which are often not intuitively understood by the decision maker. The case study showed that compliance was not met for the SIS. Nor did the uncertainty assessment indicate any unnecessary conservatism that could have been reduced in order to recommend compliance. The level of uncertainty was seen to be greatest for the data, due to the confusion about which data to use. The results from a compliance study should not be seen as a certain property of the system, and this should be communicated to the decision maker. The need to feel safe should be balanced against the fact that SIL compliance gives no guarantee. Instead of interpreting uncertainty as a necessary evil, one may take advantage of it to present a more realistic result, and thus raise awareness of the risks involved in the decision process. Being informed about the uncertainties also makes it easier to reduce them.

12.1 Further work
The conservative FTA approximation is well suited for reliability assessments of SIS, but not for uncertainty assessments. A software tool or work sheet should be developed for uncertainty propagation and identification of importance measures. Further, the use of conservative approximations within FTA should be compared against other conservative approximation methods, such as using the upper confidence limit as a conservative estimate of the reliability data, which can be implemented directly in CARA FaultTree. The results should be compared against the method developed by Lundteigen and Rausand in order to find an optimal solution.
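As a hedged illustration of the upper confidence limit approach mentioned above, assume a constant failure rate and n failures observed over an aggregated time in service T. These symbols and the confidence level are introduced here for illustration and are not taken from the thesis. The standard one-sided upper 100(1 - ε) % confidence limit for the failure rate is then:

```latex
\lambda_U = \frac{z_{\varepsilon,\,2(n+1)}}{2T},
\qquad
\Pr\!\left(\chi^2_{2(n+1)} > z_{\varepsilon,\,2(n+1)}\right) = \varepsilon
```

Here z is the upper ε percentile of the chi-square distribution with 2(n + 1) degrees of freedom. For n = 0 and ε = 0.05 the percentile is about 5.99, so the upper limit is roughly 3/T. Using this limit instead of the point estimate n/T as input to the fault tree gives a conservative result without changing the model itself.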

The effects of human involvement on the reliability of SIS should be studied further. Human errors that are not accounted for in reliability assessments may be a major source of completeness uncertainty. Some human errors may be included when generic data are used, but the amount of unavailability caused by human factors is highly uncertain. Equipment with an estimated mean time to failure of many thousands of years is not consistent with the failures reported in maintenance records. Human errors are expected to be one of the factors behind this discrepancy and should be accounted for in order not to underestimate the risk.
Software reliability in relation to the reliability of SIS is also a subject that deserves more attention. Due to the complexity of the system, lack of knowledge causes uncertainty about the failure behaviour. The completeness uncertainty is assumed to be higher for software systems, since failure modes are often unknown due to the difficulty of identifying all relevant bugs. Software reliability should therefore be investigated further in order to reduce the related uncertainty and avoid underestimating the risk.

13 BIBLIOGRAPHY
Aven, Terje. Foundations of Risk Analysis - A Knowledge and Decision-oriented Perspective. Chichester: Wiley, 2003.
Aven, Terje. Risikostyring. Oslo: Universitetsforlaget, 2007.
Bedford, Tim, and Roger Cooke. Probabilistic Risk Analysis - Foundations and Methods. Cambridge: Cambridge University Press, 2001.
Compliance report. “SIL compliance report for xxxxx.” SIL compliance, 2008.
de Rocquigny, Etienne, Nicolas Devictor, and Stefano Tarantola. Uncertainty in Industrial Practice: A Guide to Quantitative Uncertainty Management. Chichester: Wiley, 2008.
DNV. DNV-RP-A203 Qualification procedures for new technology. Standard, Høvik: Det Norske Veritas, 2001.
Drouin, M., G. Parry, J. Lehner, G. Martinez-Guridi, J. LaChance, and T. Wheeler. Guidance on the Treatment of Uncertainties Associated with PRAs in Risk-Informed Decision Making. Office of Nuclear Regulatory Research, Office of Nuclear Reactor Regulation, 2009.
Flage, R., T. Aven, and E. Zio. “Alternative representations of uncertainty in system reliability and risk analysis - Review and discussion.” Safety, Reliability and Risk Analysis: Theory, Methods and Applications, 2009.
Hellebust, Helge. Exact and approximate calculations of uncertainty in system reliability evaluations. Master Thesis, Trondheim: NTNU, 1989.
IEC 61508. Functional safety of electrical/electronic/programmable electronic safety-related systems. Standard, Geneva: International Electrotechnical Commission, 1997.
Imagine That Inc. ExtendSim. 2009. http://www.extendsim.com/ (accessed April 1, 2009).
Janbu, Astrid Folkvord. “Uncertainty in Reliability Assessments of Safety Instrumented Systems.” Project Thesis, Trondheim, 2008.
Kiureghian, Armen Der, and Ove Ditlevsen. “Aleatory or epistemic? Does it matter?” Structural Safety, 2009: 105-112.
Lundteigen, Mary Ann. IEC 61508 and IEC 61511: What, when, why? Presentation, Trondheim: Department of Production and Quality Engineering, 2008.
Lundteigen, Mary Ann. Implementing strategies for follow-up of safety instrumented systems. Presentation, Trondheim: Department of Production and Quality Engineering, 2008.
Lundteigen, Mary Ann. Safety instrumented systems in the oil and gas industry. PhD Thesis, Trondheim: Department of Production and Quality Engineering, 2009.

Lundteigen, Mary Ann, and Marvin Rausand. “Reliability assessment of safety instrumented systems in the oil and gas industry: A practical approach and a case study.” International Journal of Reliability, Quality and Safety Engineering, 2008.
Morgan, M. G., and M. Henrion. Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis. Cambridge: Cambridge University Press, 1990.
Mosleh, Ali, Nathan Siu, Carol Smidts, and Christiana Lui. Model Uncertainty: Its Characterization and Quantification. Maryland: Center for Reliability Engineering, University of Maryland, 1995.
Murthy, D.N. Prabhakar, Trond Østerås, and Marvin Rausand. Product Reliability - Specification and Performance. Springer, 2007.
NASA. Probabilistic Risk Assessment Procedures Guide for NASA Managers and Practitioners. Guideline, Washington: NASA Office of Safety and Mission Assurance, 2002.
nature.com. 2008. http://www.nature.com (accessed March 24, 2009).
O'Hagan, A., and J. E. Oakley. “Probability is perfect, but we can't elicit it perfectly.” Reliability Engineering & System Safety, 2004: 239-248.
OLF 070. Application of IEC 61508 and IEC 61511 in the Norwegian Petroleum Industry. Guideline, Oljeindustriens Landsforening, 2004.
OREDA. Offshore Reliability Data. 2009. www.oreda.com (accessed May 12, 2009).
OREDA. Offshore Reliability Data, 4th edition. Høvik: DNV, 2002.
Parry, Gareth W. “The characterization of uncertainty in Probabilistic Risk Assessments of complex systems.” Reliability Engineering and System Safety, 1996: 119-126.
Rao, K. Durga, H. S. Kushwaha, A. K. Verma, and A. Srividya. “Quantification of epistemic and aleatory uncertainties in level-1 probabilistic safety assessment studies.” Reliability Engineering & System Safety, 2007: 947-956.
Rausand, Guro. Uncertainty Management in Reliability Analyses. Master Thesis, Trondheim: Department of Production and Quality Engineering, NTNU, 2005.
Rausand, Marvin, and Arnljot Høyland. System Reliability Theory: Models, Statistical Methods and Applications. New Jersey: Wiley, 2004.
Rausand, Marvin, and Knut Øien. “Risikoanalyse. Tilbakeblikk og utfordringer.” In Fra flis i fingeren til ragnarok, by Sikkerhetsdagene, 85-110. Trondheim: Tapir akademiske forlag, 2004.
Saltelli, A., et al. Global Sensitivity Analysis. The Primer. John Wiley & Sons, 2008.
Sydvest. CARA FaultTree. Help menu, http://www.sydvest.com/Products/Cara/, 1999.
Vedvik, Atle. Offshore - Topside Systems. Presentation, Høvik: Det Norske Veritas, 2004.

Watson, Stephen R. “The meaning of probability in probabilistic safety analysis.” Reliability Engineering and System Safety, 1993: 261-269.
Webster. Webster's Encyclopedic Unabridged Dictionary of the English Language. New York: Random House, 1989.


APPENDIX A
DATA DOSSIER

Datadossier - Case study

Component | Type | Failure rate (per hour) | Test interval (hours) | β-factor
Level transmitter (PSD) | Differential level transmitter | 4,93E-06 | 8760 | 2 %
Level transmitter (ESD) | Guided Wave Radar level transmitter | 4,37E-06 | 8760 | 2 %
Node (logic) | | 5,00E-06 | 8760 | 1 %
Solenoid | | 9,00E-07 | 8760 | 10 %
EV/XV | Valve incl. actuator, ex. pilot | 2,00E-06 | 8760 | 2 %

CCF (Vessel)

Component | Failure rate (per hour)
Node | 5,00E-08
Solenoid | 9,00E-08
Valve | 4,00E-08

Comments
1) Test interval (hours), SFF and β values are gathered from OLF 070. OREDA data are based on operational time.
2) Gap between highest and lowest failure rate for LT from compliance report (Δλ = 0,562 * 10^-6 [hours]^-1) are used as gap around the typical value in order to evaluate the effects of different failure rates. ESD is given the most conservative estimate since it is assumed that it will have safer equipment due to its safety ranking.
3) The data from SIL assessment separates between actuator and valve, while OLF 070 includes actuator in the data.

The dossier also lists alternative level transmitter entries (Typical, Transmitter (OLF070), Transmitter (OREDA)) with failure rates 4,65E-06 and 0,60E-06 per hour and CCF rates 9,30E-08 and 1,20E-08 per hour, SFF values of 62 %, 72 %, 80 % and 83 %, and the sources SIL report, OLF 070 and OREDA. The row assignment of these entries is not recoverable from the extracted layout.
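The CCF failure rates listed for the node, solenoid and valve are consistent with multiplying each component failure rate by its β-factor. The dossier does not state this relation explicitly, so the sketch below is an inference checked against the listed values; the dictionary layout and the function name are illustrative assumptions.

```python
import math

# (failure rate per hour, beta-factor) as listed in Appendices A and B
components = {
    "Node": (5.00e-6, 0.01),
    "Solenoid": (9.00e-7, 0.10),
    "Valve": (2.00e-6, 0.02),
}

# CCF failure rates listed in the data dossier (per hour)
listed_ccf = {"Node": 5.00e-8, "Solenoid": 9.00e-8, "Valve": 4.00e-8}


def ccf_rate(failure_rate, beta):
    """Assumed relation: the CCF rate is the beta fraction of the failure rate."""
    return beta * failure_rate


for name, (lam, beta) in components.items():
    rate = ccf_rate(lam, beta)
    matches = math.isclose(rate, listed_ccf[name], rel_tol=1e-9)
    print(f"{name}: {rate:.2e} per hour, matches dossier: {matches}")
```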


APPENDIX B
CONSERVATIVE FTA

The calculations are performed in Excel, and data are gathered from the data dossier in Appendix A.

Conservative calculations of minimal cut sets

Index j | Minimal cut set
1 | {PSD LT, ESD LT}
2 | {PSD LT, ESD node}
3 | {PSD LT, ESD sol}
4 | {PSD LT, EV}
5 | {PSD node, ESD LT}
6 | {PSD node, ESD node}
7 | {PSD node, ESD sol}
8 | {PSD node, EV}
9 | {PSD sol, ESD LT}
10 | {PSD sol, ESD node}
11 | {PSD sol, ESD sol}
12 | {PSD sol, EV}
13 | {XV, ESD LT}
14 | {XV, ESD node}
15 | {XV, ESD sol}
16 | {XV, EV}

Conservative calculations of minimal cut sets

1) Independent components

j | Minimal cut set | mj | λj,1 (hour^-1) | λj,2 (hour^-1) | τ (hours) | β | PFD(MCj)
2 | {PSD LT, ESD node} | 2 | 4,93E-06 | 5,00E-06 | 8760 | - | 6,31E-04
3 | {PSD LT, ESD sol} | 2 | 4,93E-06 | 9,00E-07 | 8760 | - | 1,13E-04
4 | {PSD LT, EV} | 2 | 4,93E-06 | 2,00E-06 | 8760 | - | 2,52E-04
5 | {PSD node, ESD LT} | 2 | 5,00E-06 | 4,37E-06 | 8760 | - | 5,59E-04
7 | {PSD node, ESD sol} | 2 | 5,00E-06 | 9,00E-07 | 8760 | - | 1,15E-04
8 | {PSD node, EV} | 2 | 5,00E-06 | 2,00E-06 | 8760 | - | 2,56E-04
9 | {PSD sol, ESD LT} | 2 | 9,00E-07 | 4,37E-06 | 8760 | - | 1,01E-04
10 | {PSD sol, ESD node} | 2 | 9,00E-07 | 5,00E-06 | 8760 | - | 1,15E-04
12 | {PSD sol, EV} | 2 | 9,00E-07 | 2,00E-06 | 8760 | - | 4,60E-05
13 | {XV, ESD LT} | 2 | 2,00E-06 | 4,37E-06 | 8760 | - | 2,24E-04
14 | {XV, ESD node} | 2 | 2,00E-06 | 5,00E-06 | 8760 | - | 2,56E-04
15 | {XV, ESD sol} | 2 | 2,00E-06 | 9,00E-07 | 8760 | - | 4,60E-05

2) Identical and dependent components

j | Minimal cut set | mj | λj (hour^-1) | τ (hours) | β | PFD(MCj)
6 | {PSD node, ESD node} | 2 | 5,00E-06 | 8760 | 1 % | 8,46E-04
11 | {PSD sol, ESD sol} | 2 | 9,00E-07 | 8760 | 10 % | 4,11E-04
16 | {XV, EV} | 2 | 2,00E-06 | 8760 | 2 % | 2,73E-04

3) Non-identical and dependent components

j | Minimal cut set | mj | λj,1 (hour^-1) | λj,2 (hour^-1) | τ (hours) | β | PFD(MCj)
1 | {PSD LT, ESD LT} | 2 | 4,93E-06 | 4,37E-06 | 8760 | 2 % | 2,36E-03

4) More complex minimal cuts

j | Minimal cut set | mj | λj,1 (hour^-1) | λj,2 (hour^-1) | τ (hours) | β | PFD(MCj)
(no entries)

Results

PFD for system (formula): 5,14E-03
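The cut set values for the independent and the identical dependent components can be reproduced with two closed-form expressions. This is a sketch under assumptions: the formulas are not printed in this appendix, but they reproduce its tabulated values to three significant digits, namely λ1·λ2·τ²/3 for two independent components and ((1 - β)·λ·τ)²/3 + β·λ·τ/2 for two identical components with a β-factor. Function and variable names are illustrative. The non-identical dependent cut set {PSD LT, ESD LT} and the system total are not covered by this sketch.

```python
import math

TAU = 8760.0  # test interval in hours, from the data dossier


def pfd_independent(lam1, lam2, tau=TAU):
    """Second-order cut set with independent components (assumed formula)."""
    return lam1 * lam2 * tau ** 2 / 3.0


def pfd_identical_dependent(lam, beta, tau=TAU):
    """Second-order cut set with identical components and a beta-factor (assumed formula)."""
    independent_part = ((1.0 - beta) * lam * tau) ** 2 / 3.0
    common_cause_part = beta * lam * tau / 2.0
    return independent_part + common_cause_part


# Failure rates per hour, from the data dossier
PSD_LT, ESD_LT, NODE, SOL, VALVE = 4.93e-6, 4.37e-6, 5.00e-6, 9.00e-7, 2.00e-6

# index j: (computed PFD, tabulated PFD)
cut_sets = {
    2: (pfd_independent(PSD_LT, NODE), 6.31e-4),
    3: (pfd_independent(PSD_LT, SOL), 1.13e-4),
    4: (pfd_independent(PSD_LT, VALVE), 2.52e-4),
    5: (pfd_independent(NODE, ESD_LT), 5.59e-4),
    6: (pfd_identical_dependent(NODE, 0.01), 8.46e-4),
    7: (pfd_independent(NODE, SOL), 1.15e-4),
    8: (pfd_independent(NODE, VALVE), 2.56e-4),
    9: (pfd_independent(SOL, ESD_LT), 1.01e-4),
    10: (pfd_independent(SOL, NODE), 1.15e-4),
    11: (pfd_identical_dependent(SOL, 0.10), 4.11e-4),
    12: (pfd_independent(SOL, VALVE), 4.60e-5),
    13: (pfd_independent(VALVE, ESD_LT), 2.24e-4),
    14: (pfd_independent(VALVE, NODE), 2.56e-4),
    15: (pfd_independent(VALVE, SOL), 4.60e-5),
    16: (pfd_identical_dependent(VALVE, 0.02), 2.73e-4),
}

for j in sorted(cut_sets):
    computed, tabulated = cut_sets[j]
    # Tabulated values carry three significant digits, hence the loose tolerance
    agrees = math.isclose(computed, tabulated, rel_tol=6e-3)
    print(f"j={j:2d}  computed={computed:.2e}  tabulated={tabulated:.2e}  agrees={agrees}")
```

The common cause term dominates for the solenoids (β = 10 %), which is why that cut set has a larger PFD than several independent cut sets with higher failure rates.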
