Holger Schünemann, MD, PhD Chair, Department of Clinical Epidemiology & Biostatistics Michael Gent Chair in Healthcare Research McMaster University, Hamilton, Canada
TEACH, New York, NY August 11 - 13, 2010
Content • Evidence hierarchies • From evidence to recommendations
Case scenario
A 13-year-old girl who lives in rural Indonesia presented with flu symptoms and developed severe respiratory distress over the last 2 days. She required intubation. The history reveals that she shares her living quarters with her parents and her three siblings. At night the family’s chickens share this room too, and several chickens died unexpectedly a few days before the girl fell sick.
Interventions: antivirals, such as the neuraminidase inhibitors oseltamivir and zanamivir
PICO
Clinical question:
Population: Avian flu/influenza A (H5N1) patients
Intervention: Oseltamivir (or zanamivir)
Comparison: No pharmacological intervention
Outcomes: Mortality, hospitalizations, resource use, adverse outcomes, antimicrobial resistance
WHO Avian Influenza GL. Schunemann et al., The Lancet ID, 2007
There are no RCTs!
• Do you think that users of recommendations would like to be informed about the basis (explanation) for a recommendation or coverage decision if they were asked (by their patients)?
• I suspect the answer is “yes”
• If we need to provide the basis for recommendations, we need to say whether the evidence is good or not so good; in other words, perhaps “no RCTs”
Hierarchy of evidence based on quality
STUDY DESIGN
Randomized Controlled Trials Cohort Studies and Case Control Studies Case Reports and Case Series, Non-systematic observations Expert Opinion
BIAS
Which hierarchy?
Recommendation for use of oral anticoagulation in patients with atrial fibrillation and rheumatic mitral valve disease
• AHA: Evidence B, Recommendation Class I
• ACCP: Evidence A, Recommendation 1
• SIGN: Evidence IV, Recommendation C
What to do?
Evidence based healthcare decisions Population values and preferences
(Clinical) state and circumstances
Expertise
Research evidence Haynes et al. 2002
“Everything should be made as simple as possible but not simpler.”
Explain the following?
• Confounding, effect modification & external validity
• Concealment of randomization
• Blinding (who is blinded in a double blinded study?)
• Intention to treat analysis and its correct application
• P-values and confidence intervals
BMJ 2003
BMJ, 2003
Relative risk reduction: > 99.9% (risk of death about 1/100,000)
The U.S. Parachute Association reported 821 injuries and 18 deaths out of 2.2 million jumps in 2007.
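The arithmetic behind the parachute figures can be checked with a short sketch. The counts are the ones quoted on the slide; the risk of death without a parachute is not given in the deck, so the value of 1.0 used below is an illustrative assumption, and the helper names are hypothetical.

```python
# Figures quoted on the slide (U.S. Parachute Association, 2007).
DEATHS = 18
INJURIES = 821
JUMPS = 2_200_000

# ASSUMPTION (not from the slide): a jump without a parachute is
# treated as almost always fatal, i.e. a risk of death of 1.0.
ASSUMED_RISK_WITHOUT_PARACHUTE = 1.0


def risk_per_100k(events: int, exposures: int) -> float:
    """Return the event risk expressed per 100,000 exposures."""
    return events / exposures * 100_000


def relative_risk_reduction(risk_treated: float, risk_control: float) -> float:
    """RRR = 1 - (risk with intervention / risk without intervention)."""
    return 1 - risk_treated / risk_control


death_risk = DEATHS / JUMPS

if __name__ == "__main__":
    print(f"Deaths per 100,000 jumps: {risk_per_100k(DEATHS, JUMPS):.2f}")
    print(f"Injuries per 100,000 jumps: {risk_per_100k(INJURIES, JUMPS):.1f}")
    rrr = relative_risk_reduction(death_risk, ASSUMED_RISK_WITHOUT_PARACHUTE)
    print(f"Relative risk reduction: {rrr:.4%}")
```

Under that assumption the death risk is a little under 1 per 100,000 jumps, which is consistent with the "> 99.9%" relative risk reduction stated on the slide.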
BMJ 2003
A prominent grading system
Level of evidence: Source of evidence
• I: SR, RCTs
• II: Cohort studies
• III: Case-control studies
• IV: Case series
• V: Expert opinion
Grades of recommendation: A, B, C, D
Simple hierarchies are (too) simplistic STUDY DESIGN
Cohort Studies and Case Control Studies Case Reports and Case Series, Non-systematic observations Expert Opinion
Expert Opinion
Randomized Controlled Trials
BIAS
Schünemann & Bone, 2003
So?
• Users want to know how good the evidence is
– (rightly so) they are not equipped to assess it themselves
– someone needs to do it
• Confusion about which grading system to use
– What methods to grade?
• Simple systems leading to recommendations are not enough
GRADE Working Group
Grades of Recommendation Assessment, Development and Evaluation
• Aim: to develop a common, transparent and sensible system for grading the quality of evidence and the strength of recommendations
• International group of guideline developers, methodologists & clinicians from around the world (>100 contributors), since 2000
• International group: ACCP, AHRQ, Australian NHMRC, BMJ Clinical Evidence, CC, CDC, McMaster, NICE, Oxford CEBM, SIGN, UpToDate, USPSTF, WHO
CMAJ 2003, BMJ 2004, BMC 2004, BMC 2005, AJRCCM 2006, Chest 2006, BMJ 2008
GRADE Uptake
• World Health Organization
• Allergic Rhinitis and its Impact on Asthma Guidelines (ARIA)
• American Thoracic Society
• American College of Physicians
• European Respiratory Society
• European Society of Thoracic Surgeons
• British Medical Journal
• Infectious Diseases Society of America
• American College of Chest Physicians
• UpToDate®
• National Institute for Health and Clinical Excellence (NICE)
• Scottish Intercollegiate Guidelines Network (SIGN)
• Cochrane Collaboration
• Clinical Evidence
• Agency for Healthcare Research and Quality (AHRQ)
• Partner of GIN
Over 40 major organizations
GRADE Quality of Evidence In the context of a systematic review • The quality of evidence reflects the extent to which we are confident that an estimate of effect is correct. In the context of making recommendations • The quality of evidence reflects the extent to which our confidence in an estimate of the effect is adequate to support a particular recommendation.
Likelihood of and confidence in an outcome
Definition of grades of evidence • ⊕⊕⊕⊕/A/High: Further research is very unlikely to change confidence in the estimate of effect. • ⊕⊕⊕/B/Moderate: Further research is likely to have an important impact on confidence in the estimate of effect and may change the estimate. • ⊕⊕/C/Low: Further research is very likely to have an important impact on confidence in the estimate of effect and is likely to change the estimate. • ⊕/D/Very low: Any estimate of effect is very uncertain.
Determinants of quality
• RCTs ⊕⊕⊕⊕
• observational studies ⊕⊕
• 5 factors that can lower quality
1. Limitations in detailed design and execution (risk of bias criteria)
2. Inconsistency (or heterogeneity)
3. Indirectness (PICO and applicability)
4. Imprecision (number of events and confidence intervals)
5. Publication bias
• 3 factors can increase quality
– large magnitude of effect
– dose-response relation
– all plausible residual confounding may be working to reduce the demonstrated effect, or to increase the effect if no effect was observed
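The starting levels and the factors listed above can be sketched as a small bookkeeping function. This is only an illustration: GRADE ratings are structured judgments, not mechanical arithmetic, and the function name, the factor keys and the integer "levels" are hypothetical conventions introduced here.

```python
# Quality levels from lowest to highest, as named in the deck.
LEVELS = ["very low", "low", "moderate", "high"]

# Hypothetical keys for the five downgrading and three upgrading factors.
DOWNGRADE_FACTORS = {"risk_of_bias", "inconsistency", "indirectness",
                     "imprecision", "publication_bias"}
UPGRADE_FACTORS = {"large_effect", "dose_response", "confounding"}


def rate_quality(design, down=None, up=None):
    """Sketch of a GRADE-style rating for one outcome.

    design: "rct" (starts high) or "observational" (starts low).
    down / up: dicts mapping a factor name to the number of levels
    (1 = serious, 2 = very serious) the panel judged appropriate.
    """
    down = down or {}
    up = up or {}
    if design not in ("rct", "observational"):
        raise ValueError(f"unknown design: {design!r}")
    unknown = (set(down) - DOWNGRADE_FACTORS) | (set(up) - UPGRADE_FACTORS)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    start = 3 if design == "rct" else 1
    score = start - sum(down.values()) + sum(up.values())
    # Clamp to the four available categories.
    return LEVELS[max(0, min(score, len(LEVELS) - 1))]


if __name__ == "__main__":
    print(rate_quality("rct"))                                # high
    print(rate_quality("rct", down={"risk_of_bias": 2}))      # low
    print(rate_quality("observational", up={"large_effect": 2}))  # high
```

The integer scoring is a simplification chosen for readability; in practice the panel records an explicit judgment and rationale for each factor.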
1. Design and Execution
• Limitations/risk of bias
– Randomization
– lack of concealment
– intention to treat principle violated
– inadequate blinding
– loss to follow-up
– early stopping for benefit
The evidence for the effect of sublingual immunotherapy in children with allergic rhinitis on the development of asthma comes from a single randomised trial with no description of randomisation, concealment of allocation, or type of analysis, no blinding, and 21% of children lost to follow-up. These very serious limitations would limit our confidence in the estimates of effect and would likely lead to downgrading the quality of evidence.
1. Design and Execution
[Figure from Cates, CDSR 2008]
Overall judgment required
Quality of evidence across studies
• Outcome #1: Quality: High
• Outcome #2: Quality: Moderate
• Outcome #3: Quality: Very low
[Slide labels: III, V, II, IB]
Oseltamivir for Girl with Avian Flu
Summary of findings:
• No clinical trial of oseltamivir for treatment of H5N1 patients.
• 4 systematic reviews and health technology assessments (HTA) reporting on 5 studies of oseltamivir in seasonal influenza.
– Hospitalization: OR 0.22 (0.02 to 2.16)
– Pneumonia: OR 0.15 (0.03 to 0.69)
• 3 published case series.
• Many in vitro and animal studies.
• No alternative that is more promising at present.
• Cost: ~ $45 per treatment course
What are the factors that determine your decisions?
GRADE: Factors influencing decisions and recommendations
• Quality of Evidence
• Balance of desirable and undesirable consequences
• Values and preferences
• Resource use
Determinants of the strength of recommendation
Factors that can strengthen a recommendation, with comments:
• Quality of the evidence: The higher the quality of evidence, the more likely is a strong recommendation.
• Balance between desirable and undesirable effects: The larger the difference between the desirable and undesirable consequences, the more likely a strong recommendation is warranted. The smaller the net benefit and the lower the certainty for that benefit, the more likely a weak recommendation is warranted.
• Values and preferences: The greater the variability in values and preferences, or uncertainty in values and preferences, the more likely a weak recommendation is warranted.
• Costs (resource allocation): The higher the costs of an intervention (that is, the more resources consumed), the less likely a strong recommendation is warranted.
Oseltamivir – Avian Influenza
Factors that can weaken the strength of a recommendation. Example: treatment of H5N1 patients with oseltamivir
• Lower quality evidence
– Decision: Yes
– Explanation: The quality of evidence is very low
• Uncertainty about the balance of benefits versus harms and burdens
– Decision: Yes
– Explanation: The benefits are uncertain because several important or critical outcomes were not measured. However, the potential benefit is very large despite potentially small relative risk reductions.
• Uncertainty or differences in values
– Decision: No
– Explanation: All patients and care providers would accept treatment for H5N1 disease
• Uncertainty about whether the net benefits are worth the costs
– Decision: No
– Explanation: For treatment of sporadic patients the price is not high ($45).
Frequent “yes” answers will increase the likelihood of a weak recommendation
Example: Oseltamivir for Avian Flu Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (????? recommendation, very low quality evidence).
Schunemann et al. The Lancet ID, 2007
Example: Oseltamivir for Avian Flu Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (strong recommendation, very low quality evidence). Values and Preferences Remarks: This recommendation places a high value on the prevention of death in an illness with a high case fatality. It places relatively low values on adverse reactions, the development of resistance and costs of treatment. Schunemann et al. The Lancet ID, 2007
Reassessment of clinical practice guidelines
• Editorial by Shaneyfelt and Centor (JAMA 2009)
– “Too many current guidelines have become marketing and opinion-based pieces…”
– “AHA CPG: 48% of recommendations are based on level C = expert opinion…”
– “…clinicians do not use CPG […] greater concern […] some CPG are turned into performance measures…”
– “Time has come for CPG development to again be centralized, e.g., AHRQ…”
Conclusions
• Clinical practice guidelines should be based on the best available evidence to be evidence-based
• GRADE combines what is known in health research methodology and provides a structured approach to improve communication
– Criteria for evidence assessment across questions and outcomes
– Criteria for moving from evidence to recommendations
– Transparent, systematic
– four categories of quality of evidence
– two grades for strength of recommendations
• Transparency in decision making and judgments is key
Are values important? Should resources be considered?
2. Consistency of results
• Look for explanation for inconsistency
– patients, intervention, comparator, outcome, methods
• Judgment
– variation in size of effect
– overlap in confidence intervals
– statistical significance of heterogeneity
– I²
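For reference, the I² statistic mentioned in the list above is commonly computed from Cochran's Q and its degrees of freedom as I² = max(0, (Q − df) / Q) × 100%. The sketch below assumes Q has already been obtained from a meta-analysis; the function name and example values are illustrative, not taken from the deck.

```python
def i_squared(q: float, df: int) -> float:
    """Return I² (in percent) from Cochran's Q and its degrees of freedom.

    df is the number of studies minus one. Negative values are
    truncated to zero, following the usual definition.
    """
    if df < 1:
        raise ValueError("df must be at least 1 (two or more studies)")
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


if __name__ == "__main__":
    # Hypothetical example: 11 studies (df = 10) with Q = 20.
    print(f"I² = {i_squared(20.0, 10):.0f}%")
```

I² describes the share of variability across studies that is attributed to heterogeneity rather than chance; it is one input to the judgment, alongside the overlap of confidence intervals and the variation in effect size.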
3. Directness of Evidence
• differences in
– populations/patients (mild versus severe COPD, older, sicker or more co-morbidity)
– interventions (all inhaled steroids, new vs. old)
– outcomes (important vs. surrogate; long-term health-related quality of life, short-term functional capacity, laboratory exercise, spirometry)
• indirect comparisons
– interested in A versus B
– have A versus C and B versus C
– formoterol versus salmeterol versus tiotropium
4. Publication Bias • Publication bias – Cave! Only few small studies
A systematic review of topical treatments for seasonal allergic conjunctivitis showed that patients using topical sodium cromoglycate were more likely to perceive benefit than those using placebo. However, only small trials reported clinically and statistically significant benefits of active treatment, while a larger trial showed a much smaller and a statistically not significant effect. These findings suggest that smaller studies demonstrating smaller effects might not have been published.
5. Imprecision
• small sample size
• small number of events
– wide confidence intervals
– uncertainty about magnitude of effect
Observational studies examining the impact of exclusive breastfeeding on development of allergic rhinitis in high risk infants showed a relative risk of 0.87 (95% CI: 0.48 to 1.58) that neither rules out important benefit nor important harm (Mimouni Bloch 2002).
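The statement that a confidence interval "neither rules out important benefit nor important harm" can be made explicit with a small check. The thresholds of 0.75 and 1.25 used below are illustrative assumptions (the deck does not state what counts as an important effect), and the function name is hypothetical.

```python
def ci_interpretation(lower, upper, benefit_threshold=0.75, harm_threshold=1.25):
    """Describe what a ratio-scale confidence interval is compatible with.

    lower / upper: bounds of the 95% CI for a relative risk or odds ratio.
    benefit_threshold / harm_threshold: ASSUMED cut-offs for an important
    benefit (ratio at or below) and an important harm (ratio at or above).
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return {
        "includes_no_effect": lower <= 1.0 <= upper,
        "includes_important_benefit": lower <= benefit_threshold,
        "includes_important_harm": upper >= harm_threshold,
    }


if __name__ == "__main__":
    # Breastfeeding and allergic rhinitis: RR 0.87 (0.48 to 1.58).
    print(ci_interpretation(0.48, 1.58))
    # Oseltamivir and pneumonia in seasonal influenza: OR 0.15 (0.03 to 0.69).
    print(ci_interpretation(0.03, 0.69))
```

With these assumed thresholds, the breastfeeding interval is compatible with important benefit, no effect and important harm at once, which is the pattern the slide describes as imprecision.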
What can raise quality? 3 Factors
• large magnitude can upgrade one level
– very large two levels
– common criteria
  • everyone used to do badly
  • almost everyone does well
– The parachute example: if we looked at the observational evidence, we would find a very large effect and the evidence probably would be high quality for preventing death
• dose response relation (higher dose of brain radiation in childhood leukemia leads to greater risk of late malignancies)
• Residual confounding unlikely to be responsible for observed effect
Guideline development Process
• Prioritise problems & scoping
• Establish guideline panel and develop questions, including outcomes
• Find and critically appraise systematic review(s) and/or prepare protocol(s) for systematic review(s) and prepare systematic review(s) (searches, selection of studies, data collection and analysis)
• Prepare an evidence profile
• Assess the quality of evidence for each outcome
• Prepare a Summary of Findings table
• If developing guidelines: assess the overall quality of evidence and decide on the direction (which alternative) and strength of the recommendation
• Draft guideline
• Consult with stakeholders and/or external peer reviewers
• Disseminate guidelines
• Update review or guidelines when needed
• Adapt guidelines, if needed
• Prioritise guidelines/recommendations for implementation
• Implement or support implementation of the guidelines
• Evaluate the impact of the guidelines and implementation strategies
• Update systematic review/guidelines
[Diagram: the GRADE process]
• Formulate question: P I C O
• Rate importance and select outcomes: Critical, Critical, Important, Not important
• Systematic review: summary of findings & estimate of effect for each outcome
• Create evidence profile with GRADEpro
• Rate quality of evidence for each outcome: RCT start high, obs. data start low
– Grade down: 1. Risk of bias 2. Inconsistency 3. Indirectness 4. Imprecision 5. Publication bias
– Grade up: 1. Large effect 2. Dose response 3. Confounders
– Result: High, Moderate, Low, Very low
• Outcomes across studies: rate overall quality of evidence across outcomes based on lowest quality of critical outcomes
• Guideline development: formulate recommendations
– For or against (direction)
– Strong or weak (strength)
– By considering: quality of evidence, balance benefits/harms, values and preferences
– Revise if necessary by considering: resource use (cost)
• Wording: “We recommend using…” “We suggest using…” “We recommend against using…” “We suggest against using…”
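Two rules from this diagram lend themselves to a short sketch: the overall quality is the lowest quality among the critical outcomes, and the wording follows from the direction and strength of the recommendation. The function names, the tuple layout and the example ratings below are hypothetical and only serve the illustration.

```python
ORDER = ["very low", "low", "moderate", "high"]


def overall_quality(outcomes):
    """Return the lowest quality rating among critical outcomes.

    outcomes: iterable of (name, importance, quality) tuples, where
    importance is "critical", "important" or "not important".
    """
    critical = [quality for _, importance, quality in outcomes
                if importance == "critical"]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    return min(critical, key=ORDER.index)


def recommendation_wording(direction, strength):
    """Map direction ("for"/"against") and strength ("strong"/"weak") to wording."""
    if direction not in ("for", "against") or strength not in ("strong", "weak"):
        raise ValueError("unexpected direction or strength")
    verb = "recommend" if strength == "strong" else "suggest"
    against = " against" if direction == "against" else ""
    return f"We {verb}{against} using…"


if __name__ == "__main__":
    # Hypothetical ratings, for illustration only.
    example = [
        ("mortality", "critical", "very low"),
        ("hospitalization", "critical", "moderate"),
        ("resource use", "important", "high"),
    ]
    print(overall_quality(example))
    print(recommendation_wording("for", "strong"))
```

Important but non-critical outcomes do not lower the overall rating in this sketch, which mirrors the "lowest quality of critical outcomes" rule stated in the diagram.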
Oxford Centre for Evidence Based Medicine
Levels of Evidence and Grades of Recommendations - 23 November 1999.
Columns: Grade of Recommendation; Level of Evidence; Therapy/Prevention, Aetiology/Harm; Prognosis; Diagnosis; Economic analysis

Grade of Recommendation A
• Level 1a
– Therapy/Prevention, Aetiology/Harm: SR (with homogeneity*) of RCTs
– Prognosis: SR (with homogeneity*) of inception cohort studies; or a CPG validated on a test set.
– Diagnosis: SR (with homogeneity*) of Level 1 diagnostic studies; or a CPG validated on a test set.
– Economic analysis: SR (with homogeneity*) of Level 1 economic studies
• Level 1b
– Therapy/Prevention, Aetiology/Harm: Individual RCT (with narrow Confidence Interval)
– Prognosis: Individual inception cohort study with > 80% follow-up
– Diagnosis: Independent blind comparison of an appropriate spectrum of consecutive patients, all of whom have undergone both the diagnostic test and the reference standard.
– Economic analysis: Analysis comparing all (critically-validated) alternative outcomes against appropriate cost measurement, and including a sensitivity analysis incorporating clinically sensible variations in important variables.
• Level 1c
– Therapy/Prevention, Aetiology/Harm: All or none
– Prognosis: All or none case-series
– Diagnosis: Absolute SpPins and SnNouts
– Economic analysis: Clearly as good or better, but cheaper. Clearly as bad or worse but more expensive. Clearly better or worse at the same cost.

Grade of Recommendation B
• Level 2a
– Therapy/Prevention, Aetiology/Harm: SR (with homogeneity*) of cohort studies
– Prognosis: SR (with homogeneity*) of either retrospective cohort studies or untreated control groups in RCTs.
– Diagnosis: SR (with homogeneity*) of Level >2 diagnostic studies
– Economic analysis: SR (with homogeneity*) of Level >2 economic studies
• Level 2b
– Therapy/Prevention, Aetiology/Harm: Individual cohort study (including low quality RCT; e.g., <80% follow-up)
– Prognosis: Retrospective cohort study or follow-up of untreated control patients in an RCT; or CPG not validated in a test set.
– Diagnosis: Any of: independent blind or objective comparison; study performed in a set of non-consecutive patients, or confined to a narrow spectrum of study individuals (or both), all of whom have undergone both the diagnostic test and the reference standard; a diagnostic CPG not validated in a test set.
– Economic analysis: Analysis comparing a limited number of alternative outcomes against appropriate cost measurement, and including a sensitivity analysis incorporating clinically sensible variations in important variables.
• Level 2c
– Therapy/Prevention, Aetiology/Harm: “Outcomes” Research
– Prognosis: “Outcomes” Research
• Level 3a
– Therapy/Prevention, Aetiology/Harm: SR (with homogeneity*) of case-control studies
• Level 3b
– Therapy/Prevention, Aetiology/Harm: Individual Case-Control Study
– Diagnosis: Independent blind comparison of an appropriate spectrum, but the reference standard was not applied to all study patients
– Economic analysis: Analysis without accurate cost measurement, but including a sensitivity analysis incorporating clinically sensible variations in important variables.

Grade of Recommendation C
• Level 4
– Therapy/Prevention, Aetiology/Harm: Case-series (and poor quality cohort and case-control studies)
– Prognosis: Case-series (and poor quality prognostic cohort studies)
– Diagnosis: Any of: reference standard was unobjective, unblinded or not independent; positive and negative tests were verified using separate reference standards; study was performed in an inappropriate spectrum** of patients.
– Economic analysis: Analysis with no sensitivity analysis

Grade of Recommendation D
• Level 5
– Therapy/Prevention, Aetiology/Harm: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
– Prognosis: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
– Diagnosis: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
– Economic analysis: Expert opinion without explicit critical appraisal, or based on economic theory

Oxford Centre for Evidence-Based Medicine (Chris Ball, Dave Sackett, Bob Phillips, Brian Haynes, and Sharon Straus).
USPSTF - Grade Definitions After May 2007: Certainty
Level of Certainty: Description
• High: The available evidence usually includes consistent results from well-designed, well-conducted studies in representative primary care populations. These studies assess the effects of the preventive service on health outcomes. This conclusion is therefore unlikely to be strongly affected by the results of future studies.
• Moderate: The available evidence is sufficient to determine the effects of the preventive service on health outcomes, but confidence in the estimate is constrained by such factors as:
– The number, size, or quality of individual studies.
– Inconsistency of findings across individual studies.
– Limited generalizability of findings to routine primary care practice.
– Lack of coherence in the chain of evidence.
As more information becomes available, the magnitude or direction of the observed effect could change, and this change may be large enough to alter the conclusion.
• Low: The available evidence is insufficient to assess effects on health outcomes. Evidence is insufficient because of:
– The limited number or size of studies.
– Important flaws in study design or methods.
– Inconsistency of findings across individual studies.
– Gaps in the chain of evidence.
– Findings not generalizable to routine primary care practice.
– Lack of information on important health outcomes.
More information may allow estimation of effects on health outcomes.
The USPSTF defines certainty as "likelihood that the USPSTF assessment of the net benefit of a preventive service is correct."
• Recommendations for prognosis – Use prognostic information to determine baseline risk for healthcare decisions
Centers for Disease Control and Prevention (CDC)
Columns: Evidence of Effectiveness; Execution (Good or Fair); Design Suitability (Greatest, Moderate, or Least); Number of Studies; Consistent; Effect Size; Expert Opinion
• Strong
– Good; Greatest; At Least 2; Yes; Sufficient; Not Used
– Good; Greatest or Moderate; At Least 5; Yes; Sufficient; Not Used
– Good or Fair; Greatest; At Least 5; Yes; Sufficient; Not Used
– Meet Design, Execution, Number, and Consistency Criteria for Sufficient But Not Strong Evidence; Effect Size: Large; Expert Opinion: Not Used
• Sufficient
– Good; Greatest; 1; Not Applicable; Sufficient; Not Used
– Good or Fair; Greatest or Moderate; At Least 3; Yes; Sufficient; Not Used
– Good or Fair; Greatest, Moderate, or Least; At Least 5; Yes; Sufficient; Not Used
• Expert Opinion
– Varies; Varies; Varies; Varies; Sufficient; Supports a Recommendation
• Insufficient
– A. Insufficient Designs or Execution; B. Too Few Studies; C. Inconsistent; D. Small; E. Not Used
GRADE - 2004
[Diagram: the GRADE process, from formulating the PICO question and rating the importance of outcomes, through rating the quality of evidence for each outcome (grade down for risk of bias, inconsistency, indirectness, imprecision and publication bias; grade up for large effect, dose response and confounders), to rating the overall quality across critical outcomes and formulating recommendations]
Case scenario
A 13-year-old girl who lives in rural Indonesia presented with flu symptoms and developed severe respiratory distress over the last 2 days. She required intubation. The history reveals that she shares her living quarters with her parents and her three siblings. At night the family’s chickens share this room too, and several chickens died unexpectedly a few days before the girl fell sick.
Potential interventions: antivirals, such as the neuraminidase inhibitors oseltamivir and zanamivir
Types of questions
• Background Questions
– Definition: What is Avian Influenza?
– Mechanism: What is the mechanism of action of oseltamivir?
• Foreground Questions
– Benefit > harm: In patients with avian influenza, does oseltamivir therapy improve survival, …?
Framing a foreground question
Population: Avian flu/influenza A (H5N1) patients
Intervention: Oseltamivir
Comparison: No pharmacological intervention
Outcomes: Mortality, hospitalizations, resource use, adverse outcomes, antimicrobial resistance
Schunemann, et al., The Lancet ID, 2007
[Diagram: the GRADE process, from the PICO question and outcome importance ratings, through the systematic review and evidence profile, to the overall quality of evidence and the strength and direction of the recommendation]
Evidence Profiles/Summaries
Galactomannan ELISA for the diagnosis of invasive aspergillosis
Question: Should Galactomannan ELISA be used for the diagnosis of invasive aspergillosis?
Authors of the profile: Nancy Santesso, Jan Brozek, Holger Schünemann
Population: immunocompromised patients, mostly hematology patients [1]
Setting: mainly hematology or cancer departments, mainly inpatients
New Test: commercial Platelia® sandwich ELISA detecting galactomannan in serum [2]
Cut-off value: 1.5 ODI
Reference Test: composite of EORTC/MSG clinical and histological criteria [3]
Threshold: Proven or probable invasive aspergillosis
Bibliography: Leeflang et al. Galactomannan detection for invasive aspergillosis in immunocompromized patients. Cochrane Database of Systematic Reviews 2008, Issue 4. CD007394
Based on Sensitivity of 0.64 (0.50-0.77) and Specificity of 0.95 (0.91-0.97)
Columns: Outcome; Number of Participants (studies); Study design; Risk of bias; Indirectness; Inconsistency; Imprecision; Publication bias; Effect (number per 1000 patients tested) at Prevalence 1%, 12% and 44% [4]; Quality of the evidence (GRADE); Importance [5]

• True positives (Patients correctly classified as having invasive aspergillosis) [6]
– Participants: 2777 (18 studies); Study design: cohort, case-control
– Risk of bias: no; Indirectness: no; Inconsistency: no; Imprecision: no; Publication bias: unlikely
– Effect per 1000: 6 (5 to 8) at 1%; 77 (60 to 92) at 12%; 282 (220 to 339) at 44%
– Quality: HIGH ⊕⊕⊕⊕; Importance: 7
• True negatives (Patients correctly classified as not having invasive aspergillosis) [7]
– Participants: 2777 (18 studies); Study design: cohort, case-control
– Risk of bias: no; Indirectness: no; Inconsistency: no; Imprecision: no; Publication bias: unlikely
– Effect per 1000: 941 (901 to 960) at 1%; 836 (801 to 854) at 12%; 532 (510 to 543) at 44%
– Quality: HIGH ⊕⊕⊕⊕; Importance: 7
• False positives (Patients incorrectly classified as having invasive aspergillosis) [8]
– Participants: 2777 (18 studies); Study design: cohort, case-control
– Risk of bias: no; Indirectness: serious [9]; Inconsistency: no; Imprecision: no; Publication bias: unlikely
– Effect per 1000: 50 (89 to 30) at 1%; 44 (79 to 26) at 12%; 28 (50 to 17) at 44%
– Quality: MODERATE ⊕⊕⊕; Importance: 8
• False negatives (Patients incorrectly classified as not having the disease) [10]
– Participants: 2777 (18 studies); Study design: cohort, case-control
– Risk of bias: no; Indirectness: serious [11]; Inconsistency: no; Imprecision: no; Publication bias: unlikely
– Effect per 1000: 4 (5 to 2) at 1%; 43 (60 to 28) at 12%; 158 (220 to 101) at 44%
– Quality: MODERATE ⊕⊕⊕; Importance: 9
• Inconclusive results [12]
– Participants: 901 (7 studies); Study design: cohort, case-control
– All other columns: –; Importance: 4
• Difference in complications [13]
– All columns: –; Importance: 3
• Difference in resource use [13]
– All columns: –; Importance: 6

Footnotes:
[1] One study that included only patients with HIV/AIDS was excluded, because this patient group and setting differed from the other included patient groups in such a way that authors of the review regarded the study population as not being representative.
[2] A diagnostic test for invasive aspergillosis (IA) needs to be not too invasive or a burden to immunocompromised patients. The galactomannan ELISA test on a serum specimen is less burdensome than a current reference standard.
[3] A true reference standard is autopsy combined with culture from autopsy specimens; a clinical reference standard is composed of EORTC/MSG clinical and histological criteria.
[4] Estimates of prevalence of IA were based on the median and range of values in included studies.
[5] GRADE recommends classifying outcomes on a 9 point scale: 1-3 is not important; 4-6 is important; and, 7-9 is critical to a decision.
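The "number per 1000 patients tested" figures in this profile follow from the pooled sensitivity (0.64), specificity (0.95) and the three prevalence values. A minimal sketch of that calculation is shown below; the function name is hypothetical, and exact decimal arithmetic with half-up rounding is a choice made here so that values such as 940.5 round the same way as in the profile.

```python
from decimal import Decimal, ROUND_HALF_UP


def per_1000(sensitivity: str, specificity: str, prevalence: str) -> dict:
    """Expected test results per 1000 patients tested.

    Arguments are decimal strings (e.g. "0.64") so that the arithmetic
    is exact. Each cell is rounded separately, so the four cells may
    not add up to exactly 1000.
    """
    sens = Decimal(sensitivity)
    spec = Decimal(specificity)
    prev = Decimal(prevalence)
    diseased = 1000 * prev
    healthy = 1000 - diseased
    raw = {
        "TP": diseased * sens,        # correctly classified as diseased
        "FN": diseased * (1 - sens),  # missed cases
        "TN": healthy * spec,         # correctly classified as not diseased
        "FP": healthy * (1 - spec),   # false alarms
    }
    return {name: int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
            for name, value in raw.items()}


if __name__ == "__main__":
    for prevalence in ("0.01", "0.12", "0.44"):
        print(prevalence, per_1000("0.64", "0.95", prevalence))
```

Running the sketch with the confidence limits of sensitivity and specificity in place of the point estimates gives the bracketed ranges in the same way.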
Galactomannan ELISA for the diagnosis of invasive aspergillosis
Patient or population: immunocompromised patients, mostly hematology patients [1]
Settings: mainly hematology or cancer departments, mainly inpatients
New test: commercial Platelia® sandwich ELISA detecting galactomannan in serum [2]
Cut-off value: 1.5 ODI
Reference test: composite of EORTC/MSG clinical and histological criteria [3]
Threshold: proven or probable invasive aspergillosis

Illustrative comparative numbers (95% CI) [4]: assumed numbers with galactomannan ELISA compared to reference test (EORTC/MSG criteria)

True positives (patients correctly classified as having invasive aspergillosis)
– Prevalence 1%: 6 (5 to 8)
– Prevalence 12%: 77 (60 to 92)
– Prevalence 44%: 282 (220 to 339)
– Number of participants (studies): 2777 (18)
– Quality of the evidence: ⊕⊕⊕⊕ High
– Comments: True positives are important because patients will be treated with improved survival and spared the invasive diagnostic procedures; however, they will suffer the toxicity of the treatment.

True negatives (patients correctly classified as not having invasive aspergillosis)
– Prevalence 1%: 941 (901 to 960)
– Prevalence 12%: 836 (801 to 854)
– Prevalence 44%: 532 (510 to 543)
– Number of participants (studies): 2777 (18)
– Quality of the evidence: ⊕⊕⊕⊕ High
– Comments: True negatives are important because patients will not be treated unnecessarily, will be spared the toxic effects of treatment, and will be reassured.

False positives (patients incorrectly classified as having invasive aspergillosis)
– Prevalence 1%: 50 (89 to 30)
– Prevalence 12%: 44 (79 to 26)
– Prevalence 44%: 28 (50 to 17)
– Number of participants (studies): 2777 (18)
– Quality of the evidence: ⊕⊕⊕ Moderate [5]
– Comments: False positives are important because patients will be treated unnecessarily and many will be exposed to nephrotoxic drugs, and also because of a possibly delayed diagnosis and treatment of the true cause of symptoms.

False negatives (patients incorrectly classified as not having the disease)
– Prevalence 1%: 4 (5 to 2)
– Prevalence 12%: 43 (60 to 28)
– Prevalence 44%: 158 (220 to 101)
– Number of participants (studies): 2777 (18)
– Quality of the evidence: ⊕⊕⊕ Moderate [6]
– Comments: False negatives are important because the diagnosis is missed and patients either will not be treated, with a 60–90% mortality rate, or treatment will be delayed with uncertain consequences on mortality.

Inconclusive results
– See comment
– Comments: Inconclusive results were not reported. Most likely they would result in repeating the index test and therefore increased cost.

Complications
– See comment
– Comments: Complications were not reported.

Resource use
– See comment
– Comments: Resource use was not reported.

CI: Confidence interval; EORTC: European Organization for Research and Treatment of Cancer; MSG: Mycoses Study Group

GRADE Working Group grades of evidence
• High quality: Further research is very unlikely to change our confidence in the estimate of effect.
• Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
• Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
• Very low quality: We are very uncertain about the estimate.

[1] One study that included only patients with HIV/AIDS was excluded, because this patient group and setting differed from the other included patient groups in such a way that the authors of the review regarded the study population as not being representative.
[2] A diagnostic test for invasive aspergillosis (IA) should not be too invasive or burdensome for immunocompromised patients. The galactomannan ELISA test on a serum specimen is less burdensome than a current reference standard.
[3] A true reference standard is autopsy combined with culture from autopsy specimens; a clinical reference standard is composed of EORTC/MSG clinical and histological criteria.
[4] Estimates of prevalence of IA were based on the median and range of values in included studies.
[5] There is uncertainty to what extent a delayed diagnosis and treatment of the true cause of symptoms would impact on patient-important outcomes.
[6] There is some uncertainty to what extent a delayed diagnosis and treatment would impact mortality and other outcomes.
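The illustrative numbers in the galactomannan table can be reproduced from a prevalence, a sensitivity and a specificity. The sketch below shows that arithmetic. Two things in it are assumptions, not statements from the slides: the counts are read as being per 1000 patients tested (within each prevalence column the four point estimates sum to roughly 1000), and the sensitivity (0.64) and specificity (0.95) are back-calculated from those counts.

```python
def classify_per_1000(prevalence, sensitivity, specificity, n=1000):
    """Expected true/false positives and negatives among n tested patients."""
    diseased = prevalence * n
    healthy = n - diseased
    true_pos = sensitivity * diseased
    true_neg = specificity * healthy
    return {
        "true_positives": round(true_pos),
        "false_negatives": round(diseased - true_pos),
        "true_negatives": round(true_neg),
        "false_positives": round(healthy - true_neg),
    }


# Assumed accuracy values, back-calculated from the table (not given in the slides).
ASSUMED_SENSITIVITY = 0.64
ASSUMED_SPECIFICITY = 0.95

for prevalence in (0.12, 0.44):
    counts = classify_per_1000(prevalence, ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY)
    print(f"prevalence {prevalence:.0%}: {counts}")
```

The same test therefore produces far more false negatives, and far fewer false positives, as prevalence rises, which is why the table reports each outcome at several prevalences.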
GRADE process (diagram)
• Formulate question: P I C O
• Select outcomes and rate importance: Outcome – Critical; Outcome – Critical; Outcome – Important; Outcome – Not important
• Systematic review: outcomes across studies
• Create evidence profile with GRADEpro: summary of findings & estimate of effect for each outcome
• Rate quality of evidence for each outcome: High, Moderate, Low, Very low
– RCT start high, obs. data start low
– Grade down: 1. Risk of bias 2. Inconsistency 3. Indirectness 4. Imprecision 5. Publication bias
– Grade up: 1. Large effect 2. Dose response 3. Confounders
• Rate overall quality of evidence across outcomes based on lowest quality of critical outcomes
• Guideline development: formulate recommendations
– For or against (direction)
– Strong or weak (strength)
– By considering: quality of evidence, balance benefits/harms, values and preferences
– Revise if necessary by considering: resource use (cost)
• “We recommend using…”
• “We suggest using…”
• “We recommend against using…”
• “We suggest against using…”
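The rule that overall quality is based on the lowest quality among the critical outcomes can be sketched as below. The 9-point importance scale (1-3 not important, 4-6 important, 7-9 critical) follows the evidence-profile footnote earlier in the deck. The function names and the example outcomes with their ratings are hypothetical and only illustrate the rule.

```python
QUALITY_ORDER = ["Very low", "Low", "Moderate", "High"]  # worst to best


def importance_category(score):
    """Map a 1-9 importance score to its GRADE category."""
    if not 1 <= score <= 9:
        raise ValueError("importance score must be between 1 and 9")
    if score >= 7:
        return "critical"
    if score >= 4:
        return "important"
    return "not important"


def overall_quality(outcomes):
    """outcomes: iterable of (name, importance score, quality rating)."""
    critical = [
        quality
        for _name, score, quality in outcomes
        if importance_category(score) == "critical"
    ]
    if not critical:
        raise ValueError("no critical outcome to base the overall rating on")
    # The lowest-rated critical outcome determines the overall rating.
    return min(critical, key=QUALITY_ORDER.index)


# Hypothetical outcomes and ratings, for illustration only.
example = [
    ("Mortality", 9, "Low"),
    ("Hospitalization", 7, "Moderate"),
    ("Minor adverse effects", 5, "Very low"),
]
print(overall_quality(example))
```

In this example the very low rating of the merely important outcome does not pull the overall rating down, because only critical outcomes count.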
Questions
GRADE: recommendation – quality of evidence Clear separation: 1) Recommendation: 2 grades – weak/conditional/optional or strong (for or against an intervention)?
– Balance of benefits and downsides, values and preferences, resource use and quality of evidence
2) 4 categories of quality of evidence: ⊕⊕⊕⊕ (High), ⊕⊕⊕(Moderate), ⊕⊕(Low), ⊕(Very low)? – methodological quality of evidence – likelihood of bias – by outcome and across outcomes
*www.GradeWorking-Group.org
GRADE Quality of Evidence In the context of making recommendations: • The quality of evidence reflects the extent of our confidence that the estimates of an effect are adequate to support a particular decision or recommendation.
Likelihood of and confidence in an outcome
Definition of grades of evidence • ⊕⊕⊕⊕/A/High: Further research is very unlikely to change confidence in the estimate of effect. • ⊕⊕⊕/B/Moderate: Further research is likely to have an important impact on confidence in the estimate of effect and may change the estimate. • ⊕⊕/C/Low: Further research is very likely to have an important impact on confidence in the estimate of effect and is likely to change the estimate. • ⊕/D/Very low: Any estimate of effect is very uncertain.
Determinants of quality
• RCTs ⊕⊕⊕⊕
• observational studies ⊕⊕
• 5 factors that can lower quality
1. Limitations in detailed design and execution (risk of bias criteria)
2. Inconsistency (or heterogeneity)
3. Indirectness (PICO and applicability)
4. Imprecision (number of events and confidence intervals)
5. Publication bias
• 3 factors can increase quality
– large magnitude of effect
– dose response relation
– all plausible residual confounding may be working to reduce the demonstrated effect or increase the effect if no effect was observed
1. Design and Execution/Risk of Bias Examples: • Inappropriate selection of exposed and unexposed groups • Failure to adequately measure/control for confounding • Selective outcome reporting • Failure to blind (e.g. outcome assessors) • High loss to follow-up • Lack of concealment in RCTs • Intention to treat principle violated
Design and Execution/RoB
From Cates, CDSR 2008
Design and Execution/RoB
Overall judgment required
Who believes the risk of bias is of concern? Yes No Don’t know or undecided
Would you downgrade for risk of bias? No, there are no serious limitations
Yes, there are serious limitations
Yes, there are very serious limitations
Would you downgrade for risk of bias?
• No, there are no serious limitations – although there ….
• Yes, there are serious limitations – most people would agree that selective reporting is….
• Yes, there are very serious limitations – there is a risk of bias but only for the one criterion of selective reporting
2. Inconsistency of results (Heterogeneity) • if inconsistency, look for explanation – patients, intervention, comparator, outcome
• if unexplained inconsistency lower quality
Reminders for immunization uptake
3. Directness of Evidence
• differences in
– populations/patients (children – neonates, women in general – pregnant women)
– interventions (all vaccines, new – old)
– comparator appropriate (new policy – old or no policy)
– outcomes (important – surrogate: cases prevented – seroconversion)
• indirect comparisons
– interested in A versus B
– have A versus C and B versus C
– Vaccine A versus Placebo versus Vaccine B
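When only A versus C and B versus C are available, one common way to estimate A versus B is an adjusted indirect comparison (often attributed to Bucher). The slides do not name a method, so this is a swapped-in illustration and not the presenter's approach. The input odds ratios and standard errors below are hypothetical.

```python
import math


def indirect_odds_ratio(or_ac, se_log_or_ac, or_bc, se_log_or_bc, z=1.96):
    """Adjusted indirect comparison of A versus B through a common comparator C.

    Returns (odds ratio A vs B, lower CI bound, upper CI bound).
    """
    log_or_ab = math.log(or_ac) - math.log(or_bc)
    # Variances add, so the indirect estimate is less precise than either input.
    se_ab = math.sqrt(se_log_or_ac ** 2 + se_log_or_bc ** 2)
    lower = math.exp(log_or_ab - z * se_ab)
    upper = math.exp(log_or_ab + z * se_ab)
    return math.exp(log_or_ab), lower, upper


# Hypothetical inputs: vaccine A vs placebo and vaccine B vs placebo.
or_ab, lower, upper = indirect_odds_ratio(0.5, 0.2, 0.8, 0.2)
print(f"A vs B: OR {or_ab:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The wider interval of the indirect estimate is one reason indirect comparisons may lead to rating down for indirectness or imprecision.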
4. Publication Bias • Should always be suspected – Only small “positive” trials – For profit interest – Various methods to evaluate – none perfect, but clearly a problem
Publication bias: I.V. Mg in acute myocardial infarction
Meta-analysis: Yusuf S. Circulation 1993
ISIS-4. Lancet 1995
Egger M, Smith GD. BMJ 1995;310:752-54
Funnel plot (x-axis: odds ratio, 0.1 to 10; y-axis: standard error, 0 to 3)
Symmetrical: No publication bias
Funnel plot (x-axis: odds ratio, 0.1 to 10; y-axis: standard error, 0 to 3)
Asymmetrical: Publication bias?
File drawer problem: no interest in publishing or being published
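Funnel plot asymmetry is usually judged by eye, but it can also be quantified. One widely used and imperfect option is Egger's regression, which regresses each study's standardized effect (log odds ratio divided by its standard error) on its precision (1 / standard error). An intercept far from zero suggests asymmetry. The slides do not describe this test, so it is shown here as an illustration only, and the study data are hypothetical.

```python
def egger_regression(log_effects, standard_errors):
    """Return (intercept, slope) of Egger's regression for funnel asymmetry."""
    if len(log_effects) != len(standard_errors) or len(log_effects) < 3:
        raise ValueError("need at least three studies with matching inputs")
    precision = [1.0 / se for se in standard_errors]
    standardized = [eff / se for eff, se in zip(log_effects, standard_errors)]
    n = len(precision)
    mean_x = sum(precision) / n
    mean_y = sum(standardized) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("studies must differ in precision")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, standardized))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical studies: small studies (large SE) show larger apparent benefit.
standard_errors = [0.1, 0.2, 0.4, 0.8]
symmetric = [-0.3, -0.3, -0.3, -0.3]
asymmetric = [-0.3 - 1.0 * se for se in standard_errors]

print(egger_regression(symmetric, standard_errors))
print(egger_regression(asymmetric, standard_errors))
```

A full analysis would also report a confidence interval or p-value for the intercept; as the publication bias slide notes, none of these methods is perfect.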
5. Imprecision • Small sample size – small number of events
• Wide confidence intervals – uncertainty about magnitude of effect
• Extent to which confidence in estimate of effect adequate to support decision
Example: Immunization in children
What can raise quality?
1. large magnitude can upgrade (RRR 50%/RR 2)
– very large two levels (RRR 80%/RR 5)
– criteria
• everyone used to do badly
• almost everyone does well
– parachutes to prevent death when jumping from airplanes
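The magnitude thresholds above can be written as a small helper. This is a sketch under stated assumptions: a protective effect is treated symmetrically with a harmful one (RRR 50% corresponds to RR 0.5, the mirror image of RR 2; RRR 80% to RR 0.2, the mirror image of RR 5), and a value exactly on a threshold is counted as meeting it. In practice the decision to upgrade is a judgment that also weighs bias and precision, not a formula.

```python
def magnitude_upgrade(relative_risk):
    """Levels by which a large effect could raise quality (0, 1 or 2)."""
    if relative_risk <= 0:
        raise ValueError("relative risk must be positive")
    # Fold protective effects (RR < 1) onto the same scale as harmful ones.
    fold = max(relative_risk, 1.0 / relative_risk)
    if fold >= 5:
        return 2  # very large effect (RR 5 or RRR 80%)
    if fold >= 2:
        return 1  # large effect (RR 2 or RRR 50%)
    return 0


for rr in (0.9, 0.4, 3.0, 0.06):
    print(rr, magnitude_upgrade(rr))
```

For example, a 94% protective effect corresponds to a relative risk of 0.06 and would qualify as very large under this rule.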
Reminders for immunization uptake
What can raise quality?
2. dose response relation
– (higher INR – increased bleeding)
– childhood lymphoblastic leukemia
• risk for CNS malignancies 15 years after cranial irradiation
• no radiation: 1% (95% CI 0% to 2.1%)
• 12 Gy: 1.6% (95% CI 0% to 3.4%)
• 18 Gy: 3.3% (95% CI 0.9% to 5.6%)
3. all plausible confounding may be working to reduce the demonstrated effect or increase the effect if no effect was observed
All plausible residual confounding would result in an overestimate of effect
Hypoglycaemic drug phenformin causes lactic acidosis The related agent metformin is under suspicion for the same toxicity. Large observational studies have failed to demonstrate an association
– Clinicians would be more alert to lactic acidosis in the presence of the agent
• Vaccine – adverse effects
Quality assessment criteria
Study design and initial quality of a body of evidence
• Randomised trials: High
• Observational studies: Low
Lower if
• Risk of bias: -1 Serious, -2 Very serious
• Inconsistency: -1 Serious, -2 Very serious
• Indirectness: -1 Serious, -2 Very serious
• Imprecision: -1 Serious, -2 Very serious
• Publication bias: -1 Likely, -2 Very likely
Higher if
• Large effect: +1 Large, +2 Very large
• Dose response: +1 Evidence of a gradient
• All plausible residual confounding: +1 Would reduce a demonstrated effect, +1 Would suggest a spurious effect if no effect was observed
Quality of a body of evidence
• A/High (four plus: ⊕⊕⊕⊕)
• B/Moderate (three plus: ⊕⊕⊕)
• C/Low (two plus: ⊕⊕)
• D/Very low (one plus: ⊕)
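The quality assessment criteria can be read as a tally: start at High for randomised trials or Low for observational studies, subtract levels for each concern, add levels for each upgrading factor, and stay within the four categories. The sketch below shows that bookkeeping only. The function and dictionary names are hypothetical, clamping the result to the Very low to High range is an assumption, and real GRADE ratings rest on an overall judgment and not on mechanical addition.

```python
LEVELS = ["Very low", "Low", "Moderate", "High"]
START_LEVEL = {"randomised trials": 3, "observational studies": 1}
LOWER_IF = {"risk of bias", "inconsistency", "indirectness",
            "imprecision", "publication bias"}
HIGHER_IF = {"large effect": 2, "dose response": 1, "residual confounding": 1}


def rate_quality(study_design, downgrades=None, upgrades=None):
    """Tally a quality rating for one outcome from the criteria table."""
    downgrades = downgrades or {}
    upgrades = upgrades or {}
    for factor, levels in downgrades.items():
        if factor not in LOWER_IF or levels not in (0, 1, 2):
            raise ValueError(f"invalid downgrade: {factor}={levels}")
    for factor, levels in upgrades.items():
        if factor not in HIGHER_IF or not 0 <= levels <= HIGHER_IF[factor]:
            raise ValueError(f"invalid upgrade: {factor}={levels}")
    score = START_LEVEL[study_design]
    score -= sum(downgrades.values())
    score += sum(upgrades.values())
    # Assumption: ratings cannot fall below Very low or rise above High.
    return LEVELS[max(0, min(score, len(LEVELS) - 1))]


print(rate_quality("randomised trials", downgrades={"imprecision": 1}))
print(rate_quality("observational studies", upgrades={"large effect": 2}))
```

The second call mirrors the "low to high" upgrade for a very large effect discussed for observational evidence.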
Design and Execution/RoB Cancer, anticoagulation and mortality
Akl E, Barba M, Rohilla S, Terrenato I, Sperati F, Schünemann HJ. “Anticoagulation for the long term treatment of venous thromboembolism in patients with cancer”. Cochrane Database Syst Rev. 2008 Apr 16;(2):CD006650.
Content • Background • Quality of evidence • Moving from evidence to recommendations
Strength of recommendation “The strength of a recommendation reflects the extent to which we can, across the range of patients for whom the recommendations are intended, be confident that desirable effects of a management strategy outweigh undesirable effects.” • Strong or weak/conditional
Determinants of the strength of recommendation
Factors that can strengthen a recommendation, with comment:
• Quality of the evidence: The higher the quality of evidence, the more likely is a strong recommendation.
• Balance between desirable and undesirable effects: The larger the difference between the desirable and undesirable consequences, the more likely a strong recommendation is warranted. The smaller the net benefit and the lower the certainty for that benefit, the more likely a weak recommendation is warranted.
• Values and preferences: The greater the variability in values and preferences, or uncertainty in values and preferences, the more likely a weak recommendation is warranted.
• Costs (resource allocation): The higher the costs of an intervention – that is, the more resources consumed – the less likely a strong recommendation is warranted.
Developing recommendations
Case scenario A 13-year-old girl who lives in rural Indonesia presented with flu symptoms and developed severe respiratory distress over the course of the last 2 days. She required intubation. The history reveals that she shares her living quarters with her parents and her three siblings. At night the family’s chicken stock shares this room too, and several chickens had died unexpectedly a few days before the girl fell sick.
Methods – WHO Rapid Advice Guidelines for management of Avian Flu
Applied findings of a recent systematic evaluation of guideline development for WHO/ACHR
Group composition (including panel of 13 voting members):
• clinicians who treated influenza A(H5N1) patients
• infectious disease experts
• basic scientists
• public health officers
• methodologists
Independent scientific reviewers:
Identified systematic reviews, recent RCTs, case series, animal studies related to H5N1 infection
Oseltamivir for Avian Flu Summary of findings: No clinical trial of oseltamivir for treatment of H5N1 patients. 4 systematic reviews and health technology assessments (HTA) reporting on 5 studies of oseltamivir in seasonal influenza. Hospitalization: OR 0.22 (0.02 – 2.16) Pneumonia: OR 0.15 (0.03 - 0.69)
3 published case series. Many in vitro and animal studies. No alternative that is more promising at present. Cost: 40$ per treatment course
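The two odds ratios in the summary of findings illustrate imprecision: an interval that includes 1 is compatible with no effect. The sketch below checks this, assuming the ranges in parentheses are 95% confidence intervals (the slide does not label them). The function name is hypothetical.

```python
def interval_excludes_no_effect(lower, upper, null_value=1.0):
    """True if a confidence interval for a ratio measure excludes no effect."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return not (lower <= null_value <= upper)


# Odds ratios and intervals as reported on the slide (seasonal influenza).
findings = {
    "Hospitalization": (0.22, 0.02, 2.16),
    "Pneumonia": (0.15, 0.03, 0.69),
}

for outcome, (odds_ratio, lower, upper) in findings.items():
    verdict = "excludes" if interval_excludes_no_effect(lower, upper) else "includes"
    print(f"{outcome}: OR {odds_ratio} ({lower} to {upper}) {verdict} no effect")
```

Both estimates also come from seasonal influenza and not from H5N1 patients, so indirectness applies in addition to imprecision.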
From evidence to recommendation
Factors that can strengthen a recommendation, with comment:
• Quality of the evidence: Very low quality evidence
• Balance between desirable and undesirable effects: Uncertain, but small reduction in relative risk still leads to large absolute effect
• Values and preferences: Little variability and clear
• Costs (resource allocation): Low cost under non-pandemic conditions
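The point that a small relative risk reduction can still give a large absolute effect follows from simple arithmetic: the absolute risk reduction equals the baseline risk times (1 minus the relative risk). The sketch below works through it. The relative risk of 0.9 and both baseline risks are hypothetical values chosen for illustration; the slides give no such figures for H5N1.

```python
def absolute_risk_reduction(baseline_risk, relative_risk):
    """Absolute risk reduction implied by a relative risk at a baseline risk."""
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline risk must be between 0 and 1")
    if relative_risk < 0:
        raise ValueError("relative risk must not be negative")
    return baseline_risk * (1.0 - relative_risk)


HYPOTHETICAL_RELATIVE_RISK = 0.9  # a 10% relative risk reduction

for label, baseline in (("high case fatality", 0.60), ("low baseline risk", 0.01)):
    arr = absolute_risk_reduction(baseline, HYPOTHETICAL_RELATIVE_RISK)
    print(f"{label}: about {arr * 1000:.0f} fewer deaths per 1000 treated")
```

With the same relative effect, the absolute benefit is sixty times larger at the high baseline risk, which is why an illness with high case fatality can justify treatment despite very low quality evidence.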
Example: Oseltamivir for Avian Flu Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (????? recommendation, very low quality evidence).
Schunemann et al. The Lancet ID, 2007
Example: Oseltamivir for Avian Flu Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (strong recommendation, very low quality evidence). Values and Preferences Remarks: This recommendation places a high value on the prevention of death in an illness with a high case fatality. It places relatively low values on adverse reactions, the development of resistance and costs of treatment. Schunemann et al. The Lancet ID, 2007
Implications of a strong recommendation • Patients: Most people in this situation would want the recommended course of action and only a small proportion would not • Clinicians: Most patients should receive the recommended course of action • Policy makers: The recommendation can be adopted as a policy in most situations
Implications of a conditional/weak recommendation • Patients: The majority of people in this situation would want the recommended course of action, but many would not • Clinicians: Be more prepared to help patients to make a decision that is consistent with their own values/decision aids and shared decision making • Policy makers: There is a need for substantial debate and involvement of stakeholders
GRADE process (diagram, recap)
• Formulate question: P I C O
• Select outcomes and rate importance: Outcome – Critical; Outcome – Critical; Outcome – Important; Outcome – Not important
• Systematic review: outcomes across studies
• Create evidence profile with GRADEpro: summary of findings & estimate of effect for each outcome
• Rate quality of evidence for each outcome: High, Moderate, Low, Very low
– Randomization increases initial quality
– Grade down: 1. Risk of bias 2. Inconsistency 3. Indirectness 4. Imprecision 5. Publication bias
– Grade up: 1. Large effect 2. Dose response 3. Confounders
• Grade overall quality of evidence across outcomes based on lowest quality of critical outcomes
• Guideline development: formulate recommendations
– For or against (direction)
– Strong or weak (strength)
– By considering: quality of evidence, balance benefits/harms, values and preferences
– Revise if necessary by considering: resource use (cost)
• “We recommend using…”
• “We suggest using…”
• “We recommend against using…”
• “We suggest against using…”
Issues in guideline development for immunization • Causation versus effects of intervention – Causation not equivalent to efficacy of interventions – Bradford Hill • Nearly half a century old – tablet from the mountain?
• Harms caused by medications – Assumption is that removal of drug (or no exposure) leads to NO adverse effects
• How confident can one be that removal of the exposure is effective in preventing disease? – Whether drugs or environmental factors it will depend on the intervention to remove exposure
GRADE and immunizations
• Can herd immunity following immunisation and indirect effects on the co-circulation of other pathogens typically be ascertained only through the use of observational epidemiological methods?
– Innovative randomized controlled trials (RCTs) using cluster randomization are increasingly being conducted to provide this evidence
• A 94% protective effect of a live, monovalent vaccine against measles is classified as “moderate level of scientific evidence.”
– GRADE’s strength of association criteria may be applied to increase the grade by 2 levels – from “low” to “high” – possible in this situation
• GRADE ratings do not give credit to “gradient of effects with scale of population level impact compatible with degree of coverage.”
– GRADE’s dose-response criterion would apply to such gradients
• May anti-vaccination lobby groups abuse the ratings?
– Abuse of any system is possible: it is equally likely that increased transparency provided by the GRADE framework can strengthen, rather than undermine, the trust in vaccines and other interventions
Bradford Hill and GRADE
Bradford Hill criteria, with the corresponding consideration in GRADE:
• Strength: Strength of association and imprecision in effect estimate
• Consistency: Consistency across studies, i.e. across different situations (different researchers)
• Temporality: Study design, specific study limitations; RCTs fulfil this criterion better than observational studies
• Biological gradient: Dose response gradient
• Specificity: Indirectness
• Biological plausibility: Indirectness, publication bias
• Coherence: Indirectness
• Experiment: Study design, randomization
• Analogy: Existing association for critical outcomes will lead to not downgrading the quality
Bradford Hill Criteria • Would agree with modifying if these criteria were not already considered • Emphasis is on strength of recommendation, quality of evidence is only one factor
Conclusions
• Clinical practice guidelines should be based on the best available evidence to be evidence based
• GRADE combines what is known in health research methodology and provides a structured approach to improve communication
– Criteria for evidence assessment across questions and outcomes
– Criteria for moving from evidence to recommendations
– Transparent, systematic
• four categories of quality of evidence
• two grades for strength of recommendations
Transparency in decision making and judgments is key