Holger Schunemann TEACH 2010


Holger Schünemann, MD, PhD Chair, Department of Clinical Epidemiology & Biostatistics Michael Gent Chair in Healthcare Research McMaster University, Hamilton, Canada

TEACH, New York, NY August 11 - 13, 2010


Content • Evidence hierarchies • From evidence to recommendations


Case scenario
A 13-year-old girl who lives in rural Indonesia presented with flu symptoms and developed severe respiratory distress over the course of the last 2 days. She required intubation. The history reveals that she shares her living quarters with her parents and her three siblings. At night the family’s chickens share this room too, and several chickens had died unexpectedly a few days before the girl fell sick.
Interventions: antivirals, such as the neuraminidase inhibitors oseltamivir and zanamivir


PICO clinical question
Population: Avian flu/influenza A (H5N1) patients
Intervention: Oseltamivir (or zanamivir)
Comparison: No pharmacological intervention
Outcomes: Mortality, hospitalizations, resource use, adverse outcomes, antimicrobial resistance
WHO Avian Influenza GL. Schunemann et al., The Lancet ID, 2007


There are no RCTs! • Do you think that users of recommendations would like to be informed about the basis (explanation) for a recommendation or coverage decision if they were asked (by their patients)? • I suspect the answer is “yes” • If we need to provide the basis for recommendations, we need to say whether the evidence is good or not so good; in other words perhaps “no RCTs”


Hierarchy of evidence based on quality

STUDY DESIGN 

Randomized Controlled Trials Cohort Studies and Case Control Studies Case Reports and Case Series, Non-systematic observations Expert Opinion

BIAS


Which hierarchy?
Recommendation for use of oral anticoagulation in patients with atrial fibrillation and rheumatic mitral valve disease

Organization | Evidence | Recommendation
AHA | B | Class I
ACCP | A | 1
SIGN | IV | C


What to do?



Evidence based healthcare decisions Population values and preferences

(Clinical) state and circumstances

Expertise

Research evidence Haynes et al. 2002




“Everything should be made as simple as possible but not simpler.”

Explain the following? • Confounding, effect modification & ext. validity • Concealment of randomization • Blinding (who is blinded in a double blinded study?) • Intention to treat analysis and its correct application • P-values and confidence intervals


BMJ 2003



Relative risk reduction: ….> 99.9 % (1/100,000) U.S. Parachute Association reported 821 injuries and 18 deaths out of 2.2 million jumps in 2007

BMJ 2003
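The arithmetic behind the parachute figures can be sketched as follows. The injury and death counts are the ones quoted on the slide; the risk of death without a parachute is not given there, so the value of 1.0 used below is purely an illustrative assumption.

```python
# Figures quoted on the slide (U.S. Parachute Association, 2007).
deaths = 18
jumps = 2_200_000

# Risk of death per jump when a parachute is used.
risk_with_parachute = deaths / jumps

# ASSUMPTION (not from the slide): death is near certain without a parachute.
risk_without_parachute = 1.0

# Relative risk reduction = 1 - (risk with intervention / risk without).
rrr = 1 - risk_with_parachute / risk_without_parachute
deaths_per_100k = risk_with_parachute * 100_000

print(f"Deaths per 100,000 jumps: {deaths_per_100k:.2f}")
print(f"Relative risk reduction: {rrr:.4%}")
```

Under that assumption the relative risk reduction exceeds 99.9%, and the observed risk is a little under 1 death per 100,000 jumps, in line with the slide.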


A prominent grading system

Level of evidence | Source of evidence
I | SR, RCTs
II | Cohort studies
III | Case-control studies
IV | Case series
V | Expert opinion

Grades of recommendation: A, B, C, D


Simple hierarchies are (too) simplistic STUDY DESIGN 

Cohort Studies and Case Control Studies Case Reports and Case Series, Non-systematic observations Expert Opinion

Expert Opinion

Randomized Controlled Trials

BIAS

Schünemann & Bone, 2003




So?
• Users want to know how good the evidence is
– (rightly so) they are not equipped to assess it themselves
– someone needs to do it for them
• Confusion about which grading system to use
– What methods to grade?
• Simple systems leading to recommendations are not enough


GRADE Working Group
Grading of Recommendations Assessment, Development and Evaluation

• Aim: to develop a common, transparent and sensible system for grading the quality of evidence and the strength of recommendations
• International group of guideline developers, methodologists & clinicians from around the world (>100 contributors) – since 2000
• International group: ACCP, AHRQ, Australian NHMRC, BMJ Clinical Evidence, CC, CDC, McMaster, NICE, Oxford CEBM, SIGN, UpToDate, USPSTF, WHO
CMAJ 2003, BMJ 2004, BMC 2004, BMC 2005, AJRCCM 2006, Chest 2006, BMJ 2008


GRADE Uptake
• World Health Organization
• Allergic Rhinitis and its Impact on Asthma (ARIA) guidelines
• American Thoracic Society
• American College of Physicians
• European Respiratory Society
• European Society of Thoracic Surgeons
• British Medical Journal
• Infectious Diseases Society of America
• American College of Chest Physicians
• UpToDate®
• National Institute for Health and Clinical Excellence (NICE)
• Scottish Intercollegiate Guidelines Network (SIGN)
• Cochrane Collaboration
• Clinical Evidence
• Agency for Healthcare Research and Quality (AHRQ)
• Partner of GIN
• Over 40 major organizations


GRADE Quality of Evidence
In the context of a systematic review
• The quality of evidence reflects the extent to which we are confident that an estimate of effect is correct.
In the context of making recommendations
• The quality of evidence reflects the extent to which our confidence in an estimate of the effect is adequate to support a particular recommendation.


Likelihood of and confidence in an outcome


Definition of grades of evidence
• ⊕⊕⊕⊕/A/High: Further research is very unlikely to change confidence in the estimate of effect.
• ⊕⊕⊕/B/Moderate: Further research is likely to have an important impact on confidence in the estimate of effect and may change the estimate.
• ⊕⊕/C/Low: Further research is very likely to have an important impact on confidence in the estimate of effect and is likely to change the estimate.
• ⊕/D/Very low: Any estimate of effect is very uncertain.


Determinants of quality
• RCTs ⊕⊕⊕⊕
• observational studies ⊕⊕
• 5 factors that can lower quality
1. limitations in detailed design and execution (risk of bias criteria)
2. inconsistency (or heterogeneity)
3. indirectness (PICO and applicability)
4. imprecision (number of events and confidence intervals)
5. publication bias
• 3 factors that can increase quality
– large magnitude of effect
– all plausible residual confounding may be working to reduce the demonstrated effect or increase the effect if no effect was observed
– dose-response gradient
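The starting points and the lowering and raising factors above can be sketched as a small rating function. This is only an illustration of the bookkeeping, not the GRADE Working Group's tooling: the function name, the argument names and the idea of passing the number of levels directly are assumptions, and in practice each downgrade or upgrade is a judgment.

```python
# Quality levels ordered from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]


def grade_quality(study_design, downgrades=0, upgrades=0):
    """Return the quality level for one outcome.

    study_design: "rct" (starts high) or "observational" (starts low).
    downgrades:   total levels removed for risk of bias, inconsistency,
                  indirectness, imprecision and publication bias.
    upgrades:     total levels added for large effect, dose response and
                  plausible residual confounding.
    """
    start = {"rct": 3, "observational": 1}[study_design]
    index = start - downgrades + upgrades
    # Quality cannot go below "very low" or above "high".
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


# A single trial with very serious limitations, downgraded two levels.
print(grade_quality("rct", downgrades=2))
# Observational evidence with a very large effect, upgraded two levels.
print(grade_quality("observational", upgrades=2))
```

Keeping the starting level and the adjustments separate mirrors the slide: study design sets the starting point, and the eight factors move the rating from there.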


1. Design and Execution
• Limitations/risk of bias
– randomization
– lack of concealment
– intention to treat principle violated
– inadequate blinding
– loss to follow-up
– early stopping for benefit

The evidence for the effect of sublingual immunotherapy in children with allergic rhinitis on the development of asthma comes from a single randomised trial with no description of randomisation, concealment of allocation, or type of analysis, no blinding, and 21% of children lost to follow-up. These very serious limitations would limit our confidence in the estimates of effect and likely lead to downgrading the quality of evidence.


1. Design and Execution
[Figure from Cates, CDSR 2008]


1. Design and Execution

Overall judgment required


Quality of evidence across studies
Outcome #1 – Quality: High
Outcome #2 – Quality: Moderate
Outcome #3 – Quality: Very low

III

V

II

IB




Oseltamivir for Girl with Avian Flu
Summary of findings:
• No clinical trial of oseltamivir for treatment of H5N1 patients.
• 4 systematic reviews and health technology assessments (HTA) reporting on 5 studies of oseltamivir in seasonal influenza.
– Hospitalization: OR 0.22 (0.02 – 2.16)
– Pneumonia: OR 0.15 (0.03 – 0.69)
• 3 published case series.
• Many in vitro and animal studies.
• No alternative that is more promising at present.
• Cost: ~ $45 per treatment course


What are the factors that determine your decisions?


GRADE: Factors influencing decisions and recommendations

• Quality of evidence
• Balance of desirable and undesirable consequences
• Values and preferences
• Resource use


Determinants of the strength of recommendation

Factors that can strengthen a recommendation | Comment
Quality of the evidence | The higher the quality of evidence, the more likely is a strong recommendation.
Balance between desirable and undesirable effects | The larger the difference between the desirable and undesirable consequences, the more likely a strong recommendation is warranted. The smaller the net benefit and the lower the certainty for that benefit, the more likely a weak recommendation is warranted.
Values and preferences | The greater the variability in values and preferences, or uncertainty in values and preferences, the more likely a weak recommendation is warranted.
Costs (resource allocation) | The higher the costs of an intervention – that is, the more resources consumed – the less likely a strong recommendation is warranted.


Oseltamivir – Avian Influenza
Factors that can weaken the strength of a recommendation. Example: treatment of H5N1 patients with oseltamivir

Factor | Decision | Explanation
Lower quality evidence | ☑ Yes □ No | The quality of evidence is very low.
Uncertainty about the balance of benefits versus harms and burdens | ☑ Yes □ No | The benefits are uncertain because several important or critical outcomes were not measured. However, the potential benefit is very large despite potentially small relative risk reductions.
Uncertainty or differences in values | □ Yes ☑ No | All patients and care providers would accept treatment for H5N1 disease.
Uncertainty about whether the net benefits are worth the costs | □ Yes ☑ No | For treatment of sporadic patients the price is not high ($45).

Frequent “yes” answers will increase the likelihood of a weak recommendation


Example: Oseltamivir for Avian Flu Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (????? recommendation, very low quality evidence).

Schunemann et al. The Lancet ID, 2007


Example: Oseltamivir for Avian Flu Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (strong recommendation, very low quality evidence). Values and Preferences Remarks: This recommendation places a high value on the prevention of death in an illness with a high case fatality. It places relatively low values on adverse reactions, the development of resistance and costs of treatment. Schunemann et al. The Lancet ID, 2007


Reassessment of clinical practice guidelines
• Editorial by Shaneyfelt and Centor (JAMA 2009)
– “Too many current guidelines have become marketing and opinion-based pieces…”
– “AHA CPG: 48% of recommendations are based on level C = expert opinion…”
– “…clinicians do not use CPG […] greater concern […] some CPG are turned into performance measures…”
– “Time has come for CPG development to again be centralized, e.g., AHRQ…”


Conclusions
• Clinical practice guidelines should be based on the best available evidence to be evidence-based
• GRADE combines what is known in health research methodology and provides a structured approach to improve communication
– Criteria for evidence assessment across questions and outcomes
– Criteria for moving from evidence to recommendations
– Transparent, systematic
– four categories of quality of evidence
– two grades for strength of recommendations
• Transparency in decision making and judgments is key





Are values important? Should resources be considered?






2. Consistency of results
• Look for explanation for inconsistency
– patients, intervention, comparator, outcome, methods
• Judgment
– variation in size of effect
– overlap in confidence intervals
– statistical significance of heterogeneity
– I²
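The I² (I2) statistic listed under Judgment is commonly computed from Cochran's Q and its degrees of freedom as I² = max(0, (Q − df) / Q) × 100%. A minimal sketch follows; the Q values in the usage lines are hypothetical and do not come from any study in this presentation.

```python
def i_squared(q, df):
    """Percentage of variation across studies attributed to heterogeneity.

    q:  Cochran's Q statistic from the meta-analysis.
    df: degrees of freedom (number of studies minus one).
    """
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100


# Hypothetical example: 11 studies (df = 10) with Q = 20.
print(i_squared(20, 10))
# Hypothetical example: Q below its degrees of freedom gives 0%.
print(i_squared(5, 10))
```

The statistic is one input to the judgment only; the other items in the list (variation in effect size, overlap of confidence intervals) still need to be looked at.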


3. Directness of Evidence
• differences in
– populations/patients (mild versus severe COPD, older, sicker or more co-morbidity)
– interventions (all inhaled steroids, new vs. old)
– outcomes (important vs. surrogate; long-term health-related quality of life, short-term functional capacity, laboratory exercise, spirometry)
• indirect comparisons
– interested in A versus B
– have A versus C and B versus C
– formoterol versus salmeterol versus tiotropium


4. Publication Bias
• Publication bias
– Caution! Only few small studies

A systematic review of topical treatments for seasonal allergic conjunctivitis showed that patients using topical sodium cromoglycate were more likely to perceive benefit than those using placebo. However, only small trials reported clinically and statistically significant benefits of active treatment, while a larger trial showed a much smaller and statistically non-significant effect. These findings suggest that smaller studies demonstrating smaller effects might not have been published.


5. Imprecision
• small sample size
• small number of events
– wide confidence intervals
– uncertainty about magnitude of effect

Observational studies examining the impact of exclusive breastfeeding on development of allergic rhinitis in high risk infants showed a relative risk of 0.87 (95% CI: 0.48 to 1.58) that neither rules out important benefit nor important harm (Mimouni Bloch 2002).
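The reasoning in the breastfeeding example can be sketched as a check of the confidence interval against thresholds for important benefit and harm. The thresholds below (relative risk of 0.75 and 1.25) are illustrative assumptions, not values stated on the slide, and the function name is hypothetical.

```python
def ci_interpretation(lower, upper, benefit_threshold=0.75, harm_threshold=1.25):
    """Describe what a relative-risk confidence interval is compatible with.

    The thresholds are ASSUMED cut-offs for an 'important' effect; choose
    them per outcome in a real assessment.
    """
    return {
        "includes_no_effect": lower <= 1.0 <= upper,
        "includes_important_benefit": lower <= benefit_threshold,
        "includes_important_harm": upper >= harm_threshold,
    }


# Relative risk 0.87 (95% CI: 0.48 to 1.58) from the slide.
result = ci_interpretation(0.48, 1.58)
for finding, present in result.items():
    print(finding, present)
```

An interval that is compatible with no effect, important benefit and important harm at once is the pattern the slide describes as imprecise.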


What can raise quality? 3 Factors
• large magnitude can upgrade one level – very large two levels
– common criteria
  • everyone used to do badly
  • almost everyone does well
– The parachute example: if we looked at the observational evidence, we would find a very large effect and the evidence probably would be high quality for preventing death
• dose response relation (higher dose of brain radiation in childhood leukemia leads to greater risk of late malignancies)
• residual confounding unlikely to be responsible for observed effect








Guideline development process
• Prioritise problems & scoping
• Establish guideline panel and develop questions, including outcomes
• Find and critically appraise systematic review(s) and/or prepare protocol(s) for systematic review(s) and prepare systematic review(s) (searches, selection of studies, data collection and analysis)
• Prepare an evidence profile
• Assess the quality of evidence for each outcome
• Prepare a Summary of Findings table
• If developing guidelines: assess the overall quality of evidence and decide on the direction (which alternative) and strength of the recommendation
• Draft guideline
• Consult with stakeholders and/or external peer reviewers
• Disseminate guidelines
• Update review or guidelines when needed
• Adapt guidelines, if needed
• Prioritise guidelines/recommendations for implementation
• Implement or support implementation of the guidelines
• Evaluate the impact of the guidelines and implementation strategies
• Update systematic review/guidelines


[Figure: GRADE process from question to recommendation]
1. Formulate question (P I C O)
2. Rate importance of outcomes (critical, critical, important, not important) and select outcomes
3. Systematic review: outcomes across studies
4. Create evidence profile with GRADEpro: summary of findings & estimate of effect for each outcome
5. Rate quality of evidence for each outcome (High, Moderate, Low, Very low); RCT start high, obs. data start low
   Grade down: 1. Risk of bias 2. Inconsistency 3. Indirectness 4. Imprecision 5. Publication bias
   Grade up: 1. Large effect 2. Dose response 3. Confounders
6. Rate overall quality of evidence across outcomes based on lowest quality of critical outcomes
7. Guideline development: formulate recommendations
   • For or against (direction)
   • Strong or weak (strength)
   By considering: quality of evidence, balance benefits/harms, values and preferences
   Revise if necessary by considering: resource use (cost)
   • “We recommend using…”
   • “We suggest using…”
   • “We recommend against using…”
   • “We suggest against using…”
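Two rules from this flow lend themselves to a short sketch: the overall quality is the lowest quality among the critical outcomes, and the wording follows from direction and strength. The function names and the outcome ratings in the usage example are hypothetical and only illustrate the mechanics.

```python
# Quality levels ordered from lowest to highest.
ORDER = ["very low", "low", "moderate", "high"]


def overall_quality(outcomes):
    """Return the lowest quality among outcomes rated 'critical'.

    outcomes: list of (name, importance, quality) tuples.
    """
    critical = [quality for _, importance, quality in outcomes
                if importance == "critical"]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    return min(critical, key=ORDER.index)


def recommendation_wording(direction, strength):
    """Map direction ('for'/'against') and strength ('strong'/'weak') to wording."""
    if direction not in ("for", "against") or strength not in ("strong", "weak"):
        raise ValueError("unknown direction or strength")
    verb = "recommend" if strength == "strong" else "suggest"
    against = " against" if direction == "against" else ""
    return f"We {verb}{against} using…"


# Hypothetical ratings, for illustration only.
outcomes = [
    ("mortality", "critical", "very low"),
    ("hospitalization", "critical", "moderate"),
    ("resource use", "important", "high"),
]
print(overall_quality(outcomes))
print(recommendation_wording("for", "strong"))
```

Note that quality and strength are kept as separate inputs: a strong recommendation can rest on low quality evidence when the other considerations point clearly in one direction.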


Oxford Centre for Evidence Based Medicine
Levels of Evidence and Grades of Recommendations, 23 November 1999

Grade of recommendation A (level 1)

Level 1a
– Therapy/Prevention, Aetiology/Harm: SR (with homogeneity) of RCTs
– Prognosis: SR (with homogeneity*) of inception cohort studies; or a CPG validated on a test set.
– Diagnosis: SR (with homogeneity*) of Level 1 diagnostic studies; or a CPG validated on a test set.
– Economic analysis: SR (with homogeneity*) of Level 1 economic studies

Level 1b
– Therapy/Prevention, Aetiology/Harm: Individual RCT (with narrow Confidence Interval)
– Prognosis: Individual inception cohort study with > 80% follow-up
– Diagnosis: Independent blind comparison of an appropriate spectrum of consecutive patients, all of whom have undergone both the diagnostic test and the reference standard.
– Economic analysis: Analysis comparing all (critically-validated) alternative outcomes against appropriate cost measurement, and including a sensitivity analysis incorporating clinically sensible variations in important variables.

Level 1c
– Therapy/Prevention, Aetiology/Harm: All or none
– Prognosis: All or none case-series
– Diagnosis: Absolute SpPins and SnNouts
– Economic analysis: Clearly as good or better, but cheaper. Clearly as bad or worse but more expensive. Clearly better or worse at the same cost.

Grade of recommendation B (levels 2 and 3)

Level 2a
– Therapy/Prevention, Aetiology/Harm: SR (with homogeneity*) of cohort studies
– Prognosis: SR (with homogeneity*) of either retrospective cohort studies or untreated control groups in RCTs.
– Diagnosis: SR (with homogeneity*) of Level >2 diagnostic studies
– Economic analysis: SR (with homogeneity*) of Level >2 economic studies

Level 2b
– Therapy/Prevention, Aetiology/Harm: Individual cohort study (including low quality RCT; e.g., <80% follow-up)
– Prognosis: Retrospective cohort study or follow-up of untreated control patients in an RCT; or CPG not validated in a test set.
– Diagnosis: Any of:
  • Independent blind or objective comparison;
  • Study performed in a set of non-consecutive patients, or confined to a narrow spectrum of study individuals (or both) all of whom have undergone both the diagnostic test and the reference standard;
  • A diagnostic CPG not validated in a test set.
– Economic analysis: Analysis comparing a limited number of alternative outcomes against appropriate cost measurement, and including a sensitivity analysis incorporating clinically sensible variations in important variables.

Level 2c
– Therapy/Prevention, Aetiology/Harm: “Outcomes” Research
– Prognosis: “Outcomes” Research

Level 3a
– Therapy/Prevention, Aetiology/Harm: SR (with homogeneity*) of case-control studies

Level 3b
– Therapy/Prevention, Aetiology/Harm: Individual Case-Control Study
– Diagnosis: Independent blind comparison of an appropriate spectrum, but the reference standard was not applied to all study patients
– Economic analysis: Analysis without accurate cost measurement, but including a sensitivity analysis incorporating clinically sensible variations in important variables.

Grade of recommendation C (level 4)

Level 4
– Therapy/Prevention, Aetiology/Harm: Case-series (and poor quality cohort and case-control studies)
– Prognosis: Case-series (and poor quality prognostic cohort studies)
– Diagnosis: Any of:
  • Reference standard was unobjective, unblinded or not independent;
  • Positive and negative tests were verified using separate reference standards;
  • Study was performed in an inappropriate spectrum** of patients.
– Economic analysis: Analysis with no sensitivity analysis

Grade of recommendation D (level 5)

Level 5
– Therapy/Prevention, Aetiology/Harm: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
– Prognosis: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
– Diagnosis: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
– Economic analysis: Expert opinion without explicit critical appraisal, or based on economic theory

Oxford Centre for Evidence-Based Medicine (Chris Ball, Dave Sackett, Bob Phillips, Brian Haynes, and Sharon Straus).


USPSTF - Grade Definitions After May 2007: Certainty

Level of Certainty: High
The available evidence usually includes consistent results from well-designed, well-conducted studies in representative primary care populations. These studies assess the effects of the preventive service on health outcomes. This conclusion is therefore unlikely to be strongly affected by the results of future studies.

Level of Certainty: Moderate
The available evidence is sufficient to determine the effects of the preventive service on health outcomes, but confidence in the estimate is constrained by such factors as:
• The number, size, or quality of individual studies.
• Inconsistency of findings across individual studies.
• Limited generalizability of findings to routine primary care practice.
• Lack of coherence in the chain of evidence.
As more information becomes available, the magnitude or direction of the observed effect could change, and this change may be large enough to alter the conclusion.

Level of Certainty: Low
The available evidence is insufficient to assess effects on health outcomes. Evidence is insufficient because of:
• The limited number or size of studies.
• Important flaws in study design or methods.
• Inconsistency of findings across individual studies.
• Gaps in the chain of evidence.
• Findings not generalizable to routine primary care practice.
• Lack of information on important health outcomes.
More information may allow estimation of effects on health outcomes.

The USPSTF defines certainty as "likelihood that the USPSTF assessment of the net benefit of a preventive service is correct."



• Recommendations for prognosis – Use prognostic information to determine baseline risk for healthcare decisions




Centers for Disease Control and Prevention (CDC)

Evidence of Effectiveness | Execution (Good or Fair) | Design Suitability (Greatest, Moderate, or Least) | Number of Studies | Consistent | Effect Size | Expert Opinion
Strong | Good | Greatest | At Least 2 | Yes | Sufficient | Not Used
Strong | Good | Greatest or Moderate | At Least 5 | Yes | Sufficient | Not Used
Strong | Good or Fair | Greatest | At Least 5 | Yes | Sufficient | Not Used
Strong | Meet Design, Execution, Number, and Consistency Criteria for Sufficient But Not Strong Evidence | | | | Large | Not Used
Sufficient | Good | Greatest | 1 | Not Applicable | Sufficient | Not Used
Sufficient | Good or Fair | Greatest or Moderate | At Least 3 | Yes | Sufficient | Not Used
Sufficient | Good or Fair | Greatest, Moderate, or Least | At Least 5 | Yes | Sufficient | Not Used
Sufficient (Expert Opinion) | Varies | Varies | Varies | Varies | Sufficient | Supports a Recommendation
Insufficient | A. Insufficient Designs or Execution | | B. Too Few Studies | C. Inconsistent | D. Small | E. Not Used


GRADE - 2004




Case scenario
A 13-year-old girl who lives in rural Indonesia presented with flu symptoms and developed severe respiratory distress over the course of the last 2 days. She required intubation. The history reveals that she shares her living quarters with her parents and her three siblings. At night the family’s chickens share this room too, and several chickens had died unexpectedly a few days before the girl fell sick.

Potential interventions: antivirals, such as the neuraminidase inhibitors oseltamivir and zanamivir


Types of questions
Background Questions
• Definition: What is avian influenza?
• Mechanism: What is the mechanism of action of oseltamivir?
Foreground Questions
• Benefit > harm: In patients with avian influenza, does oseltamivir therapy improve survival, …?


Framing a foreground question
Population: Avian flu/influenza A (H5N1) patients
Intervention: Oseltamivir
Comparison: No pharmacological intervention
Outcomes: Mortality, hospitalizations, resource use, adverse outcomes, antimicrobial resistance
Schunemann, et al., The Lancet ID, 2007




Evidence Profiles/Summaries





Galactomannan ELISA for the diagnosis of invasive aspergillosis
Question: Should Galactomannan ELISA be used for the diagnosis of invasive aspergillosis?
Authors of the profile: Nancy Santesso, Jan Brozek, Holger Schünemann
Population: immunocompromized patients, mostly hematology patients [1]
Setting: mainly hematology or cancer departments, mainly inpatients
New Test: commercial Platelia® sandwich ELISA detecting galactomannan in serum [2]
Cut-off value: 1.5 ODI
Reference Test: composite of EORTC/MSG clinical and histological criteria [3]
Threshold: Proven or probable invasive aspergillosis
Bibliography: Leeflang et al. Galactomannan detection for invasive aspergillosis in immunocompromized patients. Cochrane Database of Systematic Reviews 2008, Issue 4. CD007394

Effects are given as number per 1000 patients tested, based on sensitivity of 0.64 (0.50-0.77) and specificity of 0.95 (0.91-0.97).

Outcome | Number of participants (studies) | Study design | Risk of bias | Indirectness | Inconsistency | Imprecision | Publication bias | Effect at prevalence 1% [4] | Effect at prevalence 12% [4] | Effect at prevalence 44% [4] | Quality of the evidence (GRADE) | Importance [5]
True positives (patients correctly classified as having invasive aspergillosis) [6] | 2777 (18 studies) | cohort, case-control | no | no | no | no | unlikely | 6 (5 to 8) | 77 (60 to 92) | 282 (220 to 339) | HIGH ⊕⊕⊕⊕ | 7
True negatives (patients correctly classified as not having invasive aspergillosis) [7] | 2777 (18 studies) | cohort, case-control | no | no | no | no | unlikely | 941 (901 to 960) | 836 (801 to 854) | 532 (510 to 543) | HIGH ⊕⊕⊕⊕ | 7
False positives (patients incorrectly classified as having invasive aspergillosis) [8] | 2777 (18 studies) | cohort, case-control | no | serious [9] | no | no | unlikely | 50 (89 to 30) | 44 (79 to 26) | 28 (50 to 17) | MODERATE ⊕⊕⊕ | 8
False negatives (patients incorrectly classified as not having the disease) [10] | 2777 (18 studies) | cohort, case-control | no | serious [11] | no | no | unlikely | 4 (5 to 2) | 43 (60 to 28) | 158 (220 to 101) | MODERATE ⊕⊕⊕ | 9
Inconclusive results [12] | 901 (7 studies) | cohort, case-control | – | – | – | – | | | | | | 4
Difference in complications [13] | – | – | – | – | – | – | | | | | | 3
Difference in resource use [13] | – | – | – | – | – | – | | | | | | 6

Footnotes:
[1] One study that included only patients with HIV/AIDS was excluded, because this patient group and setting differed from the other included patient groups in such a way that authors of the review regarded the study population as not being representative.
[2] A diagnostic test for invasive aspergillosis (IA) needs to be not too invasive or a burden to immunocompromised patients. The galactomannan ELISA test on a serum specimen is less burdensome than a current reference standard.
[3] A true reference standard is autopsy combined with culture from autopsy specimens; a clinical reference standard is composed of EORTC/MSG clinical and histological criteria.
[4] Estimates of prevalence of IA were based on the median and range of values in included studies.
[5] GRADE recommends classifying outcomes on a 9 point scale: 1-3 is not important; 4-6 is important; and, 7-9 is critical to a decision.
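The per-1000 point estimates in the profile follow from the stated sensitivity (0.64), specificity (0.95) and the three prevalence values. The sketch below reproduces them; the function names are hypothetical, and the use of exact fractions with round-half-up is an assumption about how the profile's figures were rounded, chosen because it matches the reported point estimates.

```python
from fractions import Fraction
import math


def half_up(value):
    """Round a non-negative Fraction to the nearest integer, halves upward."""
    return math.floor(value + Fraction(1, 2))


def per_1000(prevalence, sensitivity, specificity):
    """Expected test results per 1000 patients tested."""
    prevalence, sensitivity, specificity = (
        Fraction(str(v)) for v in (prevalence, sensitivity, specificity)
    )
    diseased = 1000 * prevalence      # patients who have the disease
    healthy = 1000 - diseased         # patients who do not
    true_pos = sensitivity * diseased
    true_neg = specificity * healthy
    return {
        "TP": half_up(true_pos),
        "FN": half_up(diseased - true_pos),
        "TN": half_up(true_neg),
        "FP": half_up(healthy - true_neg),
    }


for prevalence in ("0.01", "0.12", "0.44"):
    print(prevalence, per_1000(prevalence, "0.64", "0.95"))
```

Because each cell is rounded separately, the four counts need not add up to exactly 1000. The confidence limits in the profile can be obtained the same way by passing the limits of sensitivity or specificity instead of the point estimates.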


Galactomannan ELISA for the diagnosis of invasive aspergillosis
Patient or population: immunocompromised patients, mostly hematology patients [1]
Settings: mainly hematology or cancer departments, mainly inpatients
New test: commercial Platelia® sandwich ELISA detecting galactomannan in serum [2]
Cut-off value: 1.5 ODI
Reference test: composite of EORTC/MSG clinical and histological criteria [3]
Threshold: proven or probable invasive aspergillosis

Columns: Outcomes; Illustrative comparative numbers (95% CI) [4], assumed numbers with galactomannan ELISA compared to reference test (EORTC/MSG criteria); Number of participants (studies); Quality of the evidence; Comments

True positives (patients correctly classified as having invasive aspergillosis)
Prevalence 1%: 6 (5 to 8)
Prevalence 12%: 77 (60 to 92)
Prevalence 44%: 282 (220 to 339)
Participants (studies): 2777 (18)
Quality of the evidence: ⊕⊕⊕⊕ High
Comments: True positives are important because patients will be treated with improved survival and spared the invasive diagnostic procedures; however, they will suffer the toxicity of the treatment.

True negatives (patients correctly classified as not having invasive aspergillosis)
Prevalence 1%: 941 (901 to 960)
Prevalence 12%: 836 (801 to 854)
Prevalence 44%: 532 (510 to 543)
Participants (studies): 2777 (18)
Quality of the evidence: ⊕⊕⊕⊕ High
Comments: True negatives are important because patients will not be treated unnecessarily, will be spared the toxic effects of treatment, and will be reassured.

False positives (patients incorrectly classified as having invasive aspergillosis)
Prevalence 1%: 50 (89 to 30)
Prevalence 12%: 44 (79 to 26)
Prevalence 44%: 28 (50 to 17)
Participants (studies): 2777 (18)
Quality of the evidence: ⊕⊕⊕ Moderate [5]
Comments: False positives are important because patients will be treated unnecessarily and many will be exposed to nephrotoxic drugs, and because diagnosis and treatment of the true cause of symptoms may be delayed.

False negatives (patients incorrectly classified as not having the disease)
Prevalence 1%: 4 (5 to 2)
Prevalence 12%: 43 (60 to 28)
Prevalence 44%: 158 (220 to 101)
Participants (studies): 2777 (18)
Quality of the evidence: ⊕⊕⊕ Moderate [6]
Comments: False negatives are important because of the missed diagnosis: patients either will not be treated, with a 60–90% mortality rate, or treatment will be delayed with uncertain consequences on mortality.

Inconclusive results
Numbers, participants and quality: See comment
Comments: Inconclusive results were not reported. Most likely they would result in repeating the index test and therefore increased cost.

Complications
Numbers, participants and quality: See comment
Comments: Complications were not reported.

Resource use
Numbers, participants and quality: See comment
Comments: Resource use was not reported.

CI: Confidence interval; EORTC: European Organization for Research and Treatment of Cancer; MSG: Mycoses Study Group

GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

[1] One study that included only patients with HIV/AIDS was excluded, because this patient group and setting differed from the other included patient groups in such a way that authors of the review regarded the study population as not being representative.
[2] A diagnostic test for invasive aspergillosis (IA) needs to be not too invasive or a burden to immunocompromised patients. The galactomannan ELISA test on a serum specimen is less burdensome than a current reference standard.
[3] A true reference standard is autopsy combined with culture from autopsy specimens; a clinical reference standard is composed of EORTC/MSG clinical and histological criteria.
[4] Estimates of prevalence of IA were based on the median and range of values in included studies.
[5] There is uncertainty to what extent a delayed diagnosis and treatment of true cause of symptoms would impact on patient-important outcomes.
[6] There is some uncertainty to what extent a delayed diagnosis and treatment would impact mortality and other outcomes.
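The illustrative numbers for the galactomannan ELISA follow from applying one sensitivity and one specificity to each assumed prevalence. The sketch below shows that arithmetic in Python. The sensitivity of 64% and specificity of 95% are not stated on the slide; they are assumptions back-calculated from the tabulated counts. The function name `per_1000` and the round-half-up rule are also illustrative choices.

```python
from fractions import Fraction
from math import floor

# Assumed accuracy values, back-calculated from the slide's counts (not stated there).
SENSITIVITY = Fraction(64, 100)
SPECIFICITY = Fraction(95, 100)


def round_half_up(x):
    """Round a non-negative exact value to the nearest integer, halves upward."""
    return floor(x + Fraction(1, 2))


def per_1000(prevalence, sensitivity=SENSITIVITY, specificity=SPECIFICITY, n=1000):
    """Expected test results among n patients tested at a given prevalence."""
    diseased = n * prevalence      # patients who have the condition
    healthy = n - diseased         # patients who do not
    return {
        "TP": round_half_up(diseased * sensitivity),
        "FN": round_half_up(diseased * (1 - sensitivity)),
        "TN": round_half_up(healthy * specificity),
        "FP": round_half_up(healthy * (1 - specificity)),
    }


for percent in (1, 12, 44):
    print(percent, per_1000(Fraction(percent, 100)))
```

Exact fractions are used instead of floats so that borderline values such as 940.5 round the same way on every platform. Because each cell is rounded separately, the four counts need not sum to exactly 1000.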


The GRADE process (flow diagram)

Formulate question (P I C O)
Select outcomes
Rate importance of outcomes
– Outcome: Critical
– Outcome: Critical
– Outcome: Important
– Outcome: Not important

Systematic review: outcomes across studies
Create evidence profile with GRADEpro
Summary of findings & estimate of effect for each outcome

Rate quality of evidence for each outcome: High, Moderate, Low, Very low
RCT start high, obs. data start low
Grade down: 1. Risk of bias 2. Inconsistency 3. Indirectness 4. Imprecision 5. Publication bias
Grade up: 1. Large effect 2. Dose response 3. Confounders

Rate overall quality of evidence across outcomes based on lowest quality of critical outcomes

Guideline development
Formulate recommendations:
• For or against (direction)
• Strong or weak (strength)
By considering: Quality of evidence, Balance benefits/harms, Values and preferences
Revise if necessary by considering: Resource use (cost)
• “We recommend using…”
• “We suggest using…”
• “We recommend against using…”
• “We suggest against using…”
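Two steps at the end of this process are mechanical enough to sketch in code: taking the lowest quality among the critical outcomes, and choosing the wording from direction and strength. The Python below is a minimal illustration, not part of GRADE tooling. The names `Quality`, `overall_quality` and `wording` are hypothetical, the example outcomes are made up, and the cut-off of 7 to 9 for "critical" follows the 9 point importance scale that GRADE recommends.

```python
from enum import IntEnum


class Quality(IntEnum):
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


def overall_quality(outcomes):
    """Lowest quality among critical outcomes.

    outcomes: iterable of (importance on the 1-9 scale, Quality) pairs.
    """
    critical = [quality for importance, quality in outcomes if importance >= 7]
    if not critical:
        raise ValueError("no critical outcomes (importance 7-9) supplied")
    return min(critical)


def wording(direction, strength):
    """Map direction ('for'/'against') and strength ('strong'/'weak') to a phrase."""
    verb = {"strong": "recommend", "weak": "suggest"}[strength]
    against = {"for": "", "against": " against"}[direction]
    return f"We {verb}{against} using…"


# Made-up example: two critical outcomes and one merely important outcome.
example = [(9, Quality.MODERATE), (7, Quality.HIGH), (5, Quality.VERY_LOW)]
print(overall_quality(example).name)   # the important-only outcome is ignored
print(wording("against", "weak"))
```

The strength itself is still a judgment that weighs quality of evidence, the balance of benefits and harms, values and preferences, and resource use; the code only covers the bookkeeping around that judgment.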


Questions


GRADE: recommendation – quality of evidence
Clear separation:
1) Recommendation: 2 grades – weak/conditional/optional or strong (for or against an intervention)
– Balance of benefits and downsides, values and preferences, resource use and quality of evidence
2) 4 categories of quality of evidence: ⊕⊕⊕⊕ (High), ⊕⊕⊕ (Moderate), ⊕⊕ (Low), ⊕ (Very low)
– methodological quality of evidence
– likelihood of bias
– by outcome and across outcomes

*www.GradeWorking-Group.org


GRADE Quality of Evidence In the context of making recommendations: • The quality of evidence reflects the extent of our confidence that the estimates of an effect are adequate to support a particular decision or recommendation.


Likelihood of and confidence in an outcome


Definition of grades of evidence • ⊕⊕⊕⊕/A/High: Further research is very unlikely to change confidence in the estimate of effect. • ⊕⊕⊕/B/Moderate: Further research is likely to have an important impact on confidence in the estimate of effect and may change the estimate. • ⊕⊕/C/Low: Further research is very likely to have an important impact on confidence in the estimate of effect and is likely to change the estimate. • ⊕/D/Very low: Any estimate of effect is very uncertain.


Determinants of quality
• RCTs ⊕⊕⊕⊕
• observational studies ⊕⊕
• 5 factors that can lower quality
1. limitations in detailed design and execution (risk of bias criteria)
2. Inconsistency (or heterogeneity)
3. Indirectness (PICO and applicability)
4. Imprecision (number of events and confidence intervals)
5. Publication bias
• 3 factors can increase quality
– large magnitude of effect
– dose response relation
– all plausible residual confounding may be working to reduce the demonstrated effect or increase the effect if no effect was observed


1. Design and Execution/Risk of Bias Examples: • Inappropriate selection of exposed and unexposed groups • Failure to adequately measure/control for confounding • Selective outcome reporting • Failure to blind (e.g. outcome assessors) • High loss to follow-up • Lack of concealment in RCTs • Intention to treat principle violated


Design and Execution/RoB

From Cates, CDSR 2008


Design and Execution/RoB

Overall judgment required


Who believes the risk of bias is of concern? Yes No Don’t know or undecided


Would you downgrade for risk of bias? No, there are no serious limitations

Yes, there are serious limitations

Yes, there are very serious limitations


Would you downgrade for risk of bias?
No, there are no serious limitations – although there ….
Yes, there are serious limitations – most people would agree that selective reporting is….
Yes, there are very serious limitations – there is a risk of bias but only for the one criterion of selective reporting


2. Inconsistency of results (Heterogeneity) • if inconsistency, look for explanation – patients, intervention, comparator, outcome

• if inconsistency remains unexplained, lower quality


Reminders for immunization uptake



3. Directness of Evidence
• differences in
– populations/patients (children – neonates, women in general – pregnant women)
– interventions (all vaccines, new – old)
– comparator appropriate (new policy – old or no policy)
– outcomes (important – surrogate: cases prevented – seroconversion)
• indirect comparisons
– interested in A versus B
– have A versus C and B versus C
– Vaccine A versus Placebo versus Vaccine B


4. Publication Bias • Should always be suspected – Only small “positive” trials – For profit interest – Various methods to evaluate – none perfect, but clearly a problem


Publication bias: I.V. Mg in acute myocardial infarction (figure)
Meta-analysis: Yusuf S. Circulation 1993
ISIS-4: Lancet 1995
Egger M, Smith DS. BMJ 1995;310:752-54


Funnel plot (figure)
Axes: standard error (0 to 3); odds ratio (0.1 to 10)
Symmetrical: No publication bias


Funnel plot (figure)
Axes: standard error (0 to 3); odds ratio (0.1 to 10)
Asymmetrical: Publication bias?
File drawer problem: No interest in publishing or being published


5. Imprecision • Small sample size – small number of events

• Wide confidence intervals – uncertainty about magnitude of effect

• Extent to which confidence in estimate of effect adequate to support decision


Example: Immunization in children



What can raise quality?
1. large magnitude can upgrade one level (RRR 50%/RR 2)
– very large: two levels (RRR 80%/RR 5)
– criteria
• everyone used to do badly
• almost everyone does well
– parachutes to prevent death when jumping from airplanes
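The thresholds on this slide can be written as a small rule. The Python sketch below is only an illustration: the function name `large_effect_upgrade` is hypothetical, and it treats RRR 50% as RR 0.5 and RRR 80% as RR 0.2 so that protective and harmful effects are handled symmetrically. In practice the decision also depends on judgment, for example whether plausible confounding could explain the effect.

```python
def large_effect_upgrade(relative_risk: float) -> int:
    """Return 0, 1 or 2 levels of upgrading for magnitude of effect.

    Thresholds follow the slide: RR 2 (or RRR 50%, i.e. RR 0.5) for one level,
    RR 5 (or RRR 80%, i.e. RR 0.2) for two levels.
    """
    if relative_risk <= 0:
        raise ValueError("relative risk must be positive")
    if relative_risk >= 5 or relative_risk <= 0.2:
        return 2   # very large effect
    if relative_risk >= 2 or relative_risk <= 0.5:
        return 1   # large effect
    return 0       # no upgrade for magnitude


for rr in (0.1, 0.5, 0.8, 2.0, 6.0):
    print(rr, large_effect_upgrade(rr))
```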


Reminders for immunization uptake


What can raise quality?
2. dose response relation
– (higher INR – increased bleeding)
– childhood lymphoblastic leukemia
• risk for CNS malignancies 15 years after cranial irradiation
• no radiation: 1% (95% CI 0% to 2.1%)
• 12 Gy: 1.6% (95% CI 0% to 3.4%)
• 18 Gy: 3.3% (95% CI 0.9% to 5.6%)
3. all plausible confounding may be working to reduce the demonstrated effect or increase the effect if no effect was observed


All plausible residual confounding would result in an overestimate of effect

• Hypoglycaemic drug phenformin causes lactic acidosis
• The related agent metformin is under suspicion for the same toxicity.
• Large observational studies have failed to demonstrate an association

– Clinicians would be more alert to lactic acidosis in the presence of the agent

• Vaccine – adverse effects


Quality assessment criteria

Study design and initial quality of a body of evidence
• Randomised trials: High
• Observational studies: Low

Lower if
• Risk of bias: -1 Serious, -2 Very serious
• Inconsistency: -1 Serious, -2 Very serious
• Indirectness: -1 Serious, -2 Very serious
• Imprecision: -1 Serious, -2 Very serious
• Publication bias: -1 Likely, -2 Very likely

Higher if
• Large effect: +1 Large, +2 Very large
• Dose response: +1 Evidence of a gradient
• All plausible residual confounding: +1 Would reduce a demonstrated effect; +1 Would suggest a spurious effect if no effect was observed

Quality of a body of evidence
• A/High (four plus: ⊕⊕⊕⊕)
• B/Moderate (three plus: ⊕⊕⊕)
• C/Low (two plus: ⊕⊕)
• D/Very low (one plus: ⊕)
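The criteria above amount to a start level by study design, minus points for the five lowering factors, plus points for the three raising factors. The Python sketch below records that bookkeeping. It is an assumption-laden illustration and not an official GRADE algorithm: the names `rate_quality`, `LEVELS` and the factor keys are hypothetical, and real ratings rest on an overall judgment, not on arithmetic alone.

```python
LEVELS = ["Very low", "Low", "Moderate", "High"]
START = {"randomised trial": 3, "observational study": 1}   # index into LEVELS
DOWNGRADE_FACTORS = ("risk_of_bias", "inconsistency", "indirectness",
                     "imprecision", "publication_bias")
UPGRADE_FACTORS = ("large_effect", "dose_response", "confounding")


def rate_quality(design, downgrades=None, upgrades=None):
    """Return the quality level for one outcome.

    downgrades: {factor: 0, 1 or 2}; upgrades: {factor: points}, where only
    'large_effect' may score 2 and the other upgrade factors at most 1.
    """
    downgrades = downgrades or {}
    upgrades = upgrades or {}
    for name, points in downgrades.items():
        if name not in DOWNGRADE_FACTORS or points not in (0, 1, 2):
            raise ValueError(f"invalid downgrade: {name}={points}")
    for name, points in upgrades.items():
        limit = 2 if name == "large_effect" else 1
        if name not in UPGRADE_FACTORS or not 0 <= points <= limit:
            raise ValueError(f"invalid upgrade: {name}={points}")
    score = START[design] - sum(downgrades.values()) + sum(upgrades.values())
    # Clamp to the four available categories.
    return LEVELS[max(0, min(len(LEVELS) - 1, score))]


print(rate_quality("randomised trial", {"risk_of_bias": 1}))
print(rate_quality("observational study", upgrades={"large_effect": 2}))
```

The second call mirrors the case of a very large effect raising observational evidence by two levels, from low to high.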





Design and Execution/RoB Cancer, anticoagulation and mortality

Akl E, Barba M, Rohilla S, Terrenato I, Sperati F, Schünemann HJ. “Anticoagulation for the long term treatment of venous thromboembolism in patients with cancer”. Cochrane Database Syst Rev. 2008 Apr 16;(2):CD006650.


Content • Background • Quality of evidence • Moving from evidence to recommendations


Strength of recommendation “The strength of a recommendation reflects the extent to which we can, across the range of patients for whom the recommendations are intended, be confident that desirable effects of a management strategy outweigh undesirable effects.” • Strong or weak/conditional


Determinants of the strength of recommendation
Factors that can strengthen a recommendation, with comment:
• Quality of the evidence: The higher the quality of evidence, the more likely is a strong recommendation.
• Balance between desirable and undesirable effects: The larger the difference between the desirable and undesirable consequences, the more likely a strong recommendation is warranted. The smaller the net benefit and the lower the certainty for that benefit, the more likely a weak recommendation is warranted.
• Values and preferences: The greater the variability in values and preferences, or uncertainty in values and preferences, the more likely a weak recommendation is warranted.
• Costs (resource allocation): The higher the costs of an intervention – that is, the more resources consumed – the less likely a strong recommendation is warranted.


Developing recommendations


Case scenario A 13 year old girl who lives in rural Indonesia presented with flu symptoms and developed severe respiratory distress over the course of the last 2 days. She required intubation. The history reveals that she shares her living quarters with her parents and her three siblings. At night the family’s chicken stock shares this room too and several chicken had died unexpectedly a few days before the girl fell sick.


Methods – WHO Rapid Advice Guidelines for management of Avian Flu
• Applied findings of a recent systematic evaluation of guideline development for WHO/ACHR
• Group composition (including panel of 13 voting members):
– clinicians who treated influenza A(H5N1) patients
– infectious disease experts
– basic scientists
– public health officers
– methodologists
• Independent scientific reviewers:
• Identified systematic reviews, recent RCTs, case series, animal studies related to H5N1 infection


Oseltamivir for Avian Flu
Summary of findings:
• No clinical trial of oseltamivir for treatment of H5N1 patients.
• 4 systematic reviews and health technology assessments (HTA) reporting on 5 studies of oseltamivir in seasonal influenza.
– Hospitalization: OR 0.22 (0.02 – 2.16)
– Pneumonia: OR 0.15 (0.03 – 0.69)
• 3 published case series.
• Many in vitro and animal studies.
• No alternative that is more promising at present.
• Cost: $40 per treatment course


From evidence to recommendation
Factors that can strengthen a recommendation, with comment:
• Quality of the evidence: Very low quality evidence
• Balance between desirable and undesirable effects: Uncertain, but small reduction in relative risk still leads to large absolute effect
• Values and preferences: Little variability and clear
• Costs (resource allocation): Low cost under non-pandemic conditions
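The point that a small relative reduction can still give a large absolute effect is simple arithmetic: the absolute reduction is the baseline risk multiplied by one minus the relative risk. The Python sketch below illustrates this. The baseline risks and the relative risk of 0.9 are hypothetical numbers chosen for illustration, not estimates from the guideline, and `absolute_reduction_per_1000` is an illustrative name.

```python
def absolute_reduction_per_1000(baseline_risk: float, relative_risk: float) -> float:
    """Events avoided per 1000 patients treated.

    baseline_risk: risk of the event without treatment (0 to 1).
    relative_risk: risk with treatment divided by risk without treatment.
    """
    if not 0 <= baseline_risk <= 1:
        raise ValueError("baseline_risk must lie between 0 and 1")
    if relative_risk < 0:
        raise ValueError("relative_risk must be non-negative")
    return 1000 * baseline_risk * (1 - relative_risk)


# Hypothetical inputs: the same 10% relative reduction at two baseline risks.
for baseline in (0.60, 0.01):
    avoided = absolute_reduction_per_1000(baseline, 0.9)
    print(f"baseline risk {baseline:.0%}: about {avoided:.0f} events avoided per 1000")
```

With a high baseline risk the same relative effect avoids many more events, which is why a high case fatality can support a strong recommendation even when the relative effect is uncertain.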


Example: Oseltamivir for Avian Flu Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (????? recommendation, very low quality evidence).

Schunemann et al. The Lancet ID, 2007


Example: Oseltamivir for Avian Flu Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (strong recommendation, very low quality evidence). Values and Preferences Remarks: This recommendation places a high value on the prevention of death in an illness with a high case fatality. It places relatively low values on adverse reactions, the development of resistance and costs of treatment. Schunemann et al. The Lancet ID, 2007


Implications of a strong recommendation • Patients: Most people in this situation would want the recommended course of action and only a small proportion would not • Clinicians: Most patients should receive the recommended course of action • Policy makers: The recommendation can be adopted as a policy in most situations


Implications of a conditional/weak recommendation • Patients: The majority of people in this situation would want the recommended course of action, but many would not • Clinicians: Be more prepared to help patients to make a decision that is consistent with their own values/decision aids and shared decision making • Policy makers: There is a need for substantial debate and involvement of stakeholders


The GRADE process (flow diagram)

Formulate question (P I C O)
Select outcomes
Rate importance of outcomes
– Outcome: Critical
– Outcome: Critical
– Outcome: Important
– Outcome: Not important

Systematic review: outcomes across studies
Create evidence profile with GRADEpro
Summary of findings & estimate of effect for each outcome

Rate quality of evidence for each outcome: High, Moderate, Low, Very low
Randomization increases initial quality
Grade down: 1. Risk of bias 2. Inconsistency 3. Indirectness 4. Imprecision 5. Publication bias
Grade up: 1. Large effect 2. Dose response 3. Confounders

Grade overall quality of evidence across outcomes based on lowest quality of critical outcomes

Guideline development
Formulate recommendations:
• For or against (direction)
• Strong or weak (strength)
By considering: Quality of evidence, Balance benefits/harms, Values and preferences
Revise if necessary by considering: Resource use (cost)
• “We recommend using…”
• “We suggest using…”
• “We recommend against using…”
• “We suggest against using…”


Issues in guideline development for immunization • Causation versus effects of intervention – Causation not equivalent to efficacy of interventions – Bradford Hill • Nearly half a century old – tablet from the mountain?

• Harms caused by medications – Assumption is that removal of drug (or no exposure) leads to NO adverse effects

• How confident can one be that removal of the exposure is effective in preventing disease? – Whether drugs or environmental factors it will depend on the intervention to remove exposure


GRADE and immunizations
• Can herd immunity following immunisation and indirect effects on the co-circulation of other pathogens typically be ascertained only through the use of observational epidemiological methods?
– Innovative randomized controlled trials (RCTs) using cluster randomization are increasingly being conducted to provide this evidence
• A 94% protective effect of a live, monovalent vaccine against measles is classified as “moderate level of scientific evidence.”
– GRADE’s strength of association criteria may be applied to increase the grade by 2 levels, from “low” to “high”, which is possible in this situation
• GRADE ratings do not give credit to “gradient of effects with scale of population level impact compatible with degree of coverage.”
– GRADE’s dose-response criterion would apply to such gradients
• Anti-vaccination lobby groups may abuse the ratings.
– Abuse of any system is possible: it is equally likely that the increased transparency provided by the GRADE framework can strengthen, rather than undermine, the trust in vaccines and other interventions


Bradford Hill and GRADE
Bradford Hill criteria and the corresponding consideration in GRADE:
• Strength: Strength of association and imprecision in effect estimate
• Consistency: Consistency across studies, i.e. across different situations (different researchers)
• Temporality: Study design, specific study limitations; RCTs fulfil this criterion better than observational studies
• Biological gradient: Dose response gradient
• Specificity: Indirectness
• Biological plausibility: Indirectness, publication bias
• Coherence: Indirectness
• Experiment: Study design, randomization
• Analogy: Existing association for critical outcomes will lead to not downgrading the quality


Bradford Hill Criteria • Would agree with modifying if these criteria were not already considered • Emphasis is on strength of recommendation, quality of evidence is only one factor


Conclusions
• Clinical practice guidelines should be based on the best available evidence to be evidence based
• GRADE combines what is known in health research methodology and provides a structured approach to improve communication
– Criteria for evidence assessment across questions and outcomes
– Criteria for moving from evidence to recommendations
– Transparent, systematic
• four categories of quality of evidence
• two grades for strength of recommendations
• Transparency in decision making and judgments is key

