ICPE Montreal August 30, 2017
Creating Static Variables – a Regulatory Study Example
Mary S. Anthony, PhD Director of Epidemiology, RTI Health Solutions
Disclosures - Anthony
• No specific funding was received for this project.
• The following personal or financial relationships relevant to this presentation existed during the past 12 months:
  – Employment by RTI Health Solutions, a research institute that performs contracted services for pharmaceutical and medical device companies
Background
• Example is from an ongoing project, required by a regulatory agency, that involves four data sources with electronic medical records (EMR)
• Before the required postmarketing study, the regulatory agency required a validation study to determine whether valid algorithms could be developed to identify outcomes and exposures
  – One of the variables of interest was breastfeeding status at the time of a postpartum event
Aims
• To show that breastfeeding status (yes/no) during specific postpartum time intervals could be determined from EMRs for a large proportion of the postpartum population
• To determine whether the breastfeeding data were logical and consistent across data sources and with external data (validity)
• To develop an approach to identify breastfeeding status through algorithms
Approaches to Identify Breastfeeding
• Codes: initially looked for any ICD-9-CM, CPT, or HCPCS codes for breastfeeding
  – HCPCS codes related to breast pump
  – HCPCS codes for human breast milk processing
• Mother/infant record linkage – all sites
• Reviewed data from
  – Clinical notes for mothers’ outpatient doctor visits
  – Clinical notes for infants’ outpatient visits
  – Structured questionnaire for well-baby checks
How to Identify Breastfeeding Status?
• Structured questionnaire completed by physician/nurse during visit
  – Do you feed your baby breast milk?
  – Do you feed your baby formula?
  – Do you feed your baby anything besides breast milk or formula?
• NLP terms to identify women potentially breastfeeding (examples)
breastfeed*
breast pain
mastitis
breast milk
nipple pain
yeast breast
milk supply
pump*
yeast nipple
lactat*
nipple excoriation
breast engorge*
nurs*
breast infection
nipple sore
nipple infection
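One way the example terms above could be applied to note text is sketched below. This is a minimal illustration, not the study's NLP pipeline: the function names, the reading of `*` as "any further word characters", and the case-insensitive whole-word matching are all assumptions.

```python
import re

# Example terms from the slide; "*" marks a wildcard suffix.
TERMS = [
    "breastfeed*", "breast pain", "mastitis", "breast milk", "nipple pain",
    "yeast breast", "milk supply", "pump*", "yeast nipple", "lactat*",
    "nipple excoriation", "breast engorge*", "nurs*", "breast infection",
    "nipple sore", "nipple infection",
]


def term_to_pattern(term: str) -> str:
    """Turn one term into a regex fragment (assumed rule: '*' = any word characters)."""
    words = []
    for word in term.split():
        if word.endswith("*"):
            words.append(re.escape(word[:-1]) + r"\w*")
        else:
            words.append(re.escape(word))
    # Allow any run of whitespace between the words of a multi-word term.
    return r"\b" + r"\s+".join(words) + r"\b"


# Hypothetical combined pattern: one alternation over all terms.
PATTERN = re.compile("|".join(term_to_pattern(t) for t in TERMS), re.IGNORECASE)


def possible_breastfeeding_terms(note: str) -> list[str]:
    """Return the matched phrases, lower-cased, in order of appearance."""
    return [m.group(0).lower() for m in PATTERN.finditer(note)]
```

Broad terms such as `nurs*` also match words like "nurse", so a hit only flags a record as possibly breastfeeding; it does not establish status on its own.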
How was breastfeeding status identified?
• Structured questionnaire was used at two sites
• NLP applied to mother and infant visit notes and other clinician notes was used at two sites to identify those who were possibly breastfeeding
• Manual EMR review was done to determine breastfeeding status for the validation study
We were able to identify the records where breastfeeding would be captured and methods for identification
Next Issue…
• Did visits happen frequently enough to be able to determine breastfeeding status at different times postpartum?
• Time windows of interest (requested by regulatory agency)
  – Immediately postpartum (i.e., ≤ 3 days postpartum)
  – > 3 days and < 4 weeks postpartum
  – ≥ 4 weeks and < 6 weeks postpartum
  – ≥ 6 weeks and ≤ 14 weeks postpartum
  – > 14 weeks and ≤ 52 weeks postpartum
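The boundaries of the five time windows can be made explicit with a small lookup. The sketch below assumes that time is counted in whole days since delivery and that one week is 7 days; the function name and the ASCII labels are illustrative, not taken from the study.

```python
from typing import Optional


def postpartum_window(days_since_delivery: int) -> Optional[str]:
    """Map days postpartum to one of the five windows (assumes 1 week = 7 days)."""
    if days_since_delivery < 0:
        raise ValueError("days_since_delivery must be non-negative")
    if days_since_delivery <= 3:
        return "<=3 days"
    if days_since_delivery < 4 * 7:
        return ">3 days to <4 weeks"
    if days_since_delivery < 6 * 7:
        return ">=4 to <6 weeks"
    if days_since_delivery <= 14 * 7:
        return ">=6 to <=14 weeks"
    if days_since_delivery <= 52 * 7:
        return ">14 to <=52 weeks"
    return None  # beyond 52 weeks: outside the windows of interest
```

Writing the comparisons in this order keeps the windows mutually exclusive: day 28 falls in the 4-to-6-week window, and day 98 (exactly 14 weeks) falls in the 6-to-14-week window.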
American Academy of Pediatrics Well-Child Care Visit Schedule for First 12 Months
Visit Schedule
• Newborn
• 3-5 days
• By 1 month
• 2 months
• 4 months
• 6 months
• 9 months
• 12 months
It seemed that interactions between mothers and their health care professionals would occur at intervals that would provide data for the time periods of interest
Source: https://www.aap.org/en-us/Documents/periodicity_schedule.pdf
Approach to breastfeeding status identification for validation
• At each site, a random sample of 25 women was selected in each of 5 postpartum time categories
• Identified breastfeeding in structured questionnaire or possible breastfeeding through NLP
• Manual review of EMRs
• Breastfeeding status was classified as yes, no, or undetermined
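The sampling step (25 women per postpartum time category at a site) could be sketched as follows. The input shape, the function name, and the fixed seed are assumptions for illustration; the slides do not describe how the random sample was drawn.

```python
import random
from collections import defaultdict


def sample_per_category(women, per_category=25, seed=0):
    """Draw a simple random sample within each time category.

    `women` is an iterable of (woman_id, category) pairs for one site
    (a hypothetical input shape).
    """
    by_category = defaultdict(list)
    for woman_id, category in women:
        by_category[category].append(woman_id)

    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sample = {}
    for category, ids in by_category.items():
        # A category with fewer than `per_category` women yields all of them.
        k = min(per_category, len(ids))
        sample[category] = rng.sample(ids, k)
    return sample
```

A category with too few eligible women returns a short cell rather than raising an error, so the shortfall stays visible and can be handled explicitly, for example by sampling more women from another category.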
Rules for Classifying Breastfeeding Status at Time of the Event of Interest
[Figure: three timelines running from Birth to 12 months]
1) Breastfeeding Status = Yes
   Timeline: Birth, Event of Interest, Breastfeeding noted, then no additional notes of breastfeeding (Breastfeeding period followed by Not breastfeeding period)
2) Breastfeeding Status = No
   Timeline: Birth, Breastfeeding noted, then no additional notes of breastfeeding; the Event of Interest falls in the Not breastfeeding period
3) Breastfeeding Status = Undetermined
   Timeline: Birth, Event of Interest; no notes of breastfeeding or formula feeding
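One possible reading of the three rules above is sketched below. The timelines are figures, so the exact logic is an assumption: here a woman is "yes" if breastfeeding is noted on or after the event date, "undetermined" if the record holds no breastfeeding or formula notes at all, and "no" otherwise. The `FeedingNote` type and `classify_status` function are hypothetical names, not part of the study.

```python
from dataclasses import dataclass
from datetime import date
from typing import Literal, Sequence


@dataclass(frozen=True)
class FeedingNote:
    """One dated feeding observation from the record (hypothetical structure)."""
    noted_on: date
    feeding: Literal["breast", "formula"]


def classify_status(event_date: date, notes: Sequence[FeedingNote]) -> str:
    """Classify breastfeeding status at the time of the event of interest."""
    # Rule 3 (assumed reading): no breastfeeding or formula notes at all.
    if not notes:
        return "undetermined"
    # Rule 1 (assumed reading): breastfeeding still noted on or after the event.
    if any(n.feeding == "breast" and n.noted_on >= event_date for n in notes):
        return "yes"
    # Rule 2 (assumed reading): feeding notes exist, but none documents
    # breastfeeding on or after the event.
    return "no"
```

For example, with breastfeeding noted on 10 January and 1 March, an event on 1 February is classified "yes" and an event on 1 April is classified "no".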
Validity – Can Breastfeeding Status Be Determined?
Status                   Site 1    Site 2    Site 3    Site 4    All Sites
                         n = 125   n = 125   n = 125   n = 125   n = 500 (Mean %)
Yes, breastfeeding       72%       86%       80%       45%       71%
No, not breastfeeding    28%       14%       17%       34%       23%
Undetermined             0%        0%        3%        21%       6%
Overall, more than 90% could be classified as yes or no for breastfeeding status in the time interval
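The "All Sites" column and the "more than 90%" statement can be checked from the per-site percentages. The snippet below is a small arithmetic sketch; the variable and function names are illustrative. Because every site contributes n = 125, the unweighted mean of the site percentages equals the pooled percentage.

```python
# Per-site percentages from the table (Site 1 to Site 4).
SITE_PERCENTAGES = {
    "yes": [72, 86, 80, 45],
    "no": [28, 14, 17, 34],
    "undetermined": [0, 0, 3, 21],
}


def mean_percent(values):
    """Unweighted mean across sites, rounded to a whole percent."""
    return round(sum(values) / len(values))


means = {status: mean_percent(v) for status, v in SITE_PERCENTAGES.items()}
classified = means["yes"] + means["no"]  # share classified as yes or no

print(means)       # {'yes': 71, 'no': 23, 'undetermined': 6}
print(classified)  # 94
```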
Validity – Is the Proportion of Women Breastfeeding Over Postpartum Time Intervals¹ Logical?
All values are % of cell
Time Interval          Site 1    Site 2    Site 3    Site 4    All sites
                       (n=125)   (n=125)   (n=125)   (n=125)   (n=500)
≤3 days                96        100       92        ᵃ         92
>3 days to <4 weeks    88        92        96        ᵃ         90
≥4 to <6 weeks         76        92        84        52ᵇ       69
≥6 to ≤14 weeks        76        76        76        56        71
>14 to ≤52 weeks       24        68        52        20        41
A trend toward a lower proportion of women breastfeeding over postpartum time, as expected
¹ Random sample of 25 selected at each site for each postpartum time period
ᵃ Did not have adequate sample size to include 25 in these cells
ᵇ Oversampled in this cell to have total of 125
Validity – How Do The Data Compare to an External Source?
CDC Data on Breastfeeding Rates in the US, 2014*
Location                   Ever Breastfed (%)   Breastfeeding at 6 months (%)   Breastfeeding at 12 months (%)
US National average        79.2                 49.4                            26.7
State for Sites 1 and 2    92.8                 63.1                            38.4
State for Site 3           91.8                 64.2                            35.3
State for Site 4           74.1                 38.6                            21.5
Data are consistent with data from the sites
Source: https://www.cdc.gov/breastfeeding/pdf/2014breastfeedingreportcard.pdf
* Percent of women surveyed in the National Immunization Survey, 2011 births
Next Steps
• Use NLP terms to develop an algorithm for the sites without the structured questionnaire
• Validate the algorithm(s)
Summary
• Many approaches can be used to identify data and create variables
  – Know the data sources and how, why, and by whom they are created and used
  – Create rules for identification and classification
  – Use various approaches to assess validity of the data
• Develop algorithms after there is a good understanding of the data, terms, and sources
• Describe approach to the Regulatory Agency
Questions?