Data Mining Healthcare Data Warehouse


Case Study: How to Apply Data Mining Techniques in a Healthcare Data Warehouse

Michael Silver, MD, FACP, FCCP, FCCM; Taiki Sakata; Hua-Ching Su, MS; Charles Herman; Steven B. Dolins, PhD; Michael J. O’Shea

ABSTRACT

Healthcare provider organizations are faced with a rising number of financial pressures. Both administrators and physicians need help analyzing large amounts of clinical and financial data when making decisions. To assist them, Rush-Presbyterian–St. Luke’s Medical Center and Hitachi America, Ltd. (HAL), Inc., have partnered to build an enterprise data warehouse and perform a series of case study analyses. This article focuses on one analysis, which was performed by a team of physicians and computer science researchers, using a commercially available on-line analytical processing (OLAP) tool in conjunction with proprietary data mining techniques developed by HAL researchers. The initial objective of the analysis was to discover how to use data mining techniques to make business decisions that can influence cost, revenue, and operational efficiency while maintaining a high level of care. Another objective was to understand how to apply these techniques appropriately and to find a repeatable method for analyzing data and finding business insights. The process used to identify opportunities and effect changes is described.

KEYWORDS
• Data mining
• On-line analytical processing tool (OLAP)
• Data warehouse
• Business process improvement

Note: The authors would like to thank Pat Skarulis, chief information officer at Rush-Presbyterian–St. Luke’s Medical Center, Yoichi Shintani, vice president at Hitachi America, Ltd., and Bob Kero, chief of business development at Hitachi America, Ltd., for providing guidance for this research. Thanks to Shinji Fujiwara and Arti Denterlein, our colleagues at Hitachi America, Ltd., for setting up the case study environment.

JOURNAL OF HEALTHCARE INFORMATION MANAGEMENT®, vol. 15, no. 2, Summer 2001 © Healthcare Information Management Systems Society and Jossey-Bass, A Publishing Unit of John Wiley & Sons, Inc.


Healthcare provider organizations are faced with a rising number of financial pressures: payer reimbursements that are not covering costs, uninsured patients who are provided care at low or no reimbursement, increased labor costs, decreased admissions, and so on. Both administrators and physicians need help analyzing clinical and financial data when making decisions.1,2 To assist administrators and physicians, Rush-Presbyterian–St. Luke’s Medical Center and Hitachi America, Ltd. (HAL), Inc., have partnered to build an enterprise data warehouse and to perform a series of case study analyses; this article focuses on one analysis.

Background of the Case Study

OLAP is a technique used to analyze databases. A number of commercially available products have been built to support this functionality; examples are Cognos’ Enterprise’s OLAP and PowerPlay, Business Objects Inc.’s Business Objects, Informix’s MetaCube, Platinum’s InfoBeacon, MicroStrategy’s DSS Agent, and Oracle’s Express. All these products offer similar functionality. OLAP typically includes the following kinds of analyses: simple (view one or more measures that can be sorted and totaled), comparison (view one measure and sort or total based on two dimensions), trend (view a measure over time), variance (compare one measure at different times, such as “sales” and “sales a year ago”), and ranking (top 10 or bottom 10 products sold).3

OLAP enables users to drill down within a dimension to see more detailed data at various levels of aggregation. Users can also filter data, that is, focus their analysis on a subset of records in the database. For example, a user interacting with a retail chain store database may be interested only in “West Coast” stores. Users need to know for which attribute or attributes they want to set up filter conditions. Users also need to know how to define the filtering conditions; OLAP enables users to filter records based only on arithmetic conditions on one or more database attributes or a “where” clause in a SQL statement.

For the case study we used Microsoft OLAP Services for the multidimensional database server and Knosys ProClarity to do the reporting, that is, to display grids and graphs.

Typical problems that data mining addresses are how to classify data, cluster data, find associations between data items, and perform time series analysis. Numerous data mining techniques have been invented for each type of problem.4,5 Each problem requires data mining techniques to analyze large quantities of data.
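The OLAP operations described in this section (a simple total, a filter, a drill-down, and a ranking) can be illustrated with a small sketch. It is not how Microsoft OLAP Services or any other product named here is implemented. The retail records, the field names, and the helper functions `total_by` and `top_n` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical retail records; every value here is invented for illustration.
SALES = [
    {"region": "West Coast", "state": "CA", "product": "A", "sales": 120.0},
    {"region": "West Coast", "state": "CA", "product": "B", "sales": 80.0},
    {"region": "West Coast", "state": "WA", "product": "A", "sales": 50.0},
    {"region": "Midwest", "state": "IL", "product": "A", "sales": 200.0},
    {"region": "Midwest", "state": "IL", "product": "C", "sales": 30.0},
]


def total_by(records, dimensions, measure="sales"):
    """Sum a measure for each combination of the given dimensions."""
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        totals[key] += rec[measure]
    return dict(totals)


def top_n(totals, n):
    """Rank aggregated cells from largest to smallest and keep the first n."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Filter: the counterpart of a SQL "where region = 'West Coast'" condition.
west = [r for r in SALES if r["region"] == "West Coast"]

by_region = total_by(SALES, ["region"])           # simple analysis
by_state = total_by(west, ["region", "state"])    # drill down one level
ranking = top_n(total_by(SALES, ["product"]), 2)  # ranking (top 2 products)

print(by_region)
print(by_state)
print(ranking)
```

As the text notes, the user still has to know which attribute to filter on; the sketch only computes what it is asked for.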
Two techniques for data mining were used: patient rule induction method (PRIM)6,7 and weighted item sets (WIS), a type of association rule technique. PRIM and WIS are described next.

PRIM. PRIM is a technique that does not fall exactly into one of the business problem categories listed earlier. PRIM finds the optimal region, that is, a subset of data points with the highest average value, given a set of input attributes and a minimum size of the region specified by the user. Data records contain input variables and an output variable (variables are record attributes or derived attributes, and the output variable must be a measure). A record’s location in a dimensional space is based on the values of its attributes, for example, “attending physician,” “payer,” and “LOS” in a hospital database. PRIM finds regions where the output variable has a high average value compared to the average value for the entire set of records. PRIM could also be used to find regions with minimum average value by maximizing the negative values of the output variable.

WIS. WIS is an association rule tool that finds relationships between various attributes in a database; some of the attributes can be derived measures. The relationships are defined in terms of if-then rules that show the frequency of records in the database that satisfy the rule. For example, ninety out of one hundred patients in the database with DRG “999” have a length of stay greater than or equal to ten days.

Data mining and data warehousing are becoming more prevalent in the healthcare industry because of the large quantities of data stored in various systems at medical institutions and the number of business decisions made based on the data.8,9,10
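The if-then rule in the WIS example (ninety of one hundred patients with DRG “999” stay ten days or longer) can be made concrete with a minimal counting sketch. This is plain association-rule counting, not the proprietary WIS algorithm, whose weighting scheme is not described in this article. The function `rule_stats` and the synthetic records are assumptions made for illustration.

```python
def rule_stats(records, antecedent, consequent):
    """Count how often an if-then rule holds.

    Returns (matching, satisfying, confidence), where
    matching   = number of records for which the "if" part is true,
    satisfying = number of those for which the "then" part is also true,
    confidence = satisfying / matching (0.0 when nothing matches).
    """
    matching = [r for r in records if antecedent(r)]
    satisfying = [r for r in matching if consequent(r)]
    confidence = len(satisfying) / len(matching) if matching else 0.0
    return len(matching), len(satisfying), confidence


# Synthetic records built to mirror the example in the text; not study data.
records = (
    [{"drg": "999", "los": 12}] * 90
    + [{"drg": "999", "los": 4}] * 10
    + [{"drg": "123", "los": 3}] * 50
)

matching, satisfying, confidence = rule_stats(
    records,
    antecedent=lambda r: r["drg"] == "999",
    consequent=lambda r: r["los"] >= 10,
)
print(f"IF drg = 999 THEN los >= 10: {satisfying} of {matching} ({confidence:.0%})")
# prints: IF drg = 999 THEN los >= 10: 90 of 100 (90%)
```

An association rule tool would enumerate many candidate rules of this form and report those whose counts pass chosen thresholds; the sketch evaluates a single rule.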

Identify Cost and Revenue Opportunities Using Data Mining

The objective of the analysis was to discover interesting and unexpected business insights through the application of data mining techniques; subsequently, these insights can be used to make business decisions that can influence cost, revenue, and operational efficiency while maintaining a high level of care. We investigated this business problem from different levels of abstraction: the entire enterprise, department or line of business, and DRG level. For the analysis we are describing, we looked at one specific DRG, restricted to Medicare and Medicaid inpatients, a population the institution wanted to study.

The case study analysis was performed by a cross-functional team consisting of one physician, several computer science researchers, and one IT project manager. The physician provided the clinical expertise required to analyze results; the computer science researchers applied tools and techniques they had developed; and the IT project manager provided expertise in the hospital’s patient accounting system. All team members helped formulate the business problem(s). At the DRG level we looked at DRGs that were the most and least profitable in the institution, solely for Medicare and Medicaid patients.

We asked computer science researchers to apply the tools they had developed rather than asking business analysts to apply them. We did this because data analysis tools can perform sophisticated analyses, and their potential is enormous, but they can be difficult to apply to business problems with numerous, complicated, and interdependent factors. For example, when do you apply OLAP, and when an association rule tool? This was an important decision that led to our better understanding of how to apply these techniques appropriately and to find repeatable methods for analyzing data and finding business insights.

The data mining tools and the OLAP tool were evaluated, and we attempted to use the strength of each tool. Both PRIM and WIS are capable of analyzing large numbers of records in a database. PRIM is an algorithm for solving global optimization problems. PRIM can process many dimensions simultaneously when finding the best region, that is, a subset of data points with a high average value for an output variable. A SQL statement or rule can represent this region. WIS is useful for finding patterns or associations between attributes. It does not find optimized regions. For WIS the patterns are represented by rules; each rule describes a region that consists of data points satisfying the rule’s conditions.

Neither PRIM nor WIS produces results that are easy for users to evaluate. A user cannot easily look at a SQL statement describing a PRIM region and intuitively understand the differences between the “high average” region and the “outer” region. A user can look at a WIS rule or pattern and understand the attributes and values. However, some difference between the rule’s region and the outer region may be missed, which could offer an explanation of the meaning of the pattern. OLAP tools cannot discover high average regions or find new patterns in data. OLAP does allow users to drill down into detail once a data area of focus is identified, and then lets the user visualize the result on various dimensions effectively. This means the tools can be used to complement one another.
For example, PRIM finds an optimized region (a subset of data points), then OLAP can graphically display aggregated values for various dimensions for the region and for points outside the region, that is, the outer region. For WIS, the algorithm finds an association rule (a subset of data points) by looking at all combinations of attributes; OLAP can then display data graphically for both the region and the outer region. In WIS’s case the region is not an optimized region but a region made of records satisfying the criteria in the rule.

We used PRIM and WIS to find regions. In essence we ran PRIM and WIS at three levels: the entire inpatient record set, inpatients in a department, and inpatients with a specific DRG. We experimented with parameter settings so that the algorithms would run efficiently and find the most accurate results. For example, for PRIM we needed to select an alpha value, which controls how fast the algorithm finds a region, and a beta value, which constrains the size of the region. Based on the data mining results, we used OLAP to compare data points in a region to data points outside a region. We could run an OLAP report on all dimensions, for example, on all input variables used in PRIM. Even new measures could be defined for the OLAP analysis, for example, “day of week of admission.” Based on the OLAP results, further OLAP reports can be run to drill down on interesting dimensions.
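To make the roles of alpha and beta more concrete, the following is a much simplified sketch of the top-down peeling idea from the bump-hunting literature cited for PRIM (reference 7). It is not the implementation used in the case study. It assumes numeric input variables only, treats alpha as the fraction of the current region removed per step, treats beta as the minimum region size as a fraction of all records, and omits the pasting step and any handling of categorical attributes. The function name `prim_peel` and the synthetic visits are hypothetical.

```python
def prim_peel(rows, inputs, output, alpha=0.1, beta=0.2):
    """Greedy peeling toward a region with a high average output value.

    rows   : non-empty list of dicts
    inputs : names of numeric input variables
    output : name of the measure whose average should be high
    alpha  : fraction of the current region removed at each step (assumed meaning)
    beta   : minimum region size as a fraction of all rows (assumed meaning)
    Returns (region_rows, bounds), where bounds maps each input to the
    (min, max) values observed among the rows left in the region.
    """
    if not rows:
        raise ValueError("rows must not be empty")

    def mean(rs):
        return sum(r[output] for r in rs) / len(rs)

    box = list(rows)
    min_size = max(1, int(beta * len(rows)))
    while True:
        k = max(1, int(alpha * len(box)))  # rows to remove in this step
        if len(box) - k < min_size:
            break  # a further peel would make the region too small
        candidates = []
        for name in inputs:
            ordered = sorted(box, key=lambda r: r[name])
            candidates.append(ordered[k:])   # peel the k lowest values
            candidates.append(ordered[:-k])  # peel the k highest values
        best = max(candidates, key=mean)
        if mean(best) <= mean(box):
            break  # no single peel improves the average
        box = best
    # Simplification: tied values at a cut are split arbitrarily, so the
    # bounds summarize the region rather than define it exactly.
    bounds = {n: (min(r[n] for r in box), max(r[n] for r in box)) for n in inputs}
    return box, bounds


# Synthetic visits (invented): losses grow with LOS, faster for ages 46 to 64.
visits = [
    {"los": los, "age": age,
     "loss": (500.0 if 46 <= age <= 64 else 100.0) * los}
    for los in range(1, 11)
    for age in (30, 50, 60, 70, 80)
]
region, bounds = prim_peel(visits, ["los", "age"], "loss", alpha=0.1, beta=0.15)
print(len(region), bounds)
```

A smaller alpha peels more cautiously and takes more steps; a larger beta stops the peeling earlier and leaves a bigger region. The bounds can then be turned into a filter condition for comparing the region with the outer region.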



Case Study: DRG-Level Analysis

We selected an unprofitable DRG that had approximately 426 inpatient visits during a one-year period; we included only Medicare and Medicaid patient visits. This total does not include a small number of inpatient visits for which payment had not yet been received at the time of the study.

PRIM was executed using “loss” as the output variable. More than fifteen inpatient attributes were used as input variables. The region PRIM found consisted of sixty-four inpatient visits, or 15 percent of the inpatient visits. However, these visits accounted for more than half of the total losses associated with these inpatient visits. The average loss for the inpatient visits in the region was about six times that of the inpatient visits outside the region, that is, the outer region ($21,278.49 versus $3,573.87). The average length of stay for the inpatient visits in the region was about two times that of the outer region. This is shown in Figure 1.

After PRIM successfully found the high average region, we wanted to compare the high average region with the outer region. We wanted to know why these patient visits had greater losses.

Figure 1. DRG Analysis Using a Region Found by PRIM: Medicare and Medicaid Data for One Year

Values Displayed: Visit, Loss, Average Loss, Average Age, Average LOS

Region     Visit    Loss             Average Loss    Average Age    Average LOS
Box 0      64       $1,361,823.38    $21,278.49      67.0           26.8
Outer 0    362      $1,293,739.26    $3,573.87       74.2           12.5
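The proportions quoted for the region can be rechecked directly from the Figure 1 values. The short calculation below uses only the numbers in the figure; the variable names are ours. From these values the average loss in the region comes out at roughly six times that of the outer region, and the average length of stay at roughly two times.

```python
# Values copied from Figure 1 (Box 0 is the PRIM region, Outer 0 the rest).
box = {"visits": 64, "loss": 1_361_823.38, "avg_los": 26.8}
outer = {"visits": 362, "loss": 1_293_739.26, "avg_los": 12.5}

total_visits = box["visits"] + outer["visits"]
visit_share = box["visits"] / total_visits                 # share of visits in region
loss_share = box["loss"] / (box["loss"] + outer["loss"])   # share of total loss
avg_loss_box = box["loss"] / box["visits"]
avg_loss_outer = outer["loss"] / outer["visits"]
avg_loss_ratio = avg_loss_box / avg_loss_outer
los_ratio = box["avg_los"] / outer["avg_los"]

print(f"visits in region: {visit_share:.1%} of {total_visits}")
print(f"share of total loss: {loss_share:.1%}")
print(f"average loss: {avg_loss_box:,.2f} vs {avg_loss_outer:,.2f}")
print(f"average loss ratio: {avg_loss_ratio:.2f}, LOS ratio: {los_ratio:.2f}")
```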



Numerous OLAP reports were run on the attributes, for example, financial class, marital status, and age. The report on financial class broke down the losses by the following categories: Medicare-Exempt Rhab/Psych/SNF, Medicare-Nonexempt, and Medicaid. Medicare-Exempt Rhab/Psych/SNF had large and comparable total losses in the region and outside the region, but the average loss was significantly larger in the region, almost seven times larger. The report on age showed that inpatients between the ages of forty-six and sixty-four had significantly larger losses than the rest of the patients in the region. This is shown in Figure 2. They also had significantly larger losses than the patients outside the region.

Based on these results, we decided to investigate why patients between the ages of forty-six and sixty-four had poor financial performance. A follow-up OLAP analysis investigated admission source. The analysis revealed that for inpatient visits in the region with Medicare-Nonexempt, patients admitted via routine admission had the highest average loss; for inpatient visits in the region with Medicare-Exempt Rhab/Psych/SNF, patients admitted via routine admission had the most visits.

We further investigated Medicare-Exempt Rhab/Psych/SNF inpatients with a routine admission source. For this subset of inpatient visits inside and outside the region, that is, patients between the ages of forty-six and sixty-four who entered via routine admission and whose payer is Medicare-Exempt Rhab/Psych/SNF, we ran OLAP reports on admission day of week, ICD-9 procedure, ICD-9 diagnosis, and surgeon department. We discovered that on Tuesdays, the average loss is significantly greater for patients in the region. Patients in the region admitted on Tuesdays have two times the average loss of patients in the region admitted on other days of the week. The difference between patients in the region and patients outside the region is even more dramatic. Although the absolute number of patients is small, and differences may not be statistically significant, we believe that this approach will be useful for high-volume cases.

Examination of “Tuesday’s admitting physicians” revealed that several of the physicians were in the same medical specialty. This specialty cared for patients who typically required a high level of service intensity over a long period of time. The identification of a subset of patients with disproportionately high costs has prompted the institution to reevaluate its admission criteria to this unit.
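The kind of follow-up report described here, comparing the region with the outer region on one dimension such as admission day of week, amounts to averaging a measure per (region flag, dimension value) cell. The sketch below shows that calculation on invented records; the losses are not the study’s data, and the field names `in_region` and `admit_day` and the helper `average_by` are hypothetical.

```python
from collections import defaultdict


def average_by(records, dimension, measure):
    """Average a measure for each (in_region, dimension value) cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        key = (rec["in_region"], rec[dimension])
        sums[key] += rec[measure]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Invented visits for illustration only; not the case study's records.
visits = [
    {"in_region": True, "admit_day": "Tue", "loss": 40000.0},
    {"in_region": True, "admit_day": "Tue", "loss": 44000.0},
    {"in_region": True, "admit_day": "Mon", "loss": 20000.0},
    {"in_region": True, "admit_day": "Fri", "loss": 22000.0},
    {"in_region": False, "admit_day": "Tue", "loss": 4000.0},
    {"in_region": False, "admit_day": "Mon", "loss": 3000.0},
]

report = average_by(visits, "admit_day", "loss")
for (in_region, day), avg in sorted(report.items()):
    label = "region" if in_region else "outer"
    print(f"{label:6} {day}: {avg:,.2f}")
```

With cells this small, as the text cautions, a large gap in averages may rest on only a handful of visits, so the cell counts are worth reporting alongside the averages.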

Advantages and Limitations of a Repeatable Methodology

By using the strength of each tool, we were able to take advantage of the complex, sophisticated algorithms of the data mining techniques and then more easily visualize the results using OLAP. This is important for several reasons, which we discuss next.



Figure 2. DRG Analysis Using a Region Found by PRIM: Medicare and Medicaid Data for One Year

Chart of Average Loss by age group (19–45, 46–64, 65–120) for each series: Box 0/Medicare-Exempt Rhab/Psych/SNF, Box 0/Medicare-Nonexempt, Outer 0/Medicaid, Outer 0/Medicare-Exempt Rhab/Psych/SNF, Outer 0/Medicare-Nonexempt.

Values Displayed: Average Loss (rows: Box 0, Outer 0)

19–45 46–64 65–120 Medicare-Exempt Rhab/Psych/SNF $18,543.05 $26,644.60 $17,785.52 Medicare-Nonexempt $63,989.42 $14,162.33 $4,013.35 $4,323.30 Medicaid $5,606.27 $3,481.69 Medicare-Exempt Rhab/Psych/SNF $3,877.73 Medicare-Nonexempt $929.36

In order to effect change we need to identify opportunities, that is, either large financial losses that can be prevented, large financial successes that can be identified, or moderate financial success that should be promoted. Some action(s) must be taken. Once these opportunities are identified and action(s) taken, we need to be able to measure change over time, that is, to rerun these tools in the same manner repeatedly so that we can determine whether the action(s) taken have the desired effect. This is one reason that developing a repeatable methodology is important.

Several case studies run at Rush-Presbyterian–St. Luke’s have resulted in senior management reviewing identified issues and opportunities. In several other studies, the findings provided additional support for actions being considered by the institution. For the case study described in this article, actions are being considered but have not yet been implemented.

One effect that should have been anticipated but was not is the change in the flow of information that came with the introduction of this system. Historically, information has flowed unidirectionally from Physician to Medical Record Department to Billing or Physician to Patient to Cost Center Manager to Billing. With the introduction of this tool, we have already seen physicians providing valuable feedback to the medical records department regarding how specific patient services are coded. This has created additional opportunities and challenges for the organization.

A second reason for developing a repeatable methodology is to be able to semi-automate the analyses, that is, to run thorough, critical evaluations of the entire institution, departments, or DRGs on demand. We believe we are developing a process that will allow managers who are not skilled in data mining techniques to view their business unit’s data in a format that allows them to use their domain expertise to ask additional questions or develop defensible arguments for change. This approach should be valid at the manager’s level, the Department of Medicine level, or the level of the charge nurse in a patient care area.

A third reason for developing the repeatable methodology is so that we can easily apply these tools to other institutions and possibly other industries.
However, the method still depends on human expertise to understand the OLAP reports, make inferences, run more reports, and take action(s); it is not fully automated.

Conclusion

A cross-functional team was formed that included clinical, financial, and technical expertise. Business problems were formulated at different levels of abstraction in order to identify financial opportunities. We had a set of tools that we had developed or bought, and we tried executing these tools in various combinations with one another. We eventually arrived at a repeatable methodology.

We discovered that we can apply PRIM and WIS to find optimized regions and patterns, respectively, by performing complex algorithmic steps. After running those tools, we could take a region and the points outside the region and compare them using OLAP, attribute by attribute. In other words, standard OLAP reports can be run in which each dimension or attribute (payer, admission source, patient age, physician perspectives, marital status, and so on) can be described. If any of the reports describing an attribute is interesting, then a set of follow-up, detailed OLAP reports can be run.

We showed how to apply the methodology to an analysis of one DRG. We explained each step and the results of running each step; we illustrated how this method helped turn data into knowledge. Results were presented in a graphically appealing way that helped users determine how to use the information to make decisions.

References

1. Schneider, P. “How Do You Measure Success?” Healthcare Informatics, 1998, 15(3), 45–56.
2. Rosenstein, A. “Inpatient Clinical Decision-Support Systems: Determining the ROI.” Healthcare Financial Management, Feb. 1999, pp. 51–55.
3. Peterson, T., Pinkelman, J., and Pfeiff, B. Microsoft OLAP Unleashed. Indianapolis, Ind.: SAMS/Macmillan USA, 1999.
4. Brachman, R., Khabaza, T., Kloesgen, W., Piatetsky-Shapiro, G., and Simoudis, E. “Mining Business Databases.” Communications of the ACM, 1996, 39(11), 42–48.
5. Brin, S., Motwani, R., Ullman, J., and Tsur, S. “Dynamic Itemset Counting and Implication Rules for Market Basket Data.” Paper presented at the SIGMOD Conference, Tucson, Ariz., 1997.
6. Srivastava, A., and Singh, V. “Deriving Interpretable Rules for Financial Outliers in Rush Hospital Data.” Unpublished internal technical report, Hitachi America, Ltd.
7. Friedman, J., and Fisher, N. “Bump Hunting in High-Dimensional Data.” Statistics and Computing, 1999, 9(2), 123–143.
8. Scheese, R. “Data Warehousing as a Healthcare Business Solution.” Healthcare Financial Management, Feb. 1998, pp. 56–59.
9. Borok, L. “Data Mining: Sophisticated Forms of Managed Care Modeling Through Artificial Intelligence.” Journal of Health Care Finance, 1997, 23(3), 20–36.
10. Herr, W. “The Benefits of Data Integration: HFMA Study Findings.” Healthcare Financial Management, Sept. 1996, pp. 52–56.

About the Authors

Michael Silver, MD, FACP, FCCP, FCCM, is associate professor of medicine at Rush Medical College, associate director of the Section of Pulmonary and Critical Care Medicine at Rush, and vice president of medical affairs at Oak Park Hospital. He is board certified in internal medicine, pulmonary medicine, and critical care medicine.

Taiki Sakata has worked for Hitachi for eight years. He is currently a researcher at the Information Technology Laboratory at Hitachi America, Ltd. His technical interests are computer network architecture, data warehouses, and OLAP.

Hua-Ching Su received a BS and MS in computer science and has worked on various software technology and research projects. She is currently a senior software engineer at the Information Technology Laboratory at Hitachi America, Ltd.

Charles Herman is senior researcher at the Information Technology Laboratory at Hitachi America, Ltd.

Steven B. Dolins received his BS in physics and MS in computer science from Tulane University, and his PhD in computer science from the University of Texas, Arlington. He has worked on semiconductor, military, and consumer packaged goods applications for fifteen years and is currently chief researcher at the Information Technology Laboratory at Hitachi America, Ltd.

Michael J. O’Shea is an IS project manager for Rush-Presbyterian–St. Luke’s Medical Center. He has project management responsibility for the development and implementation of a data warehousing project, is responsible for the patient accounting system, and acts as liaison between the financial management group and the IS department.

