Biostatistics Collaboration & Consulting Core (BCCC) Newsletter
Division of Biostatistics, Department of Epidemiology and Public Health, University of Miami, Miller School of Medicine, Miami, FL
September-December 2011
Highlights in This Issue
BCCC General Information
Statistical Support in Grant Development
Sample Size Determination
Statistical Quiz
Test Your Statistical Knowledge
Statistical Software Programs
Statistical Tips
Common Mistakes in Statistics

The BCCC Newsletter will be published in April, August, and December.

We are located in the Clinical Research Building, 1120 N.W. 14th Street, 10th Floor (R-669), Miami, FL 33136.

Message from the Director
Shari Messinger, M.E., Ph.D.

Welcome to our first issue of the BCCC Newsletter. As Director of the BCCC, and on behalf of the BCCC team, I wish you and your family a prosperous, happy, and healthy New Year.

The BCCC would like to thank those of you who have so far engaged in statistical support provided by our staff, and who have taken the time to attend any of our Biostatistics Clinics to date. We strive to live up to our mission statement, which is “To ensure that appropriate statistical methodology is incorporated into research.”

This resource was created with the intent of making statistical support available to the University of Miami Research Community, and we welcome you to visit our website or come to our offices to learn more about how we can provide statistical support to your research. Please feel free to email us at mjrodriguez@biostat.med.miami.edu with any comments or suggestions, either in general or regarding specific topics of interest for future biostatistics clinics and workshops.

We would love to use this feedback to improve our resource as well as to address the topics most desired by our colleagues in the UM research community.
Mission Statement To ensure that the appropriate statistical methodology is incorporated into research.
Upcoming Talks
January 19th, 2012: Experimental Design. Speaker: Dr. Robert Duncan
February 9th, 2012: Hypothesis Testing. Speaker: Fei Tang
March 8th, 2012: Why You Need to Understand CONSORT When Planning a Clinical Trial. Speaker: Dr. Daniel Feaster
April 19th, 2012: Statistical Collaboration in Clinical Research. Speaker: Dr. Shari Messinger
May 10th, 2012: TBA. Speaker: Dr. Huiliang Xie
HOW CAN WE SUPPORT YOUR RESEARCH? SUPPORT ACTIVITIES
LEADERSHIP and PERSONNEL
Shari Messinger Cayetano, M.E., Ph.D., Associate Professor and Director of BCCC
Robert Duncan, Ph.D., Professor and Biostatistician
Hua Li, M.D., Ph.D., M.S., Assistant Scientist and Biostatistician
Kaming Lo, M.P.H., Biostatistician
Fei Tang, M.S., Biostatistician
Maria Jimenez-Rodriguez, M.A.L.S., Sr. Administrative Assistant to Dr. Shari Messinger, BCCC Administrator, and Editor of the BCCC Newsletter

All collaboration and consulting activities involve M.S.- and Ph.D.-level BCCC staff statisticians. The members of the BCCC cover a wide range of interests and statistical expertise and have consulting experience in a variety of subject matter areas.

Visit our website: www.biostat.med.miami.edu/core
MANUSCRIPT and GRANT REVIEW
The BCCC is available to review your manuscript or grant proposal with a focus on the statistical considerations of the research. The intent of this support activity is to carefully review the documents and provide feedback and suggestions to the authors or investigative team regarding any statistical issues that may be of concern. The fee for this support is based on 5 hours of dedicated BCCC time. If you are interested in this support, or have any further questions, please email mjrodriguez@biostat.med.miami.edu.
1. Study Design
2. Randomization Schemes
3. Statistical Analysis Plan (SAP)
4. Sample Size Estimation or Power Analysis
5. Statistical Analysis
6. Manuscript Review
7. Abstract/Manuscript Preparation
8. Grant Preparation
9. Survey/Questionnaire Design
10. Protocol Review
11. Safety Committee
12. Grant Review
13. Other
FEES
Fees for all support activities are based on the University-approved hourly rate for FY 2012: $105 for UM affiliates and $152 for non-UM affiliates. All fees are based on UM policy B020 for Recharge or Cost Centers.
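As a rough illustration of how the hourly rates translate into a charge, the sketch below multiplies an estimated number of hours by the FY 2012 rate. The function name `estimate_fee` and the affiliation labels are hypothetical and chosen only for this example. The 5-hour figure is the basis quoted for manuscript and grant review. Actual charges are determined by the BCCC.

```python
# FY 2012 hourly rates quoted in this newsletter (US dollars).
# The dictionary keys are illustrative labels, not official categories.
HOURLY_RATE = {"um": 105, "non_um": 152}


def estimate_fee(hours, affiliation="um"):
    """Return an estimated fee: hours multiplied by the hourly rate."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hours * HOURLY_RATE[affiliation]


# Manuscript or grant review is based on 5 hours of dedicated BCCC time.
review_fee_um = estimate_fee(5, "um")
review_fee_non_um = estimate_fee(5, "non_um")
print(review_fee_um, review_fee_non_um)
```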
BIOSTATISTICS WORKSHOPS
The BCCC plans to develop and offer workshops on various topics to the UM research community in the upcoming months. Each workshop would begin with a presentation on a specific topic, followed by a roundtable discussion among participants and the biostatisticians. Attendees could then ask the statisticians specific questions about grant proposals they are currently developing and engage in one-on-one consultation. For example, one topic we are working on is Framing of Aims and Hypothesis in Grant Preparation. Other potential topics of interest include pilot studies, experimental design, and power and sample size calculations.

The fee for participation will be determined based on the BCCC staff's preparation time and the time they are engaged in the workshop, and will be stated at the time the workshop is offered. If you are interested in attending workshops of this type, or have suggestions for specific topics you would like to see offered, please email mjrodriguez@biostat.med.miami.edu. Workshops will be announced in the Clinical and Research listservs, e-Update, and e-Veritas.
SUPPORT OPTIONS

Short Term Support Activities
Statistical support on a short-term, per-hour basis. Short-term consultations work best when investigators have well-defined questions with relatively small datasets. Fees are based on the estimated number of hours required for the specific support activities requested.

Ongoing Collaboration Plan
Dedicated support to a group engaging in a collaboration plan. Statistical support under this structure is designed to ensure available support to an investigator or department/center that makes this agreement.

Grants
Grant development can be a short-term activity or part of a collaboration plan. Biostatisticians in the BCCC are available to participate in grant development in various ways, including:
- Assisting with the formation and operation of a proposal development team;
- Assisting the investigators in refining study questions and measurement methods;
- Developing study and experimental designs;
- Writing statistical analysis plans;
- Computing precision, power, and the sample sizes necessary to achieve a given precision of estimation or a given power;
- Providing ongoing support to funded research as a line item (other support) on the grant budget.
Biostatistics Clinics
BCCC hosts monthly biostatistics clinics, which are presented to the research community and address specific statistical topics, ending with a Q&A session. Scientists and biostatisticians are the speakers.

BCCC Celebrated 1 Year Anniversary on September 16, 2011
Biostatistics Collaboration and Consulting Core (BCCC) Offers a Range of Support Services for Research Community

The BCCC, which celebrated its one-year anniversary on September 16, has established a resource dedicated to assuring that the appropriate use of statistical methodology is incorporated into research. BCCC personnel are available for collaboration at all stages of research, including but not limited to preparation of grants and contracts, study design, data analysis, and manuscript preparation.

Obtaining biostatistical collaboration as early as possible in the development of a research project increases the quality of the research and the likelihood of success in obtaining extramural funds and meeting study objectives.

BCCC offers short- and long-term collaborative support activities, grant development, and quick consulting to faculty, staff, and students, and operates as a cost center. The University-approved hourly rate is $105, and all fees are based on UM policy B020 for Recharge or Cost Centers.

Additionally, BCCC continues to host its monthly biostatistics clinics, which are presented to the research community and address specific statistical topics, ending with a Q&A session. These clinics are in a “Bring Your Lunch and Learn” format and are usually held from 12 to 1 p.m. in the Clinical Research Building. Please watch for the announcements in the Clinical and Research listservs, e-Update, and e-Veritas.

Dr. Shari Messinger, Associate Professor in the Division of Biostatistics, Department of Epidemiology and Public Health (DEPH), directs the BCCC. All collaboration and consulting activities involve M.S.- and Ph.D.-level BCCC staff statisticians. The members of the BCCC cover a wide range of interests and statistical expertise and have consulting experience in a variety of areas.

If you are interested in learning more about the BCCC and its support activities, please contact Maria Jimenez-Rodriguez at mjrodriguez@biostat.med.miami.edu.

Informational: Probability and Statistical Notations
Are you familiar with these?
P(A) - probability of event A
P(A ∩ B) - probability of the intersection of events, “A and B”
P(A ∪ B) - probability of the union of events, “A or B”
P(A | B) - conditional probability, “A given B”
f(x) - probability density function (pdf)

Statistical Support in Grant Development
Shari Messinger, M.E., Ph.D.
The strongest grants are those that have biostatistical collaboration from the beginning of the design process. Generally, grants are more likely to be funded when biostatisticians participate in the development of the proposal. Why is this?

1. The research design must be developed to accommodate the scientific questions of interest. The design must be as efficient as possible in order to improve power (and potentially save money).
2. Each hypothesis proposed under each aim of the investigation must have a well-thought-out and clearly described analysis plan in order to assess it appropriately.
3. In writing a grant proposal, you are essentially asking a funding agency for money. You must demonstrate that scientifically appropriate methods are in place for analyzing the data collected with the funds provided. You also have to demonstrate adequate statistical power to address the primary questions of interest.

Statistical power can be described with an analogy to a microscope. You have a given amount of power in your lens, and with that there is a limit to how small an object you can see. In order to see smaller objects, you can increase the power of your lens. Statistical power is like that, but the “object” you want to see is the answer to the question of interest in the investigation. It may be an effect of a treatment, or a difference between groups. The smaller it is, the more power you need to see it.

There are many approaches to increasing power. One is to increase the sample size, or the amount of information we use to estimate the effect. Another is to have a more efficient study design that minimizes variability, or other noise, in the data and allows you to see the effect clearly. A third is to use statistical methods of data analysis that are more efficient and estimate the effect of interest more precisely.

Statistical collaboration is essential for providing support in these critical areas of grant development. Because adequate statistical considerations increase the likelihood of funding, statistical collaboration is a worthwhile investment. More importantly, it increases the quality of research. It is very disappointing to spend many years invested in research only to find out that it had a biased design or was not appropriately powered. Additionally, provision of statistical support throughout the study duration is critical in demonstrating the availability of resources to carry out the described investigation and corresponding analysis, and to make statistical inference about the populations of interest.
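The microscope analogy can be made concrete with a small calculation. The sketch below approximates the power of a two-sided comparison of two group means using a normal (z-test) approximation with equal group sizes and a common standard deviation. These are simplifying assumptions made for illustration, and the function name `approx_power` and the input values are hypothetical. A real grant application would use a calculation matched to the actual design.

```python
from statistics import NormalDist


def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect: true difference between the two group means
    sd: common standard deviation of the outcome
    n_per_group: number of subjects in each group
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two group means.
    se = sd * (2 / n_per_group) ** 0.5
    shift = abs(effect) / se
    # Probability that the test statistic falls beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Illustrative values only: a smaller effect or a noisier outcome needs more subjects.
for n in (20, 50, 100):
    print(n, round(approx_power(effect=0.5, sd=1.0, n_per_group=n), 3))
```

Varying one input at a time shows the three approaches described above: power rises with a larger sample, with lower variability, and with a larger effect to detect.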
Three Factors in Sample Size Determination
Kaming Lo, M.P.H.

Sample size is often a critical aspect of study design. It is important to understand the factors that go into a sample size calculation.

Effect Size
The effect size is the actual difference that an investigator is interested in detecting according to the hypothesis. The larger the effect size, the more easily it can be detected, and the smaller the sample size needed.

Type I Error Rate
Increasing the type I error rate means allowing a larger p-value to be considered a significant result in the analysis. A smaller sample size will then be enough to detect a difference.

Variability
How spread out the data are affects the ability to detect a difference. If the variability is small, estimates are more precise, which brings down the minimum sample size required.

Adjusting some of these factors can help reduce the required sample size, but one should keep in mind that the adjustment must still be valid and clinically meaningful.
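To see how the three factors interact, the sketch below uses a common normal-approximation formula for comparing two group means: n per group = 2 * ((z for the type I error rate + z for the power) * standard deviation / effect size)^2. The function name `n_per_group`, the default power of 80%, and the example inputs are assumptions made for illustration and are not taken from the article. An actual study should have its sample size worked out with a statistician.

```python
import math
from statistics import NormalDist


def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided comparison of two means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # set by the type I error rate
    z_beta = z.inv_cdf(power)           # set by the desired power (an added assumption)
    n = 2 * ((z_alpha + z_beta) * sd / effect) ** 2
    return math.ceil(n)


# Illustrative inputs only.
print("baseline:", n_per_group(effect=0.5, sd=1.0))
print("larger effect:", n_per_group(effect=1.0, sd=1.0))
print("larger type I error rate:", n_per_group(effect=0.5, sd=1.0, alpha=0.10))
print("more variability:", n_per_group(effect=0.5, sd=2.0))
```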
Statistical Quiz
Hua Li, M.D., Ph.D., M.S.

1. Each flip of a coin results in Heads or Tails. How many possible outcomes would there be if three coins were tossed at once?
a) 2; b) 4; c) 6; d) 8
2. Confidence intervals...
a) Are a new form of spring training; b) Are calculated routinely by most stats packages; c) Define the likely ranges of population values; d) Are inferior to p values for indicating magnitude of outcomes.
3. The interquartile range (IQR) is found by subtracting the mean from the maximum value of a data set.

Answers to the quiz appear at the end of this issue.

Need Statistical Support for your Research?
Send Us Your Request
What information would you be interested to read in the BCCC newsletter? Send it to mjrodriguez@biostat.med.miami.edu.

Test Your Statistical Knowledge
Hua Li, M.D., Ph.D., M.S.

Statistical hypothesis
A statistical hypothesis is a statement about a population parameter. This statement may or may not be true. The best way to reach a judgement about whether a statistical hypothesis is true would be to examine the entire population. Since that is often impractical, researchers typically examine a random sample from the population. If sample data are not consistent with the statistical hypothesis, the hypothesis is rejected. There are two types of statistical hypotheses for each situation: the null hypothesis and the alternative hypothesis.

Null hypothesis
The null hypothesis, denoted by H0, is usually the hypothesis that sample observations result purely from chance.

Alternative hypothesis
The alternative hypothesis, denoted by H1 or Ha, is the hypothesis that sample observations are influenced by some non-random cause. For example, suppose we wanted to determine whether a coin was fair and balanced. A null hypothesis might be that each flip is equally likely to result in Heads or Tails. The alternative hypothesis might be that the chance that a flip will result in Heads is different from the chance that it will result in Tails. Symbolically, these hypotheses would be expressed as
H0: P = 0.5
Ha: P ≠ 0.5
Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails. Given this result, we would be inclined to reject the null hypothesis. We would conclude, based on the evidence, that the coin was probably not fair and balanced. After stating the hypothesis, the researcher designs the study. The researcher selects the correct statistical test, chooses an appropriate level of significance, and formulates a plan for conducting the study.

Statistical test vs. Test value
A statistical test uses the data obtained from a sample to make a decision about whether the null hypothesis should be rejected. The numerical value obtained from a statistical test is called the test value.
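The coin example (50 flips, 40 Heads) can be checked with an exact binomial test of H0: P = 0.5 against Ha: P ≠ 0.5. The sketch below is one way to do this using only the Python standard library. The function name `binomial_two_sided_p` and the 0.05 significance level are choices made for illustration, not something the article specifies.

```python
from math import comb


def binomial_two_sided_p(heads, flips, p_null=0.5):
    """Exact two-sided p-value for H0: P(Heads) = p_null."""

    def prob(k):
        # Probability of exactly k Heads in `flips` tosses under H0.
        return comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)

    observed = prob(heads)
    # Sum the probabilities of all outcomes no more likely than the one observed.
    # The small tolerance guards against floating-point ties.
    return sum(
        prob(k) for k in range(flips + 1) if prob(k) <= observed * (1 + 1e-9)
    )


p_value = binomial_two_sided_p(heads=40, flips=50)
alpha = 0.05  # significance level, assumed here for illustration
print(f"p-value = {p_value:.2e}; reject H0: {p_value < alpha}")
```

The p-value is far below the chosen significance level, which agrees with the conclusion that the coin is probably not fair.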
Statistical Software Programs
Fei Tang, M.S.

SAS
SAS is not free for download. A license can be purchased from Medical Information Technology. SAS is among the top choices of statisticians for its excellent breadth of functionality, great reliability, and powerful data management and manipulation ability. Its command-based interface gives data management efficiency and flexibility, although it also makes SAS less user friendly than software that integrates a graphical user interface, such as SPSS.

SPSS
SPSS (Statistical Package for the Social Sciences) can be downloaded from the university website: https://it3.med.miami.edu/download/. Its graphical user interface makes it extremely user friendly, but less flexible than SAS or R. It can also be programmed with a command syntax language.

SPlus
SPlus is not free for download. A license can be purchased from Medical Information Technology. It features advanced graphics and is therefore excellent for visualizing data. In SPlus, data visualization and statistical modeling can be done either through point-and-click dialog boxes or by writing S programs. Its point-and-click interface makes it user friendly.

R
R is free and can be downloaded from the R home page. R is an implementation of the S programming language and is therefore similar to SPlus, although it does not have the graphical user interface and some functions that SPlus has. R is great for visualization, especially for exploratory analysis.

R-commander
R-commander was developed as an easy-to-use graphical user interface for R. It is considered the most viable R-based alternative to commercial statistical packages like SPSS (Wikipedia).

Stata
Stata is not free for download. A license can be purchased from Medical Information Technology. Stata is designed as a general-purpose statistical package. Although it is relatively small, Stata can perform most of the major tests that SPSS and SAS can perform and has a powerful built-in graphing capability.

Statistical Tips
Robert Duncan, Ph.D.

1. Try to have equal group sizes if possible. Otherwise, try to have group sizes as nearly equal as possible. Equal group sizes yield maximum statistical power.
2. If the study protocol specifies times of observation, make every effort to make observations at protocol times. Unequal spacing and unequal numbers of observation times limit analyses and inferences.
3. When starting a study, check the first few observations closely to be sure the processes described in the protocol are working correctly and the data look approximately as expected. If there are problems, revise the processes and discard the early data.
4. If the study requires collecting follow-up data on subjects, make sure you have mechanisms and resources to track subjects and get the needed data.
5. When developing a study plan, always develop mock data tables and analysis plans to be sure that study hypotheses can be properly tested by the data collected.

Common Mistakes in Statistics
Robert Duncan, Ph.D.

1. Not testing data for non-normality and using Normal Theory tests (e.g., Student t, ANOVA, etc.) indiscriminately. Non-normality usually exhibits itself in two ways: skewed histograms and correlation between sample means and sample standard deviations. If data are not distributed normally with approximately equal standard deviations, then either distribution-free tests should be used or the data should be transformed to approximate normality.
2. Improper “adjustment” of data for control or standard values. Adjustments such as “percent of control” or “percent change from baseline” are often not appropriate and lead to errors of inference. Always consult a biostatistician about data adjustments.
3. Using Multiple Regression techniques (e.g., Multiple Regression, ANCOVA, Logistic Regression, Cox Proportional Hazards Models) without checking for linearity or parallelism (Interaction). If regressions within groups are not parallel across groups, then adjustment by multiple regression is not possible.
4. Analyzing “paired” or “matched” or “repeated measures” data as independent samples. The problem with this is not whether the repeated measures data are correlated, and hence not “independent”, but rather the sample size used to compute the error degrees of freedom. In a paired Student t test, the error degrees of freedom are equal to the number of pairs minus one. If the data are treated as independent samples, the degrees of freedom for error would be the total number of observations (2n) minus two, which is n-1 greater than that for the paired analysis. This can possibly lead to a much smaller critical value for the test.
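The degrees-of-freedom arithmetic in the fourth common mistake can be illustrated with a small sketch. It computes the paired Student t statistic (df = n - 1) and, for contrast, the pooled independent-samples t statistic (df = 2n - 2) on the same data. The measurements are hypothetical, invented purely for this example, and the function name `paired_vs_independent` is likewise illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def paired_vs_independent(before, after):
    """Return (t, df) for the paired analysis and for the independent-samples analysis."""
    n = len(before)
    diffs = [a - b for a, b in zip(after, before)]
    # Paired Student t: based on within-pair differences, df = n - 1.
    t_paired = mean(diffs) / (stdev(diffs) / sqrt(n))
    # Independent-samples Student t with pooled variance (equal group sizes), df = 2n - 2.
    pooled_var = (stdev(before) ** 2 + stdev(after) ** 2) / 2
    t_indep = (mean(after) - mean(before)) / sqrt(pooled_var * 2 / n)
    return (t_paired, n - 1), (t_indep, 2 * n - 2)


# Hypothetical measurements on the same 6 subjects at two time points.
before = [120, 132, 118, 140, 125, 135]
after = [124, 135, 123, 143, 130, 138]
(paired_t, paired_df), (indep_t, indep_df) = paired_vs_independent(before, after)
print(f"paired:      t = {paired_t:.2f}, df = {paired_df}")
print(f"independent: t = {indep_t:.2f}, df = {indep_df}")
```

With n pairs, the independent-samples analysis reports n - 1 more error degrees of freedom than the paired analysis, which is the discrepancy the text describes.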
Biostatistics Core: BCCC Biostat Clinics - 2011

2-10-2011 - Biostatistical Collaboration in Clinical and Translational Research, by Dr. Shari Messinger, Associate Professor and Director of the Biostatistics Collaboration and Consulting Core, Division of Biostatistics, Department of Epidemiology and Public Health.
This talk will describe the roles and responsibilities of Biostatisticians collaborating in Clinical and Translational Research. We will describe how effective Biostatistical Collaboration throughout all stages of an investigation can facilitate and improve the quality of research. We will additionally address specific expectations that investigators should have of Biostatisticians, as well as expectations Biostatisticians have of investigators in order to be most effective in their research collaborations.

3-10-2011 - Statistics 101, by Dr. Hua Li, Assistant Scientist and Biostatistician, Biostatistics Collaboration and Consulting Core, Division of Biostatistics, Department of Epidemiology and Public Health.
This talk will generally review basic statistics used for continuous and categorical variables. Participants will acquire knowledge in the following topics: collecting, organizing, and analyzing data and presenting results; measures of central tendency and dispersion; confidence interval estimation; hypothesis testing; non-parametric tests; and sample size calculation on medical data. This is about statistical practice rather than presenting theory and methods as they appear in standard textbooks.

4-14-2011 - Sample Size and Power Considerations in Clinical and Translational Research, by Kaming Lo, Biostatistician, Biostatistics Collaboration and Consulting Core, Division of Biostatistics, Department of Epidemiology and Public Health.
Sample size issues are the most common reason why Clinical and Translational Investigators initially request statistical support. This presentation will address why sample size calculation is important in research and how sample size, variability, and effect size affect power (an important term that is closely related to sample size). We will discuss the approaches to some common designs in clinical and translational studies. In addition, we will describe the benefits of collaboration with a statistician in determining the sample size as part of the design phase of an investigation, as well as what an investigator should prepare before the meeting with a statistician to obtain the most reliable results.

5-11-2011 - Experimental Design, by Dr. Robert Duncan, Professor, Division of Biostatistics, Department of Epidemiology and Public Health.
This presentation will address the different types of designs, which include the completely randomized design, the N-way cross classification design, and nested designs. Key points will be discussed in reference to unbalanced data. The difference between experiments and designs will be discussed with respect to factorial and repeated measures experiments and longitudinal studies. Randomization will be presented in terms of the population of inference, unconstrained randomization, and constrained randomization (stratification, matching, etc.). Furthermore, this presentation will discuss statistical analysis plans, including the analysis of design variables only, the inclusion of concomitant variables, and the analysis of covariance.

9-13-2011 - Out of Sight, Not Out of Mind: Missing Data, by Dr. Tulay Koru-Sengul, Assistant Professor, Division of Biostatistics, Department of Epidemiology and Public Health.
Researchers are frequently faced with the problem of analyzing data with missing values. Missing values are practically unavoidable in studies, especially in medicine and biomedical sciences. Incomplete datasets make the statistical analyses very difficult. In this talk, I will discuss the missing data problem, different patterns of missingness, missing data mechanisms, and implications of missing values for data analysis and interpretation. Various simple and advanced statistical methodologies for handling missing data will be reviewed by focusing on their advantages and disadvantages.

10-18-2011 - Conducting Implementation Research with Rigorous Randomized Trials, by C. Hendricks Brown, Professor, Epidemiology and Public Health; Director, Center for Prevention Implementation Methodology for Drug Abuse and Sexual Risk Behavior; Director, Social Systems Informatics, Center for Computational Science.
Implementation research involves "the use of strategies to adopt and integrate evidence-based health interventions and change practice patterns within specific settings" (Chambers, 2008). These implementation strategies are major elements in the translation of research findings to practice. The field of implementation science is just beginning to be formed, and we are now beginning to frame the research questions and methodologies that will lay the foundation of this work. The goal of the newly funded Center for Prevention Implementation Methodology (Ce-PIM) is to provide methodology for measuring, modeling, and testing of implementation strategies, concentrating on evidence-based programs that have affected drug abuse or HIV sexual risk behavior. We present a frame for conducting implementation research and discuss how randomized "rollout" trials can be conducted to evaluate implementation strategies. These methods are illustrated using a 53-county randomized implementation trial involving the implementation of an evidence-based program in foster care. Distinctions between implementation trials and efficacy or effectiveness trials are provided as well.

11-14-2011 - Introduction to Structural Equation Modeling (SEM), by Dr. Maria Llabre, Professor, Department of Psychology.
The objective of this presentation is to introduce participants to the framework of structural equation modeling (SEM). These methods involve specifying, estimating, and testing explicit theory-based models of relations among variables. The two main components of SEM are structural models, of which path analysis is a special case, and measurement models, illustrated by confirmatory factor analysis. Both components will be covered in the presentation, as well as hybrid models that combine the two. Examples will be presented to illustrate applications to problems of measurement reliability and instances of planned missing data.
Answers to the Statistical Quiz: 1. D; 2. C; 3. B