Connection | 2012 Issue 2
Technology That Works. People Who Care.
Published by Applied Measurement Professionals, Inc. © 2012
How Many is Enough? Determining the Ideal Number of Candidates Needed for Reliable Item Statistics
Dr. Steve Nettles, AMP Sr. Vice President, Psychometrics
Lily Chuang, MS, AMP Research Associate
It is one of the most common questions a psychometrician gets asked: "How many candidates do we really need to get good stats on our exam?" While many programs are lucky enough to have thousands of people sitting for their examination each year, the reality is that far more programs face limited candidate volumes. Determining the ideal number of candidates needed to generate "reliable" item statistics is therefore a common concern, as reliable statistics are a cornerstone of instant scoring, a major benefit to computer-based testing (CBT) candidates. In the past, we have been hesitant to give a firm answer to this question because few studies have been conducted to find the "magical number." However, recent research by AMP staff may help provide small-volume programs guidance on how many candidates it takes to get meaningful and reliable data.
Introduction

Many certification programs with small numbers of candidates use classical item statistics to evaluate item performance. The two most commonly used statistics are item difficulty (the p-value, or the proportion of candidates answering an item correctly) and item discrimination (rpb, the point-biserial correlation between candidates' total test scores and whether they answered the item correctly). If the correlation is sufficiently high (e.g., greater than 0.20), the item is said to discriminate between high- and low-scoring candidates, a desirable outcome. Although item performance information can be calculated from any number of candidates, with small numbers of candidates a single outlier can cause a large deviation in a statistic's value. So when item performance data are used to make decisions about an item, the question becomes how much we should rely on the item statistics and how confident we can be in them. We used data from a large credentialing group to conduct a research study that simulated calculating item statistics for various sizes of small candidate groups and compared the observed item performance with data from the larger group of candidates.
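For illustration, here is a minimal sketch of how these two statistics can be computed from a scored response matrix. The NumPy code and its names are our own illustration, not part of the study.

```python
import numpy as np

def item_statistics(responses):
    """Classical item statistics from a scored response matrix.

    responses: array of shape (n_candidates, n_items); 1 = correct, 0 = incorrect.
    Returns (p_values, rpb): item difficulty and discrimination for each item.
    """
    responses = np.asarray(responses, dtype=float)
    total_scores = responses.sum(axis=1)    # each candidate's total test score
    p_values = responses.mean(axis=0)       # proportion answering each item correctly
    # rpb: correlation between the 0/1 item score and the total test score.
    # An item every candidate answers the same way has no variance, so its
    # correlation is undefined; np.corrcoef returns nan for it.
    rpb = np.array([np.corrcoef(responses[:, i], total_scores)[0, 1]
                    for i in range(responses.shape[1])])
    return p_values, rpb
```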
Methodology

The current study used data from an examination form with 140 scored items. Item statistics, p-value and rpb, were calculated for each item using a population of 1,525 candidates. These population values became our "gold standard" for comparison purposes.

To simulate small groups of candidates, we began by randomly selecting 10 candidates from the population of 1,525. Responses from those 10 candidates were used to calculate difficulty and discrimination indices for each of the 140 items. We ran 100 iterations of randomly selecting 10 candidates and calculating the item statistics, and for each iteration we computed the absolute value of the difference between each sample item statistic and the corresponding population value. This procedure was repeated for groups of
15, 20, 25, 50, 75, 100, 125 and 150 candidates, to simulate certification programs with various candidate volumes. Our criterion was to determine how many candidates were required to achieve an average difference of 0.05 or less across the 100 iterations.
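The simulation itself can be sketched in a few lines, reusing the item_statistics function above. This is our own illustration of the procedure as described; we assume sampling without replacement within each iteration, which the article does not specify.

```python
import numpy as np

def simulate_sampling_error(responses, group_sizes, n_iterations=100, seed=0):
    """Average absolute difference between small-group and population statistics.

    For each group size, randomly draw that many candidates n_iterations times,
    recompute p-value and rpb, and average the absolute differences from the
    population ("gold standard") values across items and iterations.
    """
    responses = np.asarray(responses, dtype=float)
    rng = np.random.default_rng(seed)
    pop_p, pop_rpb = item_statistics(responses)   # population values, all candidates
    results = {}
    for n in group_sizes:
        p_diffs, rpb_diffs = [], []
        for _ in range(n_iterations):
            rows = rng.choice(len(responses), size=n, replace=False)
            p, rpb = item_statistics(responses[rows])
            p_diffs.append(np.mean(np.abs(p - pop_p)))
            # nanmean skips items with no variance in the small group,
            # where rpb is undefined.
            rpb_diffs.append(np.nanmean(np.abs(rpb - pop_rpb)))
        results[n] = (np.mean(p_diffs), np.mean(rpb_diffs))
    return results

# e.g., simulate_sampling_error(responses, [10, 15, 20, 25, 50, 75, 100, 125, 150])
```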
Findings – Item Difficulty (p-value)

As one might expect, item statistics become more "stable" as the sample size grows; that is, they show smaller differences from the item statistics calculated for the total population. Figure 1 shows that, for the group of 10 candidates, the differences ranged from a low of 0.03 to a high of 0.15, with an average of 0.11. The average difference drops to our criterion of 0.05 when 50 or more candidates are included in the analysis, and it levels off at 0.03 when 100 or more candidates are used to calculate the p-value.

"50 candidates are typically adequate to generate a stable p-value."

[Figure 1. Differences of p-values generated from a reduced number of candidates and from the population. Minimum, mean and maximum differences are plotted against the number of candidates (10 to 150).]

Findings – Item Discrimination (rpb)

Since the range of item discrimination (rpb) values is larger than the range of item difficulty (p-value) values, the rpb differences between the reduced candidate groups and the population were expected to be greater than the p-value differences; that is, rpb tends to be less stable with smaller groups of candidates. Figure 2 demonstrates a pattern similar to Figure 1. For the group of 10 candidates, the differences ranged from a low of 0.16 to a high of 0.35, with an average of 0.25. The average difference starts to level off with 50 candidates (mean difference 0.11), but even with 150 candidates it is still 0.06.

"A larger group of candidates (150 or more) is necessary to generate a stable rpb."

[Figure 2. Differences of rpb values generated from a reduced number of candidates and from the population. Minimum, mean and maximum differences are plotted against the number of candidates (10 to 150).]

Additional Interesting Observation on P-Value

The differences on p-value were negatively correlated with the p-value calculated from the population, regardless of sample size. In other words, difficult items (low p-value) tend to show more variability in their difficulty estimates no matter what candidate volume an exam program has. The correlation coefficient was approximately -0.77 for every reduced sample size group, and it was statistically significant.
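This observation could be reproduced by tracking the mean absolute p-value difference for each item separately (rather than averaging across items, as the sketch above does) and correlating those per-item errors with the population p-values. A minimal, illustrative sketch:

```python
import numpy as np

def difficulty_error_correlation(pop_p, per_item_p_diffs):
    """Correlate population item difficulty with per-item estimation error.

    pop_p:            population p-value for each item
    per_item_p_diffs: mean absolute p-value difference per item across the
                      iterations at a given group size
    A strongly negative coefficient means harder items (lower p-value) are
    estimated less stably.
    """
    return np.corrcoef(pop_p, per_item_p_diffs)[0, 1]
```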
Discussion

Even if an examination form is carefully constructed, item statistics are not particularly stable with only 10-25 candidates. A difficult item is likely to look difficult (having a low p-value), but little can be concluded from the discrimination index. When more precise estimates are needed in a decision-making process, at least 50 candidates are suggested for item difficulty statistics and 150 or more for item discrimination. With smaller candidate groups one can still use the item statistics, but it is important to acknowledge the amount of error when interpreting them. Over the years, AMP has been instrumental in helping both large and small credentialing organizations develop and maintain quality testing programs. We will use insights from this study to further support smaller programs, which have the same need to ensure a reliable certification tool.
Empower Your Chief Staff Officer
Dede Gish-Panjada, MBA, AMP Sr. Vice President, Management Services

In the last eConnect newsletter, we began a series of articles exploring the challenges associations face and must consider for the future of their organizations. "Change" is a scary word to most associations: there is a constant struggle to remain true to the traditions and goals of the organization yet meet the demands of an ever-changing society. In their book "Race for Relevance: 5 Radical Changes for Associations," Harrison Coerver and Mary Byers, CAE, identify five changes associations must make to ensure survival. The changes challenge the way we operate, plan, and view our associations. Embracing them will take courage, but ignoring them could be catastrophic.

5 Radical Changes Associations Must Consider to Remain Relevant
1. Adopt a 5-member competency-based Board of Directors
2. Empower your CSO and focus on new staff skill sets
3. Rigorously define your member market
4. Rationalize programs and services
5. Build a robust technology framework

Boards often inject themselves into management decisions and opportunities and reject their true governing role, creating a muddled management structure in which decision making and accountability become unclear. Why is this so common? Board members typically have more experience managing than governing. Managing is independent decision making on short-term, immediate issues, while governing requires consensus on bigger-picture issues. Not all managers have been trained in the art of reaching consensus, and in most cases it is easier to "manage" than to govern. To implement this change, staff and executive leadership need to be allowed to perform their appropriate roles in the organization with little or no micromanagement from the Board. The important thing to note about this radical change of empowering your CSO and staff is that roles and responsibilities remain the same as in the traditional structure; it simply becomes more important that those roles are acknowledged and adhered to. The Board governs: it sets broad policy, goals and objectives, ensures adequate resources, retains the CSO and guides the organization in the best interests of those it serves. The CSO runs the association to meet the Board-established objectives and is responsible for deciding what is to be done to accomplish the Board-identified goals, how it is to be done and who will do it. Why is empowering your CSO and focusing on new staff skill sets necessary? It increases the speed and efficiency of decision making, capitalizing on human potential. With more efficient decision making, volunteer leaders and staff have more time
to explore each other's strengths, which encourages effective working partnerships to form naturally. It encourages honest and straightforward communication between staff and volunteers. Trust builds more quickly, bringing about a new level of teamwork, and dissension is identified more readily. There is no room for lack of commitment or avoidance of accountability; honest discussion and spirited debate invite clear consensus. Implementing this radical change in structure creates a committed, passionate staff driven to accomplish the association's mission. This structure also requires a more skilled staff or association management company (AMC). What top-notch association executive or AMC wouldn't want to work with such a Board? Talented staff and competent Boards encourage the best in each other.
The Trends Forcing This Change

There is evidence that organizations are moving toward empowered staffs, but it is taking far too long. The trends driving the need for staff empowerment include:

Increased Time Pressure – Time pressures on volunteer leaders significantly limit their ability to contribute to associations, and what time they do have is compromised by personal and professional distractions and interruptions. The association or certification board of the future should be run not by "part-time" volunteers but by a small, competency-based board and an empowered, skill-enhanced staff.

Increased Organizational Complexity – Associations have become complex organizations with an expanded scope of programs, services and activities. Information systems are sophisticated, communication vehicles are multifaceted, organizational relationships have expanded, and financial and legal structures are more complex. All of these factors create the need for increased management competency and require delegating responsibilities previously held by volunteers to staff professionals.

Redefinition of Roles – Volunteers have a great variety of skills, but they are typically neither association professionals nor full-time executives. Small, competency-based boards should unleash staff potential while making the best and highest use of the volunteer "resource." Similarly, association executives (an empowered CSO and staff) understand their roles and the importance of optimizing association resources.

So, what are the risks in not making this change? Boards hanging onto old roles may pay a costly price: poor decision making, delayed action, sub-optimization of staff and missed opportunities.
The Next Steps

Implement the following steps to support an empowered staff model and encourage improved candor in the volunteer-staff relationship:
1. Institute a three-year strategic planning cycle.
2. Conduct an annual performance appraisal of the CSO.
3. Complete an annual Board self-evaluation process.
4. Develop performance evaluations and feedback surveys for distribution following each Board meeting.
In the Next eConnect Newsletter
Radical Change #3: Define and Understand the Member Market of the Future
On the Road

CLEAR Mid-Year Business Meeting, 01/10/2013 – 01/12/2013, Savannah, GA
ABC Annual Conference, 01/22/2013 – 01/25/2013, San Diego, CA
2013 FARB Forum, 01/25/2013 – 01/27/2013, San Diego, CA
ATP Innovations in Testing Conference, 02/03/2013 – 02/06/2013, Fort Lauderdale, FL
We look forward to seeing you at one of these upcoming On the Road events!
Congratulations to Dr. Steve Nettles

AMP is honored to announce that Dr. Steve Nettles, AMP Senior Vice President of Psychometrics, has been awarded the Council on Licensure, Enforcement and Regulation (CLEAR) Service Award for Lifetime Achievement. The award was established to recognize an individual who has made an outstanding contribution of service and a significant commitment to CLEAR. Attending and presenting at CLEAR events since 1988, Dr. Nettles has served on several committees, often in a leadership role. Along with his committee contributions, Dr. Nettles has demonstrated his commitment to the regulatory community through frequent presentations at CLEAR conferences, including sessions on ethics in testing, alternative exam formats, exam security, trends in testing and other important issues in the licensure examination arena. CLEAR President Bruce Matthews presented Dr. Nettles with the prestigious award on September 7, 2012 at the CLEAR Annual Conference in San Francisco. "Dr. Nettles has an impressive resume in the measurement profession," President Matthews commented. "We are fortunate that he has shared his knowledge and expertise with CLEAR, its volunteer committees and membership over the past 24 years. CLEAR is very pleased to honor him for his exceptional leadership, dedication, vision and creativity."
Happy Holidays from AMP to you! In observance of the holidays, the AMP offices, test sites and candidate call center will be closed November 22-24, December 24-26 and December 31-January 1.
Stay Connected
Visit www.goAMP.com and join our mailing list to receive the eConnect newsletter or sign up for RSS feeds for news and press releases.
For more information about any of our products or services, please contact the AMP Marketing department at 913.895.4600 or visit our website at www.goAMP.com.