Choosing and Using an Assessment Tool
ROBERT L. KANE
Assessment has become a central technology in the care of older persons. In the context of Rosemary Stevens' analysis of medical specialties, it is this technology that gives geriatrics a claim to specialty status (Stevens, 1971). At the same time, assessment is far from the exclusive purview of physicians or nurses who work with older persons. It is safe to argue that, at any age and in a variety of contexts (medical and social services), systematic assessment is preferred over haphazard practice. Despite the banality of such a simple statement, traditional care is not systematic. For many cases the lack of a systematic approach may not be critical (although few would rise to defend it), but in the care of and delivery of service to older persons, where presenting problems are often complex and multidimensional,
Assessment of older persons is more complicated than that of younger persons because there is often more to assess. Whereas the medical problems of younger persons are likely to be confined to a single
organ system, older people typically have several simultaneous chronic problems. Moreover, these problems can be exacerbated by difficulties in other spheres, such as psychological and social.*
*It would be incorrect to suggest that younger persons do not exhibit multisystem problems. For example, developmentally disabled children may present equally complex management challenges.
The growing body of knowledge demonstrates what should be intuitively clear, namely, the greater effectiveness of systematic assessments over routine care (see Chapter 13). The positive results of systematic assessment programs created a groundswell of enthusiasm for assessing older people in both the clinical and social service contexts. A major vehicle for such assessments has been case management. The underlying premise of case management is that persons with complex care needs will benefit from a systematic approach to identifying the various problems and their care implications and to continuing oversight to ensure that the problems are being systematically and adequately addressed. Unfortunately, case management has emerged as the modern cure-all for long-term care dilemmas. Assessment
ASSESSING OLDER PERSONS
is seen as the basis for sorting out problems and assigning them to their appropriate place. If only it were that easy. [...] Translating assessments into actions is still more art than science.
The demographic shift toward an aging population has also produced a tremendous spurt of research directed toward the delivery of services to older adults. Assessments are necessary at each phase of this research. They define baseline states. They measure outcomes, and they are used to adjust for differences present to begin with, known as case mix.
Knowing how to choose the right measures and how to use them wisely is the challenge addressed in this book. Some measures have been developed for research purposes, others for clinical use. Measures developed for one purpose can be used for another, but the user must act knowledgeably and cautiously. The burden of proof rests with the user. Choosing measures on the basis of attractive packaging or extravagant claims on the package is as dangerous in the world of assessment as it is in the rest of life. User training, competence, and insight are necessary. A few basic questions are raised to help neophyte measurement consumers think through what they are doing and what kinds of measures suit their goals.
WHY ASSESS?
Assessment is employed for different purposes. The criteria for the assessment vary with the purpose. Assessment can be used to decide on treatment, either the specific therapy or the best place to care for a client. It may serve as the basis for determining eligibility for care, either any care or various types and amounts of care. It may be the basis for an evaluation about the effectiveness of a program or a specific service.
In the clinical context, assessments are performed either to diagnose a case or to evaluate the effectiveness of care. Diagnosis is not an end in itself. The value of diagnosis lies in its ability to imply a course of treatment or (when there is no effective treatment) to offer a prognosis that will at least allow the patient to make appropriate plans to deal with what is expected.
Screening is a variant of diagnosis. The goal of screening is not to definitively categorize but to narrow down the field of people or potential issues to be more systematically evaluated. Most screening is performed on asymptomatic persons; experts have suggested that one should not screen for problems that cannot be treated because one can raise anxiety [...]. Screening can also, however, be part of the diagnostic evaluation. When the definitive diagnostic test is expensive, screening can be used to winnow [...]. One must use the low-power lens to define the field of vision before applying the high magnification lens.
Assessments are often used to determine eligibility for services. These assessments are usually set up to distinguish those on either side of an arbitrary threshold. As such, they are especially susceptible to bias. Those wishing to see people admitted to the program will tend to find enough points to push the applicant over the line. [...] the threshold for eligibility. One way to avoid these artifacts is to move from a dichotomous situation to a more gradual one. If the amount of services moves from an all-or-none situation to one in which the amount of service is matched to the extent of need, a broader distribution of assessment results should follow. For example, eligibility for Medicaid is based on fundamental criteria: poverty (a combination of income and assets) and disability. Once the latter has been established in terms of having reached a predetermined
level (often arbitrarily defined), the individual (assuming he or she has achieved destitution) is eligible for services. Medicaid would work quite differently if it allowed a more flexible relationship between needs and eligibility, whereby clients could become eligible for different amounts of care according to their needs. Another strategy calls for separating the person performing the assessment from the act of determining eligibility, but it is naive to assume that the assessor can maintain a state of impartiality, knowing the consequences of such judgments.
Assessment is used in research to determine if the effect of an intervention is successful or to study the relationship of a phenomenon with other attributes. The better the measure, the greater the signal to noise ratio. In other words, good measures enhance researchers' abilities to detect differences as long as they are measuring the right things. The challenge lies in developing measures that can tap the specific phenomenon of interest without also bringing in extraneous information.
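The signal-to-noise point can be made concrete with a small numerical sketch. It is not taken from the chapter: it assumes classical test theory, in which an observed score is a true score plus independent random error, so that a standardized group difference shrinks by the square root of the measure's reliability. It also assumes the usual normal-approximation formula for sizing a two-group comparison. The function names and example values (a true effect of 0.5 standard deviations, reliabilities of 1.0 and 0.64) are hypothetical.

```python
import math


def observed_effect_size(true_effect_size: float, reliability: float) -> float:
    """Standardized difference actually seen with an imperfect measure.

    Assumes classical test theory: random error inflates the observed
    standard deviation, shrinking the effect by sqrt(reliability).
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return true_effect_size * math.sqrt(reliability)


def subjects_per_group(effect_size: float,
                       z_alpha: float = 1.96,
                       z_beta: float = 0.84) -> int:
    """Approximate n per group for a two-group comparison of means.

    Normal approximation; the defaults correspond to a two-sided 5% test
    with 80% power (assumed values, not taken from the chapter).
    """
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


if __name__ == "__main__":
    true_d = 0.5  # hypothetical true treatment effect, in SD units
    for reliability in (1.0, 0.64):
        d = observed_effect_size(true_d, reliability)
        print(f"reliability={reliability:.2f}  observed effect={d:.2f}  "
              f"n per group={subjects_per_group(d)}")
```

Under these assumptions, a measure with reliability 0.64 turns a true effect of 0.5 standard deviations into an observed effect of about 0.4, and the number of subjects needed to detect it rises accordingly.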
WHAT TO ASSESS?
The targets of assessment boil down to risk factors and outcomes of various interventions. Risk factors can be positive or negative. Negative risk factors include all those elements that might be associated with an increased likelihood of developing an undesired condition. Positive risk factors, in contrast, constitute the array of characteristics that might have a protective effect. These are sometimes referred to as strengths. Given the large numbers of positive and negative potential risk factors, the question of what to assess is not simple.
The key to good assessment is using a strong conceptual model. This model should identify the specific attributes of interest. The model should include not only the attributes of the client but also related factors, such as the physical environment and informal support. Too often one encounters professionals who seem to begin with the measure and work backward; because a measure seems to address a topic of interest, it is deemed appropriate for use. Choosing a measure, however, is no easier than making any other purchase when alternative products abound. One must have in mind the use of the measure and the likely circumstances of its use. A portable barbecue grill may be ideal for camping, but it would be a poor choice of appliance for daily cooking. Similarly, measures are best used to address specific spectrums of clients. Some are best suited to distinguish the sick from the well; others do better at discriminating among degrees of illness or disability.
One must examine the measurement product closely. Relying on a name or a reputation alone is dangerous. One would not bet on a horse simply because it is called "Speedy," but many people seem to accept a scale as measuring a concept simply because it has a name that evokes that concept. Most scales are, in fact, created after the fact. Responses to items are statistically manipulated to see how they aggregate. The aggregations are reviewed, and the extent to which they are judged to tap a given domain is determined. Finally, a name is applied to the dimension that is believed to have been addressed. It might be better to pass a rule forbidding the naming of scales and requiring that they be cited only by number. Life would be less colorful, but no one would inadvertently attribute meaning to these scales. Factor analysis and its variations are not inherently wrong, nor are their products invalid. The danger lies in believing too blindly that they measure the trait that their names suggest. In the end, they are simply the weighted responses to a series of questions.
The investigator must examine the questions and decide if they tap the domain that is being explored.
The question of what to assess should be extended to include whom to assess. Here too the answer depends on the context and the purpose. The primary target
for assessment should always be the client, but a comprehensive assessment of the client may not always be possible. In some cases the client may not be able to communicate effectively (see Chapter 17). Although some older people may be too confused to respond adequately, it is dangerous to decide too quickly that dementia prevents an older person from expressing valid feelings. This prematurely disenfranchises older people. In some cases the client's perceptions may be important but not altogether trustworthy. For example, older people may claim that they have more social support than is actually available. It may prove necessary to assess the availability of caregivers directly. It can also be pertinent for the assessment to probe why the older person claims more support than is available. Perhaps deep-seated preferences for a particular lifestyle combined with realistic and unrealistic fears about the providers' suggestions are the reason for the exaggeration. Preferences and values, as Chapter 8 suggests, can also be the subject of assessment. Also, as this example shows, the validity of an assessment may be a function of the use the older person expects to be made of it. Older persons may deny problems or may have become inured to them. Direct observation and reports from other informed sources may be helpful in confirming older persons' self-reports.
WHEN TO ASSESS?
Most assessments are performed in response to a precipitating event, as when an older person or an older person's family asks for care or help or applies for benefits. Some assessments, however, are done to uncover problems not yet perceived. These screenings are based on the belief that earlier detection can lead to more effective actions. Subsequent assessments can also be done to monitor the effects of treatment. Other triggers for assessment may be any change in status. For example, before people are sent to nursing homes it might be helpful to assess their status to decide if that step is necessary or whether some other form of care might be feasible and preferred.
In a research context, assessments are often made at baseline (i.e., before an intervention) and at specific follow-up times after the intervention. Some analyses focus on the status at follow-up, whereas others examine the change from baseline to follow-up. The decision about when to collect follow-up information depends on the expected clinical course. The follow-up times should be designed to capture meaningful changes in status and should correspond to when most people can be expected to have reached that transition (i.e., when at least half should have shown the benefits of treatment). A common timing error is to compare treatments by timing the follow-up from the end of the treatment (e.g., hospital discharge) instead of from its initiation. If part of the treatment difference is its duration, this effect will be obscured by such a strategy.
WHO SHOULD ASSESS?
Many people conduct assessments. Some are highly trained professionals, and others are lay persons with little formal training. In general, the more structured the assessment, the less specialized the assessor needs to be. In some cases professional training is necessary to make informed inferences, but in many instances professional training can interfere with assessment accuracy. Professionals may be inclined to deviate from the protocol, preferring their insights to the strictures of the standardized approach.
It is important to match the measure and the measurer. Some assessments are designed to be applied by highly trained individuals who understand the limitations of the tools and the conditions under which they should be applied. Some assessments require generous application of expert judgment. Others are intended for use by a wide range of assessors. Still
others are designed for self-administration. Older people themselves can use them, although some may need assistance with reading and interpretation.
Sometimes professionals from different disciplines want a role in the assessment process, but team assessments are very expensive. Often a better alternative is to achieve disciplinary representation through the creation of a questionnaire or protocol but to use only one or two professionals who are trained specifically in the use and administration of the instruments to collect the information. Such an approach can ensure that the richness of each discipline's perspective is represented without the cost of having each to be present at every encounter.
For clinical purposes, the assessment can be organized as a preliminary screen, where trigger questions can identify the need for more detailed assessments. The latter assessments can be done by more specialized personnel, with the results brought back to interdisciplinary teams for incorporation in a plan of care.
GENERIC AND CONDITION-SPECIFIC MEASURES
Outcome measures can be classified into two basic groups: generic and condition specific. Generic measures are designed to apply across populations, whereas condition-specific measures are used in a limited clinical context only. Each has its advantages and disadvantages, as summarized in Table 1.1.
Table 1.1. Relative Strengths and Limitations of Generic and Condition-Specific Measures
Generic                                              Condition specific
Provides a summary measure (e.g., quality of life)   Usually based on signs and symptoms (but can address quality of life as well)
Address bottom line effects of a problem             Address a specific diagnosis or a specific condition
Possess floor/ceiling limitations                    Highlight specific effects of a problem
Enable comparisons across diagnoses                  Measure clinically meaningful changes
Generic measures address the bottom line effects of a treatment or the burden of an illness. They usually include such constructs as functioning, pain, psychological distress, and well-being. Often when they touch on multiple dimensions they are said to address quality of life. Generic measures provide both a measure of the ultimate effect and a basis for comparing across problems. In determining the cost effectiveness of alternative treatment strategies (i.e., whether to mount a campaign to address one problem or another), it is important to have a goal or outcome by which to measure the relative effectiveness of the different efforts.
The inclusive nature of generic measures also leads to their weakness. Because they are designed to cover a broad spectrum, they may not measure performance along all parts of that spectrum equally well. These limitations are referred to as floor and ceiling effects. For example, the SF-36 (Ware et al., 1993) may be useful in distinguishing sick from well persons, but it is less able to distinguish the sick from the very sick. Generic measures do not capture the upper or lower extremes of many problems well. They may address some domains better than others. They are less able to detect differences in specific areas than are measures designed for a specific problem.
Because they are designed to address a particular clinical problem, condition-specific measures are more apt to be able to detect clinically meaningful differences, but they are not useful to compare results across problems. Condition-specific measures are designed to detect the special effects of a given problem on the individual. For example, in the case of diabetes, the outcomes would address the specific complications of that disease (e.g., vascular problems, poor eyesight, and sores). Although one is accustomed to thinking of quality of life measures as generic, it is also feasible to create condition-specific quality of life measures that refer directly to the complications associated with a given condition (e.g., difficulties associated with managing a diabetic regimen).
At the same time, one might argue that condition-specific measures may be too sensitive. For example, finding a difference in the range of motion of a joint may be trivial if it is unrelated to any functional effect. Because generic and condition-specific measures complement each other, they are often best used in combination. Each can provide an aspect that the other does not address. The challenge lies in using them efficiently, avoiding duplication wherever possible.
INFORMATION SOURCES
Measurements can be made in a variety of ways, relying on different types of information. The most widely used approach is probably the questionnaire (and interview schedule), where older people are asked about their situation. For many types of information, especially when the attitudes, feelings, or experiences of the older person are important, this may be the only way to obtain such information. Interviews and questionnaires, however, present problems. They have to be used in real time; that is, they must be part of the prospective design. (It is possible to ask clients about their past status or their use of services and exposure to potential threats, but such recollections may be biased by subsequent events and recall is subject to error. On the other hand, almost all clinical histories rely on just such recall.)
Because of cognitive limitations, some clients' responses are not trustworthy and other clients are unable to respond to questions. In these instances a proxy respondent may be used or another method of data collection employed. Although the issues surrounding the use of proxies and selection of appropriate proxies are discussed in greater detail in Chapter 17, a few comments are pertinent here. Proxies should be used only when there is convincing evidence that the respondent's reports are unreliable. Some people with cognitive deficits can still tell us a great deal about what they like and do not like, for example. The use of fixed strategies to determine when a person is too confused to be a reliable respondent has scientific appeal, but, because cognitive loss can be very spotty, it may be better to attempt to obtain a response from each respondent before or in addition to resorting to a proxy.
When proxies are used, it is vital that they have had enough recent exposure to the client to be able to offer a meaningful report. It is also important to recognize that proxies may have a stake in the answers. If the information is going to be used to determine their success as caregivers, for example, their answers may be biased. It is thus critical to identify this potential source of bias and interpret the reports in
that context.
Some information cannot be legitimately inferred from behaviors, and hence proxies cannot report it validly. Proxies derive their information from two sources: what they observe and what they hear. In most cases proxies rely on observations, but in some instances (e.g., when the primary respondent is not available) the proxy may be reporting on what they have heard previously from various sources (e.g., the older person, caregivers, or professionals). Some information is thus hard to derive from proxy reports. For example, although one may infer some extreme emotional states from behavior and facial expression, it would be unlikely that another person can really be sure someone is depressed or even in pain.
An alternative to using self-report or proxies is observation.
[...] structured format for observing. This format dictates both how often to observe and what to observe. Structuring the observations in advance can make them more consistent, but the structure may limit the observations; the observer may force the observation to fit into the imposed structure.
The next level of structure is arranged or demonstrated performance, where the client is instructed in what to do. Usually these performances are done under structured conditions to improve comparability. In essence, these measures usually reflect the client's capability, whereas less structured observations or self-reports reflect actual performance. The differences in performances in structured and unstructured conditions may be due to the environment (e.g., the rules that govern activities or the incentives the staff or the client perceives). The simulated approach is designed to eliminate the effects of different environments. In so doing, however, it is limited to assessing only what a person can do under some arbitrary conditions; it says little about real-world functioning. In some settings, such as nursing homes, clients may not be allowed to do certain tasks themselves (e.g., bathing). Demonstrated performance is a standard tool of cognitive psychologists and others working in human behavior laboratories. When used for research purposes, these demonstrations have the advantage of standardization but the disadvantage of artificiality.
A common source of information in many studies and a basic component of clinical practice is the clinical record. As an information source, it is only as good as the structure and commitment to detail that went into it. Its greatest liability is the variation in recording from one person to another. Moreover, some recordings are difficult to interpret. For example, what does it mean when a clinician records "normal" or "unremarkable"? Without greater specifications about what precisely was examined and in what detail and about what constitutes the basis for calling something "normal," these conclusions are hard to compare from one client to another or one clinician to another.
CHARACTERISTICS OF MEASURES
Scales versus Judgments
Information about clients can be collected in different ways. Questions can be posed consistently and combined into scales. Clinical judgments can also be systematically collected. Some clinical judgments are more systematic than others. Diagnoses, for example, represent summary judgments presumably based on overt criteria; but the level of systematic assessment that goes into diagnoses varies greatly. At one extreme, psychiatry has attempted to compile a common explicit set of criteria for psychiatric diagnoses, the Diagnostic and Statistical Manual, a volume that lays out specific bases for each diagnosis. However, clinicians making a psychiatric diagnosis do not enter the specific elements into the clinical record. Rather, they enter the final conclusion.
Efforts to make medical records more systematic often result in a mixture of summarized clinical judgments and specific data elements. For example, one might ask about heart abnormalities (or even about cardiac rhythm) in an effort to collect more consistent information. Unless the questions are very specific (e.g., rate, regularity), however, much is still left to clinical judgment. There is an important role for systematic judgments, but they require (or assume) a high level of professional competence.
It may be informative to contrast the information obtained by asking older people
about specific symptoms and asking clinicians about the presence of the conditions these symptoms represent. Such analyses can shed light on the thoroughness of clinicians and their ability to recognize pathology.
Another variant for structured clinical judgment is the rating score. Clinicians are asked to rate an aspect of a client on some numerical basis. In some cases, specific terms are applied to each level; in other instances the clinician is simply asked to use a Likert-like range (e.g., 1-5) to indicate severity. (For an example of such a scoring task, see the discussion of the Alzheimer's Disease Assessment Scale in Chapter 4.)
Clinicians may resist any systematic efforts to collect information, viewing them as an infringement on their professional prerogatives. They are likely to object more to detailed forms that resemble questionnaires in their specifics. Partly for this reason, clinicians often make poor interviewers. They feel obliged to go beyond the questions posed to add their own interpretations and insights, thus rendering the responses less reliable. In some instances, clinicians may adopt a protective role with the interviewee, wishing to spare him or her the anticipated embarrassment of some questions. In extreme cases they may simply assume what the responses would have been and complete the form without ever asking the questions. Such behavior is dangerous, but difficult to detect.
Properties
A few basic concepts are usually employed to assess the usefulness of measures. These primarily address whether the measure is indeed measuring what it purports to measure (validity) and whether it yields consistent answers (reliability). Measures that are used to detect differences in the outcomes of a treatment may also be held to another standard: Are they responsive to meaningful changes in the intervention? Are they capable of detecting treatment effects when those effects are present?
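One common way to put a number on whether a measure picks up change, offered here as an illustration rather than as the chapter's method, is the standardized response mean: the mean change between two assessments divided by the standard deviation of those changes. The scores below are made up and the function name is hypothetical.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of change for paired scores."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    spread = stdev(changes)  # sample SD; needs at least two pairs
    if spread == 0:
        raise ValueError("every client changed by the same amount; ratio is undefined")
    return mean(changes) / spread


if __name__ == "__main__":
    # Hypothetical scores for four clients before and after treatment.
    baseline = [10, 12, 14, 16]
    follow_up = [12, 15, 15, 18]
    print(round(standardized_response_mean(baseline, follow_up), 2))
```

A larger ratio means the average change stands out more clearly against client-to-client variation in change. It says nothing by itself about whether the change is clinically meaningful.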
Reliability
Three broad types of reliability are typically examined. For scales composed of several discrete components, a basic concern is the extent to which the components converge on the general measure. This is usually referred to as internal reliability or consistency. In effect, the question is whether having a certain item as a part of the scale adds to the scale's overall coherence. This trait is usually measured by some type of item-scale correlation, often expressed in the form of a Cronbach's alpha statistic. The goal is to achieve a high level of internal reliability but not total convergence, which would mean that the item adds nothing to the scale or could simply be used in lieu of it.
Another type of reliability describes the extent to which the same measures perform as well in different hands (i.e., do different observers agree on what they are seeing?). This is called inter-observer (or inter-rater) reliability. The same question can be asked about the consistency of any given rater. The same observer can be asked to observe the same thing again (within a period when it should not have changed or on videotape).
Variation in response over time can result from several factors: (1) real change, (2) different reports from the client, or (3) intra-observer or inter-observer variation. A test that captures the latter two components involves test and retest. Here the same respondent is asked the same questions over a period of time designed to be long enough to forget the questions and short enough so that no significant change in status has occurred. Such a test combines the effects of client and observer variations. For example, a person may respond to a series of questions differently on two successive occasions. If no apparent change in the person's status can account for the discrepancy, one must assume that either the questions were asked differently (e.g., different phrasing,
different emphasis, or different prompts) or the respondent was inconsistent in his or her answers. In either case, the questions are not reliable.
Much is made over the internal reliability of a measure, but it is important to appreciate that other forms of reliability are at least as important. For example, one might wax poetic over the reliability of a scalpel. It is made to perform within small tolerances and can cut sharply, but the ultimate reliability of its performance rests with the person using it. It will perform very differently in the hands of a skilled surgeon than in those of someone not trained to conduct surgery. One would hesitate to talk about the reliability of a hammer, knowing that carpentry skills are a major determinant of how well it is used. Likewise, while measures have a reliability of their own, they must be tested with those clinicians who will eventually use them. Reliability data from one experience has only limited transferability to another. In each case, it is necessary to establish the reliability of those who will use the measures in the given application. It is fairly safe to assume that a measure that was not previously reliable will not suddenly become reliable now, and likewise one cannot automatically assume that because the measure worked in one set of hands it will work in another.
Validity

Validity refers to the capacity of a measure to accurately reflect the attribute it was intended to measure. Validity is an important concept but is often hard to prove. Although reliability is a prerequisite for validity, it is not sufficient. In the easiest case, one has the capacity to measure this attribute already, albeit in a cumbersome fashion. The candidate measure is thus collected simultaneously with the standard, and the results are compared. The closer the relationship, the better the new measure. For many items, however, no such gold standard exists. Other stratagems must be used instead.

The most common test of validity is the so-called face validity. In essence, this means that the measures seem to be addressing the right items, and they make clinical or common sense "on the face of it." This test is usually necessary but rarely sufficient.

A more subtle approach to establishing validity is assessing the measure's ability to discriminate between two groups identified on the basis of their a priori likelihood to have the characteristic or not, for example, sick versus well. The more subtle the distinction between the groups, the more sensitive the measure. Another way to approach discriminant validity is to test the measure's level of agreement with other measures that are believed to tap related domains and the measure's lack of concordance with those hypothesized to address different domains.

Still another test of a measure's validity lies in its ability to predict a client's subsequent status. Logically argued predictions that link current measurement of a domain with future changes in relevant other domains can establish that the first domain measure is indeed tapping a relevant area correctly.

Responsiveness

A third basic criterion used to assess a measure is its responsiveness. In other words, can the measure detect real change when it occurs? In one sense, responsiveness is related to discriminant validity. The latter addresses the ability of a measure to discriminate between two groups that are believed to be different. The former addresses the ability to detect change over time. A measure may be able to discriminate but may not be useful for detecting change. For example, death is a good discriminator but may not be sensitive to real changes in conditions that are not fatal. Responsiveness (sometimes called sensitivity, although this term has another meaning in the context of screening, as
discussed later) is difficult to establish. It requires a strong clinical model. The only way to test a measure's responsiveness is to see if it detects changes when there is good reason to expect them. Therefore, it is usually assessed as part of a clinical trial, when there is every opportunity for endogenous reasoning. The failure to find responsiveness may lie in the impotence of the intervention. This attribute is closely linked to discriminant validity, but there are important differences. For example, death has strong discriminant features, but it may not be responsive to the effects of treating arthritis or even pneumonia.

Sensitivity and Specificity

Another set of criteria is used to evaluate screening measures. Here we seek sensitivity, specificity, and positive predictive accuracy. Sensitivity refers to a measure's ability to detect a problem when it is there. Specificity refers to the obverse, the ability to correctly state that a problem is absent. In the context of screening, these two attributes are assessed by comparing the screening or test results to a measure of truth, usually the result of a more comprehensive assessment or laboratory test. Table 1.2 illustrates the comparisons. Sensitivity is assessed as the proportion of true positives (a + b) the test correctly identifies, whereas specificity addresses the proportion of true negatives (c + d) identified.

Table 1.2. Assessing Sensitivity and Specificity*

                      Test result
True state     Positive    Negative    Total
Positive       a           b           a + b
Negative       c           d           c + d
Total          a + c       b + d

*Sensitivity: a/(a + b); specificity: d/(c + d).

In general, sensitivity is obtained at the cost of specificity. Because these definitions can be manipulated by changing the level of a measure that is said to represent a positive or negative state, moving the threshold will increase one at the expense of the other. The pattern of this relationship in response to changes in threshold is displayed as a receiver operating characteristic (ROC) curve, in which the sensitivity is graphed against (1 - specificity). The goal of such a measure is to maximize the area under the curve (Hanley & McNeil, 1982).

Sensitivity is emphasized when the consequences of missing a case are severe. For example, if a disease were highly contagious, it would be important to identify each case as quickly as possible. One would want to cast a wide net to identify possible cases and then rule out false positives. Likewise, if a serious disease with dire consequences were readily treatable if detected early, one would want a sensitive test. In contrast, if the cost of the subsequent evaluation were quite high and the condition less serious, one would aim for greater specificity and accept more false negatives.

Positive Predictive Value

An additional concept used alongside sensitivity and specificity in describing screening tools is the positive predictive value (PPV). In essence, this criterion asks how likely it is that a positive result from a screening test indicates the presence of real disease. Using the terminology from Table 1.2, PPV = a/(a + c). This information is very important in interpreting screening results and avoiding unnecessary alarm. The PPV is especially sensitive to the prevalence of the condition being screened for. The example in Table 1.3 illustrates the effect of prevalence.
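The definitions attached to Table 1.2 can be written out as a short sketch. The class and method names below are illustrative and do not come from the text; the cell labels a, b, c, and d follow Table 1.2, with rows giving the true state and columns the test result. The usage lines apply the definitions to the Condition A counts reported in Table 1.3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningTable:
    """A 2 x 2 screening table with cells labelled as in Table 1.2.

    Hypothetical helper for illustration only.
    """
    a: int  # true state positive, test positive
    b: int  # true state positive, test negative
    c: int  # true state negative, test positive
    d: int  # true state negative, test negative

    def sensitivity(self) -> float:
        # Share of truly positive persons that the test flags: a / (a + b)
        return self.a / (self.a + self.b)

    def specificity(self) -> float:
        # Share of truly negative persons that the test clears: d / (c + d)
        return self.d / (self.c + self.d)

    def ppv(self) -> float:
        # Share of positive test results that are truly positive: a / (a + c)
        return self.a / (self.a + self.c)


# Counts taken from Table 1.3, Condition A (prevalence 1 in 100).
condition_a = ScreeningTable(a=950, b=50, c=4950, d=94050)
print(f"sensitivity = {condition_a.sensitivity():.2f}")
print(f"specificity = {condition_a.specificity():.2f}")
print(f"PPV         = {condition_a.ppv():.2f}")
```

Note that sensitivity and specificity are computed along the rows of the table (within each true state), whereas PPV is computed down the first column (within the positive test results), which is why only PPV shifts when the mix of truly positive and truly negative persons changes.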
Table 1.3. The Effects of Prevalence on Positive Predictive Value (PPV)

                            Test result
True state       Positive    Negative    Total

Condition A (Prevalence 1/100)*
Positive         950         50          1,000
Negative         4,950       94,050      99,000
Total            5,900       94,100      100,000

Condition B (Prevalence 1/1000)†
Positive         95          5           100
Negative         4,995       94,905      99,900
Total            5,090       94,910      100,000

*PPV = 950/5,900 = 16%.
†PPV = 95/5,090 = 2%.
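The counts in Table 1.3 can be reproduced with a few lines of arithmetic. The sketch below assumes, as the chapter's example does, a test that is 95% sensitive and 95% specific applied to 100,000 people; the function names are illustrative and not part of the text.

```python
def expected_counts(n, prevalence, sensitivity, specificity):
    """Return the expected cells (a, b, c, d) of a screening table.

    Cell labels follow Table 1.2: a and b are truly positive persons
    (test positive, test negative); c and d are truly negative persons
    (test positive, test negative).
    """
    truly_positive = n * prevalence
    truly_negative = n - truly_positive
    a = truly_positive * sensitivity   # correctly flagged
    b = truly_positive - a             # missed cases
    d = truly_negative * specificity   # correctly cleared
    c = truly_negative - d             # false alarms
    return a, b, c, d


def ppv(n, prevalence, sensitivity, specificity):
    """Positive predictive value, a / (a + c)."""
    a, _, c, _ = expected_counts(n, prevalence, sensitivity, specificity)
    return a / (a + c)


for label, prevalence in (("A", 1 / 100), ("B", 1 / 1000)):
    a, b, c, d = expected_counts(100_000, prevalence, 0.95, 0.95)
    value = ppv(100_000, prevalence, 0.95, 0.95)
    print(f"Condition {label}: a={round(a)}, b={round(b)}, "
          f"c={round(c)}, d={round(d)}, PPV={value:.1%}")
```

Holding sensitivity and specificity fixed and varying only the prevalence argument shows the pattern in the table: the number of false alarms (c) stays near 5,000 in both conditions, while the number of true positives (a) falls tenfold, so the PPV drops from about 16% to about 2%.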
Let us assume that we have a test that is
95% sensitive and 95% specific. Table 1.3 shows the effects of two different prevalence rates. In both cases we will assume that we have screened 100,000 people. In condition A, the underlying prevalence is 1 in 100; in condition B, it is 1 in 1000. In both cases, despite what seem to be rather stringent criteria for sensitivity and specificity, the PPV is surprisingly low. A positive screening result under condition A is accurate only about 16% of the time and under condition B only about 2% of the time. The difference is attributable solely to the underlying prevalence of the problem. The greater the prevalence, the higher the PPV.

GENERAL ISSUES IN THE ASSESSMENT OF OLDER PERSONS

Establishing Communication

Common problems in interviewing older people involve difficulties with their hearing and vision. At the very least such problems may lead to poor performance. At worst they can lead to frustration for all parties, unnecessary use of proxies, and even misdiagnoses. A first rule of interviewing older people is to be sure that real communication has been established. Interviewers should ask questions even when they strongly believe they know the answers and wait for a response. They
should speak slowly and enunciate carefully, facing the respondent. The older person needs to be wearing a hearing aid if one has been prescribed. Speaking loudly may help, and sometimes even using a portable amplifier is a possibility when hearing is a problem. If such a device is not available, a written form of the interview may be used. Similarly, to deal with vision problems, response cards or reading tests must be printed, and the interviewer must ensure that the older respondent is wearing reading glasses if they are needed (and they usually are).

Time

In general, interviewing older subjects takes more time than usual. Not only is their response time longer, but some may have more difficulty focusing on the task. Lonely people may want to take advantage of the contact to talk about other things. Persons with minor cognitive deficits may need more prompts and reminders than unimpaired respondents. Interviewers need special training in learning how to accommodate some of this time delay but not get off track.

Fatigue

Older respondents may tire easily. Fatigue will not only worsen their performance, it may place too great a strain on them. In some cases it can lead to an incomplete interview. Interviewers need to be trained to recognize indications of fatigue and to offer to stop for a while or even to divide the session into multiple parts geared to the respondent's tolerance. Clients with multiple medical problems will likely be at greater risk of fatigue. More attention should be paid to their condition as the interview continues. Some medications may affect the clients' level of consciousness or attention span.

Embarrassment

Some older people may become upset when they cannot perform certain physical and cognitive tests. Interviewers need to be instructed in how to avoid and to cope with these reactions. It is not uncommon, for example, for older respondents to become upset when they cannot complete a simple cognitive test. Interviewers need to know how to provide reassurance that not everyone is expected to get each question correct or to complete all the tasks.

Likewise, some clients may not feel free to give honest answers. They may be embarrassed to admit certain problems and simply offer socially acceptable responses. They may be afraid that reporting certain things will have adverse consequences for them or others (e.g., in institutions). Some efforts may ameliorate these problems, such as assuring the clients of confidentiality (when such assurance is feasible) and emphasizing that many of the problems explored are common, but often the barriers will persist.

Oddly enough, however, interviewer embarrassment is an equally important factor. Even health professionals are uncomfortable asking people who resemble their parents or grandparents about continence or depression. When the interviewer is matter of fact and unruffled, the respondent is also more comfortable.

Test Batteries

Often we use a battery of tests. In addition to considerations of fatigue and duplication, the order in which the tests are used
may be important. In general, it is better to begin with easier and less threatening material and allow the testing to proceed on the basis of adequate performance. This is easiest to accomplish within a given domain, such as cognition, but it also applies to the order of the domains tested. Areas, like cognition, where failure is more feasible should be presented as late as possible. Some people use cognitive batteries as screeners to determine if a person is competent to answer questions in other areas (or needs a proxy), but cognitive performance may be spotty. Not only does such screening potentially eliminate some people who might nonetheless be able to express meaningful opinions on various topics, but the stress of the testing may also affect other areas. Likewise, a battery of questions about depression may cast a pall over the rest of the interview.

STAKEHOLDERS AND CONTEXT

Assessments are used for various purposes. The nature of the stakeholders and the context in which questions are asked can dramatically influence the answers. Assessments can be used both as a means of entry and as a barrier. Assessments are also imprecise instruments capable of substantial manipulation. The same assessment can thus be employed by different stakeholders for opposite purposes. The aegis of the assessment can dramatically affect its conduct and its interpretation. Consider, for example, eligibility assessments. The predisposition of the assessor who is a stakeholder can play a great role in determining the outcome. If some arbitrary point score is needed to render a client eligible for services, an assessor who wishes to see the client receive the services can ask questions in a certain way or adjust judgments to achieve the target score. It is not surprising that wherever the level of eligibility is set, scores seem to cluster around that threshold.

Clients and their families are also major stakeholders. Their responses may be different depending on the context. They too may be sensitive to issues of eligibility and exaggerate difficulties to become eligible for desired services or minimize problems to avoid undesired treatments.

It is possible to obtain one set of responses when the family brings the client for treatment and another when a case manager asks the same questions as part of a formal eligibility evaluation. Clients may respond differently to a person they see as offering service and to one who is seen as an outside regulator. If the care provider also controls the older person's living environment, the older person may be reluctant to voice problems for fear of retaliation. Older people may also become accustomed to their situation and lose the basis for determining what represents difficulty or dissatisfaction.

EFFICIENCY

It is also important to consider the costs of assessment. Assessments can be conducted in infinite detail, but the efficiency of such procedures must be judged in terms of the expected benefits. The measure of efficiency will depend on the context. In general, it is unwise to invest considerable energy uncovering more problems than can be addressed, either directly or indirectly. The latter refers to identifying external factors that may complicate care or that at least need to be addressed when the plan of care is developed. In some instances, however, extensive evaluations may be worth the expense. For example, if a treatment is expensive (or dangerous), one
wants to be sure that there is a sound basis for implementing it. Those who worry about the costs of long-term care may argue for more extensive evaluations first to identify potentially correctable problems whose repair may preclude the need for such extended care. Others may take a more programmatically protective attitude by establishing admission barriers designed to allow entry to only the most severely disabled.

One's view of the role of evaluation and assessment depends in part on how one views the allocation of long-term care services. Those who see such care as composed of specific aliquots may be more inclined to emphasize assessment as a basis for strict care planning. They advocate closely matching treatments to cases. Extensive evaluation is needed to allow the proper titration of services. On the other hand, those who view long-term care services as more generic may be content with more limited assessments that allow an estimate of the total amount of care needed.
REFERENCES

Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143, 29-36.

Stevens, R. (1971). Trends in medical specialization in the United States. Inquiry, 8, 9-19.

Ware, J. E., Jr., Snow, K. K., Kosinski, M., & Gandek, B. (1993). SF-36 health survey: Manual and interpretation guide. Boston: The Health Institute, New England Medical Center.