A Primer on Performance Evaluation of Public Sector Programs

By Muhammad Akram Khan
Former Deputy Auditor General of Pakistan
makram1000@gmail.com
Abstract

The main objective of the paper is to present the concept, approach, and methodology of performance evaluation of public sector programs in broad terms. The paper is addressed to general readers rather than evaluation professionals. The first section of the paper introduces the concept of evaluation. The second section discusses standards of performance evaluation. Sections three to six discuss evaluation planning, evaluation approaches, evaluation implementation, and evaluation reporting. The last part presents some issues and challenges relating to performance evaluation.
1. Introduction

a) What is performance evaluation?

Governments around the globe allocate development budgets to various programs and projects. Many people use the terms ‘program’ and ‘project’ interchangeably. However, in the context of the public sector, and for the purpose of this paper, we shall distinguish between the two. A project is a unique set of activities of limited duration, often intended to create assets at defined locations, with quantifiable outputs. Examples are the construction of roads, dams, and bridges, the installation of telephone exchanges and power houses, and the development of airports. A program is generally intended to create or strengthen institutions, build or enhance capacity, or arrange and deliver services, mostly on an ongoing basis at several places, with more intangible outputs and outcomes. Examples are a malaria eradication program, a fight against AIDS program, a primary education program, or a maternity health care program. A program can have several projects.

Performance evaluation of public sector programs is also termed ‘program evaluation’ or merely ‘evaluation’. However, we shall use the term ‘performance evaluation’ in this paper as it is more explicit in its connotation. There is no standard definition of performance evaluation of public sector programs. Most definitions focus on the broader framework of economy, efficiency and effectiveness. For the sake of illustration, we cite the following definitions:

“Program evaluation is clearly one of the critical tools available to assess program performance. Program evaluation (commonly referred to simply as ‘evaluation’) is the systematic assessment of the appropriateness, effectiveness and/or efficiency of a program, or part of a program. As such, it is of considerable value both to agency managers, external decision-makers and other stakeholders.” [Australian National Audit Office, 1997, xi]

“Program evaluation [is] a centrally recognized and formal function of management in order to assist in decision-making on planning and designing programs, determining their respective priorities, ascertaining and assessing the extent to which existing programs were achieving their objectives with the resources provided, and considering cost-effective alternatives to existing programs.” [Treasury Board Secretariat of Canada, 2005]
Every program consists of inputs, outputs, outcomes and impacts. Inputs are the financial, human and other physical resources used in a program; outputs are the immediate goods or services produced by the program. The outputs are usually inside the organization and the management has control over them. The outcomes of the program refer to its medium-term results and the changes that the program is expected to bring. These are usually outside the organization. Impacts refer to the long-term effects of the program on society as a whole or on certain segments of it. For example, a primary education program will have the financial and human resources provided in the budget as inputs. The number of students enrolled, the number of students who passed and the number of new schools established are outputs of the program. The outcomes of the program could be an increase in the number of young people going to high school or obtaining technical education after primary school. These outcomes can be measured only after a number of years. The impact could be a reduction in the poverty level among families with children having primary education over a period of, say, ten years. As we move from one tier to the next, measurement and the isolation of the program's effect from other factors become more difficult. Performance evaluation, consequently, becomes more technical and demanding as we aim to evaluate outcomes and impacts. The usual aim of performance evaluation is to evaluate the outcomes of the program. However, in some cases it becomes difficult to measure the outcomes due to time or resource constraints. In such cases, performance evaluation may focus either on the program outputs or on program processes. The objective, in such cases, is to determine whether the program is being delivered in the manner intended in the first place.

The main objective of the paper is to present the concept, approach, and methodology of performance evaluation of public sector programs in broad terms. The paper is addressed to general readers rather than evaluation professionals. The first section of the paper introduces the concept of evaluation. The second section discusses standards of performance evaluation. Sections three to six discuss evaluation planning, evaluation approaches, evaluation implementation, and evaluation reporting. The last part presents some issues and challenges relating to performance evaluation.

b) Need for performance evaluation

Why is there an enhanced emphasis on performance evaluation these days? Performance evaluation has emerged as a cornerstone of good governance. In fact, it is a response to mounting pressure on governments to demonstrate that they are managing public programs effectively. The pressure has come with greater respect for democratic values, demand for greater transparency in decision-making, increased demand for access to public sector information, an almost omnipresent media with 24/7 telecasts, greater expectations of citizens about the quality of services being delivered by governments, and demand for more consultation with stakeholders before policies are finalized. These social currents have become quite visible in recent times. It is possible that, due to fiscal discipline, resources are well managed but still not well spent. Awareness about how well resources are spent comes from performance evaluation of programs.
It means that not only should resources be well spent; there should also be evidence that they could not have been spent better, given the constraints of the government. Performance evaluation provides the stakeholders with this assurance by presenting essential information on the effectiveness of public programs. More specifically:

(i) Performance evaluation provides information on the effectiveness of public programs. It informs the stakeholders about the actual outcomes and impacts of the programs as compared to what was planned or what could be achieved with these resources.

(ii) Performance evaluation informs the management about the continued relevance of a program. Sometimes programs are well conceived and beneficial, but over time they become irrelevant or require major adjustments due to the emerging global environment, changes in policy or developments in technology. Performance evaluation shows the way forward.

(iii) Performance evaluation points out the extent of economy and efficiency in achieving program results and shows whether cheaper or more efficient options are available.

(iv) Performance evaluation is a basis for holding public managers accountable for their performance. It shows stakeholders such as the government, donors or users of services how well the public managers have delivered on the program.

(v) Some governments, such as those of the US, UK and Canada, have started using performance evaluation of ongoing programs to reflect on major policy options and decide the direction of future budget allocations. Evaluation is thus not merely a tool for measuring the performance of public managers and holding them accountable. However, the initiative is not yet a 'going concern', because a full-scale application of the concept would require evaluation of all public sector programs, which is a costly business in its own right.
Needless to say, performance evaluation can play a useful role only if it is embedded in the management of programs as a core function. It means the results of evaluation should be fed back into the planning and implementation of future programs. If public managers consider performance evaluation an 'add-on' activity, they may grudgingly 'bear with' it but would let the evaluation reports gather dust on shelves. Performance evaluation examines the effectiveness of programs. The program managers have a 'vested' interest in projecting their performance in bright colors, either by suppressing planned targets initially or by side-tracking any adverse evidence in achieving them. Besides, programs often have cross-cutting linkages with other entities, and even with the whole of government. It would, therefore, be reasonable to say that performance evaluation should be an independent function conducted by a central agency staffed by professionally competent persons. The managers can have some flexibility in choosing the timing and coverage of the programs. But an effective performance evaluation cannot be a function 'internal' to the program management. Like all initiatives, the success of performance evaluation depends upon support at the top, at the ministerial level. The program managers should have a mandatory responsibility for providing all information and extending full cooperation to the evaluators. In the absence of such a regime, even the best designed evaluations can become a failed enterprise.

c) Objectives of performance evaluation

Performance evaluation aims at the following objectives:

(i) To provide better information for improving program performance, identifying policies that work and those that do not work

(ii) To assist public sector entities in setting priorities and decision-making

(iii) To strengthen the process of accountability of public managers
d) Distinction between performance evaluation and performance auditing

A cursory look at the scope, objectives and role of performance evaluation would suggest that it is, perhaps, a second name for performance auditing. Both operate within a broader framework of economy, efficiency and effectiveness. Both have similar objectives of improving public sector management and holding public managers accountable. Both require competence and independence of the professionals who undertake these enterprises. Both recommend that their outcome should be the basis for future decision-making. The question arises: what is the difference between the two? The answer has the following dimensions:

(i) Scope: The scope of performance evaluation has a stronger focus on effectiveness of policy, while performance auditing focuses mainly on administration of the programs. Policy is laid down by the elected representatives and implementation is done by government employees. Performance auditors restrict themselves to the review of implementation of the policy and do not question the policy itself. However, performance evaluation can question the policy as well. (McPhee, 2006, 17)

(ii) Independence: Performance audits are always conducted independently by Supreme Audit Institutions or other auditors who are independent of the executive. Performance evaluation may not be independent of the program management in all cases. (McPhee, 2006, 17)

(iii) Reporting mechanism: Performance audit reports in the government are placed before the parliament or the governing boards of public enterprises. The reports of performance evaluation are usually submitted to the minister-in-charge or the chief executive of the enterprise. (McPhee, 2006, 17)

(iv) Criteria: There are differences in the criteria that performance auditing uses as compared to performance evaluation. Performance auditing uses good management practices as the basic criteria for auditing performance and making recommendations. Performance evaluation uses technical and operational standards as the basic criteria for evaluation and making recommendations.

Some examples would clarify the above. Suppose a government has a mine action program which aims at clearing the landscape of mines to make a certain area safe for the public. Suppose both performance auditing and performance evaluation have chosen to examine the planning of the program. The performance auditors would ask some of these questions:

• Does the management have an operational plan?
• Has the plan been prepared in consultation with those who have to implement it?
• Does the plan contain quantified targets with time schedules?
• Has the management taken the senior and junior levels on board with respect to this plan?
• Has the management involved other stakeholders in preparing the plan? And so on.

In the case of performance evaluation, the evaluators would ask such questions:
• Are the planned targets technically realistic, in light of the resources and known technology?
• Did the management consider the landscape, topography and weather conditions while preparing the plan?
• Are the targets adequate for the available resources? Or could more be achieved with these resources?
• Is the management relying on an appropriate technology for mine action?

We can see that in both cases the planning function of the program is being reviewed, but the auditors and evaluators are asking different questions. The auditors need to have skills in management while the evaluators need to be skilful in mine action technology.

Let us take another example. In a primary education program, while focussing on implementation of the program, the performance auditors would ask such questions:

• How many students have been enrolled?
• What is the cost per student who passed?
• What is the cost of repeaters?
• How many schools have been set up?
• What has been the average time taken to set up a school? And so on.

On the same aspect, the performance evaluators would ask such questions:

• Is the teacher-student ratio right, considering the age and level of the students?
• Is the curriculum being taught appropriate, given the social background of the students?
• Is the quality of the teaching materials and textbooks appropriate, in light of the program objectives?
• Do the teachers have appropriate qualifications for the level of education they are expected to impart? And so on.

We can see that both auditors and evaluators are focussing on implementation of the program but are asking different questions. In sum, performance auditing emphasizes management of the programs in general terms, while performance evaluation looks at technical and operational areas about which auditors do not generally have appropriate knowledge and skills. However, performance audit and performance evaluation are complementary to each other. They do overlap at certain points. But as disciplines, they require different types of expertise and serve distinct sets of purposes.

e) Guiding principles for evaluators

Performance evaluation of public sector programs is only one sub-set of a wide arena of operations for evaluators. The evaluators may be required to evaluate products, personnel, policy, proposals, technology, research, theory and even evaluation itself. They may come across people with different cultural or social backgrounds. There could be diverse legal backgrounds and governance structures in the organizations being evaluated. Laying down guiding principles for all situations is a challenging task. There is no universally accepted set of guiding principles for evaluators. However, the American Evaluation Association published a set of such principles in 1994 and revised it in 2004 (available at: www.eval.org). We shall give below an abstract of these guiding principles to demonstrate the type of expectations that one can have from professional evaluators. These are quite general principles and can be adopted by performance evaluators in most settings. (AEA, 2004, 2)

(i) The evaluators should work in a systematic manner. Their work should adhere to the highest technical standards. They should report strengths as well as shortcomings. They should communicate the scope, approach and methodology of the evaluation to the clients before undertaking the evaluation.

(ii) The team of evaluators should collectively possess the qualifications, competence, and skills appropriate to the evaluation in question.

(iii) The evaluators should display the highest standards of honesty and integrity. They should disclose any real or potential conflict of interest and any changes in the original negotiated plan of work. They should represent their procedures accurately, resolve any concerns of the clients and disclose all sources of financial support.

(iv) The evaluators should respect the security, dignity, and self-worth of respondents, program participants, clients and other stakeholders.

(v) The evaluators should articulate and take into account the public interest beyond the immediate operations and outcomes, present results in an understandable form and, within the framework of confidentiality, allow stakeholders access to the evaluation information.
2. Performance evaluation standards

There are no universally accepted standards for performance evaluation as we find in the case of accounting and auditing, where a number of national and international standard-setting bodies are active. There is now a well-developed procedure for developing standards in these disciplines. This is not so in the case of performance evaluation. Different organizations involved in the work of performance evaluation have prepared standards for their own practice. However, as yet, there is no internationally recognized body engaged in the work of standard setting in the field of performance evaluation. The United Nations Evaluation Group (UNEG) is a body that consists of professional practitioners. The Group undertook to develop a set of standards for the UN system in response to a General Assembly resolution of 2004. The standards issued by UNEG consist of three parts: (i) establishment of the institutional framework, (ii) management of the evaluation function, and (iii) conduct and use of evaluations. They also contain a reference for the competencies and work ethics of evaluation practitioners. Of these, part (i) is more UN-specific, so we shall not cover it in this paper, which is intended for a broader readership. Part (ii) we have covered above while discussing the guiding principles for evaluators. We shall, therefore, summarize below part (iii) of the UNEG standards relating to the conduct and use of evaluation. (UNEG, 2005, 10-17)

a) Standards relating to design of evaluation

(i) The evaluation should be designed to ensure timely, valid and reliable information that will be relevant for the subject being assessed.

(ii) The terms of reference of the evaluation should have the following elements:
• Subject of evaluation
• Context, purpose and specific justification for the evaluation
• Criteria for evaluation
• Key questions for evaluation
• Methodology to be adopted
• Work plan, including financial and time budgets, time schedules, and places to be visited, etc.
• Target dates for reporting and use of the evaluation report

(iii) Evaluation objectives should be realistic and achievable in light of the information that can be collected and the context of the undertaking. The objectives of the evaluation should flow from its purpose. The scope should determine the boundaries, time period, subject and geographical areas to be covered. It should also mention the extent of involvement of various stakeholders.

(iv) The design should clearly spell out the criteria against which the evaluation shall be carried out. The most commonly applied criteria are as follows:
• relevance
• efficiency
• effectiveness
• impact
• value-for-money
• client satisfaction
• sustainability
The evaluation can have additional criteria depending on the nature of the organization and the purpose and scope of the evaluation. However, not all criteria are relevant in each case. The evaluators should exercise their judgement in defining the criteria for each evaluation.

(v) The methodology should be sufficiently rigorous for assessing the subject of the evaluation and should lead to a complete, fair and unbiased assessment. The methodology for data collection and analysis should be appropriate with respect to the objectives of the evaluation. Any limitations and constraints should be recognized in the terms of reference of the evaluation.

(vi) An evaluation should assess cost-effectiveness to the extent feasible. It asks whether the program management adopted the least-cost approach, given the objectives of the program and the constraints of the market. Depending upon the objectives of the evaluation, the costs may include economic costs, environmental costs and human and social costs. In each case, the evaluators will need to make a judgement about the types of costs to be included in the analysis. The evaluation should try to assess if there was room for saving costs, or whether more could be achieved at the same cost. It should, where possible, compare the benefits achieved and see if the cost incurred justified the benefits. It should also see if the benefits accrued were sustainable over the long run. Where analysis of the cost is not feasible for any reason, the evaluators should state the reasons explicitly. A minimal numerical sketch of such a comparison follows.
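The sketch below illustrates, in Python, one simple way of comparing alternatives on a cost-effectiveness basis. The program options, the cost figures and the outcome measure ('children vaccinated') are hypothetical and serve only to show the arithmetic; a real evaluation would define costs and outcomes according to its own terms of reference.

```python
# Hypothetical illustration: comparing the cost-effectiveness of program options.
# Each option lists its total cost and the outcome units it delivered
# (here, children vaccinated); all figures are invented for the example.

options = {
    "mobile clinics":   {"cost": 1_200_000, "outcome_units": 48_000},
    "fixed facilities": {"cost": 1_500_000, "outcome_units": 75_000},
    "outreach camps":   {"cost":   900_000, "outcome_units": 30_000},
}

def cost_per_unit(cost: float, units: int) -> float:
    """Cost-effectiveness ratio: cost incurred per unit of outcome achieved."""
    return cost / units

for name, figures in options.items():
    ratio = cost_per_unit(figures["cost"], figures["outcome_units"])
    print(f"{name}: {ratio:.2f} per child vaccinated")

# The option with the lowest cost per unit of outcome is the least-cost
# approach for this (single) outcome measure.
best = min(options, key=lambda n: cost_per_unit(options[n]["cost"],
                                                options[n]["outcome_units"]))
print("Least-cost option on this measure:", best)
```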
b) Process of evaluation

(i) The relationship between the evaluators and the commissioners of an evaluation must, from the outset, be characterized by mutual respect and trust. The responsibilities of the parties should be documented and there should be no confusion about such matters as financing, time-frame, persons, procedures, methodology, and the contents of the reports to be produced. Evaluators should consult the commissioners of the evaluation on matters such as confidentiality, privacy, communication, and ownership of findings and reports.

(ii) The evaluators should consult the stakeholders in the planning, design, conduct and follow-up of evaluations. They should share planning decisions relating to key questions, criteria, methodologies, data collection, and expected levels of cooperation with the clients at an early stage. This helps both sides: the evaluators get better cooperation from the clients, and the clients better understand the role and area of operation of the evaluators.

c) Selection of team

Evaluations should be conducted by well-qualified teams. The teams should collectively possess the competence and skills to conduct the evaluation.

d) Implementation

(i) Evaluations should be conducted in a professional and ethical manner. The evaluators should adopt a participatory approach with the clients' staff. To put the evaluation in proper perspective and bring the clients on board, the evaluators should discuss with them the assumptions, values, methods and concepts in an open and unbiased manner.

(ii) The evaluators should show due respect for cultural diversity, human dignity, human rights and the confidentiality of information.

(iii) The evaluators should adopt an objective and balanced approach in their work and address and analyze different perspectives and points of view. They should try to understand the position of the clients on various issues and record it truthfully. The evaluators should document all information collected and should be able to demonstrate their objectivity and non-involvement with any particular point of view.

(iv) The evaluators should substantiate their key findings with reliable evidence. They should discuss their assessment with the clients and understand their perspective as objectively as possible. They should report any unresolved difference of opinion with the clients in their report and inform the clients of their intention of doing so.

e) Reporting

The final evaluation report should be logically structured, containing evidence-based findings and recommendations. The report should be free of irrelevant information. The presentation of the report should be accessible and comprehensible.

f) Follow-up

Evaluation requires an explicit response from the governing authorities and management to whom its recommendations are addressed.

3. Planning for performance evaluation: preliminary assessment

a) Notification of performance evaluation and terms of reference
Performance evaluation commences with the notification of the intention to carry out the evaluation. The notification mentions the authority under which the evaluation is being undertaken. At the same time it incorporates the terms of reference for the evaluation. The terms of reference state the purpose and objectives of the evaluation in broad and general terms, specify the boundaries of the evaluation, and clarify the respective roles and responsibilities of the evaluators and the clients. The terms of reference promise a more detailed outline of the objectives, scope, criteria, methodology and reporting procedure at a later date, once the evaluators are ready with their work plan and field work. The terms of reference are usually a routine document, and most evaluation bodies use a sort of template to notify the evaluation.

b) Understanding the program and its environment

The first step in planning for performance evaluation is to understand the program and its environment. The evaluators need to find answers to the following basic questions: (TBS, 1984, chapter 2)

(i) What is the legal mandate of the program? How has it been conceived and developed? In other words, why was the program needed in the first place?

(ii) What are the main assumptions of the program? How are its various components linked, and how are they supposed to produce the intended results? In other words, what is the program logic model? (A minimal sketch of a logic model follows this list.)

(iii) What is the organizational structure of the entity implementing the program?

(iv) What are the main objectives of the program? What is it supposed to achieve?

(v) How is the program operating? Where is it operating? What are its main activities?

(vi) What financial, physical and human resources are deployed? What is the development and recurrent expenditure on the program? Has the program built or acquired any capital assets?

(vii) Who are the beneficiaries of the program and where are they located? What is the number of beneficiaries and what is their profile in broad and general terms?

(viii) What outputs, outcomes and impacts is the program expected to produce?

(ix) What are the environmental implications of the program?

(x) What are the constraints under which the program is operating? They could be financial, physical, human, social, cultural, political, and even global.
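As a minimal illustration of a program logic model, the sketch below lays out the primary education example from section 1 as a simple data structure in Python. The entries are hypothetical; the point is only to show how inputs, outputs, outcomes and impacts can be listed tier by tier so that evaluation questions can be mapped to each tier.

```python
# Hypothetical logic model for the primary education example in section 1.
# Each tier lists the elements the evaluators would expect to find documented.

logic_model = {
    "inputs": [
        "budgeted financial resources",
        "teachers and administrative staff",
        "school buildings and teaching materials",
    ],
    "outputs": [
        "number of students enrolled",
        "number of students who passed",
        "number of new schools established",
    ],
    "outcomes": [
        "more graduates proceeding to high school",
        "more graduates obtaining technical education",
    ],
    "impacts": [
        "reduction in poverty among families with primary-educated children",
    ],
}

# A simple check the evaluators might run while confirming their understanding:
# every tier of the logic model should be populated before field work begins.
for tier, elements in logic_model.items():
    assert elements, f"logic model tier '{tier}' has no documented elements"
    print(f"{tier}: {len(elements)} element(s)")
```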
The evaluators try to confirm their understanding of the program with the management in interviews, so that any misunderstanding is dispelled at an early stage.

c) Identification of evaluation report users and purpose of performance evaluation

The field work of evaluation depends to a considerable extent on the purpose of the evaluation and who the users of the report will be. They could be diverse. However, each evaluation is conducted from the perspective of its major potential users. It could be commissioned by the legislature, which has broader public interest questions in view. It could be requested by the ministry of finance, which is more interested in the allocation of funds. Or it could be an evaluation on behalf of the program management, who are primarily concerned about the output indicators with a view to taking corrective action at an early stage. Depending upon who the primary users of the report are, the evaluators determine the type of information to be collected and the degree of precision to be observed.
d) Key evaluation questions

Familiarization with the program objectives and environment, the purpose of the evaluation and its prospective users leads the evaluators to define the key questions. These questions are of two types: generic and specific. The generic questions deal with such broader areas as the objectives and design of the program, the achievement of the program outputs, outcomes and impacts, the unintended negative effects of the program, and the economy and efficiency in the acquisition and use of resources. The specific questions are the concerns shown by various potential users of the evaluation, such as legislators, senior public managers, the news media, beneficiaries of the program and the ministry of finance, etc. The evaluators try to integrate the two types of questions in their work plan. However, before finalizing the list of key questions, the evaluators take into consideration the priorities of the major potential users of the evaluation report or of the authority that commissioned the evaluation.

e) Evaluation criteria

Criteria for performance evaluation consist of two categories: (a) general and (b) specific. The general criteria have the following sub-criteria: (i) effectiveness, (ii) efficiency, (iii) economy, (iv) sustainability and (v) institutional development.

(i) Effectiveness refers to the extent to which the program met its expected outcomes and impacts. The program outcomes and impacts could be quite general and broad-based: for example, the degree to which a program led to a reduction in crime, enhanced the average life expectancy of new-born children, reduced the number of deaths due to AIDS or reduced poverty, etc.

(ii) Efficiency refers to the relationship of inputs and outputs measured against some benchmark. Performance evaluation tries to assess if the outputs produced with given inputs matched the efficiency criteria: for example, the cost per unit of electricity produced as compared to the cost per unit in the private sector or at some other power house. There are no hard and fast efficiency criteria available for all sorts of programs. Generally, the evaluators would look for efficiency criteria in the technical specifications of the various resources used, the outputs of similar organizations or programs in the private sector, the past performance of the same organization, or expert observers' ratings where public services such as waste disposal or street cleaning are being delivered.

(iii) Economy refers to the principle of least cost in acquiring resources, keeping in view the quality, quantity, time, location and user needs. Economy usually requires acquiring resources through open and fair competition, taking advantage of bulk purchases, or entering into systems contracts for repeat purchases at given prices and terms.

(iv) Sustainability refers to the ability of a program to continue in the future. It measures the extent to which the program objectives have been (or are expected to be) achieved without using more resources than necessary. It is a multi-faceted concept and involves several factors such as the availability of financial and human resources, political will, institutional ownership of the program, absence of major policy reversals, continued support from key support groups, resilience against changes in socio-political conditions, etc. It is obvious that all these factors require considered judgment. Not all of them would be applicable in each case.

(v) Institutional development refers to the contribution of a program in creating new or strengthening existing institutions, since most programs require strong institutions to be sustained in the future. It means the extent to which the program improves the ability of a country to make better use of its resources. The criteria for institutional development involve the codification of laws, regulations, rules and procedures, the creation of coordinated structures for the flow of information, unambiguous rules for the use and delegation of authority, a robust system of internal controls, and a system of oversight and risk management. It is obvious that all these require a considerable amount of judgment at the time of evaluation planning.

The specific criteria deal with the technical aspects of a program. For example, programs for health, education, law and justice, democratization, child and women welfare, mine action, or disarmament would each have different sets of evaluation criteria determined in light of the technical details of these programs. This would require technical knowledge of these fields. For example, in the performance evaluation of a maternity health program, the evaluators would lay down criteria for such factors as the frequency of check-ups for pregnant women, the use of certain necessary drugs during pregnancy, the types of tests and the follow-up action on these tests, the hygienic state of labor and delivery rooms, etc. It is at this very juncture that performance evaluation takes a distinct route from performance auditing. Performance auditing is more of an application of generally accepted management practices, while evaluation delves into more technical aspects of the program.

f) Evaluation design: Key questions and constraints

Evaluation design refers to the method the evaluators will adopt in pursuing their objectives. A carefully designed evaluation helps in deciding the information required and the method for gathering and analyzing it. It helps to reduce ad hoc decision-making, cuts down the cost, and reduces the time and effort by focussing only on relevant information. Besides, the quality of the evaluation is enhanced as the evaluators are able to think deeply about the whole process of evaluation, concentrating on their objectives rather than on anything that comes their way. There are three types of questions that an evaluation can pursue: (i) descriptive questions, for example, the number of widows who received a cash sustenance allowance during a year in a certain region; (ii) normative questions, that is, the conditions prevailing as compared to the criteria, for example, the number of children vaccinated as compared to the target; (iii) impact questions, that is, the cause and effect in an observed condition, for example, the extent to which an increase in the average weight of new-born babies can be attributed to a nutrition program for pregnant women. The design of the evaluation would depend, to a large extent, on the type of questions to be answered by the evaluators. The design would focus on the following elements:
(i) Kind of information to be acquired

(ii) Sources of information (for example, types of respondents)

(iii) Methods to be used for sampling sources (for example, random sampling)

(iv) Methods of collecting information (for example, structured interviews and self-administered questionnaires)

(v) Timing and frequency of information collection

(vi) Basis for comparing outcomes with and without the program (for impact or cause-and-effect questions)

(vii) Analysis plan

The evaluators have to decide on these questions at the design stage. This would also determine the time and cost of the evaluation, and it would influence the quality of the evaluation. It may be more interesting to raise some questions, but it may not be feasible to focus on them because of the time and cost involved in collecting and analyzing the information. The evaluation design would depend upon these constraints as well. Another important factor to consider is the degree of precision and conclusiveness expected from the evaluation. If the evaluation intends to find descriptive information, such as the number and type of beneficiaries of a schools aid program for handicapped children, it may be a simple design tabulating the descriptive information. If, however, the aim of the evaluation is to correlate cause and effect of a certain phenomenon, a more detailed design would be required.

Since evaluation is an expensive endeavour, it is sometimes appropriate to think of a pilot study before a full-scale evaluation is undertaken. The reason for such a decision could lie in the risk that the preliminary understanding of the program may not be accurate, the data may not be available, some of the locations may not be accessible, or the assumptions of the evaluation may not be realistic. A pilot study can help save time, cost and effort if these risks prove to be real and are expected to affect the process of evaluation significantly.

While deciding on the design of the evaluation, the evaluators consider their constraints as well. For example, time could pose a material constraint: the scope of the evaluation is defined keeping in view the time during which the evaluation is to be completed. Cost is another constraint: the scope and boundaries of the evaluation will be defined in light of the funds available for the purpose. The design of the evaluation also depends on the expertise and competence of the evaluation team. If, for example, the evaluators do not have expertise in sophisticated econometric analysis or advanced statistical analysis, the design of the evaluation should not have these elements. The evaluation design can retain its sophistication, however, if the time and money budgets allow for engaging consultants to work with the in-house team. Lastly, the design of the evaluation also depends on locations and facilities. There could be places where the evaluators cannot go or from where data cannot be collected because of a lack of communication infrastructure, bad weather conditions, or a lack of peace or security. In such cases, the design would be modified to suit the feasible locations and facilities. Once the basic design of the evaluation is decided, it is good practice to have peer reviews for appropriateness, feasibility, cost, time, and the risks involved.

g) Time and cost budgets

The last element of the performance evaluation plan consists of time and cost budgets. Both budgets depend on the approach and methodology of the evaluation. The time budget works out the time required, in terms of person-days, for all members of the evaluation team. It also spells out the schedule of various key activities or milestones, such as collection of data, field visits, analysis of data, draft report, clearance of the draft report and final report. The cost budget includes the salary and non-salary costs of the staff and the fees of consultants or specialists, if required.

4. Evaluation approaches

Once the evaluation questions and their constraints have been defined, the planning process focuses on the approach to the evaluation. The approach refers to the method that would be adopted for seeking answers to the key evaluation questions. There are several approaches to evaluation.
Traditionally, evaluators have been using the following evaluation approaches:

• Literature reviews
• Sampling program managers or program beneficiaries
• Instrument design
• Interviews
• File or document reviews
• Individual case studies
• Comparative cost analysis

However, gradually more sophisticated approaches such as the following are being adopted in evaluation: (OED, 2004, 13-15; TBS, 2005; GAO, 1991, 28-58)

• Sample survey method
• Rapid appraisal methods
• Case study method
• Field experiment method
• Use of available data

In the following discussion we shall take up the second set of approaches mentioned above.

a) Sample survey method

Evaluations that adopt the sample survey method seek to draw conclusions on the basis of information about certain phenomena in a sample of the population. The objective is to arrive at generalizations about the incidence, distribution and interrelation of naturally occurring events and conditions in the population on the basis of information in the sample. For this purpose, the statistical technique of probability sampling is used. In probability sampling, each unit of the population has a known, nonzero probability of being selected in the sample by chance. In sample surveys, the techniques for collecting information are structured interviews or self-administered questionnaires. The interviews could be face-to-face or by phone. The questionnaires could be administered through mail, e-mail, or even during personal interviews. The questions are quite often close-ended, giving the respondents a set of options from which to choose answers; this makes the subsequent analysis of data simple. The sample units are often persons, although they could be organizations as well. An important question in sample surveys is the sample size. Statistical techniques exist for arriving at the most appropriate sample size, in light of the total population and the nature of the information to be collected. (A minimal sketch of one common sample-size calculation appears below, after the case study method.)

b) Rapid appraisal method

These are surveys of the beneficiaries of a program seeking their feedback on the extent of the benefits and costs of the program. These surveys provide low-cost information to decision-makers. However, the information often lacks precision and is of limited use outside the context in which it was collected.

c) Case study method

The evaluation approach may consist of studying a single program or multiple programs. The reason for adopting this approach is that the total population consists of either one or a few programs of the same type, and the objective of the evaluation is quite complex, deep or comprehensive, making a sample survey inapplicable. A single case study focuses on a single program. A multiple case study evaluation compares the data from different programs. An example of a single case study could be the evaluation of a program for increasing yield per hectare through trickle irrigation in a desert area. An example of a multiple case study evaluation could be a comparison of improvements in the performance of students in mathematics in different provinces after the introduction of new textbooks.
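Returning to the sample survey method in (a) above, the sketch below shows, in Python, the standard formula for the sample size needed to estimate a proportion, with a finite-population correction. The confidence level, margin of error and population size are hypothetical choices for illustration; they are not prescribed by any of the standards discussed in this paper.

```python
import math

def sample_size_for_proportion(population: int,
                               margin_of_error: float = 0.05,
                               z: float = 1.96,   # z-value for ~95% confidence
                               p: float = 0.5) -> int:
    """Sample size for estimating a proportion, with finite-population correction.

    Uses n0 = z^2 * p * (1 - p) / e^2, then adjusts for the population size.
    p = 0.5 is the most conservative assumption (largest sample).
    """
    n0 = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# Hypothetical example: 20,000 beneficiaries of a program in one district.
print(sample_size_for_proportion(population=20_000))                        # about 377
print(sample_size_for_proportion(population=20_000, margin_of_error=0.03))  # larger sample
```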
d) Field experiment method

The field experiment method aims at the evaluation of impact by focusing on the analysis of cause and effect. There are three main types of field experimental designs: (a) the with-or-without experiment design in randomly selected groups; (b) the nonequivalent randomized groups experiment design; and (c) the before-and-after experiment design.

(i) The with-or-without experiment design focuses on two groups. One group is exposed to the program; the other is not. However, before the program is launched, every member of the population has an equal chance of being selected for the program. For example, in a program for financial assistance to schools that reduce the percentage of repeaters, all schools have an equal chance of being selected. Once the program is launched, the schools that receive financial assistance under the program become the treatment group and those that do not participate become the comparison group. The results of both groups are compared to assess the outcome of the program.

(ii) The nonequivalent randomized groups experiment also uses the methodology of treatment and comparison groups. However, in this case the members of the groups are not equivalent, although all members of both groups have an equal chance of being selected. For example, suppose a program of financial assistance for higher education, intended to increase the income of the recipients, has to be evaluated. Those who accept the assistance could be of different age, background, gender, or domicile. These factors can influence their performance regardless of the program. In such cases, the difference in the performance of each group could be due to several factors besides the program under evaluation. The participants of the program are nonequivalent; however, they have an equal chance of being selected as they are selected on a random basis. The difference in results could be due to the program or due to some other factor, which requires isolating the effect of the program. Statistical techniques, such as analysis of covariance, are applied to isolate the effects of the program from other possible factors and make the data comparable.

(iii) The before-and-after experiment design analyzes the data at two different points in time: before and after the program. The data are collected from the same population on the same variables. It is assumed that the difference is due to the program. The results are adjusted for distortions by other factors which may have occurred over the period. One of the methods to increase the reliability of this technique is to collect data more than once after the program. The collection of data at repeated intervals can smooth out, to some extent, the effect of other factors.
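As a minimal numerical sketch of the comparisons described above, the Python fragment below computes the simple difference in mean outcomes between a treatment and a comparison group, a before-and-after difference, and the before-and-after change net of the comparison group (a 'difference-in-differences'). The scores are hypothetical, and a real impact evaluation would add the covariance or regression adjustments mentioned above.

```python
from statistics import mean

# Hypothetical outcome scores (e.g., average test results per school),
# measured before and after the program for both groups.
treatment_before = [52, 48, 55, 50, 47]
treatment_after  = [61, 58, 63, 59, 55]
comparison_before = [51, 49, 53, 50, 48]
comparison_after  = [54, 51, 55, 52, 50]

# With-or-without comparison: difference in mean outcomes after the program.
with_without = mean(treatment_after) - mean(comparison_after)

# Before-and-after comparison for the treatment group alone.
before_after = mean(treatment_after) - mean(treatment_before)

# Difference-in-differences: the treatment group's change net of the change
# observed in the comparison group over the same period.
treatment_change = mean(treatment_after) - mean(treatment_before)
comparison_change = mean(comparison_after) - mean(comparison_before)
did = treatment_change - comparison_change

print(f"with-or-without difference: {with_without:.1f}")
print(f"before-and-after difference: {before_after:.1f}")
print(f"difference-in-differences estimate: {did:.1f}")
```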
e) Use of available data

Using available data may make the evaluation design simple and economical. Sometimes the data have already been collected and published by other agencies: for example, population census data, crime data, literacy data, economic indicators, etc. are generally available in published or electronic form. The evaluators may decide to make use of these data to find answers to the evaluation questions. However, the use of available data has its own limitations. The data may not be current, may rest on certain implicit assumptions, or may have been collected for different objectives, distorting their use for the study in question. The evaluators have to decide at the design stage the extent of reliance on the available data.

5. Conducting a performance evaluation

a) Information gathering: the field work
Performance evaluation field work starts with information gathering. At the planning stage, the evaluators have already decided the key questions, the evaluation approach and the method of collecting the information. At this stage, they develop survey or interview questionnaires and proceed to administer them. The evaluators have to take care to ensure and maintain the quality of the information. Some of the usual techniques for this purpose are as follows: (TBS, 1984, chapter 3)

(i) Pilot testing of information: As already mentioned in section three of the paper, the purpose of pilot testing is to resolve any deficiencies or difficulties in the collection of information by testing the technique, usually a questionnaire, on a small sample. It makes the collection of information for the entire study cost-effective. The evaluators modify, edit, reformat, and revise the questionnaires after the pilot study.

(ii) Using more than one source of information: Information from a single source may not be complete or fair. To overcome these risks, the evaluators often prefer to collect information from multiple sources, if possible. For example, they may like to supplement or double-check the information in the files through interviews or surveys.

(iii) Reviewing the information while it is being collected: Sometimes sudden changes in circumstances take place that make the original assumptions about the data invalid. It is, therefore, good practice to monitor the data collection and revise the methodology as the work proceeds.

(iv) Editing the information: Raw information collected from the field and then transferred to the computer may contain errors. There are statistical and computer-based techniques available to check the data for errors. However, one hundred per cent accuracy may never be achieved due to time and cost constraints.

(v) Procedures for handling changes in the sample: During data collection the evaluation team may come across instances where selected sample items are not available or do not respond. Such situations can bias the original sample and influence the results of the evaluation. The evaluators have to devise compensatory measures to maintain the quality of the data.

b) Analysis of the information

After collecting the information, the evaluators proceed to analyze it. The analysis of information involves description of the following: (TBS, 1984, chapter 3)

(i) Size and characteristics of the samples of persons and sites

(ii) Sources of information used and their relevance to the evaluation

(iii) Nature of the program activities and its outputs and outcomes, as planned and as actually achieved

(iv) Effects of the program, by contrasting program performance with the criteria adopted at the time of evaluation planning

(v) Reasons for any observed gaps between the criteria and actual achievement

(vi) The potential effect of other options (not followed by the program), for comparison purposes

(vii) Costs of the program and its alternatives and their relationship with the results

(viii) Extrapolation of the results obtained from the sample to the population
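As a minimal sketch of item (viii), extrapolating a sample result to the population, the fragment below computes an approximate 95% confidence interval for a proportion observed in a simple random sample, using the normal approximation. The sample figures are hypothetical, and a real evaluation would match the interval calculation to its actual sampling design.

```python
import math

def proportion_confidence_interval(successes: int, sample_size: int,
                                   z: float = 1.96):
    """Approximate 95% confidence interval for a population proportion,
    based on a simple random sample and the normal approximation."""
    p = successes / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)  # standard error of the proportion
    return p - z * se, p + z * se

# Hypothetical example: 312 of 400 sampled beneficiaries report receiving
# the service on time; the interval is what can be said about the population.
low, high = proportion_confidence_interval(successes=312, sample_size=400)
print(f"estimated proportion: {312 / 400:.2%}")
print(f"approximate 95% confidence interval: {low:.2%} to {high:.2%}")
```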
c) Guarding against the risk of unjustified conclusions

The evaluators always run the risk of drawing unjustified conclusions from the information. To safeguard against such risks, the following principles are generally observed: (TBS, 1984, chapter 3)

(i) The analytic procedures should be related to the nature of the information collected during the evaluation study. For example, if the appropriate statistical measure is the mean, the median should not be used. Moreover, all relevant information should be collected, not only that which suits the evaluators.

(ii) The logic of each method of analysis, its assumptions, and any deviations from that logic should be made explicit.

(iii) When there is no known or well-established analytical procedure, the evaluators should use several methods of analysis to reduce the risk of relying on a weak method.

(iv) The evaluators should take into account any environmental change that took place and affected the program outcome.

(v) The evaluation should take into account the effect of other factors on the results of a program.

(vi) The units of analysis should be appropriate to the way the information was collected and the types of conclusions to be drawn. For example, if we have to evaluate the effect of a program on individuals, conclusions about the cities where the individuals live would be misplaced.

(vii) The evaluators should perform appropriate tests of significance whenever findings are generalized to the population from which the samples were drawn, and sensitivity analyses should be carried out whenever uncertainty exists.

(viii) The generalization of the evaluation results to settings other than those included in the sample should be made only if the settings examined in the sample are identical to the other settings.

d) Conclusions and recommendations

As the evaluators proceed with their analysis, they are able to see the contours of their final report. However, as a first step, they sum up their conclusions and discuss them with the management of the entity under evaluation. As a good strategy, they commence with the discussion of those conclusions which are positive and show achievements of the program. That sets the tone for the discussion of those conclusions which may not find much favor with the management. In any case, the evaluators have to be patient with the management. They try to understand the position of the management on each issue and accommodate its point of view, where possible, while drafting the evaluation report. Discussion of the conclusions and recommendations at this stage gives the evaluators confidence about their work and helps them decide which recommendations to include in the final report. The evaluators take note of the recommendations which are fiercely contested by the management at this stage. As a good practice, evaluators also try to present alternative recommendations where they expect resistance from the management.

6. Reporting evaluation results

Reporting the results of performance evaluation is the climax of the evaluators' work. The reader sees only the report and not the effort that has gone into its preparation. If the report is not interesting or does not answer the questions that the reader is keen to find in the evaluation report, the entire effort of the evaluator may go to waste. The evaluation report, therefore, requires the most intensive effort by the evaluators.

a) Form and content of the report

There are no universal standards for the form of the evaluation report. Different organizations define its form according to their own requirements. However, there are some general guidelines which are kept in view by most evaluators:

(i) Reports usually have three main parts: executive summary, main report, and annexes.

(ii) The reports state findings and recommendations distinctly, so that the reader can see clearly what the facts and their analysis are and what the recommended actions are. Each conclusion is, generally, self-contained, minimizing the need for cross-referencing.

b) Quality of the report

As with all other reports, the performance evaluation report should:
(i) Provide sufficient information about the objective, scope, methodology, sample and analytical techniques used.

(ii) Present accurate facts, qualifying those facts about which there is some doubt.

(iii) Present facts objectively and fairly, as they are, without altering them if they do not accord with the evaluator's expectations.

(iv) Explicitly state any assumptions made in drawing the conclusions.

(v) Have adequate documented evidence to support the conclusions.

(vi) Have a clear and unambiguous style, avoiding technical jargon as far as possible, explaining technical terms where they are unavoidable, and appending an explanation of the abbreviations used in the report.

(vii) Be presented on a timely basis, as soon as possible after the evaluation.
c) Monitoring of recommendations

Performance evaluation, per se, does not require monitoring of the implementation of the report. However, many public sector organizations define it as the responsibility of the evaluation agency to monitor progress on the implementation of the recommendations and report it to a higher forum, which could be the legislature, the top management of the program, the board of directors or an audit committee. Monitoring the implementation of the report strengthens the accountability of the program managers.

7. Issues and challenges in performance evaluation

a) Performance evaluation and decision-making

Performance evaluation is an expensive activity. It becomes pointless if the recommendations made in the evaluation report are set aside as an academic exercise. The biggest challenge performance evaluation faces is that it is carried out with great fanfare and lip service, but the results of evaluations are hardly ever fed into the future planning process. The reason is that evaluation is not embedded in the decision-making process. For evaluation to be useful, evidence on program effectiveness should guide future decision-making on allocating resources to these or similar programs. There is as yet no standard, universally accepted mechanism to ensure that performance evaluation is embedded in future decision-making. Different governments are experimenting with different ideas. (TBS, 2005)

b) Standardization of methodology, accreditation, and quality assurance

Performance evaluation has not, as yet, emerged as a profession like, for example, accounting or auditing. There is no international body for setting standards for performance evaluation or for accrediting evaluators. Similarly, there is no standard practice for quality assurance, external to the evaluation body itself, for maintaining the standards of practice. Performance evaluators need to learn something from accountants and auditors in this respect. (TBS, 2005)

c) Performance evaluation presumes a robust information system

Performance evaluation presumes that there is a focused, robust, and reliable system of producing performance-related information in the public sector. It means the organizations should produce information according to the outputs and outcomes of the program. The information should be able to withstand organizational changes such as staff turnover. It should be part of the routine functions of each staff member to produce this information, and it should be collected in a cost-effective manner. Unfortunately, this is not universally so. Some organizations produce performance-related information; others do not. The standards and content of the information that is produced also vary. This is thus a major challenge for performance evaluation. Until it becomes mandatory for public sector entities to produce performance information in a standardized format, the role and scope of evaluation will remain limited.

d) Measuring effectiveness of evaluation

Another challenge for performance evaluators is to justify their own effectiveness. How do we know that the money spent on evaluation was effective? There is, as yet, no universally accepted methodology to determine that. Some evaluations can lead to cost savings in a tangible way; others do not. In a large number of cases, evaluation studies only recommend changes in the scope or focus of the program and may require some institutional changes. It is not possible in such cases to know what the benefit of the performance evaluation has been. How would things have been if there had been no evaluation? The theory has yet to take a leap in this direction.
References

American Evaluation Association. 2004. AEA Guiding Principles for Evaluators. Fairhaven, MA: AEA.

Asian Development Bank. 2006. Guidelines for Preparing Performance Evaluation Reports for Public Sector Operations. Manila: ADB. 85 pp.

Australian National Audit Office. 1997. Programme Evaluation in the Australian Public Service. Canberra: ANAO. 129 pp.

Financial Services Authority. 2002. Our Approach to Performance Evaluation. London: FSA. 69 pp.

Government Accountability Office. 1991. Designing Evaluations. Washington, D.C.: GAO. 94 pp.

Government Accountability Office. 2005. Performance Measurements and Evaluations: Definitions and Relationships. Washington, D.C.: GAO. 5 pp.

Government Accountability Office. 1992. Quantitative Data Analysis: An Introduction. Washington, D.C.: GAO. 132 pp.

Independent Evaluation Group. 2003. Independent Evaluations: Principles, Guidelines and Good Practice. Washington, D.C.: The World Bank. Available at: http://www.worldbank.org/ieg/index.html

Independent Evaluation Group. 2007. Evaluation Approach. Washington, D.C.: The World Bank. Available at: http://www.worldbank.org/ieg/index.html

Independent Evaluation Group. 2007. Impact Evaluation. Washington, D.C.: The World Bank. Available at: http://www.worldbank.org/ieg/index.html

Mackay, Keith. 2004. Two Generations of Performance Evaluations and Management System in Australia. Washington, D.C.: The World Bank. 28 pp.

McPhee, Ian. 2006. Evaluation and Performance Audit: Close Cousins or Distant Relatives? Canberra: Australian National Audit Office.

Multilateral Development Bank Evaluation Cooperation Group. 1998. Good Practice Standards for Evaluation of MDB-Supported Public Sector Operations. Washington, D.C.: The World Bank. 18 pp. Available at: http://siteresources.worldbank.org/EXTGLOREGPARPRO/Resources/ECG_GoodPractice_Standards.pdf

Organisation for Economic Co-operation and Development. 2002. Glossary of Key Terms in Evaluation and Results Based Management. Paris: OECD. 40 pp. Available at: www.oecd.org/dac/evaluation

Operations Evaluation Department. 2004. Monitoring and Evaluation: Some Tools, Methods and Approaches. Washington, D.C.: The World Bank. 26 pp.

Treasury Board Secretariat, Canada. 1984. Principles for the Evaluation of Programs by Federal Departments and Agencies. Ottawa: TBS. Available at: http://www.tbs-sct.gc.ca/eval/pubs/pubs-to-1995/principles_e.asp

Treasury Board Secretariat, Canada. 1989. Working Standards for the Evaluation of Programs in Federal Departments and Agencies. Ottawa: TBS. Available at: http://www.tbs-sct.gc.ca/eval/pubs/pubs-to-1995/stand-normes-e.asp?printable=True

Treasury Board Secretariat, Canada. 2005. Decision-making in Government: The Role of Program Evaluation. Ottawa: TBS. Available at: http://www.tbs-ct.gc.ca/eval/common/contact_e.asp

Treasury Board Secretariat, Canada. 2005. Improving Professionalism of Evaluation. Ottawa: TBS. Available at: http://www.tbs-sct.gc.ca/eval/pubs/pubs-to-1995/principles_e.asp

United Nations Evaluation Group. 2005. Standards for Evaluation in the UN System. New York: UNEG. 23 pp.