Program Evaluation
Fredrick W. Seidl

Program evaluation lies at the interface between social work practice and research. Its primary purpose is to use scientific thinking, methods, measurements, and analysis to improve the efficiency and effectiveness of social programs and the quality of social work services.

The prevailing view is that program evaluation is an administrative practice method. Therefore, social workers who engage in administrative support activities should have a working familiarity with the means of evaluating programs.

DEFINITIONS AND COMPARISONS

Program evaluation differs from evaluation research, which emphasizes the results achieved by a program—its positive or negative side effects, impacts on the target audience, prediction of long-term effects, and relationship of costs to benefits (Stufflebeam, 1974). Rossi and Freeman (1985) defined evaluation research as “the systematic application of social [science] research procedures in assessing the conceptualization, design, implementation and utility of social intervention programs” (p. 19). In program evaluation, however, social workers primarily emphasize the formative or process aspects of programs, rather than the summative or outcome aspects (Gordon, 1991; Jones, 1980).

Evaluation research is more complex and comprehensive than program evaluation and requires extensive specialized training; carefully executed evaluation research usually includes program evaluation. Thus, evaluation research picks up where program evaluation leaves off.

Performance audits, accreditation reviews, and licensing procedures are also related to program evaluation, although they are derived from different intellectual roots and use different methodologies. Whereas program evaluation is a child of social science, performance audits are the progeny of accountancy, and the purpose of the review and the methods used reflect the intellectual roots of the practitioner (D. F. Davis, 1990). Program evaluations focus on improving the quality of social programs through social science research methods. Performance audits focus on organizational control and accountability issues through accountancy procedures.

Licensing and accrediting processes are also used to evaluate programs. Because social science methods of data collection, other than interviewing, are rarely used in licensure and accreditation reviews, such reviews are more akin to performance audits than to program evaluations. Both licensure and accreditation reviews evaluate programs against a predetermined, externally derived set of standards that are devised not by stakeholders in the programs, but by bodies with external authority: government agencies (in licensure) and agency and professional associations (in accreditation). These external authorities specify the criteria for the reviews, and the programs are measured against these criteria. If a program fails to “measure up,” it loses its sanction to continue, its eligibility to receive funds from the government or insurers, or its good standing among peer programs, colleagues, and institutions. Evaluators call this a “go/no-go” evaluation.

In contrast, program evaluations are considered developmental. Increasingly, program managers view program evaluation as a data-based, problem-solving process and as a means to improve programs through informed administrative decision making.

AGENCY-BASED PROGRAM EVALUATION

Identifying Stakeholders
Essential stakeholders, with participation from evaluators, decide the problems or issues for a program evaluation. The standards of the Evaluation Research Society (1982) refer to “formulation and negotiation,” during which “concerned parties should strive for a clear mutual understanding of what is to be done, how it is to be done, and why.”  Jones (1980) viewed program evaluation as “a political process, and researchers must build support and acceptance by recognizing the priorities of all participants with a stake in the outcome” (p. 70).

A stakeholder is anyone with an interest in the program: the sponsor, clients, social workers, the United Way, program administrators, neighborhood people, and so on. The stakeholders may change from project to project or from program to program.

Furthermore, stakeholders have different interests. For example, governmental funders may be the primary stakeholders in a cost–benefit study of a particular child abuse screening service. However, other organizational participants might not be as interested in the cost–benefit aspect. Social workers will be interested in protecting children and preserving families. The police will be interested in protecting children and punishing the abusers. Administrators will be interested in the program's ability to follow guidelines, thereby ensuring reimbursement.  And, finally, clients will be interested in the services they receive and whether there are false accusations of abuse. Although the government will be interested in carrying out screening as efficiently as possible, each of the other stakeholders will have different views of which criteria are important for evaluating the program.

The social worker who is interested in conducting a program evaluation needs to perform several tasks. The first is to decide who the audiences are and what they want to know. Internal audiences have different questions than do external ones. An external audience that provides funding for the program has its own agenda for the program, and conflicts with internal agendas are common. Social worker–evaluators are likely to experience pressures from various directions that may be difficult to manage. Kelly and Maynard-Moody (1993) noted that engaging stakeholders in a discussion of the problem for evaluation helps participants get past parochial views of the program and build cohesion about the program's goals. In this way, program evaluation produces desired changes even while it is under way.

It is usually best to begin with who is funding the evaluation. What do they want to know?  What do they intend to do with what they know? Occasionally, program managers may want to keep the evaluation in house and not inform the funders. There may be good and sufficient reasons for this view.

Second, it is advisable to involve those who are in a position to implement recommendations that are derived from the program evaluation, usually line and staff officers. If organizational participants fail to accept the definition of the problem to be studied, they are not likely to carry out the recommendations for change.

Third, it is advisable to consult with the people who provide the program's political support—both those in positions of authority and the constituents. The reasons for their support may come as a surprise to the program officers and may otherwise be overlooked in evaluating the program.

Finally, it is better to cast a wide net than a narrow one. It is particularly important to include minority stakeholders in the problem-development process (Madison, 1992).

Generating Problems to Study
Most commonly, evaluators solicit ideas from stakeholders through interviews or small-group meetings. Qualitative research methods may be helpful in formulating problems for program evaluations, including focus groups, which have been widely applied in recent years, and Delphi and nominal-group strategies (Krueger, 1988).

Once a list of problems is generated through the various qualitative approaches available to the social worker–evaluator, those problems that will receive attention during the evaluation must be selected. This decision requires a consideration of the costs, level of precision, time, and amount of staff attention that will be required to deal with each problem. The social worker–evaluator then renegotiates the final list of problems to be addressed in the evaluation with the stakeholders so that they understand why certain problems are to be addressed and others are not. Again, if program evaluation is viewed as a means to foster change, interested parties are not likely to support the recommendations generated during the evaluation unless they believe that their interests are represented in the evaluation process.

Testing scientific theory is not a purpose of program evaluation, although it may be an occasional byproduct. The search for research problems in the literature of social work or in the cognate disciplines is therefore not likely to be productive.  However, familiarity with relevant research can lead to useful methodological approaches and help with interpreting findings.

THE RESEARCH CYCLE

Once the problems to be evaluated are specified, the normal research cycle is initiated. This cycle includes the formulation of hypotheses, selection of the variables of interest and their measurement, and the determination of how and by whom the data are to be collected and analyzed. Finally, the social worker–evaluator, program administrators, and staff forge agreements for making and implementing recommendations and for handling the publication of the findings (Evaluation Research Society, 1982).

Specific strategies for carrying out the program evaluation depend on the nature of the problems obtained in the negotiation–renegotiation process and the program's stage of development (Seidl & Macht, 1979; Tripodi, Fellin, & Epstein, 1978). In deciding on specific program-evaluation strategies, evaluators also need to consider (1) the level of technology used in the program, (2) the stability of environmental support for the program, and (3) the degree to which the program's goals are free from ambiguity, inconsistencies, and conflict (Seidl & Macht, 1979). For programs in the early stages of development, programs that use complex technologies, programs with uncertain environmental support, and programs that are still developing clear goals, it is better to emphasize formative evaluation and qualitative methods than to use a summative evaluation emphasizing quantification.
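
This contingency logic can be expressed as a brief sketch. The function below is an illustration only; its name, inputs, and decision rule are assumptions made for this example and are not drawn from Seidl and Macht (1979).

```python
# Illustrative sketch of the contingency logic described above.
# The function name, inputs, and decision rule are assumptions made
# for illustration; they are not taken from Seidl and Macht (1979).

def suggested_emphasis(early_stage: bool,
                       complex_technology: bool,
                       uncertain_support: bool,
                       ambiguous_goals: bool) -> str:
    """Suggest an evaluation emphasis for a program.

    If any contingency factor argues against firm, quantified outcome
    measurement, lean toward formative, qualitative work; otherwise a
    summative, quantitative emphasis becomes more defensible.
    """
    if early_stage or complex_technology or uncertain_support or ambiguous_goals:
        return "formative evaluation with qualitative methods"
    return "summative evaluation with quantitative methods"

# Example: a new program with clear goals but shaky environmental support.
print(suggested_emphasis(early_stage=True,
                         complex_technology=False,
                         uncertain_support=True,
                         ambiguous_goals=False))
```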

Formulation of Hypotheses
A hypothesis is a statement that specifies the relationship between two or more variables. Not all pieces of sought-after information must be cast as hypotheses. One gains no information, for example, by “hypothesizing” that the program “sees” 400 clients a year when the actual count is readily obtainable. Rather, evaluators need to formulate hypotheses to keep the evaluation focused on important issues and concerns so that stakeholders can follow the data collection and results regarding particular hypotheses that relate to their interests. For example, a board member in a rural agency may suspect that the level of participation by volunteers in a program is low because the volunteers have difficulty getting to the agency. Examining the hypothesis, “the greater the difficulty in reaching the agency, the lower the participation of volunteers,” will yield useful information and, if the hypothesis is supported, may suggest what actions an agency may take with regard to the transportation of volunteers.
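
As an illustration of how such a hypothesis might be examined, the sketch below computes a simple correlation between a transportation-difficulty measure and volunteer participation. The data and variable names are hypothetical, and a negative correlation by itself would not establish causation.

```python
# Hypothetical illustration of examining the volunteer-transportation
# hypothesis: "the greater the difficulty in reaching the agency,
# the lower the participation of volunteers."
# The data below are invented for illustration only.

travel_minutes = [10, 15, 20, 25, 35, 40, 50, 60]    # difficulty proxy
hours_volunteered = [22, 20, 18, 16, 12, 10, 6, 4]   # participation proxy

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

r = pearson_r(travel_minutes, hours_volunteered)
print(f"r = {r:.2f}")  # a strongly negative r is consistent with the hypothesis
```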

Data Collection
In any evaluation, evaluators can conceptualize variables as input, throughput, or output variables, depending on the phase of the agency's program to which the variables pertain. Input variables describe the agency's structure and resources. They include the program's mission, the agency context (including the program's history and auspices), the setting in which services are offered, the characteristics of the client population, the characteristics (including qualifications) of the staff, and the sources of funding and costs of the program.

Throughput variables are “process” variables. They include what happens to the client by virtue of contact with the agency;  the technologies (methods) used by the professional staff; the services that are available and how they are used; and the frequency, duration, and intensity of the services provided. Concerns about cost can be both input and throughput variables because expenditures are made in establishing the infrastructure and in applying the technologies.

Output variables describe the outcomes of services. Unless these outcomes closely reflect the mission of the service, the evaluation must raise questions about why they do not and whether the agency's goals have been displaced. Typical outcomes may include improved psychological adjustment, reduced family conflict, improved personal functioning, increased self-esteem, a higher quality of life for the individual or the community, better performance in school, job placements, and ability to live independently. Program evaluations are more likely to focus on input and throughput variables than on output variables.
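
One simple way to keep these three categories straight during planning and data collection is to organize candidate variables by program phase, as in the sketch below. The particular variable names are illustrative assumptions, not a prescribed list.

```python
# Hypothetical grouping of evaluation variables by program phase.
# The specific variable names are illustrative examples only.

program_variables = {
    "input": [            # structure and resources
        "mission", "agency_auspices", "service_setting",
        "client_characteristics", "staff_qualifications", "funding_sources",
    ],
    "throughput": [       # what happens during service
        "intervention_methods", "services_used",
        "service_frequency", "service_duration", "service_intensity",
    ],
    "output": [           # outcomes of service
        "family_conflict_reduction", "school_performance",
        "job_placements", "independent_living",
    ],
}

for phase, variables in program_variables.items():
    print(f"{phase}: {', '.join(variables)}")
```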

The scientific rigor of program evaluations may range from anecdotal accounts by the program's participants to carefully controlled experimental research designs. Evaluators, however, strive for “practical rigor” in their evaluations so that they can achieve as much validity in the design as possible against trade-offs in costs and disruption of the program. Any evaluation less rigorous than a fully randomized experimental design indicates that concessions were made to the practical issues of delivering service in vivo. Because program evaluators rarely have the opportunity to implement experimental designs fully (this is more common in evaluation research), they need to be aware of what these compromises cost in knowledge and confidence and must interpret findings and use recommendations fully cognizant of the limits of the information they have produced.

Description of the Program
Clients are likely to describe a program differently from practitioners, administrators, or members of the board of directors. Therefore, evaluators may find it useful to understand how a program is seen to operate from these various perspectives. One common way to do so is to describe, using a set of throughput variables, what happens to a “typical” client who comes to the agency and seeks the help the agency can provide. What is the process that engages the client?  Is the agency successful in engaging the client? What are the steps in that process? How are these steps likely to occur?  What are the expected outcomes?

Cost–Benefit Analysis
One widely used form of program evaluation, particularly among economists, is cost–benefit analysis. Cost–benefit analysis was developed to assist in making social-investment decisions by providing information about comparative dollar-value returns of various program options (Chakravarty, 1987). In this formulation, costs are input and throughput variables and benefits are outcome variables. Programs are described in terms of their costs and expenditures for various input and throughput items. In the simplest terms, cost–benefit analysis asks whether a program's costs are greater or less than the dollar value of its benefits.
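
A minimal numerical sketch of this comparison follows. The dollar figures and categories are invented for illustration; the calculation also presupposes that benefits can be monetized, which is precisely the issue raised in the next paragraphs.

```python
# Hypothetical cost-benefit comparison for one program year.
# All dollar figures are invented for illustration.

costs = {
    "staff_salaries": 180_000,           # input costs
    "facilities_and_overhead": 40_000,   # input costs
    "direct_service_expenses": 30_000,   # throughput costs
}

monetized_benefits = {
    "reduced_foster_care_placements": 220_000,
    "averted_hospitalizations": 60_000,
}

total_costs = sum(costs.values())
total_benefits = sum(monetized_benefits.values())

net_benefit = total_benefits - total_costs
benefit_cost_ratio = total_benefits / total_costs

print(f"Total costs:        ${total_costs:,}")
print(f"Total benefits:     ${total_benefits:,}")
print(f"Net benefit:        ${net_benefit:,}")
print(f"Benefit-cost ratio: {benefit_cost_ratio:.2f}")
```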

Several methodological issues immediately surface. In estimating costs, evaluators need to decide whose costs they are counting: those borne by the program's sponsors, the opportunity costs experienced by volunteers, or only the costs to government? Because nonprofit human services agencies rarely pay property taxes, is this forgone revenue to be counted as a program cost for the municipality? Despite the conventions concerning what is an accountable cost and what is not, issues remain.

These issues loom larger in the estimation of benefits. Although the cost–benefit model was developed to capture economic benefits, most social programs do not have dollar outcomes. What is the dollar value of a happy marriage? A prevented teenage pregnancy? An integrated community? A human life? Thompson (1980), for example, devoted considerable effort to whether a human life should be valued in monetary terms and, if so, how. In cost–benefit studies on the delivery of health services, this is an important concern.

Furthermore, issues remain as to what is to be included as a benefit. Organizations do more than one thing. Should indirect benefits (like employed workers) be counted?  If so, which ones? Both practitioners and scholars correctly complain that the results of cost–benefit studies tend to be interpreted as if only the monetized results are worth attention. In so doing, users of the evaluation may miss the whole point of the program.

Approaches to Data Collection
Patton (1986) argued that there is no best plan and no perfect design and that all designs produce errors and ambiguities. Social worker–evaluators will wish to obtain the best data they can under the realistic conditions of the naturalistic (agency) environment. What is “best” depends on more than technical measurement properties. Other considerations include the confidence of the stakeholders, reactivity, cost, and efficiency of administration. Patton further asserted that qualitative methodologies provide fruitful approaches when the emphasis is on individualized outcomes or program processes or when in-depth information is needed about certain clients. Similarly, evaluators may wish to consider qualitative approaches if there is an interest in focusing on diversity, if details are needed about how the program is implemented, or if standardized tests either are not available or would be disruptive. Finally, when there may be unintended consequences of the program and when the administrators may be biased toward qualitative approaches, such methods may be indicated.

In the normal course of daily work, social agencies generate data. They have reports to prepare for governmental agencies, for boards of directors, or for the United Way. Many agencies, particularly those with large client populations, have well-organized management information systems (MIS) that routinely keep track of hundreds of elements of data. Although there was great hope in the 1970s that MIS could provide a wealth of data for social work research, these hopes have not materialized for several reasons. First, data are collected for managerial purposes, and although these purposes may overlap with those of evaluative research, they usually do not. Second, the data from MIS are not always accessible for research and evaluation purposes. If the quality of MIS data is poor, then a great deal of effort must go into “cleaning up” the data. Finally, it is usually difficult and occasionally impossible to obtain estimates of the reliability and validity of agency-collected data.

Nevertheless, program evaluators can frequently save themselves a great deal of trouble and expense by using agency data. Collecting original data to answer questions using measures with known reliability and validity is usually best, however, provided that the social worker–evaluators can “get the data out.” Obtaining cooperation from busy colleagues who have to do additional work to carry out evaluations requires a great deal of diplomacy, if not authority. Many a scientifically credible program evaluation has been planned and initiated but never finished because of the difficulties in collecting data. If the staff members consider the data being collected to be potentially threatening in that the data may expose their poor performance and thus lead to budget cuts, they can sabotage the data collection process. In this respect, social worker–evaluators have an advantage over others in obtaining such cooperation because the staff's involvement in the evaluation process is a given: Those who are to implement the evaluation must accept it. One may expect that the social worker–evaluator is more sensitive to the subtler aspects of service delivery and can explain them to colleagues.

Reporting Results for Implementation
In good program evaluation, there are no surprises. Evaluators keep stakeholders informed about how the evaluation is proceeding. As findings emerge, evaluators apprise stakeholders of them and solicit the staff's, the clients', and the board's thoughts and reactions. Interim reports are useful in this respect, particularly if the findings are negative. Interim reports can also foster the discussion of findings and recommendations, and meetings can be held to consider their validity and potential implementation. (As always, these meetings are interactive and not didactic.)

Minority Issues
Program evaluation shares many of the minority issues that have been identified for other forms of social research. For example, the tendency to use standardized test scores to determine the effectiveness of programs may contribute to further inequalities (McDowell, 1992). As in other types of research, program evaluations are remiss in using race as a variable when race becomes a proxy for understanding what the minority experience means. E. D. Davis (1992) argued that evaluators overemphasize differences between groups when they ignore or dismiss important variations within groups. In a criticism specific to program evaluation, Madison (1992) contended that evaluators are more likely to leave minority stakeholders out of the evaluation process. This problem is particularly acute because members of ethnic and racial groups have the most at stake in evaluations of human services programs.

EVALUATING PROGRAM EVALUATIONS

Referring to evaluations of human services programs, Weiss (1978) remarked: “Just as most human service programs have had indifferent success in improving the lot of clients, most program evaluations have had little impact in improving the quality of programs” (p. 113). Ten years later, Weiss (1988) observed that “the influence of evaluation on program decisions has not notably increased” (p. 7). Yet, there is cause for optimism. Chambers, Wedel, and Rodwell (1992) projected a bright future for evaluation, based on changes that have occurred and are occurring in social work's approach to program evaluation. The trend has clearly been toward a more participatory practice model in which evaluators are less often outside authorities and more often colleagues and co-workers. Furthermore, the evaluators work to understand the complexities of “people work,” value diversity, and appreciate the richness of data generated by qualitative approaches to evaluation. Although they understand the contributions that  quantitative social science methods can make, as well as the limitations of these methods, evaluators bridge the gap between the social science observer–nomothetic researcher and the practicing social worker. As experience with this emerging approach accumulates, the profession will have to reevaluate evaluation.

REFERENCES
Chakravarty, S. (1987). Cost–benefit analysis. In J. Eatwell, M. Milgate, & P. Newman (Eds.), The new Palgrave: A dictionary of economics (pp. 687–690). New York: Stockton Press.
Chambers, D. E., Wedel, K., & Rodwell, M. K. (1992). Evaluating social programs. Boston: Allyn & Bacon.
Davis, D. F. (1990). Do you want a performance audit or a program evaluation? Public Administration Review, 50, 35–41.
Davis, E. D. (1992, Spring). Reconsidering the use of race as an explanatory variable in program evaluation. New Directions for Program Evaluation, 53, 55–67.
Evaluation Research Society. (1982). Standards for program evaluation. New Directions for Program Evaluation, 15, 7–19.
Gordon, K. H. (1991). Improving practice through illuminative evaluation. Social Service Review, 65, 202–223.
Jones, W. C. (1980). Evaluation research and program evaluation: A difference without a distinction. In D. Fanshel (Ed.), Future of social work research (pp. 70–71). Washington, DC: National Association of Social Workers.
Kelly, M., & Maynard-Moody, S. (1993).  Policy analysis in the post-positivist era: Engaging stakeholders in evaluating the economic development districts program.  Public Administration Review, 53, 135–142.
Krueger, R. A. (1988). Focus groups: A practical guide for applied research. Beverly Hills, CA: Sage Publications.
Madison, A. (1992, Spring). Primary inclusion of culturally diverse minority program participants in the evaluation process. New Directions for Program Evaluation, 53, 35–43.
McDowell, C. (1992, Spring). Standardized tests and program evaluation: Inappropriate measures in critical times. New Directions for Program Evaluation, 53, 45–54.
Patton, M. Q. (1986). Utilization-focused evaluation (2nd ed.). Beverly Hills, CA: Sage Publications.
Rossi, P. H., & Freeman, H. (1985). Evaluation: A systematic approach. Beverly Hills, CA: Sage Publications.
Seidl, F. W., & Macht, M. W. (1979). A contingency approach to evaluation: Manager's guidelines. In J. McClure (Ed.), Managing human services (pp. 190–200). Davis, CA: International Dialog Press.
Stufflebeam, D. L. (1974). Meta-evaluation (Occasional Paper No. 3). Kalamazoo: Evaluation Center, College of Education, Western Michigan University.
Thompson, M. S. (1980). Benefit–cost analysis in program evaluation. Beverly Hills, CA: Sage Publications.
Tripodi, T., Fellin, P., & Epstein, I. (1978). Differential social program evaluation. Itasca, IL: F. E. Peacock.
Weiss, C. (1978). Alternative models for program evaluation. In W. C. Sze & J. G. Hopps (Eds.), Evaluation and accountability in human service programs (2nd ed., pp. 113–122). Cambridge, MA: Schenkman.
Weiss, C. (1988). Evaluation for decisions. Evaluation Practice, 9(1), 7–9.
FURTHER READING
Berk, R. A., & Rossi, P. H. (1990). Thinking about program evaluation. Newbury Park, CA: Sage Publications.
Langbein, L. I. (1980). Discovering whether programs work: A guide to statistical methods for program evaluation. Santa Monica, CA: Goodyear.
Nowakowski, J. (Ed.). (1987). The client perspective on evaluation. San Francisco: Jossey-Bass.
Palumbo, D. J. (Ed.). (1987). The politics of program evaluation. Newbury Park, CA: Sage Publications.
Posavac, E. J., & Carey, R. G. (1989). Program evaluation: Methods and case studies. Englewood Cliffs, NJ: Prentice Hall.
Riddick, W. E., & Stewart, E. (1981). Workbook for program evaluation in the human services. Washington, DC: University Press of America.
Rossi, P. H. (Ed.). (1982). Standards for evaluation practice. San Francisco: Jossey-Bass.
Royse, D. (1992). Program evaluation: An introduction. Chicago: Nelson-Hall.

Fredrick W. Seidl, PhD, is professor and dean, University at Buffalo, State University of New York, School of Social Work, 359 Baldy Hall, Buffalo, NY 14260.

For further information see
Clinical Social Work; Computer Utilization; Direct Practice Overview; Economic Analysis; Epistemology; Expert Systems; Information Systems; Licensing, Regulation, and Certification; Management Overview; Meta-analysis; Organizations: Context for Social Services Delivery; Policy Analysis; Psychometrics; Public Social Services Management; Quality Assurance; Quality Management; Recording; Research Overview; Social Work Practice: Theoretical Base; Social Work Profession Overview; Volunteer Management.

Key Words
cost–benefit analysis
program evaluation
research