© the Journal of Behavioral and Applied Management – Summer/Fall 2001 – Vol. 3(1) Page 44

A Framework for Training Students as Evaluators of Instructor Performance
Linda S. Hartenian
University of Wisconsin Oshkosh

 ABSTRACT

Student evaluations of instructor performance are important tools for instructor development and assessment. A framework for training students as evaluators of instructors is presented, incorporating four themes from the performance evaluation research—rating errors, rater accuracy, cognitive processes, and tangential factors. Goals and methods for the training program, as well as administrative issues, are presented. Finally, evaluation of benefits and costs of the training program is discussed.

A Framework for Training Students as Evaluators of Instructor Performance

Instructors typically undergo periodic evaluations of their teaching performance in conjunction with university or college policies. While purposes of these evaluations can differ (e.g., tenure, promotion, merit, certification, development of teaching skills), students often are given an opportunity to provide input about instructor performance in the classroom. Hence, they play an important role in the instructor development and evaluation process. Student involvement makes sense for two reasons (Tuckman, 1995): a) students have an opportunity to regularly observe instructors in class; and b) students are customers of the university and should provide feedback on how well instructors perform (also see Schneider, Hanges, Goldstein, & Braverman, 1994). Though some might feel that students are products (rather than customers) of educational systems, this article errs on the side of involving all constituencies.

A critical assumption is that the students completing evaluations have some degree of precision and skill in performing this task. Researchers suggest that this is not the case--student evaluations are fraught with accuracy problems (cf. Nyirenda, 1994). Despite shortcomings of student evaluations, universities continue to treat student evaluations as important assessment tools (Greenwald, 1997; Greenwald & Gilmore, 1997a). Yet, a literature search revealed no articles directed toward training student evaluators. At a minimum, having discussions with students about their role in instructor evaluation would be an important step in improving the process (Smith, 1986). The argument is developed below that universities should go further by developing a unified approach toward training students.

Four themes emerging from a review of the literature are incorporated into a unified framework for training student evaluators—rater/student errors (biases) in evaluation, rater/student accuracy, rater cognitive processes, and tangential factors that affect student judgments (see Bernardin & Walter, 1977; Cronbach, 1977; and Marsh, 1984). Administrative issues in designing and implementing a training program are included. Finally, the evaluation of costs and benefits of such a program is discussed.

A Training Program for Student Evaluators

While no single best way to train students exists, taking a unified approach toward student training is useful if we are to address the multiple issues raised in the literature review. The design of the training program begins with identification of the Goals for training. Four goals are presented which are designed to improve the accuracy of student evaluations. Four training modules are created, one for each Goal. Several issues are considered simultaneously in the design of each training module (Gagne & Briggs, 1979): the need for training; the trainee’s cognitive, emotional, and behavioral processes; and the basic tenets of learning (refer to the Appendix for a description and general examples of events of instruction as they apply to any of the Goals presented below).

Goal 1:            Understanding Dimensions of Instructor Performance
Goal 2:            Providing Fair Evaluations
Goal 3:            Understanding the Broader Context of Instructor Evaluations
Goal 4:            Preparing for the Evaluation Process

These goals are presented in the order in which training sessions might be conducted. Goals 1, 2, and 3 could be addressed during a new student orientation. Because of the importance of individual feedback to trainees, a workshop format for training is suggested in Goals 1, 2, and 3. Given resource constraints, designers of the student training may find that lecture and discussion with general feedback will serve to improve accuracy (Athey & McIntyre, 1987). Goal 4 is directed toward returning students and should be conducted in smaller discussion groups. This allows students to process their actual evaluation experiences after training in Goals 1, 2, and 3 (Bargh & Schul, 1980).

Before beginning Goal 1 training, students could complete a short questionnaire on attitudes toward contaminants in the rating process. For example, students might be asked what they think of an instructor who assigns more than the average amount of work during a semester, what kinds of grades they expect in courses given their current grade point average, and if they are likely to give lower ratings to an instructor they dislike. Questionnaire data are held for Goal 2 discussion.

Goal 1: Understanding Dimensions of Instructor Performance

The purpose of Goal 1 is to provide an opportunity for the student to generate a cognitive schema for performance evaluations. Knowledge of dimension names and definitions is a prerequisite to developing the student’s skills in the cognitive processes of observing, storing, and recalling information and rating an instructor’s performance. Links have been demonstrated between these processes and accuracy (DeNisi & Williams, 1988). For example, Bernardin and Walter (1977) found that students who were trained in how to use an evaluation form exhibited fewer rater errors. Day and Sulsky (1995) found that proper categorization resulted in more accurate ratings. Also called frame-of-reference training, defining dimensions and teaching proper categorization of instructor behaviors provides raters with empirically developed standards for performance (Hedge & Kavanaugh, 1988). Training also increases the likelihood that information will be accessible (i.e., recalled) when the time comes to complete the instructor evaluation (Woehr, 1992). In summary, student raters should become familiar with the “job” they are being asked to evaluate (Heneman, Wexley, & Moore, 1987). Table 1 summarizes definitions of rater errors and provides recommendations for correcting rating errors. Examples of dimensions, terms, and measurement scales are suggested below. Specific training guidelines are then presented (refer to Table 2). 

Table 1
Rater Errors: Definitions and Recommended Strategies for Correction

Definitions                                         Methods for Correcting

Leniency: All instructors rated high                For leniency, severity, and central tendency:
Severity: All instructors rated low                 a) Recognize performance variability
Central Tendency: All instructors rated average     b) Be fair in evaluations
                                                    c) Define dimensions/use behavioral anchors

Halo: Instructor rated high (average/low) on all    Reinforce that dimensions of performance
dimensions because instructor is high               are mutually exclusive and independent of
(average/low) on one dimension                      one another

First Impression: Early attitudes toward            Take notes during semester (i.e., diary)
instructor determine ratings at end of semester

Recency Effect: Recent behavior is weighted         Take notes during semester (i.e., diary)
more heavily than earlier behaviors

Contrast Effect: Knowledge of previous              Define dimensions/use behavioral anchors
performance levels (or others' performance)
influences ratings in present situation

Similar-to-me Effect: Instructor with qualities     Provide instructor's [job] description,
like student is rated more highly                   including performance expectations
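
The error definitions in Table 1 can be turned into rough screening checks on a student's completed ratings. The Python sketch below is illustrative only and is not part of the training program described in this article: the function name `flag_rater_errors`, the 1-to-5 scale, and every threshold are assumptions chosen for the example.

```python
from statistics import mean, pstdev

# Assumed 5-level scale: 1 = lowest level, 5 = highest level.
SCALE_MIN, SCALE_MAX = 1, 5
SCALE_MID = (SCALE_MIN + SCALE_MAX) / 2


def flag_rater_errors(ratings, high=4.5, low=1.5, mid_band=0.5, halo_spread=0.25):
    """Flag possible rating errors in one student's ratings.

    ratings maps instructor name -> {dimension name: rating}.
    All thresholds are illustrative assumptions, not values from the article.
    """
    all_scores = [s for dims in ratings.values() for s in dims.values()]
    overall = mean(all_scores)
    flags = []
    if overall >= high:
        flags.append("leniency")          # all instructors rated high
    elif overall <= low:
        flags.append("severity")          # all instructors rated low
    elif abs(overall - SCALE_MID) <= mid_band and pstdev(all_scores) <= mid_band:
        flags.append("central tendency")  # all instructors rated average
    # Halo: within each instructor, ratings barely differ across dimensions.
    within = [pstdev(list(dims.values())) for dims in ratings.values()]
    if mean(within) <= halo_spread:
        flags.append("halo")
    return flags


example = {
    "Instructor A": {"delivery": 5, "interaction": 5, "management": 5},
    "Instructor B": {"delivery": 5, "interaction": 5, "management": 5},
}
print(flag_rater_errors(example))  # ['leniency', 'halo']
```

Uniformly high ratings trigger both the leniency and the halo flag because the two patterns overlap. A flag is a prompt for discussion with the student rather than proof of an error, since the instructors may in fact all perform well.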

 

Table 2
Recommendations for Goal 1: Understanding Dimensions of Instructor Performance

Learning Objectives:
    To accurately define dimensions, terms, and standards (levels) of performance
    To correctly place instructor behaviors into dimensions
    To identify levels of instructor performance

Workshop Format:
    Demonstration, Discussion, Practice, Feedback, Discussion

Video Segment 1:
    Demonstration, Discussion, Practice, Feedback, Discussion

Video Segment 2:
    Instructor Behaviors Varied by Level of Performance Across Dimensions: Learning to Distinguish Performance Levels

Suggestions:
    Provide clear standards (expectations) for instructor performance
    Use behaviorally-based evaluation measures
    Use video segments that reflect actual evaluation settings

 

Dimensions

Dimensions of performance must be identified and defined. Dimensions are broad conceptual categories for describing what instructors do in the classroom, such as delivery of instruction, student/instructor interaction, evaluation techniques, and classroom management (see Tuckman, 1995, and Nyirenda, 1994).  Feldman (1989) concluded that student ratings could represent as many as 28 different dimensions; research continues to explore the dimensionality of instructor ratings (Marks, 2000). Whether a university or college adopts dimensions that others have created or chooses to create its own, dimensions should be comprehensive and mutually exclusive (Binning & Barrett, 1989). (The term “university” will be subsequently used.)

Dimensions should be defined. The student/instructor interaction dimension, for example, might be defined as “the extent to which the instructor maintains effective communication with the students, is aware of the students’ developmental and emotional characteristics, shows compassion and empathy, and sincerely wants students to learn.”  Once the dimensions are defined, more specific behavioral examples of how an instructor demonstrates performance are provided. To continue with the above example, the following behaviors might represent the student/instructor dimension: a) this instructor provides extra help to students who request it, b) this instructor praises or encourages students when they give a correct response, c) when students make comments, their contributions are accepted without disagreement or further discussion, d) this instructor corrects a student’s incorrect response in a condescending manner, and e) this instructor does not use the student’s name when addressing him or her. Note that behaviors are positively and negatively phrased; however, as indicated below (see Measurement Scales), students are not asked to evaluate the meaning of the behaviors. Consider the last behavioral example. In a lecture room with 300 students, not referring to students by name may be more understandable (i.e., permissible) than in a class of 20 students.

Terms

Specific terms also should be defined. While some terms are familiar, others are likely to be unfamiliar to students and other participants: constructs, dimensions of performance, criteria, and standards [levels] of performance. Terms may be part of the process itself (e.g., the formal name of the evaluation form). Be sure to clarify if terms have specific applications. For example, the term “instructor” may refer to a professor or to a teaching assistant who leads a discussion section of a larger class. Once terms and dimensions have been defined, measurement scales are created.

Measurement Scales

A measurement scale represents levels (i.e., standards) for evaluating an instructor’s performance. Measurement scales can have as many levels as desired, though five or seven are recommended. The following examples demonstrate 5-level scales:  a) below average, slightly below average, average, slightly above average, above average; b) below expectations, slightly below expectations, meets expectations, slightly above expectations, exceeds expectations; c) poor, acceptable, good, very good, excellent. Regardless of which terminology is used or how many levels are desired, each level should be defined. The process of defining levels is referred to as “providing anchors” for the standards of performance.
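
The requirement that every level be defined can be made concrete with a small data structure. The following Python sketch is one possible representation, not a prescribed one: the level labels come from example (b) above, while the class name `ScaleLevel`, the helper `build_scale`, and the anchor wording (built around the "extra help" behavior) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleLevel:
    value: int   # numeric position on the scale
    label: str   # level name shown on the evaluation form
    anchor: str  # behavioral definition of the level


def build_scale(levels):
    """Return levels sorted by value, rejecting duplicate or undefined levels."""
    values = [lvl.value for lvl in levels]
    if len(set(values)) != len(values):
        raise ValueError("each level needs a distinct value")
    for lvl in levels:
        if not lvl.anchor.strip():
            raise ValueError(f"level '{lvl.label}' has no anchor")
    return sorted(levels, key=lambda lvl: lvl.value)


# Anchor texts are invented for illustration only.
expectations_scale = build_scale([
    ScaleLevel(1, "below expectations", "Rarely provides extra help when students request it"),
    ScaleLevel(2, "slightly below expectations", "Occasionally provides extra help when requested"),
    ScaleLevel(3, "meets expectations", "Usually provides extra help when requested"),
    ScaleLevel(4, "slightly above expectations", "Almost always provides extra help when requested"),
    ScaleLevel(5, "exceeds expectations", "Always provides extra help when requested"),
])
print([lvl.label for lvl in expectations_scale])
```

Rejecting a level with an empty anchor enforces the point made above: a scale is not ready for use until each of its levels has been defined.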

As mentioned earlier, behavioral examples of what students might expect an instructor to exhibit should be created. A substantive amount of research exists to support the use of behavioral definitions for the anchors and behavioral examples of instructor performance (e.g., Hartel, 1993). Behavioral-based ratings are more accurate (cf. Weirsma, VandenBerg, & Latham, 1995); use of personality traits can result in rater errors (cf. Borman & Dunnette, 1975). When ratings are based on clear standards and observable information, they are less susceptible to interpersonal affect (Park & Sims, 1989).

Recognize that the above recommendations to use behavioral-based methods assume that students are being asked to record their observations of instructor behaviors. A more controversial issue (and one beyond the scope of this article) is whether or not to have students interpret and judge the meaning of those behaviors. One way to minimize intentional distortion of ratings is to have students record the frequency of instructor behaviors and to use a student evaluation team to provide periodic feedback to the instructor, with final interpretation and judgment about instructor performance coming from the instructor (self-assessment) and peers (peer assessment).

Training Program

Learning objectives for Goal 1 should be put in writing and shared with students in a handout (see Table 2). Training begins with students viewing videos of instructors in actual classroom settings. Videos are recommended rather than written descriptions (i.e., “paper people”) because they result in better accuracy (Ryan, Daum, Bauman, Grisez, Mattimore, Nalodka, & McCormick, 1995). Observed behavior results in better retention and retrieval of training material from memory (Kinicki, Hom, Trost, & Wade, 1995) and provides multiple cues (e.g., visual, auditory), which may result in deeper processing of information (Murphy & Balzer, 1986).

A carefully crafted video will include several demonstration, discussion, and practice segments. Begin with a demonstration of instructor behaviors that represent each dimension (Segment 1). Following the demonstration segment, the trainer allows for discussion of the observed behaviors and the dimensions used on the evaluation form. Students then practice evaluating a video segment that includes a variety of instructor behaviors. The trainer presents correct conclusions (i.e., identification of behavior and placement of behavior in a dimension) and allows for discussion of the practice segment. The next demonstration segment presents several instructor behaviors, but varies performance levels (Segment 2). The trainer discusses with students the examples of various instructor behaviors and why the levels of performance differ in each example. Students are then presented with a practice segment in which they are asked to rate instructor behaviors using the measurement scale. The trainer provides feedback on and allows for discussion of the conclusions regarding the observed levels of performance.

In concluding the training under Goal 1, students are reminded that they are expected to recall this training when they proceed to Goals 2, 3, and 4 and in the actual evaluation setting. Once students have developed a common terminology and understanding, they are ready to begin the more challenging training--learning to evaluate consistently.

Goal 2: Providing Fair Evaluations

The term “fairness” frequently is divided into two types--procedural (process) and distributive (outcome). For the purposes of Goal 2, fairness of student evaluations is a procedural issue for two reasons. First, student evaluations are inputs into evaluation and development decisions about an instructor. They are interpreted by an instructor’s peers and/or administration to arrive at an outcome (e.g., merit pay, identifying a need for instructor development). Second, Goal 2 is concerned specifically with the process that students follow when completing instructor evaluations. If instructors are to believe in the fairness of the performance evaluation system that includes student evaluations, then instructors must perceive that students will provide fair evaluations. Universities should be concerned about instructor perceptions of fairness--Harris (1988) suggested that perceptions of unfairness lead to decreased performance.

One way to influence these perceptions would be to ensure that student evaluations are reliable and accurate. Training improves the reliability of ratings by reducing rating errors (Fay & Latham, 1982). Training in observation skills has been found to increase accuracy (Hedge & Kavanaugh, 1988). Other research has confirmed that student ratings are stable (i.e., consistent) over time (Hanges, Schneider, & Niles, 1990). Simply put, the question is whether a student will observe and record the same instructor behavior in the same way from one time to another and across different instructors. Because consistency of ratings is a necessary (though not sufficient) step to ensure accuracy (Schmidt, 1990), additional training efforts to improve accuracy must be undertaken (i.e., Goals 1, 3, and 4). The training program for Goal 2 first revisits the questionnaire data that were collected prior to Goal 1 training. Then, consistency of evaluations is taught using video Segments 3, 4, and 5.
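
Consistency from one time to another can be checked with very simple arithmetic once a student has rated the same behaviors on two occasions. The Python sketch below is a minimal illustration under stated assumptions: the function name `consistency`, the behavior identifiers, and the choice of exact agreement and mean absolute difference as indices are not taken from the article, and a trainer might prefer other reliability statistics.

```python
def consistency(first, second):
    """Compare two sets of ratings of the same behaviors by one student.

    first and second map a behavior identifier to the rating recorded on
    each occasion. Returns (exact agreement rate, mean absolute difference)
    over the behaviors rated on both occasions.
    """
    shared = sorted(set(first) & set(second))
    if not shared:
        raise ValueError("no behaviors were rated on both occasions")
    diffs = [abs(first[b] - second[b]) for b in shared]
    agreement = sum(d == 0 for d in diffs) / len(shared)
    return agreement, sum(diffs) / len(shared)


# Hypothetical ratings of one video segment viewed at two points in training.
time_1 = {"extra_help": 4, "praise": 5, "uses_names": 2, "condescending": 1}
time_2 = {"extra_help": 4, "praise": 4, "uses_names": 2, "condescending": 1}
print(consistency(time_1, time_2))  # (0.75, 0.25)
```

High agreement on such an index shows only that the student rates consistently. As noted above, consistency is necessary but not sufficient for accuracy, so a consistent rater can still be consistently wrong.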

Training Program

Goal 2 training begins by providing to students written learning objectives (see Table 3) and a handout with definitions of rater errors (e.g., Table 1). Use the results from the questionnaire about biases and attitudes to lead into a lecture about fairness of evaluations. Students should be told that the ultimate goal is to have accurate data about an instructor’s performance and that any factors that reduce accuracy must be addressed. The training program begins with a short lecture about how biases and attitudes can result in reduced accuracy. Then, students have an opportunity to practice providing consistent evaluations and reducing biases in the evaluation process.

Table 3
Recommendations for Goal 2: Providing Fair Evaluations

Learning Objectives:
    To consistently observe/record an instructor’s performance
    To consistently observe/record behaviors from one instructor to another
    To demonstrate more complex processing of instructor performance data
    To learn about the role of biases and attitudes in the evaluation process

Workshop Format:
    Demonstration, Discussion, Practice, Feedback, Discussion

Video Segment 3:
    Instructor Behaviors at Different Points in Classroom Lecture: Learning to Consistently Observe and Record

Video Segment 4:
    Several Instructors’ Behaviors at Different Points in Classroom Lecture: Learning to Consistently Observe and Record

Video Segment 5:
    Several Instructors’ Behaviors Varied by Level of Performance Across Dimensions: Learning to Consistently Observe and Record

Format:
    Lecture on Issues Tangential to the Evaluation Process

Pre- and Post-Test:
    Change in Knowledge about Attitudes; Change in Attitudes

Suggestions:
    Use video segments that reflect actual evaluation settings
    Suggest students take notes during semester on observed instructor behaviors

Biases/attitudes. The lecture/discussion should address how each bias might be manifested. Students’ early attitudes toward the course and instructor (Sauber & Ludlow, 1988), course workload (Greenwald & Gilmore, 1997b), and attitudes toward grades (Vasta & Sarmiento, 1970) have been related to student ratings at the end of a course. A student might rate an instructor high because the instructor enjoys sports, as the student does (similar-to-me effect). Or a student might rate all instructors high because he or she believes that the university would not hire someone who could not teach (leniency). Students should be warned about contrast effects as they undertake the task of observing and rating several instructors.

Two types of contrast effects are (1) knowledge about an instructor’s previous performance (Gaugler & Rudolph, 1992) and (2) comparisons to instructors previously rated (Murphy & Balzer, 1986). Students may be particularly susceptible to the latter. Alert students to the contrast effect and tell them that an opportunity to discuss contrasts will be provided after they have completed Segments 4 and 5.

Goal 2 video segments follow the same approach as for Goal 1: demonstration, discussion, practice, feedback, and discussion. The short lecture that opens Goal 2 training is designed to address unintentional biases, that is, biases that the student is not aware he or she is exhibiting. A more problematic issue is a student’s intentional shifting of evaluation ratings. The use of student evaluation teams may help control for intentional rating inflation or deflation. Student evaluation teams are discussed further in the section entitled, “Administration Decisions.”

Consistency across time. The trainer explains the concept of reliability and why reliability is a necessary condition for the valid use of student evaluations. A videotaped classroom lecture shows an instructor exhibiting the same behavior at one or more different times (Segment 3). The trainer leads a discussion about the similarity of behaviors from the beginning to the end of a classroom lecture and suggests how instructors might exhibit similar behaviors at different times during a semester.  Students practice with another video segment, after which the trainer provides feedback on accuracy of student observations and recording. Students discuss their observations before moving to the next segment.

Consistency across instructors. The demonstration for Segment 4 shows different instructors exhibiting similar behaviors across all dimensions. Instructors might be teaching the same or different course material, but the behaviors they exhibit are similar, allowing for individual differences in style and approaches to teaching. The students practice on another video segment, receive feedback, and discuss accurate placement of several instructors’ behaviors in dimensions.

More complex processing of performance information occurs in Segment 5 where different performance levels for several instructors are given. The demonstration and practice portions of the video should be prepared in a way that elicits attitudinal and contrast effects and most closely resembles what students might expect in the actual evaluation setting. After the practice segment is completed, discuss the impact of student attitudes, biases, and contrast effects on student observational and recording accuracy. Provide additional time to answer student questions about the questionnaire and what to expect in the actual evaluation setting.

Reassure students that the normal evaluation setting should allow for some control of a contrast effect. A delay between observing and rating has been shown to minimize contrast effects (Murphy, Balzer, Lockhart, & Eisenman, 1985). Remind them that the type of training received in Goal 1 (i.e., frame-of-reference) may also reduce contrast effects by providing prototypes for performance effectiveness (Murphy & Balzer, 1986). A more challenging issue may be training students to ignore an instructor’s personality (Clayson & Haley, 1990). The design team should follow research recommendations for reducing the impact of student attitudes toward the instructor. For example, the evaluation system should be performance-based (see Tuckman, 1995, for a review). Behavioral examples (discussed under Goal 1) will focus student attention away from personalities and traits (Weirsma et al., 1995). In addition, students should be encouraged to take notes on instructor behaviors throughout the semester. DeNisi and his colleagues found that diaries increase recall and rating accuracy and control for errors (e.g., halo, leniency) (DeNisi, Robbins, & Cafferty, 1989; DeNisi & Peters, 1992). Students would not need to keep extensive diaries but could use a behavioral checklist to note observed behaviors. In the evaluation setting, remind students to base their ratings on observed variability across instructors (Stamoulis & Hauenstein, 1993). Finally, students are reminded to recall this training material in subsequent Goals and in the actual evaluation setting.
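
A behavioral checklist of the kind suggested above amounts to a running tally of observed behaviors. The Python sketch below shows one way such notes could be summarized at rating time. It is an assumption-laden illustration: the dimension names follow the examples given under Goal 1, but the function name `tally_checklist`, the short behavior labels, and the mapping of behaviors to dimensions are hypothetical.

```python
from collections import Counter

# Hypothetical checklist: short behavior labels mapped to dimensions.
BEHAVIOR_DIMENSION = {
    "provides extra help": "student/instructor interaction",
    "praises correct response": "student/instructor interaction",
    "corrects condescendingly": "student/instructor interaction",
    "returns exams with comments": "evaluation techniques",
    "starts class on time": "classroom management",
}


def tally_checklist(entries):
    """Count checklist entries per behavior and per dimension.

    entries is an iterable of (week, behavior) pairs noted during the
    semester. Also returns the sorted weeks in which notes were taken.
    """
    by_behavior = Counter()
    by_dimension = Counter()
    weeks = set()
    for week, behavior in entries:
        if behavior not in BEHAVIOR_DIMENSION:
            raise KeyError(f"unknown checklist behavior: {behavior}")
        by_behavior[behavior] += 1
        by_dimension[BEHAVIOR_DIMENSION[behavior]] += 1
        weeks.add(week)
    return by_behavior, by_dimension, sorted(weeks)


notes = [
    (1, "provides extra help"),
    (3, "praises correct response"),
    (3, "starts class on time"),
    (9, "provides extra help"),
    (14, "returns exams with comments"),
]
behaviors, dimensions, weeks = tally_checklist(notes)
print(dimensions["student/instructor interaction"])  # 3
print(weeks)  # [1, 3, 9, 14]
```

Listing the weeks in which notes were taken makes it easy to see whether observations are spread across the semester, which is the point of note-taking as a correction for first impression and recency effects.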

Goal 3: Understanding the Broader Context of Instructor Evaluations

The purpose of Goal 3 is to increase student knowledge about the context in which their input is solicited. Several reasons support sharing information about the context. First, instructor behaviors and student evaluations don’t occur in a vacuum. Training should not focus, therefore, only on how to reduce rater errors. Second, trainees are more likely to retain and recall information if they understand the broader context (Gagne & Briggs, 1979). Third, satisfaction with the performance evaluation system improves with additional knowledge (Giles & Mossholder, 1990). Fourth, more information results in better reliability across sources of data (Williams & Levy, 1992). Finally, credibility of the system may be enhanced (Nyirenda, 1994), resulting in students taking the process more seriously. (Also see Cardy & Dobbins, 1994, for a review of formats, cognitive schemata, and purposes of performance appraisal.)

The broader context includes the purpose of performance evaluation, the types of performance data collected, the sources of performance data, and how information is used (refer to Table 4). One might reasonably expect that broad university policies and procedures for use of student evaluations are established in a faculty handbook. College or department by-laws may guide procedures for collecting and using student evaluation data. At a minimum, students should know where they could access these policies and procedures.

Table 4
Recommendations for Goal 3: Understanding the Broader Context of Instructor Evaluations

Learning Objectives:
    To know the purpose of evaluations
    To know the sources of instructor evaluation information
    To understand the full process of evaluation

Format:
    Lecture on Broader Context

Suggestions:
    Tell students how to find university policies on use of instructor evaluations
    Provide for a transition to Goal 4 and to the actual evaluation setting

Training Program

Learning objectives are shared with the students. Students will have become familiar with evaluation criteria (Goal 1), but they may not know how the criteria were developed. Telling students if the evaluation criteria were developed internally and if students were involved may lend credibility to the process.

Students should also know the purpose of evaluations. The difficulty is that knowledge of the purpose of the rating can influence accuracy. When the purpose of evaluation is personnel decision making, ratings tend to be more lenient (Harris, Smith, & Champagne, 1995) and susceptible to bias (i.e., a popularity contest) (Tuckman, 1995). A developmental purpose results in increased feedback and [instructor] satisfaction with feedback (Tharenou, 1995). However, without knowing why performance evaluations are done, students may not take them seriously.

Students should understand that their evaluations are only one source of information on an instructor’s teaching performance. Other sources include peer assessments and self-assessment. Peers are considered more appropriate sources of evaluation for some criteria, such as course planning/preparation and keeping up with teaching-related professional fields. Peers also provide input on delivery of instruction via classroom visits. Though peer reviews have been criticized on a number of points, they are reliable and valid indicators of performance (Latham & Wexley, 1994). Self-assessment includes an examination of strengths and weaknesses resulting in a developmental plan for improving teaching effectiveness. To reduce the possibility of a leniency effect, Fahr, Werbel, and Bedeian (1988) suggest that self-appraisals be supported by objective data. Instructors might prepare a portfolio of classroom-related materials and assessment methods (e.g., syllabi, exams, student projects). Participation through self-assessment may increase the instructor’s acceptance of the process and accountability for improving (see an expanded discussion of accountability in the section on Administration Decisions).

Finally, students may be informed about how student evaluations are used.  Though specific student feedback may be looked at more closely at lower levels in the university (i.e., department), if students understand the potential impact of their evaluations on the short- and long-term development plans for the instructor (and hence achievement of department and university vision), they may take them more seriously. Further, upward feedback [from students] can be used to reinforce team values (Auteri, 1994).

Summary of Goals 1, 2, and 3

Training in the first three Goals should prepare students adequately to conduct evaluations during their first year at the university. Encourage students to make a transition into the actual setting by allowing student trainees to visit actual classes early in the semester to practice their newly-acquired rating skills. The typical evaluation process and setting should be explained (e.g., evaluation surveys are administered by the testing center in the actual classroom) as well as unique practices (e.g., criteria for evaluation vary by department). Finally, students should be told that the retraining they will receive throughout their college career will provide an opportunity to discuss their experiences and address particularly challenging evaluation settings (e.g., pedagogical approaches such as cooperative learning groups or lecture may make it challenging to compare instructors).

Goal 4: Preparing for the Evaluation Process

Goal 4 is created to retrain students periodically in important aspects of the evaluation process. While the training that students received in Goals 1, 2, and 3 has a priming effect that encourages recall at the appropriate times, additional cues need to be provided to ensure that students remember their training. Simple reminders in the evaluation instructions will cue recall (Anderson, 1985). For example, “using the dimensions for evaluating instructor performance, complete the evaluation form, considering the instructor behaviors that you observed during the semester. Recognize the types of biases that can interfere with evaluations and avoid them.”  However, Goal 4 recognizes that the effects of rater training decrease after 6-12 months (Ivancevich, 1979). Martin and Bartol (1986) recommended annual retraining. Besides cueing, retraining is also an important opportunity to share changes that may have occurred in dimensions (Goal 1) or administrative procedures (Goal 3).

Table 5
Recommendations for Goal 4: Preparing for the Evaluation Process

Learning Objectives:
    To recall dimensions, terms, and behaviors (Goal 1)
    To recall the importance of providing fair and accurate evaluations (Goal 2)
    To recall the context within which evaluations are conducted (Goal 3)

Format:
    Discussion; Question and Answer

Suggestions:
    Provide visual cues to enhance recall; provide cues in evaluation instructions
    Allow time for small group discussion and sharing real evaluation experiences

Training Program

First, share the learning objectives with students (see Table 5). Then, begin the training. Since Goal 4 involves retraining, visual cues may be sufficient to induce students to recall material from Goals 1, 2, and 3. Overhead transparencies could summarize the important points. Goal 4 is an appropriate time to use student groups for discussions (e.g., small groups of 4-5 students, with 5-6 groups in a training session). For Goal 4, training sessions lasting 1.5 hours may be sufficient.
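The group sizes suggested above imply a session capacity of roughly 20 to 30 students (5 groups of 4 up to 6 groups of 5). The following is only a rough planning sketch of how a coordinator might estimate the number of retraining sessions from that capacity. The function name `sessions_needed` and the enrollment figure of 1,200 returning students are hypothetical and serve only as an illustration; they do not come from the research cited here.

```python
import math


def sessions_needed(returning_students, group_size, groups_per_session):
    """Number of retraining sessions required to seat every returning student."""
    capacity = group_size * groups_per_session  # students per session
    return math.ceil(returning_students / capacity)


SESSION_HOURS = 1.5          # session length suggested in the text
RETURNING_STUDENTS = 1200    # hypothetical enrollment, for illustration only

# Smallest configuration in the text: 5 groups of 4 students (20 per session).
most_sessions = sessions_needed(RETURNING_STUDENTS, group_size=4, groups_per_session=5)
# Largest configuration in the text: 6 groups of 5 students (30 per session).
fewest_sessions = sessions_needed(RETURNING_STUDENTS, group_size=5, groups_per_session=6)

print(fewest_sessions, "to", most_sessions, "sessions")
print(fewest_sessions * SESSION_HOURS, "to", most_sessions * SESSION_HOURS, "session hours")
```

Under these assumed figures, the estimate is 40 to 60 sessions, or 60 to 90 hours of session time. Actual numbers would depend on local enrollment and room availability.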

Annual retraining need not be as extensive as training in the previous three Goals, but it does require a different kind of information processing. Students may have questions about their training in Goals 1, 2, and 3 and about their first-year experience with conducting evaluations. Be sure to allow sufficient group discussion time to generate student questions and to allow students to share real situational experiences. The role of the trainer is to provide clarifying answers and to help students deal with actual rating situations.

Retraining of returning students can be conducted on the same day that new students receive training (i.e., new student orientation). Expect some resistance from returning students who are now required to attend an additional orientation session. Resistance might be lessened by holding retraining sessions in residence halls or at the unit level (i.e., college or department). Resistance might also come from those who are responsible for coordinating the training. For example, documenting whether or not students attended the required training for all four Goals is time-consuming. A more appealing approach may be to create mechanisms by which students and instructors are motivated to be involved in, and accountable for, the development and implementation phases of this training program.

Administrative Issues

Several administrative issues need to be addressed to accomplish the suggested training.  Mechanisms must be put into place to show commitment to the training. Design and implementation teams must be formed to identify (or develop) criteria, to design the training, and to coordinate and implement the training (refer to Table 6). Some of these issues are elaborated below.

Table 6

Administrative Issues in Design and Implementation of the Training Program

Cost/Benefit Analysis:
    Number of New Students (Goals 1, 2, and 3)
    Number of Returning Students (Goal 4)
    Determine Training Method - Workshop, Lecture, Discussion, Q/A
    Determine How to Motivate Student/Instructor Participation
    Measure Increased Student Accuracy, Improved Decision Making, and Morale

Forming a Design Team:
    Identification (or further development) of Criteria/Behaviors
    Creation of Training Modules for Goals 1, 2, 3, 4
    Creation of Videos
    Creation of Lectures
    Creation of Training Module to Train the Trainers

Training the Trainers:
    Select Internal Trainers: Instructors versus Staff
    Determine When to Train

Coordinating Training:
    Determine When to Offer Goal 1, 2, 3 Training to New Students
    Determine When to Offer Goal 4 Training to Returning Students
    Find Suitable Location Based on Training Method

Administration Decisions

Universities demonstrate their commitment to this process in a number of ways, some of which are resource-related. By including accountability issues in vision and mission statements, the university signals the value of time spent on developing policies and procedures that reinforce accountability at all levels in the university. A broad discussion of accountability is beyond the scope of this article; however, the administration can encourage student accountability for conscientiousness in the evaluation process.

In the context of ensuring accuracy of student evaluations, instructors and students should be involved in the design, development, and implementation phases of the training program. Instructor participation reinforces acceptance of the system. Student participation reinforces accountability for accuracy. If students are held accountable, they may, in part, be motivated to attend training and retraining sessions. The essence of the discussion that follows would be communicated to students when they are told that they must attend training and retraining sessions.

Research indicates that raters are more vigilant, are more accurate, and take the process more seriously when they are held accountable (Hauenstein, 1992; Mero & Motowidlo, 1995). London, Smither, and Adsit (1997) suggested that anonymity can be maintained and accountability can be moderately increased by using a facilitator to encourage discussions between the rater and the person being rated. In the student/instructor evaluation setting, universities are encouraged to adopt a student evaluation team approach for providing feedback to instructors during the semester. Students bring instructor-, course-, or classroom-related issues to the student evaluation team for discussion. The team, in turn, presents the issues to the instructor at pre-set times throughout the semester. No one is identified directly, but the student evaluation team does require individual students to be clear and accurate with their behavioral examples. Intentional distortion of ratings can thus be minimized, and if the instructor uses this feedback to improve the course, students are reinforced for providing course-appropriate evaluations. This form of student evaluation and feedback may or may not replace the more traditional end-of-course evaluation, but if handled correctly, it can contribute to accountability in the evaluation process.

Method/Length of Training Sessions

As suggested earlier, the number of students that need training will determine the cost associated with implementing this training. Workshop methods are the most time-consuming; lectures are the easiest way to share information with large numbers of students. Discussion groups can be incorporated into both workshops and lectures.  An important aspect in training for Goals 1, 2, and 3 is the opportunity for students to receive feedback on their responses. If forms of feedback can be provided in a lecture and discussion format, some improvement in accuracy will occur (Stamoulis & Hauenstein, 1993).

In research, the length of training has varied from as little as one hour (Bernardin & Walter, 1977) to as many as fifteen hours (Martin & Bartol, 1986). Hedge and Kavanaugh (1988) concluded that two hours of training were not enough. Universities will need to determine a break-even point that establishes how much training is required to generate improvements in reliability and accuracy of student ratings and faculty/instructor morale (also see Cost-Benefit Analysis, below).
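One way to make the break-even idea concrete is to compare, for each candidate training length, an estimated benefit against the cost of delivering that much training. The sketch below is illustrative only: the function name `break_even_hours`, the cost per hour, and the benefit estimates are all hypothetical placeholders, and it assumes that benefits (accuracy, reliability, morale) have somehow been expressed in the same units as cost, which is itself a nontrivial measurement problem.

```python
def break_even_hours(cost_per_hour, benefit_by_hours):
    """Return the shortest training length (in hours) whose estimated
    benefit covers its cost, or None if no candidate length breaks even."""
    for hours in sorted(benefit_by_hours):
        cost = cost_per_hour * hours
        if benefit_by_hours[hours] >= cost:
            return hours
    return None


# Hypothetical figures for illustration only; a university would substitute
# its own estimates for each candidate training length.
COST_PER_HOUR = 1000
ESTIMATED_BENEFIT = {1: 200, 2: 1500, 4: 5000, 8: 9000}

print(break_even_hours(COST_PER_HOUR, ESTIMATED_BENEFIT))
```

With these placeholder values, the shortest candidate length that breaks even is four hours. The figures are invented, so the result says nothing about how much training is actually needed.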

Internal/External Trainers

Internal trainers are recommended. Universities have a wealth of expertise at their fingertips. Advantages of internal trainers include knowledge of the university, lower costs, instructor buy-in (through participation/involvement), and improved transfer of training because follow-up questions can be directed to an on-site trainer. Instructors or staff can provide the training. In either case, trainers should be trained in the Goals and in the methods for achieving them. Though each of these phases could be completed at the unit level (i.e., college or department), university-level planning and design are encouraged to ensure consistency.

Design Team

In the simplest sense, a design team would represent academic disciplines and include persons with expertise in designing performance evaluation systems. Student representatives (e.g., from student governance groups) would be welcome additions to this team. Instructors should be involved in identifying and specifying performance dimensions and measurement scale anchors to promote buy-in of the final product.

The design team's responsibilities begin with identifying (or further developing) criteria for evaluation. Since Smith and Kendall’s (1963) introduction of behavioral scaling methods, several behaviorally based approaches have been proposed, including behaviorally-anchored rating scales, behavioral expectation scales, behavioral observation scales, and behavioral summary scales (see summaries by Cardy & Dobbins, 1994, and Peters & DeNisi, 1990). The design team then creates the training to accomplish the Goals. Borman (1977) and Latham, Wexley, and Pursell (1975) provide instructions on creating videos and guidelines for workshop development. The design team is also responsible for determining when and where to hold the training sessions. Finally, the design team provides support to the implementation team to coordinate and schedule the actual training activities.

An overall summary of the process for designing and implementing the training program is provided in a checklist format in Table 7.

Table 7

A Design and Implementation Checklist

 

Design