Pages 251-277.
BEYOND TESTS: ALTERNATIVES IN ASSESSMENT
Why, then, should we even refer to the notion of alternative when assessment already encompasses such a range of possibilities? This was the question to which Brown and Hudson (1998) responded in a TESOL Quarterly article. They argued that to speak of alternative assessments is counterproductive because the term implies something new and different that may be "exempt from the requirements of responsible test construction" (p. 657). They therefore proposed to refer to "alternatives" in assessment instead. Their term is a perfect fit within a model that considers tests as a subset of assessment. Throughout this book, you have been reminded that all tests are assessments but, more important, that not all assessments are tests.
The defining characteristics of the
various alternatives in assessment that have been commonly used across the
profession were aptly summed up by Brown and Hudson (1998, pp. 654-655).
Alternatives in assessment:
1. require students to perform, create, produce, or do something;
2. use real-world contexts or simulations;
3. are nonintrusive in that they extend the day-to-day classroom activities;
4. allow students to be assessed on what they normally do in class every day;
5. use tasks that represent meaningful instructional activities;
6. focus on processes as well as products;
7. tap into higher-level thinking and problem-solving skills;
8. provide information about both the strengths and weaknesses of students;
9. are multiculturally sensitive when properly administered;
10. ensure that people, not machines, do the scoring, using human judgment;
11. encourage open disclosure of standards and rating criteria; and
12. call upon teachers to perform new instructional and assessment roles.
THE DILEMMA OF MAXIMIZING BOTH PRACTICALITY AND WASHBACK
The principal purpose of this chapter is to examine some of the alternatives in
assessment that are markedly different from formal tests. Tests, especially
large-scale standardized tests, tend to be one-shot performances that are
timed, multiple-choice, decontextualized, norm-referenced, and that foster extrinsic
motivation. On the other hand, tasks like portfolios, journals, and
self-assessment are
• open-ended in their time orientation and format,
• contextualized to a curriculum,
• referenced to the criteria (objectives) of that curriculum, and
• likely to build intrinsic motivation.
One way of looking at this contrast
poses a challenge to you as a teacher and test designer. Formal standardized
tests are almost by definition highly practical, reliable instruments. They are
designed to minimize time and money on the part of test designer and
test-taker, and to be painstakingly accurate in their scoring. Alternatives
such as portfolios, or conferencing with students on drafts of written work, or
observations of learners over time all require considerable time and effort on
the part of the teacher and the student. Even more time must be spent if the
teacher hopes to offer a reliable evaluation within students across time, as
well as across students (taking care not to favor one student or group of
students). But the alternative techniques also offer markedly greater washback,
are superior formative measures, and, because of their authenticity, usually
carry greater face validity.
A number of approaches to resolving this dilemma, that is, to maximizing both practicality and washback, are possible, many of which have already been implicitly presented in this book:
• building as much authenticity as possible
into multiple-choice task types and items
• designing classroom tests that have both
objective-scoring sections and open-ended response sections, varying the
performance tasks
• turning multiple-choice test results into
diagnostic feedback on areas of needed improvement
• maximizing the preparation period before a
test to elicit performance relevant to the ultimate criteria of the test
• teaching test-taking strategies
• helping students to see beyond the test:
don't "teach to the test"
• triangulating information on a student
before making a final assessment of competence
The flip side of this challenge is
to understand that the alternatives in assessment are not doomed to be
impractical and unreliable. As we look at alternatives in assessment in this
chapter, we must remember Brown and Hudson's (1998) admonition to scrutinize
the practicality, reliability, and validity of those alternatives at the same
time that we celebrate their face validity, washback potential, and
authenticity. It is easy to fly out of the cage of traditional testing rubrics,
but it is tempting in doing so to flap our wings aimlessly and to accept
virtually any classroom activity as a viable alternative. Assessments proposed
to serve as triangulating measures of competence imply a responsibility to be
rigorous in determining objectives, response
modes, and criteria for evaluation and interpretation.
PERFORMANCE-BASED ASSESSMENT
A word about performance-based assessment is in order. There has been a great deal of press in recent years about performance-based assessment, sometimes merely called performance assessment (Shohamy, 1995; Norris et al., 1998). Is this different from what is being called "alternative assessment"?
The push toward more performance-based
assessment is part of the same general educational reform movement that has
raised strong objections to using standardized test scores as the only measures
of student competencies (see, for example, Valdez Pierce & O'Malley, 1992; Shepard & Bliem,
1993). The argument, as you can guess, was that standardized tests do not
elicit actual performance on the part of test-takers.
Performance-based assessment implies the demonstration of productive, observable skills, such as speaking and writing, in content-valid tasks.
O'Malley and Valdez Pierce (1996)
considered performance-based assessment to be a subset of authentic assessment.
In other words, not all authentic assessment is
performance-based. One could infer that reading, listening, and thinking have
many authentic manifestations, but since they are not directly observable in
and of themselves, they are not performance-based. According to O'Malley and
Valdez Pierce (p. 5), the following are characteristics of performance
assessment:
1. Students make a constructed response.
2. They engage in higher-order thinking, with open-ended tasks.
3. Tasks are meaningful, engaging, and authentic.
4. Tasks call for the integration of language skills.
5. Both process and product are assessed.
6. Depth of a student's mastery is emphasized over breadth.
To sum up, performance assessment is
not completely synonymous with the concept of alternative assessment. Rather,
it is best understood as one of the primary traits of the many available alternatives in assessment.
PORTFOLIOS
One of the most popular alternatives in assessment,
especially within a framework of communicative language teaching, is portfolio
development. According to Genesee and Upshur (1996), a portfolio is "a
purposeful collection of students' work that demonstrates ... their efforts,
progress, and achievements in given areas" (p. 99). Portfolios include
materials such as
• essays and compositions in draft and final forms;
• reports, project outlines;
• poetry and creative prose;
• artwork, photos, newspaper or magazine clippings;
• audio and/or video recordings of presentations, demonstrations, etc.;
• journals, diaries, and other personal reflections;
• tests, test scores, and written homework exercises;
• notes on lectures; and
• self- and peer-assessments, comments, evaluations, and checklists.
Gottlieb (1995) suggested a
developmental scheme for considering the nature and purpose of portfolios,
using the acronym CRADLE to designate six possible attributes of a portfolio:
§ Collecting
§ Reflecting
§ Assessing
§ Documenting
§ Linking
§ Evaluating
As
Collections, portfolios are an expression of students' lives and identities.
The appropriate freedom of students to choose what to include should be
respected, but at the same time the purposes of the portfolio need to be
clearly specified.
We
need to recognize that a portfolio is an important Document in demonstrating
student achievement, and not just an insignificant adjunct to tests and grades
and other more traditional evaluation. A portfolio can serve as an important
Link between student and teacher, parent, community, and peers; it is a
tangible product, created with pride, that identifies a student's uniqueness.
JOURNALS
A
journal is a log (or "account") of one's thoughts, feelings,
reactions, assessments, ideas, or progress toward goals, usually written with
little attention to structure, form, or correctness.
Sometimes
journals are rambling sets of verbiage that represent a stream of consciousness
with no particular point, purpose, or audience. Fortunately, models of journal
use in educational practice have sought to tighten up this style of journal in
order to give it some focus (Staton et al., 1987). The result is the
emergence of a number of overlapping categories or purposes in journal writing,
such as the following:
• language-learning logs
• grammar journals
• responses to readings
• strategies-based learning logs
• self-assessment reflections
• diaries of attitudes, feelings, and other affective factors
• acculturation logs
Most
classroom-oriented journals are what have now come to be known as dialogue
journals. They imply an interaction between a reader (the teacher) and the
student through dialogues or responses. For the best results, those responses
should be dispersed across a course at regular intervals, perhaps weekly or
biweekly. One of the principal objectives in a student's dialogue journal is
to carry on a conversation with the teacher. Through dialogue journals,
teachers can become better acquainted with their students, in terms of both
their learning progress and their affective states, and thus become better
equipped to meet students' individual needs.
CONFERENCES AND INTERVIEWS
Conferences are not
limited to drafts of written work. Including the portfolios and journals discussed
above, the list of possible functions and subject matter for conferencing is
substantial:
• commenting on drafts of essays and reports
• reviewing portfolios
• responding to journals
• advising on a student's plan for an oral presentation
• assessing a proposal for a project
• giving feedback on the results of performance on a test
• clarifying understanding of a reading
• exploring strategies-based options for enhancement or compensation
• focusing on aspects of oral production
• checking a student's self-assessment of a performance
• setting personal goals for the near future
• assessing general progress in a course
Conferences must assume
that the teacher plays the role of a facilitator and guide, not of an
administrator of a formal assessment. In this intrinsically motivating
atmosphere, students need to understand that the teacher is an ally who is
encouraging self-reflection and improvement. So that the student will be as
candid as possible in self-assessing, the teacher should not consider a
conference as something to be scored or graded. Conferences are by nature
formative, not summative, and their primary purpose is to offer positive
washback.
A specialized kind of conference is an interview. This term is intended to denote a context in which a teacher interviews a student for a designated assessment purpose. (We are not talking about a student conducting an interview of others in order to gather information on a topic.) Interviews may have one or more of several possible goals,
in which the teacher
• assesses the student's oral production,
• ascertains a student's needs before designing a course or curriculum,
• seeks to discover a student's learning styles and preferences,
• asks a student to assess his or her own performance, and
• requests an evaluation of a course.
OBSERVATIONS
How
do all these chunks of information become stored in a teacher's brain cells?
Usually not through rating sheets and checklists and carefully completed
observation charts. Still, teachers' intuitions about students' performance are
not infallible, and certainly both the reliability and face validity of their
feedback to students can be increased with the help of empirical means of
observing their language performance. The value of systematic observation of
students has been extolled for decades (Flanders, 1970; Moskowitz, 1971; Spada
& Frolich, 1995), and its utilization greatly enhances a teacher's
intuitive impressions by offering tangible corroboration of conclusions.
Occasionally, intuitive information is disconfirmed by observation data.
We
are talking about observation as a systematic, planned procedure for real-time,
almost surreptitious recording of student verbal and nonverbal behavior. One of
the objectives of such observation is to assess students without their
awareness (and possible consequent anxiety) of the observation so that the
naturalness of their linguistic performance is maximized.
Potential observation foci:
• sentence-level oral production skills (see microskills, Chapter 7): pronunciation of target sounds, intonation, etc.; grammatical features (verb tenses, question formation, etc.)
• discourse-level skills (conversation rules, turn-taking, and other macroskills)
• interaction with classmates (cooperation, frequency of oral production)
• reactions to particular students, optimal productive pairs and groups, which "zones" of the classroom are more vocal, etc.
• frequency of student-initiated responses (whole class, group work)
• quality of teacher-elicited responses
• latencies, pauses, silent periods (number of seconds, minutes, etc.)
• length of utterances
• evidence of listening comprehension (questions, clarifications, attention-giving verbal and nonverbal behavior)
• affective states (apparent self-esteem, extroversion, anxiety, motivation, etc.)
• evidence of attention-span issues, learning style preferences, etc.
• students' verbal or nonverbal response to materials, types of activities, teaching styles
• use of strategic options in comprehension or production (use of communication strategies, avoidance, etc.)
• culturally specific linguistic and nonverbal factors (kinesics; proxemics; use of humor, slang, metaphor, etc.)
SELF- AND PEER-ASSESSMENTS
Self-assessment derives its theoretical justification from a number of well-established principles of second language acquisition. The principle of autonomy stands out as one of the primary foundation stones of successful learning. The ability to set one's own goals both within and beyond the structure of a classroom curriculum, to pursue them without the presence of an external prod, and to independently monitor that pursuit are all keys to success. Developing the intrinsic motivation that comes from a self-propelled desire to excel is at the top of the list of requirements for the successful acquisition of any set of skills.
Peer-assessment appeals
to similar principles, the most obvious of which is cooperative learning. Many
people go through a whole regimen of education from kindergarten up through a
graduate degree and never come to appreciate the value of collaboration in
learning: the benefit of a community of learners capable of teaching each other
something. Peer-assessment is simply one arm of a plethora of tasks and
procedures within the domain of learner-centered and collaborative education.
Researchers (such as
Brown & Hudson, 1998) agree that the above theoretical underpinnings of
self- and peer-assessment offer certain benefits: direct involvement of
students in their own destiny, the encouragement of autonomy, and increased
motivation because of their self-involvement. Of course, some noteworthy drawbacks must also be taken into account. Subjectivity is a primary obstacle to overcome. Students may be either too harsh on themselves or too self-flattering, or they may not have the necessary tools to make an accurate assessment. Also, especially in the case of direct assessments of performance (see below), they may not be able to discern their own errors. In contrast, Bailey (1998) conducted a study in which learners showed moderately high correlations (between .58 and .64) between self-rated oral production ability and scores on the OPI, which suggests that in the assessment of general
competence, learners' self-assessments may be more accurate than one might
suppose.
Types of Self- and Peer-Assessment
Five categories of self- and peer-assessment can be distinguished:
(1) direct assessment of performance,
(2) indirect assessment of performance,
(3) metacognitive assessment,
(4) assessment of socioaffective factors, and
(5) student self-generated tests.
1. Assessment of [a specific] performance.
In this category, a student typically monitors
him- or herself-in either oral or written production-and renders some kind of
evaluation of performance. The evaluation takes place immediately or very soon
after the performance. Thus, having made an oral presentation, the student (or
a peer) fills out a checklist that rates performance on a defined scale. Or
perhaps the student views a video-recorded lecture and completes a
self-corrected comprehension quiz. A
journal may serve as a tool for such self-assessment. Gardner (1996)
recommended that students in non-English-speaking countries access bilingual
news, films, and television programs and then self-assess their comprehension
ability. He also noted that video versions of movies with subtitles can be
viewed first without the subtitles, then with them, as another form of self-
and/or peer-assessment.
2. Indirect assessment of [general] competence.
Indirect self- or peer-assessment targets
larger slices of time with a view to rendering an evaluation of general
ability, as opposed to one specific, relatively time-constrained performance.
The distinction between direct and indirect assessments is the classic
competence-performance distinction. Self- and peer-assessments of performance
are limited in time and focus to a relatively short performance. Assessments of
competence may encompass a lesson over several days, a module, or even a whole
term of course work, and the objective is to ignore minor, non-repeating
performance flaws and thus to evaluate general ability.
3.Metacognitive assessment [for setting goals}.
Some kinds of
evaluation are more strategic in nature, with the purpose not just of viewing
past performance or competence but of setting goals and maintaining an eye on
the process of their pursuit. Personal goal-setting has the advantage of
fostering intrinsic motivation and of providing learners with that
extra-special impetus from having set and accomplished their own goals.
Strategic planning and self-monitoring can take the form of journal entries,
choices from a list of possibilities, questionnaires, or cooperative (oral)
pair or group planning.
Guidelines for Self- and
Peer-Assessment
Self-
and peer-assessment are among the best possible formative types of assessment
and possibly the most rewarding, but they must be carefully designed and
administered for them to reach their potential. Four guidelines will help
teachers bring this intrinsically motivating task into the classroom
successfully.
1. Tell students the purpose of the assessment. Self-assessment is a process that many students, especially those in traditional educational systems, will initially find quite uncomfortable. They need to be sold on the concept. It is therefore essential that you carefully analyze the needs that will be met in offering both self- and peer-assessment opportunities, and then convey this information to students.
2. Define the task(s) clearly. Make sure the students know exactly what they are supposed to do. If you are offering a rating sheet or questionnaire, the task is not complex, but an open-ended journal entry could leave students perplexed about what to write. Guidelines and models will be of great help in clarifying the procedures.
3. Encourage
impartial evaluation of performance or ability. One of the greatest drawbacks
to self-assessment is the threat of subjectivity. By showing students the
advantage of honest, objective opinions, you can maximize the beneficial
washback of self-assessments. Peer-assessments, too, are vulnerable to
unreliability as students apply varying standards to their peers. Clear
assessment criteria can go a long way toward encouraging objectivity.
4. Ensure
beneficial washback through follow-up tasks. It is not enough to simply toss a
self-checklist at students and then walk away. Systematic follow-up can be
accomplished through further self-analysis, journal reflection, written
feedback from the teacher, conferencing with the teacher, purposeful
goal-setting by the student, or any combination of the above.
A Taxonomy of Self- and Peer-Assessment Tasks
An evaluation
of self- and peer-assessment according to our classic principles of assessment yields a pattern that is quite consistent with
other alternatives in assessment that have been analyzed in this chapter.
Practicality can achieve a moderate level with such procedures as checklists
and questionnaires, while reliability risks remaining at a low level, given the
variation within and across learners. Once students accept the notion that they
can legitimately assess themselves, then face validity can be raised from what
might otherwise be a low level. Adherence to course objectives will maintain a
high degree of content validity. Authenticity and washback both have very high
potential because students are centering on their own linguistic needs and are
receiving useful feedback.
Perhaps it is now clear why "alternatives in
assessment" is a more appropriate phrase than "alternative
assessment." To set traditional testing and alternatives against each
other is counterproductive. All kinds of assessment, from formal conventional
procedures to informal and possibly unconventional tasks, are needed to
assemble information on students. The alternatives covered in this chapter may
not be markedly different from some of the tasks described in the preceding
four chapters (assessing listening, speaking, reading, and writing). When we
put all of this together, we have at our disposal an amazing array of possible
assessment tasks for second language learners of English. The alternatives
presented in this chapter simply expand that continuum of possibilities.
REFERENCES:
Brown, H. Douglas. (2004). Language Assessment: Principles and Classroom Practices. New York: Longman. (Source chapter: "Beyond Tests: Alternatives in Assessment.")