Tuesday, 28 April 2020

“BEYOND TESTS: ALTERNATIVES IN ASSESSMENT” by Douglas Brown

Assignment for meetings 9 and 10: "SUMMARY"
Pages 251-277.


BEYOND TESTS: ALTERNATIVES IN ASSESSMENT.

Why, then, should we even refer to the notion of alternative when assessment already encompasses such a range of possibilities? This was the question to which Brown and Hudson (1998) responded in a TESOL Quarterly article.

They noted that to speak of alternative assessments is counterproductive because the term implies something new and different that may be "exempt from the requirements of responsible test construction" (p. 657). So they proposed to refer to "alternatives" in assessment instead. Their term is a perfect fit within a model that considers tests as a subset of assessment. Throughout this book, you have been reminded that all tests are assessments but, more important, that not all assessments are tests.
The defining characteristics of the various alternatives in assessment that have been commonly used across the profession were aptly summed up by Brown and Hudson (1998, pp. 654-655). Alternatives in assessment

1.      require students to perform, create, produce, or do something;
2.      use real-world contexts or simulations;
3.      are nonintrusive in that they extend the day-to-day classroom activities;
4.      allow students to be assessed on what they normally do in class every day;
5.      use tasks that represent meaningful instructional activities;
6.      focus on processes as well as products;
7.      tap into higher-level thinking and problem-solving skills;
8.      provide information about both the strengths and weaknesses of students;
9.      are multiculturally sensitive when properly administered;
10.     ensure that people, not machines, do the scoring, using human judgment;
11.     encourage open disclosure of standards and rating criteria; and
12.     call upon teachers to perform new instructional and assessment roles.


THE DILEMMA OF MAXIMIZING BOTH PRACTICALITY AND WASHBACK
The principal purpose of this chapter is to examine some of the alternatives in assessment that are markedly different from formal tests. Tests, especially large-scale standardized tests, tend to be one-shot performances that are timed, multiple-choice, decontextualized, norm-referenced, and that foster extrinsic motivation. On the other hand, tasks like portfolios, journals, and self-assessment are
• open-ended in their time orientation and format,
• contextualized to a curriculum,
• referenced to the criteria (objectives) of that curriculum, and
• likely to build intrinsic motivation.

One way of looking at this contrast poses a challenge to you as a teacher and test designer. Formal standardized tests are almost by definition highly practical, reliable instruments. They are designed to minimize time and money on the part of test designer and test-taker, and to be painstakingly accurate in their scoring. Alternatives such as portfolios, or conferencing with students on drafts of written work, or observations of learners over time all require considerable time and effort on the part of the teacher and the student. Even more time must be spent if the teacher hopes to offer a reliable evaluation within students across time, as well as across students (taking care not to favor one student or group of students). But the alternative techniques also offer markedly greater washback, are superior formative measures, and, because of their authenticity, usually carry greater face validity.
The challenge, then, is to lend formal tests more of the washback and authenticity that the alternatives offer. A number of approaches to accomplishing this end are possible, many of which have already been implicitly presented in this book:
• building as much authenticity as possible into multiple-choice task types and items
• designing classroom tests that have both objective-scoring sections and open-ended response sections, varying the performance tasks
• turning multiple-choice test results into diagnostic feedback on areas of needed improvement
• maximizing the preparation period before a test to elicit performance relevant to the ultimate criteria of the test
• teaching test-taking strategies
• helping students to see beyond the test: don't "teach to the test"
• triangulating information on a student before making a final assessment of competence
The flip side of this challenge is to understand that the alternatives in assessment are not doomed to be impractical and unreliable. As we look at alternatives in assessment in this chapter, we must remember Brown and Hudson's (1998) admonition to scrutinize the practicality, reliability, and validity of those alternatives at the same time that we celebrate their face validity, washback potential, and authenticity. It is easy to fly out of the cage of traditional testing rubrics, but it is tempting in doing so to flap our wings aimlessly and to accept virtually any classroom activity as a viable alternative. Assessments proposed to serve as triangulating measures of competence imply a responsibility to be rigorous in determining objectives, response modes, and criteria for evaluation and interpretation.

PERFORMANCE-BASED ASSESSMENT.

A word about performance-based assessment is in order. There has been a great deal of press in recent years about performance-based assessment, sometimes merely called performance assessment (Shohamy, 1995; Norris et al., 1998). Is this different from what is being called "alternative assessment"?
The push toward more performance-based assessment is part of the same general educational reform movement that has raised strong objections to using standardized test scores as the only measures of student competencies (see, for example, Valdez Pierce & O'Malley, 1992; Shepard & Bliem, 1993). The argument, as you can guess, was that standardized tests do not elicit actual performance on the part of test-takers.
Performance-based assessment implies productive, observable skills, such as speaking and writing, of content-valid tasks.
O'Malley and Valdez Pierce (1996) considered performance-based assessment to be a subset of authentic assessment. In other words, not all authentic assessment is performance-based. One could infer that reading, listening, and thinking have many authentic manifestations, but since they are not directly observable in and of themselves, they are not performance-based. According to O'Malley and Valdez Pierce (p. 5), the following are characteristics of performance assessment:

1.      Students make a constructed response.
2.      They engage in higher-order thinking, with open-ended tasks.
3.      Tasks are meaningful, engaging, and authentic.
4.      Tasks call for the integration of language skills.
5.      Both process and product are assessed.
6.      Depth of a student's mastery is emphasized over breadth.

To sum up, performance assessment is not completely synonymous with the concept of alternative assessment. Rather, it is best understood as one of the primary traits of the many available alternatives in assessment.

PORTFOLIOS
One of the most popular alternatives in assessment, especially within a framework of communicative language teaching, is portfolio development. According to Genesee and Upshur (1996), a portfolio is "a purposeful collection of students' work that demonstrates ... their efforts, progress, and achievements in given areas" (p. 99). Portfolios include materials such as
·         essays and compositions in draft and final forms;
·         reports, project outlines;
·         poetry and creative prose;
·         artwork, photos, newspaper or magazine clippings;
·         audio and/or video recordings of presentations, demonstrations, etc.;
·         journals, diaries, and other personal reflections;
·         tests, test scores, and written homework exercises;
·         notes on lectures; and
·         self- and peer-assessments: comments, evaluations, and checklists.
Gottlieb (1995) suggested a developmental scheme for considering the nature and purpose of portfolios, using the acronym CRADLE to designate six possible attributes of a portfolio:       
§  Collecting
§  Reflecting
§  Assessing
§  Documenting
§  Linking
§  Evaluating
As Collections, portfolios are an expression of students' lives and identities. The appropriate freedom of students to choose what to include should be respected, but at the same time the purposes of the portfolio need to be clearly specified.

We need to recognize that a portfolio is an important Document in demonstrating student achievement, and not just an insignificant adjunct to tests and grades and other more traditional evaluation. A portfolio can serve as an important Link between student and teacher, parent, community, and peers; it is a tangible product, created with pride, that identifies a student's uniqueness.

JOURNALS
A journal is a log (or "account") of one's thoughts, feelings, reactions, assessments, ideas, or progress toward goals, usually written with little attention to structure, form, or correctness.
Sometimes journals are rambling sets of verbiage that represent a stream of consciousness with no particular point, purpose, or audience. Fortunately, models of journal use in educational practice have sought to tighten up this style of journal in order to give them some focus (Staton et al., 1987). The result is the emergence of a number of overlapping categories or purposes in journal writing, such as the following:

• language-learning logs
• grammar journals
• responses to readings
• strategies-based learning logs
• self-assessment reflections
• diaries of attitudes, feelings, and other affective factors
• acculturation logs
Most classroom-oriented journals are what have now come to be known as dialogue journals. They imply an interaction between a reader (the teacher) and the student through dialogues or responses. For the best results, those responses should be dispersed across a course at regular intervals, perhaps weekly or biweekly. One of the principal objectives in a student's dialogue journal is to carry on a conversation with the teacher. Through dialogue journals, teachers can become better acquainted with their students, in terms of both their learning progress and their affective states, and thus become better equipped to meet students' individual needs.

CONFERENCES AND INTERVIEWS
Conferences are not limited to drafts of written work. Including portfolios and journals discussed above, the list of possible functions and subject matter for conferencing is substantial:
• commenting on drafts of essays and reports
• reviewing portfolios
• responding to journals
• advising on a student's plan for an oral presentation
• assessing a proposal for a project
• giving feedback on the results of performance on a test
• clarifying understanding of a reading
• exploring strategies-based options for enhancement or compensation
• focusing on aspects of oral production
• checking a student's self-assessment of a performance
• setting personal goals for the near future
• assessing general progress in a course
Conferences must assume that the teacher plays the role of a facilitator and guide, not of an administrator of a formal assessment. In this intrinsically motivating atmosphere, students need to understand that the teacher is an ally who is encouraging self-reflection and improvement. So that the student will be as candid as possible in self-assessing, the teacher should not consider a conference as something to be scored or graded. Conferences are by nature formative, not summative, and their primary purpose is to offer positive washback.
One specialized kind of conference is an interview. This term is intended to denote a context in which a teacher interviews a student for a designated assessment purpose. (We are not talking about a student conducting an interview of others in order to gather information on a topic.) Interviews may have one or more of several possible goals, in which the teacher
• assesses the student's oral production,
• ascertains a student’s needs before designing a course or curriculum,
• seeks to discover a student's learning styles and preferences,
• asks a student to assess his or her own performance, and
• requests an evaluation of a course.

OBSERVATIONS
How do all these chunks of information become stored in a teacher's brain cells? Usually not through rating sheets and checklists and carefully completed observation charts. Still, teachers' intuitions about students' performance are not infallible, and certainly both the reliability and face validity of their feedback to students can be increased with the help of empirical means of observing their language performance. The value of systematic observation of students has been extolled for decades (Flanders, 1970; Moskowitz, 1971; Spada & Frolich, 1995), and its utilization greatly enhances a teacher's intuitive impressions by offering tangible corroboration of conclusions. Occasionally, intuitive information is disconfirmed by observation data.
We are talking about observation as a systematic, planned procedure for real-time, almost surreptitious recording of student verbal and nonverbal behavior. One of the objectives of such observation is to assess students without their awareness (and possible consequent anxiety) of the observation so that the naturalness of their linguistic performance is maximized.
Potential observation foci:
·         sentence-level oral production skills (see micro skills, Chapter 7): pronunciation of target sounds, intonation, etc.; grammatical features (verb tenses, question formation, etc.)
·         discourse-level skills (conversation rules, turn-taking, and other macro skills)
·         interaction with classmates (cooperation, frequency of oral production)
·         reactions to particular students, optimal productive pairs and groups, which "zones" of the classroom are more vocal, etc.
·         frequency of student-initiated responses (whole class, group work)
·         quality of teacher-elicited responses
·         latencies, pauses, silent periods (number of seconds, minutes, etc.)
·         length of utterances
·         evidence of listening comprehension (questions, clarifications, attention-giving verbal and nonverbal behavior)
·         affective states (apparent self-esteem, extroversion, anxiety, motivation, etc.)
·         evidence of attention-span issues, learning style preferences, etc.
·         students' verbal or nonverbal response to materials, types of activities, teaching styles
·         use of strategic options in comprehension or production (use of communication strategies, avoidance, etc.)
·         culturally specific linguistic and nonverbal factors (kinesics; proxemics; use of humor, slang, metaphor, etc.)

SELF- AND PEER-ASSESSMENTS
Self-assessment derives its theoretical justification from a number of well-established principles of second language acquisition. The principle of autonomy stands out as one of the primary foundation stones of successful learning. The ability to set one's own goals both within and beyond the structure of a classroom curriculum, to pursue them without the presence of an external prod, and to independently monitor that pursuit are all keys to success. Developing intrinsic motivation that comes from a self-propelled desire to excel is at the top of the list of successful acquisition of any set of skills.
Peer-assessment appeals to similar principles, the most obvious of which is cooperative learning. Many people go through a whole regimen of education from kindergarten up through a graduate degree and never come to appreciate the value of collaboration in learning, that is, the benefit of a community of learners capable of teaching each other something. Peer-assessment is simply one arm of a plethora of tasks and procedures within the domain of learner-centered and collaborative education.
Researchers (such as Brown & Hudson, 1998) agree that the above theoretical underpinnings of self- and peer-assessment offer certain benefits: direct involvement of students in their own destiny, the encouragement of autonomy, and increased motivation because of their self-involvement. Of course, some noteworthy drawbacks must also be taken into account. Subjectivity is a primary obstacle to overcome. Students may be either too harsh on themselves or too self-flattering, or they may not have the necessary tools to make an accurate assessment. Also, especially in the case of direct assessments of performance (see below), they may not be able to discern their own errors. In contrast, Bailey (1998) conducted a study in which learners showed moderately high correlations (between .58 and .64) between self-rated oral production ability and scores on the OPI, which suggests that in the assessment of general competence, learners' self-assessments may be more accurate than one might suppose.
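To make the idea of a correlation between self-ratings and test scores concrete, here is a minimal sketch of how a Pearson correlation coefficient could be computed. The two score lists are invented purely for illustration and are not Bailey's data; the function name `pearson_r` and the 1-5 rating scale are likewise assumptions, not something taken from the chapter.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each list
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: eight learners' self-ratings of oral ability (1-5)
# and the ratings an interviewer gave the same learners (1-5).
self_ratings = [3, 4, 2, 5, 3, 4, 2, 5]
interview_scores = [2, 4, 3, 4, 3, 5, 2, 3]

r = pearson_r(self_ratings, interview_scores)
print(f"r = {r:.2f}")
```

A value near 1 would mean learners rank themselves almost exactly as the interviewer does, and a value near 0 would mean their self-ratings say little about the interviewer's scores. A teacher could run the same calculation on a class's self-assessment checklists and a later test to see how far the self-ratings can be trusted.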

Types of Self- and Peer-Assessment
There are five categories of self- and peer-assessment:
(1) direct assessment of performance,
(2) indirect assessment of competence,
(3) metacognitive assessment,
(4) assessment of socioaffective factors, and
(5) student self-generated tests.
1. Assessment of [a specific] performance.
In this category, a student typically monitors him- or herself (in either oral or written production) and renders some kind of evaluation of performance. The evaluation takes place immediately or very soon after the performance. Thus, having made an oral presentation, the student (or a peer) fills out a checklist that rates performance on a defined scale. Or perhaps the student views a video-recorded lecture and completes a self-corrected comprehension quiz. A journal may serve as a tool for such self-assessment. Gardner (1996) recommended that students in non-English-speaking countries access bilingual news, films, and television programs and then self-assess their comprehension ability. He also noted that video versions of movies with subtitles can be viewed first without the subtitles, then with them, as another form of self- and/or peer-assessment.
2. Indirect assessment of [general] competence.
Indirect self- or peer-assessment targets larger slices of time with a view to rendering an evaluation of general ability, as opposed to one specific, relatively time-constrained performance. The distinction between direct and indirect assessments is the classic competence-performance distinction. Self- and peer-assessments of performance are limited in time and focus to a relatively short performance. Assessments of competence may encompass a lesson over several days, a module, or even a whole term of course work, and the objective is to ignore minor, non-repeating performance flaws and thus to evaluate general ability.
3. Metacognitive assessment [for setting goals].
Some kinds of evaluation are more strategic in nature, with the purpose not just of viewing past performance or competence but of setting goals and maintaining an eye on the process of their pursuit. Personal goal-setting has the advantage of fostering intrinsic motivation and of providing learners with that extra-special impetus from having set and accomplished one's own goals. Strategic planning and self-monitoring can take the form of journal entries, choices from a list of possibilities, questionnaires, or cooperative (oral) pair or group planning.

Guidelines for Self- and Peer-Assessment
Self- and peer-assessment are among the best possible formative types of assessment and possibly the most rewarding, but they must be carefully designed and administered for them to reach their potential. Four guidelines will help teachers bring this intrinsically motivating task into the classroom successfully.

1. Tell students the purpose of the assessment. Self-assessment is a process that many students, especially those in traditional educational systems, will initially find quite uncomfortable. They need to be sold on the concept. It is therefore essential that you carefully analyze the needs that will be met in offering both self- and peer-assessment opportunities, and then convey this information to students.

2. Define the task(s) clearly. Make sure the students know exactly what they are supposed to do. If you are offering a rating sheet or questionnaire, the task is not complex, but an open-ended journal entry could leave students perplexed about what to write. Guidelines and models will be of great help in clarifying the procedures.

3. Encourage impartial evaluation of performance or ability. One of the greatest drawbacks to self-assessment is the threat of subjectivity. By showing students the advantage of honest, objective opinions, you can maximize the beneficial washback of self-assessments. Peer-assessments, too, are vulnerable to unreliability as students apply varying standards to their peers. Clear assessment criteria can go a long way toward encouraging objectivity.

4. Ensure beneficial washback through follow-up tasks. It is not enough to simply toss a self-checklist at students and then walk away. Systematic follow-up can be accomplished through further self-analysis, journal reflection, written feedback from the teacher, conferencing with the teacher, purposeful goal-setting by the student, or any combination of the above.


A Taxonomy of Self- and Peer-Assessment Tasks
An evaluation of self- and peer-assessment according to our classic principles of assessment yields a pattern that is quite consistent with other alternatives in assessment that have been analyzed in this chapter. Practicality can achieve a moderate level with such procedures as checklists and questionnaires, while reliability risks remaining at a low level, given the variation within and across learners. Once students accept the notion that they can legitimately assess themselves, then face validity can be raised from what might otherwise be a low level. Adherence to course objectives will maintain a high degree of content validity. Authenticity and washback both have very high potential because students are centering on their own linguistic needs and are receiving useful feedback.
Perhaps it is now clear why "alternatives in assessment" is a more appropriate phrase than "alternative assessment." To set traditional testing and alternatives against each other is counterproductive. All kinds of assessment, from formal conventional procedures to informal and possibly unconventional tasks, are needed to assemble information on students. The alternatives covered in this chapter may not be markedly different from some of the tasks described in the preceding four chapters (assessing listening, speaking, reading, and writing). When we put all of this together, we have at our disposal an amazing array of possible assessment tasks for second language learners of English. The alternatives presented in this chapter simply expand that continuum of possibilities.



REFERENCES:
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. New York: Longman. (Chapter: "Beyond Tests: Alternatives in Assessment.")





Wednesday, 08 April 2020

STANDARDS-BASED ASSESSMENT BY DOUGLAS BROWN.

Assignment 7 "SUMMARY"
Pages 104-113

STANDARDS-BASED ASSESSMENT.

In the previous chapter, you saw that a standardized test is an assessment instrument for which there are uniform procedures for administration, design, scoring, and reporting. It is also a procedure that, through repeated administrations and ongoing research, demonstrates criterion and construct validity. But a third, and perhaps the most important, element of standardized testing is the presupposition of an accepted set of standards on which to base the procedure.

Toward the end of the twentieth century, such claims began to be challenged on all fronts (see Medina & Neill, 1990; Kohn, 2000), and at the vanguard of those challenges were the teachers of those millions of children. Teachers saw not only possible inequity in such tests but a disparity between the content and tasks of the tests and what they were teaching in their classes.


ELD STANDARDS
The process of designing and conducting appropriate periodic reviews of ELD standards involves dozens of curriculum and assessment specialists, teachers, and researchers (Field, 2000; Kuhlman, 2001). In creating such “benchmarks for accountability” (O’Malley & Valdez Pierce, 1996), there is a tremendous responsibility to carry out a comprehensive study of a number of domains:
• literally thousands of categories of language, ranging from phonology at one end of a continuum to discourse, pragmatics, functional, and sociolinguistic elements at the other end;
• specification of what ELD students’ needs are, at thirteen different grade levels, for succeeding in their academic and social development;
• a consideration of what is a realistic number and scope of standards to be included within a given curriculum;
• a separate set of standards (qualifications, expertise, training) for teachers to teach ELD students successfully in their classrooms; and
• a thorough analysis of the means available to assess student attainment of those standards.

ELD ASSESSMENT.
The development of standards obviously implies the responsibility for correctly assessing their attainment. As standards-based education became more accepted in the 1990s, many school systems across the United States found that the standardized tests of past decades were not in line with newly developed standards. Thus began the interactive process not only of developing standards but also of creating standards-based assessment. The comprehensive process of developing such assessment in California still continues as curriculum and assessment specialists design, revise, and validate numerous tests (Morgan & Kuhlman, 2001; Stack et al., 2002; see also the website http://www.cde.ca.gov/statetests/celdt/celdt.html).

CASAS AND SCANS.
A similar set of standards compiled by the U.S. Department of Labor, now known as the Secretary’s Commission on Achieving Necessary Skills (SCANS), outlines competencies necessary for language in the workplace. The competencies cover language functions in terms of
• resources (allocating time, materials, staff, etc.),
• interpersonal skills (teamwork, customer service, etc.),
• information (processing, evaluating data, organizing files, etc.),
• systems (e.g., understanding social and organizational systems), and
• technology use and application.
These five competencies are acquired and maintained through training in the basic skills (reading, writing, listening, speaking); thinking skills such as reasoning and creative problem solving; and personal qualities, such as self-esteem and sociability.

TEACHER STANDARDS.
In addition to the movement to create standards for learning, an equally strong movement has emerged to design standards for teaching. Cloud (2001, p. 3) noted that a student’s “performance (on an assessment) depends on the quality of the instructional program provided, which depends on the quality of professional development.” Kuhlman (2001) emphasized the importance of teacher standards in three domains:
1. Linguistics and language development.
2. Culture and the interrelationship between language and culture.
3. Planning and managing instruction.

THE CONSEQUENCES OF STANDARDS-BASED AND STANDARDIZED TESTING.
The task of each test-taking “spy” was not to pass the TOEFL, but to memorize a subset of items, including the stimulus and all of the multiple-choice options, and immediately upon leaving the exam to telephone those items to the central organizers. As the memorized subsections were called in, a complete form of the TOEFL was quickly reconstructed. The organizers had employed expert consultants to generate the correct response for each item, thereby re-creating the test items and their correct answers! For an outrageous price of many thousands of dollars, prearranged buyers of the results were given copies of the test items and correct responses, with a few hours to spare before entering a test administration in the Western Hemisphere.

Test Bias.
It is no secret that standardized tests involve a number of types of test bias. That bias comes in many forms: language, culture, race, gender, and learning styles (Medina & Neill, 1990). The National Center for Fair and Open Testing, in its bimonthly newsletter, regularly reports claims of test bias from parents, students, and legal consultants. For example, reading selections in standardized tests may use a passage from a literary piece that reflects a middle-class, white, Anglo-Saxon norm. Lectures used for listening stimuli can easily promote a biased sociopolitical view.

Test-driven learning and teaching.
Yet another consequence of standardized testing is the danger of test-driven learning and teaching. When students and other test-takers know that one single measure of performance will determine their lives, they are less likely to take a positive attitude toward learning. The motives in such a context are almost exclusively extrinsic, with little likelihood of stirring intrinsic interests. Test-driven learning is a worldwide issue. In Japan, Korea, and Taiwan, to name just a few countries, students approaching their last year of secondary school focus obsessively on passing the year-end college entrance examination, a major section of which is English (Kuba, 2002).

ETHICAL ISSUES: CRITICAL LANGUAGE TESTING.
One of the by-products of a rapidly growing testing industry is the danger of an abuse of power. In a special report on “fallout from the testing explosion,” Medina and Neill (1990, p. 36) noted:

Unfortunately, too many policymakers and educators have ignored the complexities of testing issues and the obvious limitations they should place on standardized test use. Instead, they have been seduced by the promise of simplicity and objectivity. The price which has been paid by our schools and our children for their infatuation with tests is high.

The issues of critical language testing are numerous:
• Psychometric traditions are challenged by interpretive, individualized procedures for predicting success and evaluating ability.
• Test designers have a responsibility to offer multiple modes of performance to account for varying styles and abilities among test-takers.
• Tests are deeply embedded in culture and ideology.
• Test-takers are political subjects in a political context.

A further problem with our test-oriented culture lies in the agendas of those who design and those who utilize the tests. Tests are used in some countries to deny citizenship (Shohamy, 1997, p. 10). Tests may by nature be culture-biased and therefore may disenfranchise members of a nonmainstream value system. Test givers are always in a position of power over test-takers and therefore can impose social and political ideologies on test-takers through standards of acceptable and unacceptable items. Tests promote the notion that real-world problems have unambiguous right and wrong answers with no shades of gray. A corollary to the latter is that tests presume to reflect the standards discussed earlier in this chapter. Logic would therefore dictate that the test-taker must buy in to such a system of beliefs in order to make the cut.

ASSESSING GRAMMAR AND ASSESSING VOCABULARY by James and John

Assignment of meeting 15. “SUMMARY ASSESSING GRAMMAR 1-291”   Differing notions of ‘grammar’ for assessment. Grammar and linguisti...