Thursday, March 19, 2020

Principles of Language Assessment.

Assignment 4

Summary: Principles of Language Assessment.
Pages 19-28.

PRACTICALITY.

An effective test is practical. This means that it
Is not excessively expensive,
Stays within appropriate time constraints,
Is relatively easy to administer, and
Has a scoring/evaluation procedure that is specific and time-efficient.
A test that is prohibitively expensive is impractical. A test of language proficiency that takes a student five hours to complete is impractical; it consumes more time (and money) than necessary to accomplish its objective. A test that requires individual one-on-one proctoring is impractical for a group of several hundred test-takers and only a handful of examiners. A test that takes a few minutes for a student to take and several hours for an examiner to evaluate is impractical for most classroom situations. A test that can be scored only by computer is impractical if the test takes place a thousand miles away from the nearest computer. The value and quality of a test sometimes hinge on such nitty-gritty, practical considerations.

RELIABILITY.

A reliable test is consistent and dependable. If you give the same test to the same student or matched students on two different occasions, the test should yield similar results. The issue of reliability of a test may best be addressed by considering a number of factors that may contribute to the unreliability of a test. Consider the following possibilities (adapted from Mousavi, 2002, p. 804): fluctuations in the student, in scoring, in test administration, and in the test itself.

Student-Related Reliability.
The most common learner-related issues in reliability are caused by temporary illness, fatigue, a "bad day," anxiety, and other physical or psychological factors, which may make an "observed" score deviate from one's "true" score. Also included in this category are such factors as a test-taker's "test-wiseness" or strategies for efficient test taking (Mousavi, 2002, p. 804).

Rater Reliability.
Human error, subjectivity, and bias may enter into the scoring process. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores on the same test, possibly because of lack of attention to scoring criteria, inexperience, inattention, or even preconceived biases. In the story above about the placement test, the initial scorers were not applying the same standards.

Test Administration Reliability.
Unreliability may also result from the conditions in which the test is administered. I once witnessed the administration of a test of aural comprehension in which a tape recorder played items for comprehension, but because of street noise outside the building, students sitting next to windows could not hear the tape accurately. This was a clear case of unreliability caused by the conditions of the test administration. Other sources of unreliability are found in photocopying variations, the amount of light in different parts of the room, variations in temperature, and even the condition of desks and chairs.

Test Reliability.
Sometimes the nature of the test itself can cause measurement errors. If a test is too long, test-takers may become fatigued by the time they reach the later items and hastily respond incorrectly. Timed tests may discriminate against students who do not perform well on a test with a time limit. We all know people (and you may be included in this category!) who "know" the course material perfectly but who are adversely affected by the presence of a clock ticking away. Poorly written test items (that are ambiguous or that have more than one correct answer) may be a further source of test unreliability.

VALIDITY.
By far the most complex criterion of an effective test, and arguably the most important principle, is validity, "the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment" (Gronlund, 1998, p. 226). A valid test of reading ability actually measures reading ability, not 20/20 vision, not previous knowledge of a subject, nor some other variable of questionable relevance. To measure writing ability, one might ask students to write as many words as they can in 15 minutes, then simply count the words for the final score. Such a test would be easy to administer (practical), and the scoring would be quite dependable (reliable). But it would not constitute a valid test of writing ability without some consideration of comprehensibility, rhetorical discourse elements, and the organization of ideas, among other factors.

Content-Related Evidence.
If a test actually samples the subject matter about which conclusions are to be drawn, and if it requires the test-taker to perform the behavior that is being measured, it can claim content-related evidence of validity, often popularly referred to as content validity (e.g., Mousavi, 2002; Hughes, 2003). You can usually identify content-related evidence observationally if you can clearly define the achievement that you are measuring. A test of tennis competency that asks someone to run a 100-yard dash obviously lacks content validity. If you are trying to assess a person's ability to speak a second language in a conversational setting, asking the learner to answer paper-and-pencil multiple-choice questions requiring grammatical judgments does not achieve content validity. A test that requires the learner actually to speak within some sort of authentic context does. And if a course has perhaps ten objectives but only two are covered in a test, then content validity suffers.

Criterion-Related Evidence.
In the case of teacher-made classroom assessments, criterion-related evidence is best demonstrated through a comparison of the results of an assessment with the results of some other measure of the same criterion. For example, in a course unit whose objective is for students to be able to orally produce voiced and voiceless stops in all possible phonetic environments, the results of one teacher's unit test might be compared with an independent assessment, possibly a commercially produced test in a textbook, of the same phonemic proficiency. A classroom test designed to assess mastery of a point of grammar in communicative use will have criterion validity if test scores are corroborated either by observed subsequent behavior or by other communicative measures of the grammar point in question.

Construct-Related Evidence.
A third kind of evidence that can support validity, but one that does not play as large a role for classroom teachers, is construct-related validity, commonly referred to as construct validity. A construct is any theory, hypothesis, or model that attempts to explain observed phenomena in our universe of perception. Constructs may or may not be directly or empirically measured; their verification often requires inferential data. "Proficiency" and "communicative competence" are linguistic constructs; "self-esteem" and "motivation" are psychological constructs. Virtually every issue in language learning and teaching involves theoretical constructs. In the field of assessment, construct validity asks, "Does this test actually tap into the theoretical construct as it has been defined?" Tests are, in a manner of speaking, operational definitions of constructs in that they operationalize the entity being measured (see Davidson, Hudson, & Lynch, 1985).

Consequential Validity.
As high-stakes assessment has gained ground in the last two decades, one aspect of consequential validity has drawn special attention: the effect of test preparation courses and manuals on performance. McNamara (2000, p. 54) cautions against test results that may reflect socioeconomic conditions such as opportunities for coaching that are "differentially available to the students being assessed (for example, because only some families can afford coaching, or because children with more highly educated parents get help from their parents)".

Face Validity.
An important facet of consequential validity is the extent to which "students view the assessment as fair, relevant, and useful for improving learning" (Gronlund, 1998, p. 201), or what is popularly known as face validity. "Face validity refers to the degree to which a test looks right, and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the examinees who take it, the administrative personnel who decide on its use, and other psychometrically unsophisticated observers" (Mousavi, 2002, p. 244).
Sometimes students don't know what is being tested when they tackle a test. They may feel, for a variety of reasons, that a test isn't testing what it is "supposed" to test. Face validity means that the students perceive the test to be valid. Face validity asks the question "Does the test, on the 'face' of it, appear from the learner's perspective to test what it is designed to test?" Face validity will likely be high if learners encounter:
A well-constructed, expected format with familiar tasks,
A test that is clearly doable within the allotted time limit,
Items that are clear and uncomplicated,
Directions that are crystal clear,
Tasks that relate to their course work (content validity), and
A difficulty level that presents a reasonable challenge.

AUTHENTICITY.

A fourth major principle of language testing is authenticity, a concept that is a little slippery to define, especially within the art and science of evaluating and designing tests. Bachman and Palmer (1996, p. 23) define authenticity as "the degree of correspondence of the characteristics of a given language test task to the features of a target language task," and then suggest an agenda for identifying those target language tasks and for transforming them into valid test items.
Essentially, when you make a claim for authenticity in a test task, you are saying that this task is likely to be enacted in the "real world." Many test item types fail to simulate real-world tasks. They may be contrived or artificial in their attempt to target a grammatical form or a lexical item. The sequencing of items that bear no relationship to one another lacks authenticity. One does not have to look very long to find reading comprehension passages in proficiency tests that do not reflect a real-world passage.
In a test, authenticity may be present in the following ways:
The language in the test is as natural as possible.
Items are contextualized rather than isolated.
Topics are meaningful (relevant, interesting) for the learner.
Some thematic organization to items is provided, such as through a story line or episode.
Tasks represent, or closely approximate, real-world tasks.

WASHBACK

A facet of consequential validity, discussed above, is "the effect of testing on teaching and learning" (Hughes, 2003, p. 1), otherwise known among language-testing specialists as washback. In large-scale assessment, washback generally refers to the effect the tests have on instruction in terms of how students prepare for the test. "Cram" courses and "teaching to the test" are examples of such washback. Another form of washback that occurs more in classroom assessment is the information that "washes back" to students in the form of useful diagnoses of strengths and weaknesses. Washback also includes the effects of an assessment on teaching and learning prior to the assessment itself, that is, on preparation for the assessment. Informal performance assessment is by nature more likely to have built-in washback effects because the teacher is usually providing interactive feedback. Formal tests can also have positive washback, but they provide no washback if the students receive a simple letter grade or a single overall numerical score.






References:

Brown, H. Douglas. Language Assessment: Principles and Classroom Practices.

