Friday, 08 May 2020

ASSESSING READING AND ASSESSING WRITING By Douglas Brown


Assignment of meeting 14.
“SUMMARY ASSESSING READING 185-217”

ASSESSING READING

TYPES (GENRES) OF READING
As we consider a number of different types or genres of written texts, the components of reading ability, and specific tasks that are commonly used in the assessment of reading, let's not forget the unobservable nature of reading. Like listening, one cannot see the process of reading, nor can one observe a specific product of reading. Other than observing a reader's eye movements and page turning, there is no technology that enables us to "see" sequences of graphic symbols traveling from the pages of a book into compartments of the brain (in a possible bottom-up process). Even more outlandish is the notion that one might be able to watch information from the brain make its way down onto the page (in typical top-down strategies). Further, once something is read, that is, once information from the written text is stored, no technology allows us to empirically measure exactly what is lodged in the brain. All assessment of reading must be carried out by inference.

MICROSKILLS, MACROSKILLS, AND STRATEGIES FOR READING
Aside from attending to genres of text, the skills and strategies for accomplishing reading emerge as a crucial consideration in the assessment of reading ability. The micro- and macroskills below represent the spectrum of possibilities for objectives in the assessment of reading comprehension.

TYPES OF READING
1. Perceptive. In keeping with the set of categories specified for listening comprehension, similar specifications are offered here, except with some differing terminology to capture the uniqueness of reading.
2. Selective. This category is largely an artifact of assessment formats. In order to ascertain one's reading recognition of lexical, grammatical, or discourse features of language within a very short stretch of language, certain typical tasks are used: picture-cued tasks, matching, true/false, multiple-choice, etc.
3. Interactive. Included among interactive reading types are stretches of language of several paragraphs to one page or more in which the reader must, in a psycholinguistic sense, interact with the text. That is, reading is a process of negotiating meaning; the reader brings to the text a set of schemata for understanding it, and intake is the product of that interaction. Typical genres that lend themselves to interactive reading are anecdotes, short narratives and descriptions, excerpts from longer texts, questionnaires, memos, announcements, directions, recipes, and the like.
4. Extensive. Extensive reading, as discussed in this book, applies to texts of more than a page, up to and including professional articles, essays, technical reports, short stories, and books.

DESIGNING ASSESSMENT TASKS: PERCEPTIVE READING
Reading Aloud
The test-taker sees separate letters, words, and/or short sentences and reads them aloud, one by one, in the presence of an administrator. Since the assessment is of reading comprehension, any recognizable oral approximation of the target response is considered correct.
Written Response
The same stimuli are presented, and the test-taker's task is to reproduce the probe in writing. Because of the transfer across different skills here, evaluation of the test-taker's response must be carefully treated. If an error occurs, make sure you determine its source: what might be assumed to be a writing error, for example, may actually be a reading error, and vice versa.
Multiple-Choice
Multiple-choice responses are not only a matter of choosing one of four or five possible answers. Other formats, some of which are especially useful at the low levels of reading, include same/different, circle the answer, true/false, choose the letter, and matching.
Picture-Cued Items
Test-takers are shown a picture, along with a written text, and are given one of a number of possible tasks to perform.

DESIGNING ASSESSMENT TASKS: SELECTIVE READING
Multiple-Choice (for Form-Focused Criteria)
By far the most popular method of testing a reading knowledge of vocabulary and grammar is the multiple-choice format, mainly for reasons of practicality: it is easy to administer and can be scored quickly. The most straightforward multiple-choice items may have little context, but might serve as a vocabulary or grammar check.
The context of the story in this example may not specifically help the test-taker to respond to the items more easily, but it allows the learner to attend to one set of related sentences for eight items that assess vocabulary and grammar. Other contexts might involve some content dependencies, such that earlier sentences predict the correct response for a later item.

Matching Tasks
At this selective level of reading, the test-taker's task is simply to respond correctly, which makes matching an appropriate format. The most frequently appearing criterion in matching procedures is vocabulary.
Alderson (2000, p. 218) suggested matching procedures at an even more sophisticated level, where test-takers have to discern pragmatic interpretations of certain signs or labels such as "Freshly made sandwiches" and "Use before 10/23/02."

Editing Tasks
Editing for grammatical or rhetorical errors is a widely used test method for assessing linguistic competence in reading. The TOEFL® and many other tests employ this technique with the argument that it not only focuses on grammar but also introduces a simulation of the authentic task of editing, or discerning errors in written passages. Its authenticity may be supported if you consider proofreading as a real-world skill that is being tested. Here is a typical set of examples of editing.

Picture-Cued Tasks
In the previous section we looked at picture-cued tasks for perceptive recognition of symbols and words. Pictures and photographs may be equally well utilized for examining ability at the selective level. Several types of picture-cued methods are commonly used.

1. Test-takers read a sentence or passage and choose the one of four pictures that is being described. The sentence (or sentences) at this level is more complex.
2. Test-takers read a series of sentences or definitions, each describing a labeled part of a picture or diagram. Their task is to identify each labeled item. In the following diagram, test-takers do not necessarily know each term, but by reading the definition they are able to make an identification.

Gap-Filling Tasks
Many of the multiple-choice tasks described above can be converted into gap-filling, or "fill-in-the-blank," items in which the test-taker's response is to write a word or phrase. An extension of simple gap-filling tasks is to create sentence completion items where test-takers read part of a sentence and then complete it by writing a phrase.

DESIGNING ASSESSMENT TASKS: INTERACTIVE READING
Cloze Tasks
One of the most popular types of reading assessment task is the cloze procedure. The word cloze was coined by educational psychologists to capture the Gestalt psychological concept of "closure," that is, the ability to fill in gaps in an incomplete image (visual, auditory, or cognitive) and supply omitted details from background schemata.
Cloze tests are usually a minimum of two paragraphs in length in order to account for discourse expectancies. They can be constructed relatively easily as long as the specifications for choosing deletions and for scoring are clearly defined. Typically every seventh word (plus or minus two) is deleted (known as fixed-ratio deletion), but many cloze test designers instead use a rational deletion procedure of choosing deletions according to the grammatical or discourse functions of the words. Rational deletion also allows the designer to avoid deleting words that would be difficult to predict from the context.
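Because fixed-ratio deletion is a purely mechanical procedure, it can be sketched in a few lines of code. The following Python sketch (the function name and sample passage are invented for illustration) blanks out every seventh word and keeps an answer key; a rational-deletion test would instead hand-pick the gaps:

```python
def fixed_ratio_cloze(text, n=7, start=7):
    """Blank out every nth word of a passage (fixed-ratio deletion).

    Returns the gapped passage and an answer key mapping word
    positions to the deleted words.
    """
    words = text.split()
    answers = {}
    for i in range(start - 1, len(words), n):
        answers[i] = words[i]   # record the answer before blanking
        words[i] = "____"
    return " ".join(words), answers

passage = ("The procedure is simple. Every seventh word in the passage "
           "is deleted, and the test-taker supplies the missing words "
           "from the surrounding context and background knowledge.")
gapped, key = fixed_ratio_cloze(passage)
print(gapped)
```

In practice a designer would still review each generated gap, since a blindly deleted word (a proper noun, for instance) may be impossible to predict from context.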

Impromptu Reading Plus Comprehension Questions
If cloze testing is the most-researched procedure for assessing reading, the traditional "Read a passage and answer some questions" technique is undoubtedly the oldest and the most common. Virtually every proficiency test uses the format, and one would rarely consider assessing reading without some component of the assessment involving impromptu reading and responding to questions.
Notice that this set of questions, based on a 250-word passage, covers the comprehension
of these features:
• main idea (topic)
• expressions/idioms/phrases in context
• inference (implied detail)
• grammatical features
• detail (scanning for a specifically stated detail)
• excluding facts not written (unstated details)
• supporting idea(s)
• vocabulary in context

Short-Answer Tasks
Multiple-choice items are difficult to construct and validate, and classroom teachers rarely have time in their busy schedules to design such a test. A popular alternative to multiple-choice questions following reading passages is the age-old short-answer format. A reading passage is presented, and the test-taker reads questions that must be answered in a sentence or two. Questions might cover the same specifications indicated above for the TOEFL reading, but be worded in question form.

Editing (Longer Texts)
Imao's (2001) test introduced one error in each numbered sentence. Test-takers followed the same procedure for marking errors as described in the previous section. Instructions to the student included a sample of the kind of connected prose that test-takers would encounter.

Scanning
Scanning is a strategy used by all readers to find relevant information in a text. Assessment of scanning is carried out by presenting test-takers with a text (prose or something in a chart or graph format) and requiring rapid identification of relevant bits of information. Possible stimuli include
• a one- to two-page news article,
• an essay,
• a chapter in a textbook,
• a technical report,
• a table or chart depicting some research findings,
• a menu, and
• an application form.

Ordering Tasks
Students always enjoy the activity of receiving little strips of paper, each with a sentence on it, and assembling them into a story, sometimes called the "strip story" technique. Variations on this can serve as an assessment of overall global understanding of a story and of the cohesive devices that signal the order of events or ideas. Alderson et al. (1995, p. 53) warn, however, against assuming that there is only one logical order. They presented these sentences for forming a little story.

Information Transfer: Reading Charts, Maps, Graphs, Diagrams
Every educated person must be able to comprehend charts, maps, graphs, calendars, diagrams, and the like. Converting such nonverbal input into comprehensible intake requires not only an understanding of the graphic and verbal conventions of the medium but also a linguistic ability to interpret that information to someone else. Reading a map implies understanding the conventions of map graphics, but it is often accompanied by telling someone where to turn, how far to go, etc. Scanning a menu requires an ability to understand the structure of most menus as well as the capacity to give an order when the time comes. Interpreting the numbers on a stock market report involves the interaction of understanding the numbers and of conveying that understanding to others.

DESIGNING ASSESSMENT TASKS: EXTENSIVE READING
Skimming Tasks
Skimming is the process of rapid coverage of reading matter to determine its gist or main idea. It is a prediction strategy used to give a reader a sense of the topic and purpose of a text, the organization of the text, the perspective or point of view of the writer, its ease or difficulty, and/or its usefulness to the reader. Of course skimming can apply to texts of less than one page, so it would be wise not to confine this type of task just to extensive texts.

Summarizing and Responding
As you can readily see, a strict adherence to the criterion of assessing reading, and reading only, implies consideration of only the first factor; the other three pertain to writing performance. The first criterion is nevertheless a crucial factor; otherwise the reader-writer could pass all three of the other criteria with virtually no understanding of the text itself. Evaluation of the reading comprehension criterion will of necessity remain somewhat subjective because the teacher will need to determine degrees of fulfillment of the objective (see below for more about scoring this task).

Of further interest in assessing extensive reading is the technique of asking a student to respond to a text. The two tasks should not be confused with each other: summarizing requires a synopsis or overview of the text, while responding asks the reader to provide his or her own opinion on the text as a whole or on some statement or issue within it.

Note-Taking and Outlining
Finally, a reader's comprehension of extensive texts may be assessed through an evaluation of a process of note-taking and/or outlining. Because of the difficulty of controlling the conditions and time frame for both these techniques, they rest firmly in the category of informal assessment. Their utility is in the strategic training that learners gain in retaining information through marginal notes that highlight key information or organizational outlines that put supporting ideas into a visually manageable framework. A teacher, perhaps in one-on-one conferences-with students, can use student notes/outlines as indicators of the presence or absence of effective reading strategies, and thereby point the learners in positive directions.




“SUMMARY ASSESSING WRITING 218-250”

ASSESSING WRITING

GENRES OF WRITTEN LANGUAGE
The same classification scheme is reformulated here to include the most common genres that a second language writer might produce, within and beyond the requirements of a curriculum. Even though this list is slightly shorter, you should be aware of the surprising multiplicity of options of written genres that second language learners need to acquire.

TYPES OF WRITING PERFORMANCE
1. Imitative. To produce written language, the learner must attain skills in the fundamental, basic tasks of writing letters, words, punctuation, and very brief sentences. This category includes the ability to spell correctly and to perceive phoneme-grapheme correspondences in the English spelling system. It is a level at which learners are trying to master the mechanics of writing.
2. Intensive (controlled). Beyond the fundamentals of imitative writing are skills in producing appropriate vocabulary within a context, collocations and idioms, and correct grammatical features up to the length of a sentence. Meaning and context are of some importance in determining correctness and appropriateness, but most assessment tasks are more concerned with a focus on form and are rather strictly controlled by the test design.
3. Responsive. Here, assessment tasks require learners to perform at a limited discourse level, connecting sentences into a paragraph and creating a logically connected sequence of two or three paragraphs. Tasks respond to pedagogical directives, lists of criteria, outlines, and other guidelines. Genres of writing include brief narratives and descriptions, short reports, lab reports, summaries, brief responses to reading, and interpretations of charts or graphs. Under specified conditions, the writer begins to exercise some freedom of choice among alternative forms of expression of ideas. The writer has mastered the fundamentals of sentence-level grammar and is more focused on the discourse conventions that will achieve the objectives of the written text.
4. Extensive. Extensive writing implies successful management of all the processes and strategies of writing for all purposes, up to the length of an essay, a term paper, a major research project report, or even a thesis. Writers focus on achieving a purpose, organizing and developing ideas logically, using details to support or illustrate ideas, demonstrating syntactic and lexical variety, and in many cases, engaging in the process of multiple drafts to achieve a final product. Focus on grammatical form is limited to occasional editing or proofreading of a draft.

MICRO- AND MACROSKILLS OF WRITING
We turn once again to a taxonomy of micro- and macroskills that will assist you in defining the ultimate criterion of an assessment procedure. The earlier microskills apply more appropriately to imitative and intensive types of writing task, while the macroskills are essential for the successful mastery of responsive and extensive writing.

DESIGNING ASSESSMENT TASKS: IMITATIVE WRITING
Tasks in [Hand] Writing Letters, Words, and Punctuation
First, a comment should be made on the increasing use of personal and laptop computers and handheld instruments for creating written symbols. Handwriting has the potential of becoming a lost art as even very young children are more and more likely to use a keyboard to produce writing. Making the shapes of letters and other symbols is now more a question of learning typing skills than of training the muscles of the hands to use a pen or pencil. Nevertheless, for all practical purposes, handwriting remains a skill of paramount importance within the larger domain of language assessment.

Spelling Tasks and Detecting Phoneme-Grapheme Correspondences
1. Spelling tests. In a traditional, old-fashioned spelling test, the teacher dictates a simple list of words, one word at a time, followed by the word in a sentence, repeated again, with a pause for test-takers to write the word. Scoring emphasizes correct spelling.
2. Picture-cued tasks. Pictures are displayed with the objective of focusing on familiar words whose spelling may be unpredictable. Items are chosen according to the objectives of the assessment, but this format is an opportunity to present some challenging words and word pairs: boot/book, read/reed, bit/bite, etc.
3. Multiple-choice techniques. Presenting words and phrases in the form of a multiple-choice task risks crossing over into the domain of assessing reading, but if the items have a follow-up writing component, they can serve as formative reinforcement of spelling conventions.
4. Matching phonetic symbols. If students have become familiar with the phonetic alphabet, they could be shown phonetic symbols and asked to write the correctly spelled word alphabetically. This works best with letters that do not have a one-to-one correspondence with the phonetic symbol (e.g., /æ/ and a).

DESIGNING ASSESSMENT TASKS: INTENSIVE (CONTROLLED) WRITING
Dictation and Dicto-Comp
A form of controlled writing related to dictation is the dicto-comp. Here, a paragraph is read at normal speed, usually two or three times; then the teacher asks students to rewrite the paragraph from the best of their recollection. In one of several variations of the dicto-comp technique, the teacher, after reading the passage, distributes a handout with key words from the paragraph, in sequence, as cues for the students. In either case, the dicto-comp is genuinely classified as an intensive, if not a responsive, writing task. Test-takers must internalize the content of the passage, remember a few phrases and lexical items as key words, then recreate the story in their own words.

Grammatical Transformation Tasks
In the heyday of structural paradigms of language teaching, with slot-filler techniques and slot substitution drills, the practice of making grammatical transformations, orally or in writing, was very popular. To this day, language teachers have used this technique as an assessment task, ostensibly to measure grammatical competence. Numerous versions of the task are possible:
• Change the tenses in a paragraph.
• Change full forms of verbs to reduced forms (contractions).
• Change statements to yes/no or wh-questions.
• Change questions into statements.
• Combine two sentences into one using a relative pronoun.
• Change direct speech to indirect speech.
• Change from active to passive voice.

Picture-Cued Tasks
A variety of picture-cued controlled tasks have been used in English classrooms around the world. The main advantage in this technique is in detaching the almost ubiquitous reading and writing connection and offering instead a nonverbal means to stimulate written responses.

Vocabulary Assessment Tasks
Most vocabulary study is carried out through reading. A number of assessments of reading recognition of vocabulary were discussed in the previous chapter: multiple-choice techniques, matching, picture-cued identification, cloze techniques, guessing the meaning of a word in context, etc. The major techniques used to assess vocabulary are (a) defining and (b) using a word in a sentence. The latter is the more authentic, but even that task is constrained by a contrived situation in which the test-taker, usually in a matter of seconds, has to come up with an appropriate sentence, which may or may not indicate that the test-taker "knows" the word. Read (2000) suggested several types of items for assessment of basic knowledge of the meaning of a word, collocational possibilities, and derived morphological forms.

Ordering Tasks
One task at the sentence level may appeal to those who are fond of word games and puzzles: ordering (or reordering) a scrambled set of words into a correct sentence. While this somewhat inauthentic task generates writing performance and may be said to tap into grammatical word-ordering rules, it presents a challenge to test-takers whose learning styles do not dispose them to logical-mathematical problem solving. If sentences are kept very simple, with perhaps no more than four or five words, if only one possible sentence can emerge, and if students have practiced the technique in class, then some justification emerges. But once again, as in so many writing techniques, this task involves as much, if not more, reading performance as writing.
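Scrambled-sentence items of this kind are easy to generate automatically. The sketch below (the function name and sample sentence are invented for illustration) shuffles the words of a simple sentence and keeps the original as the answer key; it assumes sentences of at least two words, in line with the advice above to keep items short and unambiguous:

```python
import random

def make_ordering_item(sentence, seed=None):
    """Scramble the words of a sentence into an ordering item.

    Returns the scrambled word list and the original sentence as
    the answer key. Assumes the sentence has at least two words;
    reshuffles until the scrambled order differs from the original.
    """
    rng = random.Random(seed)
    words = sentence.split()
    scrambled = words[:]
    while scrambled == words:
        rng.shuffle(scrambled)
    return scrambled, sentence

item, key = make_ordering_item("the bus station is near the bank", seed=1)
print("Put the words in the right order:", " / ".join(item))
```

Note that automation does not remove the Alderson et al. caveat: a human must still check that only one plausible ordering exists before using the item.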

Short-Answer and Sentence Completion Tasks
Some types of short-answer tasks were discussed in Chapter 8 because of the heavy participation of reading performance in their completion. Such items range from very simple and predictable to somewhat more elaborate responses. Look at the range of possibilities. The reading-writing connection is apparent in the first three item types but has less of an effect in the last three, where reading is necessary in order to understand the directions but is not crucial in creating sentences. Scoring on a 2-1-0 scale (as described above) may be the most appropriate way to avoid arguing with yourself about the appropriateness of a response.

ISSUES IN ASSESSING RESPONSIVE AND EXTENSIVE WRITING
Responsive writing creates the opportunity for test-takers to offer an array of possible creative responses within a pedagogical or assessment framework: test-takers are "responding" to a prompt or assignment. Freed from the strict control of intensive writing, learners can exercise a number of options in choosing vocabulary, grammar, and discourse, but with some constraints and conditions. Criteria now begin to include the discourse and rhetorical conventions of paragraph structure and of connecting two or three such paragraphs in texts of limited length. The learner is responsible for accomplishing a purpose in writing, for developing a sequence of connected ideas, and for empathizing with an audience.
The genres of text that are typically addressed here are
• short reports (with structured formats and conventions);
• responses to the reading of an article or story;
• summaries of articles or stories;
• brief narratives or descriptions; and
• interpretations of graphs, tables, and charts.

DESIGNING ASSESSMENT TASKS RESPONSIVE AND EXTENSIVE WRITING
Paraphrasing
One of the more difficult concepts for second language learners to grasp is paraphrasing. The initial step in teaching paraphrasing is to ensure that learners understand its importance: to say something in one's own words, to avoid plagiarizing, and to offer some variety in expression. With those possible motivations and purposes in mind, the test designer needs to elicit a paraphrase of a sentence or paragraph, usually not more.

Guided Question and Answer
Another lower-order task in this type of writing, which has the pedagogical benefit of guiding a learner without dictating the form of the output, is a guided question-and-answer format in which the test administrator poses a series of questions that essentially serve as an outline of the emergent written text. In the writing of a narrative that the teacher has already covered in a class discussion, the following kinds of questions might be posed to stimulate a sequence of sentences.

Paragraph Construction Tasks
The participation of reading performance is inevitable in writing effective paragraphs. To a great extent, writing is the art of emulating what one reads. You read an effective paragraph; you analyze the ingredients of its success; you emulate it. Assessment of paragraph development takes on a number of different forms:
1. Topic sentence writing. There is no cardinal rule that says every paragraph must have a topic sentence, but the stating of a topic through the lead sentence (or a subsequent one) has remained a tried-and-true technique for teaching the concept of a paragraph.
2. Topic development within a paragraph. Because paragraphs are intended to provide a reader with "clusters" of meaningful, connected thoughts or ideas, another stage of assessment is development of an idea within a paragraph.
3. Development of main and supporting ideas across paragraphs. As writers string two or more paragraphs together in a longer text (and as we move up the continuum from responsive to extensive writing), the writer attempts to articulate a thesis or main idea with clearly stated supporting ideas.

Strategic Options
Developing main and supporting ideas is the goal for the writer attempting to create an effective text, whether a short one- to two-paragraph one or an extensive one of several pages. A number of strategies are commonly taught to second language writers to accomplish their purposes. Aside from strategies of freewriting, outlining, drafting, and revising, writers need to be aware of the task that has been demanded and to focus on the genre of writing and the expectations of that genre.

TEST OF WRITTEN ENGLISH (TWE®)
The TWE is in the category of a timed impromptu test in that test-takers are under a 30-minute time limit and are not able to prepare ahead of time for the topic that will appear. Topics are prepared by a panel of experts following specifications for topics that represent commonly used discourse and thought patterns at the university level. Here are some sample topics published on the TWE website.

SCORING METHODS FOR RESPONSIVE AND EXTENSIVE WRITING
Holistic Scoring
The TWE scoring scale above is a prime example of holistic scoring. In Chapter 7, a rubric for scoring oral production holistically was presented. Each point on a holistic scale is given a systematic set of descriptors, and the reader-evaluator matches an overall impression with the descriptors to arrive at a score. Descriptors usually (but not always) follow a prescribed pattern. For example, the first descriptor across all score categories may address the quality of task achievement, the second may deal with organization, the third with grammatical or rhetorical considerations, and so on. Scoring, however, is truly holistic in that those subsets are not quantitatively added up to yield a score.

Primary Trait Scoring
A second method of scoring, primary trait, focuses on "how well students can write within a narrowly defined range of discourse" (Weigle, 2002, p. 110). This type of scoring emphasizes the task at hand and assigns a score based on the effectiveness of the text's achieving that one goal. For example, if the purpose or function of an essay is to persuade the reader to do something, the score for the writing would rise or fall on the accomplishment of that function. If a learner is asked to exploit the imaginative function of language by expressing personal feelings, then the response would be evaluated on that feature alone.

For rating the primary trait of the text, Lloyd-Jones (1977) suggested a four-point scale ranging from 0 (no response or fragmented response) to 4 (the purpose is unequivocally accomplished in a convincing fashion). It almost goes without saying that organization, supporting details, fluency, syntactic variety, and other features will implicitly be evaluated in the process of offering a primary trait score.

Analytic Scoring
Classroom evaluation of learning is best served through analytic scoring, in which as many as six major elements of writing are scored, thus enabling learners to home in on weaknesses and to capitalize on strengths.
Analytic scoring may be more appropriately called analytic assessment in order to capture its closer association with classroom language instruction than with formal testing. Brown and Bailey (1984) designed an analytic scoring scale that specified five major categories and a description of five different levels in each category, ranging from "unacceptable" to "excellent".
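Analytic scoring lends itself to a simple score sheet. The sketch below models the general idea of a five-category, five-level scale like Brown and Bailey's; the exact category names, the 1-5 levels, and the sample ratings are illustrative assumptions, not the published rubric:

```python
# Illustrative categories only; a real rubric would define each level.
CATEGORIES = [
    "organization",
    "logical development of ideas",
    "grammar",
    "mechanics (punctuation/spelling)",
    "style and quality of expression",
]

def analytic_score(ratings, max_level=5):
    """Combine per-category level ratings into a score profile.

    Unlike holistic scoring, each category keeps its own rating so
    learners can see strengths and weaknesses; the total is secondary.
    """
    for cat in CATEGORIES:
        if not 1 <= ratings[cat] <= max_level:
            raise ValueError(f"{cat}: rating must be between 1 and {max_level}")
    return {"profile": dict(ratings), "total": sum(ratings[c] for c in CATEGORIES)}

result = analytic_score({
    "organization": 4,
    "logical development of ideas": 3,
    "grammar": 3,
    "mechanics (punctuation/spelling)": 5,
    "style and quality of expression": 4,
})
print(result["profile"])
```

The point of keeping the per-category profile, rather than only the total, is exactly the washback benefit the text describes: the learner sees where to improve.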

BEYOND SCORING: RESPONDING TO EXTENSIVE WRITING
Formal testing carries with it the burden of designing a practical and reliable instrument that assesses its intended criterion accurately. To accomplish that mission, designers of writing tests are charged with the task of providing as "objective" a scoring procedure as possible, and one that in many cases can be easily interpreted by agents beyond the learner. Holistic, primary trait, and analytic scoring all satisfy those ends. Yet beyond mathematically calculated scores lies a rich domain of assessment in which a developing writer is coached from stage to stage in a process of building a storehouse of writing skills. Here in the classroom, in the tutored relationship of teacher and student, and in the community of peer learners, most of the hard work of assessing writing is carried out. Such assessment is informal, formative, and replete with washback.

Assessing Initial Stages of the Process of Composing
Following are some guidelines for assessing the initial stages (the first draft or two) of a written composition. These guidelines are generic for self, peer, and teacher responding. Each assessor will need to modify the list according to the level of the learner, the context, and the purpose in responding.
The teacher-assessor's role is as a guide, a facilitator, and an ally; therefore, assessment at this stage of writing needs to be as positive as possible to encourage the writer. An early focus on overall structure and meaning will enable writers to clarify their purpose and plan and will set a framework for the writers' later refinement of the lexical and grammatical issues.

Assessing Later Stages of the Process of Composing
Through all these stages it is assumed that peers and teacher are both responding to the writer through conferencing in person, electronic communication, or, at the very least, an exchange of papers. The impromptu timed tests and the methods of scoring discussed earlier may appear to be only distantly related to such an individualized process of creating a written text, but are they, in reality? All those developmental stages may be the preparation that learners need both to function in creative real-world writing tasks and to successfully demonstrate their competence on a timed impromptu test. And those holistic scores are, after all, generalizations of the various components of effective writing. If the hard work of successfully progressing through a semester or two of a challenging course in academic writing ultimately means that writers are ready to function in their real-world contexts, and to get a 5 or 6 on the TWE, then all the effort was worthwhile.
This chapter completes the cycle of considering the assessment of all of the four skills of listening, speaking, reading, and writing. As you contemplate using some of the assessment techniques that have been suggested, I think you can now fully appreciate two significant overarching guidelines for designing an effective assessment procedure:
1. It is virtually impossible to isolate any one of the four skills without the involvement of at least one other mode of performance. Don't underestimate the power of the integration of skills in assessments designed to target a single skill area.
2. The variety of assessment techniques and item types and tasks is virtually infinite in that there is always some possibility for creating a unique variation. Explore those alternatives, but with some caution lest your overzealous urge to be innovative distract you from a central focus on achieving the intended purpose and rendering an appropriate evaluation of performance.



REFERENCES:
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices. New York: Longman.

Wednesday, 06 May 2020

ASSESSING LISTENING AND ASSESSING SPEAKING BY DOUGLAS BROWN


Assignment of meeting 13.
“SUMMARY ASSESSING LISTENING 116-139”

ASSESSING LISTENING

OBSERVING THE PERFORMANCE OF THE FOUR SKILLS
Before focusing on listening itself, think about the two interacting concepts of performance and observation. All language users perform the acts of listening, speaking, reading, and writing. They of course rely on their underlying competence in order to accomplish these performances. When you propose to assess someone's ability in one or a combination of the four skills, you assess that person's competence, but you observe the person's performance. Sometimes the performance does not indicate true competence: a bad night's rest, illness, an emotional distraction, test anxiety, a memory block, or other student-related reliability factors could affect performance, thereby providing an unreliable measure of actual competence.
 
THE IMPORTANCE OF LISTENING
Every teacher of language knows that one's oral production ability-other than monologues, speeches, reading aloud, and the like-is only as good as one's listening comprehension ability. But of even further impact is the likelihood that input in the aural-oral mode accounts for a large proportion of successful language acquisition. In a typical day, we do measurably more listening than speaking (with the exception of one or two of your friends who may be nonstop chatterboxes).

BASIC TYPES OF LISTENING
From these stages we can derive four commonly identified types of listening performance, each of which comprises a category within which to consider assessment tasks and procedures.
1. Intensive. Listening for perception of the components (phonemes, words, intonation, discourse markers, etc.) of a larger stretch of language.
2. Responsive. Listening to a relatively short stretch of language (a greeting, question, command, comprehension check, etc.) in order to make an equally short response.
3. Selective. Processing stretches of discourse such as short monologues for several minutes in order to "scan" for certain information. The purpose of such performance is not necessarily to look for global or general meanings, but to be able to comprehend designated information in a context of longer stretches of spoken language (such as classroom directions from a teacher, TV or radio news items, or stories). Assessment tasks in selective listening could ask students, for example, to listen for names, numbers, a grammatical category, directions (in a map exercise), or certain facts and events.
4. Extensive. Listening to develop a top-down, global understanding of spoken language. Extensive performance ranges from listening to lengthy lectures to listening to a conversation and deriving a comprehensive message or purpose. Listening for the gist, for the main idea, and making inferences are all part of extensive listening.

MICRO- AND MACRO SKILLS OF LISTENING.
Richards' (1983) list of micro skills has proven useful in the domain of specifying objectives for learning and may be even more useful in forcing test makers to carefully identify specific assessment objectives.
Micro- and macro skills of listening (adapted from Richards, 1983)
Micro skills
1. Discriminate among the distinctive sounds of English.
2. Retain chunks of language of different lengths in short-term memory.
3. Recognize English stress patterns, words in stressed and unstressed positions, rhythmic structure, intonation contours, and their role in signaling information.
4. Recognize reduced forms of words.
5. Distinguish word boundaries, recognize a core of words, and interpret word order patterns and their significance.
6. Process speech at different rates of delivery.
7. Process speech containing pauses, errors, corrections, and other performance variables.
8. Recognize grammatical word classes (nouns, verbs, etc.), systems (e.g., tense, agreement/pluralization), patterns, rules, and elliptical forms.
9. Detect sentence constituents and distinguish between major and minor constituents.
10. Recognize that a particular meaning may be expressed in different grammatical forms. 
11. Recognize cohesive devices in spoken discourse.

Macro skills
12. Recognize the communicative functions of utterances, according to situations, participants, and goals.
13. Infer situations, participants, and goals using real-world knowledge.
14. From events, ideas, and so on, described, predict outcomes, infer links and connections between events, deduce causes and effects, and detect such relations as main idea, supporting idea, new information, given information, generalization, and exemplification.
15. Distinguish between literal and implied meanings.
16. Use facial, kinesic, body language, and other nonverbal clues to decipher meanings.
17. Develop and use a battery of listening strategies, such as detecting key words, guessing the 'meaning of words from context, appealing for help, and signaling comprehension or lack thereof.

DESIGNING ASSESSMENT TASKS: INTENSIVE LISTENING

RECOGNIZING PHONOLOGICAL AND MORPHOLOGICAL ELEMENTS.
    A typical form of intensive listening at this level is the assessment of recognition of phonological and morphological elements of language.

Paraphrase Recognition
The next step up on the scale of listening comprehension micro skills is words, phrases, and sentences, which are frequently assessed by providing a stimulus sentence and asking the test-taker to choose the correct paraphrase from a number of choices.

DESIGNING ASSESSMENT TASKS: RESPONSIVE LISTENING
The objective of this item is recognition of the wh-question how much and its appropriate response. Distractors are chosen to represent common learner errors:
(a) responding to how much vs. how much longer;
(b) confusing how much in reference to time vs. the more frequent reference to money;
(c) confusing a wh-question with a yes/no question.

DESIGNING ASSESSMENT TASKS: SELECTIVE LISTENING
Listening Cloze
Listening cloze tasks (sometimes called cloze dictations or partial dictations) require the test-taker to listen to a story. In its generic form, the test consists of a passage in which every nth word (typically every seventh word) is deleted and the test-taker is asked to supply the missing word(s).
One potential weakness of listening cloze techniques is that they may simply become reading comprehension tasks.
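The deletion scheme just described is mechanical enough to sketch in a few lines. This is an illustrative sketch, not material from Brown; the sample passage is invented, and a real listening cloze would of course be delivered aurally alongside the gapped written form.

```python
# Illustrative sketch (not from Brown): constructing a listening cloze
# by deleting every nth word of a transcript, as described above.

def make_listening_cloze(passage: str, nth: int = 7) -> tuple[str, list[str]]:
    """Blank out every nth word; return the gapped text and the answer key."""
    words = passage.split()
    answers = []
    for i in range(nth - 1, len(words), nth):
        answers.append(words[i])
        words[i] = "______"
    return " ".join(words), answers

# Hypothetical sample passage for demonstration only.
text = ("The plane to Kuala Lumpur leaves at nine thirty so we "
        "should be at the airport no later than eight fifteen")
cloze, key = make_listening_cloze(text, nth=7)
```

Note how the weakness raised above surfaces even in this toy version: once the gapped text is printed, a test-taker with strong reading skills may be able to fill several blanks from the written context alone, without processing the audio at all.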

Information Transfer.
The objective of this task is to test prepositions and prepositional phrases of location (at the bottom, on top of, around, along with larger, smaller), so other words and phrases such as back yard, yesterday, last few seeds, and scare away are supplied only as context and need not be tested. (The task also presupposes, of course, that test-takers are able to identify the difference between a bird and a squirrel!) In another genre of picture-cued tasks, a number of people and/or actions are depicted.

Sentence Repetition
The task of simply repeating a sentence or a partial sentence, or sentence repetition, is also used as an assessment of listening comprehension. As in a dictation (discussed below), the test-taker must retain a stretch of language long enough to reproduce it, and then must respond with an oral repetition of that stimulus. Incorrect listening comprehension, whether at the phonemic or discourse level, may be manifested in the correctness of the repetition. A miscue in repetition is scored as a miscue in listening. In the case of somewhat longer sentences, one could argue that the ability to recognize and retain chunks of language as well as threads of meaning might be assessed through repetition.
 
DESIGNING ASSESSMENT TASKS: EXTENSIVE LISTENING.
 Dictation
Dictation is a widely researched genre of assessing listening comprehension. In a dictation, test-takers hear a passage, typically of 50 to 100 words, recited three times: first, at normal speed; then, with long pauses between phrases or natural word groups, during which time test-takers write down what they have just heard; and finally, at normal speed once more so they can check their work and proofread. Here is a sample dictation at the intermediate level of English.
Scoring is another matter. Depending on your context and purpose in administering a dictation, you will need to decide on scoring criteria for several possible kinds of errors:
•  spelling error only, but the word appears to have been heard correctly
•  spelling and/or obvious misrepresentation of a word, illegible word
•  grammatical error (for example, test-taker hears I can't do it, writes I can do it)
•  skipped word or phrase
•  permutation of words
•  additional words not in the original
•  replacement of a word with an appropriate synonym
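One way to act on that decision is to attach a point deduction to each error category and subtract from a maximum score. The sketch below is illustrative only, not Brown's procedure: the weights are hypothetical, and a teacher would set them (including whether synonyms or spelling slips cost anything) to fit the purpose of the dictation.

```python
# Illustrative sketch (not Brown's method): scoring a dictation by tallying
# errors in the categories listed above. All weights are hypothetical.

DEDUCTIONS = {
    "spelling_only": 0.0,          # word clearly heard; only spelling is off
    "misrepresentation": 1.0,      # illegible or obviously misheard word
    "grammatical_error": 1.0,      # e.g., "can't do it" written as "can do it"
    "skipped_word_or_phrase": 1.0,
    "permutation": 0.5,            # words reproduced in the wrong order
    "added_words": 0.5,            # words not in the original
    "synonym_replacement": 0.0,    # meaning preserved; optionally penalize
}

def score_dictation(max_points: float, error_counts: dict) -> float:
    """Subtract weighted deductions from the maximum; never drop below zero."""
    penalty = sum(DEDUCTIONS[kind] * count for kind, count in error_counts.items())
    return max(0.0, max_points - penalty)

score = score_dictation(20.0, {"spelling_only": 2,
                               "skipped_word_or_phrase": 1,
                               "permutation": 2})
```

Making the weights explicit in a table like this is itself a reliability measure: two raters applying the same deduction scheme should converge on the same score.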

Communicative Stimulus-Response Tasks
A genre of assessment task in which the test-taker is presented with a stimulus monologue or conversation and then is asked to respond to a set of comprehension questions. The ability to respond correctly to such items can be construct-validated as an appropriate measure of field-independent listening skills: the ability to remember certain details from a conversation.

Authentic Listening Tasks
Ideally, the language assessment field would have a stockpile of listening test types that are cognitively demanding, communicative, and authentic, not to mention interactive by means of an integration with speaking. However, the nature of a test as a sample of performance and a set of tasks with limited time frames implies an equally limited capacity to mirror all the real-world contexts of listening performance.





“SUMMARY ASSESSING SPEAKING 140-184”

ASSESSING SPEAKING
BASIC TYPES OF SPEAKING
A similar taxonomy emerges for oral production.
1. Imitative. At one end of a continuum of types of speaking performance is the ability to simply parrot back (imitate) a word or phrase or possibly a sentence. While this is a purely phonetic level of oral production, a number of prosodic, lexical, and grammatical properties of language may be included in the criterion performance.
2. Intensive. A second type of speaking frequently employed in assessment contexts is the production of short stretches of oral language designed to demonstrate competence in a narrow band of grammatical, phrasal, lexical, or phonological relationships (such as prosodic elements: intonation, stress, rhythm, juncture). The speaker must be aware of semantic properties in order to be able to respond, but interaction with an interlocutor or test administrator is minimal at best.
3. Responsive. Responsive assessment tasks include interaction and test comprehension but at the somewhat limited level of very short conversations, standard greetings and small talk, simple requests and comments, and the like. The stimulus is almost always a spoken prompt (in order to preserve authenticity).
4. Interactive. The difference between responsive and interactive speaking is in the length and complexity of the interaction, which sometimes includes multiple exchanges and/or multiple participants. Interaction can take the two forms of transactional language, which has the purpose of exchanging specific information, or interpersonal exchanges, which have the purpose of maintaining social relationships.
5. Extensive (monologue). Extensive oral production tasks include speeches, oral presentations, and story-telling, during which the opportunity for oral interaction from listeners is either highly limited (perhaps to nonverbal responses) or ruled out altogether.

MICRO- AND MACRO SKILLS OF SPEAKING

Micro- and macro skills of oral production

Micro skills
1. Produce differences among English phonemes and allophonic variants.
2. Produce chunks of language of different lengths.
3. Produce English stress patterns, words in stressed and unstressed positions, rhythmic structure, and intonation contours.
4. Produce reduced forms of words and phrases.
5. Use an adequate number of lexical units (words) to accomplish pragmatic purposes.
6. Produce fluent speech at different rates of delivery.
7. Monitor one's own oral production and use various strategic devices (pauses, fillers, self-corrections, backtracking) to enhance the clarity of the message.
8. Use grammatical word classes (nouns, verbs, etc.), systems (e.g., tense, agreement, pluralization), word order, patterns, rules, and elliptical forms.
9. Produce speech in natural constituents: in appropriate phrases, pause groups, breath groups, and sentence constituents.
10. Express a particular meaning in different grammatical forms.
11. Use cohesive devices in spoken discourse.

Macro skills
12. Appropriately accomplish communicative functions according to situations, participants, and goals.
13. Use appropriate styles, registers, implicature, redundancies, pragmatic conventions, conversation rules, floor-keeping and -yielding, interrupting, and other sociolinguistic features in face-to-face conversations.
14. Convey links and connections between events and communicate such relations as focal and peripheral ideas, events and feelings, new information and given information, generalization and exemplification.
15. Use facial features, kinesics, body language, and other nonverbal cues along with verbal language to convey meanings.
16. Develop and use a battery of speaking strategies, such as emphasizing key words, rephrasing, providing a context for interpreting the meaning of words, appealing for help, and accurately assessing how well your interlocutor is understanding you.

DESIGNING ASSESSMENT TASKS: IMITATIVE SPEAKING
An occasional phonologically focused repetition task is warranted as long as repetition tasks are not allowed to occupy a dominant role in an overall oral production assessment, and as long as you artfully avoid a negative washback effect. Such tasks range from word level to sentence level, usually with each item focusing on a specific phonological criterion. In a simple repetition task, test-takers repeat the stimulus, whether it is a pair of words, a sentence, or perhaps a question (to test for intonation production).

DESIGNING ASSESSMENT TASKS: INTENSIVE SPEAKING
Directed Response Tasks
In this type of task, the test administrator elicits a particular grammatical form or a transformation of a sentence. Such tasks are clearly mechanical and not communicative, but they do require minimal processing of meaning in order to produce the correct grammatical output.

Read-Aloud Tasks                                                         
Intensive reading-aloud tasks include reading beyond the sentence level up to a paragraph or two. This technique is easily administered by selecting a passage that incorporates test specs and by recording the test-taker's output; the scoring is relatively easy because all of the test taker's oral production is controlled. Because of the results of research on the Phone Pass test, reading aloud may actually be a surprisingly strong indicator of overall oral production ability.

Sentence/Dialogue Completion Tasks and Oral Questionnaires
Another technique for targeting intensive aspects of language requires test-takers to read a dialogue in which one speaker's lines have been omitted. Test-takers are first given time to read through the dialogue to get its gist and to think about appropriate lines to fill in. Then, as the tape, teacher, or test administrator produces one part orally, the test-taker responds.

Picture-Cued Tasks
One of the more popular ways to elicit oral language performance at both intensive and extensive levels is a picture-cued stimulus that requires a description from the test-taker. Pictures may be very simple, designed to elicit a word or a phrase; somewhat more elaborate and "busy"; or composed of a series that tells a story or incident. Here is an example of a picture-cued elicitation of the production of a simple minimal pair.

Translation (of Limited Stretches of Discourse)
Translation is a part of our tradition in language teaching that we tend to discount or disdain, if only because our current pedagogical stance plays down its importance. Translation methods of teaching are certainly passé in an era of direct approaches to creating communicative classrooms. But we should remember that in countries where English is not the native or prevailing language, translation is a meaningful communicative device in contexts where the English user is called on to be an interpreter. Also, translation is a well-proven communication strategy for learners of a second language.

DESIGNING ASSESSMENT TASKS: RESPONSIVE SPEAKING
Question and Answer
Question-and-answer tasks can consist of one or two questions from an interviewer, or they can make up a portion of a whole battery of questions and prompts in an oral interview. They can vary from simple questions like "What is this called in English?" to complex questions like "What are the steps governments should take, if any, to stem the rate of deforestation in tropical countries?" The first question is intensive in its purpose; it is a display question intended to elicit a predetermined correct response. We have already looked at some of these types of questions in the previous section. Questions at the responsive level tend to be genuine referential questions in which the test-taker is given more opportunity to produce meaningful language in response.

Giving Instructions and Directions.
We are all called on in our daily routines to read instructions on how to operate an appliance, how to put a bookshelf together, or how to create a delicious clam chowder. Somewhat less frequent is the mandate to provide such instructions orally, but this speech act is still relatively common. Using such a stimulus in an assessment context provides an opportunity for the test-taker to engage in a relatively extended stretch of discourse, to be very clear and specific, and to use appropriate discourse markers and connectors. The technique is simple: the administrator poses the problem, and the test-taker responds. Scoring is based primarily on comprehensibility and secondarily on other specified grammatical or discourse categories. Here are some possibilities.
Paraphrasing
Another type of assessment task that can be categorized as responsive asks the test-taker to read or hear a limited number of sentences (perhaps two to five) and produce a paraphrase of the sentence.

TEST OF SPOKEN ENGLISH (TSE®)
The tasks on the TSE are designed to elicit oral production in various discourse categories rather than in selected phonological, grammatical, or lexical targets. The following content specifications for the TSE represent the discourse and pragmatic contexts assessed in each administration:
1. Describe something physical.
2. Narrate from presented material.
3. Summarize information of the speaker's own choice.
4. Give directions based on visual materials.
5. Give instructions.
6. Give an opinion.
7. Support an opinion.
8. Compare/contrast.
9. Hypothesize.
10. Function "interactively."
11. Define.

DESIGNING ASSESSMENT TASKS: INTERACTIVE SPEAKING
Interview
When "oral production assessment" is mentioned, the first thing that comes to mind is an oral interview: a test administrator and a test-taker sit down in a direct face-to-face exchange and proceed through a protocol of questions and directives. The interview, which may be tape-recorded for re-listening, is then scored on one or more parameters such as accuracy in pronunciation and/or grammar, vocabulary usage, fluency, sociolinguistic/pragmatic appropriateness, task accomplishment, and even comprehension.

Role Play
Role playing is a popular pedagogical activity in communicative language-teaching classes. Within constraints set forth by the guidelines, it frees students to be somewhat creative in their linguistic output. In some versions, role play allows some rehearsal time so that students can map out what they are going to say. And it has the effect of lowering anxieties as students can, even for a few moments, take on the persona of someone other than themselves.

Discussions and Conversations
As formal assessment devices, discussions and conversations with and among students are difficult to specify and even more difficult to score. But as informal techniques to assess learners, they offer a level of authenticity and spontaneity that other assessment techniques may not provide. Discussions may be especially appropriate tasks through which to elicit and observe such abilities as
•  topic nomination, maintenance, and termination;
•  attention getting, interrupting, floor holding, control;
•  clarifying, questioning, paraphrasing;
•  comprehension signals (nodding, "uh-huh," "hmm," etc.);
•  negotiating meaning;
•  intonation patterns for pragmatic effect;
•  kinesics, eye contact, proxemics, body language; and
•  politeness, formality, and other sociolinguistic factors.

Games
Among informal assessment devices are a variety of games that directly involve language production. Consider the following types:
Assessment-games
1. "Tinkertoy" game: A Tinkertoy (or Lego block) structure is built behind a screen. One or two learners are allowed to view the structure. In successive stages of construction, the learners tell "runners" (who can't observe the structure) how to re-create the structure. The runners then tell "builders" behind another screen how to build the structure. The builders may question or confirm as they proceed, but only through the two degrees of separation. Object: re-create the structure as accurately as possible.
2. Crossword puzzles are created in which the names of all members of a class are clued by obscure information about them. Each class member must ask questions of others to determine who matches the clues in the puzzle.
3. Information gap grids are created such that class members must conduct mini-interviews of other classmates to fill in boxes, e.g., "born in July," "plays the violin," "has a two-year-old child," etc.
4. City maps are distributed to class members. Predetermined map directions are given to one student who, with a city map in front of him or her, describes the route to a partner, who must then trace the route and get to the correct final destination.

ORAL PROFICIENCY INTERVIEW (OPI)
The best-known oral interview format is one that has gone through a considerable metamorphosis over the last half-century, the Oral Proficiency Interview (OPI). Originally known as the Foreign Service Institute (FSI) test, the OPI is the result of a historical progression of revisions under the auspices of several agencies, including the Educational Testing Service and the American Council on the Teaching of Foreign Languages (ACTFL). The latter, a professional society for research on foreign language instruction and assessment, has now become the principal body for promoting the use of the OPI. The OPI is widely used across dozens of languages around the world.

DESIGNING ASSESSMENTS: EXTENSIVE SPEAKING
Oral Presentations
In the academic and professional arenas, it would not be uncommon to be called on to present a report, a paper, a marketing plan, a sales idea, a design of a new product, or a method. A summary of oral assessment techniques would therefore be incomplete without some consideration of extensive speaking tasks. Once again the rules for effective assessment must be invoked: (a) specify the criterion, (b) set appropriate tasks, (c) elicit optimal output, and (d) establish practical, reliable scoring procedures.

Picture-Cued Story-Telling
One of the most common techniques for eliciting oral production is through visual pictures, photographs, diagrams, and charts. We have already looked at this elicitation device for intensive tasks, but at this level we consider a picture or a series of pictures as a stimulus for a longer story or description.

Retelling a Story, News Event
In this type of task, test-takers hear or read a story or news event that they are asked to retell. This differs from the paraphrasing task discussed above (pages 161-162) in that it is a longer stretch of discourse and a different genre. The objectives in assigning such a task vary from listening comprehension of the original to production of a number of oral discourse features (communicating sequences and relationships of events, stress and emphasis patterns, "expression" in the case of a dramatic story), fluency, and interaction with the hearer. Scoring should of course meet the intended criteria.

Translation (of Extended Prose)
Translation of words, phrases, or short sentences was mentioned under the category of intensive speaking. Here, longer texts are presented for the test-taker to read in the native language and then translate into English. Those texts could come in many forms: dialogue, directions for assembly of a product, a synopsis of a story or play or movie, directions on how to find something on a map, and other genres. The advantage of translation is in the control of the content, vocabulary, and, to some extent, the grammatical and discourse features. The disadvantage is that translation of longer texts is a highly specialized skill for which some individuals obtain post-baccalaureate degrees! To judge a nonspecialist's oral language ability on such a skill may be completely invalid, especially if the test-taker has not engaged in translation at this level. Criteria for scoring should therefore take into account not only the purpose in stimulating a translation but the possibility of errors that are unrelated to oral production ability.


REFERENCES:
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices. New York: Longman.
