Tuesday, 31 March 2020

Exercise from Language Assessment: Principles and Classroom Practices

Assignment 6

4. Figure 3.1 depicts various modes of elicitation and response. Are there other modes of elicitation that could be included in such a chart? Justify your additions with an example of each.
Answer:

Elicitation mode

Oral: conveying a word, a sentence, a question, a story of personal experience, or a conversation.

Written: a word, sentence, question, or paragraph drawn from a short story, article, journal, or book.

Response mode

Oral: answering questions, giving feedback, recalling and retelling the experience.

Written: writing answers about the book that was read, writing down the words that were not understood, writing a short story (3-4 sentences).



Thursday, 26 March 2020

Douglas Brown on "Designing Classroom Language Tests"

ASSIGNMENT 5: "SUMMARY"
Pages 42-62.

DESIGNING CLASSROOM LANGUAGE TESTS.

The previous chapters introduced a number of building blocks for designing language tests. You now have a sense of where tests belong in the larger domain of assessment. You have sorted through differences between formal and informal tests, formative and summative tests, and norm-referenced and criterion-referenced tests.

TEST TYPES.
We will look first at two test types that you will probably not have many opportunities to create as a classroom teacher (language aptitude tests and language proficiency tests) and three types that you will almost certainly need to create (placement tests, diagnostic tests, and achievement tests).

Language aptitude tests.
One type of test, although admittedly not a very common one, predicts a person's success prior to exposure to the second language. A language aptitude test is designed to measure capacity or general ability to learn a foreign language and ultimate success in that undertaking. Language aptitude tests are ostensibly designed to apply to the classroom learning of any language.

Proficiency tests.
If your aim is to test global competence in a language, then you are, in conventional terminology, testing proficiency. A proficiency test is not limited to any one course, curriculum, or single skill in the language; rather, it tests overall ability. Proficiency tests have traditionally consisted of standardized multiple-choice items on grammar, vocabulary, reading comprehension, and aural comprehension; some have also included oral production performance. A typical example of a standardized proficiency test is the Test of English as a Foreign Language (TOEFL), produced by the Educational Testing Service.

Placement test.
Certain proficiency tests can act in the role of placement tests, the purpose of which is to place a student into a particular level or section of a language curriculum or school. A placement test usually, but not always, includes a sampling of the material to be covered in the various courses in a curriculum; a student's performance on the test should indicate the point at which the student will find material neither too easy nor too difficult but appropriately challenging.

Diagnostic test.
A diagnostic test is designed to diagnose specified aspects of a language. A test in pronunciation, for example, might diagnose the phonological features of English that are difficult for learners and should therefore become part of a curriculum. Usually, such tests offer a checklist of features for the administrator to use in pinpointing difficulties. A writing diagnostic would elicit a writing sample from students that would allow the teacher to identify those rhetorical and linguistic features on which the course needs to focus special attention.
A typical diagnostic test of oral production was created by Clifford Prator (1972) to accompany a manual of English pronunciation. Test-takers are directed to read a 150-word passage while they are tape-recorded. The test administrator then refers to an inventory of phonological items for analyzing a learner's production. After multiple listenings, the administrator produces a checklist of errors in five separate categories, each of which has several subcategories. The main categories include:
1. Stress and rhythm,
2. Intonation,
3. Vowels,
4. Consonants, and
5. Other factors.

Achievement test.
An achievement test is related directly to classroom lessons, units, or even a total curriculum. Achievement tests are (or should be) limited to particular material addressed in a curriculum within a particular time frame and are offered after a course has focused on the objectives in question. Achievement tests can also serve the diagnostic role of indicating what a student needs to continue to work on in the future, but the primary role of an achievement test is to determine whether course objectives have been met, and appropriate knowledge and skills acquired, by the end of a period of instruction.
The specifications for an achievement test should be determined by
The objectives of the lesson, unit, or course being assessed,
The relative importance (or weight) assigned to each objective,
The tasks employed in classroom lessons during the unit of time,
Practicality issues, such as the time frame for the test and turnaround time, and
The extent to which the test structure lends itself to formative washback.

SOME PRACTICAL STEPS TO TEST CONSTRUCTION.
The descriptions of types of tests in the preceding section are intended to help you answer the first question posed in this chapter: What is the purpose of the test? It is unlikely that you will be asked to design an aptitude test or a proficiency test, but for the purpose of interpreting those tests, it is important that you understand their nature. However, your opportunities to design placement, diagnostic, and achievement tests, especially the latter, will be plentiful. The remainder of this chapter is devoted to equipping you with the tools you need to create such classroom-oriented tests.

Assessing clear, unambiguous objectives.
In addition to knowing the purpose of the test you're creating, you need to know as specifically as possible what it is you want to test. Sometimes teachers give tests simply because it's Friday of the third week of the course; after hasty glances at the chapters covered during those three weeks, they dash off some test items so that students will have something to do during the class. This is no way to approach a test. Instead, begin by taking a careful look at everything that you think your students should "know" or be able to "do," based on the material that the students are responsible for. In other words, examine the objectives for the unit you are testing.

Drawing up test specifications.
In the unit discussed above, your specifications will simply comprise (a) a broad outline of the test, (b) what skills you will test, and (c) what the items will look like.
(a) Outline of the test and (b) skills to be included. Because of the constraints of your curriculum, your unit test must take no more than 30 minutes. This is an integrated curriculum, so you need to test all four skills. Since you have the luxury of teaching a small class (only 12 students), you decide to include an oral production component in the preceding class period (taking students one by one into a separate room while the rest of the class reviews the unit individually and completes workbook exercises).
(c) Item types and tasks. The next and potentially more complex choices involve the item types and tasks to use in this test. It is surprising that there is a limited number of modes of eliciting responses (that is, prompting) and of responding on tests of any kind.

These informal, classroom-oriented specifications give you an indication of
The topics (objectives) you will cover,
The implied elicitation and response formats for items,
The number of items in each section, and
The time to be allocated for each.

Devising the tasks.
You are now ready to draft the other test items. To provide a sense of authenticity and interest, you have decided to conform your items to the context of a recent TV sitcom that you used in class to illustrate certain discourse and form-focused factors. The sitcom depicted a loud, noisy party with lots of small talk. As you devise your test items, consider such factors as how students will perceive them (face validity), the extent to which authentic language and contexts are present, difficulty caused by cultural schemata, the length of the listening stimuli, how well a story line comes across, how things like the cloze testing format will work, and other practicalities.

Designing Multiple-Choice Test Items.
In the sample achievement test above, two of the five components (both of the listening sections) specified a multiple-choice format for items. This was a bold step to take. Multiple-choice items, which may appear to be the simplest kind of item to construct, are extremely difficult to design correctly. Hughes (2003, pp. 76-78) cautions against a number of weaknesses of multiple-choice items:
The technique tests only recognition knowledge.
Guessing may have a considerable effect on test scores.
The technique severely restricts what can be tested.
It is very difficult to write successful items.
Washback may be harmful.
Cheating may be facilitated.
Since there will be occasions when multiple-choice items are appropriate, consider the following four guidelines for designing multiple-choice items for both classroom-based and large-scale situations (adapted from Gronlund, pp. 60-75, and J.D. Brown, 1996, pp. 54-57).

1. Design each item to measure a specific objective.
The specific objective being tested here is comprehension of wh-questions. Distractor (a) is designed to ascertain that the student knows the difference between an answer to a wh-question and an answer to a yes/no question. Distractors (b) and (d), as well as the key item (c), test comprehension of the meaning of where as opposed to why and when. The objective has been directly addressed.

2. State both stem and options as simply and directly as possible.
You might argue that the first two sentences of this item give it some authenticity and accomplish a bit of schema setting. But if you simply want a student to identify the type of medical professional who deals with eyesight issues, those sentences are superfluous. Moreover, by lengthening the stem, you have introduced a potentially confounding lexical item, deteriorate, that could distract the student unnecessarily.

3. Make certain that the intended answer is clearly the only correct one.
A quick consideration of distractor (d) reveals that it is a plausible answer, along with the intended key (c). Eliminating unintended possible answers is often the most difficult problem of designing multiple-choice items. With only a minimum of context in each stem, a wide variety of responses may be perceived as correct.

4. Use item indices to accept, discard, or revise items.
The appropriate selection and arrangement of multiple-choice items on a test can be accomplished by measuring items against three indices: item facility (or item difficulty), item discrimination (sometimes called item differentiation), and distractor analysis. Although measuring these factors on classroom tests would be useful, you probably will have neither the time nor the expertise to do this for every classroom test you create, especially one-time tests. But they are a must for standardized norm-referenced tests that are designed to be administered a number of times and/or administered in multiple forms.

a. Item facility (or IF) is the extent to which an item is easy or difficult for the proposed group of test-takers. You may wonder why this is important if, in your estimation, the item achieves validity. The answer is that an item that is too easy (say, 99 percent of respondents get it right) or too difficult (99 percent get it wrong) really does nothing to separate high-ability and low-ability test-takers. It is not really performing much "work" for you on a test.
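IF is conventionally computed as the proportion of test-takers who answer the item correctly. A minimal sketch of that calculation in Python (the function name and sample data are illustrative, not taken from the book):

def item_facility(responses):
    # responses: list of booleans, True when a test-taker answered the item correctly
    return sum(responses) / len(responses)

# Example: 7 of 10 test-takers answer correctly, so IF = 0.7
print(item_facility([True] * 7 + [False] * 3))

An IF close to 1.0 or 0.0 confirms the point above: the item is doing little to separate stronger from weaker test-takers.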

b. Item discrimination (ID) is the extent to which an item differentiates between high-ability and low-ability test-takers. An item on which high-ability and low-ability students score equally well would have poor ID because it does not discriminate between the two groups. Conversely, an item that garners correct responses from most of the high-ability group and incorrect responses from most of the low-ability group has good discrimination power.
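A common way to compute ID, consistent with the high-group/low-group comparison described above, is to divide the difference in correct answers between the two groups by the size of one group. A sketch with invented numbers:

def item_discrimination(high_correct, low_correct, group_size):
    # high_correct / low_correct: correct answers in the top and bottom scoring groups
    # group_size: number of test-takers in each comparison group
    return (high_correct - low_correct) / group_size

# Example: in comparison groups of 10, 8 high scorers but only 3 low scorers answer correctly, so ID = 0.5
print(item_discrimination(8, 3, 10))

With this formulation ID ranges from -1.0 to +1.0; values near zero, or negative values, flag an item for revision or removal.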

c. Distractor efficiency is one more important measure of a multiple-choice item's value in a test, and one that is related to item discrimination. The efficiency of distractors is the extent to which (a) the distractors "lure" a sufficient number of test-takers, especially lower-ability ones, and (b) those responses are somewhat evenly distributed across all distractors.
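Distractor analysis itself is little more than a tally of how many test-takers in each ability group chose each option. A hypothetical sketch (the option letters and response patterns are invented for illustration):

from collections import Counter

def distractor_counts(choices):
    # choices: the option letters selected by one group of test-takers on a single item
    return Counter(choices)

# Hypothetical item whose key is "c": distractor "b" lures several low scorers,
# while "d" lures no one at all and is therefore doing no work.
high_group = ["c", "c", "c", "b", "c", "c", "a", "c", "c", "c"]
low_group = ["b", "c", "a", "b", "c", "b", "a", "c", "b", "c"]
print(distractor_counts(high_group))
print(distractor_counts(low_group))

A distractor that attracts no one, or that attracts more high scorers than low scorers, is a candidate for rewriting.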

SCORING, GRADING, AND GIVING FEEDBACK.
Scoring.
As you design a classroom test, you must consider how the test will be scored and graded. Your scoring plan reflects the relative weight that you place on each section and on the items within each section. The integrated-skills class that we have been using as an example focuses on listening and speaking skills, with some attention to reading and writing.
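As a hedged illustration of such a scoring plan (the skill weights and scores below are invented, not taken from Brown's example), the total score is simply the weighted sum of the section scores:

def weighted_total(section_scores, weights):
    # both dicts are keyed by skill; the weights should sum to 1.0
    return sum(section_scores[skill] * weights[skill] for skill in weights)

# Hypothetical plan: listening and speaking carry more weight than reading and writing
weights = {"listening": 0.3, "speaking": 0.3, "reading": 0.2, "writing": 0.2}
scores = {"listening": 80, "speaking": 90, "reading": 70, "writing": 75}
print(weighted_total(scores, weights))  # 80.0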

Grading.
Your first thought might be that assigning grades to student performance on this test would be easy: just give an "A" for 90-100 percent, a "B" for 80-89 percent, and so on. Not so fast! Grading is such a thorny issue that all of Chapter 11 is devoted to the topic. How you assign letter grades to this test is a product of
The country, culture, and context of this English classroom,
Institutional expectations (most of them unwritten),
Explicit and implicit definitions of grades that you have set forth,
The relationship you have established with this class, and
Student expectations that have been engendered in previous tests and quizzes in this class.

Giving Feedback.
A section on scoring and grading would not be complete without some consideration of the forms in which you will offer feedback to your students, feedback that you want to become beneficial washback. For the test discussed here, which is not unusual in the universe of possible formats for periodic classroom tests, consider the multitude of options. You might choose to return the test to the student with one of, or a combination of, any of the possibilities below:
1. A letter grade
2. A total score
3. Four subscores (speaking, listening, reading, writing)
4. For the listening and reading sections
a. An indication of correct/incorrect responses
b. Marginal comments
5. For the oral interview
a. Scores for each element being rated
b. A checklist of areas needing work
c. Oral feedback after the interview
d. A post-interview conference to go over the results
6. On the essay
a. Scores for each element being rated
b. A checklist of areas needing work
c. Marginal and end-of-essay comments and suggestions
d. A post-test conference to go over the work
e. A self-assessment
7. On all or selected parts of the test, peer checking of results
8. A whole-class discussion of the results of the test
9. Individual conferences with each student to review the whole test.




REFERENCES:
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. New York: Longman.



Thursday, 19 March 2020

Principles of Language Assessment.

Assignment 4

Summary of Principles of Language Assessment.
Pages 19-28.

PRACTICALITY.

An effective test is practical. This means that it
Is not excessively expensive,
Stays within appropriate time constraints,
Is relatively easy to administer, and
Has a scoring/evaluation procedure that is specific and time-efficient.
A test that is prohibitively expensive is impractical. A test of language proficiency that takes a student five hours to complete is impractical; it consumes more time (and money) than necessary to accomplish its objective. A test that requires individual one-on-one proctoring is impractical for a group of several hundred test-takers and only a handful of examiners. A test that takes a few minutes for a student to take and several hours for an examiner to evaluate is impractical for most classroom situations. A test that can be scored only by computer is impractical if the test takes place a thousand miles away from the nearest computer. The value and quality of a test sometimes hinge on such nitty-gritty, practical considerations.

RELIABILITY.

A reliable test is consistent and dependable. If you give the same test to the same student or matched students on two different occasions, the test should yield similar results. The issue of the reliability of a test may best be addressed by considering a number of factors that contribute to its unreliability. Consider the following possibilities (adapted from Mousavi, 2002, p. 804): fluctuations in the student, in scoring, in test administration, and in the test itself.

Student-Related Reliability.
The most common learner-related issues in reliability are caused by temporary illness, fatigue, a "bad day," anxiety, and other physical or psychological factors, which may make an "observed" score deviate from one's "true" score. Also included in this category are such factors as a test-taker's "test-wiseness" or strategies for efficient test taking (Mousavi, 2002, p. 804).

Rater Reliability .
Human error, subjectivity, and bias may enter into the scoring process. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores on the same test, possibly because of lack of attention to scoring criteria, inexperience, inattention, or even preconceived biases. In the story above about the placement test, the initial scorers were not applying the same standards.

Test Administration Reliability.
Unreliability may also result from the conditions in which the test is administered. I once witnessed the administration of a test of aural comprehension in which a tape recorder played items for comprehension, but because of street noise outside the building, students sitting next to the windows could not hear the tape accurately. This was a clear case of unreliability caused by the conditions of the test administration. Other sources of unreliability are found in photocopying variations, the amount of light in different parts of the room, variations in temperature, and even the condition of desks and chairs.

Test Reliability.
Sometimes the nature of the test itself can cause measurement errors. If a test is too long, test-takers may become fatigued by the time they reach the later items and hastily respond incorrectly. Timed tests may discriminate against students who do not perform well under a time limit. We all know people (and you may be included in this category!) who "know" the course material perfectly but who are adversely affected by the presence of a clock ticking away. Poorly written test items (those that are ambiguous or that have more than one correct answer) may be a further source of test unreliability.

VALIDITY.
By far the most complex criterion of an effective test, and arguably the most important principle, is validity, "the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment" (Gronlund, 1998, p. 226). A valid test of reading ability actually measures reading ability, not 20/20 vision, not previous knowledge of a subject, nor some other variable of questionable relevance. To measure writing ability, one might ask students to write as many words as they can in 15 minutes, then simply count the words for the final score. Such a test would be easy to administer (practical), and the scoring would be quite dependable (reliable). But it would not constitute a valid test of writing ability without some consideration of comprehensibility, rhetorical discourse elements, and the organization of ideas, among other factors.

Content-Related Evidence.
If a test actually samples the subject matter about which conclusions are to be drawn, and if it requires the test-taker to perform the behavior that is being measured, it can claim content-related evidence of validity, often popularly referred to as content validity (e.g., Mousavi, 2002; Hughes, 2003). You can usually identify content-related evidence observationally if you can clearly define the achievement that you are measuring. A test of tennis competency that asks someone to run a 100-yard dash obviously lacks content validity. If you are trying to assess a person's ability to speak a second language in a conversational setting, asking the learner to answer paper-and-pencil multiple-choice questions requiring grammatical judgments does not achieve content validity. A test that requires the learner actually to speak within some sort of authentic context does. And if a course has perhaps ten objectives but only two are covered in a test, then content validity suffers.

Criterion-Related Evidence.
In the case of teacher-made classroom assessments, criterion-related evidence is best demonstrated through a comparison of the results of an assessment with the results of some other measure of the same criterion. For example, in a course unit whose objective is for students to be able to orally produce voiced and voiceless stops in all possible phonetic environments, the results of one teacher's unit test might be compared with an independent assessment, possibly a commercially produced test in a textbook, of the same phonemic proficiency. A classroom test designed to assess mastery of a point of grammar in communicative use will have criterion validity if test scores are corroborated either by observed subsequent behavior or by other communicative measures of the grammar point in question.

Construct-Related Evidence.
A third kind of evidence that can support validity, but one that does not play as large a role for classroom teachers, is construct-related validity, commonly referred to as construct validity. A construct is any theory, hypothesis, or model that attempts to explain observed phenomena in our universe of perception. Constructs may or may not be directly or empirically measured; their verification often requires inferential data. "Proficiency" and "communicative competence" are linguistic constructs; "self-esteem" and "motivation" are psychological constructs. Virtually every issue in language learning and teaching involves theoretical constructs. In the field of assessment, construct validity asks, "Does this test actually tap into the theoretical construct as it has been defined?" Tests are, in a manner of speaking, operational definitions of constructs, in that they operationalize the entity that is being measured (see Davidson, Hudson, & Lynch, 1985).

Consequential Validity.
As high-stakes assessment has gained ground in the last two decades, one aspect of consequential validity has drawn special attention: the effect of test preparation courses and manuals on performance. McNamara (2000, p. 54) cautions against test results that may reflect socioeconomic conditions such as opportunities for coaching that are "differentially available to the students being assessed (for example, because only some families can afford coaching, or because children with more highly educated parents get help from their parents)."

Face Validity.
An important facet of consequential validity is the extent to which "students view the assessment as fair, relevant, and useful for improving learning" (Gronlund, 1998, p. 201), or what is popularly known as face validity. "Face validity refers to the degree to which a test looks right, and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the examinees who take it, the administrative personnel who decide on its use, and other psychometrically unsophisticated observers" (Mousavi, 2002, p. 244).
Sometimes students don't know what is being tested when they tackle a test. They may feel, for a variety of reasons, that a test isn't testing what it is "supposed" to test. Face validity means that the students perceive the test to be valid. Face validity asks the question, "Does the test, on the 'face' of it, appear from the learner's perspective to test what it is designed to test?" Face validity will likely be high if learners encounter:
A well-constructed, expected format with familiar tasks,
A test that is clearly doable within the allotted time limit,
Items that are clear and uncomplicated,
Directions that are crystal clear,
Tasks that relate to their course work (content validity), and
A difficulty level that presents a reasonable challenge.

AUTHENTICITY.

A fourth major principle of language testing is authenticity, a concept that is a little slippery to define, especially within the art and science of evaluating and designing tests. Bachman and Palmer (1996, p. 23) define authenticity as "the degree of correspondence of the characteristics of a given language test task to the features of a target language task," and then suggest an agenda for identifying those target language tasks and for transforming them into valid test items.
Essentially, when you make a claim for authenticity in a test task, you are saying that this task is likely to be enacted in the "real world." Many test item types fail to simulate real-world tasks. They may be contrived or artificial in their attempt to target a grammatical form or a lexical item. The sequencing of items that bear no relationship to one another lacks authenticity. One does not have to look very long to find reading comprehension passages in proficiency tests that do not reflect a real-world passage.
In a test, authenticity may be present in the following ways:
The language in the test is as natural as possible.
Items are contextualized rather than isolated.
Topics are meaningful (relevant, interesting) for the learner.
Some thematic organization to items is provided, such as through a story line or episode.
Tasks represent, or closely approximate, real-world tasks.

WASHBACK

A facet of consequential validity, discussed above, is "the effect of testing on teaching and learning" (Hughes, 2003, p. 1), otherwise known among language-testing specialists as washback. In large-scale assessment, washback generally refers to the effects the tests have on instruction in terms of how students prepare for the test. "Cram" courses and "teaching to the test" are examples of such washback. Another form of washback that occurs more in classroom assessment is the information that "washes back" to students in the form of useful diagnoses of strengths and weaknesses. Washback also includes the effects of an assessment on teaching and learning prior to the assessment itself, that is, on preparation for the assessment. Informal performance assessment is by nature more likely to have built-in washback effects because the teacher is usually providing interactive feedback. Formal tests can also have positive washback, but they provide no washback if the students receive a simple letter grade or a single overall numerical score.






References:

Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. New York: Longman.

Problem analysis through aspects of reliability, practicality, and validity

Assignment 3

ENGLISH MIDTERM EXAM (UTS) QUESTIONS.
QUESTION :
1. Write a list of economic terms (management/accounting terms), at least 10 terms!
2. Match the economic terms on the left with the meanings on the right:
a. Accounting      a. Place of work, task or assignment, regularly paid.
b. Bank            b. An account entry with a positive value for assets, and a negative value for liabilities and equity.
c. Job             c. Establishment for keeping money, valuables, etc. safely.
d. Credit          d. Activities of planning, organizing, actuating, and controlling in an organization.
e. Debit           e. Property with a cash value that is owned by a business or individual.
f. Management      f. Selling or being sold; an occasion when goods are sold at lower prices than usual.
g. Assets          g. Process of identifying, measuring, and reporting financial information.
h. Journal         h. Rooms or a building used as a place of business, esp. for clerical or administrative work.
i. Office          i. A record where transactions are recorded, also known as an account.
j. Sale            j. An account entry with a negative value for assets, and a positive value for liabilities and equity.
       
3. A. Make three sentences using the Simple Present Tense!
B. Make positive, negative, and interrogative forms of a nominal sentence and a verbal sentence!
4. A. Arrange the sentences below into a good procedure text:
- Third, put the tomato sauce.
- First, prepare the materials and tools.
- Then, add salad, cheese, egg, and mayonnaise.
- Second, place a slice of bread on the plate.
- After that, put a slice of bread on top.
B. Make a title for that procedure text!
5. What do you know about application letters and curriculum vitae?
6. Change the verbs in brackets into the past simple tense:
a. My class ... (practice) job interviews last week.
b. Our lecturer ... (give) us a task to write an application letter.
c. ... (Do) you attend the additional meeting?
d. Mr. Danang ... (teach) cost accounting last semester.
7. Do you know the football player Cristiano Ronaldo? Describe him!

Analysis results
Practicality.
The English midterm (UTS) exam contains 7 questions and serves as a measuring tool to assess a person's ability. The questions are in essay (description) form and are used to find out the extent of the experience and learning that students gained during their time in class.
Reliability
This midterm exam must involve experienced people to supervise the participants taking the exam; this is needed to obtain clear and untainted exam results, free from interference by other people.
Validity
The content validity of this exam is that its content is learning material that has been taught in class and is tested again through this exam.
The construct validity of this exam is that the test materials are based on the field of study the participants have taken.
"Size" (measurement) validity is not found in this exam because that kind of validity focuses on counting, algebra, and mathematics.
The concurrent validity of this exam is that it measures the level of a person's ability in the field being tested.


ASSESSING GRAMMAR AND ASSESSING VOCABULARY by James and John

Assignment of meeting 15. “SUMMARY ASSESSING GRAMMAR 1-291”   Differing notions of ‘grammar’ for assessment. Grammar and linguisti...