Archive for November 2016

Assessing Writing

Writing assessment can be used for a variety of appropriate purposes, both inside the classroom and outside: providing assistance to students, awarding a grade, placing students in appropriate courses, allowing them to exit a course or sequence of courses, certifying proficiency, and evaluating programs, to name some of the more obvious. Given the high-stakes nature of many of these assessment purposes, it is crucial that assessment practices be guided by sound principles to ensure that they are valid, fair, and appropriate to the context and purposes for which they are designed. This position statement aims to provide that guidance.

Guiding Principles for Assessment

1. Writing assessment is useful primarily as a means of improving teaching and learning. The primary purpose of any assessment should govern its design, its implementation, and the generation and dissemination of its results.

As a result…

A. Best assessment practice is informed by pedagogical and curricular goals, which are in turn formatively affected by the assessment. Teachers or administrators designing assessments should ground the assessment in the classroom, program or departmental context. The goals or outcomes assessed should lead to assessment data which is fed back to those involved with the regular activities assessed so that assessment results may be used to make changes in practice.

B. Best assessment practice is undertaken in response to local goals, not external pressures. Even when external forces require assessment, the local community must assert control of the assessment process, including selection of the assessment instrument and criteria.

C. Best assessment practice provides regular professional development opportunities. Colleges, universities, and secondary schools should make use of assessments as opportunities for professional development and for the exchange of information about student abilities and institutional expectations.

2. Writing is by definition social. Learning to write entails learning to accomplish a range of purposes for a range of audiences in a range of settings.

As a result…

A. Best assessment practice engages students in contextualized, meaningful writing. The assessment of writing must strive to set up writing tasks and situations that identify purposes appropriate to and appealing to the particular students being tested. Additionally, assessment must be contextualized in terms of why, where, and for what purpose it is being undertaken; this context must also be clear to the students being assessed and to all stakeholders.

B. Best assessment practice supports and harmonizes with what practice and research have demonstrated to be effective ways of teaching writing. What is easiest to measure—often by means of a multiple choice test—may correspond least to good writing; choosing a correct response from a set of possible answers is not composing. As important, just asking students to write does not make the assessment instrument a good one. Essay tests that ask students to form and articulate opinions about some important issue, for instance, without time to reflect, talk to others, read on the subject, revise, and have a human audience promote distorted notions of what writing is. They also encourage poor teaching and little learning. Even teachers who recognize and employ the methods used by real writers in working with students can find their best efforts undercut by assessments such as these.

C. Best assessment practice is direct assessment by human readers. Assessment that isolates students and forbids discussion and feedback from others conflicts with what we know about language use and the benefits of social interaction during the writing process; it also is out of step with much classroom practice. Direct assessment in the classroom should provide response that serves formative purposes, helping writers develop and shape ideas, as well as organize, craft sentences, and edit. As stated by the CCCC Position Statement on Teaching, Learning, and Assessing Writing in Digital Environments, “we oppose the use of machine-scored writing in the assessment of writing.” Automated assessment programs do not respond as human readers. While they may promise consistency, they distort the very nature of writing as a complex and context-rich interaction between people. They simplify writing in ways that can mislead writers to focus more on structure and grammar than on what they are saying by using a given structure and style.

3. Any individual's writing ability is a sum of a variety of skills employed in a diversity of contexts, and individual ability fluctuates unevenly among these varieties.

As a result…

A. Best assessment practice uses multiple measures. One piece of writing—even if it is generated under the most desirable conditions—can never serve as an indicator of overall writing ability, particularly for high-stakes decisions. Ideally, writing ability must be assessed by more than one piece of writing, in more than one genre, written on different occasions, for different audiences, and responded to and evaluated by multiple readers as part of a substantial and sustained writing process.

B. Best assessment practice respects language variety and diversity and assesses writing on the basis of effectiveness for readers, acknowledging that as purposes vary, criteria will as well. Standardized tests that rely more on identifying grammatical and stylistic errors than authentic rhetorical choices disadvantage students whose home dialect is not the dominant dialect. Assessing authentic acts of writing simultaneously raises performance standards and provides multiple avenues to success. Thus students are not arbitrarily punished for linguistic differences that in some contexts make them more, not less, effective communicators. Furthermore, assessments that are keyed closely to an American cultural context may disadvantage second language writers. The CCCC Statement on Second Language Writing and Writers calls on us "to recognize the regular presence of second-language writers in writing classes, to understand their characteristics, and to develop instructional and administrative practices that are sensitive to their linguistic and cultural needs." Best assessment practice responds to this call by creating assessments that are sensitive to the language varieties in use among the local population and sensitive to the context-specific outcomes being assessed.

C. Best assessment practice includes assessment by peers, instructors, and the student writer himself or herself. Valid assessment requires combining multiple perspectives on a performance and generating an overall assessment out of the combined descriptions of those multiple perspectives. As a result, assessments should include formative and summative assessments from all these kinds of readers. Reflection by the writer on her or his own writing processes and performances holds particular promise as a way of generating knowledge about writing and increasing the ability to write successfully.

4. Perceptions of writing are shaped by the methods and criteria used to assess writing.

As a result…

A. The methods and criteria that readers use to assess writing should be locally developed, deriving from the particular context and purposes for the writing being assessed. The individual writing program, institution, or consortium should be recognized as a community of interpreters whose knowledge of context and purpose is integral to the assessment. There is no test which can be used in all environments for all purposes, and the best assessment for any group of students must be locally determined and may well be locally designed.

B. Best assessment practice clearly communicates what is valued and expected, and does not distort the nature of writing or writing practices. If ability to compose for various audiences is valued, then an assessment will assess this capability. For other contexts and purposes, other writing abilities might be valued, for instance, to develop a position on the basis of reading multiple sources or to compose a multi-media piece, using text and images. Values and purposes should drive assessment, not the reverse. A corollary to this statement is that assessment practices and criteria should change as conceptions of texts and values change.

C. Best assessment practice enables students to demonstrate what they do well in writing. Standardized tests tend to focus on readily accessed features of the language (grammatical correctness, stylistic choices) and on error rather than on the appropriateness of the rhetorical choices that have been made. Consequently, the outcome of such assessments is negative: students are said to demonstrate what they do wrong with language rather than what they do well. Quality assessments will provide the opportunity for students to demonstrate the ways they can write, displaying the strategies or skills taught in the relevant environment.

5. Assessment programs should be solidly grounded in the latest research on learning, writing, and assessment.

As a result…

A. Best assessment practice results from careful consideration of the costs and benefits of the range of available approaches. It may be tempting to choose an inexpensive, quick assessment, but decision-makers should consider the impact of assessment methods on students, faculty, and programs. The return on investment from the direct assessment of writing by instructor-evaluators includes student learning, professional development of faculty, and program development. These benefits far outweigh the presumed benefits of cost, speed, and simplicity that machine scoring might seem to promise.

B. Best assessment practice is continually under review and subject to change by well-informed faculty, administrators, and legislators. Anyone charged with the responsibility of designing an assessment program must be cognizant of the relevant research and must stay abreast of developments in the field. The theory and practice of writing assessment is continually informed by significant publications in professional journals and by presentations at regional and national conferences. The easy availability of this research to practitioners makes ignorance of its content reprehensible.

Applications to Assessment Settings

The guiding principles apply to assessment conducted in any setting. In addition, we offer the following guidelines for situations that may be encountered in specific settings.

Assessment in the Classroom

In a course context, writing assessment should be part of the highly social activity within the community of faculty and students in the class. This social activity includes:

a period of ungraded work (prior to the completion of graded work) that receives response from multiple readers, including peer reviewers,
assessment of texts—from initial through to final drafts—by human readers, and
more than one opportunity to demonstrate outcomes.

Self-assessment should also be encouraged. Assessment practices and criteria should match the particular kind of text being created and its purpose. These criteria should be clearly communicated to students in advance so that the students can be guided by the criteria while writing.

Assessment for Placement

Placement criteria in the most responsible programs will be clearly connected to any differences in the available courses. Experienced instructor-evaluators can most effectively make a judgment regarding which course would best serve each student’s needs and assign each student to the appropriate course. If scoring systems are used, scores should derive from criteria that grow out of the work of the courses into which students are being placed.

Decision-makers should carefully weigh the educational costs and benefits of timed tests, portfolios, directed self-placement, etc. In the minds of those assessed, each of these methods implicitly establishes its value over that of others, so the first impact is likely to be on what students come to believe about writing. For example, timed writing may suggest to students that writing always cramps one for time and that real writing is always a test. Machine-scored tests may focus students on error-correction rather than on effective communication. In contrast, the value of portfolio assessment is that it honors the processes by which writers develop their ideas and re-negotiate how their communications are heard within a language community.

Students should have the right to weigh in on their assessment. Self-placement without direction may become merely a right to fail, whereas directed self-placement, either alone or in combination with other methods, provides not only useful information but also involves and invests the student in making effective life decisions.

If for financial or even programmatic reasons the initial method of placement is somewhat reductive, instructors of record should create an opportunity early in the semester to review and change students’ placement assignments, and uniform procedures should be established to facilitate the easy re-placement of improperly placed students. Even when the placement process entails direct assessment of writing, the system should accommodate the possibility of improper placement. If assessment employs machine scoring, whether of actual writing or of items designed to elicit error, it is particularly essential that every effort be made through statistical verification to see that students, individually and collectively, are placed in courses that can appropriately address their skills and abilities.

Placement processes should be continually assessed and revised in accord with course content and overall program goals. This is especially important when machine-scored assessments are used. Using methods that are employed uniformly, teachers of record should verify that students are appropriately placed. If students are placed according to scores on such tests, the ranges of placement must be revisited regularly to accommodate changes in curricula and shifts in the abilities of the student population.

Assessment of Proficiency

Proficiency or exit assessment involves high stakes for students. In this context, assessments that make use of substantial and sustained writing processes are especially important.

Judgments of proficiency must also be made on the basis of performances in multiple and varied writing situations (for example, a variety of topics, audiences, purposes, genres).

The assessment criteria should be clearly connected to desired outcomes. When proficiency is being determined, the assessment should be informed by such things as the core abilities adopted by the institution, the course outcomes established for a program, and/or the stated outcomes of a single course or class. Assessments that do not address such outcomes lack validity in determining proficiency.

The higher the stakes, the more important it is that assessment be direct rather than indirect, based on actual writing rather than on answers on multiple-choice tests, and evaluated by people involved in the instruction of the student rather than via machine scoring. To evaluate the proficiency of a writer on other criteria than multiple writing tasks and situations is essentially disrespectful of the writer.

Assessment of Programs

Program assessment refers to evaluations of performance in a large group, such as students in a multi-section course or majors graduating from a department. Because assessment offers information about student performance and the factors which affect that performance, it is an important way for programs or departments to monitor and develop their practice.

Programs and departments should see themselves as communities of professionals whose assessment activities reveal common values, provide opportunities for inquiry and debate about unsettled issues, and communicate measures of effectiveness to those inside and outside the program. Members of the community are in the best position to guide decisions about what assessments will best inform that community. It is important to bear in mind that random sampling of students can often provide large-scale information and that regular assessment should affect practice.

Assessment for School Admission

Admissions tests are not only high stakes for students, they are also an extremely important component for educational institutions determining if they and a student are an appropriate match. Consequently, where students’ writing ability is a factor in the admissions decision, the writing assessments should consist of direct measures of actual writing. Moreover, the assessment should consist of multiple writing tasks and should allow sufficient time for a student to engage in all stages of the writing process.
Assessments should be appropriate to educational institutions’ distinctive missions and student populations, although similar institutions may collaborate to create assessments. Assessment should be developed in consultation with high school writing teachers.

ASSESSING SPEAKING

Basic Types of Speaking (Brown, 2004)

1. Imitative

At one end of a continuum of types of speaking performance is the ability to simply parrot back (imitate) a word or phrase or possibly a sentence. While this is a purely phonetic level of oral production, a number of prosodic, lexical and grammatical properties of language may be included in criterion performance.

2. Intensive

A second type of speaking frequently employed in assessment contexts is the production of short stretches of oral language designed to demonstrate competence in a narrow band of grammatical, phrasal, lexical, or phonological relationships (such as prosodic elements: intonation, stress, rhythm, juncture). Examples of intensive assessment tasks include directed response tasks, reading aloud, sentence and dialogue completion; limited picture-cued tasks including simple sequences; and translation up to the simple sentence level.

3. Responsive

Responsive assessment tasks include interaction and test comprehension but at the somewhat limited level of very short conversations, standard greetings and small talk, simple requests and comments, and the like. The stimulus is almost always a spoken prompt (in order to preserve authenticity), with perhaps only one or two follow-up questions or retorts:

A. Mary: Excuse me, do you have the time?

Doug: Yeah. Nine-fifteen.

B. Jeff: Hey, Stef, how’s it going?

Stef: Not bad, and yourself?

Jeff: I’m good.

Stef: Cool. Okay, gotta go.

4. Interactive

The difference between responsive and interactive speaking is in the length and complexity of the interaction, which sometimes includes multiple exchanges and/or participants. Interaction can take the two forms of transactional language, which has the purpose of exchanging specific information, or interpersonal exchanges, which have the purpose of maintaining social relationships. (In the two dialogues cited above, A was transactional and B was interpersonal.) In interpersonal exchanges, oral production can become pragmatically complex with the need to speak in a casual register and use colloquial language, ellipsis, slang, humor, and other sociolinguistic conventions.

5. Extensive (monologue)

Extensive oral production tasks include speeches, oral presentations, and storytelling, during which the opportunity for oral interaction from listeners is either highly limited (perhaps to nonverbal responses) or ruled out altogether.

Micro and Macroskills of Speaking

The microskills refer to producing the smaller chunks of language such as phonemes, morphemes, words, collocations, and phrasal units. The macroskills imply the speaker’s focus on the larger elements: fluency, discourse, function, style, cohesion, nonverbal communication, and strategic options. The micro- and macroskills total roughly 16 different objectives to assess in speaking.

References:

Retrieved from https://iezhkuncung.wordpress.com/2012/12/04/assesing-speaking-in-evaluation-on-elt/

http://www.ncte.org/cccc/resources/positions/writingassessment

Assessing Speaking and Writing

Monday, 28 November 2016

Author: Unknown

ASSESSING LISTENING

OBSERVING THE PERFORMANCE OF THE FOUR SKILLS

Before focusing on listening itself, think about the two interacting concepts of performance and observation. All language users perform the acts of listening, speaking, reading, and writing. They of course rely on their underlying competence in order to accomplish these performances. When you propose to assess someone’s ability in a skill, you assess that person’s competence, but you observe the person’s performance.

THE IMPORTANCE OF LISTENING

Listening has often played second fiddle to its counterpart, speaking. In the standardized testing industry, a number of separate oral production tests are available (Test of Spoken English, Oral Proficiency Inventory, and PhonePass, to name several that are described in chapter 7 of this book), but it is rare to find just a listening test. One reason for this emphasis is that listening is often implied as a component of speaking. In addition, the overtly observable nature of speaking renders it more empirically measurable than listening.

We therefore need to pay close attention to listening as a mode of performance for assessment in the classroom. In this chapter we will begin with basic principles and types of listening, then move to a survey of tasks that can be used to assess listening.

BASIC TYPES OF LISTENING

As with all effective tests, designing appropriate assessment tasks in listening begins with the specification of objectives, or criteria. Those objectives may be classified in terms of several types of listening performance. Consider the processes involved when you listen:

1. You recognize speech sounds and hold a temporary “imprint” of them in short term memory.

2. You simultaneously determine the type of speech event (monologue, interpersonal dialogue, transactional dialogue) that is being processed and attend to its context (who the speaker is, location, purpose).

3. You use (bottom-up) linguistic decoding skills and/or (top-down) background schemata to bring a plausible interpretation to the message and assign a literal and intended meaning to the utterance.

4. In most cases (except for repetition tasks, which involve short-term memory only), you retain the important or relevant information in long-term memory rather than the exact linguistic form in which the message was received.

Each of these stages represents a potential assessment objective:

• comprehending surface structure elements such as phonemes, words, intonation, or a grammatical category

• understanding of pragmatic context

• determining meaning of auditory input

• developing the gist, a global or comprehensive understanding

From these stages we can derive four commonly identified types of listening performance, each of which comprises a category within which to consider assessment tasks and procedures.

1. Intensive

2. Responsive

3. Selective

4. Extensive

INTENSIVE LISTENING

Once you have determined objectives, your next step is to design the tasks, including making decisions about how you will elicit performance and how you will expect the test-taker to respond. We will look at tasks that range from intensive listening performance, such as minimal phonemic pair recognition, to extensive comprehension of language in communicative contexts. The focus in this section is on the microskills of intensive listening.

Recognizing Phonological and Morphological Elements

A typical form of intensive listening at this level is the assessment of recognition of phonological and morphological elements of language. A classic test task gives a spoken stimulus and asks test-takers to identify the stimulus from two or more choices.

PARAPHRASE RECOGNITION

The next step up on the scale of listening comprehension microskills is words, phrases, and sentences, which are frequently assessed by providing a stimulus sentence and asking the test-taker to choose the correct paraphrase from a number of choices.

RESPONSIVE LISTENING

A question-and-answer format can provide some interactivity in these lower-end listening tasks. The test-taker’s response is the appropriate answer to a question.

SELECTIVE LISTENING

A third type of listening performance is selective listening in which the test-taker listens to a limited quantity of aural input and must discern within it some specific information. A number of techniques have been used that require selective listening.

Listening Cloze

Listening cloze tasks (sometimes called cloze dictations or partial dictations) require the test-taker to listen to a story, monologue, or conversation and simultaneously read the written text in which selected words or phrases have been deleted. Cloze procedure is most commonly associated with reading only (see chapter 9). In its generic form, the test consists of a passage in which every nth word (typically every seventh word) is deleted and the test-taker is asked to supply an appropriate word. In a listening cloze task, test-takers see a transcript of the passage that they are listening to and fill in the blanks with the words or phrases that they hear.
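The generic "every nth word" deletion described above can be sketched in a few lines of Python. This is only an illustration: the function name `make_cloze`, the blank marker, and the choice to keep trailing punctuation next to the blank are assumptions, not part of the procedure as described in the source.

```python
def make_cloze(passage: str, n: int = 7, blank: str = "____"):
    """Delete every nth word of a passage.

    Returns the gapped text and the list of deleted words (the answer key).
    Hypothetical helper for illustration only.
    """
    gapped, answers = [], []
    for position, word in enumerate(passage.split(), start=1):
        if position % n == 0:
            # Keep trailing punctuation so the gapped text stays readable.
            core = word.rstrip(".,;:!?")
            trailing = word[len(core):]
            answers.append(core)
            gapped.append(blank + trailing)
        else:
            gapped.append(word)
    return " ".join(gapped), answers
```

In a listening cloze, the gapped text would be handed to test-takers as the transcript while the full passage is played or read aloud; the answer key is then used for scoring.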

Information Transfer

Selective listening can also be assessed through an information transfer technique in which aurally processed information must be transferred to a visual representation, such as labeling a diagram, identifying an element in a picture, completing a form, or showing routes on a map.

SENTENCE REPETITION

Sentence repetition is far from a flawless listening assessment task. Buck (2001, p. 79) noted that such tasks “are not just tests of listening, but tests of general oral skills”. Further, this task may test only recognition of sounds, and it can easily be contaminated by lack of short-term memory ability, thus invalidating it as an assessment of comprehension alone. It can also be difficult to distinguish a listening comprehension error from an oral production error. Therefore, sentence repetition tasks should be used with caution.

EXTENSIVE LISTENING

Drawing a clear distinction between any two of the categories of listening referred to here is problematic, but perhaps the fuzziest division is between selective and extensive listening. As we gradually move along the continuum from smaller to larger stretches of language, and from micro- to macroskills of listening, the probability of using more extensive listening tasks increases.

DICTATION

Dictation is a widely researched genre of assessing listening comprehension. In a dictation, test-takers hear a passage, typically of 50 to 100 words, recited three times: first, at normal speed; then, with long pauses between phrases or natural word groups, during which time test-takers write down what they have just heard; and finally, at normal speed once more so they can check their work and proofread.

Scoring criteria for several possible kinds of errors:

• spelling error only, but the word appears to have been heard correctly

• spelling and/or obvious misrepresentation of a word, illegible word

• grammatical error (for example, test-taker hears I can’t do it, writes I can do it)

• skipped word or phrase

• permutation of words

• additional words not in the original

• replacement of a word with an appropriate synonym
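As a rough aid for a first pass over dictation scripts, the error kinds listed above that are purely mechanical (skipped words, additional words, and words that differ from the original) can be counted with a word-level comparison. The sketch below uses Python's standard `difflib` module; the function name and the category labels are assumptions made for illustration. It cannot tell a spelling slip from a grammatical error or an acceptable synonym, so every "changed" word would still need a human reader's judgment.

```python
from collections import Counter
from difflib import SequenceMatcher

PUNCTUATION = ".,;:!?\""


def _words(text: str) -> list:
    """Lower-case the text and strip surrounding punctuation from each word."""
    stripped = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [word for word in stripped if word]


def tally_dictation_errors(original: str, transcript: str) -> Counter:
    """Count skipped, additional, and changed words in a dictation transcript.

    Hypothetical helper: the three labels are illustrative, not a scoring scheme
    taken from the source.
    """
    source, written = _words(original), _words(transcript)
    tally = Counter()
    matcher = SequenceMatcher(a=source, b=written, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":      # words in the original that were not written
            tally["skipped"] += i2 - i1
        elif tag == "insert":    # words written that are not in the original
            tally["additional"] += j2 - j1
        elif tag == "replace":   # words that differ; needs human classification
            tally["changed"] += max(i2 - i1, j2 - j1)
    return tally
```

For instance, comparing the original "I can't do it today." with the transcript "I can do it" yields one changed word and one skipped word; deciding that the change is a grammatical error rather than a spelling error remains the scorer's call.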

Communicative stimulus-response tasks

An example of extensive listening is found in a popular genre of assessment tasks in which the test-taker is presented with a stimulus monologue or conversation and then is asked to respond to a set of comprehension questions.

Authentic listening tasks

Ideally, the language assessment field would have a stockpile of listening test types that are cognitively demanding, communicative, and authentic, not to mention interactive by means of integration with speaking. However, the nature of a test as a sample of performance and a set of tasks with limited time frames implies an equally limited capacity to mirror all the real-world contexts of listening performance.

“There is no such thing as a communicative test,” stated Buck (2001, p. 92). Every test requires some components of communicative language ability, and no test covers them all. Similarly, with the notion of authenticity, every task shares some characteristics with target-language tasks, and no test is completely authentic.

1. Note-taking. In the academic world, classroom lectures by professors are common features of a non-native English-user’s experience. The notes that test-takers write while listening to a lecture are evaluated by the teacher on a 30-point system, as follows:

Scoring system for lecture notes

0-15 points

Visual representation: Are your notes clear and easy to read? Can you easily find and retrieve information from them? Do you use the space on the paper to visually represent ideas? Do you use indentation, headers, numbers, etc.?

0-10 points

Accuracy: Do you accurately indicate main ideas from lectures? Do you note important details and supporting information and examples? Do you leave out unimportant information and tangents?

0-5 points

Symbols and abbreviations: Do you use symbols and abbreviations as much as possible to save time? Do you avoid writing out whole words, and do you avoid writing down every single word the lecturer says?
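The arithmetic of the 30-point system above can be made explicit with a short Python sketch. The three maximums (15, 10, and 5) come from the scoring system itself; the dictionary keys and the function name `score_lecture_notes` are hypothetical names chosen for this illustration.

```python
# Maximum points per criterion, as given in the scoring system above.
MAX_POINTS = {
    "visual_representation": 15,
    "accuracy": 10,
    "symbols_and_abbreviations": 5,
}


def score_lecture_notes(ratings: dict) -> int:
    """Sum a rater's points for each criterion, checking each against its maximum.

    Hypothetical helper for illustration only.
    """
    total = 0
    for criterion, maximum in MAX_POINTS.items():
        points = ratings[criterion]
        if not 0 <= points <= maximum:
            raise ValueError(f"{criterion} must be between 0 and {maximum}")
        total += points
    return total
```

A set of notes rated 12 for visual representation, 8 for accuracy, and 4 for symbols and abbreviations would receive 24 of the 30 available points.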

2. Editing. Another authentic task provides both a written and a spoken stimulus, and requires the test-taker to listen for discrepancies. Scoring achieves relatively high reliability as there are usually a small number of specific differences that must be identified. Here is the way the task proceeds.

3. Interpretive tasks. One of the intensive listening tasks described above was paraphrasing a story or conversation. An interpretive task extends the stimulus material to a longer stretch of discourse and forces the test-taker to infer a response. Potential stimuli include:

• song lyrics

• (recited) poetry

• radio/television news reports

• an oral account of an experience

4. Retelling. In a related task, test-takers listen to a story or news event and simply retell it, or summarize it, either orally (on an audiotape) or in writing. In so doing, test-takers must identify the gist, main idea, purpose, and supporting points.

Assessing Listening (group 5)

Saturday, 12 November 2016

Author: Unknown

STANDARDIZED TESTING

A standardized test is any form of test that (1) requires all test takers to answer the same questions, or a selection of questions from a common bank of questions, in the same way, and that (2) is scored in a “standard” or consistent manner, which makes it possible to compare the relative performance of individual students or groups of students. While different types of tests and assessments may be “standardized” in this way, the term is primarily associated with large-scale tests administered to large populations of students, such as a multiple-choice test given to all the eighth-grade public-school students in a particular state.

In addition to the familiar multiple-choice format, standardized tests can include true-false questions, short-answer questions, essay questions, or a mix of question types. While standardized tests were traditionally presented on paper and completed using pencils, and many still are, they are increasingly being administered on computers connected to online programs (for a related discussion, see computer-adaptive test). While standardized tests may come in a variety of forms, multiple-choice and true-false formats are widely used for large-scale testing situations because computers can score them quickly, consistently, and inexpensively. In contrast, open-ended essay questions need to be scored by humans using a common set of guidelines or rubrics to promote consistent evaluations from essay to essay—a less efficient and more time-intensive and costly option that is also considered to be more subjective. (Computerized systems designed to replace human scoring are currently being developed by a variety of companies; while these systems are still in their infancy, they are nevertheless becoming the object of growing national debate.)

While standardized tests are a major source of debate in the United States, many test experts and educators consider them to be a fair and objective method of assessing the academic achievement of students, mainly because the standardized format, coupled with computerized scoring, reduces the potential for favoritism, bias, or subjective evaluations. On the other hand, subjective human judgment enters into the testing process at various stages, for example in the selection and presentation of questions, or in the subject matter and phrasing of both questions and answers. Subjectivity also enters into the process when test developers set passing scores, a decision that can affect how many students pass or fail, or how many achieve a level of performance considered to be “proficient.” For more detailed discussions of these issues, see measurement error, test accommodations, test bias, and score inflation.

Standardized tests may be used for a wide variety of educational purposes. For example, they may be used to determine a young child’s readiness for kindergarten, identify students who need special-education services or specialized academic support, place students in different academic programs or course levels, or award diplomas and other educational certificates. The following are a few representative examples of the most common forms of standardized tests:

Achievement tests are designed to measure the knowledge and skills students learned in school or to determine the academic progress they have made over a period of time. The tests may also be used to evaluate the effectiveness of schools and teachers, or to identify the appropriate academic placement for a student, that is, what courses or programs may be deemed most suitable, or what forms of academic support the student may need. Achievement tests are “backward-looking” in that they measure how well students have learned what they were expected to learn.

Aptitude tests attempt to predict a student’s ability to succeed in an intellectual or physical endeavor by, for example, evaluating mathematical ability, language proficiency, abstract reasoning, motor coordination, or musical talent. Aptitude tests are “forward-looking” in that they typically attempt to forecast or predict how well students will do in a future educational or career setting. Aptitude tests are often a source of debate, since many question their predictive accuracy and value.

College-admissions tests are used in the process of deciding which students will be admitted to a collegiate program. While there is a great deal of debate about the accuracy and utility of college-admissions tests, and many institutions of higher education no longer require applicants to take them, the tests are used as indicators of intellectual and academic potential, and some may consider them predictive of how well an applicant will do in a postsecondary program.

International-comparison tests are administered periodically to representative samples of students in a number of countries, including the United States, for the purposes of monitoring achievement trends in individual countries and comparing educational performance across countries. A few widely used examples of international-comparison tests include the Programme for International Student Assessment (PISA), the Progress in International Reading Literacy Study (PIRLS), and the Trends in International Mathematics and Science Study (TIMSS).

Psychological tests, including IQ tests, are used to measure a person’s cognitive abilities and mental, emotional, developmental, and social characteristics. Trained professionals, such as school psychologists, typically administer the tests, which may require students to perform a series of tasks or solve a set of problems. Psychological tests are often used to identify students with learning disabilities or other special needs that would qualify them for specialized services.

Standardized Tests: Advantages

Student: So are all standardized tests good to use?

Expert: Well, actually, there are multiple advantages and disadvantages of these types of tests. Let's talk about the advantages first.

There are many advantages of standardized testing:

Standardized tests are practical: they are easy to administer and take less time to administer than other assessments.
Standardized testing results are quantifiable. By quantifying students' achievements, educators can identify proficiency levels and more easily identify students in need of remediation or advancement.
Standardized tests are scored via computer, which frees up time for the educator.
Since scoring is completed by computer, it is objective and not subject to educator bias or emotions.
Standardized testing allows educators to compare scores to students within the same school and across schools. This information provides data on not only the individual student's abilities but also on the school as a whole. Areas of school-wide weaknesses and strengths are more easily identifiable.
Standardized testing provides a longitudinal report of student progress. Over time, educators are able to see a trend of growth or decline and rapidly respond to the student's educational needs.

Standardized Tests: Disadvantages

Expert: Standardized testing also has disadvantages and is highly scrutinized. Critics cite the following disadvantages of standardized testing:

Standardized test items are not parallel with typical classroom skills and behaviors. Because questions have to be generalizable to the entire population, most items assess general knowledge and understanding.
Since general knowledge is assessed, educators cannot use standardized test results to inform their individual instruction methods. If recommendations are made, educators may begin to 'teach to the test' as opposed to teaching what is currently in the curriculum or based on the needs of their individual classroom.
Standardized test items do not assess higher-level thinking skills.
Standardized test scores are greatly influenced by non-academic factors, such as fatigue and attention.

From :

http://edglossary.org/standardized-test/

http://study.com/academy/lesson/standardized-tests-in-education-advantages-and-disadvantages.html

STANDARDIZED TESTING (group 4)



ICT assignment

Sunday, 06 November 2016


Designing Classroom Language Tests

Paper written to fulfill an assignment for the Language Teaching Evaluation course

Lecturer: Trisilia Devana, M.Pd

Written by Group 3

Riki Pratama Bhakti 14 23 004

Lestari Yuli Prehatin 14 23 018

Rivi Yuandari Ulga 14 23 040

Nita Kurniawati 14 23 045

Reyzha Ramadhan Putra 14 23 042

ENGLISH EDUCATION STUDY PROGRAM

FACULTY OF TEACHER TRAINING AND EDUCATION

BATURAJA UNIVERSITY

2016

PREFACE

Assalammualaikum Wr.Wb.

With particular thanks to Allah SWT, whose blessings have given the authors physical and spiritual health, the authors have been able to complete the assignment for the Language Teaching Evaluation course in the form of a paper entitled "Designing Classroom Language Tests."

This paper presents distinctions among test types and clarifies them, followed by some practical steps to test construction, with clear and appropriate explanations.

On this occasion, the authors also wish to thank all the partners who helped finish this paper, as well as the lecturer of the Language Teaching Evaluation course, Trisilia Devana, M.Pd, who guided the authors in writing it.

The authors realize that this paper is still far from perfect. Therefore, the authors welcome criticism and suggestions for improving future papers. The authors are grateful for the reader's attention. Hopefully this paper can inspire readers.

Wassalammualaikum Wr.Wb.

Baturaja, October 2016

The authors

CHAPTER I

CONTENT

DESIGNING CLASSROOM LANGUAGE TESTS

1. What is the purpose of the test? Why am I creating this test, or why was it created by someone else? For an evaluation of overall proficiency? To place students into a course? To measure achievement within a course? Once you have established the major purpose of a test, you can determine its objectives.

2. What are the objectives of the test? What specifically am I trying to find out? Establishing appropriate objectives involves a number of issues, ranging from relatively simple ones about forms and functions covered in a course unit to much more complex ones about constructs to be operationalized in the test. Included here are decisions about what language abilities are to be assessed.

3. How will the test specifications reflect both the purpose and the objectives? To evaluate or design the test you must make sure that the objectives are incorporated into a structure that appropriately weights the various competencies being assessed.

4. How will the test tasks be selected and the separate items arranged? The tasks that the test-takers must perform need to be practical in the ways defined in the previous chapter. They should also achieve content validity by presenting tasks that mirror those of the course being assessed. Further, they should be able to be evaluated reliably by the teacher or scorer. The tasks themselves should strive for authenticity, and the progression of tasks ought to be biased for best performance.

5. What kind of scoring, grading, and/or feedback is expected? Tests vary in the form and function of feedback, depending on their purpose. For every test, the way results are reported is an important consideration. Under some circumstances a letter grade or a holistic score may be appropriate; other circumstances may require that a teacher offer substantive washback to the learner.

TEST TYPES

The first task you will face in designing a test for your students is to determine the purpose for the test. Defining your purpose will help you choose the right kind of test, and it will also help you to focus on the specific objectives of the test. We will look first at two test types that you will probably not have many opportunities to create as a classroom teacher (language aptitude tests and language proficiency tests) and then at three types that you will almost certainly need to create (placement tests, diagnostic tests, and achievement tests).

Language Aptitude Tests

One type of test, although admittedly not a very common one, predicts a person’s success prior to exposure to the second language. A language aptitude test is designed to measure capacity or general ability to learn a foreign language and ultimate success in that undertaking.

Two standardized aptitude tests have been used in the United States: the Modern Language Aptitude Test (MLAT) and the Pimsleur Language Aptitude Battery (PLAB). Both are English-language tests and require students to perform a number of language-related tasks. The MLAT, for example, consists of five tasks:

1. Number learning: Examinees must learn a set of numbers through aural input and then discriminate different combinations of those numbers.

2. Phonetic script: Examinees must learn a set of correspondences between speech sounds and phonetic symbols.

3. Spelling clues: Examinees must read words that are spelled somewhat phonetically, and then select from a list the one word whose meaning is closest to the “disguised” word.

4. Words in sentences: Examinees are given a key word in a sentence and are then asked to select a word in a second sentence that performs the same grammatical function as the key word.

5. Paired associates: Examinees must quickly learn a set of vocabulary words from another language and memorize their English meanings.

Proficiency Tests

If your aim is to test global competence in a language, then you are, in conventional terminology, testing proficiency. A proficiency test is not limited to any one course, curriculum, or single skill in the language. It tests overall ability. Proficiency tests have traditionally consisted of standardized multiple-choice items on grammar, vocabulary, reading comprehension, and aural comprehension. Sometimes a sample of writing is added, and more recent tests also include oral production performance. As noted in the previous chapter, work on such tests has brought us much closer to constructing successful communicative proficiency tests.

A typical example of a standardized proficiency test is the Test of English as a Foreign Language (TOEFL) produced by the Educational Testing Service. The TOEFL is used by more than a thousand institutions of higher education in the United States as an indicator of a prospective student’s ability to undertake academic work in an English-speaking milieu. The TOEFL consists of sections on listening comprehension, structure, reading comprehension, and written expression.

A key issue in testing proficiency is how the constructs of language ability are specified. The tasks that test-takers are required to perform must be legitimate samples of English language use in a defined context. Creating these tasks and validating them with research is a time-consuming and costly process. Language teachers would be wise not to create an overall proficiency test on their own. A far more practical method is to choose one of a number of commercially available proficiency tests.

Placement Tests

Certain proficiency tests can act in the role of placement tests, the purpose of which is to place a student into a particular level or section of a language curriculum or school. A placement test usually, but not always, includes a sampling of the material to be covered in the various courses in a curriculum; a student’s performance on the test should indicate the point at which the student will find material neither too easy nor too difficult but appropriately challenging.

Placement tests come in many varieties: assessing comprehension and production, responding through written and oral performance, open-ended and limited responses, selection (e.g., multiple choice) and gap-filling formats, depending on the nature of a program and its needs. Some programs simply use existing standardized proficiency tests because of their obvious advantages in practicality: cost, speed in scoring, and efficient reporting of results. Others prefer the performance data available in more open-ended written and/or oral production. The ultimate objective of a placement test is, of course, to correctly place a student into a course or level. Secondary benefits to consider include face validity, diagnostic information on students’ performance, and authenticity.

In a recent one-month special summer program in English conversation and writing at San Francisco State University, 30 students were to be placed into one of two sections. The ultimate objective of the placement test (consisting of a five-minute oral interview and an essay-writing task) was to find a performance-based means to divide the students evenly into two sections. This objective might have been achieved easily by administering a simple grid-scorable multiple-choice grammar-vocabulary test. But the interview and writing sample added some important face validity, gave a more personal touch in a small program, and provided some diagnostic information on a group of learners about whom we knew very little prior to their arrival on campus.

Diagnostic Tests

A diagnostic test is designed to diagnose specified aspects of a language. A test in pronunciation, for example, might diagnose the phonological features of English that are difficult for learners and should therefore become part of a curriculum. Usually, such tests offer a checklist of features for the administrator (often the teacher) to use in pinpointing difficulties. A writing diagnostic would elicit a writing sample from students that would allow the teacher to identify those rhetorical and linguistic features on which the course needed to focus special attention.

Diagnostic and placement tests, as we have already implied, may sometimes be indistinguishable from each other. The San Francisco State ESLPT serves dual purposes. Any placement test that offers information beyond simply designating a course level may also serve diagnostic purposes.

There is also a fine line of difference between a diagnostic test and a general achievement test. Achievement tests analyze the extent to which students have acquired language features that have already been taught; diagnostic tests should elicit information on what students need to work on in the future. Therefore, a diagnostic test will typically offer more detailed subcategorized information on the learner. In a curriculum that has a form-focused phase, for example, a diagnostic test might offer information about a learner’s acquisition of verb tenses, modal auxiliaries, definite articles, relative clauses, and the like.

A typical diagnostic test of oral production was created by Clifford Prator (1972) to accompany a manual of English pronunciation. Test-takers are directed to read a 150-word passage while they are tape-recorded. The test administrator then refers to an inventory of phonological items for analyzing a learner’s production. After multiple listenings, the administrator produces a checklist of errors in five separate categories, each of which has several subcategories. The main categories include

1. Stress and rhythm,

2. Intonation,

3. Vowels,

4. Consonants, and

5. Other factors.

An example of subcategories is shown in this list for the first category (stress and rhythm) :

a. Stress on the wrong syllable (in multi-syllabic words)

b. Incorrect sentence stress

c. Incorrect division of sentences into thought groups

d. Failure to make smooth transitions between words or syllables

Each subcategory is appropriately referenced to a chapter and section of Prator’s manual. This information can help teachers make decisions about aspects of English phonology on which to focus. This same information can help a student become aware of errors and encourage the adoption of appropriate compensatory strategies.

Achievement Tests

An achievement test is related directly to classroom lessons, units, or even a total curriculum. Achievement tests are (or should be) limited to particular material addressed in a curriculum within a particular time frame and are offered after a course has focused on the objectives in question. Achievement tests can also serve the diagnostic role of indicating what a student needs to continue to work on in the future, but the primary role of an achievement test is to determine whether course objectives have been met, and appropriate knowledge and skills acquired, by the end of a period of instruction.

Achievement tests are often summative because they are administered at the end of a unit or term of study. They also play an important formative role. An effective achievement test will offer washback about the quality of a learner’s performance in subsets of the unit or course. This washback contributes to the formative nature of such tests.

The specifications for an achievement test should be determined by

· The objectives of the lesson, unit, or course being assessed,

· The relative importance (or weight) assigned to each objective,

· The tasks employed in classroom lessons during the unit of time,

· Practicality issues, such as the time frame for the test and turnaround time, and

· The extent to which the test structure lends itself to formative washback.

Achievement tests range from five- or ten-minute quizzes to three-hour final examinations, with an almost infinite variety of item types and formats. Here is the outline for a midterm examination offered at the high-intermediate level of an intensive English program in the United States. The course focus is on academic reading and writing; the structure of the course and its objectives may be implied from the sections of the test.

Section A. Vocabulary

Part 1 (5 items): match words and definitions

Part 2 (5 items): use the word in a sentence

Section B. Grammar

(10 sentences): error detection (underline or circle the error)

Section C. Reading Comprehension

(2 one-paragraph passages): four short-answer items for each

Section D. Writing

Respond to a two-paragraph article on Native American culture

Midterm examination outline, high-intermediate

SOME PRACTICAL STEPS TO TEST CONSTRUCTION

The descriptions of types of tests in the preceding section are intended to help you understand how to answer the first question posed in this chapter: What is the purpose of the test? It is unlikely that you would be asked to design an aptitude test or a proficiency test, but for the purposes of interpreting those tests, it is important that you understand their nature. However, your opportunities to design placement, diagnostic, and achievement tests, especially the latter, will be plentiful. In the remainder of this chapter, we will explore the four remaining questions posed at the outset, and the focus will be on equipping you with the tools you need to create such classroom-oriented tests.

You may think that every test you devise must be a wonderfully innovative instrument that will garner the accolades of your colleagues and the admiration of your students. Not so. First, new and innovative testing formats take a lot of effort to design and a long time to refine through trial and error. Second, traditional testing techniques can, with a little creativity, conform to the spirit of an interactive, communicative language curriculum. Your best tack as a new teacher is to work within the guidelines of accepted, known, traditional testing techniques. Slowly, with experience, you can get bolder in your attempts. In that spirit, then, let us consider some practical steps in constructing classroom tests.

Assessing Clear, Unambiguous Objectives

In addition to knowing the purpose of the test you’re creating, you need to know as specifically as possible what it is you want to test. Sometimes teachers give tests simply because it’s Friday of the third week of the course, and after hasty glances at the chapter(s) covered during those three weeks, they dash off some test items so that students will have something to do during the class. This is no way to approach a test. Instead, begin by taking a careful look at everything that you think your students should “know” or be able to “do,” based on the material that the students are responsible for. In other words, examine the objectives for the unit you are testing.

Remember that every curriculum should have appropriately framed assessable objectives, that is, objectives that are stated in terms of overt performance by students. Thus, an objective that states “Students will learn tag questions” or simply names the grammatical focus “Tag questions” is not testable. You don’t know whether students should be able to understand them in spoken or written language, or whether they should be able to produce them orally or in writing. Nor do you know in what context (a conversation? An essay? An academic lecture?) those linguistic forms should be used. Your first task in designing a test, then, is to determine appropriate objectives.

If you’re lucky, someone will have already stated those objectives clearly in performance terms. If you’re a little less fortunate, you may have to go back through a unit and formulate them yourself. Let’s say you have been teaching a unit in a low-intermediate integrated-skills class with an emphasis on social conversation, and involving some reading and writing, that includes the objectives outlined below, either stated already or as you have reframed them. Notice that each objective is stated in terms of the performance elicited and the target linguistic domain.

Form-focused objectives (listening and speaking)

Students will

1. Recognize and produce tag questions, with the correct grammatical form and final intonation pattern, in simple social conversations.

2. Recognize and produce wh-information questions with correct final intonation pattern.

Communication skills (speaking)

Students will

3. State completed actions and events in a social conversation.

4. Ask for confirmation in a social conversation.

5. Give opinions about an event in a social conversation.

6. Produce language with contextually appropriate intonation, stress, and rhythm.

Reading skills (simple essay or story)

Students will

7. Recognize irregular past tense of selected verbs in a story or essay.

Writing skills (simple essay or story)

Students will

8. Write a one-paragraph story about a simple event in the past.

9. Use conjunctions so and because in a statement of opinion.

Selected objectives for a unit in a low-intermediate integrated-skills course

You may find, in reviewing the objectives of a unit or a course, that you cannot possibly test each one. You will then need to choose a possible subset of the objectives to test.

Drawing Up Test Specifications

Test specifications for classroom use can be a simple and practical outline of your test. (For large-scale standardized tests that are intended to be widely distributed and therefore are broadly generalized, test specifications are much more formal and detailed.) In the unit discussed above, your specifications will simply comprise (a) a broad outline of the test, (b) what skills you will test, and (c) what the items will look like. Let’s look at the first two in relation to the unit assessment already referred to above.

(a) Outline of the test and (b) skills to be included. Because of the constraints of your curriculum, your unit test must take no more than 30 minutes. This is an integrated curriculum, so you need to test all four skills. Since you have the luxury of teaching a small class (only 12 students!), you decide to include an oral production component in the preceding period (taking students one by one into a separate room while the rest of the class reviews the unit individually and completes workbook exercises). You can therefore test oral production objectives directly at that time. You determine that the 30-minute test will be divided equally in time among listening, reading, and writing.

(c) Item types and tasks. The next and potentially more complex choices involve the item types and tasks to use in this test. It is surprising that there are only a limited number of modes of eliciting responses (that is, prompting) and of responding on tests of any kind. Consider the options: the test prompt can be oral (student listens) or written (student reads), and the student can respond orally or in writing. It’s that simple. But some complexity is added when you realize that the types of prompts in each case vary widely, and within each response mode, of course, there are a number of options, all of which are depicted in Figure 3.1.

Elicitation mode:

Oral (student listens)          Written (student reads)
word, pair of words             word, set of words
sentence(s), question           sentence(s), question
directions                      directions
monologue, speech               paragraph
pre-recorded conversation       essay, excerpt
interactive (live) dialogue     short story, book

Granted, not all of the response modes correspond to all of the elicitation modes. For example, it is unlikely that directions would be read aloud, nor would spelling a word be matched with a monologue. A modicum of intuition will eliminate these non sequiturs.

Armed with a number of elicitation and response formats, you have decided to design your specs as follows, based on the objectives stated earlier:

Speaking (5 minutes per person, previous day)

Format : oral interview, T and S

Task : T asks questions of S (objectives 3, 5; emphasis on 6)

Listening (10 minutes)

Format : T makes audiotape in advance, with one other voice on it

Tasks : a. 5 minimal pair items, multiple- choice (objective 1)

b. 5 interpretation items, multiple-choice (objective 2)

Reading (10 minutes)

Format : cloze test items (10 total) in a story line

Tasks : fill-in-the-blanks (objective 7)

Writing (10 minutes)

Format : prompt for a topic : why I liked/didn’t like a recent TV sitcom

Task : writing a short opinion paragraph (objective 9)

Test specifications

These informal, classroom-oriented specifications give you an indication of

· The topics (objectives) you will cover,

· The implied elicitation and response formats for items,

· The number of items in each section, and

· The time to be allocated for each.

Notice that three of the six speaking objectives are not directly tested. This decision may be based on the time you devoted to these objectives, but more likely on the feasibility of testing them or simply on the finite number of minutes available to administer the test. Notice, too, that objectives 4 and 8 are not assessed. Finally, notice that this unit was mainly focused on listening and speaking, yet 20 minutes of the 35-minute test are devoted to reading and writing tasks. Is this an appropriate decision?
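The observations above can be checked mechanically once the specifications are written down as data. The following is only a minimal sketch in Python, not part of the original specifications: the names SPECS, ALL_OBJECTIVES, untested_objectives, and total_minutes are hypothetical, and the numbers are copied from the spec outline and the nine unit objectives listed earlier.

```python
# Hypothetical encoding of the informal test specifications above.
# Section names, minutes, and objective numbers come from the spec outline;
# the data layout and helper names are illustrative assumptions.
ALL_OBJECTIVES = set(range(1, 10))  # the unit lists objectives 1 through 9

SPECS = [
    {"section": "Speaking", "minutes": 5, "objectives": {3, 5, 6}},
    {"section": "Listening", "minutes": 10, "objectives": {1, 2}},
    {"section": "Reading", "minutes": 10, "objectives": {7}},
    {"section": "Writing", "minutes": 10, "objectives": {9}},
]


def untested_objectives(specs, all_objectives=ALL_OBJECTIVES):
    """Return the objectives that no section of the test covers."""
    covered = set()
    for section in specs:
        covered |= section["objectives"]
    return sorted(all_objectives - covered)


def total_minutes(specs, sections=None):
    """Sum the minutes for the named sections, or for all sections if None."""
    return sum(
        s["minutes"]
        for s in specs
        if sections is None or s["section"] in sections
    )


if __name__ == "__main__":
    print("Untested objectives:", untested_objectives(SPECS))
    print("Total minutes:", total_minutes(SPECS))
    print("Reading + writing minutes:", total_minutes(SPECS, {"Reading", "Writing"}))
```

Writing the specs this way makes it easy to see at a glance which objectives are left out and how the time is split across skills, which is the question the paragraph above asks you to consider.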

One more test spec that needs to be included is a plan for scoring and assigning relative weight to each section and each item within it. This issue will be addressed later in this chapter when we look at scoring, grading, and feedback.

Devising Test Tasks

Your oral interview comes first, and so you draft questions to conform to the accepted pattern of oral interviews. You begin and end with nonscored items (warm-up and wind-down) designed to set students at ease, and then sandwich between them items intended to test the objective (level check) and a little beyond (probe):

A. Warm-up: questions and comments

B. Level-check questions (objectives 3, 5, and 6)

1. Tell me about what you did last weekend.

2. Tell me about an interesting trip you took in the last year.

3. How did you like the TV show we saw this week?

C. Probe (objectives 5, 6)

1. What is your opinion about ______? (news event)

2. How do you feel about ______? (another news event)

D. Wind-down: comments and reassurance

Oral interview format

You are now ready to draft other test items. To provide a sense of authenticity and interest, you have decided to conform your items to the context of a recent TV sitcom that you used in class to illustrate certain discourse and form-focused factors. The sitcom depicted a loud, noisy party with lots of small talk. As you devise your test items, consider such factors as how students will perceive them (face validity), the extent to which authentic language and contexts are present, potential difficulty caused by cultural schemata, the length of the listening stimuli, how well a story line comes across, how things like the cloze testing format will work, and other practicalities.

Let’s say your first draft of items produces the following possibilities within each section:

Listening, part a. (sample item)

Directions: Listen to the sentence (on the tape). Choose the sentence on your test page that is closest in meaning to the sentence you heard.

Voice: They sure made a mess at that party, didn’t they?

S reads: a. They didn’t make a mess, did they?

b. They did make a mess, didn’t they?

Listening, part b. (sample item)

Directions: Listen to the question (on the tape). Choose the sentence on your test page that is the best answer to the question.

Voice: Where did George go after the party last night?

S reads: a. Yes, he did.

b. Because he was tired.

c. To Elaine’s place for another party.

d. He went home around eleven o’clock.

Reading (sample items)

Directions: Fill in the correct tense of the verb (in parentheses) that should go in each blank.

Then, in the middle of this loud party they ______ (hear) the loudest thunder you have ever heard! And then right away lightning ______ (strike) right outside their house!

Test items, first draft

Writing

Directions: Write a paragraph about what you liked or didn’t like about one of the characters at the party in the TV sitcom we saw.

As you can see, these items are quite traditional. You might self-critically admit that the format of some of the items is contrived, thus lowering the level of authenticity. But the thematic format of the sections, the authentic language within each item, and the contextualization add face validity, interest, and some humor to what might otherwise be a mundane test. All four skills are represented, and the tasks are varied within the 30 minutes of the test.

In revising your draft, you will want to ask yourself some important questions:

1. Are the directions to each section absolutely clear?

2. Is there an example item for each section?

3. Does each item measure a specified objective?

4. Is each item stated in clear, simple language?

5. Does each multiple-choice item have appropriate distractors; that is, are the wrong items clearly wrong and yet sufficiently “alluring” that they aren’t ridiculously easy? (See below for a primer on creating effective distractors.)

6. Is the difficulty of each item appropriate for your students?

7. Is the language of each item sufficiently authentic?

8. Do the sum of the items and the test as a whole adequately reflect the learning objectives?

In the current example that we have been analyzing, your revising process is likely to result in at least four changes or additions:

1. In both the interview and writing sections, you recognize that a scoring rubric will be essential. For the interview, you decide to create a holistic scale, and for the writing section you devise a simple analytic scale that captures only the objectives you have focused on.

2. In the interview questions, you realize that follow-up questions may be needed for students who give one-word or very short answers.

3. In the listening section, part b, you intend choice “c” as the correct answer, but you realize that choice “d” is also acceptable. You need an answer that is unambiguously incorrect, so you shorten it to “d. Around eleven o’clock.” You also note that providing the prompts for this section on an audio recording will be logistically difficult, and so you opt to read these items to your students.

4. In the writing prompt, you can see how some students would not use the words so or because, which were in your objectives, so you reword the prompt: “Name one of the characters at the party in the TV sitcom we saw. Then, use the word so at least once and the word because at least once to tell why you liked or didn’t like that person.”

Ideally, you would try out all your tests on students not in your class before actually administering the tests. But in our daily classroom teaching, the tryout phase is almost impossible. Alternatively, you could enlist the aid of a colleague to look over your test. And so you must do what you can to bring to your students an instrument that is, to the best of your ability, practical and reliable.

In the final revision of your test, imagine that you are a student taking the test. Go through each set of directions and all items slowly and deliberately. Time yourself. (Often we underestimate the time students will need to complete a test.) If the test should be shortened or lengthened, make the necessary adjustments. Make sure your test is neat and uncluttered on the page, reflecting all the care and precision you have put into its construction. If there is an audio component, as there is in our hypothetical test, make sure that the script is clear, that your voice and any other voices are clear, and that the audio equipment is in working order before starting the test.

Designing Multiple-Choice Test Items

In the sample achievement test above, two of the five components (both of the listening sections) specified a multiple-choice format for items. This was a bold step to take. Multiple-choice items, which may appear to be the simplest kind of item to construct, are extremely difficult to design correctly. Hughes (2003, pp. 76-78) cautions against a number of weaknesses of multiple-choice items:

· The technique tests only recognition knowledge.

· Guessing may have a considerable effect on test scores.

· It is very difficult to write successful items.

· Washback may be harmful.

· Cheating may be facilitated.

The two principles that stand out in support of multiple-choice formats are, of course, practicality and reliability. With their predetermined correct responses and time-saving scoring procedures, multiple-choice items offer overworked teachers the tempting possibility of an easy and consistent process of scoring and grading. But is the preparation phase worth the effort? Sometimes it is, but you might spend even more time designing such items than you save in grading the test. Of course, if your objective is to design a large-scale standardized test for repeated administrations, then a multiple-choice format does indeed become viable.

First, a primer on terminology.

1. Multiple-choice items are all receptive, or selective, response items in that the test-taker chooses from a set of responses rather than creating a response (commonly called a supply type of response). Other receptive item types include true-false questions and matching lists. (In the discussion here, the guidelines apply primarily to multiple-choice item types and not necessarily to other receptive types.)

2. Every multiple-choice item has a stem, which presents a stimulus, and several (usually between three and five) options or alternatives to choose from.

3. One of those options, the key, is the correct response, while the others serve as distractors.
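If you keep your items in electronic form, the terminology above maps naturally onto a small data structure. The following is only a sketch: the class name `MultipleChoiceItem` and its fields are illustrative assumptions, not part of any published testing tool.

```python
from dataclasses import dataclass


@dataclass
class MultipleChoiceItem:
    """One multiple-choice item (hypothetical structure for illustration)."""
    stem: str                 # the stimulus presented to the test-taker
    options: dict[str, str]   # option label -> option text (usually three to five)
    key: str                  # label of the one correct option

    def distractors(self) -> dict[str, str]:
        """Return every option except the key."""
        return {label: text for label, text in self.options.items()
                if label != self.key}

    def is_correct(self, response: str) -> bool:
        """True if the chosen label matches the key."""
        return response == self.key


item = MultipleChoiceItem(
    stem="Where did George go after the party last night?",
    options={
        "a": "Yes, he did.",
        "b": "Because he was tired.",
        "c": "To Elaine's place for another party.",
        "d": "Around eleven o'clock.",
    },
    key="c",
)
print(sorted(item.distractors()))  # ['a', 'b', 'd']
```

Storing the key separately from the options makes it easy to score responses and to review the distractors on their own.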

Since there will be occasions when multiple-choice items are appropriate, consider the following four guidelines for designing multiple-choice items for both classroom-based and large-scale situations (adapted from Gronlund, 1998, pp. 60-75, and J. D. Brown, 1996, pp. 54-57).

1. Design each item to measure a specific objective.

Voice: Where did George go after the party last night?

S reads: a. Yes, he did.

b. Because he was tired.

c. To Elaine’s place for another party.

d. Around eleven o’clock.

The specific objective being tested here is comprehension of wh-questions. Distractor (a) is designed to ascertain that the student knows the difference between an answer to a wh-question and a yes/no question. Distractors (b) and (d), as well as the key item (c), test comprehension of the meaning of where as opposed to why and when. The objective has been directly addressed.

On the other hand, here is an item that was designed to test recognition of the correct word order of indirect questions.

Multiple-choice item, flawed

Excuse me, do you know ______?

a. where is the post office

b. where the post office is

c. where post office is

Option (c) differs from the key (b) in its missing article rather than in its word order, so it tests something other than the stated objective.

2. State both stem and options as simply and directly as possible.

We are sometimes tempted to make multiple-choice items too wordy. A good rule of thumb is to get directly to the point. Here’s an example.

Multiple-choice cloze item, flawed

My eyesight has really been deteriorating lately. I wonder if I need glasses. I think I’d better go to the ______ to have my eyes checked.

a. pediatrician

b. dermatologist

c. optometrist

If the goal is simply to identify the professional who checks eyesight, the first two sentences of the stem are superfluous. Redundancy in the options is another source of wordiness. In the following item, “which were” is repeated in all three options. It should be placed in the stem to keep the item as succinct as possible.

Multiple-choice item, flawed

We went to visit the temples, ______ fascinating.

a. which were beautiful

b. which were especially

c. which were holy

3. Make certain that the intended answer is clearly the only correct one.

In the proposed unit test described earlier, the following item appeared in the original draft:

Multiple-choice item, flawed

Voice: Where did George go after the party last night?

S reads: a. Yes, he did.

b. Because he was tired.

c. To Elaine’s place for another party.

d. He went home around eleven o’clock.

Both (c) and (d) are acceptable answers here, so (d) needs to be rewritten as an unambiguously incorrect distractor.

4. Use item indices to accept, discard, or revise items.

The appropriate selection and arrangement of suitable multiple-choice items on a test can best be accomplished by measuring items against three indices: item facility (or item difficulty), item discrimination (sometimes called item differentiation), and distractor analysis. Although measuring these factors on classroom tests would be useful, you probably will have neither the time nor the expertise to do this for every classroom test you create, especially one-time tests. But they are a must for standardized norm-referenced tests that are designed to be administered a number of times and/or administered in multiple forms.

1. Item facility (or IF) is the extent to which an item is easy or difficult for the proposed group of test-takers. You may wonder why that is important if in your estimation the item achieves validity. The answer is that an item that is too easy (say 99 percent of respondents get it right) or too difficult (99 percent get it wrong) really does nothing to separate high-ability and low-ability test-takers. It is not really performing much “work” for you on a test.

IF simply reflects the percentage of students answering the item correctly. The formula looks like this:

$$\mathrm{IF} = \frac{\text{\# of Ss answering the item correctly}}{\text{total \# of Ss responding to that item}}$$

For example, if you have an item on which 13 out of 20 students respond correctly, your IF index is 13 divided by 20, or .65 (65 percent).
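If your students’ responses are already in a spreadsheet or gradebook, the IF calculation is easy to automate. Here is a minimal sketch; the function name `item_facility` is an assumption chosen for this illustration.

```python
def item_facility(num_correct: int, num_responding: int) -> float:
    """IF = (# of Ss answering correctly) / (total # of Ss responding)."""
    if num_responding <= 0:
        raise ValueError("at least one student must respond to the item")
    if not 0 <= num_correct <= num_responding:
        raise ValueError("num_correct must be between 0 and num_responding")
    return num_correct / num_responding


# The example from the text: 13 of 20 students answer correctly.
print(item_facility(13, 20))  # 0.65
```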

2. Item discrimination (ID) is the extent to which an item differentiates between high- and low-ability test-takers. An item on which high-ability students (who did well on the test) and low-ability students (who didn’t) score equally well would have poor ID because it did not discriminate between the two groups. Conversely, an item that garners correct responses from most of the high-ability group and incorrect responses from most of the low-ability group has good discrimination power.

Suppose your class of 30 students has taken a test. Once you have calculated final scores for all 30 students, divide them roughly into thirds; that is, create three rank-ordered ability groups including the top 10 scores, the middle 10, and the lowest 10. To find out which of your 50 or so test items were most “powerful” in discriminating between high and low ability, eliminate the middle group, leaving two groups with results that might look something like this on a particular item:

Item #23	# Correct	# Incorrect
High-ability Ss (top 10)	7	3
Low-ability Ss (bottom 10)	2	8

Using the ID formula (7 – 2 = 5; 5 ÷ 10 = .50), you would find that this item has an ID of .50, or a moderate level.

The formula for calculating ID is

$$\mathrm{ID} = \frac{\text{high group \# correct} - \text{low group \# correct}}{\tfrac{1}{2} \times \text{total of your two comparison groups}} = \frac{7 - 2}{\tfrac{1}{2} \times 20} = \frac{5}{10} = .50$$

The result of this example item tells you that the item has a moderate level of ID. High discriminating power would approach a perfect 1.0, and no discriminating power at all would be zero. In most cases, you would want to discard an item that scored near zero. As with IF, no absolute rule governs the establishment of acceptable and unacceptable ID indices.
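The ranking, splitting, and ID calculation described above can be sketched in a few lines. The function names and the student labels (`S00` to `S29`) are assumptions made up for this illustration, and the split simply takes the top and bottom thirds of the ranked list.

```python
def split_high_low(total_scores: dict[str, float]) -> tuple[list[str], list[str]]:
    """Rank students by total score and return (top third, bottom third)."""
    ranked = sorted(total_scores, key=total_scores.get, reverse=True)
    third = len(ranked) // 3
    if third == 0:
        raise ValueError("need at least three students to form groups")
    return ranked[:third], ranked[-third:]


def item_discrimination(high_correct: int, low_correct: int,
                        comparison_total: int) -> float:
    """ID = (high # correct - low # correct) / (1/2 x total in both groups)."""
    if comparison_total <= 0:
        raise ValueError("comparison groups must not be empty")
    return (high_correct - low_correct) / (0.5 * comparison_total)


# Hypothetical class of 30 students with distinct total scores.
scores = {f"S{i:02d}": 100 - i for i in range(30)}
high, low = split_high_low(scores)
print(len(high), len(low))              # 10 10

# Item #23 from the text: 7 correct in the high group, 2 in the low group.
print(item_discrimination(7, 2, 20))    # 0.5
```

An item that both groups answer equally well returns 0, which is the signal to revise or discard it.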

One clear, practical use for ID indices is to select items from a test bank that includes more items than you need. You might decide to discard or improve some items with lower ID because you know they won’t be as powerful an indicator of success on your test.

For most teachers who are using multiple-choice items to create a classroom-based unit test, juggling IF and ID indices is more a matter of intuition and “art” than a science. Your best calculated hunches may provide sufficient support for retaining, revising, or discarding proposed items. But if you are constructing a large-scale test, or one that will be administered multiple times, these indices are important factors in creating test forms that are comparable in difficulty. By engaging in a sophisticated procedure using what is called item response theory (IRT), professional test designers can produce test forms whose equated test scores are reliable measures of performance. (For more information on IRT, see Bachman, 1990, pp. 202-209.)

3. Distractor efficiency is one more important measure of a multiple-choice item’s value in a test and one that is related to item discrimination. The efficiency of distractors is the extent to which (a) the distractors “lure” a sufficient number of test-takers, especially lower-ability ones, and (b) those responses are somewhat evenly distributed across all distractors. Those of you who have a fear of mathematical formulas will be happy to read that there is no formula for calculating efficiency and that an inspection of a distribution of responses will usually yield the information you need.

Consider the following. The same item (#23) used above is a multiple-choice item with five choices, and responses across upper- and lower-ability students are distributed as follows:

Choice	High-ability Ss (10)	Low-ability Ss (10)
A		
B		
C*	7	2
D	0	0
E	2	0

*Note: C is the correct response.

No mathematical formula is needed to tell you that this item successfully attracts seven of the ten high-ability students toward the correct response, while only two of the low-ability students get this one right. As shown above, its ID is .50, which is acceptable, but the item might be improved in two ways: (a) Distractor D doesn’t fool anyone. No one picked it, and therefore it probably has no utility. A revision might provide a distractor that actually attracts a response or two. (b) Distractor E attracts more responses (2) from the high-ability group than the low-ability group (0). Why are good students choosing this one? Perhaps it includes a subtle reference that entices the high group but is “over the head” of the low group, and therefore the latter students don’t even consider it.

The other two distractors (A and B) seem to be fulfilling their function of attracting some attention from lower-ability students.
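Although there is no formula for distractor efficiency, the inspection described above can be turned into a simple checklist. The sketch below is one possible way to do it, not a standard procedure. The counts for C, D, and E come from the discussion above; the split between A and B is an assumption made only so the example adds up to ten students per group.

```python
def review_distractors(high: dict[str, int], low: dict[str, int],
                       key: str) -> dict[str, str]:
    """Flag distractors that attract no one, or more high- than low-ability Ss."""
    notes = {}
    for choice in high:
        if choice == key:
            continue  # the key is not a distractor
        h, l = high[choice], low[choice]
        if h + l == 0:
            notes[choice] = "chosen by no one; consider replacing"
        elif h > l:
            notes[choice] = "attracts more high- than low-ability Ss; check wording"
        else:
            notes[choice] = "working as intended"
    return notes


# Item #23. A and B counts are ASSUMED for illustration; C, D, E follow the text.
high_group = {"A": 0, "B": 1, "C": 7, "D": 0, "E": 2}
low_group = {"A": 3, "B": 5, "C": 2, "D": 0, "E": 0}

for choice, note in review_distractors(high_group, low_group, key="C").items():
    print(choice, "-", note)
```

With these counts the sketch flags D and E for revision and leaves A and B alone, which matches the reasoning in the text.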

SCORING, GRADING, AND GIVING FEEDBACK

Scoring

As you design a classroom test, you must consider how the test will be scored and graded. Your scoring plan reflects the relative weight that you place on each section and on the items in each section. The integrated-skills class that we have been using as an example focuses on listening and speaking skills with some attention to reading and writing. Three of your nine objectives target reading and writing skills. How do you assign scoring to the various components of this test?

Because oral production is a driving force in your overall objectives, you decide to place more weight on the speaking (oral interview) section than on the other three sections. Five minutes is actually a long time to spend in a one-on-one situation with a student, and some significant information can be extracted from such a session. You therefore designate 40 percent of the grade to the oral interview. You consider the listening and reading sections to be equally important, but each of them, especially in this multiple-choice format, is of less consequence than the oral interview. So you give each of them a 20 percent weight. That leaves 20 percent for the writing section, which seems about right to you given the time and focus on writing in this unit of the course.

Your next task is to assign scoring for each item. This may take a little numerical common sense, but it doesn’t require a degree in math. To make matters simple, you decide to have a 100-point test in which

▪ the listening and reading items are each worth 2 points.

▪ the oral interview will yield four scores ranging from 5 to 1, reflecting fluency, prosodic features, accuracy of the target grammatical objectives, and discourse appropriateness. To weight these scores appropriately, you will double each individual score and then add them together for a possible total score of 40.

▪ the writing sample has two scores: one for grammar/mechanics (including the correct use of so and because) and one for overall effectiveness of the message, each ranging from 5 to 1. Again, to achieve the correct weight for writing, you will double each score and add them, so the possible total is 20 points.

Here are your decisions about scoring your test:

	Percent of Total Grade	Possible Total Correct
Oral Interview	40%	4 scores, 5 to 1 range × 2 = 40
Listening	20%	10 items @ 2 points each = 20
Reading	20%	10 items @ 2 points each = 20
Writing	20%	2 scores, 5 to 1 range × 2 = 20
Total		100

At this point you may wonder if the interview should carry less weight or the written essay more, but your intuition tells you that these weights are plausible representations of the relative emphases in this unit of the course.
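To see how the weights above combine into a 100-point total, you might sketch the arithmetic as follows. The function name `total_score` and the sample ratings are assumptions made for this illustration only.

```python
def total_score(interview_ratings: list[int], listening_correct: int,
                reading_correct: int, writing_ratings: list[int]) -> int:
    """Combine the four sections into a 100-point total using the weights above."""
    if len(interview_ratings) != 4 or len(writing_ratings) != 2:
        raise ValueError("expected 4 interview ratings and 2 writing ratings")
    if any(not 1 <= r <= 5 for r in interview_ratings + writing_ratings):
        raise ValueError("ratings must range from 1 to 5")
    if not (0 <= listening_correct <= 10 and 0 <= reading_correct <= 10):
        raise ValueError("listening and reading each have 10 items")

    interview = 2 * sum(interview_ratings)   # 4 ratings doubled, maximum 40
    listening = 2 * listening_correct        # 10 items at 2 points, maximum 20
    reading = 2 * reading_correct            # 10 items at 2 points, maximum 20
    writing = 2 * sum(writing_ratings)       # 2 ratings doubled, maximum 20
    return interview + listening + reading + writing


print(total_score([5, 5, 5, 5], 10, 10, [5, 5]))  # 100
print(total_score([4, 3, 4, 5], 8, 7, [4, 3]))    # 76
```

Note that because every rating has a floor of 1, the lowest possible total under this scheme is 12 rather than 0.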

After administering the test once, you may decide to shift some of these weights or to make other changes. You will then have valuable information about how easy or difficult the test was, about whether the time limit was reasonable, about your students’ affective reaction to it, and about their general performance. Finally, you will have an intuitive judgment about whether this test correctly assessed your students. Take note of these impressions, however nonempirical they may be, and use them for revising the test in another term.

Grading

Your first thought might be that assigning grades to student performance on this test would be easy: just give an “A” for 90-100 percent, a “B” for 80-89 percent, and so on. Not so fast! Grading is such a thorny issue that all of Chapter 11 is devoted to the topic. How you assign letter grades to this test is a product of

• the country, culture, and context of this English classroom,

• institutional expectations (most of them unwritten),

• explicit and implicit definitions of grades that you have set forth,

• the relationship you have established with this class, and

• student expectations that have been engendered in previous tests and quizzes in this class.

For the time being, then, we will set aside issues that deal with grading this test in particular, in favor of the comprehensive treatment of grading.

Giving Feedback

A section on scoring and grading would not be complete without some consideration of the forms in which you will offer feedback to your students, feedback that you want to become beneficial washback. In the example test that we have been referring to here (which is not unusual in the universe of possible formats for periodic classroom tests), consider the multitude of options. You might choose to return the test to the student with one of, or a combination of, any of the possibilities below:

1. a letter grade

2. a total score

3. four subscores (speaking, listening, reading, writing)

4. for the listening and reading sections

a. an indication of correct/incorrect responses

b. marginal comments

5. for the oral interview

a. scores for each element being rated

b. a checklist of areas needing work

c. oral feedback after the interview

d. a post-interview conference to go over the results

6. on the essay

a. scores for each element being rated

b. a checklist of areas needing work

c. marginal and end-of-essay comments, suggestions

d. a post-test conference to go over work

e. a self-assessment

7. on all or selected parts of the test, peer checking of results

8. a whole-class discussion of results of the test

9. individual conferences with each student to review the whole test

Obviously, options 1 and 2 give virtually no feedback. They offer the student only a modest sense of where that student stands and a vague idea of overall performance, but the feedback they present does not become washback. Washback is achieved when students can, through the testing experience, identify their areas of success and challenge. When a test becomes a learning experience, it achieves washback.

Option 3 gives a student a chance to see the relative strength of each skill area and so becomes minimally useful. Options 4, 5, and 6 represent the kind of response a teacher can give (including stimulating a student self-assessment) that approaches maximum washback. Students are provided with individualized feedback that has good potential for “washing back” into their subsequent performance. Of course, time and the logistics of large classes may not permit 5d and 6d, which for many teachers may be going above and beyond expectations for a test like this. Likewise, option 9 may be impractical. Options 6 and 7, however, are clearly viable possibilities that solve some of the practicality issues that are so important in teachers’ busy schedules.

CHAPTER II

CLOSING

A. Summary

There are five types of tests: language aptitude tests, proficiency tests, placement tests, diagnostic tests, and achievement tests. A test does not need to be a wonderfully innovative instrument that garners the accolades of colleagues and the admiration of students.

The practical steps in test construction are: assessing clear and unambiguous objectives, drawing up test specifications, devising test tasks, and designing multiple-choice test items.

Evaluation can fulfill two functions: assessment and feedback. Assessment is a matter of measuring what the learners already know. Any assessment should also provide positive feedback to inform teachers and learners about what is still not known, thus providing important input to the content and methods of future work.

REFERENCE

Brown, H. Douglas. (2004). Language Assessment: Principles and Classroom Practices. New York: Pearson Education.

http://febbyeni.blogspot.co.id/2013/06/designing-classroom-language-test.html

Designing Classroom Language Tests

Wednesday, 02 November 2016
