More Validity

Content validity

A test with content validity (sample-population representativeness) tests the important skills and knowledge in a subject area (the content domain). It tests a representative sample of skills and knowledge over a reasonable range of levels. In other words, it tests higher-order thinking and understanding as well as recall of simple facts. To ensure content validity, you need experts to describe the necessary knowledge and skills and identify the important learning objectives. Then you write questions related to each learning objective. Before the test is administered or marked, you get the experts to provide solutions and then you prepare a marking scheme for the markers. You need to ensure the test is long enough, with at least 9 questions that adequately represent the content domain. If you have too few questions, then some people can pass by guessing, or by having only a few of the necessary skills and a little knowledge, while others with more skills and knowledge fail.

Predictive validity

If you are going to use a test for selection (for employment, promotion or training), then you need some basis for believing that it has predictive validity. You need to provide grounds for presuming that high scores on the test relate to good job performance later and that lower scores relate to poorer performance later. You check for predictive validity by giving the test to a group with competencies ranging from none to expert. Then you check that the scores on the test correlate with (match) the degree of expertise of the subjects.

Concurrent validity

A test has concurrent validity if it gives similar results to other valid measures of the same competencies. For example, you would expect that a test used to measure employees’ competencies would generally accord with their peers’ and managers’ estimates of those competencies.

Convergent validity

A test with convergent validity does not discriminate on the basis of some irrelevant factor.
For example, there should be no correlation between test results and gender, ethnicity, political affiliation, hair colour, religion, shoe colour etc. Non-convergent tests are a very expensive and time-consuming way of finding out something you probably already know (e.g. your gender) while obscuring the information you really want. All joking aside, it is important that test questions do not assume competence in some irrelevant skill. Carpenters, for example, do not need to write essays, so an essay or extended-answer question is arguably inappropriate in assessing competencies in woodworking.

Discriminant validity

A test should allow you to distinguish between trainees on the basis of relevant characteristics. If you are using the test to discover gaps and weaknesses in skills and knowledge, the wider the range of possible scores, the better. In this case, no one should get 100% or 0%, since neither of these scores allows for an accurate estimate of a trainee’s competence.

Inter-rater reliability

A test should be reliably marked. Any two people marking the same test paper should give the same marks for each question. This kind of reliability is seldom a problem with well-constructed computer-marked tests, but can be a real problem with tests involving essays, assignments, projects or extended-answer questions. There are several ways to deal with this problem when the test papers come in simultaneously. Ideally, each paper should be double-marked (i.e. by two markers, neither of whom knows the mark given by the other). Where the marks differ by more than 5%, the papers go to an expert referee. A less time-consuming method involves double-marking only a sample of the papers.

Temporal stability

Assessment standards should not change over time. Once again, this is easier to ensure where the test involves computer-marked questions randomly selected from a large computer bank.
It is much more difficult to ensure where the test or assignment questions are changed each year to discourage cheating.

Form equivalence

Two similar tests consisting of items drawn at random from a computer bank should be of equivalent standard. The questions in the bank should be classified according to the order of thinking involved and also the content area. The same rule should be used for selecting both sets of questions from the bank.

Internal consistency

Tests should be internally consistent: similar questions should elicit similar responses from the same trainee. If trainees all get a supposedly easy question wrong, then there is a problem with internal consistency.
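
The internal consistency check described above could be automated along the following lines. This is only a minimal sketch, assuming each trainee's answers are stored as a list of 0/1 marks per question. The function names, the data layout and the 0.5 threshold are illustrative assumptions, not part of any particular testing system. Cronbach's alpha is included as one commonly used summary statistic of internal consistency; the text above does not prescribe it.

```python
from statistics import pvariance


def item_facility(responses):
    """Return the proportion of trainees who answered each question correctly.

    responses: one list per trainee, holding a 0/1 mark per question
    (a hypothetical layout chosen for this sketch).
    """
    n_trainees = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_trainees for i in range(n_items)]


def flag_suspect_easy_items(responses, easy_items, threshold=0.5):
    """List supposedly easy questions that fewer than `threshold` of trainees got right.

    easy_items: indices of questions the test writer classed as easy.
    threshold: an assumed cut-off; choose one that suits your test.
    """
    facility = item_facility(responses)
    return [i for i in easy_items if facility[i] < threshold]


def cronbach_alpha(responses):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two questions")
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("all trainees have the same total score")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up marks for five trainees on four questions.
marks = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]

# Questions 0 and 3 were classed as easy; question 3 was answered wrongly by everyone.
print(flag_suspect_easy_items(marks, easy_items=[0, 3]))  # [3]
print(round(cronbach_alpha(marks), 2))
```

A flagged question is a prompt to re-read its wording and marking scheme, not proof that the question is faulty.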