Which psychometric tests are trustworthy?

What makes a good test?

With literally thousands of psychometric instruments on the market, how can you tell which ones are worth considering? Not all psychometric tests are created equal. Many instruments seem to be backed mainly by myth and personal anecdote, and have never been critically evaluated in a methodical, scientific way. Besides cost and look and feel, what should you look for to know whether a psychometric test will work as it is supposed to?


For one thing, a high-quality test produces reliable measurements. Suppose one of your applicants takes a test, mysteriously contracts short-term memory loss, and then takes the test again, this time administered by one of your colleagues. The scores from the two sittings should not differ much; high-quality tests deliver similar scores regardless of when they are administered, or by whom. Only then can you be fairly confident that the scores you get are accurate.
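
One common way to put a number on this idea is a test-retest correlation: the correlation between scores from the first and second sitting. The sketch below is only an illustration. The scores are made up, and the helper name pearson_r is our own, not something taken from any test manual.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for six applicants at two sittings (made-up data).
first_sitting = [12, 18, 25, 31, 36, 40]
second_sitting = [14, 17, 27, 30, 35, 41]

retest_r = pearson_r(first_sitting, second_sitting)
print(f"test-retest correlation: {retest_r:.2f}")
```

A value close to 1 means applicants keep roughly the same position relative to each other across sittings; a value close to 0 means the second score tells you little about the first.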

Unstructured interviews, for example, are not always reliable. Interviewers, even with the best intentions, hold unconscious biases towards candidates and tend to adapt their questions and behaviour to the person sitting in front of them. Unreliability is not limited to interviews; it also applies to many other popular types of tests, such as those that measure 'personality types'. Type-tests are fine for workshops and communication classes, but they are unfit for high-stakes hiring decisions, as research has shown that the majority of test takers are assigned a different personality type when re-tested after only five weeks!
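
For a test that sorts people into types, the simplest consistency check is to count how many test takers end up with a different type on the second sitting. The sketch below uses invented type labels and invented results purely to show the arithmetic; it does not reproduce any published study.

```python
# Hypothetical type labels for eight test takers at two sittings (made-up data).
first_types = ["A", "B", "A", "C", "B", "D", "A", "C"]
second_types = ["A", "C", "B", "C", "D", "D", "B", "A"]

# Count the people whose assigned type changed between the two sittings.
changed = sum(1 for before, after in zip(first_types, second_types) if before != after)
share_changed = changed / len(first_types)

print(f"{changed} of {len(first_types)} test takers changed type ({share_changed:.0%})")
```

If more than half of the test takers change type, as in this invented sample, the label an applicant receives depends heavily on the day the test was taken.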

Our advice: carefully examine the reliability section of a test's manual before subjecting any applicant to that test. When a test is proven to have favourable reliability statistics, you can be confident that you are measuring ‘something’ with satisfactory accuracy. Yes, ‘something’, since you do not yet know what exactly it is you are measuring. This can be found out only through validity research.



If a test report says things about you that really fit with what you think about yourself, then it must be valid, right? Well, not exactly. To create a positive candidate experience, it is important that candidates feel a test’s feedback is accurate, but this does not necessarily mean that the test is valid in the scientific sense of the word. Surprisingly, people tend to rate statements as accurate for them personally, even when those statements could apply to anyone. This has been labelled the 'Forer effect', and it happens because human beings tend to look for meaning where there is none, especially when relating (positive) information to themselves. The Forer effect can explain, at least in part, why so many people believe in pseudosciences like astrology, fortune telling, and graphology. But these are not good grounds on which to base your hiring decisions.

To find scientific evidence for validity, different kinds of technical checks can be carried out to verify whether the test actually measures what it claims to measure. For example, a scale should correlate strongly with other instruments that measure similar constructs, and only weakly with instruments that measure different ones. Another aspect that should be investigated is the extent to which the average test scores for relevant groups differ from each other. For example, it is logical for people in administrative jobs to score lower on a “need for impact” scale than, say, senior level managers. Likewise, people with higher educational degrees should, on average, also score higher on cognitive capacity tests.
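
The group comparison described above can be summarised as a standardised mean difference (often called Cohen's d): the gap between two group averages, expressed in units of the pooled standard deviation. The sketch below assumes two small, invented groups scored on a hypothetical “need for impact” scale; the function name cohens_d and all numbers are illustrative only.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardised difference between two group means (pooled sample variance)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical "need for impact" scores (made-up data).
senior_managers = [7, 8, 6, 9, 8, 7]
administrative_staff = [5, 4, 6, 5, 3, 5]

d = cohens_d(senior_managers, administrative_staff)
print(f"standardised difference: {d:.2f}")
```

A clearly positive value here would fit the expectation that senior managers score higher on this scale. A value near zero, or a negative one, would be a reason to question what the scale measures.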

An important claim made by most vendors is that their test will help the purchaser select 'better' employees, resulting in better performance. For this to be true, the test scores should correlate with job performance ratings. Such correlations are considered evidence of 'criterion validity'.

About the author

Amélie Vrijdags | Senior Consultant | Expert Psychologist

