How to Choose a Robust Psychometric Assessment

10 defining characteristics of high-quality personality and aptitude tests

What to Look for in Valuable and Robust Psychometric Assessments

When faced with the many different psychometric assessment options, it can be difficult to distinguish which would be best for your candidates and employees. It can also be difficult to decide which will provide the information you need and deliver the most value for your organization.

This article outlines the 10 criteria for evaluating the quality of a psychometric instrument, as well as the additional criteria to be considered when selecting online assessments. It also includes a free checklist you can download to help you choose the most effective psychometric assessments for your needs.

This is important because you need to ensure that any psychometric instrument, whether delivered online or offline, offers the accuracy, value and fairness needed to inform your talent decisions. Professional bodies and organizations across the world, such as the International Test Commission and the European Federation of Psychologists’ Associations, have developed and defined consistent standards for psychometric assessments. Typically, they set out the minimum standards a tool must reach, as well as how these tools can be used and by whom.
(ITC: www.intestcom.org)
(EFPA: www.efpa.eu)


1. Objectivity

Objectivity refers to the extent to which the test or questionnaire results are independent of the person scoring the assessment.

There are three ways in which objectivity is portrayed in an assessment and these relate to the different stages of testing:

  • Standardized Administration
    The assessment must be administered using standardized instructions and a regulated testing situation.
  • Standardized Scoring
    The marker of the assessment should not be able to influence the result or score allocated, or alter the algorithm for scoring.
  • Standardized Interpretation
    The same conclusions should always be drawn from a particular set of results; ideally, the score should be compared with the results of a norm group.

A test can only be interpreted unambiguously if objectivity in all three stages is assured.

2. Reliability

Reliability, or accuracy, refers to how consistently a psychometric tool measures what it is intended to measure. In other words, does it measure the same aspect consistently again and again? Are the results dependable?

A reliable instrument is robust against interference and yet, at the same time, sensitive to underlying differences in characteristic values. There are different ways of defining reliability, including:

  • Re-test Reliability
    This is calculated when the results from the first application of the test are correlated with the results of a second application.
  • Alternate Form Reliability
    This is calculated when two versions of the same test are completed by the same people and their scores are compared.
  • Internal Consistency
    Internal consistency (Cronbach’s alpha) measures the correlations between different items of the assessment. It shows whether the items intended to measure the same general construct result in similar scores.

A perfectly reliable test would provide a correlation co-efficient of 1.00. No instrument is perfectly reliable; however, only reliability values over 0.7 should be accepted. For questionnaires, the ideal value lies between 0.75 and 0.85. For ability tests, it is between 0.8 and 0.9.
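As a rough illustration of how these figures are obtained, the sketch below computes a re-test correlation and Cronbach's alpha with the Python standard library. The scores are hypothetical, and the function names `pearson_r` and `cronbach_alpha` are chosen for this example only. Alternate form reliability would use the same correlation, applied to the scores from the two test versions.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def cronbach_alpha(item_scores):
    """item_scores: one list per item, each with one score per respondent."""
    k = len(item_scores)
    sum_item_var = sum(variance(item) for item in item_scores)
    totals = [sum(answers) for answers in zip(*item_scores)]
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))


# Hypothetical data: the same five people tested on two occasions.
first_sitting = [10, 12, 14, 16, 18]
second_sitting = [11, 12, 15, 15, 19]
retest_r = pearson_r(first_sitting, second_sitting)

# Hypothetical data: three items answered by the same five people.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 4, 4],
    [1, 3, 3, 5, 5],
]
alpha = cronbach_alpha(items)

# Apply the 0.7 minimum named in the text.
for name, value in [("re-test", retest_r), ("alpha", alpha)]:
    verdict = "acceptable" if value > 0.7 else "too low"
    print(f"{name}: {value:.2f} ({verdict})")
```

With so few respondents the values are purely illustrative; published reliability figures are based on far larger samples.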
 

3. Validity

Validity refers to how well the assessment measures what it is intended to measure. To what degree of certainty can we draw conclusions about how someone will perform on the job based on how well they perform on the assessment?

There are different types of validity, including:

Criterion-related Validity

The criterion is the reason for assessing someone or the purpose of the test. Criterion-related validity measures the correlation or statistical relationship between test results and job performance. Are those who score high on the assessment more likely to succeed on the job?

Concurrent Validity

Concurrent validity refers to the extent that the assessment results correspond to the results of existing assessments or tests that measure the same constructs and have already been found to be valid.

Predictive Validity

Predictive validity is the extent to which a score on a psychometric scale or test predicts scores on some criterion measure. For example, you may look at the predictive validity of a cognitive test for job performance and this is calculated as the correlation between test scores and, for example, supervisor performance ratings. With a high level of predictive validity, it is possible to make predictions based on test results.

Construct Validity

Construct validity refers to the extent to which the instrument measures what it sets out to measure. It looks at the test results and how these are related to the results of other tests which are seen to be valid indicators of the construct, such as external ratings, behavior measurements or experimental results. The construct validation does not end in a validity co-efficient but provides an overall picture of validity. In theory, a test should correlate higher with a test that measures approximately the same thing than with a test that measures something else.

Face Validity

This considers the extent to which the assessment and its content appears to cover what is intended to be measured and is perceived as doing so by test takers and stakeholders.

Internal and External Validity

In addition to the other areas of validity, internal and external validity can be identified. An instrument has internal validity if its results can be interpreted unambiguously. Internal validity decreases as uncontrolled, interfering variables increase. Moreover, an instrument is externally valid if its results can be generalized across different situations. This means that external validity is heavily dependent on the genuine nature of the testing situation and the representativeness of the tested sample.


A valid ability test provides a measure for the general ability of a person and, therefore, helps predict performance in situations in which general ability is important.

4. Scaling

An underlying scale on which people will be measured must be defined and established. The test scores that result from the scoring algorithms should adequately represent the empirical relations of characteristics.

In ability tests, this requires that the more effective test candidate receives a better test score than the less effective one; that means that the relation of performance is reflected in the test scores. The practicability of this criterion is dependent on the measurement criterion.


A good psychometric test will see the more effective candidate achieve a better score than the less effective candidate.

5. Assessment Implementation Costs

This refers to the time and costs needed to acquire, license or access online or offline assessment systems and test materials, as well as any scoring or interpretation costs. To justify the use of a psychometric assessment, you can develop a business case: the value of the information obtained should always be greater than the actual costs.

Cost-Validity Ratio of Different Psychometric Tools

A perfectly valid instrument would provide a validity co-efficient of 1.00. As with reliability, this value is unattainable in practice. To confidently assess the performance of a candidate, good diagnostic instruments should have a value of at least 0.3. Figure 1 shows the cost-validity ratio of different psychometric tools.

The most valuable psychometric instruments have a good cost-validity ratio and tend to be instruments which are seen as reasonable and ‘high validity’. Ability tests and personality questionnaires are widely used as they are straightforward to implement with many candidates and tend to have high validity. Probation periods and assessment centers are also highly valid, but are typically more costly to implement. By contrast, graphology is expensive and yet provides little practical benefit as its validity values are close to zero.


 

6. Standardization and Norms

Psychometric instruments are beneficial and meaningful only if they are standardized. When an instrument is applied to all those within a defined sample under comparable conditions, this process is called standardization (adjustment). The data collected from a representative sample of those who have completed the assessment under the same standard conditions enables the calculation of norms.

Norms are statistically comparable values that enable you to compare a person’s specific individual test score with the test results of other people from a defined (norm) group. This helps you classify and interpret an individual score.
The defined group against which the individual scores are compared should share the important attributes with the tested individuals (e.g. age, education level or job role). Reputable psychometric instruments will have many robust norm groups available.

Standardization Process

The following elements are important during the standardization of a psychometric tool:
  • The definition of the sample on which the test is standardized and adjusted. The norm sample has to be representative of the population with which the test will be applied.
  • Securing the test objectivity during standardization. Typically, test administrators will be trained to ensure that the testing process is standardized and repeatable.
  • The age of the test norms is critical for the correct interpretation of norm or percentage values. Norms which were collected more than 10 years ago should be considered out of date.


Standardization Techniques

Standardization techniques normally refer to the distance between the individual test score and the mean of the specific norm sample and express the difference in units of standard deviation of the distribution.

Norms which are based on standard values, including IQ scores, z-scores, T-scores, stanine scores and percentile (centile) scores, are widely accepted and used within occupational psychometric assessment.
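The sketch below shows how a raw score might be converted into these standard values, given the mean and standard deviation of a norm group. It uses the conventional scale definitions (T: mean 50, SD 10; IQ: mean 100, SD 15; stanine: mean 5, SD 2, limited to 1 to 9). The percentile is derived from a normal curve, which is an assumption of this example; in practice percentiles are usually read from the norm table itself. The function name and the numbers are hypothetical.

```python
from math import floor
from statistics import NormalDist


def standard_scores(raw, norm_mean, norm_sd):
    """Express a raw score relative to a norm group on common scales."""
    z = (raw - norm_mean) / norm_sd              # distance from mean in SD units
    stanine = min(9, max(1, floor(2 * z + 5.5)))  # round to nearest, clip to 1..9
    return {
        "z": z,
        "T": 50 + 10 * z,
        "IQ": 100 + 15 * z,
        "stanine": stanine,
        "percentile": NormalDist().cdf(z) * 100,  # assumes normally distributed norms
    }


# Hypothetical norm group: mean 50, standard deviation 10.
scores = standard_scores(raw=65, norm_mean=50, norm_sd=10)
for scale, value in scores.items():
    print(f"{scale}: {value:.1f}")
```

All five values carry the same information: the candidate scored 1.5 standard deviations above the norm group mean.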


7. The Utility of the Psychometric Instrument

When choosing a psychometric tool for selection or development, the tool must have use and value in that process.

An assessment is useful if what is being measured is practically relevant and if the decisions that are based on it provide value. The aim of any selection instrument is to help you appoint best-fit candidates and reject unsuitable ones. The proportion of suitable candidates appointed and the proportion of unsuitable applicants who are rejected should be high.

The success ratio of a diagnostic instrument describes the relationship between ‘suitable candidates appointed’ (SA) and the total of ‘suitable and unsuitable candidates appointed’ (SA + UA), that is, SA / (SA + UA).
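As a small worked example, the sketch below computes the success ratio from hypothetical counts, together with the selection ratio (the share of applicants appointed) and the base rate (the share of applicants who are suitable) used in the following sections. The labels `sr` and `ur` for suitable and unsuitable candidates rejected are introduced here for illustration and do not come from the article.

```python
def selection_summary(sa, ua, sr, ur):
    """Summarize a selection outcome from four counts.

    sa: suitable appointed      ua: unsuitable appointed
    sr: suitable rejected       ur: unsuitable rejected
    """
    appointed = sa + ua
    applicants = sa + ua + sr + ur
    return {
        "success_ratio": sa / appointed,            # SA / (SA + UA)
        "selection_ratio": appointed / applicants,  # share of applicants appointed
        "base_rate": (sa + sr) / applicants,        # share of applicants who are suitable
    }


# Hypothetical outcome for 100 applicants.
summary = selection_summary(sa=30, ua=10, sr=20, ur=40)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Here 30 of the 40 appointed candidates are suitable, so the success ratio is well above the base rate, which is what a useful instrument should achieve.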

To assess the usefulness or utility of a psychometric tool, the following factors play an important role:

  1. The effect of validity
  2. The effect of selection ratio
  3. The effect of base rate
  4. Costs of a wrong decision


Effect of Validity

Assuming selection ratio and base rate are constant and two instruments with different validities are compared, the instrument with the higher validity will have the higher success ratio (see Figure 4).
The higher the validity, the narrower the scatter plot and the higher the proportion of suitable candidates appointed to all candidates appointed.


Effect of Base Rate

If the validity and selection ratio are held constant and the base rate changes, this has an additional impact on the success ratio.
If the base rate is increased (i.e. there are more suitable applicants in the population), the relationship between ‘suitable candidates appointed’ and ‘all candidates appointed’ clearly improves. This is shown by comparing Figure 4 with Figure 6.
However, a high base rate may call the usefulness of a psychometric instrument into question, as the probability of selecting a suitable applicant is relatively high even with random selection.


Effect of Selection Ratio

If the validity and base rate are constant and the selection ratio changes, the success ratio also changes. If the selection ratio is reduced, the success ratio improves (see Figure 6). When using assessments, a low selection ratio is often chosen as normally only a small proportion of the applicants will be appointed.
The practical benefit of diagnostic instruments is particularly high if validity is high, base rate is low (i.e. the job requirements are high and the number of suitable candidates is low) and if, additionally, the selection ratio is low.
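These three effects can be explored numerically. The sketch below assumes that test scores and job performance follow a bivariate normal distribution whose correlation equals the validity, which is the assumption behind the classic Taylor-Russell tables; the article does not state which model its figures use. The function name `success_ratio` and the parameter values are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def success_ratio(validity, base_rate, selection_ratio, steps=4000):
    """P(suitable | appointed) under a bivariate-normal model.

    validity must lie strictly between -1 and 1;
    base_rate and selection_ratio strictly between 0 and 1.
    """
    x_cut = STD_NORMAL.inv_cdf(1 - selection_ratio)  # test score needed to be appointed
    y_cut = STD_NORMAL.inv_cdf(1 - base_rate)        # performance needed to be suitable
    resid_sd = sqrt(1 - validity ** 2)
    upper = 8.0                                      # the tail beyond this is negligible
    width = (upper - x_cut) / steps
    joint = 0.0                                      # P(appointed and suitable)
    for i in range(steps):
        x = x_cut + (i + 0.5) * width                # midpoint rule
        p_suitable = 1 - STD_NORMAL.cdf((y_cut - validity * x) / resid_sd)
        joint += STD_NORMAL.pdf(x) * p_suitable * width
    return joint / selection_ratio


baseline = success_ratio(validity=0.3, base_rate=0.5, selection_ratio=0.3)
higher_validity = success_ratio(validity=0.6, base_rate=0.5, selection_ratio=0.3)
higher_base_rate = success_ratio(validity=0.3, base_rate=0.7, selection_ratio=0.3)
lower_selection = success_ratio(validity=0.3, base_rate=0.5, selection_ratio=0.1)

print(f"baseline:              {baseline:.2f}")
print(f"higher validity:       {higher_validity:.2f}")
print(f"higher base rate:      {higher_base_rate:.2f}")
print(f"lower selection ratio: {lower_selection:.2f}")
```

Under this model, each of the three changes raises the success ratio relative to the baseline, and with a validity of zero the success ratio simply equals the base rate, as it would with random selection.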

Cost of a Wrong Hire

If the cost of a wrong hire is extremely high, an appropriate psychometric tool should always be used to select applicants for the position, as the cost of using such instruments is usually much lower than the cost of a wrong hire. If the cost of a wrong hire is low, the cost-value ratio of a diagnostic instrument has to be weighed up.


 

8+9. The Acceptability of the Psychometric Tool & Inability to Distort Test Scores

The Acceptability of the Psychometric Tool

A test is regarded as acceptable and reasonable if it does not apply undue stress to the test taker regarding their emotional and physical well-being.

Inability to Distort Test Scores

A test cannot be faked if it is constructed so that the test taker cannot control or falsify the resulting test scores through targeted testing behavior.
High face validity can make test scores more capable of being faked or influenced as it is easier to identify what the assessment is looking to measure. Therefore, the applicant may try to influence their test results accordingly. Personality questionnaires are more susceptible to such distortions than ability tests.


How you treat and respect your candidates during the assessment process will impact your employer brand.

10. Transparency and Fairness

Another essential factor that should help determine which assessment instrument to choose is how the candidate is treated before, during and after the test.

Transparency

A good diagnostic instrument should provide an appropriate degree of transparency. Before the test, understandable instructions explaining the test and the administration process should be provided. Practice items explaining the purpose of the test or questionnaire should also be available. Additionally, in the case of offline administration, a trained test administrator should explain the system.

Importance of Feedback

Best practice dictates that candidates should receive appropriate feedback about the results of each instrument they complete. The minimum feedback level is a short written feedback report. The best feedback is personal and provided by a trained person and combined with a detailed feedback report.

Fairness

It is vital that test scores are not discriminatory against any group based on ethnic, socio-cultural, gender or other reasons. Accessibility guidelines state that everyone should have the same opportunities to complete psychometric tools and this means that candidates with disabilities should be given the opportunity, according to their abilities, to carry out an assessment – even if this requires additional support.

All applicants must be treated equally and fairly.
Instruments have to substantiate that they do not discriminate against any group of people.


 

Additional Criteria

Online and Mobile Psychometric Assessments

Psychometric tools administered online – whether via a computer, tablet or smartphone – bring additional challenges. As well as the general criteria outlined previously, there are specific criteria which need to be taken into account with online assessments.

Guidelines as to what makes a good online psychometric tool are provided by the International Test Commission (ITC). The following criteria have been taken from the ‘International Guidelines on Computer-Based and Internet-Delivered Testing’.

  • Technology
  • Assured quality
  • Control
  • Security

Technology

This emphasizes the need for consideration of the technical aspects of computer-based, Internet or mobile testing, especially in relation to the hardware and software required to run the assessment.

The following aspects can be included:
  • Think about the technology (hardware such as smartphone, tablet, screen size, processor, graphics card and monitor, as well as the software) required for computer-based or online assessment at both the test provider and test user sites.
  • Make sure the online assessment is robust. That means that the test should be relatively independent of Internet connections and should be stable.
  • Ensure the test is user-friendly. Consider the human factor issues in the presentation of instructions and test items via computer, tablet or smartphone.
  • Adjust for those with disabilities. Consider reasonable adjustments to the technical features of the assessment for those candidates with disabilities (e.g. using support devices).
  • Provide help, information and practice items within the online environment.
  • Ensure the online assessment is hardware independent.
  • Look at how the items appear onscreen, regardless of the size or orientation of the screen. Many online instruments were optimized for a certain computer system (certain conditions: screen resolution, proportions) and, when viewed on other computers, appear blurred or grainy. This can be avoided by vectorizing item material to optimally adapt to monitor conditions and therefore mitigating any effects due to the hardware and Internet connection used.
  • Integrate seamlessly with other HR information systems (HRIS). An online assessment tool needs to be plug-in ready so that seamless integration into existing applicant tracking, assessment or HR systems (e.g. SAP, PeopleSoft, Oracle) is possible. This keeps all participant data within the enterprise system and therefore helps to secure the data.

Quality Assured

This refers to the assurance of the quality of assessment materials and ensuring high standards throughout the testing process.
 
  • Ensuring the knowledge, competence and appropriate use of online testing by the provider and user.
  • Considering the psychometric qualities of the assessment to be used.
  • Scoring and analyzing online assessment results accurately (e.g. by defined scoring algorithms).
  • Interpreting results appropriately. Providing appropriate feedback.
  • Considering equality of access for all groups.

Controlled

This refers to the control of the delivery of tests, candidate and test taker authentication and enabling previous test practice.
 
  • Detailing the level of control over the test conditions.
  • Detailing the appropriate control over the performance of the testing, if required (open or supervised administration).
  • Giving due consideration to controlling prior practice/time of tutorial.
  • Giving due consideration to controlling the exposure of test items.
  • Controlling the test taker’s authenticity.
  • Ensuring the prevention of cheating (copying of answers, assistance).

Secure

Security of an online assessment is crucial to protect the integrity of the testing materials, the privacy of the test taker and the confidentiality of the data.
 
  • Taking account of the security of the editors’ and authors’ intellectual property.
  • Ensuring the security of test materials.
  • Ensuring the security of each test taker’s data transferred over the Internet.
  • Maintaining the confidentiality and security of test takers’ results, e.g. by permitting access only to authorized individuals.

 

Conclusion

In summary, Internet-based instruments should be self-explanatory, fake-proof, hardware independent, plug-in ready and accessible.

Self-explanatory

When online tools are administered openly with no supervisor present, the instructions need to be clear and self-explanatory. Interactive examples and feedback should be included and explanations of the examples given.

Fake-proof

The test needs to minimize the opportunity for items to be copied or shared with others, because exposed items are no longer secure and can no longer measure accurately. In online assessment, this can be achieved by generating the test dynamically as it is run. Using this approach, it is almost impossible for exactly the same test to be generated twice.
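One simple way to picture dynamic generation is random sampling from an item bank. The sketch below is an assumption made for illustration, not a description of any particular vendor's method: it draws the same number of items from each difficulty level so that different test forms stay broadly comparable. The item bank, the level names and the function `generate_test` are all hypothetical.

```python
import random


def generate_test(item_bank, per_level, rng=None):
    """Build one test form by sampling items from each difficulty level."""
    rng = rng or random.SystemRandom()  # unpredictable by default
    form = []
    for level in sorted(item_bank):
        # Sample without replacement so no item appears twice in a form.
        form.extend(rng.sample(item_bank[level], per_level))
    rng.shuffle(form)                   # vary the order of presentation
    return form


# Hypothetical item bank: 20 item IDs per difficulty level.
bank = {
    "easy": [f"E{i}" for i in range(20)],
    "medium": [f"M{i}" for i in range(20)],
    "hard": [f"H{i}" for i in range(20)],
}

form = generate_test(bank, per_level=3, rng=random.Random(1))  # seeded for a repeatable demo
print(form)
```

Even this small bank allows more than a billion distinct item combinations before ordering is considered, which is why two candidates are very unlikely to see an identical form.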

Accessible

Every online psychometric assessment should be accessible for candidates with disabilities. This may mean that there needs to be other input devices besides a mouse (e.g. touch screens, touch pads, keyboard, specific assistance for those with physical disabilities) or reading assistance support through, for example, a magnifier.


Looking for the right Psychometric Assessment?
Download checklist to help you gauge the quality of psychometric tools

References

  • Bartram, D. (2000). Internet Recruitment and Selection: Kissing Frogs to Find Princes. International Journal of Selection and Assessment, 8 (4), 261–274.
  • Bortz, J. (2005). Statistik für Human- und Sozialwissenschaftler (6. Aufl.). Heidelberg: Springer.
  • ITC (2005). International Guidelines on Computer-Based and Internet-Delivered Testing.
  • Kersting, M. (2006). DIN SCREEN – Leitfaden zur Kontrolle und Optimierung der Qualität von Verfahren und deren Einsatz bei beruflichen Eignungsbeurteilungen.
    Lengerich: Pabst Science Publisher.
  • Naglieri, J. A.; Drasgow, F.; Schmit, M.; Handler, L.; Prifitera, A.; Margolis, A.; Velasquez, R. (2004). Psychological Testing on the Internet: New Problems, Old Issues.
    American Psychologist, 59 (3), 150–162.
  • Zimbardo, P. G. (1995). Psychologie (6. Aufl.). Berlin, Heidelberg: Springer.