Psychologist Vincent Wong carried out an analysis of psychometric tests in use across Asia. The analysis reviewed more than 40 tests from no fewer than 20 test developers. It focused on several areas: practical information about the tests (such as price and practical design issues), the constructs measured, report design, technical details and training requirements.
Prices vary widely among tests from different developers. At the lower end of the continuum, one provider offers every test in its product range free of charge and produces a section of the chargeable report. To obtain useful information, users have to pay for the full report, which is clearly a marketing strategy. From a psychometric perspective, however, this practice seriously harms the integrity of the tests, because anybody can access them an unlimited number of times. Such tests can therefore only be seen as suitable for people who are interested in trying out tests, rather than as usable in organizational settings. For more protected tests, prices range from USD 10 to more than USD 120; some providers charge per use, while others also charge a subscription fee (usually paid annually).
Several design dimensions of the tests were considered in this analysis: the split between ipsative and normative measures, the type of scales employed, and other practical issues such as the medium of test administration.
The majority of the personality assessment tools (over 80%) employ normative measures (psychometric tools that compare the respondent with a group of similar others, the norm group), while the remaining ones employ an ipsative style (psychometric tools that determine the relative preference among different personality traits within the respondent). Two exceptional cases were identified that employ a mixed style, i.e. normative plus ipsative. The popularity of the normative style might be down to the fact that, for tests designed for selection purposes, it is the better choice because it actually compares respondents with others. Ipsative measures, on the other hand, provide better knowledge of the preferences or strengths within a respondent. In line with this, we found that most of the ipsative tests were preference or value tests designed for coaching or counselling purposes, although some ipsative measures designed for selection purposes were also identified. In the tests that incorporated both normative and ipsative styles, the difference between the normative and ipsative scales was itself put to use: it represented the discrepancy between the real and ideal self of the respondent.
The type of scale used by a test is a function of whether the test is ipsative or normative. For normative tests, the most popular scale type was the 5-point Likert scale (a Likert scale asks respondents to choose, among several options, the one that best represents their view). The 7-point scale was also quite common, and there were a few occurrences of 3-point and 9-point scales. Apart from Likert scales, a few normative tests employed a true-or-false scale. Ipsative tests employed forced-choice scales. One of the more popular versions asked respondents to pick the option that describes them best (usually termed 'most like me') as well as the option that describes them least well (usually termed 'least like me'). Another version asked respondents to rank the available options in order, although this version was very uncommon.
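The 'most like me' / 'least like me' format described above can be illustrated with a minimal scoring sketch. It assumes a simple rule of +1 for the trait behind the 'most' choice and -1 for the trait behind the 'least' choice; this rule, the function name score_forced_choice and the example blocks are hypothetical and are not taken from any of the surveyed tests. The sketch shows why such scores are ipsative: they always sum to zero, so they describe relative preference within one respondent rather than standing against a norm group.

```python
def score_forced_choice(blocks, responses):
    """Score 'most like me' / 'least like me' forced-choice blocks.

    blocks    -- list of dicts mapping an option label to the trait it taps
    responses -- list of (most_label, least_label) pairs, one per block

    Assumed rule (illustrative only): +1 for the trait chosen as 'most',
    -1 for the trait chosen as 'least'.
    """
    # Start every trait at zero so unchosen traits still appear in the result.
    scores = {trait: 0 for block in blocks for trait in block.values()}
    for block, (most, least) in zip(blocks, responses):
        if most == least:
            raise ValueError("'most' and 'least' must be different options")
        scores[block[most]] += 1
        scores[block[least]] -= 1
    return scores


# Hypothetical two-block questionnaire tapping three traits.
example_blocks = [
    {"A": "ambition", "B": "flexibility", "C": "energy"},
    {"A": "energy", "B": "ambition", "C": "flexibility"},
]
example_responses = [("A", "C"), ("B", "A")]
example_scores = score_forced_choice(example_blocks, example_responses)
print(example_scores)
print(sum(example_scores.values()))  # always 0 under this scoring rule
```

Because every block adds one point and removes one point, a high score on one trait necessarily comes at the expense of another, which is the property that makes ipsative scores awkward for comparing different respondents.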
Most of the surveyed tests, if not all, were designed to be completed in a computerized environment. While some of the tests could be administered online without supervision, quite a few required supervised administration. A few tests provided different versions for supervised and unsupervised administration. Having more than one version allowed the result to be checked under supervision after the candidate had passed the unsupervised session. Paper-and-pencil versions of the tests were usually available at a price similar to that of the computerized version, although a few tests did not provide a paper-and-pencil version.
Although none of the surveyed tests was designed to be completed within a designated time, a timer was identified in one test, where it served to check for random or thoughtful responses.
Among the different attributes, personality was the one most commonly measured. The majority of the personality measures were built on the Big Five model of personality identified by Costa and McCrae (1985). While some retained the original five factors, about half of the surveyed tests restructured the factor composition based on the results of factor analysis or other theoretical support. For example, one test split the conscientiousness factor into 'Industriousness' and 'Methodicalness', while another developer combined the five-factor model with behavioural tendencies and arrived at a seven-factor model. Another common observation was that the primary factors under each of the five factors (also known as facets, ranging from 3 to 5 per factor) were also measured, and these were in fact more commonly used by test developers in report generation and interpretation. This was probably because the primary factors offer more detailed information and therefore greater flexibility in use.
Besides the Big Five model, another very popular personality model among test developers was Jung's (1920) typology of personality. For instance, two of the tests took this theory as their entire theoretical foundation, but one employed the original categorical model while the other developed a continuum model. Rather than building on a single theory, many tests extracted personality factors from multiple personality theories, and some measured as many as 34 personality dimensions. Examples of the measured personality dimensions include ambition, initiative, concern for others, flexibility and energy. Most of the surveyed personality tests served multiple functions, including selection, training and development needs analysis, counselling, and other related applications such as personal development, conflict management and team building.
Test developers further extended the applicability of personality tests to different situations by providing multiple versions of reports alongside a general personality profile.
Value, Motive and Preference
Other popular attributes measured were values, motives and preferences. Although these are three distinct attributes, we found that test publishers commonly combined two or all three of them into one test. These tests were less commonly employed for selection and more widely used in counselling and developmental scenarios, although some were also designed to be used in selection. For tests that measure values and motives, normative measures were more common, whereas ipsative measures were more common among preference tests. A related attribute was interest, and interest tests were mainly designed as career development tools.
Other measured attributes included leadership style, team role, behavioural tendency, emotional intelligence, self-efficacy, work ethic, interpersonal communication, sales orientation, customer service orientation, learning style and even work effectiveness tendency.
Nearly all of the surveyed tests had multiple reports, all in narrative form alongside a graphical representation (usually bar charts) of the measured characteristics. There was, however, one test that did not employ a narrative style in its reports at all; graphical representations with a sentence-long description for each factor were used instead. Two-dimensional typology graphs and score matrices were also employed in some types of report. Some reports used different colours to represent the different dimensions being measured, while others used colour to indicate extreme scores (for example, green for high scores and red for low scores). Colour was also frequently employed when matching test scores against a standard or an established profile, with green meaning a good match and red a poor match.
Generic Personality Profile
All the surveyed tests provided at least some form of generic personality profile in the report, whether as narrative writing, a matrix of scores, two-dimensional typology graphs, bar charts or broken-line graphs. Most commonly, the personality profile consisted of a graphical representation of the test scores on the different dimensions with a brief descriptive narrative alongside it. The test scores presented in this generic profile were usually in the form of sten scores or percentiles. Raw scores were also found in some reports. About half of the surveyed tests also presented the variation of the test score in the report, and a few explained its meaning. In all cases the primary dimensions measured by the test were reported in this section. Secondary or higher-level composite dimensions were also frequently reported here.
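As an illustration of the sten scores and percentiles mentioned above, the following sketch converts a raw score into both, using a norm group. It relies on the standard definition of the sten scale (mean 5.5, standard deviation 2, range 1 to 10); the rounding convention, the function names sten_score and percentile_rank, and the tiny norm group are assumptions made for illustration and do not describe how any surveyed test computes its scores.

```python
import math
from statistics import mean, pstdev


def sten_score(raw, norm_scores):
    """Convert a raw score to a sten (1-10) relative to a norm group."""
    sigma = pstdev(norm_scores)
    if sigma == 0:
        raise ValueError("norm group has no variance")
    z = (raw - mean(norm_scores)) / sigma
    # Sten scale: mean 5.5, SD 2. Rounding half up is an assumed convention.
    sten = math.floor(5.5 + 2 * z + 0.5)
    return max(1, min(10, sten))  # clip to the 1-10 range


def percentile_rank(raw, norm_scores):
    """Percentage of the norm group below the raw score, counting ties as half."""
    below = sum(score < raw for score in norm_scores)
    equal = sum(score == raw for score in norm_scores)
    return 100 * (below + 0.5 * equal) / len(norm_scores)


# Hypothetical norm group of raw scores on one dimension.
norm_group = [10, 20, 30, 40, 50]
print(sten_score(30, norm_group), percentile_rank(30, norm_group))
```

Both scores are normative in the sense used earlier: the same raw score yields a different sten or percentile when the norm group changes, which is why the choice of norm group matters when reading a report.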
Strengths and Limitations
Strengths and limitations were other very commonly reported qualities, although we identified a few tests that did not report them. In reporting strengths and limitations, some tests described them in very specific behavioural terms, while others simply treated high or low scores on particular dimensions as strengths or limitations. A few tests were identified that incorporated contextual factors into the reporting of strengths and limitations; this was more common in purpose-specific reports (for example, reports designed for leadership development or team building). Overall, tests tended to present information about the strengths and limitations of the candidates.
Leadership, teamwork, interpersonal skills or orientation, and problem-solving orientation were found to be the competencies most commonly tapped. Other competencies addressed by the surveyed tests included achievement orientation, customer service orientation, management style, decision making, planning and organization, influence and negotiation, delivery, creativity, analytic orientation, coping style and thinking style. Rather than being measured directly, these competencies were often derived from several primary dimensions of personality. They were written in the context of work, and behavioural terms were used heavily to make the reports easier to understand. Competency-based reports were also identified, with leadership-related reports appearing most often. Competency-based reports for sales and managerial positions were also popular.
Interview prompts were found in some reports. These included general instructions on how to use the report correctly to enhance the effectiveness of a follow-up interview, as well as specific suggested interview questions for a particular candidate. The number of suggested questions varied from three to more than ten, and some reports even included the answer expected from the candidate. These interview prompts also served as a check on, or backup for, the validity of the tests.
Training (Development) Needs
Several tests with a separate training needs or developmental report were identified. Among tests without a designated report for training needs, it was surprising to find that a section outlining training needs was absent from the majority, given that most of them were designed to be used in training needs analysis. When present, the training needs outlined (which some tests referred to as 'action plans') were usually generated from the areas of poor fit identified or from areas that fell below the normative standard. A simple description of the needs per se was common, and a few reports were found to provide concrete training suggestions.
Cultural fit information was identified in a few test reports. This information could cover the candidate's fit with the organizational culture, the nature of the task and co-workers, and it existed in several forms. The more popular way to compute it was to compare the candidate's scores with the norm or with an ideal profile. One test generated this information by comparing the candidate with the best performers. Yet another test presented the information from the candidate's side, stating which culture or environment would be the best fit for the candidate.
Technical information about a test includes normative data, reliability and validity data, and the development procedure of the test. This is the most important information to make readily accessible to the public, but unfortunately some of it was virtually absent for some of the surveyed tests. Normative data were the most frequently reported information, followed by reliability data. However, evidence of validity and details of the development procedure were absent for some tests, despite the claim of being 'scientifically validated' in their marketing materials. For tests that did not provide any of the above information, integrity was seriously in doubt.
Training requirements for the tests varied from no training at all in one extreme case (the free online test) to BPS Level B plus additional training (approximately 7 days of training in total). For most tests, 2-3 days of training on the specific test was common, but this type of training would not be recognized by a different test provider. The BPS (British Psychological Society) Competence in Occupational Testing was found to be the qualification most widely accepted by test providers. Most of the tests could be administered by a BPS Level B qualified user, but some required conversion training (1-2 days long) before a user was qualified to administer them.