Researching, sharing and learning to improve online assessment

"As a company, we have witnessed a surge in interest in the value and benefits of online assessment. cut-e is recognised as one of the leaders in online testing because of our commitment and constant drive to consider, research and incorporate new ways to improve the assessment of people at work. As such, our research and development programme is strong; we believe that it is important to challenge, investigate and improve testing practices and we see conferences as a vehicle to share with, and learn from, our fellow psychologists and test developers."

Dr Achim Preuss, founder and MD of the cut-e Group

DOP Conference 2017

Symposium: State of the Art of Game-Based Assessment

General Abstract
Game-Based Assessment (GBA) is becoming increasingly popular in workplace assessment, especially, but not exclusively, in the context of large international companies' graduate and apprenticeship schemes, yet little is known about this fascinating psychological assessment approach. This six-paper symposium offers comprehensive insight into the state of the art in GBA by merging the perspectives of Arctic Shores - the UK's GBA pioneer - its recently announced partner cut-e, and Birkbeck College. Hands-on games, scientific evidence, and a snapshot of the "behind the scenes" work that enables GBAs will be combined to answer questions such as "What is GBA? Does it work? Is it fair? Is it science?", and to stimulate discussion between researchers, practitioners, and the wider audience.

Paper 1: The elephant in the room: What is Game-Based Assessment? (Lara Montefiori, Arctic Shores)

Paper 2: Fairness in Game-Based Assessment. Adverse impact, perceived stress, and effect of screen size. (Dr Maria Panagiotidi, Arctic Shores)

Paper 3: Gaming the game. Game-Based Assessment and self-presentation bias. (Dr Maria Panagiotidi, Arctic Shores)

Paper 4: Measurement invariance, and dyslexia as a moderator in game-based assessments. (Liam Close, Birkbeck College)

Paper 5: Incremental Validity of Online Psychometric Assessments and GBAs: Some Preliminary Findings. (Dr Katharina Lochner, cut-e)

Paper 6: Arctic Shores’ Psychometric Framework and Infrastructure. (Lara Montefiori, Arctic Shores)

ITC 2016

ITC 2016 - Paper: Positive Assessment

This paper was presented by Dr Achim Preuss and Dr Katharina Lochner at the 10th International Test Commission Conference, July 2016 in Vancouver, Canada

Employment testing has a mixed reputation with the general public. This is partly due to the fact that when designing such processes the applicants’ view is very often neglected (Boss, 2005). However, there are opportunities to provide what we call Positive Assessment: In analogy to Seligman’s (2011) PERMA framework from Positive Psychology we claim that assessment can (1) induce positive emotions (P); (2) be engaging (E); (3) establish positive relationships between organisations and candidates (R); (4) be meaningful for organisations and candidates (M); (5) be an accomplishment in that it achieves clear added value for organisations and candidates (A).

The objective of the session is to present ideas on how selection processes and psychometric instruments can be designed so that candidates experience them as positive and engaging, and so that they provide meaningful outcomes for both candidates and organisations.

Study 1 assessed social acceptance (Kersting, 2008) of a test assessing creativity (N = 110). Study 2 assessed to what extent an online test measuring numerical reasoning can evoke a state of flow (N = 41). Study 3 looked at quantitative and qualitative feedback from both accepted and rejected candidates in a selection process.

Study 1: The online creativity test scored higher on social acceptance than standard ability tests, mainly due to quality of measurement as well as usability and functionality. Study 2: Candidates reported being in a moderate state of flow, with time pressure having an impact on the experience. Study 3: Candidates, even those who were rejected, perceived the selection process as rather fair and were satisfied with information and communication.

Selection processes and instruments can be designed in line with the idea of Positive Assessment. However, more studies are required on which factors impact candidate experience.

ITC 2016 - Paper: Detecting who is going to cause problems

This paper was presented by Dr Achim Preuss and Dr Katharina Lochner at the 10th International Test Commission Conference, July 2016 in Vancouver, Canada

Innovation has become something like a “Holy Grail” in economics since innovative products and services are a competitive advantage in rapidly changing international markets (Maier, Streicher, Jonas, & Frey, 2007). Thus, companies strive to establish an environment that facilitates innovation (Amabile, Conti, Coon, Lazenby, & Herron, 1996) and try to recruit innovators. It is desirable to measure applicants’ potential to innovate at an early stage in the recruitment process, i.e. in unsupervised online mode. Whether someone will be an innovator is determined by cognitive ability, certain personality characteristics, and creativity (Farr, Sin, & Tesluk, 2003). To date, personality and cognitive ability can be measured in unsupervised online mode, but for creativity no standardised instruments for unsupervised testing with automated reporting are available.

Creativity is measured by three components: fluency, flexibility, and originality (Guilford, 1967). The aim was to develop an online creativity test that yields these scores using an automated machine-learning-based scoring algorithm. The task is to draw pictures on a scratch board using given shapes and to name them.
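Guilford's three components have well-known operationalisations: fluency as the number of responses produced, flexibility as the number of distinct semantic categories used, and originality as the statistical rarity of the responses. The paper does not disclose cut-e's actual algorithm; purely as an illustration, here is a minimal sketch of such a scoring scheme in Python, with hypothetical labels, categories, and norm-sample frequencies:

```python
def creativity_scores(responses, category_of, pool_frequency):
    """Score a candidate's set of named drawings along Guilford's
    (1967) three creativity components.

    responses      -- labels the candidate gave their drawings
    category_of    -- maps each label to a semantic category
    pool_frequency -- relative frequency of each label in a norm sample
    """
    unique = set(responses)
    # Fluency: number of distinct responses produced
    fluency = len(unique)
    # Flexibility: number of distinct semantic categories used
    flexibility = len({category_of[r] for r in unique})
    # Originality: reward statistically rare responses (1 - norm frequency)
    originality = sum(1.0 - pool_frequency.get(r, 0.0) for r in unique)
    return fluency, flexibility, originality

# Hypothetical example: three drawings from two categories
category_of = {"sun": "nature", "wheel": "object", "flower": "nature"}
pool_frequency = {"sun": 0.6, "wheel": 0.1, "flower": 0.3}
print(creativity_scores(["sun", "wheel", "flower"], category_of, pool_frequency))
```

In the real instrument, the machine-learning component would replace the hand-written `category_of` lookup, classifying freely named drawings into categories automatically.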

In two subsequent studies the instrument and scoring were developed. In a third study with N = 470 participants it was validated.

Test-retest reliabilities were .82 for Fluency, .67 for Flexibility, .71 for Originality, and .72 for Overall Creativity. When scoring the pictures according to the Torrance Tests of Creative Thinking (TTCT; Torrance, 1974) manual, correlations between this and the automated scoring were .99 for Fluency, .85 for Flexibility, .88 for Originality, and .93 for Overall Creativity.

The machine-learning-based scoring algorithm for the online creativity test provides reliable and valid creativity scores. A limitation so far is that there is no proof of criterion-related validity in the sense that the test really predicts creative performance in workplace settings. A study is currently being planned.

ITC 2016 - Poster session: Behind the scores - Paradata in Psychometric Assessment

This poster was presented by Dr Achim Preuss and Dr Katharina Lochner at the 10th International Test Commission Conference, July 2016 in Vancouver, Canada 

New technologies for administering psychometric instruments open up new opportunities for obtaining paradata. Paradata of a psychometric instrument are data describing the process of completing an instrument such as response times or clicks (Stieger & Reips, 2010). Thus, new technologies allow for tracking the candidate’s behaviour during the entire completion process.

The purpose of the session is to give an overview of different opportunities of using paradata to assess data quality particularly in unsupervised online settings and to increase the reliability and validity of psychometric online tests and questionnaires. There will be an overview of various research findings, followed by a discussion of implications for research and practice of employment testing.

In Study 1 N = 100 students gave a true and a forged self-description on a questionnaire that assesses job-related competencies. In Study 2 we logged different positions of a slider on a visual analogue scale (first click and subsequent changes of slider positions) that was used in a questionnaire assessing integrity.

Study 1: Scale and item variance increased significantly as an effect of intentional faking while total scale point distribution decreased and the pattern of response times changed. Study 2: Using the position of the slider after the first click instead of its final position for calculating reliability yielded higher internal consistency of the instrument.
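The abstract does not specify how internal consistency was estimated in Study 2, but the standard statistic for this is Cronbach's alpha: the ratio of shared to total score variance across items. As a sketch only, with invented slider data (the first-click and final-position matrices below are hypothetical, not the study's data), the comparison could be run like this:

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a persons-by-items matrix of scores."""
    k = len(item_scores[0])                                  # number of items
    item_vars = [statistics.variance(col) for col in zip(*item_scores)]
    total_var = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical slider positions per candidate (rows) and item (columns)
first_click = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [1, 2, 1]]  # after first click
final_pos   = [[4, 3, 5], [2, 4, 1], [5, 2, 4], [1, 3, 2]]  # final positions

print(round(cronbach_alpha(first_click), 2))  # prints 0.95
print(round(cronbach_alpha(final_pos), 2))    # prints 0.23
```

In this fabricated example the first-click positions are more consistent across items, mirroring the direction of the Study 2 finding.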

It is possible to assess and improve data quality, and thus validity, using paradata. However, using them raises a number of questions, such as: What must the instructions explain to candidates about the kind of data collected while they complete an instrument? What effect will candidates' awareness that their behaviour is being tracked have on their actual test performance?

ITC 2016 - Poster session: Pilot Assessment - Above and Beyond Ability

This poster was presented by Dr Achim Preuss and Dr Katharina Lochner at the 10th International Test Commission Conference, July 2016 in Vancouver, Canada

When assessing airline pilots there is often a focus on mental ability. However, research shows that in most accidents inadequate team behaviour or a breakdown in communication was observed (Maschke, 2004). Moreover, captains in particular need to show professional aviation knowledge and should be strong in the social competencies needed to practise good Threat and Error Management (TEM) and Crew Resource Management (CRM) (IATA, 2012). Thus, assessing competencies along with cognitive ability in pilot selection is essential.

We developed a competency model for airline pilots that comprises the competencies essential for TEM and CRM and validated it with performance data from various airlines. The competencies can be assessed using a self-report questionnaire.

First, a conceptual model was developed based on insights from the literature (e.g., Goeters, 2004; Le, Oh, Robbins, Ilies, Holland, & Westrick, 2011) and practice (e.g., IATA, 2004). Second, the model was validated using data from major airlines in the Middle East and Southeast Asia. As predictor we used the personality questionnaire shapes, which assesses job-related competencies. As criteria we used performance on the simulator and during flight (base and line checks).

The competency model comprises competencies like stress resistance, interaction with others, or decision making. The empirical studies revealed dimensions like controlled (subscale of stress resistance), agreeable (subscale of interaction with others), and analytical (subscale of decision making) – to name but a few – to be predictors of performance both on the simulator and during flight. The competencies showed incremental validity above cognitive ability tests.
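Incremental validity of this kind is typically quantified as the gain in explained criterion variance (delta R-squared) when competency scores are added to a regression model that already contains cognitive ability. As a sketch only, on simulated data (the variable names, effect sizes, and sample size below are invented, not the study's):

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

rng = np.random.default_rng(0)
n = 200
ability = rng.normal(size=n)       # cognitive ability score (simulated)
competency = rng.normal(size=n)    # e.g. stress resistance (simulated)
# Simulated flight performance depends on both predictors plus noise
performance = 0.4 * ability + 0.3 * competency + rng.normal(scale=0.8, size=n)

r2_ability = r_squared(ability.reshape(-1, 1), performance)
r2_both = r_squared(np.column_stack([ability, competency]), performance)
print(f"Incremental validity (delta R^2): {r2_both - r2_ability:.3f}")
```

A positive delta R-squared for the two-predictor model is what "incremental validity above cognitive ability tests" means operationally.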

Competencies predict pilot performance and have incremental validity above and beyond cognitive ability. Thus competencies can and should be assessed during pilot selection. The model will need further refinement for different target groups such as captains and first officers.

SIOP 2016

SIOP 2016 - Symposium: Practical Considerations for Cross-cultural Use of Self-report Questionnaires

This session integrates research studies from four global employment-assessment providers with the aims of providing an overview of the impact of socio-cultural factors on the functioning of self-report questionnaires, and the practical implications of these socio-cultural impacts for the international use of self-report questionnaires.

Press release
Using self-report questionnaires in employee selection is globally considered to be a best practice. However, the use of these assessments in the global context is not without its challenges. Self-report questionnaires can potentially function differently due to socio-cultural factors such as language ability, cultural values, response styles, and the interpretation of concepts. To shed light on these issues, this symposium explores the cross-cultural implications of using personality, integrity, and emotional agility assessments.

  • Personality Assessment and Language Proficiency: Hennie Kriek, cut-e South Africa
  • Intercultural Differences in Integrity Scores – Implications for Practice: Katharina Lochner, cut-e Group
  • Hierarchical Assessment of Emotional Agility across Cultures: Rainer Kurz
  • Trends Around the Globe: An Investigation of Culture and Personality: Bharati Belwalkar, Eleni Lobene, Meng Li, Anthony Boyce

SIOP 2016 - Poster: Detecting who is going to innovate

This poster was presented by Dr Achim Preuss and Dr Katharina Lochner at the 31st Annual Conference of the Society for Industrial and Organizational Psychology, April 2016 in Anaheim, California, US

The aim of the present study was to develop an online creativity test on the basis of the concept of Torrance Tests of Creative Thinking (TTCT; Torrance, 1974) that assesses creativity using a fully automated scoring algorithm. The instrument was developed and validated in three subsequent studies.

Press paragraph
The aim of the present study was to develop an online creativity test that assesses fluency, flexibility, originality, and an overall creativity score using a fully automated scoring algorithm. The task is to draw pictures on a scratch board using certain given shapes and to name them. In two subsequent studies the instrument and scoring was developed. In a third study with N = 470 participants it was validated. When scoring the pictures according to the Torrance Tests of Creative Thinking (TTCT; Torrance, 1974) manual, correlations between this and the automated scoring were around .9 for all scores.

SIOP 2016 - Poster: Detecting who is going to cause problems

This poster was presented by Dr Achim Preuss and Dr Katharina Lochner at the 31st Annual Conference of the Society for Industrial and Organizational Psychology, April 2016 in Anaheim, California, US

An online instrument predicting counterproductive work behaviors is introduced, based on the notion that behavior is best predicted by considering the person, situation, and interaction between both (Mischel & Shoda, 1995). It shows good psychometric properties (Cronbach’s alpha between .71 and .90, correlations with interview results between .39 and .77).

Press paragraph
When screening job applicants companies usually assess factors that predict job success. However, counterproductive work behaviors such as absence from work, fraud, or dangerous behavior cause a lot of damage and therefore, factors that predict such behaviors should be assessed in employment testing as well. A new online instrument based on the notion that behavior is not only based on a person, but also on the situation is introduced. The items describe behaviors rather than traits, which allows for predicting counterproductive behaviors without stigmatizing those scoring low on the questionnaire.

ATP 2016

ATP 2016: Panel discussion - The transition from computer to mobile and tablet based testing: Misinterpretations, implications and pitfalls 

The increasing use of smartphones and tablets seems to urge test providers to make their tests accessible via these devices. The question no longer seems to be whether we are going to use these mobile devices for testing. There is, however, some controversy about the purposes and conditions of meaningful use of these devices.

In this session, the panel will address some questions on testing via mobile devices.

  • Is there a difference in the quality of data gathered by computers and by mobile devices and, if so, what are the consequences? Can we base selection decisions on data gathered via mobile devices? Should we complement them? Should we not use them for selection but rather for other purposes such as realistic job previews?
  • Based on what criteria should we decide whether or not to include mobile devices in the selection process? A first criterion might be the constructs we want to measure. Computers might be more appropriate for some constructs, mobile devices for others. A second criterion might be practicability. When we are confronted with a significant number of candidates who have a tablet or a smartphone but no computer, it might be a good idea to provide a test on the mobile device. A third criterion might be the stage of the selection process. Maybe we should use mobile devices solely in the beginning of the process?
  • Mobile devices may enable us to measure psychological constructs and skills that regular computers cannot. Or do they? What is the difference between mobile devices and computers in relation to the constructs we want to measure? For instance, is the fact that it is easier to confront candidates with more text relevant for the measurement of constructs? Or is this only a minor difference?
  • Is the impact of the shift towards mobile devices as fundamental as the shift from paper and pencil testing to computer testing? The shift towards computers has confronted the testing industry with important challenges. The tests had to be adapted to be used on computers, for example. Similarly, we can ask ourselves how our tools have to be adapted to mobile devices. Test taking on the computer has also brought us new, interesting possibilities. Adaptive testing and the use of videos, for instance, were not possible when we were using paper and pencil tests. What opportunities does the shift towards mobile devices offer us?

Three session objectives

  • To raise awareness of a significant change that is currently happening in the testing industry.
  • To start a discussion on suitable and unsuitable application of mobile devices in testing.
  • To give some ideas as to how to deal with this change.

How does this session address innovations in testing or examine this topic in an innovative manner? It deals with a significant innovation that is currently happening in the testing industry that we need to address proactively.

On the panel: David Barrett, COO cut-e Group

EAWOP 2015

EAWOP 2015 - Poster: Concurrent construction designs may not overestimate the predictive validity of situational judgment tests (SJTs)

This poster was presented by Mats Englund from cut-e Nordic at the 17th Congress of the European Association of Work and Organizational Psychology (EAWOP), May 2015 in Oslo, Norway

Screening candidates with situational judgment tests (SJTs) is becoming increasingly popular among companies worldwide. Attractive aspects of using SJTs include effective corporate branding and realistic job previews for candidates while maintaining high predictive validity. A common method to establish which items to use involves a validation study and picking the most predictive items from an item pool. Using this method, capitalization on chance may lead to overestimation of the predictive validity, potentially making concurrent designs inaccurate for estimating true predictive validity.

However, in practical cases, data is often collected using classic concurrent validation study designs, where participants already may have learned on the job what behaviors are desirable. This leads to the risk of restriction of range, attributable to inside knowledge rather than true predictive ability/potential, which leads to underestimation of the predictive validity, opposing the overestimation mentioned above. Here, results are presented from the development and validation of an SJT for service positions in a large Swedish organization. Comparisons of results from the construction of the SJT (n=800, concurrent design) with results from high-stakes testing (n=550, predictive design), and a supplementary student-sample study (n=128), suggest that both capitalization-on-chance overestimation and incumbent-knowledge-related restriction-of-range underestimation of the predictive validity were present.

Mental ability scores predicted SJT scores better for students, suggesting that the SJT restriction of range for employees was attributable to learned desired behaviour, and the estimated predictive validity proved close to the true predictive validity. These results suggest one reason why using a concurrent design may not always overestimate SJT predictive validity.

IPPA 2015

IPPA 2015 - Presentation: Impact of emotions on test performance

We were delighted to be part of the 4th World Congress of the International Positive Psychology Association (IPPA) in Florida, US, June 2015. Dr Katharina Lochner, Research Director for the cut-e Group, shared her research of the impact of emotions on test performance.

The Congress as a whole sought to explore not just how individuals are able to flourish and reach their potential through positive psychology but also how this can, ultimately, lead to a better society – and this area of psychology seems to be going from strength to strength.

Dr Lochner’s research has looked at ability testing, which has for many years been a staple of recruitment and development assessment programs. Performance on such tests is important when selection or promotion is impacted by the score achieved – and yet, despite their popularity and use in high-stakes situations, there has been little research into the impact that emotions have on test performance. Whilst there have been studies showing the positive impact that a positive mood has on test performance, Dr Lochner researched the impact that more specific emotions, such as joy, contentment, anger and sadness, have on test scores.

2014 - cut-e presents its latest research findings at conferences across Europe

July 2014. cut-e, the leader in online testing and assessment, has been sharing its latest research and thinking at applied psychology and testing conferences throughout July. cut-e welcomed the opportunity to share its work with colleagues gathered recently at the International Congress of Applied Psychology (ICAP), the Conference of the International Test Commission (ITC) and the European Association of Test Publishers (E-ATP) Conference.

One of the key areas of recent research includes the design and development of a new instrument which predicts the likelihood of counterproductive work behaviour. Katharina Lochner, Research Coordinator at cut-e, comments: "Just as organisations look for the behaviours they want in their employees early on in the recruitment process, it's just as important to identify and recognise the more undesirable behaviours. After years of looking into this area, our assessment tool takes into account that behaviours displayed by a person at work are due to the situation encountered as much as the person's personality."

With the growth of online testing, the cut-e team also looks at the impact of technology on the science of testing. It presented its research into the effect of cheating on online assessment scores - and its research into the validity of the shorter tests made possible through adaptive testing. Other studies presented recently include looking at the impact that emotional state has on test performance and, in separate papers, the implications of improving test performance through cognitive training, and how rater selection and innovative scoring models can improve the value of 360 degree feedback.
