Choosing the right performance review rating scales for your organisation
In this article we unpack the different rating scales for performance reviews, share example rating scales, and look at the pros and cons of each type of scale.
There’s been a lot of contention around performance rating scales. We’ve seen a trend of throwing them out altogether and then bringing them back in different forms.
Rating Scales vs No Rating Scales
Around 2015 we saw Deloitte, Adobe and a number of others ditch their annual review processes. This made for good headlines, but it created a narrative that the entire performance process was being killed off, which led many organisations to get rid of their annual reviews.
But here’s the thing: these companies didn’t just throw away their performance processes, they reinvented them. Adobe introduced frequent 1-on-1 meetings through its Check-ins process, and Deloitte introduced 4 questions that managers would answer at least quarterly about each team member.
We reached out to Adobe 3 years after their big change and they were still ratings free (the only major organisation we are aware of that has maintained a policy of no rating scales).
What we’ve learnt is that performance rating scales themselves are not bad; the problem is the way we have traditionally implemented them. When we are asked to rate qualities such as ‘communication’ or ‘professionalism’, we end up with bad data because our human bias creeps into these assessments. We go deeper into the 15 biases of performance ratings here.
I don’t think we’ll be solving the human bias problem anytime soon, but we can minimise it by making careful choices about exactly what we rate through these rating scales and how we guide the rater. In traditional performance reviews, managers would look at the question and ‘make a call’ on the rating.
With Crewmojo, we capture data across many metrics throughout the year and present it to managers at performance review time. When it comes to the accuracy and fairness of rating scales, better supporting data = better decisions.
For those companies electing to go without rating scales, Gartner research found that having no performance process actually reduced company performance by between 4% and 10%.
Size of Rating Scales
We often get asked which size of rating scale is best for performance reviews. The truth is that it depends on what you are looking to achieve, so let’s look at the options, from a 2 point rating scale through to a 10 point rating scale.
2 Point Rating Scale
As a binary choice, this is technically not a rating scale, but it is a good option when you want to capture a definitive yes or no answer that may trigger a specific course of action, e.g.:
Does this person need broader opportunities to stay engaged? Yes / No
3 Point Rating Scale
This simple scale captures 3 levels: not meeting expectations, meeting expectations and exceeding expectations. Simple is not necessarily a bad thing, as it tends to produce highly consistent performance rating data.
With a 3 point rating scale for performance, each level is very clearly defined and offers the least amount of ambiguity for managers categorizing an employee’s performance. The drawback is a reduced ability to identify the extremes of high or low performers.
In 2019 we saw one of the Big 4 consulting firms update its performance rating scales to the simplicity of a 3 point scale. Employees are measured across two primary axes of performance: ‘what was achieved’ and ‘how it was achieved’.
Each axis uses the rating scale:
Behind Track –> On Track –> Ahead of Track
I also came across this 3 level rating scale for performance in a newsletter from James Clear, author of Atomic Habits (he has a great newsletter, BTW):
The 3 Levels of Employees:
- Level 1 — You do what you are asked to do.
- Level 2 — Level 1 + You think ahead and solve problems before they happen.
- Level 3 — Level 2 + You proactively look for areas of opportunity and growth in the business, and figure out how to tap into them.
4 Point Rating Scale
The key benefit of a 4 point performance rating scale is that, with no middle option, raters are forced to classify each employee as either above or below average. This avoids the issue of a lack of variance in performance review data.
5 Point Rating Scale
A 5 point performance rating scale has historically been the most common. As you’ve probably worked out, having more levels allows a more granular classification of employee performance, making it easier to identify the outliers for either an improvement plan or a bigger bonus. However, a 5 point scale also makes it harder to achieve consistent, accurate and fair classifications from raters.
10 Point Rating Scale
I know what you’re thinking: if accurate assessments are already getting harder on a 5 point rating scale, what chance do we have with a 10 point scale? And you’d be right. But here’s the thing: when using a 10 point rating scale, you may wish to group the results into 3 categories, just like the NPS (Net Promoter Score) calculation.
A rating of 1 to 6 equates to below expectations, a 7 or 8 equates to meeting expectations, and a 9 or 10 equates to exceeding expectations. Advocates of this scale say a key benefit is capturing a more accurate picture of both poor performance (where managers may have otherwise been uncomfortable awarding a rating less than 50% of the scale) and standout performance (a 10 out of 10 is always going to be an exceptional score).
This type of rating scale for performance is particularly suited to situations where raters have not had much context or education about how to classify levels of performance.
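The NPS-style grouping described above is easy to apply mechanically. Here is a minimal Python sketch that assumes ratings are whole numbers from 1 to 10 and uses the cut-offs from this section. The function names `bucket_ten_point` and `band_summary` are hypothetical and are not part of any particular tool.

```python
def bucket_ten_point(rating: int) -> str:
    """Group a 1-10 rating into three bands using the cut-offs above (1-6, 7-8, 9-10)."""
    if not 1 <= rating <= 10:
        raise ValueError(f"rating must be between 1 and 10, got {rating}")
    if rating <= 6:
        return "below expectations"
    if rating <= 8:
        return "meeting expectations"
    return "exceeding expectations"


def band_summary(ratings: list[int]) -> dict[str, int]:
    """Count how many ratings fall into each band."""
    counts = {
        "below expectations": 0,
        "meeting expectations": 0,
        "exceeding expectations": 0,
    }
    for rating in ratings:
        counts[bucket_ten_point(rating)] += 1
    return counts


# Hypothetical team ratings on the 10 point scale
team_ratings = [4, 7, 8, 9, 10, 6, 7]
print(band_summary(team_ratings))
# {'below expectations': 2, 'meeting expectations': 3, 'exceeding expectations': 2}
```

Raters still score on the full 10 point scale. Only the reporting collapses into three bands, which is where the consistency benefit comes from.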
Consistency in Rating Scales for Performance Reviews
A key goal for any performance rating system is to achieve consistent ratings from different managers. Consistency means an employee is placed at the same level on your performance rating scale regardless of which manager does the rating.
With consistency comes fairness, increased engagement and ultimately better performance. Inconsistent ratings can achieve the opposite: if participants see the system as unfair, we can ironically drive disengagement and lower performance through the very tool that is designed to lift it!
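No single standard metric exists for consistency, but you can run a quick check whenever several managers rate the same employee. The sketch below is illustrative only. It assumes numeric ratings on one shared scale and reports two measures of our own choosing rather than an established method: `agreement`, the share of raters who chose the most common level, and `spread`, the gap between the highest and lowest level awarded. The function name `rating_consistency` is hypothetical.

```python
from collections import Counter


def rating_consistency(ratings: list[int]) -> dict[str, float]:
    """Summarise how consistently several managers rated the same employee.

    agreement: share of raters who chose the most common level (1.0 = full agreement)
    spread: difference between the highest and lowest level awarded
    """
    if not ratings:
        raise ValueError("need at least one rating")
    # most_common(1) returns [(level, count)] for the most frequent level
    _, modal_count = Counter(ratings).most_common(1)[0]
    return {
        "agreement": modal_count / len(ratings),
        "spread": float(max(ratings) - min(ratings)),
    }


# Hypothetical example: four managers rate the same employee on a 5 point scale
print(rating_consistency([3, 3, 4, 3]))
# {'agreement': 0.75, 'spread': 1.0}
```

Low agreement or a wide spread for the same person suggests the level definitions need work.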
So how do we get this right?
A good starting point is to clearly and succinctly define each level of the rating scale. These guiding principles will help keep this process on track:
- Avoid overlapping descriptions where similar or the same attributes are described in multiple levels.
- Write descriptions of actual observable behaviours, facts, outcomes etc.
- Keep the descriptions simple and brief; avoid long, detailed descriptions, as they often won’t be read.
We recommend involving manager and employee representation from each of the key roles in your business to help define each level of the rating scale for their respective roles (and the labels you will use to describe those levels - see next section). Enlisting this help will:
- Improve buy-in and acceptance of performance rating scales
- Achieve greater accuracy of the definition for each level of the rating scales
- Empower your employees to drive the definition of what high performance looks like
- Ensure the naming language matches that of your culture
Use of Labels in Rating Scales
What do we mean by labels? Instead of naming each level of the scale with a number such as 1 to 5, or with traditional gradations like ‘below expectations’ and ‘meets expectations’, you can name each level with a descriptive label, e.g. ‘Working on It’ or ‘Rock Star’.
This approach suits questions where it’s ‘okay’ to be at level 1 or level 2 at a given point in the employee lifecycle. For example, an employee who is a couple of months into a new role wouldn’t be expected to be an expert in their responsibilities just yet. However, it can feel somewhat demotivating to be awarded a 2 out of 5. Descriptive labels can help communicate whether they are doing well relative to their individual journey.
Accuracy of Performance Rating Scales
Another key goal of performance rating systems is an accurate rating of performance by the rater. A significant contributor to inaccurate data is the idiosyncratic rater effect, which is triggered when questions ask the rater to make a call on personality traits, for example rating a person on their ‘communication ability’, ‘strategic thinking’, ‘professionalism’ or ‘accountability’.
Gallup research recommends countering this effect by establishing performance ratings across more measurable data types:
- Performance metrics that are within the employee’s control and reflect outcomes such as productivity, profitability, accuracy, safety or efficiency
- Individualized goals that take into account each team member’s expertise, experience and job responsibilities
- Subjective observations that allow a manager to qualitatively evaluate performance in the context of role expectations
Regarding the third point, we would add that these observations should be supported by data collected throughout the year from peer feedback, values badging and 1:1 coaching conversations.
Use of Emoji in Rating Scales for Performance Reviews
To use emojis 👍, or not use emojis 👎? This question drives surprisingly polarized views 🤔, so let’s look at the research 📚… And enough with emojis ☹️!
This 2016 paper by Mathew Stange, Amanda Barry, Jolene Smyth and Kristen Olson finds: Providing a smiley face scale alongside response options for satisfaction questions changed the amount of time that respondents spent processing the questions and response options. Respondents spent less time fixating on the question stem and on the text of the response options when the smiley faces appeared alongside the response options. The faces do not slow down—and may speed up—processing of questions, especially for low-literacy respondents, suggesting smiley face scales are one way to aid low-literacy respondents.
The paper goes on to say: With distinctly different age demographics now both in the workforce, questions have been raised about the implications this can have for communication preferences across generations. With 50% of workforce forecasted to be millennials in 2020, increasing to 75% in 2025, there is a clear need to adapting to a younger, more digital way of being. An increasing reliance on visual communication harks back to cave paintings and emotive expression that predates language, but emoji are often accused of being lowbrow and damaging to written language. On the contrary, Generation Z are showing how efficient and responsive emoji can be online. This new, digital generation are using short form communication more than most – Gen Z are significantly more likely than their older counterparts to use emojis (95%, compared with 79%).
So the answer? It depends on your workforce and your organization’s preference. 😁 Sorry.
Imbalanced Rating Scales
In 2019 Gallup released a book, It’s the Manager, by Jim Clifton and Jim Harter (it’s a great read).
It describes a study (p. 94) in which many variations of questions and scales were tested across 3,475 managers and 2,813 peers. The study found that the most reliable and valid indication of performance came from this question:
“Please rate this person’s performance in the past 6 months, based on the following key job responsibilities.”
The key job responsibilities being rated relate to ‘Individual Achievement’, ‘Team Collaboration’ and ‘Customer Value’.
Now here’s the interesting part, the rating scale is:
Below Average –> Average –> Above Average –> Outstanding –> Exceptional
You’ll notice that ‘Average’ sits at level 2 of 5. Gallup found that this imbalanced rating scale produced more variance and reduced halo and leniency bias. The main criticism of this approach is the difficulty of accurately defining each level (see the section above on consistency).
As a guide, Gallup suggests that 1 in 10 employees might be considered ‘Outstanding’ and 1 in 100 employees ‘Exceptional’.
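Gallup’s figures are a rough guide rather than a quota, so one practical use is to compare a team’s actual distribution against them. This minimal Python sketch assumes ratings are recorded using the five labels above. `SCALE`, `GUIDE_SHARE` and `compare_to_guide` are hypothetical names used only for illustration.

```python
# Imbalanced scale from the Gallup example: 'Average' sits at level 2 of 5
SCALE = ["Below Average", "Average", "Above Average", "Outstanding", "Exceptional"]

# Rough guide shares quoted above, expressed as fractions of the whole team
GUIDE_SHARE = {"Outstanding": 0.10, "Exceptional": 0.01}


def compare_to_guide(ratings: list[str]) -> dict[str, tuple[float, float]]:
    """Return (observed share, guide share) for each level that has a guide value."""
    if not ratings:
        raise ValueError("need at least one rating")
    unknown = set(ratings) - set(SCALE)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    total = len(ratings)
    return {
        level: (ratings.count(level) / total, guide)
        for level, guide in GUIDE_SHARE.items()
    }


# Hypothetical team of 200 people
team = (
    ["Below Average"] * 8
    + ["Average"] * 120
    + ["Above Average"] * 40
    + ["Outstanding"] * 30
    + ["Exceptional"] * 2
)
for level, (observed, guide) in compare_to_guide(team).items():
    print(f"{level}: {observed:.0%} observed vs about {guide:.0%} guide")
```

In this made-up team of 200, 15% are rated Outstanding against a guide of roughly 10%. That gap could prompt a conversation about leniency bias rather than an automatic correction.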
Weighting of Performance Rating Scales
Should questions be weighted to drive more focus on behaviours or impact?
Like before… it depends.
Some businesses may have a greater focus on goal achievement or outcomes, whereas others may be more focused on ‘how’ employees go about getting their work done. Depending on the behaviour the business is trying to drive, it may make sense to apply a greater value (weighting) to certain rating questions.
This is normal practice, but it’s important to be transparent with employees if these weightings exist: firstly because it will help drive the behaviour you want, and secondly because it creates a fairer process.
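Behind a weighted rating sits a weighted average. The sketch below assumes every question is rated on the same numeric scale and that the weights are the ones published to employees. `weighted_score` and the question names are hypothetical. Dividing by the total weight means the weights do not have to sum to exactly 1.

```python
def weighted_score(ratings: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-question ratings into one score using published weights.

    Weights are normalised by their total, so they need not sum to 1.
    """
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(ratings[q] * w for q, w in weights.items()) / total_weight


# Hypothetical example: a business that weights outcomes more heavily than behaviours
weights = {"goal achievement": 0.6, "behaviours": 0.4}
ratings = {"goal achievement": 4, "behaviours": 3}
print(round(weighted_score(ratings, weights), 2))  # 4 x 0.6 + 3 x 0.4 = 3.6
```

Keeping the weights in one explicit table also makes them easy to share with employees.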
Key Points When Designing Rating Scales for Performance Reviews
It might sound obvious, but measure what is important to the business and what you wish to drive a focus on.
It can be easy to get embroiled in the detail of performance review question design and lose sight of the bigger picture. Here are some overarching tips to stay on track:
- If you measure too many metrics, the process can quickly bloat out. 10 questions might not sound like much, but when a manager has 8 reports, that’s really 80 questions they need to consider thoughtfully. Remember the simplicity of Deloitte’s system, which is centred on four easy-to-answer questions.
- Identify the data that you actually need from the process in order to make performance-related decisions. Everything else is data for data’s sake.
- Ensure managers and employees are equipped with the information they need to make quality rating decisions (Crewmojo serves up on-demand qualitative and quantitative data gathered throughout the entire year 👍)
- Design questions and rating scales (with input from employees and managers) so it’s easy to rate accurately and consistently.
- Sprinkle in a few emojis to lighten the mood!