Both our qualitative and quantitative test findings show that users expect ‘Customer Ratings’ sorting to function differently from how it’s currently implemented at 86% of major e-commerce sites.
During our most recent study on e-commerce Product Lists & Filtering, the second most applied sort type across all test sessions was ‘Customer Ratings’ (the most utilized was ‘lowest price’). However, the test sessions identified a critical mismatch between how users expect sorting by ‘Customer Ratings’ to function and how 86% of e-commerce sites currently have it implemented. This mismatch was observed to cause great user frustration and curtailed the subjects’ ability to find what they considered “highly rated” products.
In this article we’ll outline why users expect ‘Customer Ratings’ sorting to function differently, show how you can align your sorting logic with user expectations, and provide examples from leading e-commerce sites that already have this sorting logic implemented.
Now the typical mismatch between how users expect customer ratings to function and how it’s implemented comes from the intent users have when applying the “Customer Rating” sort type. From the test sessions it’s clear that most users rely on customer ratings as a way to quickly tap into the “wisdom of the crowd” – the collective opinion and experiences of other shoppers.
During testing, the “Customer Rating” sort type was used most frequently when the subjects were browsing for products where they had little domain knowledge and therefore sought to rely on the insights and experiences of others to make an otherwise difficult decision and to reduce the risk of purchasing an “inadequate” product.
However, when benchmarking the product list experience of 50 major e-commerce sites, we found that on 86% of those sites, “Customer Ratings” sorting is implemented as a naive rating average sorted in descending order, where a 5-star-average-rated product will be placed before a 4.8-star-average product regardless of how many ratings those averages are based on.
When sorting by ‘Customer Ratings’, most sites will position a product with a single 5-star rating before a product with a 4.8-star average based on 18 votes. Technically this is correct, as the former product has a higher average. Yet it is a naive implementation that doesn’t take the “sample size” into account, and nearly all users will consider the latter a much better example of a product “recommended by the crowd” when looking to make a product selection.
So while it may be mathematically correct to place the 5-star average first, doing so fails to account for the reliability of the average. A sample size of 1 is obviously flawed, a fact that wasn’t lost on the test subjects. They often found products with only a handful of perfect ratings highly questionable, assuming that the ratings were either a coincidence (a couple of ‘fanboys’) or had been submitted by the manufacturer or site representatives.
Meanwhile, the reliability of customer-rating averages based on several votes was never called into question by the test subjects. In practice, skepticism began to drop when the average was based on 5+ votes. This high level of skepticism toward a low number of perfect ratings has been confirmed during our prior Checkout and Mobile E-commerce usability studies as well.
To get a more quantitative understanding of users’ tendency to distrust a 5-star rating average based on just a few ratings, we tested three different rating averages with 1,716 people.
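To make the problem concrete, here is a minimal Python sketch of the naive approach described above. The `Product` class, its field names, and the two sample products are illustrative assumptions, not code from any of the benchmarked sites.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical record: these names are assumptions for illustration only.
    name: str
    average: float  # mean star rating
    count: int      # number of ratings behind that mean


products = [
    Product("4.8 stars, 18 ratings", 4.8, 18),
    Product("5.0 stars, 1 rating", 5.0, 1),
]

# Naive logic: sort by the average alone, highest first.
# The number of ratings plays no role in the ordering.
naive_order = sorted(products, key=lambda p: p.average, reverse=True)

for p in naive_order:
    print(p.name)
```

Because the sort key ignores `count`, the product with a single perfect rating is listed ahead of the one with 18 ratings.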
For two otherwise identical products, where one product has a 5-star average based on 2 ratings, and the other has a 4.5-star average based on 12 ratings, 70% would pick the one with the higher number of ratings despite its lower average. This confirms the test observations that when a perfect average is based on only a few ratings users will often prefer other products with a slightly lower average but a higher number of ratings.
As noted in our earlier investigation of Users’ Perception of Product Ratings, product ratings essentially function as a type of social proof for users, letting them tap into the “wisdom of the crowd”, using good ratings as a proxy for “high quality” or “value for money”. The thinking goes that if a lot of other users are happy with a product it must be a bargain or of high quality, or both. (This is why users lacking domain knowledge or experience with the product find product ratings particularly useful: the ratings allow them to rely on the domain knowledge and product experience of other customers.) The article also outlines why the number of ratings should always be displayed in conjunction with the rating average.
To better match the user’s expectations and intent behind sorting by ‘Customer Ratings’, a site’s sorting logic has to take the number of ratings into account as well and not rely solely on the average score. In essence, when a user decides to sort by ‘customer ratings’, the products with a 5-star average based on just 1-4 ratings should not be placed before any products with a 4.5+ star average based on 50+ ratings.
The sorting logic should instead be weighted to account for the combination of rating average and the total number of ratings. This aligns much better with the intent the vast majority of users have when they sort by ‘Customer Ratings’ (i.e., “show me what other users think are the best products”). For instance, notice in the Home Depot example above how products with a 4.5-star average based on 50 and 36 ratings respectively are placed before the two products with a 5.0-star average based on only 6 ratings.
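One possible way to weight the average by the number of ratings is to pull each product’s average toward a neutral prior, so that the pull weakens as ratings accumulate. The sketch below takes that approach (often called a Bayesian or damped average). It is only one option, not the logic used by any of the sites mentioned here. The `PRIOR_MEAN` and `PRIOR_WEIGHT` values, the class, and the function names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical record: these names are assumptions for illustration only.
    name: str
    average: float  # mean star rating
    count: int      # number of ratings behind that mean


# Assumed tuning values, not taken from the research: behave as if every
# product starts out with PRIOR_WEIGHT ratings averaging PRIOR_MEAN stars.
PRIOR_MEAN = 3.5
PRIOR_WEIGHT = 10


def weighted_score(p, prior_mean=PRIOR_MEAN, prior_weight=PRIOR_WEIGHT):
    """Blend the product's own average with the prior, weighted by rating count."""
    return (prior_weight * prior_mean + p.count * p.average) / (prior_weight + p.count)


def sort_by_customer_rating(products):
    """Order products by the weighted score, highest first."""
    return sorted(products, key=weighted_score, reverse=True)


catalog = [
    Product("5.0 stars, 6 ratings", 5.0, 6),
    Product("4.5 stars, 50 ratings", 4.5, 50),
    Product("4.5 stars, 36 ratings", 4.5, 36),
    Product("5.0 stars, 1 rating", 5.0, 1),
]

for p in sort_by_customer_rating(catalog):
    print(f"{weighted_score(p):.2f}  {p.name}")
```

With these assumed values, the two 4.5-star products with 50 and 36 ratings rank ahead of the 5.0-star products with 6 ratings and 1 rating. Raising `PRIOR_WEIGHT` demands more ratings before a high average is trusted, which is the kind of parameter a site would need to tune and test.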
Now, a simpler 5-vote “cutoff”, which excludes (i.e., doesn’t calculate an average for) any product with fewer than 5 votes, could also be adopted. However, this is a much less sophisticated solution and won’t work well for smaller sites or in categories with few user ratings.
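A rough Python sketch of such a cutoff might look like the following. The names are hypothetical, and placing products below the cutoff at the end of the list is an assumption of this sketch; the cutoff idea itself only says that no average is calculated for them.

```python
from typing import NamedTuple, Optional


class Product(NamedTuple):
    # Hypothetical record: these names are assumptions for illustration only.
    name: str
    ratings: list  # individual star ratings, e.g. [5, 4, 5]


MIN_VOTES = 5  # the 5-vote cutoff


def displayed_average(ratings, min_votes=MIN_VOTES) -> Optional[float]:
    """Return the mean rating, or None when there are too few votes."""
    if len(ratings) < min_votes:
        return None
    return sum(ratings) / len(ratings)


def sort_with_cutoff(products):
    """Sort by average, highest first; products without an average go last.

    Putting unrated products last is an assumption of this sketch.
    """
    def key(p):
        avg = displayed_average(p.ratings)
        # (has_average, average): with reverse=True, rated products come first.
        return (avg is not None, avg if avg is not None else 0.0)

    return sorted(products, key=key, reverse=True)


catalog = [
    Product("two perfect ratings", [5, 5]),
    Product("five mixed ratings", [5, 4, 5, 5, 4]),
    Product("five 4-star ratings", [4, 4, 4, 4, 4]),
]

for p in sort_with_cutoff(catalog):
    print(p.name, displayed_average(p.ratings))
```

The sketch also shows the limitation noted above: in a category where most products have fewer than five ratings, nearly everything falls into the unrated group and the sort stops being useful.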
While it’s true that the weighted sorting method makes the actual sorting logic less transparent to the user (as it changes from a simple high-to-low logic to a more complex equation), during testing, this issue proved to be far less severe than the issues caused by listing products with 5-star averages first even if their average was only based on a handful of ratings. Without a weighted logic the most trusted products with 4.5+ averages based on dozens or hundreds of ratings will be scattered across several pages of results, making it very difficult for users to find the products which are “recommended by the crowd”.
The exact weighting between averages and number of ratings will likely vary based on site context and audience and may require ongoing tweaking and A/B split-testing. For inspiration, here are the few major e-commerce sites we’ve identified that currently have a weighted sorting logic for their customer ratings: Overstock, Amazon, Crutchfield, Best Buy, Home Depot, and Lowe’s.
This article presents the research findings from just 1 of the 850+ UX guidelines in Baymard Premium – get full access to learn how to create a “State of the Art” user experience for product lists, filtering and sorting.
Amazon just fell off your list for weighted averages. This week they changed to straight averages. What idiots!
If you are looking for a good algorithm to use for weighting the rankings, you should look no farther than the Bayesian Averaging system used by Board Game Geek (https://boardgamegeek.com/) where ranking from user reviews is crucial.
© 2021 Baymard Institute US: +1 (415) 315-9567 EU: +45 3696 9567 email@example.com