A/B Test: Changing rating icons

Background

During my time at Booking.com, I spent several years on the reviews team as the designer in charge of the reviews collection flow. After a user completed a trip, we asked them to review the hotels, cities, and points of interest they had visited. We then used this information to inform other travelers and help them have the best possible trip.

This experiment focused on the form we used to collect reviews of cities after a user’s trip.

Hypothesis

After a user stayed in a city, the city review form asked them a few questions. One of those questions was a simple rating of the city on a scale from 1 to 5 stars. We used a similar 1 to 5 scale when asking users to rate hotels, but with smiley faces instead of stars. Our primary goal in this experiment was to eliminate this inconsistency between the two scales. Smileys had already worked well for hotel ratings, so we used that insight to design this experiment, which changed the city rating from stars to smileys.

At minimum, we hoped for a neutral result on our “destination review submitted” metric, which would let us use the same smiley rating for both cities and hotels. Although a neutral result was our goal, there was a chance we could see an improvement in city ratings, because the smiley rating had already proven more effective on the hotel rating page.

Results

We ran the experiment for one week, with a total of 771k visitors across both variants. Our primary metric, “destination review submitted,” increased by 5.40% (±0.35%) in the “B” variant.

Learnings

Although we ran this experiment simply to make our rating systems consistent, and did not expect a significant change in our metrics, we got a surprising boost in city reviews. Experimentation will often tell you what happened but not why it happened, so we can only speculate on the “why” behind these results. A few possible explanations for the uptick in city reviews come to mind:

Smileys may be an inherently better rating system, because they map more directly to human emotion.
A 3-star rating for Amsterdam might mean different things to different people, and that ambiguity might be enough to make a user hesitate to answer the question. A 😐 smiley, on the other hand, clearly communicates how you felt about your trip to Amsterdam.
Consistency in rating systems creates a more fluid user experience.
Before rating the city, a user is asked to rate their hotel with smileys. Giving them the same rating scale on the city rating page means less mental effort as they move from one page to the next.
Smileys are more attention-getting.
It’s possible that smileys simply grab attention better than stars, perhaps because they represent human faces or because the icons contain more detail.

This experiment led to further successful experiments using smileys to rate hotels, cities, and landmarks in other parts of the reviews collection flow.