Better sort by rating with Spree and spree_reviews

The blog post "How Not To Sort By Average Rating" keeps popping up, and it got me thinking about how we implement sort by rating. We use spree_reviews to capture ratings, and it takes a very simplistic approach: it stores the plain average rating for each product.

This exact scenario is mentioned in the blog post above:

Why it is wrong: Average rating works fine if you always have a ton of ratings, but suppose item 1 has 2 positive ratings and 0 negative ratings. Suppose item 2 has 100 positive ratings and 1 negative rating. This algorithm puts item two (tons of positive ratings) below item one (very few positive ratings). WRONG.
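To make the quoted scenario concrete, here is a minimal Ruby sketch of the naive score. The method name `positive_fraction` is made up for illustration and is not part of spree_reviews:

```ruby
# Naive score: the share of positive ratings, ignoring how many ratings exist.
# `positive_fraction` is a hypothetical helper, not a spree_reviews method.
def positive_fraction(positive, negative)
  total = positive + negative
  return 0.0 if total.zero?

  positive.to_f / total
end

item_one = positive_fraction(2, 0)    # 2 positive, 0 negative
item_two = positive_fraction(100, 1)  # 100 positive, 1 negative

# Item one scores higher even though it has far less evidence behind it.
puts item_one > item_two
```

Two ratings are enough to beat a product with over a hundred, which is exactly the ordering the post complains about.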

A better solution is a Bayesian estimate, which takes the number of reviews into account. This is how IMDb currently creates its Top 250 movie list:

weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C

where:

* R = average for the movie (mean) = (Rating)
* v = number of votes for the movie = (votes)
* m = minimum votes required to be listed in the Top 250 (currently 1300)
* C = the mean vote across the whole report (currently 6.8)

For the Top 250, only votes from regular voters are considered.
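The formula above translates almost directly into Ruby. This is only a sketch: the method and parameter names are my own, and the sample values for m (5) and C (3.5) are assumptions picked for a five-star scale, not numbers from IMDb or spree_reviews.

```ruby
# Weighted rating: WR = (v / (v + m)) * R + (m / (v + m)) * C
#   average     -> R, the item's own mean rating
#   votes       -> v, how many ratings the item has
#   min_votes   -> m, how many ratings it takes before R dominates
#   global_mean -> C, the mean rating across all items
def weighted_rating(average, votes, min_votes, global_mean)
  total = votes + min_votes
  return global_mean.to_f if total.zero?

  (votes.to_f / total) * average + (min_votes.to_f / total) * global_mean
end

# Assumed sample values: m = 5, C = 3.5.
puts weighted_rating(5.0, 2, 5, 3.5).round(2)    # prints 3.93
puts weighted_rating(4.8, 100, 5, 3.5).round(2)  # prints 4.74
```

With only two ratings, the perfect 5.0 gets pulled most of the way back towards the global mean, while the 4.8 backed by 100 ratings barely moves, so the well-reviewed product now sorts first.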

With that, it’s fairly simple to approximate with spree_reviews. Just be sure to recalculate all your product ratings.
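As a rough illustration of the recalculation step, here is a self-contained sketch in plain Ruby. The `Product` struct is a hypothetical stand-in for a product record, not the actual spree_reviews model, and `MIN_REVIEWS` is an assumed value for m. In a real app this logic would presumably live wherever the product's stored rating gets updated.

```ruby
# Hypothetical stand-in for a product record; NOT the spree_reviews model.
Product = Struct.new(:name, :ratings, :weighted_rating)

MIN_REVIEWS = 5 # assumed value for m (should be > 0); tune it to your catalogue

# Recompute the weighted rating of every product in one pass.
def recalculate_all(products, min_reviews: MIN_REVIEWS)
  all_ratings = products.flat_map(&:ratings)
  return products if all_ratings.empty?

  global_mean = all_ratings.sum.to_f / all_ratings.size # C

  products.each do |product|
    v = product.ratings.size
    r = v.zero? ? 0.0 : product.ratings.sum.to_f / v
    # Same formula as above, with both terms over the shared denominator.
    product.weighted_rating = (v * r + min_reviews * global_mean) / (v + min_reviews)
  end
end

products = [
  Product.new("few reviews", [5, 5]),
  Product.new("many reviews", [5] * 45 + [4] * 5),
  Product.new("middling", [3] * 20 + [2] * 10)
]

recalculate_all(products)
ranked = products.sort_by { |prod| -prod.weighted_rating }
puts ranked.map(&:name).inspect # prints ["many reviews", "few reviews", "middling"]
```

One thing the sketch makes visible: C depends on every rating in the catalogue, so a single product's score cannot be brought fully up to date in isolation. That is one reason to recalculate everything rather than only the product that was just reviewed.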