The blog post "How Not To Sort By Average Rating" keeps popping up, and it got me thinking about how we implement sort by rating. We currently use spree_reviews to capture ratings, and it takes a very simplistic approach to storing the average rating for a product:
self[:avg_rating] = reviews.approved.sum(:rating).to_f / reviews_count
This exact scenario is mentioned in the blog post above:
Why it is wrong: Average rating works fine if you always have a ton of ratings, but suppose item 1 has 2 positive ratings and 0 negative ratings. Suppose item 2 has 100 positive ratings and 1 negative rating. This algorithm puts item two (tons of positive ratings) below item one (very few positive ratings). WRONG.
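To make the quoted scenario concrete, here is a minimal sketch in plain Ruby. The `naive_score` helper is a hypothetical name of my own, not part of spree_reviews; it scores an item by its share of positive ratings, which is what a plain average does with positive/negative votes.

```ruby
# Naive score: share of positive ratings, ignoring how many ratings exist.
# (Hypothetical helper for illustration only.)
def naive_score(positive, negative)
  total = positive + negative
  return 0.0 if total.zero?
  positive.to_f / total
end

item_one = naive_score(2, 0)    # 2 positive, 0 negative
item_two = naive_score(100, 1)  # 100 positive, 1 negative

puts format("item one: %.3f", item_one)
puts format("item two: %.3f", item_two)
puts "item one ranks first" if item_one > item_two
```

Two ratings are enough for item one to come out ahead of an item with a hundred positive ratings, which is the ordering the post objects to.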
A better solution is a Bayesian estimate, which takes the number of reviews into account. This is how IMDB currently creates its Top 250 movie list:
weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C
* R = average for the movie (mean) = (Rating)
* v = number of votes for the movie = (votes)
* m = minimum votes required to be listed in the Top 250 (currently 1300)
* C = the mean vote across the whole report (currently 6.8)
For the Top 250, only votes from regular voters are considered.
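The formula above translates almost directly into Ruby. This is only a sketch: `weighted_rating` is a name I made up, the zero-vote guard is my own addition, and the 9.0 rating is an arbitrary example value. The values for m and C are the ones quoted in the list.

```ruby
# Weighted rating, following the formula above.
#   r: mean rating for the item      v: number of votes for the item
#   m: weight given to the prior     c: mean rating across all items
def weighted_rating(r, v, m, c)
  v = v.to_f
  m = m.to_f
  return c.to_f if (v + m).zero?  # assumption: fall back to the global mean
  (v / (v + m)) * r + (m / (v + m)) * c
end

m = 1300  # minimum votes quoted above
c = 6.8   # mean vote quoted above

# The same 9.0 average is pulled towards c when few votes back it up.
[10, 1_300, 100_000].each do |votes|
  puts format("%7d votes -> %.2f", votes, weighted_rating(9.0, votes, m, c))
end
```

An item with exactly m votes lands halfway between its own average and C. With far fewer votes it stays close to C, and with far more it approaches its own average.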
With that, it’s fairly simple to approximate with spree_reviews. Just be sure to recalculate all your product ratings.
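reviews_count = self.reviews.reload.approved.count
self.reviews_count = reviews_count
if reviews_count > 0
  r = reviews.approved.average(:rating).to_f        # this product's mean rating
  v = reviews.approved.count.to_f                   # this product's approved review count
  m = Spree::Review.approved.count.to_f             # all approved reviews, used as the prior weight
  c = Spree::Review.approved.average(:rating).to_f  # mean rating across all approved reviews
  self.avg_rating = (v / (v + m)) * r + (m / (v + m)) * c
else
  # No approved reviews yet: keep the rating at zero.
  self.avg_rating = 0
end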
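To see what this does to the ordering, here is a small in-memory sketch with no Rails involved. The catalogue and its ratings are invented for illustration. As in the code above, m is the total number of reviews and c is the mean across all of them.

```ruby
# Hypothetical catalogue: product name => approved star ratings.
catalogue = {
  "A" => [5, 5],             # two perfect reviews
  "B" => [5] * 99 + [4],     # a hundred reviews, almost all perfect
  "C" => [3] * 50            # fifty middling reviews
}

all_ratings = catalogue.values.flatten
m = all_ratings.size.to_f    # total number of reviews, used as the prior weight
c = all_ratings.sum / m      # mean rating across every review

scores = catalogue.map do |name, ratings|
  v = ratings.size.to_f
  r = ratings.sum / v
  [name, (v / (v + m)) * r + (m / (v + m)) * c]
end.to_h

# Highest weighted rating first.
ranking = scores.sort_by { |_, score| -score }.map(&:first)
puts ranking.inspect
```

With a plain average, "A" would sit on top with a perfect 5.0. Here "B" overtakes it, because two reviews barely move "A" away from the global mean. Since m and c are computed over every review, a new review on one product shifts the score of all the others, which is why all product ratings need recalculating rather than only the one that was reviewed.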