Our agile project is to create a site that lets you rate articles. Among our corporate goals, we defined the goal of making it easier for people to find and read great content. Last night I was doing some research on social networks and thinking about the nature of our ratings approach. In this article I share some of those thoughts, and the reasons for changing the ratings approach from previous designs.
An interesting article on the history of social software (going back to the 1940s) got me thinking about the timelessness of content, and the fact that there are different approaches to scoring or rating things.
Our Content Is Almost Timeless
Generally speaking, the topics we need to read about are “timeless” – at least their relevance can be measured in multiple years, not multiple minutes. An article can become obsolete, someone can build a “better” mousetrap, etc. But short of that, great articles will be great for a long time. The Mythical Man-Month is decades old, and still relevant. We need to make sure that our rating approach will keep “great” articles at the top of the heap, rather than bury them under the stack.
I was also looking at one of the real-time pages at digg. What an extremely cool thing to watch. One thing jumped out at me, however – the articles with the most immediate content were getting all the attention. A fifteen-minutes-of-fame thing. If you look at the “most digged [Kevin and Alex say digged on their podcast, not dugg] of all time” list, none of those articles were getting noticeable attention. They had become old news.
When there is a great resource, like any of Scott Ambler’s UML 2.0 pages, it needs to be easy for people using the site to find. Perhaps a “rate it 1 to 5” approach would be more effective.
Comparing Approaches
I created a couple of bulleted lists of the pros and cons of two general approaches – aggregating ratings and averaging ratings. Here’s what I came up with – feel free to augment or dispute it in the discussion on this article.
Sum Of All Scores
An approach that sums all of the scores, so that the number of “votes” affects the score. For example, if 10 people voted “yes” for an article, it would have a score of 10. If 20 people voted, the score would rise to 20. This is the general approach used at digg.com.
- The score reflects the number of people who liked the article enough to vote on it.
- As the number of people using the site grows, newer articles will tend to get more votes and get more of the attention.
- The “runaway” article dynamic will happen – the crowd will pile ever more attention onto whichever articles are already getting attention.
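A minimal model of this approach in code – my sketch, not digg’s actual implementation:

```typescript
// Minimal model of the sum-of-scores approach (a sketch, not digg's code):
// every "yes" vote adds a point, so the score and the vote count are the
// same number, and popular articles keep pulling ahead.
interface SumScoredArticle {
  title: string;
  votes: number;
}

function voteFor(article: SumScoredArticle): void {
  article.votes += 1;
}

function sumScore(article: SumScoredArticle): number {
  return article.votes; // score == number of votes
}
```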
Average Of All Scores
An approach that averages all of the scores for an article. For example, if 10 people rated an article an average of “3”, the article’s score would be 3. If 20 people rated it a 3, the score would still be 3. This is the general approach used at Amazon (and Tyner Blain) for rating books (and articles).
- The score obscures the number of people who liked the article enough to vote on it.
- Older “best” articles will stay at the top, and all articles will have to play “king of the hill” to see which become the top-scoring articles. Tie-breakers can be based on the number of ratings for an article.
- Would need some minimum number of ratings for the score to have credibility. The community would probably self-police this by quickly voting on articles that have “biased” initial votes.
- As the community grows, obsolete articles will gain new (lower) ratings, pushing them off the top of the heap naturally.
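Here is a sketch of the averaging approach, including the minimum-ratings threshold and the vote-count tie-breaker from the list above. The threshold of 5 is an arbitrary placeholder, not a decided value:

```typescript
// Minimal model of the average-of-scores approach.
interface RatedArticle {
  title: string;
  ratings: number[]; // each rating is 1-5
}

const MIN_RATINGS = 5; // arbitrary placeholder; needs community calibration

// Returns null until an article has enough ratings to be credible.
function averageScore(article: RatedArticle): number | null {
  if (article.ratings.length < MIN_RATINGS) return null;
  const sum = article.ratings.reduce((a, b) => a + b, 0);
  return sum / article.ratings.length;
}

// Rank by average score, with the number of ratings as the tie-breaker.
function rank(articles: RatedArticle[]): RatedArticle[] {
  return [...articles]
    .filter((a) => averageScore(a) !== null)
    .sort(
      (a, b) =>
        averageScore(b)! - averageScore(a)! ||
        b.ratings.length - a.ratings.length
    );
}
```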
Conclusion
Based on this analysis, the “scoring” approach should be the same general approach used to rate articles at Tyner Blain and books at Amazon. I’ll update the domain model to reflect this change. I think a simple 1-5 scale (where 1 is bad and 5 is good) would work effectively.
Chloe had some great comments on the Use Case Briefs article discussion about people being able to learn from articles with negative reviews or low scores – as ideas to stay away from. That “unintended” use would also be better served by an averaging approach, if it were easy to display “low score, high number of ratings” articles.
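That view would be a simple filter over the same data as the averaging sketch above. Both cutoffs here are arbitrary placeholders, not decided values:

```typescript
// Hypothetical "ideas to stay away from" listing: low average, many ratings.
// Reuses RatedArticle and averageScore from the averaging sketch above.
function lowScoreHighCount(articles: RatedArticle[]): RatedArticle[] {
  return articles.filter((a) => {
    const score = averageScore(a);
    return score !== null && score <= 2 && a.ratings.length >= 20;
  });
}
```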
I like the presentation and function of the Tyner Blain ratings: quick and easy.
On negative reviews: I believe there will be only a small number of them, because if I read a bad article, the probability is high that I won’t read all the way through, and that I won’t want to spend more time with it. Bad articles will simply not be rated highly.
Cool, thanks Rolf for the feedback.
I think IMDB also weights/normalises people’s votes. For example, if all users give an average vote of 6 and my average vote is 8, my vote will show up as an 8 when you look at how I vote, but the movie’s overall rating is adjusted for my aberrant voting behaviour.
I was also wondering whether you would weight authors’ votes. For example, if my articles were rated highly, would my ratings of others’ articles be weighted based upon my expert view?
Cheers
Craig
Good inputs Craig – thanks!
Really interesting idea – have some notion of authorship incorporated into the site.
I was initially thinking that if people rate the reviews that others give, then the aggregate review scores could be incorporated into the weighting of the reviewer’s scoring of other articles. This might serve as an incentive for people to give quality reviews of articles (so that their vote “matters more” when they score articles).
Basically, by writing good reviews (useful, accurate), you get rewarded by other users of the site, and your opinion becomes quantitatively more important. There may even be a way to promote your skill level, along the lines of forums that “reclassify” you based on the number of posts you make – but based on quality more than quantity.
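As a rough sketch of that weighting – the mapping from review ratings to vote weight here is invented for illustration, not a decided design:

```typescript
// Hypothetical reviewer weighting: a reviewer whose own reviews are rated
// well gets a heavier vote.
interface Reviewer {
  name: string;
  reviewRatings: number[]; // 1-5 ratings others gave this reviewer's reviews
}

function reviewerWeight(reviewer: Reviewer): number {
  if (reviewer.reviewRatings.length === 0) return 1; // everyone starts equal
  const avg =
    reviewer.reviewRatings.reduce((a, b) => a + b, 0) /
    reviewer.reviewRatings.length;
  return 0.5 + (avg - 1) / 4; // map a 1-5 average onto a 0.5-1.5 multiplier
}

function weightedAverage(
  votes: { reviewer: Reviewer; score: number }[]
): number {
  const totalWeight = votes.reduce((s, v) => s + reviewerWeight(v.reviewer), 0);
  const weightedSum = votes.reduce(
    (s, v) => s + reviewerWeight(v.reviewer) * v.score,
    0
  );
  return weightedSum / totalWeight;
}
```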
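And on the IMDB-style normalization Craig describes – IMDB’s actual formula isn’t public, but one plausible reading of his example sketches out like this:

```typescript
// One plausible reading of vote normalization (not IMDB's actual formula):
// shift each vote by the gap between the voter's personal average and the
// site-wide average before folding it into the aggregate score.
interface Voter {
  name: string;
  personalAverage: number; // mean of all votes this voter has cast
}

const SITE_AVERAGE = 6; // the "all users average a 6" from Craig's example

function normalizedVote(voter: Voter, rawVote: number): number {
  const bias = voter.personalAverage - SITE_AVERAGE; // Craig's bias is +2
  return rawVote - bias; // so his 8 counts like a typical user's 6
}
```

The raw 8 would still show up on the voter’s own history; only the aggregate would use the adjusted value.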
On rereading Rolf’s comment – two articles with a score of “4” may be very different.
Article A has “4 as an average of 100 votes”
Article B has “4 as an average of 10 votes”
A is likely “better than” B.
But what about article C with “4.5 as an average of 10 votes”?
C is better than B. Is C better than A?
Was it Mao who said “Quantity has a quality of its own”? Until we come up with a better solution, quantity of votes will be a tie-breaker for equivalently scored articles.
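To make that concrete, here is the A/B/C example run through the score-then-count ordering – the rule as stated so far, not a final design:

```typescript
// The A/B/C example under "average first, vote count as tie-breaker".
const comparison = [
  { title: "A", average: 4.0, count: 100 },
  { title: "B", average: 4.0, count: 10 },
  { title: "C", average: 4.5, count: 10 },
].sort((x, y) => y.average - x.average || y.count - x.count);

console.log(comparison.map((a) => a.title)); // ["C", "A", "B"]
```

Note that the rule as stated always ranks C above A, which goes to the open question above.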
On A, B, and C – why don’t we just show the number of votes? Maybe the reader herself has the best algorithm to decide what she thinks is good. This might help overcome the problem of A having been available for voting for the longest period of time.
We will definitely show the level of attention that the articles are getting. Imagine that A, B, and C each represent 100 articles. The challenge is that the C articles would push the A & B articles off the top of the list.