On A2K's Ranking Algorithms and how they will improve

When this version of A2K was launched we intended the voting system to be a stopgap before implementing a better Bayesian rating system. We ran out of time. Over the years I've kept abreast of the different approaches communities and software have taken to sorting content, and I have evolved my algorithms hoping to one day put them to use here, but we have just not had the time to implement any of them on A2K and it has been a simple majority vote ever since.

While serviceable, this is far from ideal in many ways. In this post I will talk about some of the problems with the way A2K's voting system currently works and some algorithms that will improve things. Of course with voting systems there are as many opinions as members times opinions times time of day, but this post is specific to how the results of the votes are calculated into a rank and how that rank is used, not the validity of using member votes in the first place. After all, the algorithms are only as good as the votes they are fed; the argument about voting in general is a larger discussion, and this thread will focus on the best interpretations of the voting data we can come up with. I will be providing data examples and even sample algorithms (though the exact algorithms we use will be living and breathing and will evolve from the simple versions I am including here).

Please also try to refrain from jumping to conclusions. One thing to keep in mind is that the new A2K will provide most of these different sorts as options for users and communities, so you don't need to form any knee-jerk reactions on what is "best" and start arguing. There are advantages and disadvantages to each, we will be providing more options, and this does not have to be an argument about what is the single best way to sort topics.

Current A2K

Code: upvotes - downvotes

This table is sorted by vote sum (the bold column), the way the current site topics are sorted in the rather useless "Popular" view.

I've made some color coded topics to show some scenarios where different algorithms perform better than the current site's simple sum. Granted, no vote system is able to establish "good" and "bad", merely the particular community's view of "good" and "bad", but beyond this limitation some algorithms do better than others at sussing out the nuances that come up.

For example, compare the two topics that the current sort puts at the bottom. One of them, with just one upvote (from the author) and 99 downvotes, is probably showing a strong signal that this is a "bad" topic to this community (something like awful spam, or something that pretty much everyone in the community agrees is bad). Compare this post to the "unpopular post", which has an even worse score in this rating system. This "unpopular post" has a large negative sum but the votes are much more evenly split, with 302 votes up and 401 votes down. It shows up as the "worst" topic in the current simple sum sort, but it's really just a controversial topic where the balance of the community leans a particular way (for example, it could be a perfectly reasonable political position in a community that leans a bit toward one side of the political spectrum).

This simple sort places topics that are very dissimilar by nature side by side, and it does a disservice by not distinguishing merely unpopular topics from ones where there is truly a supermajority consensus in the community that they are "bad".

The flip scenario is at the top of the rankings. A topic that is merely "popular" sits at the top even though it is controversial, just because the majority in the community leans one way, and it ranks above a topic that the community overwhelmingly agrees is "good" even though there is no such consensus on the popular topic.

You can also see lack of nuance in the middle, where a truly controversial subject, the most controversial in this list, has a simple sum of 1, the same as a brand new post.
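To make the scenarios above concrete, here is a minimal Python sketch of the simple sum sort. The 1 up/99 down and 302 up/401 down counts come from the post; the 100 up/99 down "controversial" topic and the single-upvote new post are assumed counts chosen so that both sum to 1, and the topic names are labels for illustration only.

```python
# Vote counts as (upvotes, downvotes). Only the spam and "unpopular post"
# counts come from the post; the other two are assumed for illustration.
topics = {
    "controversial": (100, 99),
    "brand new post": (1, 0),
    "consensus bad (spam)": (1, 99),
    "unpopular post": (302, 401),
}


def simple_sum(up, down):
    """Current A2K score: upvotes minus downvotes."""
    return up - down


# Highest sum first, as in the current "Popular" view.
ranked = sorted(topics, key=lambda name: simple_sum(*topics[name]), reverse=True)

for name in ranked:
    up, down = topics[name]
    print(f"{name:22} {up:4} up {down:4} down  sum {simple_sum(up, down):+d}")
```

With these counts the evenly split "unpopular post" (-99) lands below the near-unanimous spam topic (-98), and the heavily contested topic ties with the brand new post at +1.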

Simply put, this sort does a bad job at the top, bottom and middle of the spectrum at capturing a lot of nuance that matters to community culture. You can see examples on A2K when you see one of the most "popular" topics of all time

Percentages

Code: upvotes / total votes

OK, so the obvious fix: instead of using a simple sum as the metric of popularity, use a percentage. After all, all of these posts have very different percentages, right? So that should differentiate these cases well. Well, yes, but simple percentage sorting has its own problems, as we'll see in this table of the same data sorted by percentage.

Here, when sorting the topics by percentage, we see a clear problem: a brand new topic with one vote, or a topic with a handful of votes one way, goes to the extreme of this ranking. When posts have a lot of votes the percentage is useful, but when there is not a quorum of voters the percentage is less reliable. It's a dilemma between the representative percentage and the need for the statistically significant data that volume provides.
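The small-sample problem is easy to reproduce. This is a rough sketch, not the site's code: the 1 up/99 down count comes from the post, the 100/10 and 1100/1000 counts are borrowed from a reply later in this thread, and the single-vote topics are assumed. Treating a topic with no votes as 0.0 is also an assumption.

```python
def percentage(up, down):
    """Share of votes that are upvotes (0.0 for no votes, an assumption)."""
    total = up + down
    return up / total if total else 0.0


# (upvotes, downvotes); names are illustrative labels only.
topics = {
    "consensus good": (100, 10),
    "popular": (1100, 1000),
    "brand new post": (1, 0),
    "single downvote": (0, 1),
    "consensus bad (spam)": (1, 99),
}

# Highest percentage first.
ranked = sorted(topics, key=lambda name: percentage(*topics[name]), reverse=True)

for name in ranked:
    print(f"{name:22} {percentage(*topics[name]):6.1%}")
```

A single upvote puts the brand new post above a topic with 100 upvotes, and a single downvote puts a topic below the spam with 99 downvotes.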

A2K's New Rankings

Code: ((average votes * average rating) + (total votes * (positive votes / total votes))) / (average votes + total votes) * 10

So one solution is to use a Bayesian-based rating to try to capture the reliability of a score. Instead of calculating whether something is good or bad, we are calculating the probability that something would reach a consensus of good or bad (as always, as defined by the community; the algorithms are limited to the data that is input into them).

In this kind of rating system items with few votes are weighted toward the average, so that a handful of votes doesn't push the item to either extreme (e.g. under a simple percentage a post with 1 upvote is 100% and a post with just 1 downvote is 0%, but these posts are unlikely to be the very best and very worst posts of all time).

In this weighted system simple popularity does not rule as in the simple sum, and it does not suffer the same problems as a simple percentage rating when you have very little data to use (it is basically a bit more agnostic when it doesn't have much data). Here is how these topics would be ordered using that sort.

Note that the topics that are popular merely by simple sum do not dominate. Topics that are new and have too little data to calculate a ranking with confidence are weighted toward the average instead of the extremes, and the algorithm does better at differentiating what the community overwhelmingly agrees is "good" or "bad". Most importantly, topics that are merely "popular" or "unpopular", which would have been at the extremes in the simple sum sort, are now treated with much more nuance.
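Here is a hedged sketch of the weighted formula in Python. The post does not say what values "average votes" and "average rating" take, so the defaults below (50 votes, and an average rating of 0.5 expressed as a fraction so the result lands on a 0-10 scale) are assumptions picked for illustration, as are the topic labels. The vote counts reuse figures that appear in this thread.

```python
def weighted_rating(up, down, avg_votes=50, avg_rating=0.5):
    """Bayesian-style rating on a 0-10 scale.

    avg_votes and avg_rating are assumed site-wide averages (hypothetical
    values); avg_rating is a fraction between 0 and 1.
    """
    total = up + down
    # "total votes * (positive votes / total votes)" simplifies to `up`,
    # which also avoids dividing by zero for a topic with no votes.
    return (avg_votes * avg_rating + up) / (avg_votes + total) * 10


# (upvotes, downvotes); names are illustrative labels only.
topics = {
    "brand new post": (1, 0),
    "consensus good": (100, 10),
    "popular": (1100, 1000),
    "unpopular post": (302, 401),
    "consensus bad (spam)": (1, 99),
}

# Highest rating first.
ranked = sorted(topics, key=lambda name: weighted_rating(*topics[name]), reverse=True)

for name in ranked:
    print(f"{name:22} {weighted_rating(*topics[name]):4.1f}")
```

Under these assumed averages the 100 up/10 down topic rates about 7.8 and outranks the 1100 up/1000 down topic at about 5.2, the brand new post sits near the 5.0 midpoint, and the spam topic drops to the bottom while the evenly split "unpopular post" stays near the middle.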

For the purpose of sorting good to bad as determined by community consensus (not mere popularity in the community) this is a pretty good way to go. But for other purposes it is not good. If your goal is to see what the consensus good and bad is, this works, but there is value in surfacing not just that which is the consensus but that which is furthest from the consensus, or rather: controversy.

Controversial Sort

Code: total votes / max(abs(upvotes - downvotes), 5)

Now many users will often don the mantle of "controversial" when they really are in fact merely broadly unpopular, or the community may even have an overwhelming consensus on the person or position.

Basking in internet notoriety for one's kicks gives many forum warriors the sense that they are edgy and controversial when in reality we are dealing with pedestrian unpopularity.

Real controversy in a community is reflected by the amount of diametrically opposed opinions, not by how many agree that someone or something is stupid. Nobody here likes spam; that is not controversial, it is just both deeply unpopular and a consensus "bad" thing here. So a topic that everyone rates down is not "controversial", it's just "bad" or "unpopular".

True controversy would mean finding the community not just close to equally divided but also deeply divided. A topic that has one vote up and one vote down is perfectly divided in opinion but not as controversial as a topic with 100 up and 99 down (even though the latter is less perfectly balanced).

So to calculate true controversy you want to surface these deep divides. Things like counting the most pairs of up and down votes, and other such algorithms, work, and here is an example of an algorithm that surfaces such topics.

Note how it treats both "good" and "bad" topics equally: it is not about popularity but about controversy vs consensus, and there is strong consensus on those topics. Similarly, the "popular" and "unpopular" opinions are treated equally in terms of their controversy score. This helps undo a lot of downsides of popularity-based sorting, and we will use this algorithm (not exclusively, we will use several algorithms to achieve different goals) to help highlight and surface controversial posts on the new site.
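The controversy formula translates directly into a few lines of Python. This is a sketch using vote counts mentioned in this thread; the topic labels are illustrative. My reading of the `max(..., 5)` term, which the post does not spell out, is that it avoids dividing by zero on a tie and keeps topics with only a handful of votes from looking deeply divided.

```python
def controversy(up, down, floor=5):
    """Total votes divided by the vote margin, with the margin floored at 5."""
    return (up + down) / max(abs(up - down), floor)


# (upvotes, downvotes); names are illustrative labels only.
topics = {
    "one up, one down": (1, 1),
    "deeply divided": (100, 99),
    "popular": (1100, 1000),
    "unpopular post": (302, 401),
    "consensus good": (100, 10),
    "consensus bad (spam)": (1, 99),
}

# Most controversial first.
ranked = sorted(topics, key=lambda name: controversy(*topics[name]), reverse=True)

for name in ranked:
    print(f"{name:22} {controversy(*topics[name]):5.1f}")
```

The 100 up/99 down topic scores 39.8 while the 1 up/1 down topic scores only 0.4, and the score depends only on the size of the split, so swapping the up and down counts of any topic leaves its controversy score unchanged.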

Topic Aging

One algorithm that can really help content consumption deals with the intersection of these rankings and time. There will still be sorts like by new post or reply in the new system, and those do a good job of mixing all good and bad topics or posts and sorting by age, but one thing we will add that the current site lacks is a useful view of good AND recent topics.

The current offering of the "popular recent topics" page is nearly useless. It is basically just topics from the last week sorted by simple sum popularity. So a couple of topics get to the top and stay there all week. The view is not very useful on a day-to-day basis; it's really just useful to see what was the most popular in the last x (time frame).

A better and more useful view would surface the good and the new content more dynamically. For that we will be using an algorithm that sorts posts that are both good and new: the ones that are very good will stay in the view longer, and their rating will decay over time to cede space to new topics. It's a much more elegant solution, and here is some example math (but not graphs this time, as the effect of this algorithm is difficult to demonstrate without real topics to show that it is better at a time-sensitive interpretation of relevance):

Code: rating / (age + recencyBiasCounterweight)^gravity

Use recencyBiasCounterweight = 1 and gravity = 0.1 for results similar to what we will shoot for.
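As a rough illustration of the decay, here is the aging formula in Python with the suggested parameters. The post does not say what unit age is measured in, so hours are an assumption here, and the ratings fed in are made-up values on the 0-10 scale.

```python
def aged_score(rating, age, recency_bias_counterweight=1.0, gravity=0.1):
    """Rating discounted by age; age is assumed to be measured in hours."""
    return rating / (age + recency_bias_counterweight) ** gravity


# Hypothetical topics as (rating on a 0-10 scale, age in hours).
topics = {
    "very good, two days old": (8.5, 48),
    "very good, one day old": (8.5, 24),
    "decent, brand new": (6.5, 0),
}

# Highest aged score first.
ranked = sorted(topics, key=lambda name: aged_score(*topics[name]), reverse=True)

for name in ranked:
    print(f"{name:26} {aged_score(*topics[name]):5.2f}")
```

With these settings a brand new topic keeps its full rating, and a fresh 6.5 edges out a day-old 8.5. A larger gravity would push older topics down faster.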

Culture Artifacts

There are some artifacts that pop up that are not inherent to the algorithms themselves but to the presentation thereof, and that too can influence culture. So let's talk about that. Many communities and sites out there use one score for sorting and another for display due to the psychological differences between them.

A rating system that shows a percentage or a Bayesian kind of score inspires a bit more focus on quality, and most modern communities sort with algorithms that are similarly focused. But many of them display the simple sum (even if for sorting they actually use a Bayesian or lower-bound-of-Wilson-score-confidence-interval-for-a-Bernoulli-parameter based rating system) because it's a bit more engaging: moving the vote total by one vote resonates more with users than incrementing a small fraction of a number in an algorithm.
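For readers who have not met the Wilson score lower bound mentioned above, here is a minimal sketch of the standard formula. It is included only to show what those other sites sort by, not as something A2K plans to use. The z value of 1.96 (roughly 95% confidence) and the 0.0 returned for a topic with no votes are assumptions.

```python
import math


def wilson_lower_bound(up, down, z=1.96):
    """Lower bound of the Wilson score interval for the upvote fraction."""
    n = up + down
    if n == 0:
        return 0.0  # assumed convention for a topic with no votes
    p = up / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


print(f"1 up, 0 down:    {wilson_lower_bound(1, 0):.2f}")
print(f"100 up, 10 down: {wilson_lower_bound(100, 10):.2f}")
```

A single upvote is 100% positive but only earns a lower bound of about 0.21, while 100 up and 10 down earns about 0.84, so the well-supported topic sorts first.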

I'd like to go with the perhaps less engaging but more "accurate" ways of displaying the rankings and have topics and posts show a score of 1-10, so an average or new post would be at around 5.0 and a good one might be 8.5 and a bad one might be 2.8 etc. This is versus the simple sum where you get scores like -5 or +5 etc.

This kind of UI difference affects community culture and is still being obsessed over too, so things might change in our plans here, but one option I am strongly considering is to allow communities and/or individuals to select their own sorts and displays and therefore influence their own communities.

Conclusion

We plan to employ some of these kinds of algorithms (they are a work in progress and will evolve) on the new community platform we are developing. I welcome any feedback but much prefer the thoughtful kind. This work is the product of not only thousands of my own hours of study and tinkering but also a significant amount of standing on the shoulders of industry giants. I would like to thank, in particular, Evan Miller for this article on how not to sort, which got me thinking about it back in 2009, and for his follow-up on Bayesian Average Ratings. While the algorithms we will use are not taken directly from either of those articles, they inspired the thought and the research that led to the solutions we have so far.

In studying the major communities, community software and ranking systems that I could, I became ever more certain that there is more for me to learn and more gains to be extracted from such algorithms. I welcome feedback but would like to ask that you keep an open mind and keep the feedback thoughtful. It's a waste of everyone's time to argue about flippant and shallow opinions on what is a deep and complicated subject. If you feel an inordinate strength of conviction about a particular opinion on algorithms, please temper it with the realization that this is just not simple stuff: there are no silver bullets and there are competing goals that need to be taken into account. These are merely tools, not magic, and they aren't going to magically fix or destroy any community. I feel compelled to add this disclaimer due to the overreaction that proposed changes to a community bring, and when combined with the strong feelings about anything related to votes this can easily devolve into an argument that edifies absolutely no one.

Type: Discussion • Score: 13 • Views: 4,127 • Replies: 18

[+3] - Robert Gentel - 12/10/2015[quote="Ionus"]How about reflecting pure hostility towards posters? If you always vote a person down, regardless of topic, then the problem is you not the other person. [/quote] This...

@Robert Gentel,

One area where I've seen weighted rankings is for writing, whether it's rankings on Wattpad or author or sales ranks on Amazon. The complexity certainly goes beyond simple sorts.

At Wattpad (NB for those who don't know, Wattpad is a Canadian writing site where you can publish original or fanfiction work for free. There are over 10 million users), part of how the rankings are figured is via a kind of bucket sorting. You have to decide on one genre for your book. Hence if you write space westerns, you're going to have to choose between science fiction and adventure, and maybe also teen fiction and/or fanfiction if they apply. Subtlety comes from tagging by the writer. No one but the writer can tag. Additional popularity and ranking measures come from most recent update date, frequency of updating, number of votes, and number of comments (the site does not seem to distinguish between positive and negative commenting, so a book with 10,000 variations on 'this sucks!' in the comments could conceivably rank pretty highly). Their ranking system is proprietary; most of these insights come from observations made by me and other users.

In Amazon's system, some rankings come from sales, but they also come from how a book is categorized (by the publisher or uploader of the ISBN). Amazon sometimes seems to lean in favor of diverse storytelling and outlier genres, so a space western could conceivably have an advantage. E.g. ranking as 200 out of only 200 space westerns adds one degree of oomph to an overall ranking, although a 200/40,000 ranking for the more popular vampires category will add a different amount of oomph, as a book in that category has more competition. Other positive factors come from a writer being followed by a lot of people or having older bestsellers even if current sales aren't so hot. Number of reviews helps, as does review positivity, but you can still get some ranking love if you have a lot of reviews but they're less than stellar. 5000 one-star reviews probably won't help you as much as 100 three-star reviews, but at least they don't all have to be perfect five-stars in order to help you.

It's a fascinating area to study.

0 Replies

@Robert Gentel,

I haven't been great at math since I got an early start on it as a kid, so if any of the math buffs would like to help I'd appreciate the input. I can also share a spreadsheet if any others are interested in playing around with some numbers.

0 Replies

@Robert Gentel,

How about reflecting pure hostility towards posters? If you always vote a person down, regardless of topic, then the problem is you not the other person. You could show a hate index, as in one person who continually votes someone down is clearly an emotional reject striking out at difference rather than any factual intellectual disagreement.

There are posters who contribute to make people think yet they can guarantee being voted down on anything they say. Of course if you only want your opinion published then ignore the maths and stay up late deleting unwanted opinions, don't let others do your dirty work.

2 Replies

@Ionus,

Ionus wrote:

You could show a hate index, as in one person who continually votes someone down is clearly an emotional reject striking out at difference rather than any factual intellectual disagreement.

Now, that sounds like an excellent idea.

1 Reply

@roger,

What exactly is the idea? Can someone put together a simple algorithm to illustrate it?

0 Replies

@Ionus,

Ionus wrote:

How about reflecting pure hostility towards posters? If you always vote a person down, regardless of topic, then the problem is you not the other person.

This is a scenario for discarding/not counting votes after a certain point, in my opinion. A sorting algorithm for this would not seem to provide any utility, nor would it reduce the kind of behavior you are talking about.

1 Reply

@Robert Gentel,

Out of curiosity: How do you intend to test your ranking algorithm? I'm sure you'll come up with an arithmetically-coherent formula, and testing for arithmetic coherence is straightforward. But that's one thing. It's quite another thing to test if the formula actually promotes the kinds of post you want to see and demotes the kinds of post that you don't. (After all, this is your reason for devising the algorithm in the first place.) So how are you going to test that it fits your bill?

1 Reply

@Thomas,

Ultimately it won't live or die based on whether it promotes the kind of posts I want to see or demotes the ones I don't (I don't need an algorithm to do that if it's the goal). It will live or die based on how well it does that for the community (and I know your question is probably what my interpretation of this part is). Given that the community is providing the data, I think the weighted ranking algorithm does a good job at finding the community consensus there. The goal of the ranking algorithm could be expressed by saying I want it to express the likelihood of a positive or negative vote if all the users (or some quorum of them) voted.

Now that doesn't answer an important question though: whether showing what the community says they want to see is what they really want to see (i.e. maybe they vote views they disagree with down, but WANT to keep seeing and doing that).

So the axiomatic data I look at for most of those kinds of things is engagement (even though I don't see it as the be-all and end-all of quality, it is a critical metric for a community's survival). When comparing two algorithms, or two versions of one, I would want to A/B or multivariate test them to see how user behavior reacts to them and whether it does so in a positive or negative way.

That won't be the best approach for all communities (maybe you prefer a certain characteristic above survival of your community etc) and I'm hoping that we'll be able to build in the ability to set different algorithmic tones to one's created community as well.

1 Reply

@Robert Gentel,

But a better answer to your question might be that I was hoping to do well at being able to surface the highlighted types of posts (at least those types of voting patterns, what type of topics those patterns might represent in practice I can only guess at).

For example I think it should do better at not letting mere majority dominate rankings or obscure vast differences in degrees of consensus, like the current scores do.

I think that a post that is 1100 up and 1000 down (+100) being on top of a 100 up 10 down (+90) post is likely a mistake most of the time. But there are times that it is not, so the controversial sort would help bring those to the light.

I also think that the scores being displayed on a 10 scale vs mere totals lends itself to more emphasis on quality than quantity (though I also wonder whether this might be a mistake and whether the interface should emphasize engagement there).

0 Replies

Weighted scores seem somewhat esoteric.
Would it make sense to record the vote totals # of ups and # of downs?
Instead of the differential? Or, along with the differential? Would that not allow viewers to see the whole picture?

1 Reply

@neologist,

Showing those totals is one thing but sorting is another. You still have to pick one to sort by even if you provide more of the data to show its deficiencies.

1 Reply

@Robert Gentel,

Well, that sorting variable sorta answers my question.
Thanks

0 Replies

@Robert Gentel,

No, it is simply a list. If you vote someone down, there is a visible record of it. Some here are emotionally distraught and vote based on hatred not the quality of a post. If we could access a list of voters, it would help cure the hate motive. Unless of course the management reserves the right to interfere with the voting to reflect more correct thinking.

I have often suspected some people here were preloaded with thumbs down to reflect a correct opinion.

2 Replies

@Ionus,

As an aside, I have three posts that were marked down (perhaps even more?) but which were reposted by others without change and were marked up.

If the managers want to run a forum to counter some of the hopelessly biased ones out there, then say so but stop the pretense at being fair and open. This is why I refer to a hate factor.

0 Replies

@Ionus,

I've only voted you down once after all these years and that was just because you were dead wrong.
Maybe you should accept the fact that people want some degree of accuracy when a thread involves debate. It seems that you are the one who, from the get-go, calls everyone "**** for Brains" or "Goober" or "dickhead".

Not that I'm criticizing, I'm merely trying to maintain a "truth in posting" standard.

1 Reply

@Robert Gentel,

Need clarification. Are we talking about votes on Threads, on individual Posts, a combination of the two or what?

Or am I missing an A2k section that only I can't see?

Being a habitual pattern seeker, I have been mystified by the column labeled "Featured Topics". Only conclusion I could reach about the selections was that there was either a human with a lot of biases involved or a 'highly weighted' algorithm with the weighting system being a mystery. I quit paying any attention to it when the 'logic' escaped me.

0 Replies

@farmerman,

To the general population: Perhaps you haven't noticed some people are constantly marked down? You don't respect any alternate opinion to your own. Yet I have seen you whinge about the odd occasion when members of the in-crowd get thumbed down. My advice: wear it as a badge of honour.
That has nothing to do with accuracy, it is hate. That's right, all you lefties who love everything from homosexuals to whales to Palestinian terrorists need something to hate.

To farmerman: For starters, I call you Gomer not goober and I only do it in retaliation for rudeness. As for accuracy, I have seen you deny what was plainly obvious and try to bluster your way out. The person I call **** for brains is the most obnoxious person here. He seeks out newcomers, tells them they are wrong and if they do not agree with him, he abuses the hell out of them. On at least one occasion he denied saying something so I produced the quote.

To the managers: There is no doubt in my mind that everything from the number of views to the thumbs is rigged. Who is getting paid to come up with all those "Dear Dorothy" questions, e.g. "My boyfriend has a bigger penis than me...should I stay with him?" The general tone of this place collapsed when swearing was allowed.

0 Replies

I have to agree that any ranking on the 'thumbs' count is meaningless. Some stats on activity level alone might be useful but I suspect that that would point to conclusions 'management' would not like.

The sys op of the forum I used to hang out at closed up shop and went home just because he didn't like where all the action was. Language, courtesy and bullying were not the cause or a problem.

Having said that, it looks like we do lose a lot of newcomers due to bullying. I don't know what the solution is to that.

0 Replies
