@Twirlip,
Twirlip wrote:Yes, I mentioned that I could see that as a potential problem, back in one of my earlier posts (at about 7:00 p.m. BST - I don't have a reference number for it).
What was confusing me was that people were saying that the problem was the creation of meaningless or offensive tags, and I couldn't see why this relatively minor problem needed to be handled in any other way than by making the least recently added tags invisible by default (but still operative, and still accessible on demand).
You are only considering the on-thread implications; you aren't considering the on-forum implications. On the thread you seek to resolve it by making the tag invisible (but still accessible on the forum), but there is potential for problems on the forum as well: a miscategorization could never be undone except by the miscategorizer, and that opens up the possibility of on-forum vandalism (not the on-thread vandalism you are focused on).
What you propose is how we launched, by the way (only the top 5 tags were displayed on the thread, but the thread remained in the forums of any tag it was given by anyone), and it was the first fire we had to put out. It blindsided us that we had left the door open for forum vandalism that wide, and we had to rush to close it.
So right now the way it works is that the top 5 tags (as determined by number of users) give the community a way to remove an irrelevant tag. But this is a crude algorithm and I have a number of improvements to make to it. First of all, we will implement a collaboratively-edited pages feature that is going to serve as curation for forums. I don't want to try to explain it now (I just can't do it justice simply, and there's no existing example of what I intend to do on the internet that I am aware of), but simply put, we are going to have wiki-style forum management that is human-driven but more structured. This will be used as a signal for the tagging in a way that I believe will eliminate all graffiti tagging except for...
Quote:It's not obvious what to do about the problem of people mischievously adding functional but misleading tags to threads. I'll leave that one to you for the moment!
There are 3 ways I am considering (individually or in combo) to do so:
1) Term extraction - we can run the post text through a term extraction algorithm that tries to do semantic analysis of the post and determine its own tags. Basically, a computer's attempt at tagging the topic. We can use this to weight tags that match a computer's attempt more highly.
2) Crude semantic analysis - on a very basic level, there's a huge signal we can use in the presence of the word itself in the post title and text. So if "philosophy" is in the text, a tag of philosophy can be weighted differently than a graffiti tag that is typically not in the topic text.
3) Wisdom of crowds - We already use the crowd as a vote, but we can do so more explicitly. For example, if two people have to blindly tag a topic, and only tags that are arrived at by at least two people independently are displayed, the displayed tags are likely to be much more relevant (most of the graffiti tags are not likely to be arrived at by even two people independently).
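To make the three ideas a bit more concrete, here is a rough Python sketch of each signal on its own. It is only an illustration, not how the site does or will do it: the function names, the stopword list, and the word-frequency stand-in for "term extraction" are all assumptions of mine, and a real term extractor would be far more sophisticated. Note that the literal-match check only handles single-word tags.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "that", "this", "for", "on", "with", "as", "are", "be"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_terms(text, top_n=5):
    """Approach 1 stand-in: the most frequent non-stopword words.

    Plain word counting substitutes for a real term extraction algorithm.
    """
    counts = Counter(
        w for w in tokenize(text) if w not in STOPWORDS and len(w) > 2
    )
    return [term for term, _ in counts.most_common(top_n)]


def appears_in_text(tag, title, body):
    """Approach 2: does the (single-word) tag literally occur in title or body?"""
    return tag.lower() in set(tokenize(title + " " + body))


def independently_agreed(tags_by_user, min_users=2):
    """Approach 3: keep only tags chosen by at least `min_users` distinct users."""
    counts = Counter()
    for tags in tags_by_user.values():
        # A set per user, so one user repeating a tag counts only once.
        counts.update({t.lower() for t in tags})
    return {tag for tag, n in counts.items() if n >= min_users}
```

For example, with a topic titled "Philosophy of mind", `appears_in_text("philosophy", ...)` is true while a graffiti tag like "lolcats" fails the check, and a tag used by only one person is dropped by `independently_agreed`.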
Anyway, none of those are perfect approaches, but a combination of some or all of them will be used to rework the current tag approach from a pure usage-popularity score to a more nuanced relevance score that takes other signals into account. We'll also start running aggregate data on the user level to detect users whose tags are rarely relevant and to lower the weight of their tags, and other such things.
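As a very rough idea of what a combined relevance score with per-user weighting could look like, here is a minimal Python sketch. Everything in it is hypothetical: the names, the additive bonuses, the 0.1 floor, and especially the assumption that we already have a per-user history of whether past tags were judged relevant. The actual signals and weights are not decided.

```python
from collections import defaultdict


def user_reliability(history, floor=0.1):
    """Map each user to a weight based on how often their past tags were relevant.

    `history` maps user -> list of booleans (an assumed record of past judgments).
    Users with no history get a neutral weight of 1.0.
    """
    weights = {}
    for user, outcomes in history.items():
        if not outcomes:
            weights[user] = 1.0
        else:
            weights[user] = max(floor, sum(outcomes) / len(outcomes))
    return weights


def relevance_scores(tags_by_user, weights, machine_terms=(), text_words=(),
                     machine_bonus=1.0, text_bonus=1.0):
    """Score each tag as weighted user votes plus bonuses for the other signals."""
    scores = defaultdict(float)
    for user, tags in tags_by_user.items():
        w = weights.get(user, 1.0)  # unknown users count as one full vote
        for tag in {t.lower() for t in tags}:
            scores[tag] += w
    machine = {t.lower() for t in machine_terms}
    words = {t.lower() for t in text_words}
    for tag in scores:
        if tag in machine:   # matches the computer's own attempt at tagging
            scores[tag] += machine_bonus
        if tag in words:     # the tag word itself appears in the topic text
            scores[tag] += text_bonus
    return dict(scores)


def top_tags(scores, n=5):
    """Return the n highest-scoring tags, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:n]
```

With this shape, a tag from a user whose tags are rarely relevant still counts, just for much less, and a tag backed by both the crowd and the text rises to the top.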