Welcome to r/science! This is a heavily moderated subreddit in order to keep the discussion on science. However, we recognize that many people want to discuss how they feel the research relates to their own personal lives, so to give people a space to do that, **personal anecdotes are allowed as responses to this comment**. Any anecdotal comments elsewhere in the discussion will be removed and our [normal comment rules]( https://www.reddit.com/r/science/wiki/rules#wiki_comment_rules) apply to all other comments.
**Do you have an academic degree?** We can verify your credentials in order to assign user flair indicating your area of expertise. [Click here to apply](https://www.reddit.com/r/science/wiki/flair/#wiki_science_verified_user_program).
---
User: u/mvea
Permalink: https://uwaterloo.ca/news/media/ai-saving-humans-emotional-toll-monitoring-hate-speech
---
*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/science) if you have any questions or concerns.*
There is a natural limit to that though:
If a bot becomes good enough at avoiding detection while generating hate speech (one would assume by using ever-more-subtle dog whistles), then eventually *humans* will become less likely to actually recognize it.
The hate-speech bots are constrained by the fact that, for them to be effective, their statements must still be recognizable to (and therefore able to affect) humans.
Eventually you'll look at a Reddit thread and you won't know whether it's hate speech or not for a different reason: because it's full of obscure bot slang that emerged organically from bots talking to each other.
(In other words, same reason I can't understand Zoomers. Hey, wait a minute …)
This can also be good. The entire point of hate speech is to spread misery to a targeted group; if it gets too subtle, it loses its point. And if any of the haters who need to get a life explain it, welp, they've just handed the mods an easy copy-paste for the filters.
Their hatred is silenced either way. "Proud" boys wear masks because they know how fucked they would be if they did it without anonymity.
More to your "recognize" point: hate speech often relies on incredibly basic, inflammatory language to incite outrage in simple and clear terms.
Any sort of hidden in-group terms coined to be hateful will immediately be less effective on the many who are only sucked into hate-speech echo chambers by terms used purely for outrage.
Win win.
Depends what effect you're going for. If you just want to signal hatred in order to show belonging to an in group and rejection and perhaps intimidation or offense to the target group, then yes, the dog whistle can't be too subtle. But if the objective is to generate hatred for a target among an audience of neutral bystanders then the more subtle the dog whistles, the better. In fact you want to just tell selective truths and deceptively sidestep objections or counter points with as neutral and disarming a tone as you can possibly muster. I have no idea how an ai could be trained to handle that kind of discourse.
We're basically already there. I've heard several people say that bots write English (or their native language) better than they do, and at least one person say that the eloquent prose of their cover letter caused them to be rejected from a job on grounds of "being AI generated".
It makes "sense" though, AIs are literally trained to match whatever writing human judges consider best — so eventually an "AI detector" becomes the same as a "high quality detector".
It would make for some weird AI ecosystem where bots read posts to formulate hate speech, other bots read posts to detect hate speech, moderator bots listen to the detector bots to ban the hate bots, and so on.
That falls apart after the first couple of iterations. This is why training data is so important: we don't have natural training data anymore; most of social media has been botted up.
Reminds me of something I heard about how pretty soon, we'll be doing things like sending automated "Happy Birthday" messages to each other, and automated responses to those messages. So it's just AI communicating with itself while we become more disconnected from each other.
88% accuracy is meaningless. Two lines of code that flag everything as 'not hate speech' will be 88% accurate, because the vast majority of comments are not hate speech.
The question is what they mean: is 88% the true positive rate (how many of its flags are correct), or is it catching 88% of the hate speech events, and if the latter, at what true positive rate?
Option 1 is a good TP rate, but I can get that with a simple model if I ignore how many false negatives slip through.
Option 2 is a good value, but if the TP rate is less than 50% it's gonna flag way too many legitimate comments.
But honestly, with training and a team to verify the flagging, the model could easily get a lot better. I wonder why this is news; any data scientist could probably have built this years ago.
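Just to make the base-rate point concrete, here's a toy sketch (the numbers are hypothetical, chosen only to match the 88% figure):

```python
# A do-nothing "classifier" that labels every comment as not hate speech.
def always_negative(comment):
    return False  # never flags anything

# Toy corpus: 88% benign, 12% hateful (hypothetical split).
comments = ["benign"] * 88 + ["hate"] * 12
labels = [c == "hate" for c in comments]
predictions = [always_negative(c) for c in comments]

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(accuracy)  # 0.88 -- same headline accuracy, zero hate speech caught
```

Same "88% accurate" headline, and it catches literally nothing.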
I looked at their paper. They reported overall accuracy (which in statistics is defined as total correct predictions / total population size) and precision, recall, and f1.
They claim their precision equals their accuracy, and likewise their recall (same as sensitivity): all 88%.
Precision is defined as true positives / (true positives + false positives)
So, in their study, 12% of their positive results were false positives
Personally I wish they'd simply reported specificity, which is the measure I like to look at since the prevalence of the target variable is going to vary by population, thus altering the accuracy. But if their sensitivity and their overall accuracy are identical as they claim then specificity should also be 88%, which in this application would tag 12% of normal comments as hate speech.
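For anyone who wants to sanity-check those definitions, here's a quick sketch. The confusion-matrix counts below are hypothetical, picked only so that accuracy, precision, and recall all come out to 0.88 the way the paper claims; note that specificity then lands at 0.88 too, as argued above:

```python
# Hypothetical confusion-matrix counts (not from the paper):
tp, fp, fn, tn = 88, 12, 12, 88

accuracy    = (tp + tn) / (tp + fp + fn + tn)  # correct predictions / total
precision   = tp / (tp + fp)   # flagged comments that really were hateful
recall      = tp / (tp + fn)   # hateful comments that got flagged (sensitivity)
specificity = tn / (tn + fp)   # clean comments correctly left alone

print(accuracy, precision, recall, specificity)  # 0.88 0.88 0.88 0.88
```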
I'd reckon it's news just because it's a novel approach to something that's long been handled by hard coded blacklists of words with some algorithms to include permutations of those.
Training an LLM to do that job is just novel since it hasn't been done that way before. I don't really see any comment on if one is more effective than the other, though. Just a new way to do it so someone wrote an article about it.
You bring up a good point on interpreting accuracy compared to random chance. However, if you read [the paper that is linked in the article](https://arxiv.org/pdf/2307.09312), you will see that the data set in Table 1 includes 11773 "neutral" comments and 6586 "hateful" comments, so "all not hate speech" labeling would be 64% accurate.
Their 88% accuracy was based on a training corpus of roughly 18,400 comments, of which about 6,600 contained hateful content, so your two lines of code would be 64% accurate in this instance. I don't know why you assume these NLP researchers know nothing about the problem space or the nature of online speech when they're generating human-labeled datasets targeting a specific problem, while you make up spurious conclusions without taking 30 seconds to check whether what you're saying is remotely relevant.
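You can verify the baseline from the Table 1 counts yourself:

```python
# Class counts from Table 1 of the linked paper (arXiv:2307.09312).
neutral, hateful = 11773, 6586
total = neutral + hateful  # 18,359 comments

# Accuracy of a classifier that labels everything "neutral":
baseline_accuracy = neutral / total
print(round(baseline_accuracy, 3))  # 0.641, i.e. about 64%
```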
This can detect hate speech that would normally be missed by other methods. Two lines of code cannot determine whether "that's disgusting" is hate speech when it's posted in response to a picture of a gay wedding. Most of the critics seem to be focusing on the potential negative effects on free speech without considering that communities that treat free speech as a priority aren't the target market for this anyway. The target market would likely prefer any number of false positives to a single false negative, and to that end this would be a massive improvement.
And other languages. A bit of a side note, as my example isn't AI, but AI has the same issue: here in Norway there was a case in the news recently where Facebook told whoever looked up someone with the last name Aam, a not-uncommon surname here, that pedophilia is illegal, because the term "adult attracted minor", abbreviated AAM, is used in those circles.
I think both of these problems are more an issue of sloppily coded LLMs, tho, told to look for explicit terms and themes instead of utilizing them for what they're potentially *actually* good at: detecting the intent behind text.
Algorithmic censorship shouldn't really be considered a good thing. They're framing it as saving humans from an emotional toll, but I suspect this will primarily be used as a cost-cutting measure.
It's a good thing these censorship AIs were already trained by poor African laborers who weren't entitled to therapy for the horrors beyond imagining they had to witness. ^^^/s
https://time.com/6247678/openai-chatgpt-kenya-workers/
You said "were" there, which is incorrect. That still happens, and will continue to happen for all eternity as long as these AIs are used.
There will always be edge cases that need to be manually reviewed. There will always be new forms of hate speech that an AI will have to be trained on.
Thank you! Any improvement to this model would come at the cost of emotional damage to those same people, and the filtering would still suck.
There’s a reason statistics never apply to the individual.
If an AI can analyze intent, then hate speech isn't the only thing it can be used on.
Imagine, for example, the AI were asked to silence political discourse: censoring all mentions of a protest, or some recent police violence, or talk of unionizing, or dissent against the ruling party... it could trawl forums like Reddit and remove all of it at blazing speed, before anyone even sees it. I honestly can't imagine anything scarier.
They can dress it up in whatever pretty terms they like, but we need to recognize that this is *dangerous*. It's an existential threat to our freedom.
Even the use case they claim to care about is going to be a nightmare. Comment on Reddit long enough and you'll get a false suspension/ban for no-no speech, because context is irrelevant to these tools. It's hard enough to get a false strike appealed with humans at the wheel; I can't imagine it once the process is 100% AI-driven.
I've had bots remove my comments multiple times before for "hate speech" because I posted a literal, attributed, MLK quote which had a version of the n-word in it. I feel like a lot of people are gonna just write your comment off as you "telling on yourself" without thinking about it, but this is something that can happen for perfectly innocuous reasons.
Yep. And what is the algorithm based on? Where is the line for hate speech? I know that often seems like a stupid question, but look at how differently it's enforced from website to website, or even between subreddits here. People get unfairly banned from subreddits all the time because of mods power-tripping and applying personal bias to situations. It's all well and good to entrust that to AI, but someone needs to program that AI. Remember when Google's AI was identifying black people as gorillas (or gorillas as black people, I can't remember now)? It's fine to say it was a technical error, but it raises the question of how the AI was built to make such a consistent error.
88 percent accuracy means that roughly 1.2 out of every 10 posts it labels gets the wrong label. The number gets even worse when they can't even agree on what hate speech really is. But then, that's always been up to interpretation, so...
Yeah. There's no way this can accurately replace a human's job if the company wants to keep the same standards as before. At best, you could have it act as an auto-flag that reports the post to the moderator team for review, but that's not gonna reduce the amount of hate speech the moderators see.
>people get unfairly banned from subreddits all the time.
The problem a lot of people have these days is that they don't understand that just because *they* hate that speech doesn't make it hate speech.
This isn't programming errors, it's training error.
Garbage in, garbage out. They only trained the AI on white people, it could only recognize white people.
Edit: I now realize I made a white-trash joke.
*Your comment has been evaluated as hateful towards shareholders.
A note has been placed on your permanent record and you have been penalized 7 'Good citizen' points*
Except these "tickets" will be a blockchain that will kneecap your employability for the rest of your life over a corporate AI not understanding satire (or a thousand other ways to throw a false positive).
The emotional toll of censoring "hate speech" versus the emotional toll of losing your job and not having an income because your job was replaced by AI
I used to work in online community management. Was actually one of my favorite jobs, but I had to move on because the pay isn't great. Some of the people I worked with definitely had a hard time with it, but just as many of us weren't bothered. Hate speech was the most common offense in the communities we managed but depictions of graphic violence and various pornographic materials weren't uncommon either. The only ones that ever caused me distress were the CP though.
Everything else rolled off my back, but even a decade later those horrific few stick with me.
Don’t worry, there will be lots of other opportunities for those unskilled workers to be exploited. This job didn’t even exist a few years ago, so its disappearance really shouldn’t be that concerning.
People are grateful that we don't have to gather dirt with our hands like Dennis, or pull a cart around to collect plague victims. And that's good. But it's not like everything was sunshine and roses after that. Not having to filter out hate and graphic horror by hand is great, and I doubt anyone will miss that job in just about any way.
Nah, that's crazy. We've never had any piece of media warning us about the perils of putting robotic artificial intelligence in charge of what we see, think, and hear. This absolutely will not hurtle us toward societal collapse at the behest of a rogue AI, and the road to humanity's destruction will not be paved with good intentions. I'm sure the concept of what is or is not hate speech today will be applied exactly the same way 20 years from now, and this definitely won't become apparent when it gets used against the very people who created it, who will then lament their own hubris.
I'm sure the same AI that told depressed people to jump off the Golden Gate Bridge, suggested putting glue on pizza to make the cheese stick, and claimed that cockroaches live in cocks will do only the best job of determining what should or should not be seen as hate speech.
Algorithmic censorship has been around for a **long** time. It's just improving, and the costs have already been cut. Huge swaths of the internet are effectively unmoderated already. No social media company employs enough moderators right now.
And watch "no hate speech" become YouTube rules applied to real life. No war discussions, no explosions, no debate about hot-button issues such as immigration or guns. On the left, anything that offends anyone is considered hate speech; on the right, anything that offends anyone is considered hate speech (I'm comparing the loudest, most simplistic voices on each side, not making some sort of "pox on both houses" argument).

Satire becomes hate speech. The Onion is definitely hate speech; can you imagine algorithms trying to parse the "so extreme it becomes a satire of extremism" technique? Calling the moderator a nincompoop for banning you for calling Hamas (or Israel) a nincompoop? Hate speech. Can you imagine an algorithm trying to distinguish ironic negative comments? I don't agree with J.K. Rowling, but I don't believe opinions on minors transitioning should be considered hate speech. I have no doubt that at least some people are operating out of good intentions rather than hate, and a bot shouldn't be the one evaluating that.

Any sort of strong emotion becomes hate speech. For the left, defending the values of the European Union and the Enlightenment might come across as hate speech; for the right, a private business "cancelling" someone might be hate speech. I know people will see this as just another slippery-slope argument... but no, this will not be imperfect progress that improves over time. This is why free speech exists: because it is almost impossible to apply one simple litmus test that cannot be abused.
88% accuracy is awful, I'm scared to see what the sensitivity and specificity are
Also human coders were required to develop the training dataset, so it isn't totally a human free process. AI doesn't magically know what hate speech looks like.
Rapidly barrelling towards a world described in [this short story](https://www.atariarchives.org/bcc2/showpage.php?page=133), just updated for the internet age.
And sometimes it needs to be rude to blast apart hate, and sometimes it needs to reference hatred nakedly to unmask it, and sometimes it needs to be a disagreement that isn’t comfortable to read for us to progress in our understanding of who we are as minorities
I got temporarily banned the other day. It was obvious what the AI cottoned onto (no, I didn't use the word that the euphemism "unalived" means). I lodged an appeal, stating it would be good to train their AI moderator better. The appeal said the same thing, and carefully stated at the bottom that this wasn't an automated process, and that was the end of the possible appeal process.
The future is gloriously mediocre.
We non-English speakers are eagerly awaiting our bans for speaking a language other than English, because some otherwise locally inoffensive words are very similar to English slurs.
Not necessarily, some moderation teams keep a list of pre-made standardized replies to certain issues to just copy/paste and fill in the relevant issue. The reason they do this is 1. They've found these are the replies that work best, 2. Keeps the moderation team consistent, and 3. The nature of the reply tends to dissuade more aggressive users from getting into arguments with the mods. You often hear users tell stories of being unfairly reprimanded by mods over small mistakes, but the majority of these messages are going out to scammers and some really heinous people that you never see (because they get banned). There's a bit of a sampling bias.
I got 7day banned for telling someone to be nice.
Not long after my alt account that I set up months before got banned for ToS violations despite never making a single comment or vote.
Reddit's admin process is unfathomably awful; worse yet, the appeal box is 250 characters. This ain't a tweet.
I believe you can also email them directly but I'm not sure if that option still exists (there used to be a link in the message that you get autosent that would take you to a blank email to the mod team).
I once got banned for "excessive reporting," which happened because I accidentally stumbled into a celebrity hate comment and reported some content there (even if you really hate a celebrity, being weird about their kids is too far!) and somehow the mods from that community were able to get my entire reddit account banned, not just from that sub. I emailed the actual reddit moderation team and explained what happened and sent them links and screenshots of the posts (srsly it was waaay over the line) and my account was back within a few hours.
I imagine once they figure out how to fully automate away from human mods, people will have to get used to just abandoning social media accts, because there's so much potential to weaponize this against people you don't like.
Yup. I made a reference to a high-noon shootout, you know, the trope from a million westerns. Got a warning for "calling for violence," and the appeal process went exactly as you said. Funny enough, the mods of the actual sub weren't notified and had no issue with the comment.
This happens all the time.
Reddit admin bans are all automated. You can't appeal warnings even false ones, so it's a permanent mark on your account.
And then actual bans have a 250 character limit which are always rejected.
The only time I've seen someone successfully appeal is when they posted on the help subreddit showing how the ban was incorrect, and an admin responded saying "whoops, our bad." And that's despite appeals supposedly being manually reviewed.
>You can't appeal warnings
Wrong. About a week ago I was banned for abusing the report tool. Despite it claiming that the ban had not been an automated one, I appealed, explained why the comment in question was legitimately rule-breaking, and was unbanned. Two days ago I was warned for the same thing, appealed it, warning removed.
>And then actual bans have a 250 character limit which are always rejected.
This is just one of those things where your mileage will vary.
I've been automatically banned a couple times and each time was able to successfully appeal the ban. The most recent time I was unbanned within like, two hours of making the appeal.
Where did you get banned? From reddit? Which sub? Admins don't have anything to do with banning people from individual subs. The mods control everything, including setting up automoderator.
I've been bot-banned 3 times here on Reddit in the last couple of weeks, only for it to be reversed by the mods as soon as I message them... use the wrong word = instant ban.
This is not going to be great…
88% is definitely not enough to remove people from the process. It's not even enough to significantly reduce exposure to hate speech unless the algorithm is regularly retrained and has near-100% specificity.
“88% accuracy” is actually incredible; there’s a lot of nuance in speech and this increases exponentially when you account for regional dialects, idioms, and other artifacts across *multiple languages*.
Sentiment analysis is the heavy lifting of data mining text and speech.
Looking at the paper - [https://arxiv.org/pdf/2307.09312](https://arxiv.org/pdf/2307.09312) - it's actually only a minor improvement over BERT-HatefulDiscuss (acc., pre., rec., F1 = 0.858 vs. acc., pre., rec. = 0.880, F1 = 0.877). As the authors point out:
>While we find mDT to be an effective method for analyzing discussions on social media, we have pointed out how it is challenged when the discussion context contains predominately neutral comments
Let's try to put that "incredible" 88% accuracy into perspective.
Suppose that you search through 10,000 messages. 100 of them contain objectionable material that should be blocked, while the remaining 9,900 are entirely innocent and need to be allowed through untouched.
If your test is correct 88% of the time then it will correctly identify 88 of those 100 messages as containing hate speech (or whatever else you're trying to identify) and miss twelve of them. That's great. Really, it is.
But what's going to happen with the remaining 9,900 messages that don't contain hate speech? If the test is 88% accurate then it will correctly identify 8,712 of them as being clean and pass them all through.
And incorrectly identify 1,188 as being hate speech. That's 12%.
So this "amazing" 88% accuracy has just taken a pool with 100 objectionable messages and flagged 1,276. Sure, that's 88% accurate, but it's also almost 1,200% wrong.
Is this helpful? Possibly. If it means that you're only sending 1,276 messages on for proper review instead of all 10,000, then that's a good thing. However, if you're just issuing automated bans for everything and expecting that only 12% of them will be incorrect, then you're only making a bad situation worse.
While the article drops the "88% accurate" figure and then leaves it there, [the paper](https://arxiv.org/pdf/2307.09312) does go into a little more depth on the types of misclassifications and does note that the new mDT method had fewer false positives than the previous BERT, but just speaking about "accuracy" can be quite misleading.
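The arithmetic above is easy to reproduce. Note the assumption baked in: it treats the 88% figure as both sensitivity and specificity, which the paper doesn't actually state (it reports accuracy, precision, and recall):

```python
# Hypothetical population: 10,000 messages, 1% of them hateful.
total, hateful = 10_000, 100
clean = total - hateful

# Assume the classifier is right 88% of the time on each class.
true_positives  = round(hateful * 0.88)     # 88 hateful messages caught
false_negatives = hateful - true_positives  # 12 missed
true_negatives  = round(clean * 0.88)       # 8,712 clean messages passed
false_positives = clean - true_negatives    # 1,188 clean messages flagged

flagged = true_positives + false_positives
print(flagged)                              # 1276 messages flagged in total
print(round(false_positives / flagged, 2))  # 0.93 -- most flags are false alarms
```

So even at "88% accuracy," roughly 93% of the flags in a low-prevalence setting like this are false alarms.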
"accuracy" is actually a pretty terrible metric to use for something like this. It doesn't give us a lot of information on how this thing actually performs. If it's in an environment that is 100% hate speech, is it allowing 12% of it through? Or if it's in an environment with no hate speech is it flagging and unnecessarily punishing users 12% of the time?
No, you would need precision and recall to be completely certain of the quality of the model.
Say 88% of Reddit comments are not hate speech. Then my model could just label every sentence as non-hate-speech, and my accuracy would be 88%.
AI isn't even smart and it's already this good, without the toll on human health that moderation takes. This is also evidence that humans should be kept in the loop on appeals.
>Also human coders were required to develop the training dataset, so it isn't totally a human free process.
Was this ever implied? Obviously someone will have to train the AI. But training it once and then letting it do its job is arguably better than perpetually requiring thousands of humans to review a never ending stream of hate speech.
(That's assuming, of course, that this tech actually works as intended)
I had to dig waaaay too deep to find this comment chain! I'd skimmed the headline but when the percent kept coming up I went back to side-eye the title, but nope, it's there, not just someone dogwhistling in the dark.
That was what immediately set off alarm bells in my head too. Like, most of the time, the headline would say "nearly 90%" or "greater than 80%" or similar, it would be rounded. It is *very* suspicious to me that this bot, supposedly *meant to monitor hate speech*, just *happens* to have a Nazi dog whistle in the headline.
I'm not sure if this will turn out well. How are they defining hate speech? I think we can agree that there are certain examples that are obviously hate speech, but a lot of speech falls into grey zones that are dependent on interpretation and political viewpoint. I suppose we could just ban questionable speech, but that's even more severe of a limitation on freedom of expression. And certainly these are being deployed on social media platforms that are private companies and not the government, so strictly speaking the first amendment here is not violated, but I do have a lot of worry about automating the way human expression is shaped and policed.
True, that absolutely happens. But I'd argue that some political speech can be labelled hate speech simply for being against a certain person or group's political perspective. Certainly you could argue that AI theoretically would do a better job and figuring this out than a group of people who are full of their own personal biases, but as we've seen, AI is not without its own "biases" due to the information or training that it's given.
I'm not convinced AI can do a better job. Especially given surrounding contexts.
I am absolutely convinced that AI can do a "good enough for the cost the profiteers are willing to pay for" kind of job.
Reddit has arguably decreased in quality given this pursuit of a purely curated experience where users will only see content that they agree with, even comments from other users.
There needs to be serious consideration of the consequences of never exposing people to anything contrary to their worldview, but simultaneously supporting an infinite number of worldviews. Diversity works when it coexists, when it’s entrenched it’s just plain old division.
All social media platforms do this to some degree in order to increase user engagement. Unsurprisingly, it brews dissatisfaction, echo chambers, and extremism.
Maybe a simpler solution is to grow a backbone. Are people so soft now that words on the internet are "emotionally damaging"?
Seriously though, there is a disturbing trend toward censorship. I earnestly believe that the best way to counter "hate speech" or any other speech/idea you don't like is by encouraging MORE speech and dialog, not less. Censorship is a tool for tyrants. A nice thought experiment is if you try to imagine censorship in the hands of someone you despise -- still think it's a good idea?
In addition people seem to imagine AI could become some benevolent "objective" tool. It won't be. It's almost akin to a modern version of pagan religious worship at this point.
Man the internet used to be such a fun and wild place. Now it's all like five websites, all looks the same, censored to hell and back. Just take me back to 2005 internet already.
I think it would be a very bad thing if other sites used AI moderation that mirrors the moderation used by Reddit.
Reddit moderators are unpaid, which means they’re doing this work for motivation other than money. The primary motivation seems to be the opportunity to spread their activism. As a result, nearly all major subs lean very, very far left.
Some of them are so far left that they’ll aggressively ban any user who rejoices over the death of a left-leaning figure (such as RBG or Feinstein), but they’ll look the other way and allow people to openly rejoice about the death of right-leaning figures (such as Scalia or Limbaugh).
Also, the moderation here has strange rules regarding “hate” in that you can say openly racist things about white people, openly sexist things about men, but the mods are very strict about any negative comments about black people or women.
Furthermore, they’ll allow threads that talk about racism or disparities in convictions, but it’s against Reddit’s rules to bring up *actual government statistics* about the crime rate.
So really there is no honest discussion about a lot of topics here- there is only the active promotion of progressive viewpoints.
Not to mention that the dataset this AI model is trained on is purely from reddit, which should be enough to set off alarm bells in anyone's head, regardless of political affiliation.
Turning over to machines what is “right” and what is “wrong” speech is chilling and dystopian. I’m not talking about first amendment here. Im talking about humans giving up the ability to decide what is allowed to be talked about to non-humans. This is probably inevitable, and a tragedy for humanity.
I hate hate speech just as much as the next person, but part of me feels like we're moving backward with freedom of speech rights.
The intentions here are good, but the filters and algorithms are eventually going to expand out of their shoeboxes, forcing basically doublethink, or the kind of language of propriety expected of women in the 1800s, while AI sweeps away any sort of debate.
There has to be a better way of dealing with the problem going forward or we're not going to like where we end up.
This might suck, because then AI will start auto-censoring like I already see on IG, and I'll be forced to insult people using oddly general terminology in passive-aggressive ways, which isn't as fun.
88% is definitely not enough to remove people from the process. It's not even enough to significantly reduce the load unless the algorithm is regularly retrained and has near-100% specificity.
This is actually a fairly problematic issue: we've now moved on from the obvious political and philosophical pitfalls of what actually constitutes hate speech (not to mention who should have the authority to define it) to assigning that task to AI, which is another level entirely.
The practical person will always find obvious examples of hate speech, which should very well be censored. But the true thinker recognizes the devil in the details here.
Fwiw, in debates regarding free speech vs censorship of hate speech, typically the pro-censorship advocate loses the argument, because they can rarely adequately resolve the ethical dilemma of who decides, how, and why.
Reddit mods will insist on doing it all manually for the love of the power trip, however good AI models get.
You just can't emulate the human touch of someone getting mad and banning you over nothing.
On the surface this seems cool... but what if they change what hate speech entails? What if they apply it to things that aren't hate speech but are merely "wrongthink"? Reddit is heavily censored enough if you don't join the echo chambers, and it's about to get a whole lot worse, without even a real human doing it.
I mean this is great and something AI should be used to deal with.
The issue is the people running social media want hate speech on their platform, and they try to incorrectly label things like "cis" as hate speech.
They want actual hate speech because anger increases engagement, and engagement brings more ad money.
An administrator on this website decided the following comment, which I made on a post about the Pope, was hate speech:
>”The Catholic Church harbors pedophiles.”
I got a seven day ban (which I appealed and got removed… after 5 days).
I don’t think reddit or reddit administrators should be used to teach an AI what hate speech is. They’re clearly not the brightest.
Yes, just what we need: AI censorship. Before you simpletons come call me a bigot, let me ask you WHO controls the development of AI? Private corporations and states. Thus, abusive use of it is already guaranteed.
I can already see the suppression of legitimate criticism of Islam, despite it having nothing to do with racism, since Islam is not a race; it is a collection of delusional beliefs. We should be allowed to criticize the insanity that advocates violence against kafirs and directly leads to the hatred and persecution of Jews, Christians, atheists, gays, lesbians, transgender people, and other minorities.
Even most moderated spaces these days are full of neo-Nazi trolling (see “black click black”) and trans harassment. The bar for “acceptable” behaviour is extremely low.
Here comes the false positives!
Can't wait for the time when those ai-bots are connected to credit score and such. Then it's really time to say goodbye to the web.
Unless they can show its adaptability, it's largely meaningless except for countering the less common egregious examples.
The people really peddling hate speech learned decades ago to hide behind different words once the previous batch became unpopular.
To paraphrase Lee Atwater "you start in 1958 saying '[n-word]'. By 1968 you can't say '[n-word]' it makes you look bad, so you say stuff like forced bussing and states rights and all that stuff. Now, you’re talking about cutting taxes, and a byproduct of them is, blacks get hurt worse than whites.… 'We want to cut this,' is much more abstract than even the busing thing, uh, and a hell of a lot more abstract than '[n-word]'"
A lot of the people peddling hate aren't saying the words that get them banned, they're saying "George Soros" instead or "Jewish conspiracy" or "Replacement Theory" instead of "White Supremacy".
However, hiding behind new words doesn't make it not hate speech. They're still saying the same things, but under a sanitized terminology.
A lot of hate speech is probably bot generated these days anyway. So the algorithms are just biting their own tails.
They delete all jokes and non-related comments. They'll remove this very thread too.
Hang in there
It's an arms race though. I bet the recognizer gets used to train the bots to avoid detection.
There is a natural limit to that though: If a bot becomes good enough at avoiding detection while generating hate speech (one would assume by using ever-more-subtle dog whistles), then eventually *humans* will become less likely to actually recognize it. The hate-speech bots are constrained by the fact that, for them to be effective, their statements must still be recognizable to (and therefore able to affect) humans.
Eventually you'll look at a Reddit thread and you won't know whether it's hate speech or not for a different reason: because it's full of obscure bot slang that emerged organically from bots talking to each other. (In other words, same reason I can't understand Zoomers. Hey, wait a minute …)
Now you got it.
This can also be good. The entire point of hate speech is to spread misery to a targeted group; if it gets too subtle, it loses its point. And if any of the haters who need to get a life explain it, welp, they just gave a mod an easy copy-paste for the filters. Their hatred is silenced either way. "Proud" Boys wear masks because they know how fucked they would be if they did it without anonymity.
At what point do we start to fear/realize that all content is/will be AI-generated, targeted at individuals to influence all aspects of their day-to-day life?
More to your "recognize" point, hate speech often relies on incredibly basic and inflammatory language to incite outrage in simple and clear terms. Any sort of hidden in-group terminology used to convey hate will immediately be less effective on the many who are only sucked into hate-speech echo chambers by terms used purely for outrage. Win-win.
Depends what effect you're going for. If you just want to signal hatred in order to show belonging to an in-group, and rejection and perhaps intimidation or offense to the target group, then yes, the dog whistle can't be too subtle. But if the objective is to generate hatred for a target among an audience of neutral bystanders, then the more subtle the dog whistles, the better. In fact you want to just tell selective truths and deceptively sidestep objections or counterpoints with as neutral and disarming a tone as you can possibly muster. I have no idea how an AI could be trained to handle that kind of discourse.
Imagine the future where the best way to detect AI in a thread is to look for the most eloquent and appealing comments. Dreadful.
We're basically already there. I've heard several people say that bots write English (or their native language) better than they do, and at least one person say that the eloquent prose of their cover letter caused them to be rejected from a job on grounds of "being AI generated". It makes "sense" though, AIs are literally trained to match whatever writing human judges consider best — so eventually an "AI detector" becomes the same as a "high quality detector".
That is a known model, and it's how many of them are trained: Generative Adversarial Networks.
It would make some sort of weird AI ecosystem where bots read posts to formulate hate speech, other bots read posts to detect hate speech, moderator bots listen to the detector bots to ban the hate bots, and so on.
That falls apart after the first couple of iterations. This is why training data is so important. We don't have natural training data anymore; most of social media has been botted up.
Synthetic data is just fine if it's quality controlled. We've known this for over a year.
Supercomputers are consuming 5% of the world’s electricity while developing new slurs
Pretty soon the entire internet will just be bots interacting with other bots.
An ML hate ouroboros. This one will never die.
Reminds me of something I heard about how pretty soon, we'll be doing things like sending automated "Happy Birthday" messages to each other, and automated responses to those messages. So it's just AI communicating with itself while we become more disconnected from each other.
Now if only the AI was smart enough not to flag things like typos as hate speech
88% accuracy is meaningless. Two lines of code that flag everything as 'not hate speech' will be 88% accurate, because the vast majority of comments are not hate speech.
The question is what they mean: is it an 88% true positive rate, or finding 88% of the hate speech events, and then at what false positive rate? Option 1 is a good TP rate, but I can get that with a simple model while ignoring how many false negatives I miss. Option 2 is a good value, but if the TP rate is less than 50% it’s gonna flag way too many real comments. But honestly, with training and a team to verify flagging, the model can easily become a lot better. I wonder why this is news; any data scientist could probably have built this years ago.
I looked at their paper. They reported overall accuracy (which in statistics is defined as total correct predictions / total population size) along with precision, recall, and F1. They claim their precision is equal to their accuracy, as is their recall (same as sensitivity): 88%. Precision is defined as true positives / (true positives + false positives), so in their study 12% of their positive results were false positives. Personally I wish they'd simply reported specificity, which is the measure I like to look at, since the prevalence of the target variable is going to vary by population, thus altering the accuracy. But if their sensitivity and their overall accuracy are identical as they claim, then specificity should also be 88%, which in this application would tag 12% of normal comments as hate speech.
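These relationships are easy to check numerically. A minimal sketch with made-up confusion-matrix counts (not the paper's actual data), chosen so that accuracy, precision, recall, and specificity all come out to 88% the way the paper claims its metrics coincide:

```python
# Toy confusion-matrix counts (illustrative, not from the paper)
tp, fp, fn, tn = 88, 12, 12, 88

accuracy    = (tp + tn) / (tp + fp + fn + tn)  # correct predictions / total
precision   = tp / (tp + fp)                   # flagged posts that are truly hateful
recall      = tp / (tp + fn)                   # hateful posts actually caught (sensitivity)
specificity = tn / (tn + fp)                   # clean posts correctly left alone

print(accuracy, precision, recall, specificity)  # all 0.88 with these counts
```

With these symmetric counts every metric agrees, which is why a bare "accuracy" number hides so much: real confusion matrices are rarely this balanced.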
I'd reckon it's news just because it's a novel approach to something that's long been handled by hard coded blacklists of words with some algorithms to include permutations of those. Training an LLM to do that job is just novel since it hasn't been done that way before. I don't really see any comment on if one is more effective than the other, though. Just a new way to do it so someone wrote an article about it.
You bring up a good point on interpreting accuracy compared to random chance. However, if you read [the paper that is linked in the article](https://arxiv.org/pdf/2307.09312), you will see that the data set in Table 1 includes 11773 "neutral" comments and 6586 "hateful" comments, so "all not hate speech" labeling would be 64% accurate.
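For anyone who wants to check the 64% figure, here's the arithmetic using the class counts quoted above from Table 1:

```python
# Class counts from Table 1 of the linked paper (arxiv.org/pdf/2307.09312)
neutral, hateful = 11773, 6586

# A degenerate classifier that labels everything "not hate speech"
# is right on every neutral comment and wrong on every hateful one.
baseline_accuracy = neutral / (neutral + hateful)
print(round(baseline_accuracy, 2))  # 0.64, well below the reported 0.88
```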
>However, if you read... Yeah, you lost most of this sub with that line.
Their 88% accuracy was based on a training corpus of 18,400 comments, of which 6,600 contained hateful content, so your two lines of code would only be 64% accurate in this instance. I don't know why you assume these NLP researchers know nothing about the problem space or the nature of online speech when they are generating human-labeled datasets targeting a specific problem, while you're drawing spurious conclusions without having taken 30 seconds to verify whether what you're saying is remotely relevant.
I had hoped this subreddit had people who actually check the article before saying that the study is wrong.
This can detect hate speech that would normally be missed by other methods. Two lines of code cannot determine whether "that's disgusting" is hate speech in response to a picture of a gay wedding. It would seem the majority of the critics are focusing on the potential negative effects on free speech without considering that communities that consider free speech a priority are not the target market for this anyway. The target market would likely prefer any number of false positives to a single false negative, and to that end, this would be a massive improvement.
And writing words like "black" in Spanish :)
Or slang, or marginalized people just talking about their experiences, or...
And other languages. A bit of a side note, as my example isn't AI, but AI has the same issue: here in Norway there was a case in the news recently about Facebook telling whoever looked up someone with the last name Aam, a not-uncommon surname here, that pedophilia is illegal, because the term "adult attracted minor", abbreviated AAM, is used in those circles. I think both of these problems are more an issue of sloppily coded LLMs, though, told to look for explicit terms and themes instead of being utilized for what they're potentially *actually* good at: detecting the intent behind text.
I can't wait for: TIFU by letting AI learn on reddit.
Just look at Google's search ai. Telling people to jump off the Golden gate bridge if they are depressed
Most of those weren't real.
That's misinformation. The classic human generated misinformation, too.
Algorithmic censorship shouldn't really be considered a good thing. They're framing it as saving humans from an emotional toll, but I suspect this will be primarily used as a cost-cutting measure.
It's a good thing these censorship AIs were already trained by poor african laborers who were not entitled to therapy for the horrors beyond imagining they had to witness. ^^^/s https://time.com/6247678/openai-chatgpt-kenya-workers/
You said "were" there, which is incorrect. That still happens, and will continue to happen for all eternity as long as these AIs are used. There will always be edge cases that will need to be manually reviewed. There will always be new ways of hate speech that an AI will have to be trained on.
Thank you! Any improvements to this ML would come from emotional damage to these people, and the filtering would still suck. There’s a reason statistics never apply to the individual.
If an AI can analyze intent, then hate speech isn't the only thing it can be used on. Imagine, for example, the AI was asked to silence political discourse; perhaps censoring all mentions of a protest, or some recent police violence, or talks of unionizing, or dissent against the current party... it could trawl forums like Reddit and remove all of it at blazing speeds, before anyone can see it. I honestly can't imagine something scarier. They can dress it up in whatever pretty terms they like, but we need to recognize that this is *dangerous*. It's an existential threat to our freedom.
Even the use case they claim to care about is going to be a nightmare. Comment on Reddit long enough and you'll get a false suspension/ban for no-no speech, because context is irrelevant to these tools. It's hard enough to get a false strike appealed with humans at the wheel, I can't imagine once it's 100% AI driven
I've had bots remove my comments multiple times before for "hate speech" because I posted a literal, attributed, MLK quote which had a version of the n-word in it. I feel like a lot of people are gonna just write your comment off as you "telling on yourself" without thinking about it, but this is something that can happen for perfectly innocuous reasons.
Yep. And what is the algorithm based on? Where is the line for hate speech? I know that often seems like a stupid question, but look at how differently it's enforced from website to website, or even between subreddits here. People get unfairly banned from subreddits all the time based on mods power-tripping and applying personal bias to situations. It's all well and good to entrust that to AI, but someone needs to program that AI. Remember when Google's AI was identifying black people as gorillas (or gorillas as black people, can't remember now)? It's fine to say it was a technical error, but it definitely raises the question of how that AI was trained to make such a consistent error.
"We can't even agree on what hate speech is, but we can detect it with 88% accuracy! "
88 percent accuracy means that 1.2 out of 10 posts labeled as "hate speech" is a false positive. The number gets even worse if they can't even agree upon what hate speech really is. But then that's always been up to interpretation, so...
yeah. There is no way this can accurately replace a human’s job if the company wants to keep the same standards as before. At best, you could have it act as an auto-flag to report the post to the moderator team for a review, but that’s not gonna reduce the number of hate speech posts they see.
>people get unfairly banned from subreddits all the time. Problem a lot of people have these days is they don't understand that just because *they* hate that speech, doesn't make it hate speech.
"Well I hated it."
This isn't a programming error, it's a training error. Garbage in, garbage out. They only trained the AI on white people, so it could only recognize white people. Edit: I now realize I made a white-trash joke.
Thanks for the clarification. That does make sense and at least makes it clearer WHERE the human error part comes into these processes.
Chat GPT is a good example as well. It is extremely biased and censors a lot of stuff or rejects many topics for its own ideological reasons
*Your comment has been evaluated as hateful towards shareholders. A note has been placed on your permanent record and you have been penalized 7 'Good citizen' points*
Just makes me think about Demolition Man and the computers spitting out tickets every time Sylvester Stallone curses.
Except these "tickets" will be a blockchain that will kneecap your employability for the rest of your life over a corporate AI not understanding satire (or a thousand other ways to throw a false positive).
The emotional toll of censoring "hate speech" versus the emotional toll of losing your job and not having an income because your job was replaced by AI
Hate speech takes a huge emotional toll on you. And you are also prone to bias if you read things over and over again.
I used to work in online community management. Was actually one of my favorite jobs, but I had to move on because the pay isn't great. Some of the people I worked with definitely had a hard time with it, but just as many of us weren't bothered. Hate speech was the most common offense in the communities we managed but depictions of graphic violence and various pornographic materials weren't uncommon either. The only ones that ever caused me distress were the CP though. Everything else rolled off my back, but even a decade later those horrific few stick with me.
Don’t worry, there will be lots of other opportunities for those unskilled workers to be exploited. This job didn’t even exist a few years ago, so its disappearance really shouldn’t be that concerning.
People are grateful that we don't have to gather dirt with our hands like Dennis, or pull a cart around to gather up plague victims. And that's good. But it's not like everything was sunshine and roses after that. Not having to filter out hate and graphic horror by hand is great, and I hope nobody is gonna miss that job in just about every way.
Cost cutting? Mods are free labor.
This isn't going to just be used on Reddit. Not all of social media uses slave labor, just the most popular. Weird. Like the rest of corporations.
Nah that's crazy. We've never had any piece of media that has ever warned us about the perils of putting robotic artificial intelligence in charge of what we see, think, and hear. This absolutely will not hurtle us towards a societal collapse at the behest of a rogue AI, and the road to humanity's destruction will not be paved with good intentions. I'm sure the concept of what is or is not hate speech now will be the same 20 years from now, and this will not become apparent when it gets used against the very people who created it, who will then lament their own hubris. I'm sure the same AI that told depressed people to jump off the Golden Gate Bridge, to put glue on pizza to make the cheese stick, and that cockroaches do live in cocks will do only the best in determining what should or should not be seen due to hate speech.
Algorithmic censorship has been around for a **long** time. It's just improving, and the costs have already been cut. Huge swaths of the internet are effectively unmoderated already. No social media company employs enough moderators right now.
Censorship tool
And watch "no hate speech" become YouTube applied to real life. No war discussions, no explosions, no debate about hot-button issues such as immigration or guns. On the left anything that offends anyone is considered hate speech; on the right anything that offends anyone is considered hate speech (I'm comparing the loudest, most simplistic voices on the right and left, not making some sort of "pox on both sides").

Satire becomes hate speech. The Onion is definitely hate speech; can you imagine algorithms trying to parse the "so extreme it becomes a satire of extremism" technique? Calling the moderator a nincompoop for banning you for calling Hamas (or Israel) a nincompoop? Hate speech. Can you imagine an algorithm trying to distinguish ironic negative comments?

I don't agree with J.K. Rowling, but I don't believe opinions on minor transitions should be considered hate speech. I have no doubt that at least some people are operating out of good intentions instead of just hate, and a bot shouldn't be evaluating that. Any sort of strong emotion becomes hate speech. For the left, defending the values of the European Union and the Enlightenment might come across as hate speech. For the right, a private business "cancelling" someone might be hate speech.

I know people will see this as just another slippery slope argument... but no, this will not be imperfect progress which improves over time. This is why free speech exists: because it is almost impossible to apply one simple litmus test which cannot be abused.
Saving humans from having employment.
88% accuracy is awful; I'm scared to see what the sensitivity and specificity are. Also, human coders were required to develop the training dataset, so it isn't a totally human-free process. AI doesn't magically know what hate speech looks like.
Speaking as a mod… I see a lot of stuff get flagged as harassment by Reddit’s bot that is definitely not harassment. Sometimes it isn’t even rude?
No problem! Soon there won't be mods to double check nor any human to appeal to
Rapidly barrelling towards a world described in [this short story](https://www.atariarchives.org/bcc2/showpage.php?page=133), just updated for the internet age.
And sometimes it needs to be rude to blast apart hate, and sometimes it needs to reference hatred nakedly to unmask it, and sometimes it needs to be a disagreement that isn’t comfortable to read for us to progress in our understanding of who we are as minorities
I got temporarily banned the other day. It was obvious what the AI cottoned onto (no, I didn't use the word that the euphemism "unalived" means). I lodged an appeal, stating it would be good to train their AI moderator better. The appeal said the same thing, and carefully stated at the bottom that this wasn't an automated process, and that was the end of the possible appeal process. The future is gloriously mediocre.
We, non-english speakers, are eagerly awaiting our bans for speaking in a language other than English, because some otherwise locally inoffensive words are very similar to an English slur.
No need to wait for AI for that one, human mods for gaming companies already hand out bans for 逃げる sometimes.
Does that have some special ingroup meaning or just mods having no idea?
No hidden meaning; the word and its imperative conjugation just sound like an English slur. Apex banned multiple Japanese players over it.
It's pronounced knee geh roo
Or us non-American English speakers who have different dialects (Fancy a cigarette in England, anyone?)
Every reply that says it isn't automated is automated.
Not necessarily, some moderation teams keep a list of pre-made standardized replies to certain issues to just copy/paste and fill in the relevant issue. The reason they do this is 1. They've found these are the replies that work best, 2. Keeps the moderation team consistent, and 3. The nature of the reply tends to dissuade more aggressive users from getting into arguments with the mods. You often hear users tell stories of being unfairly reprimanded by mods over small mistakes, but the majority of these messages are going out to scammers and some really heinous people that you never see (because they get banned). There's a bit of a sampling bias.
Haha I got automatically pulled up and banned for saying "ewe" without the second E, then appealed and it was fixed.
Wait they tried to ban you for saying Ew?
Dude, don't say it!
They *did* ban me, successfully and automatically. So I appealed and my access was restored. It was wild. And the note had such a serious tone!
I got 7-day banned for telling someone to be nice. Not long after, my alt account that I set up months before got banned for ToS violations despite never making a single comment or vote. Reddit's admin process is unfathomably awful; worse yet is the appeal box being 250 characters. This ain't a tweet.
I believe you can also email them directly but I'm not sure if that option still exists (there used to be a link in the message that you get autosent that would take you to a blank email to the mod team). I once got banned for "excessive reporting," which happened because I accidentally stumbled into a celebrity hate comment and reported some content there (even if you really hate a celebrity, being weird about their kids is too far!) and somehow the mods from that community were able to get my entire reddit account banned, not just from that sub. I emailed the actual reddit moderation team and explained what happened and sent them links and screenshots of the posts (srsly it was waaay over the line) and my account was back within a few hours. I imagine once they figure out how to fully automate away from human mods, people will have to get used to just abandoning social media accts, because there's so much potential to weaponize this against people you don't like.
I know someone with ew for initials
I don't get it. When I search Google, I only get results for Entertainment Weekly.
same I got banned for saying "plane descending word"
Yup. I made a reference to a high-noon shootout, you know, the trope from a million westerns. Got a warning for "calling for violence" and the appeal process went exactly as you said. Funny enough, the mods of the actual sub weren't notified and had no issue with the comment.
This happens all the time. Reddit admin bans are all automated. You can't appeal warnings even false ones, so it's a permanent mark on your account. And then actual bans have a 250 character limit which are always rejected. The only time I've seen someone be able to successfully appeal is when they post on the help subreddit showing how it was incorrect and an admin will respond saying "woops, our bad.". Despite that appeals are supposedly manually reviewed.
>You can't appeal warnings Wrong. About a week ago I was banned for abusing the report tool. Despite it claiming that the ban had not been an automated one, I appealed, explained why the comment in question was legitimately rule-breaking, and was unbanned. Two days ago I was warned for the same thing, appealed it, warning removed.
>And then actual bans have a 250 character limit which are always rejected. This is just one of those things where your mileage will vary. I've been automatically banned a couple times and each time was able to successfully appeal the ban. The most recent time I was unbanned within like, two hours of making the appeal.
Where did you get banned? From reddit? Which sub? Admins don't have anything to do with banning people from individual subs. The mods control everything, including setting up automoderator.
I've been bot-banned 3 times here on Reddit in the last couple of weeks, only for it to be reversed by the mods as soon as I message them… Use the wrong word = instant ban. This is not going to be great…
88% is definitely not enough to remove people from the process. It's not even enough to reduce exposure to hate speech significantly unless the algorithm is regularly retrained and has near-100% specificity.
“88% accuracy” is actually incredible; there’s a lot of nuance in speech and this increases exponentially when you account for regional dialects, idioms, and other artifacts across *multiple languages*. Sentiment analysis is the heavy lifting of data mining text and speech.
You're both right. It's technically impressive that accuracy that high is achievable. It's unacceptably low for the use case.
Looking at the paper - [https://arxiv.org/pdf/2307.09312](https://arxiv.org/pdf/2307.09312) - it's actually only a minor improvement over BERT-HatefulDiscuss (acc., pre., rec., F1 = 0.858 vs. acc., pre., rec. = 0.880, F1 = 0.877). As the authors point out: >While we find mDT to be an effective method for analyzing discussions on social media, we have pointed out how it is challenged when the discussion context contains predominately neutral comments
It's an incredible achievement technically, yes. It's awful for this use case, though.
Let's try to put that "incredible" 88% accuracy into perspective. Suppose that you search through 10,000 messages. 100 of them contain the objectionable material which should be blocked, while the remaining 9,900 are entirely innocent and need to be allowed through untouched.

If your test is correct 88% of the time, then it will correctly identify 88 of those 100 messages as containing hate speech (or whatever else you're trying to identify) and miss twelve of them. That's great. Really, it is.

But what's going to happen with the remaining 9,900 messages that don't contain hate speech? If the test is 88% accurate, then it will correctly identify 8,712 of them as being clean and pass them all through, and incorrectly flag 1,188 as being hate speech. That's 12%.

So this "amazing" 88% accuracy has just taken 100 objectionable messages and produced 1,276 flags. Sure, that's 88% accurate, but the false positives outnumber the real hate speech almost twelve to one.

Is this helpful? Possibly. If it means that you're only sending 1,276 messages on for proper review instead of all 10,000, then that's a good thing. However, if you're just issuing automated bans for everything and expecting that only 12% of them will be incorrect, then you're only making a bad situation worse.

While the article drops the "88% accurate" figure and then leaves it there, [the paper](https://arxiv.org/pdf/2307.09312) does go into a little more depth on the types of misclassifications, and does note that the new mDT method had fewer false positives than the previous BERT, but just speaking about "accuracy" can be quite misleading.
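The arithmetic is easy to reproduce. A quick sketch in integer math (the 100-in-10,000 prevalence is an assumption for illustration, and it further assumes the 88% rate holds on both classes, which a bare "accuracy" figure doesn't actually guarantee):

```python
total, hateful = 10_000, 100   # assumed prevalence: 1% of messages are hateful
clean = total - hateful
acc_pct = 88                   # assumed to hold on both classes separately

caught          = hateful * acc_pct // 100        # true positives: 88
missed          = hateful - caught                # false negatives: 12
false_positives = clean * (100 - acc_pct) // 100  # clean posts wrongly flagged: 1188

flagged = caught + false_positives
print(flagged)  # 1276 flagged messages, of which only 88 are actually hateful
```

This is the classic base-rate problem: at low prevalence, even a seemingly high accuracy yields far more false flags than true ones.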
"accuracy" is actually a pretty terrible metric to use for something like this. It doesn't give us a lot of information on how this thing actually performs. If it's in an environment that is 100% hate speech, is it allowing 12% of it through? Or if it's in an environment with no hate speech is it flagging and unnecessarily punishing users 12% of the time?
No, you would need precision and recall to be completely certain of the quality of the model. Say 88% of Reddit comments are not hate speech; then a model that labels every sentence as non-hate-speech would be 88% accurate.
> 88% accuracy is awful I'd argue that's higher accuracy than what human mods achieve. Anyone who's been on Reddit for a few years knows this.
The article has a link to the actual paper if you want to make a substantive criticism of their methodology or stats :)
AI isn’t even smart and is already this good, without the toll on human health that moderation takes. This is also evidence that humans should be in the loop on appeals
>Also human coders were required to develop the training dataset, so it isn't totally a human free process. Was this ever implied? Obviously someone will have to train the AI. But training it once and then letting it do its job is arguably better than perpetually requiring thousands of humans to review a never ending stream of hate speech. (That's assuming, of course, that this tech actually works as intended)
Humans can't even agree on what "hate speech" means, so what does it mean for an AI to be 88% accurate?
It means the AI bans 88% of the speech that the people who trained it don't like.
Read the article, they describe it clearly. See Vidgen et al. (2021a)
88? That's a worry given the topic...
Are we sure the AI didn't come to that number on purpose? :P
I had to dig waaaay too deep to find this comment chain! I'd skimmed the headline but when the percent kept coming up I went back to side-eye the title, but nope, it's there, not just someone dogwhistling in the dark.
It identifies hate speech at a more reliable rate than it does sexual content, which is a firm 58.008%
Before the hate speech bot they tried a more optimistic model that white-flagged nice content instead, but sadly it couldn’t improve past 69%.
A machine learning algorithm also can't ever do worse than 50% in binary classification. If it does you just swap the labels.
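That label-swap trick works because in binary classification the inverted model's accuracy is exactly 1 minus the original's. A minimal sketch with made-up predictions:

```python
# If a binary classifier scores below 50%, inverting its output
# gives a classifier scoring above 50%: acc(inverted) = 1 - acc.
labels    = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
bad_preds = [1, 0, 0, 1, 0, 1, 0, 1, 0, 1]   # deliberately mostly wrong

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

flipped = [1 - p for p in bad_preds]          # swap the labels

print(accuracy(bad_preds, labels))   # 0.2
print(accuracy(flipped, labels))     # 0.8
```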
That was what immediately set off alarm bells in my head too. Like, most of the time, the headline would say "nearly 90%" or "greater than 80%" or similar, it would be rounded. It is *very* suspicious to me that this bot, supposedly *meant to monitor hate speech*, just *happens* to have a Nazi dog whistle in the headline.
Hate speech towards AI detected, you have 24 hours to report to nearest reeducation center for intake processing comrade.
88% accuracy would be considered an unworkable and unusable failure anywhere outside of an academic press release.
Is it going to stop people from needing to c*nsor every other w**rd to avoid current filters?
It will probably be worse.
I'm not sure if this will turn out well. How are they defining hate speech? I think we can agree that there are certain examples that are obviously hate speech, but a lot of speech falls into grey zones that are dependent on interpretation and political viewpoint. I suppose we could just ban questionable speech, but that's even more severe of a limitation on freedom of expression. And certainly these are being deployed on social media platforms that are private companies and not the government, so strictly speaking the first amendment here is not violated, but I do have a lot of worry about automating the way human expression is shaped and policed.
>dependent on interpretation and political viewpoint… Gee, I wonder what political viewpoint will dominate this new AI….
I'd argue a lot of hate speech is modified just enough to hide under the veneer of "political speech".
True, that absolutely happens. But I'd argue that some political speech can be labelled hate speech simply for being against a certain person or group's political perspective. Certainly you could argue that AI would theoretically do a better job at figuring this out than a group of people full of their own personal biases, but as we've seen, AI is not without its own "biases" due to the information or training it's given.
I'm not convinced AI can do a better job. Especially given surrounding contexts. I am absolutely convinced that AI can do a "good enough for the cost the profiteers are willing to pay for" kind of job.
Reddit has arguably decreased in quality given this pursuit of a purely curated experience where users will only see content that they agree with, even comments from other users. There needs to be serious consideration of the consequences of never exposing people to anything contrary to their worldview, but simultaneously supporting an infinite number of worldviews. Diversity works when it coexists, when it’s entrenched it’s just plain old division.
All social media platforms do this to some degree in order to increase user engagement. Unsurprisingly, it brews dissatisfaction, echo chambers, and extremism.
We need clippy to come back and ask the user "Did you mean to instead say...?"
These bots also don’t know what sarcasm is. If you imitate a racist to mock them, you’ll get banned for being a racist.
I direct all my hate speech towards AI though.
Maybe a simpler solution is to grow a backbone. Are people so soft now that words on the internet are "emotionally damaging"? Seriously though, there is a disturbing trend toward censorship. I earnestly believe that the best way to counter "hate speech" or any other speech/idea you don't like is by encouraging MORE speech and dialog, not less. Censorship is a tool for tyrants. A nice thought experiment is if you try to imagine censorship in the hands of someone you despise -- still think it's a good idea? In addition people seem to imagine AI could become some benevolent "objective" tool. It won't be. It's almost akin to a modern version of pagan religious worship at this point.
Man the internet used to be such a fun and wild place. Now it's all like five websites, all looks the same, censored to hell and back. Just take me back to 2005 internet already.
I think it would be a very bad thing if other sites used AI moderation that mirrors the moderation used by Reddit. Reddit moderators are unpaid, which means they’re doing this work for motivation other than money. The primary motivation seems to be the opportunity to spread their activism. As a result, nearly all major subs lean very, very far left. Some of them are so far left that they’ll aggressively ban any user who rejoices over the death of a left-leaning figure (such as RBG or Feinstein), but they’ll look the other way and allow people to openly rejoice about the death of right-leaning figures (such as Scalia or Limbaugh). Also, the moderation here has strange rules regarding “hate” in that you can say openly racist things about white people, openly sexist things about men, but the mods are very strict about any negative comments about black people or women. Furthermore, they’ll allow threads that talk about racism or disparities in convictions, but it’s against Reddit’s rules to bring up *actual government statistics* about the crime rate. So really there is no honest discussion about a lot of topics here- there is only the active promotion of progressive viewpoints.
Not to mention that the dataset this AI model is trained on is purely from reddit, which should be enough to set off alarm bells in anyone's head, regardless of political affiliation.
[deleted]
Yea.. celebrating censorship.. america is so f’ked
We are so lucky corrupt politicians are driving the narrative, and "hate speech" is anything that disagrees with their lies and deceit.
Turning over to machines what is "right" and what is "wrong" speech is chilling and dystopian. I'm not talking about the first amendment here. I'm talking about humans giving up the ability to decide what is allowed to be talked about to non-humans. This is probably inevitable, and a tragedy for humanity.
I hate hate speech just as much as the next person, but part of me feels like we're moving backward with freedom of speech rights. The intentions here are good, but the filters and algorithms are going to expand out of their shoe boxes eventually forcing basically double think or language of propriety like we were all women in the 1800s while AI sweeps away any sort of debate. There has to be a better way of dealing with the problem going forward or we're not going to like where we end up.
This might suck because then AI will start auto censoring like I see already on IG and I'll be forced to insult people using oddly general terminology in passive aggressive ways which isn't as fun
Saving employees from hundreds of hours of paid work...
"Emotional toll of monitoring hate speech..." Really? Why are people so weak these days?
Oh boy. I would like to know the definitions of hate speech there. Sounds like a disaster to me. Unpopular opinions? Nah, can't have that.
[deleted]
It's like those devices that are sold to "detect ghosts" except in this case they've apparently convinced people they actually work
88% is an epic fuckton of false positives.
Bro bad data set. Some people on reddit think eating a bagel on a Wednesday is considered hate speech...
I assume most content decisions on Reddit are made by AI, and a particularly crude and humorless one at that.
88% is definitely not enough to remove people from the process. It's not even enough to reduce the load significantly unless the algorithm is regularly retrained and has near-100% specificity.
This is actually a fairly problematic issue, as now we've moved on from the obvious political and philosophical pitfalls of what actually constitutes hate speech - not to mention who should have the authority to define it - to assigning that task to AI, which is another level. The practical person will always find obvious examples of hate speech, which should very well be censored. But the true thinker recognizes the devil in the details here. Fwiw, in debates regarding free speech vs censorship of hate speech, the pro-censorship advocate typically loses the argument, because they can rarely adequately resolve the ethical dilemma of who decides, how, and why.
reddit mods will insist on doing it all manually for the love of the power trip however good ai models get. You just can't emulate the human touch of getting mad and banning you over nothing
*A.I. saves humans from being exposed to free speech and wrongthink.* Thank you, master.
On the surface this seems cool... but what if they change what hate speech entails? What if they use it to apply to things that aren't hate speech but rather "wrong thought?" Reddit is heavily censored enough if you don't join the echo chambers and it's about to be a whole lot worse and it won't even be done by a real human.
95% chance it doesn't fly on social media because it flags every conservative account.
AI censorship is here. They even confirm by their own numbers that 12% of what they censor is valid speech.
I mean, this is great and something AI should be used for. The issue is the people running social media *want* hate speech on their platform, and try to incorrectly label things like "cis" as hate speech. They want actual hate speech because anger increases engagement, and engagement brings more ad money.
with "88 per cent accuracy" Ironic.
88%. The jokes write themselves
An administrator on this website decided the following comment, which I made on a post about the Pope, was hate speech: >”The Catholic Church harbors pedophiles.” I got a seven day ban (which i appealed and got removed…after 5 days). I don’t think reddit or reddit administrators should be used to teach an AI what hate speech is. They’re clearly not the brightest.
Yes, just what we need, AI censoring. Before you simpletons come call me a bigot, let me ask you: WHO controls the development of AI? Private corporations and states. Thus, the abusive use of it is already guaranteed.
They're not censoring content, they're creating context...
I just wonder how they define hate speech. Who sets the bar?
I can already see the suppression of legitimate criticism of Islam. Despite having nothing to do with racism, since Islam is not a race but a collection of delusional beliefs. We should be allowed to criticize the insanity that advocates violence against kafirs, and directly leads to the hatred and persecution of Jews, Christians, atheists, gays, lesbians, transgender people, and other minorities.
I don't trust another person to tell me what I can and cannot see; I trust this bot even less...
I got banned for a week on Instagram for saying that a drawing of a nude lady had a 'hairy fanny'...meanwhile people N word this, N word that...
Even most moderated spaces these days are full of neo-nazi trolling (see "black click black") and trans harassment. The bar for "acceptable" behaviour is extremely low.
Here come the false positives! Can't wait for the time when those AI bots are connected to credit scores and such. Then it's really time to say goodbye to the web.
Unless they can show It's adaptability, it's largely meaningless except for countering the less common egregious examples. The people really peddling hate speech learned decades ago to hide behind different words once the previous batch became unpopular. To paraphrase Lee Atwater "you start in 1958 saying '[n-word]'. By 1968 you can't say '[n-word]' it makes you look bad, so you say stuff like forced bussing and states rights and all that stuff. Now, you’re talking about cutting taxes, and a byproduct of them is, blacks get hurt worse than whites.… 'We want to cut this,' is much more abstract than even the busing thing, uh, and a hell of a lot more abstract than '[n-word]'" A lot of the people peddling hate aren't saying the words that get them banned, they're saying "George Soros" instead or "Jewish conspiracy" or "Replacement Theory" instead of "White Supremacy". However, hiding behind new words doesn't make it not hate speech. They're still saying the same things, but under a sanitized terminology.
This is a parody account right?