e dot dot dot
a mostly about the Internet blog

October 2020


Content Moderation Case Studies: Using AI To Detect Problematic Edits On Wikipedia (2015)

Furnished content.


Summary: Wikipedia is well known as an online encyclopedia that anyone can edit. This has enabled a massive corpus of knowledge to be created, one that has achieved high marks for accuracy, while also recognizing that at any given moment some content may be inaccurate, since anyone may have made recent changes. Indeed, one of the key struggles Wikipedia has dealt with over the years is so-called vandals, who change a page not to improve the quality of an entry but to deliberately decrease it. In late 2015, the Wikimedia Foundation, which runs Wikipedia, announced an artificial intelligence tool called ORES (Objective Revision Evaluation Service), which it hoped would effectively pre-score edits for the various volunteer editors so they could catch vandalism more quickly.

ORES brings automated edit and article quality classification to everyone via a set of open Application Programming Interfaces (APIs). The system works by training models against edit- and article-quality assessments made by Wikipedians and generating automated scores for every single edit and article. What's the predicted probability that a specific edit will be damaging? You can now get a quick answer to this question. ORES allows you to specify a project (e.g. English Wikipedia), a model (e.g. the damage detection model), and one or more revisions. The API returns an easily consumable response in JSON format:
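The announcement's sample JSON response isn't reproduced above, but the request pattern it describes (project, model, revision IDs) is straightforward. As a rough illustration, here is a minimal Python sketch of what a client query for the "damaging" model might look like; the endpoint URL and response layout are assumptions based on ORES' documented v3 API, and should be checked against the live documentation before relying on them.

import requests

# Illustrative only: ask ORES for the "damaging" model's score on one English
# Wikipedia revision. The endpoint and JSON layout are assumed from ORES'
# documented v3 API, not taken from the announcement quoted above.
ORES_URL = "https://ores.wikimedia.org/v3/scores/enwiki"

def damaging_probability(rev_id):
    resp = requests.get(
        ORES_URL,
        params={"models": "damaging", "revids": rev_id},
        timeout=10,
    )
    resp.raise_for_status()
    score = resp.json()["enwiki"]["scores"][str(rev_id)]["damaging"]["score"]
    # score["prediction"] is a boolean; the probabilities sum to 1.0
    return score["probability"]["true"]

if __name__ == "__main__":
    print(damaging_probability(123456))  # e.g. 0.03 for a clearly benign edit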
The system was not necessarily designed to be user-facing, but rather as a system that others could build tools on top of to help with the editing process. Thus it was designed to feed some of its output into other existing and future tools. Part of the goal of the system, according to its creator, Aaron Halfaker, was to make it easier to teach new editors how to be productive editors on Wikipedia. There was a concern that more and more of the site was controlled by an increasingly small number of volunteers, and that new entrants were scared off, sometimes by the various arcane rules. Thus, rather than seeing ORES as a tool for automating content moderation, or as a tool for quality control over edits, Halfaker saw it more as a tool to help experienced editors better guide new, well-meaning, but perhaps inexperienced editors in ways to improve.
The motivation for Mr. Halfaker and the Wikimedia Foundation wasn't to smack contributors on the wrist for getting things wrong. "I think we who engineer tools for social communities have a responsibility to the communities we are working with to empower them," Mr. Halfaker said. After all, Wikipedia already has three AI systems working well on the site's quality control: Huggle, STiki and ClueBot NG. "I don't want to build the next quality control tool. What I'd rather do is give people the signal and let them work with it," Mr. Halfaker said. The artificial intelligence essentially works on two axes. It gives edits two scores: first, the likelihood that it's a damaging edit, and, second, the odds that it was an edit made in good faith or not. If contributors make bad edits in good faith, the hope is that someone more experienced in the community will reach out to them to help them understand the mistake. "If you have a sequence of bad scores, then you're probably a vandal," Mr. Halfaker said. "If you have a sequence of good scores with a couple of bad ones, you're probably a good faith contributor."
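To make the two-axis idea concrete, here is a small hypothetical Python sketch of how a patrol tool built on ORES scores might route a single edit; the cutoff values are illustrative, not ORES defaults, and the queue names are invented for this example.

def triage(damaging_prob, goodfaith_prob, damaging_cutoff=0.8, goodfaith_cutoff=0.5):
    """Suggest a review queue from the two ORES scores for one edit (illustrative)."""
    if damaging_prob < damaging_cutoff:
        # Probably a fine edit; no human attention needed.
        return "no review"
    if goodfaith_prob >= goodfaith_cutoff:
        # Likely a well-meaning newcomer's mistake: a chance to mentor, not punish.
        return "mentor queue"
    # Likely deliberate damage: prioritize for counter-vandalism patrol.
    return "vandalism patrol queue"

The point of the split, as Halfaker describes it, is that the same "damaging" score should lead to very different human responses depending on the good-faith estimate.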
Decisions to be made by Wikipedia:
  • How useful is artificial intelligence in helping to determine the quality of edits?
  • How best to implement a tool like ORES?
    • Should it automatically revert likely bad edits?
    • Should it be used for quality control?
    • Should it be a tool to just highlight edits for volunteers to review?
  • What is likely to encourage more editors to help keep Wikipedia as up to date and clean of vandalism?
  • What data do you train ORES on?  How do you validate the accuracy of the training data?
Questions and policy implications to consider:
  • Are there issues when, because the AI has scored something, the tendency is to assume the AI must be correct? How do you make sure the AI is accurate?
  • Does AI help bring on new editors or does it scare away new editors?
  • Are there ways to prevent inherent bias from being baked into any AI moderation system, especially one trained by existing moderators?
Resolution: Halfaker, who later left Wikimedia to go to Microsoft Research, has published a few papers about ORES since it launched. In 2017, a paper by Halfaker and a few others noted that the tool was increasingly used over the previous three years.
The ORES service has been online since July 2015. Since then, usage has steadily risen as we've developed and deployed new models and additional integrations are made by tool developers and researchers. Currently, ORES supports 78 different models and 37 different language-specific wikis. Generally, we see 50 to 125 requests per minute from external tools that are using ORES' predictions (excluding the MediaWiki extension that is more difficult to track). Sometimes these external requests will burst up to 400-500 requests per second.
One thing they noticed was that those using the ORES output often wanted to search through the fitness metrics and set their own thresholds rather than accepting the hard-coded ones in ORES:
Originally, when we developed ORES, we defined these threshold optimizations in our deployment configuration. But eventually, it became apparent that our users wanted to be able to search through fitness metrics to choose thresholds that matched their own operational concerns. Adding new optimizations and redeploying quickly became a burden on us and a delay for our users. In response, we developed a syntax for requesting an optimization from ORES in realtime using fitness statistics from the models' tests.
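As a hypothetical illustration of the underlying idea (not the actual ORES query syntax, which isn't reproduced here), a client with its own operational constraint could pick a threshold from a model's published test statistics along these lines:

def pick_threshold(stats, min_precision=0.9):
    """Pick the threshold that maximizes recall while meeting a precision floor.

    stats: a list of dicts such as
        {"threshold": 0.7, "precision": 0.92, "recall": 0.45}
    (an assumed shape for per-threshold fitness statistics from model tests).
    """
    eligible = [s for s in stats if s["precision"] >= min_precision]
    if not eligible:
        raise ValueError("no threshold satisfies the precision requirement")
    return max(eligible, key=lambda s: s["recall"])["threshold"]

A counter-vandalism bot that auto-reverts might demand very high precision and accept low recall, while a human review queue might do the opposite; letting each client choose its own operating point is exactly the flexibility the passage above describes.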
The project also appeared to be successful in getting built into various editing tools, and possibly inspiring ideas for new editing quality tools:
Many tools for counter-vandalism in Wikipedia were already available when we developed ORES. Some of them made use of machine prediction (e.g. Huggle, STiki, ClueBot NG), but most did not. Soon after we deployed ORES, many developers that had not previously included their own prediction models in their tools were quick to adopt ORES. For example, RealTime Recent Changes includes ORES predictions alongside their realtime interface, and FastButtons, a Portuguese Wikipedia gadget, began displaying ORES predictions next to their buttons for quick reviewing and reverting of damaging edits. Other tools that were not targeted at counter-vandalism also found ORES predictions, specifically that of article quality (wp10), useful. For example, RATER, a gadget for supporting the assessment of article quality, began to include ORES predictions to help their users assess the quality of articles, and SuggestBot, a robot for suggesting articles to an editor, began including ORES predictions in their tables of recommendations. Many new tools have been developed since ORES was released that may not have been developed at all otherwise. For example, the Wikimedia Foundation product department developed a complete redesign of MediaWiki's Special:RecentChanges interface that implements a set of powerful filters and highlighting. They took the ORES Review Tool to its logical conclusion with an initiative that they referred to as Edit Review Filters. In this interface, ORES scores are prominently featured at the top of the list of available features, and they have been highlighted as one of the main benefits of the new interface to the editing community.
In a later paper, Halfaker explored, among other things, concerns about how AI systems like ORES might reinforce inherent bias.
A 2016 ProPublica investigation [4] raised serious allegations of racial biases in a ML-based tool sold to criminal courts across the US. The COMPAS system by Northpointe, Inc. produced risk scores for defendants charged with a crime, to be used to assist judges in determining if defendants should be released on bail or held in jail until their trial. This exposé began a wave of academic research, legal challenges, journalism, and organizing about a range of similar commercial software tools that have saturated the criminal justice system. Academic debates followed over what it meant for such a system to be fair or biased. As Mulligan et al. discuss, debates over these essentially contested concepts often focused on competing mathematically-defined criteria, like equality of false positives between groups, etc. When we examine COMPAS, we must admit that we feel an uneasy comparison between how it operates and how ORES is used for content moderation in Wikipedia. Of course, decisions about what is kept or removed from Wikipedia are of a different kind of social consequence than decisions about who is jailed by the state. However, just as ORES gives Wikipedia's human patrollers a score intended to influence their gatekeeping decisions, so does COMPAS give judges a similarly functioning score. Both are trained on data that assumes a knowable ground truth for the question to be answered by the classifier. Often this data is taken from prior decisions, heavily relying on found traces produced by a multitude of different individuals, who brought quite different assumptions and frameworks to bear when originally making those decisions.


Read more here

posted at: 12:00am on 31-Oct-2020
path: /Policy




Billy Mitchell's Defamation Case Against Twin Galaxies Over 'Donkey Kong' High Score Can Go Forward

Furnished content.


We've discussed Billy Mitchell a couple of times here at Techdirt, both times due to his overtly litigious nature rather than his apparent video game playing prowess. See, Mitchell is rather well known primarily as the record holder for video game scores, including briefly holding the Guinness World Record for a Donkey Kong high score, until he was stripped of it. Twin Galaxies, an official tracker of such video game records, determined based on video evidence that Mitchell wasn't playing an official version of the arcade cabinet of the game. Upon being stripped of his records, Mitchell sued for... defamation. Oh, and he also sued the Cartoon Network over a very clear parody depiction in part inspired by his gregarious personage. But back to the defamation suit, which Mitchell filed against Twin Galaxies. He recently got the court to allow him to proceed to trial after Twin Galaxies brought an anti-SLAPP motion.

In his ruling on Twin Galaxies' anti-SLAPP motion, Judge Gregory Alarcon ruled that Mitchell is a public figure in the gaming community and that Twin Galaxies was discussing a controversy of public interest to that community. That means Mitchell will have to prove at trial both the falsity of Twin Galaxies' claims and that the organization acted with "actual malice" in making them. While the ruling is careful not to "weigh evidence or resolve conflicting factual claims" on that score until the full trial, Judge Alarcon does tip his hand a little as to what evidence he finds potentially compelling. In particular, the judge seems interested in why Twin Galaxies refused to interview a number of witnesses Mitchell put forward to testify to the authenticity of his score performances.
Apparently at issue is that Mitchell wanted a witness, specifically the referee of his Donkey Kong high score, to be interviewed and considered by Twin Galaxies prior to their having negated Mitchell's high scores. Here's the problem: that referee has also been banned by Twin Galaxies for cheating in trying to get high scores in an unrelated video game. Still, the court notes that, at this stage, the deference is given to the plaintiff and, in that specific light, Twin Galaxies' refusal to consider witness testimony is enough to let this proceed. Mitchell, true to his self-promoting history, is taking a victory lap on all this, as though he'd won a trial. He hasn't. Instead, he's opened himself up to discovery.
Even as Mitchell has met his burden of "minimal merit" in the anti-SLAPP motion, Judge Alarcon also writes that Twin Galaxies has "satisfied the low burden to show a reasonable possibility of prevailing in this action" in a separate motion. The scoreboard has presented evidence to "[support] that its statement does not show actual malice," the judge writes, and which "supports that Twin Galaxies did not harbor doubt as to the truth of its statement, as its statement was made after Twin Galaxies' lengthy investigation on the dispute. The testimony of [Twin Galaxies owner and CEO Jace] Hall's belief that eyewitness evidence was unnecessary may reasonably go in [Twin Galaxies'] favor on this point, undermining [Mitchell]'s claim that [Twin Galaxies] acted with reckless disregard of the truth."
It's important to be able to keep two things in your head at once: that Mitchell may well have validly broken the Donkey Kong record as he claims and that Twin Galaxies did not do anything close to reaching the actual malice standard required for a defamation case of a public figure. And, despite Mitchell's public statements to the contrary, the lawyer for Twin Galaxies doesn't seem particularly worried.
Mitchell was helped in the anti-SLAPP motion, Tashroudian says, by the fact that "at this early stage the court is bound to accept whatever Mitchell puts forward as true." That includes a lot of what Tashroudian calls hearsay evidence involving phone calls where Hall allegedly told Mitchell and Twin Galaxies founder Walter Day that he "didn't care about certain evidence." "The court is not allowed to determine the credibility of these statements [at this point] and must accept them as true," Tashroudian tells Ars. "[At trial] we'll be able to show that Mitchell is not credible because we have numerous situations of documented falsehoods in his papers. I'm confident that... after all of the evidence has been adduced, and when Billy is deposed and not allowed to hide behind declarations, the truth will come out."
I have no tea leaves to read, but those are the words of an attorney quite confident in the video evidence his client has to back up its statements. All I'll say is that Mitchell had damned well better be right that he broke those Donkey Kong records on a legit and standard Donkey Kong machine if he's really going to proceed to discovery.

Read more here

posted at: 12:00am on 31-Oct-2020
path: /Policy



