October 18, 2018

Image from UK Black Tech’s stock photo project to increase Open Licensed photos of black people in business and tech – Wikimedia Commons CC BY-SA 4.0

So you’re a business. You’ve got a company that’s number 3 in the UK at making spoons, or something like that, and you want to make sure that when people search for your company, they can see you’re legit, because a Wikipedia page confers an aura of legitimacy on your noble pursuit of creating the best spoons in the land.

You tried to make a page for your spoon business before, but for some reason it disappeared. No doubt the anti-spoon lobby have got their knives out for you in their cynical attempt to stop people using your quality products. You’ve found the charity responsible for Wikipedia in the UK (that’s us!) and you want to know how you can get your spoon business listed on Wikipedia.

I’m afraid that we may have some bad news for you. You see, Wikipedia is not a business directory. It’s not the Yellow Pages or whatever website has put the Yellow Pages out of business. So you probably need to stop and think ‘is my business notable enough to be in an encyclopaedia?’ It’s estimated that there are somewhere around 200 million companies in the world, so only very few of these will be famous enough to appear in an encyclopaedia.

Maybe you don’t know the answer because you’re not sure what makes something notable enough to be on Wikipedia. Well, luckily we have a set of Notability guidelines for that.

The basic criterion for notability is that “a topic has received significant coverage in reliable sources that are independent of the subject”. So I’m afraid that links to your own site, quotes in articles about another subject, or references to other self-published sources like blogs, petitions or social media posts just won’t meet this standard.


A presentation on verifiability and notability – Wikimedia Commons CC BY-SA 4.0

This standard isn’t supposed to be easy to meet. Your business might be doing really well, it might make the biggest spoons in Britain, but if you’ve not had the Times, or at least the local newspaper, down to cover your amazing spoon production in an article which is specifically about your business, then as far as Wikipedia is concerned, it’s not going to be notable enough. But don’t get disappointed. If you want your spoons to be famous, you need to concentrate on getting some media coverage for those spoons. Wikipedia can only cover what has been already published elsewhere.

If your company is notable, it’s likely that someone will eventually get around to creating a page for it. You’re just going to have to be patient. If you try to create the page yourself, without really understanding the core rules of Wikipedia, you might make some mistakes, like putting in Non-Neutral Point of View language, which will show others that you might be connected to the subject matter, and result in the article’s deletion for Conflict of Interest (CoI) editing.

You should also most definitely not pay someone to create a page for you. Paid editing, without a declaration that someone is being paid to edit, is against the rules. If the page for a company keeps getting made and then deleted, editors may ban the creation of the page indefinitely. In 2015 Wikipedia editors uncovered a group trying to make money by scamming businesses by telling them they could make and protect their company’s Wikipedia articles.

The main lesson in this is that if you are going to use Wikipedia properly, you really have to understand how it works. You can’t just stumble into it and start changing important things without appreciating what you’re allowed to change and what kinds of edits are acceptable. On English Wikipedia, you can’t even create new articles anymore without having a registered account with a certain number of edits.

We recognise that this can be frustrating and offputting to some businesses who could theoretically have good reason to interact with Wikipedia. However, there are things your company could consider doing to make it more likely that someone will create a page for you. You could consider releasing photos of your company or its products under an Open, Creative Commons license, meaning that these photos can be used on Wikipedia.

All the content on Wikipedia is shared on Open Licenses, so we can’t use any media about your company unless you publish it specifically on an Open License. The Welsh music label, Sain Records, released the cover art of many of their Welsh-language records on Open Licenses, along with 30-second clips of some of their artists’ songs. This means there is now much better coverage of the company and its products on Wikipedia.

A guide to the different types of Creative Commons Open License, and what you are allowed to do with the content published on each one. Image via ANDS.

I have been trying to do outreach to the music industry to encourage them to donate content, like photos of their artists, which Wikipedia editors can use to improve pages on notable musicians. There are lots of black and ethnic minority musicians who don’t have pages on Wikipedia, and we would like to change that. Again, we don’t encourage people who work for music companies to make pages about their artists, but if those companies would like to work with Wikimedia UK, we could organise Wikipedia editing workshops for fans of the artists, and use photos donated by the artists’ companies to create pages for notable people who deserve to be on the encyclopaedia.

We’ve already had a very fruitful collaboration with the Parliamentary Digital Service, who released official parliamentary photos of MPs in 2017, and you will now see that most MPs’ pages use their official photograph in the infobox on the right of the page.

The best way to learn how Wikipedia works is to get involved. Come to our events. Come to meetups to talk to other Wikimedians and ask their advice. The community is huge, and has over the past 18 years created a complex set of rules to govern the living, constantly changing nature of Wikipedia. We think it’s an amazing achievement, and that’s why we treat it as so much more than an advertising platform.

“I’ve done a complete 180,” commented Marie Butcher, who is currently teaching her first Wiki Education-supported courses this fall. “When I was first approached by Amy [to participate in a Wikipedia teaching project], I was a skeptic. But she’s converted me.” Marie is referring to her colleague, Amy Collier, the Middlebury Associate Provost for Digital Learning and leader of Middlebury’s Newspapers on Wikipedia team. “I believe this will be a hugely productive project, especially if we can get into student heads that Wikipedia can be a great first stop for learning, but should be challenged as well.”

Marie and her colleagues at the Middlebury Institute of International Studies in Monterey, California, and Middlebury College in Vermont were brought to the idea of teaching with Wikipedia via the Newspapers on Wikipedia project, an initiative working to create and expand articles about newspapers that have no information or very little information on a Wikipedia entry. But not all instructors in the Middlebury network are focused on work related to newspapers, so in September, I had the opportunity to visit the Monterey group in person and host a discussion, also attended by remote staff from Vermont, about the value of using Wikipedia in education. Many attended because they were curious to learn more about Wikipedia broadly, while others noted an interest in expanding their use of technology in their teaching.

Together, we discussed some of the benefits of students working on Wikipedia, especially around digital fluency, and giving students an authentic place to practice fact checking. We also discussed some of the challenges, especially around students not knowing how to use Wikipedia as a research resource. In fact, Middlebury College has a history of pushing back against Wikipedia, even making the news in 2007 when “a history department banned citing Wikipedia as a research resource.” With the help of Wiki Education’s online tools and resources, however, we can support students through the fact checking and research process, guiding them to an understanding of Wikipedia’s policies, and supporting them as they work with their instructors and librarians to select topics for their research that have significant coverage by secondary sources.

Marie joins fellow instructor Gabriel Guillen in teaching their first courses with us this fall. We are excited to continue growing our support of courses in the Middlebury network over the next few years. A special thank you to Bob Cole, Director of Exploratory Initiatives and Partnerships, who helped set up our workshop. If you’d like to get involved in these initiatives, please join us by assigning your students to contribute to Wikipedia as part of a course research project in your next term! To find out more visit teach.wikiedu.org or email us at contact@wikiedu.org. You can also join Middlebury faculty and students in their NOW edit-a-thon October 26th from 12-4 Pacific (3-7 Eastern).

One of the hazards of contributing to Wikipedia is that one does not read enough of what is on it. Bumping into a series of interesting paintings of South African birds I looked up the artist marked as Sergeant C. G. Davies. Turns out that he was Claude Gibney Finch-Davies, a somewhat lesser known artist. Born in Delhi in 1875 he went to England and joined the army in South Africa. Somewhere along the line he picked up an interest in birds and art. A couple of biographies have been written about him by A C Kemp, but it would seem like he has largely been unknown, partly due to something he did that blemished his career and led perhaps to his death/suicide. His keen interest in illustration led him to remove plates from books in the museums and libraries that he referred to. Today there are probably art collectors who must be eager to steal this man's paintings.

The Natural History Museum in London holds some of his unpublished notebooks and paintings. Fortunately for us his paintings are out of copyright since 70 years have passed since his untimely death. Some of his paintings can be found on Wikimedia Commons.

His biography on Wikipedia is interesting but some of the details seem to be untraceable - it says [emphasis mine]:
He was born in Delhi, India, the third child and eldest son of Major-General Sir William and Lady Elizabeth B. Davies née Field. His father later became Governor of Delhi and was awarded the Order of the Star of India, while his mother was said to be an expert on Indian snakes.
The names of the mother and father are confirmed elsewhere as well. But it is odd that no further information is found on his father in the ODNB. Does anyone know further details and sources?

October 16, 2018

What insulates Wikipedia from the criticisms other massive platforms endure? We explored some answers—core values, lack of personalization algorithms, and lack of data collection—in last week’s “How Wikipedia Dodged Public Outcry Plaguing Social Media Platforms.”

But wait, there’s more:

Wikipedia moderation is conducted in the open.

“The biggest platforms use automated technology to block or remove huge quantities of material and employ thousands of human moderators.” So says Mark Bunting in his July 2018 report “Keeping Consumers Safe Online: Legislating for platform accountability for online content.” Bunting makes an excellent point, but he might have added a caveat: “The biggest platforms, like Facebook, Twitter, and YouTube, but not Wikipedia.”

Wikipedia, one of the top web sites worldwide for about a decade, works on a different model. The volunteers writing Wikipedia’s content police themselves, and do so pretty effectively. Administrators and other functionaries are elected, and the basic structure of Wikipedia’s software helps hold them accountable: actions are logged, and are generally visible to anybody who cares to review them. Automated technology is used; its coding and its actions are transparent and subject to extensive community review. In extreme cases, Wikimedia Foundation staff must be called in, and certain cases (involving extreme harassment, outing, self-harm, etc.) require discretion. But the paid moderators certainly don’t number in the thousands; the foundation employs only a few hundred staff overall.

More recently, a Motherboard article explored Facebook’s approach in greater depth: “The Impossible Job: Inside Facebook’s Struggle to Moderate Two Billion People.” It’s a long, in-depth article, well worth the read.

One point in that article initially stood out to me: presently, “Facebook is still making tens of thousands of moderation errors per day, based on its own targets.” That’s a whole lot of wrong decisions, on potentially significant disputes! But if we look at that number and think, “that number’s too high,” we’re already limiting our analysis to the way Facebook has presented the problem. Tech companies thrive on challenges that can be easily measured; it’s probably a safe bet that Facebook will achieve something they can call success…that is, one that serves the bottom line. Once Facebook has “solved” that problem, bringing the errors down to, say, the hundreds, Facebook execs will pat themselves on the back and move on to the next task.

The individuals whose lives are harmed by the remaining mistakes will be a rounding error, of little concern to the behemoth’s leadership team. On a deeper level, the “filter bubble” problem will remain; Facebook’s user base will be that much more insulated from information we don’t want to see. Our ability to perceive any kind of objective global reality—much less to act on it—will be further eroded.

As artificial intelligence researcher Jeremy Lieberman recently tweeted, we should be wary of a future in which “…news becomes nearly irrelevant for most of us” and “our own private lives, those of our friends; our custom timelines become the only thing that really matters.” In that world, how do we plan effectively for the future? When we respond to natural disasters, will only those with sufficiently high Facebook friend counts get rescued? Is that the future we want?

It’s not just moderation—almost all of Wikipedia is open.

“If you create technology that changes the world, the world is going to want to govern [and] regulate you. You have to come to terms with that.” —Brad Smith, Microsoft, May 2018. As quoted in Bunting (2018).

From the start, Wikipedia’s creators identified their stakeholders, literally, as “every single human being.” This stands in stark contrast to companies that primarily aim to thrive in business. Wikipedia, on the whole, is run by a set of processes that is open to review and open to values-based influence.

This point might elicit irate howls of frustration from those whose ideas or efforts have been met with a less-than-respectful response. Catch me on another day, and the loudest howls might be my own. But let’s look at the big picture, and compare Wikipedia to massive, corporate-controlled platforms like YouTube, Facebook, or Google.

  • Wikipedia’s editorial decisions are made through open deliberation by volunteers, and are not subject to staff oversight.
  • Most actions leading up to decisions, as well as decisive actions themselves, are logged and available to public review and comment.
  • It’s not just the content and moderation: the free software that runs Wikipedia, and the policies that guide behavior on the site, have been built through broad, open collaboration as well.
  • The Wikimedia Foundation has twice run extensive efforts to engage volunteers in strategic planning, and in many instances has effectively involved volunteers in more granular decision-making as well.

There is room for improvement in all these areas, and in some cases improvement is needed very badly. But inviting everyone to fix the problems is part of what makes Wikipedia thrive. Treating openness as a core value invites criticism and good faith participation, and establishes a basic framework for accountability.

“While principles and rules will help in an open platform, it is values that [operators of platforms] should really be talking about.” — Kara Swisher in the New York Times, August 2018.

Wikipedia lacks relentless public relations & financial shareholders.

There’s another frequently overlooked aspect of Wikipedia: financially speaking, the site is an ant among elephants.

The annual budget of the Wikimedia Foundation, which operates Wikipedia, is about $120 million. That may sound like a lot, but consider this: Just the marketing budget of Alphabet (Google’s parent company) is more than $13 billion.

In terms of the value Wikipedia offers its users, and the respect it shows for their rights, Wikipedia arguably outstrips its neighbors among the world’s top web sites. But it does so on a minuscule budget.

Wikipedia doesn’t have armies of public relations professionals or lobbyists making its case. So part of the reason you don’t hear more about Wikipedia’s strategy and philosophy is that there are fewer professionals pushing that conversation forward. The site just does its thing, and its “thing” is really complex. Because it works fairly well, journalists and policymakers have little incentive to delve into the details themselves.

Wikipedia also doesn’t have armies of stockholders exerting pressure, forcing the kind of tension between profit and ethics that often drives public debate.

Wikipedia is driven by philosophical principles that most would agree with; so the issues that arise are in the realm of implementation. There is little pressure to compromise on basic principles. Tensions between competing values, like business interests vs. ethical behavior, drive the debate over corporate-controlled platforms; but those tensions basically don’t exist for Wikipedia.

In 1973, video artist and provocateur Richard Serra produced the short film “Television Delivers People.” It suggested that those consuming “free” television were not the customers, but the product…being sold to advertisers. In the Internet era, the notion has been frequently applied to media companies. Reasonable people might debate how well this line of thinking applies to various media and social media companies. But with Wikipedia, unique among major Internet platforms, this particular criticism clearly does not apply.

Concluding thoughts

The reasons you don’t hear much about Wikipedia’s governance model are that it is rooted in clearly articulated principles, works fairly well, is reasonably open to benevolent influence, and lacks a public relations campaign.

Those are all good things—good for Wikipedia and its readers. But what about the rest of the Internet? The rest of the media world, the rest of society? If the notion of objective truth is important to you, and if you’re concerned about our access to basic facts and dispassionate analysis in today’s rapidly shifting media landscape, you might want to challenge yourself to learn a bit more about how Wikipedia has been made and how it governs itself…even if you have to dig around a bit to do so.

This article was also published on LinkedIn and Medium.

Everybody has an opinion about how to govern social media platforms. It’s mostly because they’ve shown they’re not too good at governing themselves. We see headlines about which famous trolls are banned from what sites. Tech company executives are getting called before Congress, and the topic of how to regulate social media is getting play all over the news.

Wikipedia has problematic users and its share of controversies, but as web platforms have taken center stage in recent months, Wikipedia hasn’t been drawn into the fray. Why aren’t we hearing more about the site’s governance model, or its approach to harassment and bullying? Why isn’t there a clamor for Wikipedia to ease up on data collection? At the core, Wikipedia’s design and governance are rooted in carefully articulated values and policies, which underlie all decisions. Two specific aspects of Wikipedia inoculate it from some of the sharpest critiques endured by other platforms.

Wikipedia exists to battle fake news. That’s the whole point.

Wikipedia’s fundamental purpose is to present facts, verified by respected sources. That’s different from social media platforms, which have a more complex project…they need to maximize engagement, and get people to give up personal information and spend money with advertisers. Wikipedia’s core purpose involves battling things like propaganda and “fake news.” Other platforms are finding they need to retrofit their products to address misinformation; but battling fake news has been a central principle of Wikipedia since the early days.

1. Wikipedia lacks “personalization algorithms” that get other kids in trouble.

The “news feed” or “timeline” of sites like Facebook, Twitter, or YouTube is the source of much controversy, and of much talk of regulation. These platforms feed their users content based on…well, based on something. Any effort to anticipate what users will find interesting can be tainted by political spin or advertising interests. The site operators keep their algorithms private. Each social media company closely guards its algorithm as valuable intellectual property, even as they tinker and test new versions.

That’s not how Wikipedia works. Wikipedia’s front page is the same for all users. Wikipedia’s volunteer editors openly deliberate about what content to feature. Controversies sometimes spring up, but even when they do, the decisions leading to them are transparent and open to public commentary.

Search within Wikipedia is governed by an algorithm. But relative to a Twitter feed, it’s fairly innocuous; when you search for something, there are probably only a handful of relevant Wikipedia articles, and they will generally come up in the search results. Much of the work that guides Wikipedia search is open, and is generated by Wikipedia’s user community: redirects, disambiguation pages, and “see also” links. And the MediaWiki software that drives the site, including the search function, is open source.

But even so, an ambitious Wikimedia Foundation executive tried to take bold action around the search algorithm a few years ago. The “Knowledge Engine” was conceived as a new central component of Wikipedia; artificial intelligence and machine learning would have taken a central role in the user experience. The plan was hatched with little regard for the values that drive the Wikipedia community, and was ultimately scuttled by a full-blown revolt by Wikipedia’s users and the Foundation’s staff. Would an algorithm-based approach to driving reader experience have exposed Wikipedia to the kind of aggressive scrutiny Twitter and Facebook now face? Perhaps the problems Wikipedia dodged in that tumultuous time were even bigger than imagined.

The Wikimedia Foundation’s fund-raising banners are driven by algorithms, too. These spark frequent debates, but even the design of those algorithms is somewhat transparent, and candid discussion about them is not unusual. Those of us who care deeply about Wikipedia’s reputation for honesty sometimes find significant problems with the fund-raising messages; but the impact of problems like these is limited to Wikipedia’s reputation, not the public’s understanding of major issues.

2. Wikipedia isn’t conspiring to track your every move.

Most web sites collect, use, and sell a tremendous amount of data about their users. They’ve gotten really sophisticated, and can surmise an incredible amount of information about us. But that’s a game that Wikipedia simply doesn’t play.

In 2010, the Wall Street Journal ran a series on how web sites use various technologies and business partnerships to track all kinds of information about their users. Journalists Julia Angwin and Ashkan Soltani were nominated for a Pulitzer Prize, and won the Loeb Award for Online Enterprise. It’s still relevant in 2018.

Even back then, coverage of the issue managed to neglect one vital fact: Wikipedia, unlike all the other top web sites, does not track your browsing history. The site barely captures any such information to begin with, and its operators don’t share it unless legally compelled. When considered by the Electronic Frontier Foundation in their “Who Has Your Back” report (and I’ll claim a little credit for their considering Wikipedia to begin with), the Wikimedia Foundation has earned stellar marks.

Why Wikipedia’s principled design matters

Wikipedia avoids scandal through two core aspects of how it functions: it doesn’t try to predict and guide what you encounter online, and it doesn’t capture and analyze user data.

It might be possible for social media platforms to constrain their approach to those activities enough to satisfy their critics. Just like it might be possible for a heroin addict to limit their use enough to function in society, or for a cabbie to minimize the possibility of a car wreck through attentive driving.

But it would have been safer for the heroin addict to avoid using heroin to begin with, or for the cabbie to have taken a desk job. That’s how it is with Wikipedia. The site has relentlessly kept its focus on its main goal of providing information, even to the exclusion of chasing money from advertisers or by reselling user data.

One benefit of that clarity of vision among the designers and maintainers of Wikipedia is that we’ve been able to govern ourselves reasonably well. Which means the government and media pundits aren’t trying to do it for us.

This article was also published on LinkedIn and Medium.

Concern about social media and the quality of news is running high, with many commentators focusing on bias and factual accuracy (often summarized as “fake news”). If efforts to regulate sites like Facebook are successful, they could affect the bottom line; so it would behoove Facebook to regulate itself, if possible, in any way that might stave off external action.

Facebook has tried many things, but they have ignored something obvious. It’s something that has been identified by peer reviewed studies as a promising approach since at least 2004…the same year Facebook was founded.

Instead of making itself the sole moderator of problematic posts and content, Facebook should offer its billions of users a role in content moderation. This could substantially reduce the load on Facebook staff, and could allow its community to take care of itself more effectively, improving the user experience with far less need for editorial oversight. Slashdot, once a massively popular site, proved prior to Facebook’s launch that distributing comment moderation among the site’s users could be an effective strategy, with substantial benefits to both end users and site operators. Facebook would do well to allocate a tiny fraction of its fortune to designing a distributed comment moderation system of its own.

Distributed moderation in earlier days

“Nerds” in the late 1990s or early 2000s, when most of the Internet was still a one-way flow of information for most of its users, had a web site that didn’t merely keep them informed, but let them talk through the ideas, questions, observations, or jokes that the (usually abbreviated and linked) news items would prompt. Slashdot, “the first social news site that gained widespread attention,” presented itself as “News for Nerds. Stuff that Matters.” It’s still around, but in those early days, it was a behemoth. Overwhelming a web site with a popular link became known as “slashdotting.” There was a time when more than 5% of all traffic to sites like CNET, Wired, and Gizmodo originated from Slashdot posts.

Slashdot featured epic comment threads. It was easy to comment, and its readers were Internet savvy almost by definition. Slashdot posts would have hundreds, even thousands, of comments. According to the site’s Hall of Fame, there were at least 10 stories with more than 3,200 comments.

But amazingly—by today’s diminished standards, at least—a reader could get a feel for a thread of thousands of messages in just a few minutes of skimming. Don’t believe me? Try this thread about what kept people from ditching Windows in 2002. (The Slashdot community was famously disposed toward free and open source software, like GNU/Linux.) The full thread had 3,212 messages; but the link will show you only the 24 most highly-rated responses, and abbreviated versions of another 35. The rest are not censored; if you want to see them, they’re easy to access through the various “…hidden comments” links.

As a reader, your time was valued; a rough cut of the 59 “best” answers out of 3,212 is a huge time-saver, and makes it practical to get a feel for what others are saying about the story. You could adjust the filters to your liking, to see more or fewer stories by default. As the subject of a story, it was even better; supposing some nutcase seized on an unimportant detail, and spun up a bunch of inaccurate paranoia around it, there was a reasonable chance their commentary would be de-emphasized by moderators who could see through the fear, uncertainty, and doubt.
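For readers who like to see the idea in code, here is a minimal sketch of that kind of threshold filtering. It is not Slashdot's actual implementation: the `Comment` class, the `split_by_threshold` function, the score values, and the two default thresholds are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    author: str
    text: str
    score: int  # hypothetical aggregate of the ratings given by moderators


def split_by_threshold(comments, full_at=4, abbreviated_at=2):
    """Partition a thread into full, abbreviated, and hidden comments.

    The thresholds are illustrative defaults that a reader could adjust;
    nothing is deleted, low-scoring comments are simply collapsed.
    """
    full = [c for c in comments if c.score >= full_at]
    abbreviated = [c for c in comments if abbreviated_at <= c.score < full_at]
    hidden = [c for c in comments if c.score < abbreviated_at]
    return full, abbreviated, hidden


# A made-up four-comment thread
thread = [
    Comment("alice", "Detailed explanation of driver support", 5),
    Comment("bob", "Me too", 1),
    Comment("carol", "Short but useful pointer", 3),
    Comment("dave", "Off-topic rant", -1),
]

full, abbreviated, hidden = split_by_threshold(thread)

# A reader who wants to see more can lower the bar for full display
more_full, _, _ = split_by_threshold(thread, full_at=3)
```

The point of the sketch is that the reader, not the site, chooses the cut-off, and that the hidden comments remain one click away rather than being removed.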

At first blush, you might think “oh, I see; Facebook should moderate comments.” But they’re already doing that. In the Slashdot model, the site’s staff did not do the bulk of the moderating; the task was primarily handled by the site’s more active participants. To replicate Slashdot’s brand of success, Facebook would need to substantially modify the way their site handles posts and comments.

Going meta

Distributed moderation, of course, can invite all sorts of weird biases into the mix. To fend off the chaos and “counter unfair moderation,” Slashdot implemented what’s known as “metamoderation.” The software gave moderators the ability to assess one another’s moderation decisions. Moderators’ decisions needed to withstand the scrutiny of their peers. I’ll skip the details here, because the proof is in the pudding; browsing some of the archived threads should be enough to demonstrate that the highly-rated comments are vastly more useful than the average comment.

Some Internet projects did study Slashdot-style moderation

For some reason, it seems that none of the major Internet platforms of 2018—Facebook, Twitter, YouTube, etc.—have ever experimented with meta-moderation.

From my own experience, I can affirm that some projects intending to support useful online discussion did, in fact, consider meta-moderation. In its early stages, the question-and-answer web site quora.com took a look at it; so did a project of the Sloan Foundation in the early days of the commentary tool hypothes.is.

If Facebook ever did consider a distributed moderation system, it’s not readily apparent. Antonio García Martínez, a former Facebook product manager, recently tweeted that he hadn’t thought about it at length, and expressed initial skepticism that it could work.

There are a few reasons why Facebook might be initially reluctant to explore distributed moderation:

  • Empowering people outside the company is always unsettling, especially when there’s a potential to impact the brand’s reputation;
  • Like all big tech companies, Facebook tends to prefer employing technical, rather than social, interventions;
  • Distributed moderation would require Facebook to put data to use on behalf of its users, and Facebook generally seeks to tightly control how its data is exposed;
  • Slashdot’s approach would require substantial modification to fit Facebook’s huge variety of venues for discussion.

Those are all reasonable considerations. But with an increasing threat of external regulation, Facebook should consider anything that could mitigate the problems its critics identify.

Subject of academic study

If you’ve used a site with distributed moderation, and a meta-moderation layer to keep the mods accountable, you probably have an intuitive sense of how well it can work. But in case you haven’t, research studies going back to 2004 have underscored its benefits.

According to researchers Cliff Lampe and Paul Resnick, Slashdot demonstrated that a distributed moderation system could help to “quickly and consistently separate high and low quality comments in an online conversation.” They also found that “final scores for [Slashdot] comments [were] reasonably dispersed and the community generally [agreed] that moderations [were] fair.” (2004)

Lampe and Resnick did acknowledge shortcomings in the meta-moderation system implemented by Slashdot, and stated that “important challenges remain for designers of such systems.” (2004) Software design is what Facebook does; it’s not hard to imagine that the Internet giant, with annual revenue in excess of $40 billion, could find ways to address design issues.

The appearance of distributed moderation…but no substance

In the same year that Lampe and Resnick published “Slash(dot) and burn” (2004), Facebook launched. Even going back to the site’s earliest days, the benefits of distributed meta-moderation had already been established.

Facebook, in the form it’s evolved into, shares some of the superficial traits of Slashdot’s meta-moderation system. Where Slashdot offered moderators options like “insightful,” “funny,” and “redundant,” Facebook offers options like “like,” “love,” “funny,” and “angry.” The user clicking one of those options might feel as though they are playing the role of moderator; but beneath the surface, in Facebook’s case, there is no substance. At least, nothing to benefit the site’s users; the data generated is, of course, heavily used by Facebook to determine what ads are shown to whom.

In recent years, Facebook has offered a now-familiar bar of “emoticons,” permitting its users to express how a given post or comment makes them feel. Clicking the button puts data into the system; but it’s only Facebook, and its approved data consumers, who get anything significant back out.

When Slashdot asked moderators whether a comment was insightful, funny, or off-topic, that information was immediately put to work to benefit the site’s users. By default, readers would see only the highest-rated comments in full, and would see a single “abbreviated” line for those with medium ratings, and would have to click through to see everything else. Those settings were easy to change, for users preferring more or less in the default view, or within a particular post. Take a look at the controls available on any Slashdot post:
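For readers who prefer code to screenshots, the threshold behaviour described above can be sketched roughly as follows. This is only an illustration, not Slashdot's actual implementation: the `Comment` class, the score scale, and the default thresholds `full_at` and `abbreviated_at` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    author: str
    text: str
    score: int  # aggregate moderation score; the scale is assumed


def render(comment: Comment, full_at: int = 4, abbreviated_at: int = 2) -> str:
    """Return how a comment appears to a reader with the given thresholds."""
    if comment.score >= full_at:
        # Highest-rated comments are shown in full.
        return f"{comment.author}: {comment.text}"
    if comment.score >= abbreviated_at:
        # Medium-rated comments are shown as a single abbreviated line.
        return f"{comment.author}: {comment.text[:40]}... (click to expand)"
    # Everything else is still reachable, but only after a click.
    return "(hidden; click to show)"
```

A reader who wants to see more simply lowers the thresholds, for example `render(comment, full_at=0)`; nothing is deleted, it is only ranked differently.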

Where Facebook’s approach falls short

Facebook’s approach to evaluating and monitoring comments falls short in several ways:

  1. It’s all-or-nothing. With Slashdot, if a post was deemed “off topic” by several moderators, it would get a low ranking, but it wouldn’t disappear altogether. A discerning reader, highly interested in the topic at hand and anything even remotely related, might actually want to see that comment; and with enough persistence, they would find it. But Facebook’s moderation—whether by Facebook staff or the owner of a page—permits only a “one size fits all” choice: to delete or not to delete.
  2. Facebook staff must drink from the firehose. When the users have no ability to moderate content themselves, the only “appeal” is to the page owner or to Facebook staff. Cases that might be easily resolved by de-emphasizing an annoying post either don’t get dealt with, or they get reported. Staff moderators have to process all the reports; but if users could handle the more straightforward cases, the load on Facebook staff would be reduced, permitting them to put their attention on the cases that really need it.
  3. Too much involvement could subject Facebook to tough regulation as a media company. There is spirited debate over whether companies like Facebook should be regarded as a media company or a technology platform. This is no mere word game; media companies are inherently subject to more invasive regulation. Every time Facebook staff face a tricky moderation decision, that decision could be deemed an “editorial” decision, moving the needle toward the dreaded “media company” designation.

Facebook must learn from the past

Facebook is facing substantial challenges. In the United States, Congress took another round of testimony last week from tech executives, and is evaluating regulatory options. Tim Wu, known for coining the term “net neutrality,” recently argued in favor of competitors to Facebook, perhaps sponsored by the Wikimedia Foundation; he now says the time has come for Facebook to be broken up by the government. In the same article, antitrust expert Hal Singer paints a stark picture of Facebook’s massive influence over innovative competitors: “Facebook sits down with someone and says, ‘We could steal the functionality and bring it into the mothership, or you could sell to us at this distressed price.’” Singer’s prescription involves changing Facebook’s structure, interface, network management, and dispute adjudication process. Meanwhile in Europe, the current push for a new Copyright Directive would alter the conditions in which Facebook operates.

None of these initiatives would be comfortable for Facebook. The company has recently undertaken a project to rank the trustworthiness of its users; but its criteria for making such complex evaluations are not shared publicly. Maybe this will help them in the short run, but in a sense they’re kicking the can down the road; this is yet another algorithm outside the realm of public scrutiny and informed trust.

If Facebook has an option that could reduce the concerns driving the talk of regulation, it should embrace it. According to Lampe and Resnick, “the judgments of other people … are often the best indicator of which messages are worth attending to.” Facebook should explore an option that lets them tap an underutilized resource: the human judgment in its massive network. The specific implementation I suggest was proven by Slashdot; the principle of empowering end users also drove Wikipedia’s success.

Allowing users to play a role in moderating content would help Facebook combat the spread of “fake news” on its site, and simultaneously demonstrate good faith by dedicating part of its substantial trove of data to the benefit of its users. As Cliff Lampe, the researcher quoted above, recently tweeted: “I’ve been amazed, watching social media these past 20 years, that lessons from Slashdot moderation were not more widely reviewed and adopted. Many social sites stole their feed, I wish more had stolen meta-moderation.”

All platforms that feature broad discussion stand to benefit from the lessons of Slashdot’s distributed moderation system. To implement such a system will be challenging and uncomfortable; but big tech companies engage with challenging software design questions routinely, and are surely up to the task. If Facebook and the other big social media companies don’t try distributed moderation, a new project just might; and if a new company finds a way to serve its users better, Facebook could become the next Friendster.

This article was also published on LinkedIn and Medium.

Maybe you’ve already heard the story of how the global edit-a-thon known as Art+Feminism got started. It goes something like this:

Five years ago, four friends—Siân Evans, Jacqueline Mabey, Michael Mandiberg, and Laurel Ptak—gathered together to discuss an idea for promoting Wikipedia as a place to challenge one of the ways women are silenced: through the preservation of information. That discussion became the Art+Feminism campaign.

Our goals today still revolve around combating gender inequity on the internet, using Wikipedia as a tool for correcting the written record on cis and trans women. And in the last year, all of us—from the leadership collective to the thousands of organizers, artists, librarians, activists, and editors who make up our global Art+Feminism community—have experienced many lessons, challenges, and triumphs.

Take the month of March, for example. Over 4,000 people at more than 315 Art+Feminism events around the world came together to create or improve nearly 22,000 pages on Wikipedia, with a total of 43,000 content pages created or improved! That’s four times the output of our 2017 events. Four times. Out of 357 initiatives across 80 countries, we were named as 1 of 5 finalists in the #EQUALSinTech Leadership Award Category.

Our fifth-annual Wikipedia edit-a-thon at the Museum of Modern Art, an all-day event designed to generate coverage of feminism and the arts on Wikipedia, took place on 3 March 2018 with hundreds of partner events around the world. It featured tutorials for the beginner Wikipedian, ongoing editing support, training on combating implicit and explicit biases, reference materials, childcare, and refreshments, with the leading panel “Careful with Each Other, Dangerous Together,” about the relationship between structures of inequality on the internet, the emotional labor of internet activism, and creating inclusive online communities with Caroline Sinders, Sydette Harry, Salome Asega, and Sarah Jaffe.

Art+Feminism’s regional organizers continue to amplify the way the project resonates with and reaches people all over the world. This year, we focused on Latin America, where Melissa Tamani has done concentrated outreach, quadrupling the number of events in that region compared to last year. These nearly 30 events stretched from Laboratorio Cultural del Norte, Chihuahua, Mexico, to the Museo Nacional de Bellas Artes de Santiago.

Panel discussion about Art+Feminism at The Museum of Modern Art, New York City, 3 March 2018.

Our success is because of the commitment that we share with hundreds of organizers around the world to see the voices of cis and trans women made visible and their achievements shared just as widely as those of their male-identified peers. We want to see justice done, and each year we work to refine our strategies, our organizing, and our materials so that they are made even more accessible than the year before. With that goal in mind, we launched our Quick Guide for Organizers and our Quick Guide for (New) Editors. Both guides have been made available on our training materials page in English and Spanish. On 30 September of this year, we are launching our redesigned training guides for organizers in PPT form in English and Spanish, with more guides to follow translated into French and at least one African language, with the goal of translating our materials for new editors and organizers into at least five additional languages in the next three years of our campaign. In addition to content focused on quick tips for organizing edit-a-thons and editing Wikipedia for the first time, we’ll be launching even more materials in the future focused specifically on advancing gender justice on the internet, designing brave/safe space policies through a lens of knowledge equity and anti-harassment, and more.

Beginning this project, we knew that our role would not only be to empower cis and trans women to edit online but to stand with them as they are challenged by those who do not see value in their perspective nor value in them. Art+Feminism is about making Wikipedia a more complete and representative source of information, but it doesn’t end there for us. It’s about dismantling systems of thought that diminish or erase entirely the place marginalized people and their communities have in our history.

As we have addressed many glaring omissions about women on Wikipedia, we have seen our focus turn towards improving existing articles: for example, the first year we created 101 new articles, and improved roughly the same number, while this year we improved 7 times as many articles as we created.

Our task is to take what we’ve made to the next level. From there, our task is to leverage the reach and impact of our social justice community to recognize that Wikipedia is just one of many tools that can be used to combat gender inequity and biased, informative content on the internet. We have so much to do, and we’re ready to take on the task of continuing to iterate and improve as our campaign evolves and adapts as it needs to.

Look at what we’ve done—and there’s even more to come.

McKensie Mack, Director

We couldn’t do this work without our supporters, our partners, and our Regional Ambassadors: AfroCROWD; Black Lunch Table; Women in Red; the Professional Organization for Women in the Arts (POWarts); and The Museum of Modern Art. Art+Feminism receives support from Qubit New Music and Wikimedia New York City. The Art+Feminism leadership collective includes Mohammed Sadat Abdulai; Stacey Allan; Amber Berson; Sara Clugage; Richard Knipel; Stuart Prior; Melissa Tamani; and Addie Wagenknecht.

Jay Rowland, student at Arizona State University.
Image: File:Jay Rowland headshot.png, Bovina17, CC BY-SA 4.0 via Wikimedia Commons.

When Jay Rowland heard he would be adding content to Wikipedia as a class assignment, he was excited for what he would learn. “I thought that it would be good to learn how to edit Wikipedia articles,” says Jay. Wikipedia is a resource that students use all the time, but often without the robust digital literacy skills to discern the trustworthiness of what they’re reading. By learning how to create and expand Wikipedia articles as an assignment, students learn the ins and outs of the platform, gaining valuable information literacy and research skills in the process.

Not only is the assignment greatly beneficial for students, but the impact of their work also extends beyond the classroom. Most Wikipedia articles are comprehensive and well-sourced, but when it comes to academic topics, articles are incomplete or simply missing. That’s where students can make a difference. Thousands of higher education instructors have taught their students how to create and expand Wikipedia content using our tools and support, making the online encyclopedia better for all future readers.

Jay improved the article about Evacuation in the Soviet Union, a mass migration of people and industries out of the Soviet Union during German military occupation in 1941. The topic fit well within his course, Stalinism Society and Culture in the USSR, 1924-1953, taught by Dr. Benjamin Beresford at Arizona State University.

Joe Cantu, student at Arizona State University.
Image: File:Me in Chicago.jpg, Joecantu1134, CC BY-SA 4.0 via Wikimedia Commons.

Jay worked with a number of other students in his class to improve the article, including Joe Cantu. Joe spoke to the different nature of a Wikipedia assignment as compared to a traditional research paper. “Typically I have always written persuasive essays. But when writing for Wikipedia you have to present just the facts, without bias.” Encyclopedic writing differs heavily from argumentative writing in this way. Often students find that writing a paragraph on an encyclopedia can be just as much work as a whole paper, sometimes more. It’s an exercise in conciseness; students must include only essential and well-sourced information in the final product. Students often consult many more sources than in a traditional research paper, as well. The Evacuation in the Soviet Union article, for example, now boasts 35 distinct sources in its reference section, as opposed to the 4 that were cited before the students improved it.

With a world audience, Dr. Beresford’s students felt pressure to present accurate, thorough information. “It wasn’t until I saw the many steps and procedures that it took to create the article that I became a little hesitant in not wanting to put wrong information into the subject,” says Jay. “Wikipedia is not accepted as a reliable source, but it is a website that is constantly viewed by everyone to get acquainted with the topic or subjects one is investigating.” Although Jay felt an added pressure to produce quality work, the public audience aspect of the assignment was also his favorite part. Producing an article that millions could use to learn something new was exciting, “just knowing that you had a part in creating that.”

The students’ hard work paid off. Not only is the improved article well-written and well-sourced now, but Jay finds pride in having made it better with his classmates. “Compared to a traditional class assignment, the Wikipedia project was more fulfilling in that I can always go back to the Evacuation in the Soviet Union article and still make edits if new material arises or make grammatical adjustments to previously written paragraphs or sentences that we initially missed.” Take a look at the dramatic difference in the before and after!

Taking the time to improve Wikipedia does a public service to so many more people beyond the classroom. “We need to make Wikipedia somewhat of a reliable first stop for students, because we know they are looking at this website first. Everybody does,” Jay says, “More colleges and universities need to participate in the drafting and editing of Wikipedia articles in the future. Think of the millions of reports or papers that students write every year.” That hard work could instead be channeled into a public resource for the benefit of all.

If you want to learn more or incorporate Wikipedia into your own course, visit teach.wikiedu.org or reach out to contact@wikiedu.org with questions.

Image: File:Music Auditorium ASU Tempe AZ 220398.JPG, Wars, CC BY-SA 3.0 via Wikimedia Commons.

October 15, 2018

For the past couple of months, in collaboration with researchers, I've been applying machine learning to RUM metrics in order to model the microsurvey we've been running since June on some wikis. The goal being to gain some insight into which RUM metrics matter most to real users.

Having never done any machine learning before, I made a few rookie mistakes. In this post I'll explain the biggest one, which led us to believe for some time that we had built a very well-performing model.

Class imbalance

The survey we're collecting user feedback with has a big class imbalance issue when it comes to machine learning. A lot more people are happy about the performance than people who are unhappy (a good problem to have, for sure!). In order to build a machine learning model that works, we used a common strategy to address this: undersampling. The idea is that in a binary classification, if you have too many of one of the two values, you just discard the excess data for that type.

Sounds simple, right? In Python/pandas it looks something like this:

dataset.sort_values(by=[column_prefix + 'response'], inplace=True)
negative_responses_count = dataset[column_prefix + 'response'].value_counts()[-1]
dataset = dataset.head(n=int(negative_responses_count) * 2)

Essentially we sort by value, with the ones we have the fewest of at the top, then we use head() to get the first N records, where N is twice the number of negative survey responses. With this, we should end up with exactly the same number of rows for each value (negative and positive response). So far so good.

Then we apply our machine learning algorithm to the dataset (for example, for a binary classification of this kind, random forest is a good choice). At first the results were poor, and then we added a basic feature we forgot to include: time. Time of day, day of the week, day of the year, etc. When adding these, things started to work incredibly well! Surely we discovered something groundbreaking about seasonality/time-dependence in this data. Or...

I've made a huge mistake

A critical mistake was made in the above code snippet. The original dataset has chronological records. When we sort by "response" value, this chronological order remains, within the context of each sorted section of the dataset.

We have to perform undersampling because we have too many positive survey responses over the full timespan. We start by keeping all the negative responses, which happen over the full timespan. But we only keep the first N positive responses... which, due to the chronological ordering of records, come from a much shorter timespan. In the same dataset we end up with rows that contain negative responses ranging for example from June 1st to October 1st. And positive responses only ranging from June 1st to June 15th, for instance.

The reason why the model started giving excellent results when we introduced time as a feature, is that it basically detected the date discrepancy in our dataset! It's pretty easy to guess that a response is likely positive if you look at its date. If the date is later than June 15th, everything in our dataset is negative responses... Our machine learning model just started excelling at detecting our mistake :)
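The effect is easy to reproduce on synthetic data. The sketch below is not the survey dataset: the column names, the 120-day range, and the one-in-ten rate of negative responses are all made up for illustration, and a stable sort is requested explicitly so that chronological order is guaranteed to survive within each class.

```python
import pandas as pd

# Hypothetical toy data: one response per day, in chronological order.
# Every 10th response is negative (-1); the rest are positive (1).
dates = pd.date_range("2018-06-01", periods=120, freq="D")
responses = [-1 if i % 10 == 0 else 1 for i in range(120)]
dataset = pd.DataFrame({"date": dates, "response": responses})

# Naive undersampling: sort by class, then keep the first 2 * N rows.
dataset = dataset.sort_values(by="response", kind="mergesort")
negative_count = int((dataset["response"] == -1).sum())
undersampled = dataset.head(n=negative_count * 2)

# Date range covered by each class after undersampling.
spans = undersampled.groupby("response")["date"].agg(["min", "max"])
print(spans)
```

In this toy case the 12 negative responses run from June 1st to September 19th, while the 12 positive responses that were kept all fall between June 2nd and June 14th. Any feature derived from the date separates the two classes almost perfectly.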

A simple solution

The workaround for this issue is simply to pick N positive responses at random over the whole timespan when undersampling, to make sure that the dataset is consistent:

dataset.sort_values(by=[column_prefix + 'response'], inplace=True)
negative_responses = dataset.head(n=int(negative_responses_count))
positive_responses = dataset.tail(n=int(dataset.shape[0] - negative_responses_count))
positive_responses = shuffle(positive_responses).head(n=int(negative_responses_count))
dataset = pandas.concat([negative_responses, positive_responses])

This way we ensure that we're not introducing a time imbalance when working around our class imbalance.
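The same idea can also be written without sorting at all, by sampling each class at random down to the size of the rarest one. This is an alternative sketch rather than the code we ran: the `undersample` helper, its `seed` parameter, and the toy data are assumptions for illustration, and it relies on pandas' `DataFrame.sample`.

```python
import pandas as pd


def undersample(dataset: pd.DataFrame, column: str, seed: int = 0) -> pd.DataFrame:
    """Randomly undersample every class down to the size of the rarest one."""
    smallest = int(dataset[column].value_counts().min())
    parts = [
        group.sample(n=smallest, random_state=seed)
        for _, group in dataset.groupby(column)
    ]
    # Restore the original (chronological) row order.
    return pd.concat(parts).sort_index()


# Hypothetical toy data: 120 daily responses, every 10th one negative.
dates = pd.date_range("2018-06-01", periods=120, freq="D")
toy = pd.DataFrame(
    {"date": dates, "response": [-1 if i % 10 == 0 else 1 for i in range(120)]}
)

balanced = undersample(toy, "response")
print(balanced["response"].value_counts())
print(balanced.groupby("response")["date"].agg(["min", "max"]))
```

Because the positive rows are drawn from the whole timespan, the date ranges of the two classes should now overlap instead of one being cut short.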

We are pleased to announce a $2 million gift to the Wikimedia Endowment from George Soros, one of the world’s leading philanthropists.

Soros is known for his extensive philanthropy to support ideals underpinning a free and open society, including access to knowledge, education, economic development and policy reform. He is also known for founding the Open Society Foundations, one of the preeminent international grantmaking networks supporting civil society groups around the world, and giving over $32 billion of his personal wealth to the organization.

“George’s generous gift to the future of free knowledge is reflective of his deep commitment to supporting openness in all its forms,” said Katherine Maher, Executive Director of the Wikimedia Foundation. “His gift will help us ensure the sum of all knowledge remains free and open for the benefit of generations to come.”

This gift provides vital momentum to the Wikimedia Endowment Campaign. Wikimedia believes that free knowledge is the foundation for human potential, opportunity, and freedom. Since the launch of the endowment in January 2016, the campaign has raised over $26.5 million from generous donors, philanthropists, and Wikimedia community members.

“The Endowment is not just a practical way to support Wikipedia,” Soros says. “My gift represents a commitment to the ideals of open knowledge—and to the long-term importance of free knowledge sources that benefit people around the world.”

“The Wikimedia Endowment guarantees that the next generation of Wikipedia readers and contributors will have even better educational opportunities than the previous generations had,” said Peter Baldwin, co-founder of the Arcadia Fund, Wikimedia Endowment Board member, and long-time supporter. “Time and again George has been a philanthropic leader in ensuring access and opportunity around the world, and his gift to the endowment furthers that for generations to come.”

Kaitlin Thaney, Endowment Director
Wikimedia Foundation

2018, week 42 (Monday 15 October 2018)

October 14, 2018

Mix’n’match is one of my more popular tools. It contains a number of catalogs, each in turn containing hundreds or even millions of entries, that could (and often should!) have a corresponding Wikidata item. The tool offers various ways to make it easier to match an entry in a catalog to a Wikidata item.

While the user-facing end of the tool does reasonably well, the back-end has become a bit of an issue. It is a bespoke, home-grown MySQL database that has changed a lot over the years to incorporate more (and more complex) metadata to go with the core data of the entries. Entries, birth and death dates, coordinates, third-party identifiers are all stored in separate, dedicated tables. So is full-text search, which is not exactly performant these days.

The perhaps biggest issue, however, is the bottleneck in maintaining that data – myself. As the only person with write access to the database, all maintenance operations have to run through me. And even though I have added import functions for new catalogs, and run various automatic update and maintenance scripts on a regular basis, the simple task of updating an existing catalog depends on me, and it is rather tedious work.

At the 2017 Wikimania in Montreal, I was approached by the WMF about Mix’n’match; the idea was that they would start their own version of it, in collaboration with some of the big providers of what I call catalogs. My recommendation to the WMF representative was to use Wikibase, the data management engine underlying Wikidata, as the back-end, to allow for a community-based maintenance of the catalogs, and use a task specific interface on top of that, to make the matching as easy as possible.

As it happens with the WMF, a good idea vanished somewhere in the mills of bureaucracy, and was never heard from again. I am not a system administrator (or, let’s say, it is not the area where I traditionally shine), so setting up such a system myself was out of the question at that time. However, these days, there is a Docker image by the German chapter that incorporates MediaWiki, Wikibase, Elasticsearch, the Wikibase SPARQL service, and QuickStatements (so cool to see one of my own tools in there!) in a single package.

Long story short, I set up a new Mix’n’match using Wikibase as the back-end.

Automatic matches

The interface is similar to the current Mix’n’match (I’ll call it V1, and the new one V2), but a complete re-write. It does not support all of the V1 functionality – yet. I have set up a single catalog in V2 for testing, one that is also in V1. Basic functionality in V2 is complete, meaning you can match (and unmatch) entries in both Mix’n’match and Wikidata. Scripts can import matches from Wikidata, and do (preliminary) auto-matches of entries to Wikidata, which need to be confirmed by a user. This, in principle, is similar to V1.

There are a few interface perks in V2. There can be more than one automatic match for an entry, and they are all shown as a list; one can set the correct one with a single click. And manually setting a match will open a full-text Wikidata search drop-down inline, often sparing one the need to search on Wikidata and then copying the QID to Mix’n’match. Also, the new auto-matcher takes the type of the entry (if any) into account; given a type Qx, only Wikidata items with name matches that are either “instance of” (P31) Qx (or one of the subclasses of Qx), or items with name matches but without P31 are used as matches; that should improve auto-matching quality.
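The type rule of the new auto-matcher can be sketched in a few lines. This is a simplified illustration, not the actual Mix’n’match code: the function name, its parameters, and the candidate QIDs are made up, and the set of Qx plus its subclasses is assumed to have been computed beforehand (for example via SPARQL).

```python
from typing import Dict, List, Optional, Set


def filter_candidates(
    name_matches: List[str],
    entry_type: Optional[str],
    p31_values: Dict[str, Set[str]],
    type_and_subclasses: Set[str],
) -> List[str]:
    """Keep name matches whose P31 fits the entry type, or that have no P31."""
    if entry_type is None:
        # Untyped entries: every name match stays a candidate.
        return list(name_matches)
    kept = []
    for qid in name_matches:
        types = p31_values.get(qid, set())
        # Keep items without P31, or with a P31 in Qx or one of its subclasses.
        if not types or types & type_and_subclasses:
            kept.append(qid)
    return kept


# Hypothetical example: the catalog entry is typed as Q5 (human).
p31 = {
    "Q900001": {"Q5"},      # a human: kept
    "Q900002": {"Q11424"},  # a film: dropped
    "Q900003": set(),       # no P31 at all: kept
}
matches = filter_candidates(["Q900001", "Q900002", "Q900003"], "Q5", p31, {"Q5"})
print(matches)  # ['Q900001', 'Q900003']
```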

Manual matching with inline Wikidata search

But the real “killer app” lies in the fact that everything is stored in Wikibase items. All of Mix’n’match can be edited directly in MediaWiki, just like Wikidata. Everything can be queried via SPARQL, just like Wikidata. Mass edits can be done via QuickStatements, just like… well, you get the idea. But users will just see the task-specific interface, hiding all that complexity, unless they really want to peek under the hood.

So much for the theory; sadly, I have run into some real-world issues that I do not know how to fix on my own (or do not have the time and bandwidth to figure out; same effect). First, as I know from bitter experience, MediaWiki installations attract spammers. Because I really don’t have time to clean up after spammers on this one, I have locked account creation and editing; that means only I can run QuickStatements on this wiki (let me know your Wikidata user name and email, and I’ll create an account for you, if you are interested!). Of course, this kind of defeats the purpose of having the community maintain the back-end, but what can I do? Since the WMF has bowed out in silence, the wiki isn’t using the WMF single sign-on. The OAuth extension, which was originally developed for that specific purpose, ironically doesn’t work for MediaWiki as a client.

But how can people match entries without an account, you ask? Well, for the Wikidata side, they have to use my Widar login system, just like in V1. And for the V2 Wiki, I have … enabled anonymous editing of the item namespace. Yes, seriously. I just hope that Wikibase data spamming is a bit in the future, for now. Your edits will still be credited using your Wikidata user name in edit summaries and statements. Yes, I log all edits as Wikibase statements! (Those are also used for V2 Recent Changes, but since Wikibase only stores day-precision timestamps, Recent Changes looks a bit odd at the moment…)

I also ran into a few issues with the Docker system, and I have no idea how to fix them. These include:

  • Issues with QuickStatements (oh the irony)
  • SPARQL linking to the wrong server
  • Fulltext search is broken (this also breaks the V2 search function; I am using prefix search for now)
  • I have no idea how to backup/restore any of this (bespoke configuration, MySQL)

None of the above are problems with Mix’n’match V2 in principle, but rather engineering issues to fix. Help would be most welcome.

Other topics that would need work and thought include:

  • Syncing back to Wikidata (probably easy to do).
  • Importing of new catalogs, and updating of existing ones. I am thinking about a standardized interchange format, so I can convert from various input formats (CSV files, auto-scrapers, MARC 21, SPARQL interfaces, MediaWiki installations, etc.).
  • Meta-data handling. I am thinking of a generic method of storing Wikidata property Px ID and a corresponding value as Wikibase statements, possibly with a reference for the source. That would allow most flexibility for storage, matching, and import into Wikidata.

I would very much like to hear what you think about this approach, and this implementation. I would like to go ahead with it, unless there are concerns in principle. V1 and V2 would run in parallel, at least for the time being. Once V2 has more functionality, I would import new catalogs into V2 rather than V1. Suggestions for test catalogs (maybe something with interesting metadata) are most welcome. And every bit of technical advice, or better, hands-on help, would be greatly appreciated. And if the WMF or WMDE want to join in, or take over, let’s talk!

What happened?
On September 24, 2018, a series of malicious edit attempts was detected on translatewiki.net. In general, these included attempts to inject malicious JavaScript, threatening messages, and porn.

Upon detection, it was determined that while the attacker’s attempts were unsuccessful, there was a vulnerability that, if properly leveraged, could affect users. Because of the vulnerability, it was decided to temporarily disable translation updates until countermeasures could be applied.

What information was involved?
No sensitive information was disclosed.

What are we doing about it?
The security team and others at the foundation have been working with translatewiki.net to add security-relevant checks into the deployment process. While we currently have appropriate countermeasures in place, we will continue to partner with translatewiki.net to add more robust security processes in the future. Translation updates will go out with the train while we continue to address architectural issues uncovered during the security incident investigation.

John Bennett
Director of Security, Wikimedia Foundation

The assumption that women’s hearts and men’s hearts are the same has proved to be lethal. The "Hartstichting" is a Dutch charity that raises funds to combat heart disease. One of its studies is done by professor Hester den Ruijter of the Utrecht Medical Centre. Her study aims to map the differences and is part of an effort to provide equal quality medical support for heart matters for both genders.

As a scientist, Mrs den Ruijter was involved in the production of many scholarly papers with many co-authors, and this is best presented by Scholia. Yesterday Mrs den Ruijter was only known to Wikidata through her papers. Today she has her own item, and the papers have been associated with her, as have many of her co-authors. Many other authors who have their own items are associated with research that indicates how the heart and its diseases differ between the genders and by ethnic background.

It is vital to recognise these differences; survival relies on it.

October 13, 2018

  • Albania report: Collections of Museums in Albania
  • Armenia report: GLAM+Wikidata
  • Australia report: WikiTour AU
  • Brazil report: Developing tGLAM: a landing-page generator for GLAM initiatives
  • France report: European Heritage Days; Linked data for archaeology; Paris: Edit-a-thon at Mobilier National
  • Germany report: History of Women and Democracy, Wikipedia-Culture-Ambassadors and two GLAM-on-Tour-stations in just four weeks
  • Macedonia report: Wiki camps in Macedonia
  • Malaysia report: Wikipedia for Galleries, Libraries, Archives and Museum
  • Mexico report: Open GLAM Mexico 2018
  • Netherlands report: >20,000 press photographs 1940-1990 uploaded, GLAM Wiki Meeting, Aerial Photographs, GLAM-Wiki Manual & Wikipedia Course for Historical Societies
  • Norway report: Women in Red; Researchers Days 2018; The 2019 edition of #wikinobel
  • Poland report: Archival photographs and literary knowledge enrich Polish Wikipedia
  • Serbia report: Impact of GLAM seminars: Decentralization of GLAM activities
  • Sweden report: Wikidata P3595 Biografiskt lexikon för Finland; Student Project at the Nordic Museum; Learning about sources on Swedish Wikipedia
  • UK report: Botanical illustrations and Wiki Loves Monuments in Scotland
  • USA report: Back to school
  • Wikipedia Library report: Books & Bytes–Issue 30, August–September 2018
  • Wikidata report: Wikidata Tour Down Under
  • Calendar: October’s GLAM events

October 12, 2018

Semantic MediaWiki 3.0 (SMW 3.0.0), the next feature version after 2.5, has now been released.

This new version brings many enhancements and new features, most notably the reworked "list" and "template" formats, many user interface changes (mostly for special pages), extended query syntax (conditions, markers, printouts), grouping of properties, the property create protection feature and new API modules ("smwbrowse" and "smwtask"). ElasticStore was added as a new query backend, and the special page "SemanticMediaWiki" was further extended and improved.

See also the version release page for information on further improvements and new features. Additionally, this version fixes a lot of bugs and brings stability and performance improvements. Automated software testing was again expanded to ensure software stability. Please see the page Installation for details on how to install and upgrade.

We’re always looking for ways to strengthen the open source ecosystem. Over the past two months, the Developer Advocacy team at the Wikimedia Foundation collaborated with two open source initiatives: Mozilla’s Open Source Student Network (OSSN) and TeachingOpenSource.org’s Professors’ Open Source Software Experience (TOS and POSSE, respectively). OSSN is designed to bring more students into open source projects, while POSSE aims to provide more professional development resources for professors.

Mozilla’s pilot project, which is currently underway at colleges in the United States and Canada, aims to help university students find suitable open source projects to engage with. Before its launch, Mozilla researched the state of open source on college campuses, as well as criteria that students might consider in choosing a project. Based on this research, Mozilla developed an overview template for project selection that presents the four most relevant (and often not properly surfaced) criteria at a glance: the mission of the project, technology, time needed for setting up the development environment, and how to connect with the community.

POSSE, meanwhile, provides professional development experience to professors interested in teaching open source through Humanitarian Free and Open Source Software (HFOSS) projects. POSSE began as an outreach effort by Red Hat to the higher education community.

Earlier this year, our team met with contributors to Mozilla’s Open Source Student Network (OSSN) and the POSSE initiative at the LibrePlanet conference in Cambridge, Massachusetts. We saw an opportunity to further support student contributors on the Wikimedia projects by collaborating with these two initiatives.

There are two primary reasons why we are excited about this collaboration.

First, we have received quite a lot of requests from professors or students interested in teaching open source in academia through Wikimedia projects. We get asked how to assess projects and incorporate them into semester-long coursework. We’ve always wondered how to help them best.

In the past, we have partnered directly with university faculty to help shape projects for their students that are both educational for them and beneficial for us. For example, last spring we worked with graduate students and faculty in Boise State University’s Technical Writing program to audit and evaluate technical documentation for our Cloud Services team. The students were able to advise a real-world client (our team), and we used their valuable recommendations to inform our approach. We would love to be able to provide focused support like this to everyone who asks. We are often able to provide a project showcase and developer support, but we find we also have to defer many requests.

Second, in our year-long research study with 61 new developers to Wikimedia projects, we learned that most new developers who we attract are working professionals. However, students get onboarded through our targeted outreach efforts via mentoring programs such as Google Summer of Code, Outreachy, Google Code-in and mentoring programs at hackathons.

On the whole, this also ties into our bigger goal of collaborating with other FOSS communities for engaging technical contributors.

This collaboration with both initiatives is, in its initial state, quite straightforward and not very resource-intensive! We’ve proposed five newcomer-friendly projects to be listed in the OSSN’s directory of projects: https://projects.ossn.club/. The project and mentor names are listed below:

“Joining forces with Wikimedia for findings more effective ways to support University students while they are trying to contribute code to Open Source Projects is exciting! Students are looking to contribute to projects that are simple, diverse and have an impact; which is precisely what Wikimedia projects offer. I appreciate the time effort the project maintainers are putting aside for onboarding and supporting new contributors and looking forward to working with them.” – Christos Bacharakis, Mozilla Open Source Student Network – Program Manager

For POSSE, we are now listed as an organization in their directory of HFOSS projects under the Education category. The link contains getting started instructions. When we are approached by students and professors, we are willing to offer help!

“We’re excited to have the Wikimedia Foundation working with TOS to engage students in Wikimedia projects,” said Gregory Hislop of the TOS Coordinating Committee. “Today’s students have grown up using Wikipedia, and they are quite interested in learning about the technology and community behind the project. The educational goals of Wikimedia also provide an excellent introduction to computing for social good. We appreciate the Wikimedia Foundation’s leadership in providing opportunities for students.”

With this collaboration, we are hoping to engage more university students in contributing to open source projects and to support professors in their efforts to teach open source through Wikimedia projects. We are looking forward to learning from the findings of the research that Mozilla will conduct after completion of the first pilot, which we could then leverage to improve some of our processes and workflows for supporting new developers. We would also be curious how professors use our projects to teach in a classroom setting, which could then be shared with our collaborators in academia.

Srishti Sethi, Developer Advocate, Technical Engagement
Sarah Rodlund, Technical Writer, Technical Engagement
Wikimedia Foundation

Madeleine Shepherd and Anne-Marie Scott, ALD18, CC BY, Lorna M. Campbell

I didn’t manage to post a blog post on Ada Lovelace Day this year because I spent most of my spare time in the run up to the event looking for sources for the twenty contemporary Women in STEM nominated for Wikipedia article creation as part of the University of Edinburgh’s Ada Lovelace Day Editathon. The event itself is always one of the highlights of the year and this year was no exception. We had a really inspiring series of talks in the morning from the University’s Women in STEM and Physics Societies and the student WellComm Kings initiative. Mathematician and maker Madeleine Shepherd of Knot Unknot also came along and showed us her amazing knitted portraits of Ada Lovelace and Mary Somerville, which she created on a hacked knitting machine. We had a range of activities including DIY Filmschool and cake decorating followed by Wikipedia and Wikidata editing in the afternoon.

Back to those sources though…

Finding good quality secondary sources for contemporary academics can be tricky and it’s doubly difficult for female academics whose work is less visible and less widely reported. Wikipedia relies on independent secondary sources; to be regarded as notable, it’s not sufficient for an academic to have published extensively: it’s necessary to show that they have had a significant impact in their field. This can be problematic for female academics, and particularly for women in STEM, who routinely face discrimination on account of their gender.

There was much outrage in the press recently when it was reported that Donna Strickland did not have a Wikipedia entry until she received the Nobel Prize for Physics, with some news reports throwing up their hands in horror at Wikipedia’s gender bias. This isn’t news to anyone who has engaged with or edited Wikipedia of course. We are all well aware of Wikipedia’s gender bias, there’s even a Wikipedia article about it, and we’re working hard to fix it through our Wikimedia chapters, editathons and projects such as Wiki Women in Red. Also as Alex Hinojo pointed out:

In an article titled Wikipedia is a mirror of the world’s gender biases, Wikimedia Foundation’s Executive Director Katherine Maher noted that it’s somewhat disingenuous for the press to complain about Strickland’s lack of a Wikipedia entry when the achievements of women scientists are routinely under-reported. We need more reports and independent secondary sources so we can improve the coverage of women on the encyclopaedia.

Wikipedia is built on the shoulders of giants. We’re generalists who learn from the expertise of specialists, and summarize it for the world to share. If journalists, editors, researchers, curators, academics, grantmakers, and prize-awarding committees don’t apply their expertise to identifying, recognizing, and elevating more diverse talent, then our editors have no shoulders upon which to stand. It’s time for these other knowledge-generating institutions to join us in the pursuit of knowledge equity. Wikipedia alone can’t change how society values women, but together we can change how they are seen.

A case in point is Mary Etherington, one of the women nominated for our Ada Lovelace Day editathon. The person who nominated Mary wrote:

Mary Etherington was integral to the protection of the Exmoor pony breed after the war. She saw the importance of protecting the breed which was nearly extinct after the ponies had been used as a meat source during rationing and as target practise for the armies on Exmoor.

Whilst she is well known within the Exmoor pony breed, I believe she may be lost to time due to her rural links and the general lack of representation for rural matters on Wikipedia as well as her being a woman.

I really struggled to find many good sources about Mary online, but one of our editathon participants, Vicki Madden, was captivated by her story and determined to create an article about her. After some creative research and roundabout thinking, Vicki and Anne-Marie were able to find a whole range of independent sources and Mary Etherington now has her own shiny new Wikipedia entry.

Meanwhile I wrote an article on Tara Spires-Jones, Professor of Neurodegeneration and Deputy Director of the Centre for Discovery Brain Sciences at the University of Edinburgh. I don’t know Tara personally but in her nomination she was described as:

World-leading research into molecular mechanisms of dementia. Works tirelessly to promote public understanding of science through expert comment in press and public engagement activities. Lovely person and very supportive of other women.

I hope her new Wikipedia article will help to raise awareness of her work among the general public and go a little way to repaying the support she has provided to others.

October 11, 2018

Writing about Islam in Hebrew: Dar AsSalam Editors

Last January, Wikimedia communities around the world celebrated Wikipedia’s seventeenth birthday, and Wikimedia Israel joined in. Nearly 60 Wikimedians attended the event, held at the IBM office in Tel Aviv, including a group of Palestinian editors from the Dar AsSalam Islamic Center (meaning House of Peace) in Kafr Qara, Haifa district.

During the event, the Dar AsSalam group expressed their interest in volunteering to edit the Hebrew Wikipedia’s pages about Islam as a means to share Islamic culture with Hebrew Wikipedia readers. A few months of collaboration between the Dar AsSalam group and the Hebrew Wikipedia community has resulted in the creation of 60 new pages, in addition to 150 others improved.

It all started when the Dar AsSalam group introduced themselves to the Hebrew Wikipedia community. They offered to improve the content about Islam, Islamic culture, history and theology on the Hebrew Wikipedia. In their conversations with veteran Hebrew Wikipedia editors, the group mentioned their need for support, and they got some from Avner Avandrovic, veteran Hebrew Wikipedian and a member of the training staff at Wikimedia Israel. Avandrovic traveled to Kafr Qara in March 2018 to lead a Wikipedia editing workshop. Moderated by Bekriah S. Mawasi, the Arabic Education coordinator at Wikimedia Israel, the meetings took place in Dar AsSalam center, located inside the Nour al-Haq mosque in Kafr Qara.

Dar AsSalam center is an information center that aims to provide information about Islam to Muslims and to visitors from different backgrounds and religions. The main objective of the center is to print out and give away Islamic books promoting the values of tolerance, moderation, openness, modesty and spirituality. Understanding other religions develops curiosity about Islam. Another goal is creating a space for mutual understanding between different cultures in Israel as a way to promote a shared life, peace and respect for each other’s culture. The center publishes books and pamphlets explaining Islam, in both Hebrew and Arabic, including translation of the meanings of the Holy Koran.

The editing course at Dar AsSalam included three meetings. Twelve Arabic-speaking participants took part in the workshops and edited content in Hebrew. In addition to learning technical skills and editing the Hebrew Wikipedia, editors were introduced to the norms and policies of the Hebrew Wikipedia community and were encouraged to participate in discussions. They also learned about encyclopedic writing on religious topics. The group showed proficient understanding and full collaboration, and by the third meeting in August, they had written over 60 new articles and edited more than 150 other ones. The articles were diverse in content, ranging from books and theology to profiles of distinguished religious and historical figures. The group created new categories too, and the achievements mentioned were completed within one month. They look forward to continuing in the future.

Members of Dar AsSalam made prolific contributions to the Hebrew Wikipedia with references in English and Hebrew. Wikimedia Israel appreciates the commitment the group showed to free knowledge. Five of the group members joined the celebrations of the 15th anniversary of Hebrew Wikipedia which took place in Tel Aviv on 10 August 2018. Such collaborations are great opportunities to share diverse opinions within the Wikipedia communities and enhance collaborative content writing in the largest knowledge base.

Bekriah S. Mawasi, Wikimedia Israel

GLAM activity kicks off in Indonesia with digitizing museum collections

Due to policy issues, museums in Indonesia are not widely utilized in education and learning. There has been some progress in the governance of the cultural sector, particularly by passing the Law of Cultural Advancement No.5/2017 last year. However, the artists and art communities are by far noted as the primary parties making considerable contributions within the Indonesian art world.

Given this landscape, a new GLAM-Wiki project could make a major difference in people’s access to digital collections in the Indonesian GLAM sector.

On 30 and 31 May, Wikimedia Indonesia, together with Indonesian Arts Coalition, arranged a visit to Dr. H.C. Oemboe Hina Kapita Museum in Waingapu, East Nusa Tenggara. The museum allowed us to digitize their collection, which ranged from ceramic plates and ritual ceremony equipment to a few traditional percussion instruments and, of course, the pride of the Sumbanese: weft ikat, a traditionally patterned textile.

On the first day, we started a discussion with the participants about how museums can respond to the digital era by utilizing technology to engage with the public. It is as easy as maintaining the information through apps and social media or using multimedia. The audience included museum staff, art practitioners, indigenous people and officials from the Office of Culture, Tourism, Public Library and Archive.

Most locals can afford smartphones and frequently go online on social media. Thus, it is merely a matter of, first, the lack of references on various internet platforms and, second, the lack of awareness about the internet. We are looking forward to doing some outreach work introducing Wikimedia Commons and Creative Commons licenses, in which they are encouraged to be more active in sharing files linked to the local heritage they possess.

On the second day we held a session for the participants to try their hand at Wikimedia Commons. We invited the museum staff members to guide the participants while they looked around and picked the objects representing local culture to be digitized, and this was the interesting part. The section where ceramic plates are stored became our first destination. It was nice to see that at this phase both the staff and the audience started to discuss the objects, which are believed to have a strong relationship with Chinese trading, exchanging information with each other whilst enriching the story behind them. For instance, there was new information on one of the plates found inside the cemetery in an archaeological site.

Over the two-day event, we successfully digitized and uploaded 18 images to Wikimedia Commons, including short descriptions of the objects.

“I could easily follow the instructions. There were no technical problems while working on Wikimedia Commons. I suppose we can do more to promote arts and culture in Sumba,” said Gusti Dida, a museum staff member who has since become a Wikimedia Commons contributor.

The Sumba Museum, as it is also called, is the only museum in Waingapu, West Sumba; the island is sometimes visited by local and foreign tourists because of its beautiful savanna, horse riding culture, and famous colorful weft ikat. The museum was once temporarily closed and reopened in 2016, yet the collections are not well maintained. We were lucky to be invited by the Secretary of the Office of Culture and Tourism to hold this first GLAM activity in a local museum in Indonesia.

Indonesian art and culture in the digital landscape

Geographical and governance factors are the biggest challenges we face in building a digital environment for Indonesian art and culture. Given that, Wikimedia Indonesia is seeking to collaborate with local cultural institutions to digitize their archives as well as to persuade them to contribute to all kinds of Wikimedia platforms. Majalah Horison, one of the oldest Indonesian literature magazines, and Majalah Kajawen, a Javanese magazine from colonial times, are examples of the digitization projects that are under way.

Now the museum staff of Dr. H.C. Oemboe Hina Kapita Museum are able to produce and share knowledge by actively sharing their collection on Wikimedia Commons. At the end of the event, some participants were quite excited and proposed follow-up activities in order to increase public attention towards the museum.

Any contribution to Wikimedia projects related to Indonesian art and culture can be seen as a way of building a wider network and presence on the Internet, emphasizing local people’s ownership of their own culture and their own narratives, which complement the interpretations by others.

Annayu Maharani, Researcher, Indonesian Arts Coalition and member, Wikimedia Indonesia

In brief

The Wikipedia Library names Shweta Yadav as their star coordinator for April to June: “In 2007 while still a school student, our teacher asked us not to prepare assignments using Wikipedia considering it a non-reliable source,” Shweta Yadav recalls. This was the beginning of Shweta’s curiosity about Wikipedia and about how it worked. Shweta further explained that even 10 years after these instructions, she became guilty of the same criticism, as she found herself offering the same advice to her students. In an attempt to do what was right and to clear her conscience in teaching her students the right way to use an encyclopedia, she first participated in a 100 Wikidays challenge. She was introduced to the campaign by a colleague who got her editing. Through these efforts she discovered her curiosity was justified, as the instructions given to her may not have been quite accurate.

Shweta gradually started outreach activities in Karnal to increase awareness amongst students and academics about her discovery, and her quest to completely remove the “unreliable” tag from Wikipedia led her to The Wikipedia Library Program (TWL). She volunteered to become the coordinator for the Hindi Branch. “As a researcher I knew how knowledgeable a well-referenced article could be and that the availability of resources will attract good editors and increase quality of content.” Her thoughts on how crucial well-referenced articles are to Wikipedia led her to the creation of the first branch in India, the Hindi TWL Branch.

Bengali Wikisource community releases three promotional videos: The Bengali Wikisource community has launched an awareness campaign about their project. The campaign started as an idea on the IdeaLab on Meta-Wiki, which qualified it to be funded by a Wikimedia Foundation grant. More information and links to the videos are on Wikimedia-l.

Call for volunteers to join the Wikimedia Ombudsman Commission: The Ombudsman Commission investigates complaints about infringements of the Privacy Policy, the Access to nonpublic information policy, the CheckUser policy and the oversight policy on any Wikimedia project for the Board of Trustees. They also investigate for the Board the compliance of local CheckUser or Oversight policies or guidelines with the global CheckUser and Oversight policies. The Commission is calling for new volunteers to join. More information and how to apply on Wikimedia-l.

Call for proposals to host Wikimania 2020: The Wikimania Steering Committee and Wikimedia Foundation are seeking expressions of interest from interested parties for hosting Wikimania 2020. More information can be found on Meta-Wiki.

The Wikipedia and Education user group board election results are announced: The Wikipedia & Education user group aims to enhance and scale local and global educational efforts, advocate for education within the Wikimedia movement, and elevate the narrative of education outside our movement. Last week, the first board election results were announced and the board member names were shared on Wikimedia-l.

Punjabi Wikimedians hold a Wiki Loves Food editathon: Punjabi Wikimedians is organizing a Wiki Loves Food edit-a-thon which is happening in collaboration with the World Heritage Cuisine Summit & Food Festival 2018. The organizers are inviting Wikimedians to join hands with them to take part in the international contest to create content about food items in respective languages. These articles will be presented in the form of QR Codes at the World Heritage Cuisine Summit 2018 to be held in Amritsar, Punjab, India. More information about the contest is on Meta-Wiki.

Samir Elsharbaty, Writer, Communications
Wikimedia Foundation


Dr. Thomas Peace taught a course called Crises and Confederation at Huron University College in the spring. The course focused on Canadian history from 1867 to the present and explored four main themes: Indigenous peoples, language and multiculturalism, war, and gender.

“In the past I have had students prepare proposals for exhibits that connect the broad themes to our local context in London, Ontario,” Dr. Peace writes about the course. “This year, I’m planning to shift the proposal into a Wikipedia article on a specific moment of historical significance, asking them to include in their article a photograph of an artifact or location here in London that connects to the broader subject.”

Wiki Education assists instructors, like Dr. Peace, who want to incorporate Wikipedia editing into their higher education courses. We provide assignment templates, how-to trainings for students, and staff support to help ensure students make meaningful contributions to the site and have a good experience doing it. Dr. Peace’s students did just that, adding more than 40,000 words to Wikipedia articles on a wide range of topics. In addition to expanding 17 different Wikipedia articles, students created five new ones.

One such new article is about the Black Power movement in Montreal during the 1960s. Student editor User:Pridenkom wrote more than 6,000 words in the article and cited 17 references. Pridenkom also found an existing photo on Wikimedia Commons to illustrate the information.

The Black Power movement in Montreal began in the 1960s, building off of decades of frustration over structural racism affecting Montreal’s black community. The movement sought change from cultural, economic, and political angles and found inspiration in other movements of the time like the Harlem Renaissance, Garveyism, Pan-Africanism, and Rastafari. The movement culminated in a student occupation of Sir George Williams University in 1969, which inspired conversations about racism both within the Montreal community and internationally. Thanks to Dr. Peace’s student, anyone with internet access can read about this history on Wikipedia.

Interested in teaching with Wikipedia? Visit teach.wikiedu.org for all you need to get started. Or reach out to contact@wikiedu.org for more information about how you and your students can get involved.

Header image: File:Drapeau de Montréal (2311120212).jpg, abdallahh, CC BY 2.0, via Wikimedia Commons.
QuickStatements logo – Wikimedia Commons CC BY-SA 4.0

By Charles Matthews, Wikimedian in Residence at ContentMine

With the end of October, Wikidata’s birthday comes round once more, and on the 29th it will be six years old. With the passing of time Wikimedia’s structured data site grows, is supported by an increasingly mature set of key tools, and is applied in new directions.

Fundamental is the SPARQL query tool at query.wikidata.org, an exemplary product of Wikimedia Foundation engineering. But I wanted to talk here about its “partner in crime”, the QuickStatements tool by Magnus Manske, which is less known and certainly comparatively undocumented. QuickStatements, simply put, allows you to make batch edits that add hundreds or thousands of statements to Wikidata.

So QuickStatements is a bot, but importantly you don’t need to be a bot operator to use it. You do need to have an account on Wikidata (which is automatic if you have a Wikipedia account). And you do need to allow QuickStatements to edit through your account. That can be carried out by means of a WiDar login. For that you simply need to go to https://tools.wmflabs.org/widar/ and click the button.

So far, so good. Now we need to look at your “use case”: the data you have that you think should be in Wikidata. How is it held, and how far have you got in translating it into Wikidata terms? Are you envisaging simple Wikidata statements, or are you reckoning on adding qualifiers, or references, or both? One of the issues with the documentation I come across is that “or both” may be the underlying assumption, but it can make it harder to see the wood for the trees.

Charles Matthews and Jimmy Wales at Wikimania 2018 – Wikimedia Commons CC BY-SA 4.0

A further question that is fundamental is whether you are adding statements to existing items, or creating new items with statements. In the first case, without qualifiers or references (though referencing matters greatly, on Wikidata as on Wikipedia), we can say straightforwardly that you’ll need three columns of data. At the very least, understanding this case is the natural place to start.

Let the first column be the items you’ve identified that need to have statements added. Getting this far may indeed be the most important step. If you have a list of people, or of places, they need to be matched correctly to Wikidata Q-numbers. Proper names are very often ambiguous: for example Springfield shows 41 places called Springfield (where fictionally The Simpsons live: the idea that there is one in every state is an urban myth, it turns out). Matching into Wikidata is a cottage industry in its own right, around the mix’n’match tool.

Suppose then you have your first column in good shape. You now need properties (basically predicates in the statements), and either objects, for forming predicates, or other strings, depending on the type of property. For example, if what you have is a list of people born in Birmingham, UK, you need a second column for P19, “place of birth”, and a third column of Q2256. For the population of a place you need P1082; for books where you are adding publication date there is property P577. You always need a second column which is filled with the property code, and then a third column giving the “object” data.
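The three-column layout can also be produced outside a spreadsheet. Here is a minimal Python sketch of the Birmingham example, assuming the people have already been matched to Wikidata; the Q-numbers in `people` are made up for illustration, and `build_rows` is a hypothetical helper, not part of QuickStatements.

```python
# Hypothetical Q-numbers standing in for people already matched to Wikidata.
people = ["Q1000001", "Q1000002", "Q1000003"]

PLACE_OF_BIRTH = "P19"  # property code: the second column
BIRMINGHAM = "Q2256"    # object item: the third column


def build_rows(items, prop, value):
    """Return one tab-separated line per item: item, property, value."""
    return ["\t".join((item, prop, value)) for item in items]


rows = build_rows(people, PLACE_OF_BIRTH, BIRMINGHAM)
batch = "\n".join(rows)
print(batch)
```

Each printed line has the same shape as a row copied out of the three spreadsheet columns, so the property code never needs to be filled down a column by hand.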

So the assumption is that you are now manipulating the data in a spreadsheet. I find filling a column in Google Sheets can be troublesome, because it wants to increment numbers, so I use a dummy word to fill and then apply find-and-replace.

To avoid disappointment, you also really need to read the instructions, some time or other. These explain that string values such as numbers need to be in “quotes”, but dates need to have a code appended. More spreadsheet skills may therefore be needed, to wrangle the data, but such is modern life.

The payoff comes in being able to paste from the spreadsheet columns into QuickStatements. That introduces the tab characters spoken of in the documentation.
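For anyone who would rather script the three columns than wrangle them in a spreadsheet, here is a minimal sketch in Python of how the rows could be turned into tab-separated lines. It rests on some assumptions: the item Q4115189 is used as a stand-in (it is, I believe, the Wikidata sandbox item), the function names are my own, and the value formatting (strings in double quotes, dates as `+YYYY-MM-DDT00:00:00Z/11` for day precision) is my reading of the instructions, so do check it against them before a real run.

```python
import re

# Hypothetical helper names; only P19, P577 and Q2256 come from the text above.
ITEM_RE = re.compile(r"^Q\d+$")
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def format_value(value: str) -> str:
    """Format the third column for a Version 1 command (assumed conventions)."""
    if ITEM_RE.match(value):
        return value  # item values go in unchanged
    if DATE_RE.match(value):
        return f"+{value}T00:00:00Z/11"  # appended code: /11 means day precision
    return f'"{value}"'  # anything else is treated as a string, in quotes


def to_v1(rows):
    """Join (item, property, value) rows into tab-separated command lines."""
    return "\n".join(
        "\t".join([item, prop, format_value(value)]) for item, prop, value in rows
    )


rows = [
    ("Q4115189", "P19", "Q2256"),        # place of birth: Birmingham
    ("Q4115189", "P577", "1924-05-01"),  # publication date (made-up example date)
]
print(to_v1(rows))
```

The printed lines can then be pasted into the import box in the same way as columns copied from a spreadsheet.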

Actually this is not the pro way to use the tool, but does fine anyway: it is officially “Version 1 format” of the “old interface” of QuickStatements2. Under the “Import commands” menu select Version 1, and paste into the “Import V1 commands” box. Click the “Import” button for a preview, and then the “Run” button. You should definitely run a small test first.

Charles Matthews and Martin Poulter at WikidataCon2017 – Wikimedia Commons CC BY-SA 4.0

QuickStatements runs quite slowly for a bot, taking about a second over each statement. Since the edits are credited to your account, you can see them happening through the “Contributions” link you have when logged in on Wikidata. A top tip is to use the analytics tool: put the property number in the “Pattern” field and set the approximate times of the run.

There is quite a lot more to learn, obviously. For example, for populations of towns, a qualifier with P585 for “point in time” is the first request anyone would make, and a reference perhaps the second. So more data work, but the same process of creation.
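As a rough sketch of what that extra data work could look like, the Python below builds one Version 1 line for a population statement with a P585 qualifier and an optional reference. The assumptions are mine: that quantities are written as bare numbers, that a year is written with the `/9` precision code, that reference properties take an `S` prefix (here S854, “reference URL”), and that the function name, population figure and URL are placeholders rather than real data.

```python
def population_command(item, population, year, source_url=None):
    """Build one tab-separated line: item, P1082, value, qualifier, optional reference."""
    parts = [
        item,
        "P1082",                            # population
        str(population),                    # quantity as a bare number (assumption)
        "P585",                             # qualifier: point in time
        f"+{year:04d}-01-01T00:00:00Z/9",   # /9 means year precision (assumption)
    ]
    if source_url:
        parts += ["S854", f'"{source_url}"']  # reference URL, as a quoted string
    return "\t".join(parts)


# Placeholder figures, not real census data.
print(population_command("Q4115189", 12345, 2011, "https://example.org/census"))
```

Each qualifier or reference simply adds another property and value pair to the end of the line, which is why the spreadsheet grows extra columns rather than changing shape.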

QuickStatements is a workhorse behind numerous other Wikidata tools that create items or add statements to them. In my Wikimedian in Residence work on the ScienceSource project we will use it both on our own wiki to move in text-mining data, and for exporting referenced facts from biomedical articles to Wikidata itself. For more about Wikidata and that project, there is a Wikidata workshop in Cambridge on 20 October.


October 10, 2018

The great potential of AI: Scaling wiki-work

At Wikimedia, AIs help us support quality control work, task routing, and other critical infrastructures for maintaining Wikipedia and other Wikimedia wikis. To make it easier to support wiki-work with AIs, we built and maintain ORES, an open AI service that provides several types of machine predictions to Wikimedia volunteers. For example, ORES’ damage detection system flags edits that appear (to a machine) to be sketchy, and this flag helps steer a real live human to review that edit, who will make sure it gets cleaned up if it is a problem. By highlighting edits that need review, we can reduce the overall reviewing workload of our volunteers by a factor of 10. This turns a 270 hours per day job into a 27 hours per day job. This also means that Wikipedia could grow by 10 times and our volunteers could keep up with the workload.

ORES' edit quality models flag edits that are likely to be vandalism for review.

By deploying AIs that make Wikipedians’ work more efficient, we make it easier to grow, to keep Wikipedia’s doors open. If counter-vandalism or another type of maintenance work in Wikipedia were to overwhelm our capacity, that would threaten our ability to keep Wikipedia the free encyclopedia that anyone can edit. This is exactly what happened recently around one of the quality control activities in Wikipedia. Reviewing new articles for vandalism and notability is a huge burden in English Wikipedia. Over 1,600 new article creations need to be reviewed every day. The group of volunteers who review new articles couldn’t keep up, and in the face of a growing backlog, they decided to disable new article creation for new editors. We’re actively working on AIs that can help filter and route new page creations, to lessen the workload and to make it easier to reopen article creation to new editors.

Without a strategy for increasing the efficiency of review work, reopening article creation to new editors would just re-introduce the same burden.

JADE collects examples of human judgement and ORES learns from that judgement to make predictions.

Biases and diversity

But with all of the efficiency benefits that come with AI, we must be wary of problems. Humans have biases whether we choose to or not. When we train AIs to replicate human judgement, we can hope that at best, those AIs will only be biased in the same ways as their instructors. Worse, these and additional biases can appear in insidious ways because an AI is far more difficult to interrogate than a real live human being. Recently, the media and the research literature have been discussing ways in which bias creeps into AIs like ours. For example, Zeynep Tufekci has warned that “We are in a new age of machine intelligence, and we should all be a little scared.”[1] And when we first announced ORES to our editing community, our own Wnt urged us to “Please exercise extreme caution to avoid encoding racism or other biases into an AI scheme. […] My feeling is that editors should keep a healthy skepticism – this was a project meant to be written, and reviewed, by people.”[2] We agree. AI-reinforced biases have the potential to exacerbate already embattled diversity in Wikipedia, especially if they will be used to help our volunteers more efficiently reject contributions that don’t fit the mold of what is typically accepted.

In order to directly address these biases, we’re trying a strategy that may be novel but will be familiar to Wikipedians. We’re working to open up ORES, our AI service, to be publicly audited by our volunteers. When we first deployed ORES, we noticed that many of our volunteers began to make wiki pages specifically for tracking its mistakes (Wikidata, Italian Wikipedia, etc.). These mistake reports were essential for helping us recognize and mitigate issues that showed up in our AIs’ predictions. Interestingly, it was difficult for any individual to see any of the problematic trends. But when we worked together to record ORES’ mistakes in a central location, it became easier to see trends and address them. You can watch a presentation by one of our research scientists on some of the biases we discovered and how we mitigated the issues.

By observing how Wikipedians gathered reports, we were able to recognize a set of pain points that made the process of auditing ORES difficult. From that, we’ve begun to design JADE—the Judgement and Dialog Engine. JADE is intended to make the work of critiquing our AIs easier by providing standardized ways to agree or disagree with an AI’s judgement. We’re building in ways for our volunteers to discuss and refine their own examples for training, testing, and checking on ORES in ways that are just too difficult to do with wiki pages.

Open auditing and the future

We think we’re onto something here. We need ORES and other AIs in order to help our volunteers scale up their wiki-work—to build Wikipedia and other open knowledge projects efficiently and in a way that aligns with our values of open knowledge. JADE is intended to put more power into our volunteers’ hands, to help us make sure that we can take full advantage of AIs while efficiently detecting and addressing the biases and bugs that will inevitably appear. In this case, we’re hoping to lead by example. While some organizations responsible for managing online platforms actively prevent audits of their algorithms to protect their intellectual property (see the lawsuit by C. Sandvig et al.[3]), we’re preemptively opening up our AIs to public audits in the hope of making them better and more human.

Aaron Halfaker, Principal Research Scientist, Scoring Platform
Wikimedia Foundation


  1. Hope Reese, “It’s time to become aware of how machines ‘watch, judge, and nudge us,’ says Zeynep Tufekci,” Tech Republic, 29 September 2015, https://www.techrepublic.com/article/its-time-to-become-aware-of-how-machines-watch-judge-and-nudge-us-says-zeynep-tufekci/.
  2. EpochFail, とある白い猫, and He7d3r, “Revision scoring as a service,” The Signpost, 18 February 2015, https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-02-18/Special_report.

We use both synthetic and RUM testing for Wikipedia. These two ways of testing performance are best friends and help us verify regressions. Today, we will look at two regressions where it helped us to get metrics both ways.

In our synthetic lab environment, we update the browser version in a controlled way. When there’s a new browser release, we wait for a new Docker container with the latest version. We stop the current tests, update the Docker container and restart the tests and look at the metrics that we graph in Grafana. That way, we can check whether a new browser version introduced a regression.

Our users’ browsers usually upgrade slowly. The browser vendors usually push the browser to a percentage of users first, and then give the green light to update all of them. When we collect performance metrics, we also collect browser names and versions. That way we can see when users pick up a new browser version and if that version has any impact on our metrics. The adoption of new versions by real users takes time, and when we see a regression in our synthetic testing, it can take a couple of weeks until we see the same effect in our user metrics.

Chrome 67

When we pushed Chrome 67 we noticed a regression in our first visual change synthetic testing (T196242).

Here you can see what it looked like for our test of https://en.wikipedia.org/wiki/Facebook. The blue vertical line is when we pushed the new browser version.

Most of the pages we test were affected, but not all of them. For our tests of the "Barack Obama" English Wikipedia article, it was hard to spot any change at all.

We could only see the change on desktop. And we could verify the regression in both of our synthetic tools (WebPageTest and Browsertime/WebPageReplay). Could it be some content that causes that regression, since it only affected some pages?

Next step

When we see a regression, we always first try to rule out that it has something to do with a change we have made. If the regression happens when we update the browser in our tests, it’s easy: we roll back the browser version and collect new metrics to see if the metrics go back down. And then we update to the new version again. In this case, we confirmed it was only the browser causing our first visual change metric to jump, not a change in our content.

When we find a browser regression, we try to collect as much data as possible and file an upstream issue. In this case it became Chromium issue 849108.

The next step is to see if we can find the same change in the metrics that we collect directly from users. The firstPaint metric in Chrome is similar to the first visual change metric we use in our synthetic testing. This means that when we have enough traffic coming from Chrome 67, we should be able to see the change in first paint.

The conversion rate from Chrome 66 to 67 looked like this:

If you look real closely, you can see that around the 15th of June we started getting enough traffic for Chrome 67 to see the effect on our metrics.
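To make the idea of “enough traffic” concrete, here is a small sketch in Python of how the share of each browser version per day could be computed from collected samples. The data shape, a list of (day, version) pairs, and the function name are assumptions for illustration only, not a description of our actual pipeline.

```python
from collections import Counter, defaultdict


def version_share(samples):
    """Return {day: {version: fraction of that day's samples}}."""
    counts = defaultdict(Counter)
    for day, version in samples:
        counts[day][version] += 1
    return {
        day: {version: n / sum(per_day.values()) for version, n in per_day.items()}
        for day, per_day in counts.items()
    }


# Made-up samples: one entry per page view that reported a browser version.
samples = [
    ("2018-06-14", "66"),
    ("2018-06-14", "66"),
    ("2018-06-14", "67"),
    ("2018-06-15", "67"),
]
print(version_share(samples))
```

Once the share of the new version passes whatever threshold you trust, the metrics for that version become worth comparing against the old one.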

To see the change in Chrome, we look at the metrics we collect from all versions of Chrome and check the median and 75th percentile of first paint.

In the following graph, we take the average over one day to try to minimize spikes. If you look at the right side (Chrome 67) of the graphs you can see that it has a slightly higher first paint than to the left (Chrome 66).
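A comparison like this can be sketched in a few lines of Python. The sketch below groups first paint samples by browser version and reports the median and 75th percentile for each, using a simple nearest-rank percentile. The sample values and function names are invented for illustration, and the one-day averaging used in the graphs is left out to keep it short.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarise(samples):
    """Return {version: (median, p75)} for (version, first_paint_ms) samples."""
    by_version = defaultdict(list)
    for version, first_paint_ms in samples:
        by_version[version].append(first_paint_ms)
    return {
        version: (percentile(values, 50), percentile(values, 75))
        for version, values in by_version.items()
    }


# Invented first paint values in milliseconds.
samples = [
    ("66", 100), ("66", 200), ("66", 300), ("66", 400),
    ("67", 150), ("67", 250), ("67", 350), ("67", 450),
]
print(summarise(samples))
```

Looking at both the median and the 75th percentile helps show whether a change affects typical users or mostly the slower end of the distribution.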

To verify the metrics, we also looked at first paint on mobile. There’s no regression there; rather, it looks like there could be a small win in first paint.

To be 100% sure that there’s nothing we introduced, we take another look at synthetic testing at the time when the increase in first paint was seen for real users (15th of June).

There’s no increase in the metrics from synthetic tests at that time. This confirms it was a (small) regression in Chrome 67.

Chrome 69

Some time ago, our performance alerts in Grafana fired about first paint in Chrome having increased for our users. We looked at it, and couldn’t find an exact issue that could have caused it. It looked like the metric had slowly increased over time. That type of regression is always the hardest to deal with because it’s hard to see exactly what’s causing the regression.

We could see the regression both on desktop and mobile. It was most obvious when we checked the first paint on mobile. You can see the weekly pattern we have but the highs are getting higher and higher.

But we actually had the answer: When we updated to Chrome 69 in our synthetic testing a couple of weeks ago, we again saw an increase in first visual change. This time, we could see the regression on some wikis but not all of them. We’ve switched back and forth between Chrome 68 and 69 and first visual change for the Japanese wiki looked like this:

This time, it seems like a bigger impact on first visual change. We track this issue in T203543 and filed an upstream bug with Chromium.

Is this the same regression as we see in RUM? Let us look again at when the majority of Chrome users switched from 68 to 69.

And then let’s go back to the first paint metric. First, we look at our metric for desktop only. Around September 22nd almost all traffic was from 69, but you can also see that it was introduced in early September.

It looks like when Chrome 69 was introduced, first paint slowly rose and then when all our metrics were collected from 69, both median and 75th percentile were higher than with 68.

What does it look like for mobile?

We see the same pattern here. Let us check our synthetic testing at the same time, to see if we could have introduced a code change that affected first visual change. Our metrics on mobile are even more stable than desktop. We look at Swedish Wikipedia, because we didn’t deploy any change on that test server during this period.

No regression there. It looks like this also could be a performance regression in Chrome.


Working with both synthetic metrics and metrics from real users helps us to confirm issues. In this case, it helped us to find two browser regressions that impact our users. We hope that we can get help from the Chromium team to resolve these issues.

This column was first published in Finnish at Suur-Jyväskylän Lehti on 10th October 2018. 

Last weekend a group of Finnish Wikimedians, including me, attended the Wikimedia Northern Europe Meeting in Stockholm. Over the years various organisations have been created around Wikipedia: there is the American foundation, and there are different kinds of user groups and chapters that want to support the activities of Wikipedians and volunteers in other Wikimedia projects locally. Some of these local organisations are run purely by volunteers, while some also have permanent or project-related employees.


The WIKINEM participants. I am the one with the grey hat in the first row. Image: Arild Vågen, CC BY-SA 4.0

The highlight of the year in the Wikimedia community is the Wikimania event, which brings together people from different Wikimedia organisations as well as volunteers. In order to get as many people as possible to participate in Wikimania, it has been organised in different parts of the world, most recently in Mexico, Italy and South Africa. Next summer Wikimania will be held for the first time in Northern Europe, in Stockholm, and this was one of the most important reasons behind our gathering.

But there are also other reasons why we want to intensify the cooperation between the Nordic countries and the Baltic countries. We are close to each other, both geographically and culturally. Each Wikimedia project, however, has its own culture, and a good example of this is the recent hullabaloo around the English Wikipedia article on Donna Strickland, the Nobel Prize winner. Somebody had already tried to create an article about her in 2014, but it was removed because of copyright infringement. However, readers of The Guardian, for example, may have thought that Strickland was not considered significant enough for Wikipedia before the Nobel Prize.

Wikipedia editors decide for themselves which subjects they write about. This is why there is actually nobody to blame if a certain significant person is still missing a Wikipedia article. However, these so-called wiki gaps can be influenced, for example by organising editing competitions where editors produce the missing content together. About 17% of Wikipedia’s biographies are about women, and when almost the same share of editors, 16%, are women, one could imagine that these things are connected.

I think that the masculine culture of our society should also be blamed for the gap. For example, if you are asked to make a list of experts in some field, your list can become very male-dominated unless you consciously pay attention to the issue. If you don’t believe this, try asking your acquaintances to name the professionals of a certain field! With the help of women-related competitions run by the Wikimedia movement, the attitude of active Wikipedians, for example in Norway, has already changed: “Let’s start by making articles about women football players and add the men after them,” they may say now.

Further reading:
Two months of Women’s Day events: over 1000 new female biography articles created in events in Finland
Exploring Wikimedia’s gender gap with six contributors from Scandinavia

The post Ladies first, also on Wikipedia appeared first on Wikimedia Suomi.

All humans move plants, most often by accident and sometimes with intent. Humans, unfortunately, are only rarely moved by plants. 

Unfortunately, the history of plant movements is often difficult to establish. In the past, the only way to tell a plant's homeland was to look at the number of related species in a region for clues to their area of origin. This idea was firmly established by Nikolai Vavilov before he was sent off to Siberia, thanks to Stalin's crank-scientist Lysenko, to meet an early death. Today, genetic relatedness of plants can be examined by comparing the similarity of DNA sequences (although this is apparently harder than with animals due to issues with polyploidy). Some recent studies on individual plants and their relatedness have provided insights into human history. A 2015 study that traced the baobabs of India to geographical origins in East Africa, and a 2011 study of coconuts, are hopefully just the beginnings. These demonstrate ancient human movements which have never received much attention in most standard historical accounts.

Unfortunately there are a lot of older crank ideas that can be difficult for untrained readers to tell apart from sound research. I recently stumbled on a book by Grafton Elliot Smith, a Fullerian professor who succeeded J.B.S. Haldane but descended into crankdom. The book "Elephants and Ethnologists" (1924) can be found online and it is just one among several similar works by Smith. It appears that Smith used a skewed and misapplied cousin of Dollo's Law. According to him, cultural innovations tended to occur only once and were then carried along with human migrations. Smith was subsequently labelled a "hyperdiffusionist", a disparaging term used by ethnologists. When he saw illustrations of Mayan sculpture he envisioned an elephant where others saw at best a stylized tapir. Not only were they elephants, they were Asian elephants, complete with mahouts and Indian-style goads, and he saw this as definite evidence for an ancient connection between India and the Americas! An idea that would please some modern-day Indian cranks and zealots.

Smith's idea of the elephant as emphasised by him.
The actual Stela in question

"Fanciful" is the current consensus view on most of Smith's ideas, but let's get back to plants.

I happened to visit Chikmagalur recently and revisited the beautiful temples of Belur on the way. The "Archaeological Survey of India-approved" guide at the temple did not flinch when he described an object in the hand of a carved figure as being maize. He said maize was a symbol of prosperity. Now maize is a crop that was imported to India, and by most accounts only after the Portuguese sea incursions into India that began in 1498. In the late 1980s, a Swedish researcher identified similar carvings (actually another one at Somnathpur) from 12th century temples in Karnataka as being maize cobs. The identification was subsequently debunked by several Indian researchers from IARI and from the University of Agricultural Sciences where I was then studying. An alternate view is that the object is a mukthaphala, an imaginary fruit made up of pearls.
Somnathpur carvings. The figures to the left and right hold the purported cobs in their left hands. (Photo: G41rn8)

The pre-Columbian oceanic trade ideas however do not end with these two cases from India. The third story (and historically the first, from 1879) is that of the sitaphal or custard apple. The founder of the Archaeological Survey of India, Alexander Cunningham, described a fruit in one of the carvings from Bharhut, a fruit that he identified as custard-apple. The custard-apple and its relatives are all from the New World. The Bharhut Stupa is dated to 200 BC and the custard-apple, as quickly pointed out by others, could only have been in India post-1492. The Hobson-Jobson has a long entry on the custard apple that covers the situation well. In 2009, a study raised the possibility of custard apples in ancient India. The ancient carbonized evidence is hard to evaluate unless one has examined all the possible plant seeds and what remains of their microstructure. The researchers however establish a date of about 2000 B.C. for the carbonized remains and attempt to demonstrate that it looks like the seeds of sitaphal. The jury is still out.
I was quite surprised that there are not many writings on the Internet that synthesize and comment on the history of these ideas, and somewhat oddly I found no mention of these three cases in the relevant Wikipedia article, pre-Columbian trans-oceanic contact theories (naturally, fixed now with an entire new section).

There seems to be value in someone putting together a collation of plant introductions to India along with sources, dates and locations of introduction. Some of the old specimens of introduced plants may well be worthy of further study.

Introduction dates
  • Pithecellobium dulce - Portuguese introduction from Mexico to the Philippines, and to India on the way, in the 15th or 16th century. The species was described from specimens taken from the Coromandel region (i.e. type locality outside native range) by William Roxburgh.
  • Eucalyptus globulus? - There are some claims that Tipu planted the first of these (see my post on this topic). It appears that the first person to move eucalyptus plants (probably E. globulus) out of Australia was Jacques Labillardière. Labillardière was surprised by the size of the trees in Tasmania. The lowest branches were 60 m above the ground and the trunks were 9 m in diameter (27 m circumference). He saw flowers through a telescope and had some flowering branches shot down with guns! (original source in French) His ship was seized by the British in Java around 1795 and released in 1796. All subsequent movements seem to have been post 1800 (i.e. after Tipu's death). If Tipu Sultan did indeed plant the Eucalyptus here he must have got it via the French through the Labillardière shipment. The Nilgiris were apparently planted up starting with the work of Captain Frederick Cotton (Madras Engineers) at Gayton Park(?)/Woodcote Estate in 1843.
  • Muntingia calabura - when? - I suspect that Tickell's flowerpecker populations boomed after this, possibly with a decline in the Thick-billed flowerpecker.
  • Delonix regia - when?
  • In 1857, Mr New from Kew was made Superintendent of Lalbagh and he introduced in the following years several Australian plants from Kew including Araucaria, Eucalyptus, Grevillea, Dalbergia and Casuarina. Mulberry plant varieties were introduced in 1862 by Signor de Vicchy. The Hebbal Butts plantation was established around 1886 by Cameron along with Mr Rickets, Conservator of Forests, who became Superintendent of Lalbagh after New's death - rain trees, ceara rubber (Manihot glaziovii), and shingle trees(?). Apparently Rickets was also involved in introducing a variety of potato (kidney variety) which got named as "Ricket". - from Krumbiegel's introduction to "Report on the progress of Agriculture in Mysore" (1939) [Hebbal Butts would be the current day Airforce Headquarters]

Further reading
  • Johannessen, Carl L.; Parker, Anne Z. (1989). "Maize ears sculptured in 12th and 13th century A.D. India as indicators of pre-columbian diffusion". Economic Botany 43 (2): 164–180.
  • Payak, M.M.; Sachan, J.K.S (1993). "Maize ears not sculpted in 13th century Somnathpur temple in India". Economic Botany 47 (2): 202–205. 
  • Pokharia, Anil Kumar; Sekar, B.; Pal, Jagannath; Srivastava, Alka (2009). "Possible evidence of pre-Columbian transoceanic voyages based on conventional LSC and AMS 14C dating of associated charcoal and a carbonized seed of custard apple (Annona squamosa L.)". Radiocarbon 51 (3): 923–930.
  • Veena, T.; Sigamani, N. (1991). "Do objects in friezes of Somnathpur temple (1286 AD) in South India represent maize ears?". Current Science 61 (6): 395–397.
Dubious research sources
  • Singh, Anurudh K. (2016). "Exotic ancient plant introductions: Part of Indian 'Ayurveda' medicinal system". Plant Genetic Resources. 14(4):356–369. 10.1017/S1479262116000368. [Among the claims here are that Bixa orellana was introduced prior to 1000 AD - on the basis of Sanskrit names which are assigned to that species - does not indicate basis or original dated sources. The author works in the "International Society for Noni Science"! ] 

October 09, 2018

Ada Lovelace, the first computer programmer.

Today is Ada Lovelace Day, a day that stresses the importance of acknowledging, documenting, and celebrating the achievements of women in STEM. Women have made valuable contributions to science and mathematics throughout the ages, but aren’t remembered in history as often or as accurately as their male colleagues.

Ada Lovelace, for example, did not receive the recognition she deserved during her lifetime for her contributions to technology. As the first computer programmer, her work built the foundation of modern computer technology – an accomplishment we now acknowledge in the history books and by celebrating days like today.

“Though the canon has perpetually erased the contribution of women and their work has been systematically discredited, devalued and derided, their light has doggedly broken through the cracks,” writes Harriet Hall about Ada Lovelace Day.

Wikipedia is another canon of sorts, as it houses the largest collection of knowledge in the world. Wikipedia has its own set of systemic biases, but it is also a unique resource in that its information is in constant flux. Wikipedia editors add, update, and remove information based on the latest in research and journalism. That’s a great opportunity to take an active role in making recorded knowledge more representative of women’s lives and accomplishments. Anyone can edit and help close the gender gap!

“Research illustrates that a sense of belonging is critical to success. Yet our history books and ‘books’ like Wikipedia (the 5th most visited website in the world) reflect a very white, very male centric view on everything – including science and scientists,” says Dr. Rebecca Barnes of Colorado College, an instructor in our program.

That’s why Dr. Barnes is having all of her students write Wikipedia biographies of women in STEM as an assignment this academic year. Using our resources and systems of support, Dr. Barnes can guide students as they become Wikipedia editors themselves.

“The seed for this project came from a post by Dr. Maryam Zaringhalam on Twitter linked to a Guardian article on Dr. Jess Wade, a physicist at Imperial College who wrote 270 Wikipedia profiles in 2017 – all of women scientists. … I thought – I can do this. Better yet, I work at a liberal arts college – my students can also do this!”

Dr. Denneal Jamison-McClung of the University of California, Davis is also using our resources and staff support to assign students to edit Wikipedia.

“To help change cultural perceptions of who can contribute to STEM and to inspire the next generation of young scientists and engineers, it is essential that open access platforms, especially Wikipedia, offer a realistic perspective on the diversity of people already working to tackle big global challenges and historical contributions by underrepresented groups,” writes Dr. Jamison-McClung. “Let’s speed up the process…”

Dr. Kelee Pacion is another instructor who engages her students in Wikipedia editing. She believes that including more people of diverse genders and ethnicities in science communication is important because they bring diverse perspectives. And ultimately, science communication should be representative of all knowledge from all angles, written by and representative of all people.

Not only are students expanding Wikipedia’s coverage of women in STEM, so are experts. Our new professional development courses train subject-matter experts to edit Wikipedia in their field.

Samantha Kao is one of those experts. After learning that 80-85% of Wikipedia editors identify as men and only about 17% of Wikipedia biographies feature women, she knew she had to help close the gender gap.

Like Kao, Dr. Laura Hoopes of Pomona College was dissatisfied with Wikipedia’s coverage of women and science. She also took our professional development course and, with her newfound Wikipedia skills, has since made great strides in making the site more representative. Read about the many women in STEM whose biographies she created here!

To get you and your students involved, visit teach.wikiedu.org. Or, read more accounts of how our program participants are making Wikipedia’s knowledge more equitable.

Image: File:Ada Lovelace portrait.jpg, Science Museum Group, public domain, via Wikimedia Commons.

October 08, 2018

Donna Watson, Academic Support Librarian at the University of Edinburgh, presenting at the EAHIL Conference Wikipedia editathon – image by Ruth Jenkins

By Ruth Jenkins, Academic Support Librarian at the University of Edinburgh.

For some time, Wikipedia has been shown to be a resource to engage with, rather than avoid. Wikipedia is heavily used for medical information by students and health professionals – and the fact that it is openly available is crucial for people finding health information, particularly in developing countries or in health crises. Good quality Wikipedia articles are an important contribution to the body of openly available information – particularly relevant for improving health information literacy. In fact, some argue that updating Wikipedia should be part of every doctor’s work, contributing to the dissemination of scientific knowledge.

Participants editing Wikipedia at the EAHIL Conference

With that in mind, Academic Support Librarians for Medicine Marshall Dozier, Ruth Jenkins and Donna Watson recently co-presented a workshop on How to run a Wikipedia editathon, at the European Association for Health Information and Libraries (EAHIL) Conference in Cardiff in July. Ewan McAndrew, our Wikimedian in Residence here at the University of Edinburgh, was instrumental in the planning and structuring of the workshop, giving us lots of advice and help. On the day, we were joined by Jason Evans, Wikimedian in Residence at the National Library of Wales, who spoke about his role at NLW and the Wikimedia community and helped support participants during editing.

We wanted our workshop to give participants experience of editing Wikipedia and build their confidence using Wikipedia as part of the learning experience for students and others. Our workshop was a kind of train-the-trainer editathon. An editathon is an event that brings people together at a scheduled time to create or edit Wikipedia entries on a specific topic. Such events are valuable opportunities to collaborate with subject experts and to involve students and the public.

Where a typical editathon would be a half-day event, we only had 90 minutes. As such, our workshop was themed around a “micro-editathon” – micro in scale, timing and tasks. We focused on giving participants insights into running an editathon and hands-on experience of small-scale edits, such as adding images and missing citations to articles.

Systematic review edit
Key stats from the EAHIL editathon

We also presented on the Wikipedia assignment in the Reproductive Biology Honours programme here at Edinburgh, including a clip from this video of a student’s reflections on the assignment, which sparked discussion from the attendees. Jason Evans’ talk about Wikimedia UK and WikiProject Medicine contextualised the participants’ edits within the wider Wikimedia community.

We are waiting on feedback from the event, but anecdotally, the main response was a wish for a longer workshop, with more time to get to know Wikipedia better! There was lots of discussion about take-home ideas, and we hope they are inspired to deliver editathon events in their own organisations and countries. We also spotted that some of our participants continued to make edits on Wikipedia in the following weeks, which is a great sign.

If you want to know more, you can visit the event website which roughly follows the structure of our workshop and includes plenty of further resources: https://thinking.is.ed.ac.uk/eahil-editathon/

Further information.

Pic of Ruth Jenkins at the Reproductive Biology Hons. Wikipedia workshop.
By Stinglehammer [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], from Wikimedia Commons
2018, week 41 (Monday 08 October 2018)

October 07, 2018

The question "I have an ORCiD profile, how do I get it in Wikidata" was asked on Twitter. Using Magnus's tool, public information was imported, and as a result that information can be shown in Scholia.

Paolo Cignoni made a request using the #IcanHazWikidata hashtag; his papers were imported and now show nicely in Scholia. The result includes several of his co-authors. For the ones shown in white, Wikidata has no indication of their gender. That is easy to fix.

There are probably a lot of co-authors missing. One way of finding them is by adding "/missing" to the Scholia link. You can check for an ORCiD identifier and add any identifier you find. You then identify the papers already known to Wikidata, and they are attributed to the co-author or to a citing author. I added a John W Goodby to make the picture more complete. It is easy and mostly obvious what to do.
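For anyone who prefers to script the "/missing" step, here is a minimal sketch in Python. It only does the string handling described above: it takes a Scholia link and appends "/missing". The function name `missing_page_url` is made up for illustration, and the example link (the `scholia.toolforge.org` host and the `Q12345` identifier) is a placeholder assumption, not a real author page taken from this post.

```python
def missing_page_url(scholia_url: str) -> str:
    """Return the given Scholia link with "/missing" appended.

    Hypothetical helper: it only manipulates the string and does not
    check that the link points at a real Scholia author page.
    """
    # Drop surrounding whitespace and any trailing slash so we never
    # produce a double slash in the result.
    base = scholia_url.strip().rstrip("/")
    # Leave the link alone if it already points at the "/missing" view.
    if base.endswith("/missing"):
        return base
    return base + "/missing"


if __name__ == "__main__":
    # Placeholder link; substitute the Scholia link of the author you care about.
    print(missing_page_url("https://scholia.toolforge.org/author/Q12345"))
```

Open the resulting link in a browser and continue as described: check for identifiers and attribute the papers already known to Wikidata.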

What makes all this possible? Open data and a bit of effort. As you can see in the later picture, just running Magnus's tool for a few co-authors changes the outlook considerably.

Are you a scholar and do you want to see your initial Scholia information? Just add your ORCiD identifier in a tweet with the #IcanHazWikidata hashtag.

A lot of soul searching happened to determine why Wikipedia noticed Donna Strickland only once she received the Nobel Prize. What is more astounding is that Wikidata failed to include her. No Scholia information was available for her and her research. What we have at this point is likely to be a subset of Strickland's papers.

We do not know who will be seen as a scientist of similar relevance, but we do know that a lot of rubbish is floating around. It is called fake science and fake news, and countering it is where big organisations like Google and Facebook rely on the information in Wikipedia.

Kate Ricke is another scientist who has not received Wikipedia attention so far. Ricke tweeted about her paper Country-level social cost of carbon. It and the other papers produced by her and her co-authors are quite potent.

When you learn about a paper like this, you can add it and its authors to Wikidata. When ORCiD has information about other papers, you can import these papers as well, building on the web of science about one of the most important subjects of our time. In addition, co-authors of these other papers can be included, as well as the authors citing these papers.

When relevance is given to the science of a subject like climate science, it becomes possible to contrast it with what some politicians want us to believe.
