August 28, 2015

Wikimedia Tech Blog

Wikimedia’s new developer-friendly trademark guidelines for apps

The new guidelines encourage app innovation. Image by VS-QQ freely licensed under CC-by-SA 3.0.

At the Wikimedia Foundation, we would love to see more app development, especially in today’s environment where users are increasingly migrating to mobile and wearable devices. Therefore, we created new app guidelines that encourage developer innovation while also providing tips on how to use Wikimedia marks and content in a way that properly represents the Wikimedia community.

These new guidelines are based on our Trademark Policy, published last February after a seven-month-long consultation with the Wikimedia community. The guidelines provide practical examples of how to use the Wikimedia marks and Wikimedia content in apps while also supporting our mission.

For example, the guidelines make it easier for developers to understand how to use Wikimedia marks and content in the following ways:

  • Using marks in apps without requesting a license: The guidelines make clear for app developers how to use certain marks without requesting a license as long as such use advances the Wikimedia mission and abides by the Trademark Policy.
  • Clearer visual examples of fair use for apps: In our Trademark Policy consultation users expressed confusion over how to use Wikimedia marks. In response, the app guidelines provide clear examples of how to use marks in app buttons, app descriptions, and app titles.
  • Location of licenses: The Wikimedia marks and content are released under different Creative Commons licenses. The guide tells developers where to find those licenses and gives examples of how to properly apply them.
  • Plain English: Unlike typical legal documents, these guidelines use simple words, short sentences, and straightforward sentence structure to make them easy to follow. They recognize that some readers may not be native English speakers and avoid legalese to facilitate easy translation into multiple languages. To verify the simplicity of the language, we applied various readability indices, as we did with the Trademark Policy.
  • User-friendly layout: Also, as with the Trademark Policy, after considering design techniques, we used visual examples, section breaks, and white-space to make the guidelines both visually appealing and easily accessible.

In a world where people are increasingly accessing knowledge through different devices, the new guidelines are intended to empower designers, developers, and the Wikimedia community to collaborate around the Wikimedia projects while maintaining legal protections. If you have any questions, please email us at trademarks@wikimedia.org.

Victoria Baranetsky, Legal Counsel
Yana Welinder, Legal Director

Many thanks to James Alexander, Community Advocate/Project Manager; Corey Floyd, Software Engineer; Manprit Brar, Legal Counsel; Stephen LaPorte, Legal Counsel; James Buatti, Legal Fellow; and Marshall Olin, Alex Krivit, and Arielle Friehling, Legal Interns for their incredible work on the app guidelines.  We would also like to thank the rest of the Legal Team as well as the Community Engagement and Communications Teams for their assistance and support in this effort.

by Victoria Baranetsky at August 28, 2015 06:18 PM

Data suggests Google may not be to blame for drop in referrals

The Discovery team wrangled referral data to get to the bottom of things. Photo by Luis Llerena, freely licensed under CC0 1.0.

Over the past month, several news articles, based on a report from web analytics company SimilarWeb, stated that Wikipedia’s traffic had dropped by “more than 250 million desktop visits in just three months”. The Discovery team’s research into referrals from Google suggests that Google may not be behind this apparent drop.

One theory as to how this drop came about is that Google began prioritising their own, or other, sources of information like the Knowledge Graph, suggesting that the traffic drop came from a reduction in visits from Google. Oliver Keyes, of the Discovery team, was tasked with investigating this drop and understanding if there was a visible reduction in traffic from the data stored by the Wikimedia Foundation’s servers.

While the Foundation doesn’t track “visits”, pageviews are tracked, as well as whether those pageviews have referers. By investigating the proportion of pageviews that come with Google referrals and identifying whether it has reduced over time, he could go some way towards confirming or rejecting the idea that Google has been impeding traffic to Wikimedia sites.

One factor worth including, however, is that some HTTPS connections deliberately don’t serve any form of referrer data. This means that a decline in Google-sourced pageviews could show itself not as a decline in pageviews from Google, but as a decline in pageviews from nobody. Accordingly, Keyes looked at both the rate at which Google-sourced traffic comes in, and the rate at which traffic that doesn’t have a source at all comes in.
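A minimal sketch of this bucketing in Python (the Foundation’s actual pipeline is written in R, and the URL heuristics here are simplified assumptions, not its real rules):

```python
from urllib.parse import urlparse

def classify_referer(referer):
    """Bucket a pageview's referer: Google, internal, other external, or none."""
    if not referer or referer == "-":
        return "none"
    host = urlparse(referer).netloc.lower()
    if "google." in host:
        return "google"
    if host.endswith("wikipedia.org"):
        return "internal"
    return "other"

def referer_proportions(pageviews):
    """Share of pageviews falling into each referer bucket."""
    counts = {}
    for referer in pageviews:
        bucket = classify_referer(referer)
        counts[bucket] = counts.get(bucket, 0) + 1
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}
```

Tracking these proportions per day, rather than raw counts, is what makes the “no referer at all” bucket comparable to the Google bucket over time.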

Due to privacy and performance restrictions around the request logs, the period that needed to be covered (January–August 2015) is not available unsampled. Instead, Keyes had to rely on the sampled logs, parsing them with the Wikimedia Foundation’s internal R framework for reading and validating this data.

This introduces some risks—the parsing methodology might be different from that used on the unsampled log, or the simple variation in values between sampled and unsampled data could introduce inaccuracies. Comparing the unsampled and sampled logs revealed no major difference in the number of pageviews identified in each dataset. This means the sampled logs can probably be relied on to answer his questions.
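That sanity check amounts to simple arithmetic: scale a sampled count up by the sampling rate and measure how far the estimate lands from the unsampled ground truth. A sketch (the function names are mine, not the internal R framework’s):

```python
def sampled_estimate(sampled_count, sampling_rate):
    """Scale a count from 1-in-N sampled request logs up to a full-traffic estimate."""
    return sampled_count * sampling_rate

def relative_difference(estimate, actual):
    """Relative gap between the sampled estimate and the unsampled ground truth."""
    return abs(estimate - actual) / actual
```

If the relative difference stays small across the overlap period, the sampled logs can be trusted for the proportion analysis.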


Looking at the proportion of pageviews with Google referers, seen in the above graph, we see no decrease. In fact, based on the localised “smoothing” represented by the blue line, Google referrals are actually trending up over the last few months.

Graph: proportion of pageviews with no referer over time.

The proportion of pageviews with no referer whatsoever, however—seen in the chart above—is trending down. This could indicate a multitude of things, including Google passing more referers through (which would explain the rise in the proportion of traffic traceable to Google). However, it could also indicate that a chunk of site traffic, with no referrer, has been decreasing—which could conceivably be Google-sourced.

Keyes says that, based on the data studied, it can be established that the most obvious avenues for verifying or dismissing SimilarWeb’s claim show no evidence that Google traffic has declined. Ultimately, though, his team doesn’t have the data at their end to eliminate all avenues of possibility.

Joe Sutherland
Communications Intern
Wikimedia Foundation

by Joe Sutherland at August 28, 2015 05:42 PM

Content Translation Update

August 27 Update: “New Article” Campaign Enabled in the Italian Wikipedia

Only a short update about the ContentTranslation software this week: the campaign that suggests to users who haven’t tried ContentTranslation yet that they translate an article instead of creating it from scratch was enabled on the Italian Wikipedia.

New article campaign in the Italian Wikipedia

Fixes for issues with link and category adaptation in the Norwegian Wikipedia are ready, but are expected to be deployed next week.

The building of the National Institute of Statistics of Italy (IStat), about which you can now read in Greek thanks to ContentTranslation. Photo by LPLT, licensed under CC-BY-SA 3.0 unported.

On this occasion I’d like to congratulate the Italian Wikipedia on the creation of the 200th article, watchOS. Another milestone, which happens to also be related to Italy, is the creation of the 17,000th page with ContentTranslation: the article about the National Institute of Statistics of Italy, which was translated from English to Greek.

by aharoni at August 28, 2015 03:05 PM

Gerard Meijssen

#Wikidata - Joseph Reagle not an #author?

The English Wikipedia article says: "Joseph Michael Reagle Jr. is an American academic and author focused on technology and Wikipedia". It seems obvious that the occupation "author" fits Mr Reagle.

Not so, I am told: “auteur” is a generic term in French, so using it this way is at best an anglicism. This puts us in a tricky position, because it is suggested that if this occupation appears in infoboxes that automatically import data from Wikidata, it will create an absolute mess on the French Wikipedia, with everybody being credited as an “auteur”, which does not make sense at all.

When you analyse “author” in Wikidata, it is a subclass of “creator”. “Creator” seems to me to be what the French understand by “auteur”. Consequently, the labels used in French do not match what is meant by “author” in English.

Arguably, when items are labelled in a way where the meaning in one language is not the same as in other languages, it has major consequences for the integrity of Wikidata.

NB: Mr Reagle wrote a few books; that makes him more than an “essayist”.

by Gerard Meijssen (noreply@blogger.com) at August 28, 2015 05:25 AM

August 27, 2015

Wikimedia Foundation

Crowdfunding free knowledge

A class in Moradabad, Uttar Pradesh, India, using WikiReaders. 500 of those devices were distributed to schools in India, South Africa and Mexico, in part through a grant from the Wikimedia Foundation. Photo by Ashstar01, freely licensed under CC BY-SA 3.0.

Crowdfunding has been steadily growing as an alternative method to fund various projects and ventures on the Internet over the past few years. It has been used to successfully fund video games, a smartwatch, the renovation of a German schloss, a political action committee and, more humorously, the making of a potato salad. You can even crowdfund an online encyclopedia—and people have been doing so very generously over the past 14 years.

It is perhaps no surprise, then, that Wikimedia volunteers are increasingly embracing this way of funding their own initiatives to improve Wikipedia and its sister projects.

A female Megistocera, a genus of crane fly, photographed in Kadavoor, Kerala, India. This newly promoted quality image was taken by Jeevan Jose using gear purchased with funds from his crowdfunding campaign and was released under CC BY-SA 4.0.

This year has seen particular activity in that regard, with three campaigns—Wikimédia France’s WikiCheese, Diego Delso’s equipment restoration and Jeevan Jose’s new photography gear fundraisers—receiving over US$13,500 in donations from the public. Following this surge, I asked some of the contributors involved in Wikimedia-related crowdfunding campaigns about the lessons they learned from them and the suggestions they might have for future crowdfunding organizers.

The most popular suggestion is to start with a feasible target. “Indiegogo have a reasonable fee if you meet your target but incur a hefty penalty if you don’t,” says Colin, one of a group of volunteers behind Jeevan Jose’s successful fundraiser that exceeded its target by over fourfold, collecting US$3,150 in total. “That really encourages you to be modest, to work out what you need rather than what is in your dreams,” he adds.

Being able to show a good track record is quite important, too. “People giving their money to non-profit projects want to see a pilot first or some track record that shows you can responsibly use their money effectively to make change in a long-term sustainable way,” says Aislinn D Grigas, organizer of the ultimately unsuccessful WikiReader distribution campaign in 2013. “Tracking success is also essential to discovering the outcome of a pilot project and determining if it is a worthy candidate for more investment.”

Colin adds that you should “make your campaign personal with photos of yourself, and to make it clear why a donation would be for a good cause and would give a good return. In our case, Jee is a very popular member of the Wikimedia Commons community, which definitely helped since many of the donations came from wiki-friends.”

Building a team of collaborators seems to be another crucial factor. “Having a team is recommended … We invited a few of Jee’s wiki-friends to help with the campaign, [and] Christian created an excellent video made from clips of Jee’s work. This really made the campaign look professional,” says Colin.

Asking for outside help is also advised by Aislinn. “One of the things I learned is to reach out to as many people and organizations as possible. Finding partners and collaborators is important and can be a great source of support and help. We didn’t initially think that a Wikimedia Foundation grant would be a match but in talking to enough people, we realized there were other sources that could supplement our crowdfunding campaign,” she finishes.

Tomasz W. Kozlowski
Wikimedia community volunteer

by Tomasz Kozlowski at August 27, 2015 08:04 PM

Wikimedia Tech Blog

Wikimedia strategy consultation shows potential in mobile, rich content, and translations

Strategy consultations help us understand where we are and where we’re going. Photo by Martin Fisch, freely licensed under CC BY-SA 2.0.

Earlier this year, the Wikimedia Foundation led a consultation with the Wikimedia community of editors and readers, in order to inform our strategy[1] for the future. The goal of the consultation was to collect input on how we should respond to future trends that will affect the Wikimedia movement, and incorporate that insight into our emerging strategy. We are pleased to be able to make the complete results of this consultation available to all.

In this post, we’ll provide a brief overview of the consultation and findings. For more detail, please see the full results on Wikimedia Commons, or the metrics presentation at the July 2015 Metrics Meeting on Wikimedia Commons and YouTube.


The consultation was a 10-day global exercise across Wikimedia projects and languages, lasting from February 23 to March 6, 2015. We introduced the consultation by acknowledging that the world is going mobile and the next billion Internet users are coming online. We translated the questions into 15 languages to reflect the international nature of the Wikimedia movement.

The consultation used two open-ended prompts to elicit broad, qualitative feedback and insights:

  1. What major trends would you identify in addition to mobile and the next billion users?
  2. Based on the future trends that you think are important, what would thriving and healthy Wikimedia projects look like?

This is the second time the Wikimedia Foundation has undertaken a collaborative strategy-setting process. However, this consultation was designed as part of a more nimble process than the previous strategic planning process conducted in 2010, in order to allow the Foundation to respond to a quickly changing world.


Nearly 1,300 editors and readers offered their thoughts on these questions across 29 languages. We found 69% were anonymous users from 86 different countries, 24% were logged-in users with established records of participation on the Wikimedia projects, and 7% were new users (all of whom registered during the consultation itself). These latter two groups came from 30 different wikis. The responses were broken down into 2,468 individual comments on 28 general themes.

Patterns of response during the 2015 Wikimedia strategy consultation. Graph by Wikimedia Staff, freely licensed under CC BY-SA 3.0.

  1. English (887)
  2. Spanish (63)
  3. German (45)
  4. Russian (37)
  5. Turkish (32)
  6. Farsi (30)
  7. Chinese (18)
  8. Arabic (17)
  9. French (17)
  10. Italian (11)
  11. Portuguese (11)
  12. Japanese (10)
  13. Dutch (5)
  14. Indonesian (4)
  15. Czech (3)
  16. Korean (3)
  17. Vietnamese (3)
  18. Bengali (2)
  19. Hebrew (2)
  20. Polish (2)
  21. Ukrainian (2)
  22. Afrikaans (1)
  23. Azerbaijani (1)
  24. Finnish (1)
  25. Hindi (1)
  26. Interlingua (1)
  27. Norwegian (1)
  28. Slovak (1)
  29. Swedish (1)

n = 1295 respondents
Translation languages highlighted


The report’s findings were multi-faceted, reflecting the many emerging trends and experiences identified by the international participants. We analyzed each of the 28 themes for key takeaways, with interesting perspectives emerging from both anonymous and logged-in users. These complete takeaways can be found in the consultation slides on Commons.

2015 Wikimedia strategy consultation results, qualitative comment categories; n = 2,468 comments. Graph by Wikimedia Staff, freely licensed under CC BY-SA 3.0.

Anonymous and new users tended to focus on the look and feel of the site itself on varied devices. Their feedback focused on the site’s presence on mobile, use of multimedia, accuracy and reliability of the existing content, and integration with social media. The anonymous respondents primarily hailed from the United States, but also included significant contingents from India, Germany, the United Kingdom, and Iran. Seventeen countries had more than ten people answer.

Logged-in users commented on similar topics but from a different perspective. For example, mobile-related comments were typically confined to the feasibility of editing on mobile devices, which are usually much smaller than a desktop window. They additionally commented on citation quality—the use of stronger, more reliable sources—a bureaucratic climate on some wikis, and strategic threats to the projects, in addition to giving the foundation direct feedback. Sites with more than ten respondents included the English, German, and Spanish Wikipedias, along with the Wikimedia Commons. As the IPs of logged-in users are hidden, we have no geographical data for them.

The precise findings from this study are outlined in the complete slides. All themes are being taken into account and can inform our work moving forward. Here are some highlights:[2]

  • Mobile and app: Mobile-related comments reveal an opportunity to improve our existing mobile offerings for both editors and readers and raise awareness about our native apps. Participants (mostly anonymous users) urged us to “make an app,” when one is already available for iOS and Android devices. We also saw comments that stressed the importance of mobile editing, formatting for smaller (mobile) screen sizes, article summaries for different usage patterns, and the value of “going mobile.”[3]
  • Editing and collaboration: In this category, we find requests to make editing simpler, ideas for enhancing collaboration among editors, suggestions for editing tools, and proposals to build editor rating and qualification programs. This is one of the few categories in which logged-in comments, at 56%, outnumber comments from anonymous and new users. This category provides valuable insight for improvements in editor support including Wikipedia’s visual editor and future projects in the newly created Community tech team, as well as potential new editor support initiatives.
  • Rich content: Participants requested more rich content on Wikimedia sites, suggesting more video, audio, and images. Most (80%) of these comments were submitted by anonymous and new users. One United States-based participant commented: “is there any major website in the world with less video?”
  • Volunteer community: We saw a particular interest in improving “community climate” in this category, with a focus on interpersonal dynamics and culture. Participants identified a need to increase diversity (in particular, gender diversity), improve processes and workflows, and address bureaucracy-related challenges. This is another category in which logged-in comments, at 54%, outnumber comments provided by anonymous and new users.
  • Wikimedia Foundation feedback: This category focused on the relationship between the Wikimedia Foundation and the volunteer community and includes suggestions of how the Foundation might change its practices and priorities to align with the volunteer community. These comments are from mostly logged-in users (88%), most of them highly experienced users with an average edit count of more than 64,000 edits. Suggestions included providing better support to editors in a variety of ways and continuing to ask for feedback from core community members.
  • Content quality (accuracy): These comments emphasized the importance of content accuracy, trustworthiness, and reliability. Comments focused on citation quality, the use of expert editors, and even restricting editing (so that “not everyone can edit”). Most (73%) of comments in this category were from anonymous and new users, signaling an opportunity to communicate to readers about the accuracy and trustworthiness of the content within Wikipedia and sister projects.
  • Education and universities: These comments reflected both a concern about the perception of Wikipedia as a (non)credible source for academic inquiry, and also recognition of the growing opportunity for Wikimedia to extend its content, brand, and global presence into online education by developing courses, curricula, and partnering with other online educational resources. Most (76%) of the comments in this category came from anonymous and new users, whereas only 24% originated from logged-in users.  
  • Translation and languages: We saw a collective interest in this category from logged-in, anonymous, and new users. Key suggestions included a focus on increasing translation capabilities and tools, expanding into more languages, and developing the ability to easily translate across projects. These comments validate the need for the Content Translation tool, which is now available on 224 language versions of Wikipedia as a beta feature.

Thank you to everyone who participated in this consultation. The findings of the consultation will play a key role in our work moving forward, influencing how engineering teams develop forward-looking plans and validate proposed roadmaps and projects.

Terence Gilbey
Interim Chief Operating Officer
Wikimedia Foundation

[1] Unlike in past years, we are approaching strategy not as a set of goals or objectives, but rather as a direction that will guide the decisions for the organization.
[2] These examples do not mean that these themes are more important than others. They are simply highlights for this particular blog post. We are assessing all of the themes to incorporate this feedback at all levels of our work.
[3] We realize the mention of mobile in the consultation’s framing may have impacted the prominence of this theme in the comments.

by Terence Gilbey at August 27, 2015 04:49 PM

Amir E. Aharoni

Amir Aharoni’s Quasi-Pro Tips for Translating the Software That Powers Wikipedia

As you probably already knew, Wikipedia is a website. A website has content (the articles) and a user interface (the menus around the articles and the various screens that let editors edit the articles and communicate with each other).

Another thing that you probably already knew is that Wikipedia is massively multilingual, so both the content and the user interface must be translated.

Translation of articles is a topic for another post. This post is about getting all of the user interface translated to your language, as quickly and efficiently as possible.

The most important piece of software that powers Wikipedia and its sister projects is called MediaWiki. As of today, there are 3,335 messages to translate in MediaWiki. “Messages” in the MediaWiki jargon are strings that are shown in the user interface, and that can be translated. In addition to core MediaWiki, Wikipedia also has dozens of MediaWiki extensions installed, some of them very important—extensions for displaying citations and mathematical formulas, uploading files, receiving notifications, mobile browsing, different editing environments, etc. There are around 3,500 messages to translate in the main extensions, and over 10,000 messages to translate if you want to have all the extensions translated. There are also the Wikipedia mobile apps and additional tools for making automated edits (bots) and monitoring vandalism, with several hundred messages each.
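For a feel of what these messages look like: MediaWiki stores them as per-language JSON files mapping message keys to interface strings. The keys and strings below are invented for illustration, but the overall shape (including the `@metadata` block) follows the real format; a tiny helper then finds what still needs translating:

```python
import json

# Illustrative MediaWiki-style i18n files: one JSON object per language.
en = json.loads("""{
  "@metadata": {"authors": ["..."]},
  "search-button": "Search",
  "edit-tab": "Edit",
  "history-tab": "View history"
}""")
he = json.loads("""{
  "@metadata": {"authors": ["..."]},
  "search-button": "חיפוש",
  "edit-tab": "עריכה"
}""")

def untranslated(source, target):
    """Message keys present in the source language but missing in the target."""
    return sorted(k for k in source if k != "@metadata" and k not in target)
```

Here `untranslated(en, he)` would report the one remaining key; translatewiki.net performs this bookkeeping for you and shows it as per-group statistics.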

Translating all of it probably sounds like an enormous job, and yes, it takes time, but it’s doable.

In February 2011 or so—sorry, I don’t remember the exact date—I completed the translation into Hebrew of all of the messages that are needed for Wikipedia and projects related to it. All. The total, complete, no-excuses, premium Wikipedia experience, in Hebrew. Every single part of the MediaWiki software, extensions and additional tools was translated to Hebrew, and if you were a Hebrew speaker, you didn’t need to know a single English word to use it.

I wasn’t the only one who did this of course. There were plenty of other people who did this before I joined the effort, and plenty of others who helped along the way: Rotem Dan, Ofra Hod, Yaron Shahrabani, Rotem Liss, Or Shapiro, Shani Evenshtein, Inkbug (whose real name I don’t know), and many others. But back then in 2011 it was I who made a conscious effort to get to 100%. It took me quite a few weeks, but I made it.

Of course, the software that powers Wikipedia changes every single day. So the day after the translations statistics got to 100%, they went down to 99%, because new messages to translate were added. But there were just a few of them, and it took me a few minutes to translate them and get back to 100%.

I’ve been doing this almost every day since then, keeping Hebrew at 100%. Sometimes it slips because I am traveling or ill. It slipped for quite a few months because in late 2014 I became a father, and a lot of new messages happened to be added at the same time, but Hebrew is back at 100% now. And I keep doing this.

With the sincere hope that this will be useful for translating the software behind Wikipedia to your language, let me tell you how.


First, let’s do some work to set you up.

  • Get a translatewiki.net account if you haven’t already.
  • Make sure you know your language code.
  • Go to your preferences, open the Editing tab, and add the languages that you know to Assistant languages.
  • Familiarize yourself with the Support page and with the localization guidelines for MediaWiki.
  • Add yourself to the portal for your language. The page name is Portal:Xyz, where Xyz is your language code.

Priorities, part 1

The translatewiki.net website hosts many projects to translate beyond stuff related to Wikipedia. Among other things it hosts such respectable Free Software projects as OpenStreetMap, Etherpad, MathJax, Blockly, and others. Also, not all MediaWiki extensions are used on Wikimedia projects; there are plenty of extensions, with many thousands of translatable messages, that are used only on other sites, but they still use translatewiki.net as the platform for translating their user interface.

It would be nice to translate all of them, but because I don’t have time for that, I have to prioritize.

On my translatewiki.net user page I have a list of direct links to the translation interface of the projects that are the most important:

  • Core MediaWiki: the heart of it all
  • Extensions used by Wikimedia: the extensions
  • MediaWiki Action API: the documentation of the API functions, mostly interesting to developers who build tools around Wikimedia projects
  • Wikipedia Android app
  • Wikipedia iOS app
  • Installer: MediaWiki’s installer, not used in Wikipedia because MediaWiki is already installed there, but useful for people who install their own instances of MediaWiki, in particular new developers
  • Intuition: a set of different tools, like edit counters, statistics collectors, etc.
  • Pywikibot: a library for writing bots—scripts that make useful automatic edits to MediaWiki sites.

I usually don’t work on translating other projects unless all of the above projects are 100% translated to Hebrew. I occasionally make an exception for OpenStreetMap or Etherpad, but only if there’s little to translate there and the remaining untranslated MediaWiki-related projects are not very important, for example because they are unlikely to be used by anybody except a few software developers. I translate those, too, eventually.

Priorities, part 2

So how can you know what is important among more than 15,000 messages from the Wikimedia universe?

Start from MediaWiki most important messages. If your language is not at 100% in this list, it absolutely must be. This list is automatically created periodically by counting which 600 or so messages are actually shown most frequently to Wikipedia users. This list includes messages from MediaWiki core and a bunch of extensions, so when you’re done with it, you’ll see that the statistics for several groups improved by themselves.
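The underlying idea, counting which messages are actually shown most often and keeping the top of the list, can be sketched in a few lines; the display log below is hypothetical, standing in for the real instrumentation:

```python
from collections import Counter

def most_important(display_log, top_n):
    """Rank message keys by how often they were actually shown to users."""
    return [key for key, _ in Counter(display_log).most_common(top_n)]
```

Feeding in a real log of message displays and taking roughly the top 600 keys yields a priority list like the one translatewiki.net publishes.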

Now, if the translation of MediaWiki core to your language is not yet at 18%, get it there. Why 18%? Because that’s the threshold for exporting your language to the source code. This is essential for making it possible to use your language in your Wikipedia (or Incubator). It will be quite easy to find short and simple messages to translate (of course, you still have to do it carefully and correctly).
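The export check itself is simple arithmetic; here is a sketch using the core message count quoted earlier (3,335), with function names of my own invention, not translatewiki.net’s:

```python
EXPORT_THRESHOLD = 0.18  # share of core messages needed before export to source code

def translation_ratio(translated, total):
    """Fraction of a message group that has been translated."""
    return translated / total

def eligible_for_export(translated, total):
    """True once a language crosses the export threshold for MediaWiki core."""
    return translation_ratio(translated, total) >= EXPORT_THRESHOLD
```

So with 3,335 core messages, crossing 18% means translating roughly 600 of them, which the short, simple messages get you to quickly.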

Getting Things Done, One by One

Once you have the most important MediaWiki messages 100% and at least 18% of MediaWiki core is translated to your language, where do you go next?

I have surprising advice.

You need to get everything to 100% eventually. There are several ways to get there. Your mileage may vary, but I’m going to suggest the way that worked for me: Complete the piece that is the easiest to get to 100%! For me this is an easy way to strike an item off my list and feel that I accomplished something.

But still, there are so many items at which you could start looking! So here’s my selection of components that are more user-visible and less technical, sorted not by importance, but by the number of messages to translate:

  • Cite: the extension that displays footnotes on Wikipedia
  • Babel: the extension that displays boxes on userpages with information about the languages that the user knows
  • Math: the extension that displays math formulas in articles
  • Thanks: the extension for sending “thank you” messages to other editors
  • Universal Language Selector: the extension that lets people select the language they need from a long list of languages (disclaimer: I am one of its developers)
    • jquery.uls: an internal component of Universal Language Selector that has to be translated separately for technical reasons
  • Wikibase Client: the part of Wikidata that appears on Wikipedia, mostly for handling interlanguage links
  • ProofreadPage: the extension that makes it easy to digitize PDF and DjVu files on Wikisource
  • Wikibase Lib: additional messages for Wikidata
  • Echo: the extension that shows notifications about messages and events (the red numbers at the top of Wikipedia)
  • WikiEditor: the toolbar for the classic wiki syntax editor
  • ContentTranslation: the extension that helps translate articles between languages (disclaimer: I am one of its developers)
  • Wikipedia Android mobile app
  • Wikipedia iOS mobile app
  • UploadWizard: the extension that helps people upload files to Wikimedia Commons comfortably
  • MobileFrontend: the extension that adapts MediaWiki to mobile phones
  • VisualEditor: the extension that allows Wikipedia articles to be edited in a WYSIWYG style
  • Flow: the extension that is starting to make talk pages more comfortable to use
  • Wikibase Repo: the extension that powers the Wikidata website
  • Translate: the extension that powers translatewiki.net itself (disclaimer: I am one of its developers)
  • MediaWiki core: the software itself!

I put MediaWiki core last intentionally. It’s a very large message group, with over 3000 messages. It’s hard to get it completed quickly, and to be honest, some of its features are not seen very frequently by users who aren’t site administrators or very advanced editors. By all means, do complete it, try to do it as early as possible, and get your friends to help you, but it’s also OK if it takes some time.

Getting All Things Done

OK, so if you translate all the items above, you’ll make Wikipedia in your language mostly usable for most readers and editors.

But let’s go further.

Let’s go further not just for the sake of seeing pure 100% in the statistics everywhere. There’s more.

As I wrote above, the software changes every single day. So do the translatable messages. You need to get your language to 100% not just once; you need to keep doing it continuously.

Once you make the effort of getting to 100%, it will be much easier to keep it there. This means translating some things that are used rarely (but used nevertheless; otherwise they’d be removed). This means investing a few more days or weeks into translating-translating-translating.

But you’ll be able to congratulate yourself on the accomplishments along the way, and on the big accomplishment of getting everything to 100%.

One strategy to accomplish this is translating extension by extension. This means going to your translatewiki.net language statistics: here's an example with Albanian, but choose your own. Click "expand" on MediaWiki, then again on "MediaWiki Extensions", then on "Extensions used by Wikimedia" and finally on "Extensions used by Wikimedia – Main". As described above, find the smaller extensions first and translate them. Once you're done with all the Main extensions, do all the extensions used by Wikimedia. (Going to all extensions, beyond Extensions used by Wikimedia, helps users of those extensions, but doesn't help Wikipedia very much.) This strategy works well if several people are translating into your language, because it's easy to divide the work by topic.

Another strategy is quietly competing with other languages. Open the statistics for Extensions Used by Wikimedia – Main. Find your language. Now translate as many messages as needed to pass the language above you in the list. Then translate as many messages as needed to pass the next language above you in the list. Repeat until you get to 100%.

For example, here’s an excerpt from the statistics for today:

Let's say that you are translating to Malay. You only need to translate eight messages to go up a notch. Then six more messages to go up another notch. And so on.

Once you’re done, you will have translated over 3,400 messages, but it’s much easier to do it in small steps.
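The "leapfrog" arithmetic behind this strategy is simple enough to sketch. Here is a minimal illustration (the language names and untranslated-message counts are hypothetical, not real statistics from the site):

```python
# Illustrative sketch of the "quietly competing" strategy: given untranslated
# message counts per language (hypothetical numbers), compute the small step
# needed to pass the language just above yours in the statistics table.

def next_target(stats, my_language):
    """Return (language_to_pass, messages_to_translate), or None if already first."""
    # Sort best-first: fewest untranslated messages at the top of the table.
    ranking = sorted(stats.items(), key=lambda item: item[1])
    position = [lang for lang, _ in ranking].index(my_language)
    if position == 0:
        return None
    lang_above, untranslated_above = ranking[position - 1]
    # Translating one more message than the gap moves you above them.
    return lang_above, stats[my_language] - untranslated_above + 1

stats = {"Malay": 842, "Basque": 835, "Bosnian": 830}
print(next_target(stats, "Malay"))  # -> ('Basque', 8)
```

Repeating this small computation after each notch is exactly the "pass the next language, then the next" loop described above.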

Once you get to 100% in the main extensions, do the same with all the Extensions Used by Wikimedia. It's over 10,000 messages, but the same strategies work.

Good Stuff to Do Along the Way

Never assume that the English message is perfect. Never. Do what you can to improve the English messages.

Developers are people just like you. They may know their code very well, but they are not necessarily brilliant writers or designers. Though some messages are written by professional user experience designers, many are written by the developers themselves, and the English they write may not be perfect. Keep in mind that many, many MediaWiki developers are not native English speakers; a lot of them are from Russia, the Netherlands, India, Spain, Germany, Norway, China, France and many other countries, so English is foreign to them and they may make mistakes.

So report problems with the English messages to the translatewiki Support page. (Use the opportunity to help other translators who are asking questions there, if you can.)

Another good thing is to do your best to try running the software that you are translating. If a component has thousands of messages that are not yet translated into your language, chances are it's already deployed on Wikipedia and you can try it. Actually trying to use it will help you translate it better.

Whenever relevant, fix the documentation displayed near the translation area. Strange as it may sound, it is possible that you understand the message better than the developer who wrote it!

Before translating a component, review the messages that were already translated. It’s useful for learning the current terminology, and you can also improve them and make them more consistent.

After you gain some experience, create a localization guide in your language. There are very few of them, and there should be more. Here’s the localization guide for French, for example. Create your own with the title “Localisation guidelines/xyz” where “xyz” is your language code.

As in Wikipedia, Be Bold.

OK, So I Got to 100%, What Now?

Well done and congratulations.

Now check the statistics for your language every day. I can't emphasize enough how important it is to do this every day.

The way I do this is by keeping a list of links on my translatewiki.net user page. I click through them every day, and if there's anything new to translate, I translate it immediately. Usually there are only a few new messages; I haven't measured, but it's usually fewer than 20. Quite often you won't translate from scratch but rather update the translation of a message that changed in English, which is usually even faster.

But what if you suddenly see 200 new messages to translate? It happens occasionally. Maybe several times a year, when a major new feature is added or an existing feature is changed.

Basically, handle it the same way you got to 100% before: step by step, part by part, day by day, week by week, notch by notch, and get back to 100%.

But you can also try to anticipate it. Follow the discussions about new features, check out new extensions before they are added to the Extensions Used by Wikimedia group, and consider translating them when you have a few spare minutes. In the worst case they will never be used by Wikimedia, but they may be used by somebody else who speaks your language, and your translations will definitely feed the translation memory database that helps you and other people translate more efficiently and easily.

Consider also translating other useful projects: OpenStreetMap, Etherpad, Blockly, Encyclopedia of Life, etc. The same techniques apply everywhere.

What Do I Get for Doing All This Work?

The knowledge that, thanks to you, people who speak your language can use Wikipedia without having to learn English. Awesome, isn't it?

Oh, and enormous experience with software localization, which is a rather useful job skill these days.

Is There Any Other Way in Which I Can Help?


If you find this post useful, please translate it into other languages and publish it on your blog. There are no copyright restrictions: it's public domain (but it would be nice if you credited me). Make any adaptations you need for your language. It took me years of experience to learn all of this, and about four hours to write it. Translating it will take you much less than four hours, and it will help people become more efficient translators.

Filed under: Free Software, localization, Wikipedia

by aharoni at August 27, 2015 01:05 PM

Gerard Meijssen

#Wikidata - Heinz R. Pagels Human Rights of Scientists Award

Awards are often the subject of this blog. Every award has its own merit, and every award connects many people as a result. The Heinz R. Pagels Human Rights of Scientists Award is an award hidden in an article on the Committee on Human Rights of Scientists. The story of Mr Pagels is interesting, but so are the people who received the award.

Some of them have been prisoners of conscience, and all of them are relevant. Most of them deserve more attention, be it by improving their articles, adding statements in Wikidata, or reading about them. To receive an award like this, a person has to have been in harm's way. It is important to know how easy it is to get into trouble, and also why some of those troubles are worth it.

By exposing awards like this, the people connected in this way get more attention. It is one way of making sure that their effort is valued.

by Gerard Meijssen (noreply@blogger.com) at August 27, 2015 08:22 AM

August 26, 2015

David Gerard

Another blog to not remember to keep up!

If you’re starved for wit and wisdom, I have a Tumblr. There may even be occasional relevant content amongst the social justice and cat memes!

by David Gerard at August 26, 2015 11:49 PM

Andy Mabbett (User:Pigsonthewing)

Documenting public art, on Wikipedia

Wikipedia has a number of articles listing public artworks (statues, murals, etc) in counties, cities and towns, around the world. For example, in Birmingham. There’s also a list of the lists.

Gilded statue of three men

Boulton, Watt and Murdoch (1956) by William Bloye.
Image by Oosoom, CC BY-SA 3.0

There are, frankly, not enough of these articles; and few of those that do exist are anywhere near complete (the best is probably the list for Westminster).

How you can help

I invite you to collaborate with me, to make more lists, and to populate them.

You might have knowledge of your local artwork, or be able to visit your nearest library to make enquiries; or take pictures (in the United Kingdom, of "permanent" works, for copyright reasons — for other countries, read up on local 'Freedom of Panorama') and upload them to Wikimedia Commons; or even just find coordinates for items added by someone else. If you're a hyperlocal blogger, or a journalist, perhaps you can appeal to your readership to assist?

Practical steps

You can enter details of an artwork using the “Public art row” family of templates. A blank entry looks like:

{{Public art row
| image =
| commonscat =
| subject =
| location =
| date =
| show_artist= yes
| artist =
| type =
| material =
| dimensions =
| designation =
| coordinates =
| owner =
| show_wikidata= yes
| wikidata =
| notes =
}}

(change “yes” to “no” if a particular column isn’t wanted) and you simply type in the information you have, like this:

{{Public art row
| image = Boulton, Watt and Murdoch.jpg
| commonscat = Statue of Boulton, Watt and Murdoch, Birmingham
| subject = ''[[Boulton, Watt and Murdoch]]''
| location = Near the House of Sport – Broad Street
| date = {{Start date|1956}}
| artist = [[William Bloye]]
| type = statue
| material = Gilded [[Bronze]]
| dimensions = 10 feet tall
| designation = Grade II listed
| coordinates = 52.478587,-1.908395
| owner = [[Birmingham City Council]]
| show_wikidata= yes
| wikidata = Q4949742
| notes = <ref>http://www.birminghammail.co.uk/whats-on/things-to-do/top-5-statues-birmingham-5678972</ref>
}}

Apart from the subject, all the values are optional.

In the above, some values are invented for illustrative purposes. If that's too complicated, you can just enter text values, and someone else will come along and do the formatting (experienced Wikipedians can use the {{Coord}} template for coordinates, too). If you get stuck, drop me a line, or ask for help at Wikipedia's Teahouse.

What this does

The "Public art row" template makes it easy to enter data, keeps everything tidy and consistently formatted, and makes the content machine-readable. That means we can parse all the contents and enter them into Wikidata, creating new items if required, as we go.

We can then include other identifiers for the artworks in Wikidata, and include the artworks’ Wikidata identifiers in other systems such as OpenStreetMap, so everything becomes available as linked, open data for others to reuse and build new apps and tools with.
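To make "machine-readable" concrete, here is a minimal illustrative sketch of extracting the template's parameters from wikitext. This is not the actual tool used for the Wikidata import, just a demonstration of why consistent `| name = value` markup is easy to parse:

```python
import re

# Illustrative parser for a single "Public art row" template invocation,
# assuming one "| name = value" parameter per line (as in the examples above).

def parse_template_params(wikitext):
    """Extract template parameters into a dict of name -> raw wikitext value."""
    params = {}
    for line in wikitext.splitlines():
        match = re.match(r"\|\s*(\w+)\s*=\s*(.*)", line.strip())
        if match:
            params[match.group(1)] = match.group(2).strip()
    return params

entry = """{{Public art row
| subject = ''[[Boulton, Watt and Murdoch]]''
| date = {{Start date|1956}}
| artist = [[William Bloye]]
| wikidata = Q4949742
}}"""

print(parse_template_params(entry)["wikidata"])  # -> Q4949742
```

In practice, wikitext with nested templates or multi-line values is better handled by a dedicated parser such as the mwparserfromhell library; the regex above only copes with well-behaved, one-parameter-per-line entries.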

by Andy Mabbett at August 26, 2015 04:05 PM

Pete Forsyth, Wiki Strategies

Journalism and Wikipedia: A discussion in San Francisco

Panelists Andrew Lih, Dan Cook, and Liberty Madison share a laugh with moderator Pete Forsyth. This photo and those below by Eugene Eric Kim, licensed CC BY.


We began by asking pairs of participants to discuss a time when Wikipedia delighted them, and a time when it frustrated them.

Last week, we hosted a discussion with the meetup group Hacks & Hackers: The Future of Journalism in a Wikipedia World. (original Meetup.com announcement & description)

My colleague Dan Cook and I wanted to engage journalists to share some of our own thoughts about the intersection of Wikipedia and Journalism — but also to hear from them about what Wikipedia looks like from their vantage point. Do reporters and editors see Wikipedia as a threat? a resource? How do they feel about its coverage of topics they cover themselves?


I suspect some of the side discussions were the best part of the event! Looking forward to hearing them recapped.

I had been inspired by recent gatherings hosted by my friend Eugene Eric Kim (whose photos illustrate this piece); his approach emphasizes participation over pontification. With his help, I devised a format that would help the “audience” engage with the opportunities, risks, and general significance associated with Wikipedia from the beginning, spurring discussion that would last throughout the evening.

After we heard a few observations and anecdotes from the crowd, our panelists presented some ideas. Andrew Lih, a longtime Wikipedian and journalism scholar, presented the idea that Wikipedia fills a void between the cutting edge of news publications and the longer-term scholarship and curation typically conducted by, for instance, academia and museums. Liberty Madison, founder of #ThatTechGirl Digital, spoke of millennials' perception of Wikipedia, and the perception that if something isn't on Wikipedia, it isn't true or significant. Dan Cook, an editor and enterprise reporter, presented some case studies from our work at Wiki Strategies, highlighting opportunities for journalists to further their efforts to inform the public by contributing to, or at least commenting on, Wikipedia content. And Jack Craver, who wrote our recent blog post about undisclosed paid editing by PR companies, joined us to recap that piece.

When we wrapped up the panel, the discussion continued over delicious food and drinks from our host, Nomiku; and then spilled over to an impromptu outing to a local bar. You’ll find various links, photos, video, and social media commentary about the event here; if you have further thoughts, let us know in the comments here, or via social media! We look forward to hearing your thoughts, whether your background or interests are in Wikipedia or journalism.

by Pete Forsyth at August 26, 2015 03:39 PM

Wikimedia Foundation

News on Wikipedia: Stock markets plunge, train attack thwarted, and more


Here are some of the global news stories covered on Wikipedia this week:

Shoreham Airshow crash

The pilot of the Hawker Hunter, pictured in 2013, is in critical condition. Image by Alan Wilson, freely licensed under CC-BY-SA 2.0.

On Saturday, August 22, a Hawker Hunter T7 jet aircraft crashed into several vehicles on the busy A27 road during a display at the Shoreham Airshow in Shoreham-by-Sea in England. The plane was performing a vertical loop but, for reasons as yet unknown, failed to complete the manoeuvre and crashed. At least 11 people died on the ground and 16 were injured. The pilot, experienced former British Airways captain Andy Hill, is in critical condition. It is the deadliest airshow accident since the 1952 Farnborough air show crash, in which 31 died.

Learn more in these related Wikipedia articles: 2015 Shoreham Airshow crash

Temple of Baalshamin destroyed

The temple had stood for thousands of years before being demolished. Image by Bernard Gagnon, freely licensed under CC-BY-SA 3.0.

The Temple of Baalshamin, an ancient temple in Palmyra which was built in the second century BCE, was destroyed by the Islamic jihadist group ISIL. The temple has been a UNESCO World Heritage Site since 1980, and classed as "in danger" since 2013. The destruction came after ISIL had allegedly suggested they would not destroy the site and would instead target statues deemed "polytheistic." Reports conflict as to the actual date of the demolition.

Learn more in the related Wikipedia article: Temple of Baalshamin

Stock markets seesaw

Stock markets around the world suffered this week. Image by bfishadow, freely licensed under CC-BY 2.0.

This week proved turbulent for the world's stock markets, following the crash of the Chinese stock market, which began in June and intensified this week. Several factors compounded the event globally, including the Greek government-debt crisis and the collapse in oil prices. On Monday (August 24), the Dow Jones plunged 1,000 points immediately after trading opened. This was briefly offset by a morning rally on what US media dubbed "Turnaround Tuesday", but stocks failed to rebound and ended more than 200 points below Monday's closing. Also impacted were the SENSEX in India and the FTSE in the United Kingdom, both of which suffered, and recovered from, heavy losses.

Learn more in these related Wikipedia articles: 2015 Chinese stock market crash, 2015 stock market selloff

Okanogan Complex fire rages on

The fire has been burning since August 15. Image by U.S. Department of Agriculture, freely licensed under CC-BY 2.0.

On Monday (August 24), the Okanogan Complex fire, a wildfire burning in Okanogan County, Washington, became the state’s largest-ever wildfire. It began as five separate fires caused by lightning strikes earlier in the month, and has now burned through more than 400 square miles of land. The fire has caused the evacuation of several towns in the county, and thus far more than 1,250 firefighters have been deployed to tackle the blaze. Irregular terrain means that the fires are proving difficult to deal with using traditional methods.

Learn more in the related Wikipedia article: Okanogan Complex fire

French terror attack foiled

The Thalys train, similar to that pictured, was en route from Amsterdam to Paris when the incident took place. Image by Chris Sampson, freely licensed under CC-BY 2.0.

Passengers onboard a Thalys train travelling from Amsterdam to Paris on Friday (August 21) subdued a gunman as the train passed through Oignies, France. The suspect, armed with a Kalashnikov rifle, a Luger pistol and a utility knife, opened fire near the toilets at around 5:45 p.m. Several passengers were involved in tackling the gunman, two of whom were off-duty members of the U.S. Armed Forces. Thanks primarily to the passengers' actions, nobody on board was killed and only five were injured. Seven of the passengers were awarded the Chevalier de la Légion d'honneur, France's highest decoration.

Learn more in the related Wikipedia article: 2015 Thalys attack

Photo montage credits: “Hawker Hunter T7 ‘WV372 – R’ (G-BXFI) (12863569924).jpg” by Alan Wilson, freely licensed under CC-BY-SA 2.0; “Okanogan Complex Fire – USFS.jpg” by U.S. Department of Agriculture, freely licensed under CC-BY 2.0; “Temple of Baal-Shamin, Palmyra.jpg” by Bernard Gagnon, freely licensed under CC-BY-SA 3.0; “NASDAQ stock market display.jpg” by bfishadow, freely licensed under CC-BY 2.0; “Gare du Nord, Paris 9 April 2014 004.jpg” by Chris Sampson, freely licensed under CC-BY 2.0; collage by Andrew Sherman.

To see how other news events are covered on the English Wikipedia, check out the ‘In the news’ section on its main page.

Joe Sutherland
Communications Intern
Wikimedia Foundation

by Joe Sutherland at August 26, 2015 12:44 AM

August 25, 2015

Wiki Education Foundation

Join us at WikiConference USA!

The WikiConference USA presentation proposal and scholarship deadlines are fast approaching!

Do you have an experience, a problem, a solution, or a question you’d like answered? Have a theory or a new strategy for improving the community, outreach, technological infrastructure, or connections to institutions like museums, libraries, or academia? Then WikiConference USA is your chance to network, brainstorm, collaborate and learn.

Last year's event saw speakers address a wide gamut of topics, with presentations on MOOCs, socializing students to Wikipedia, and Wiki Ed's Seven Biggest Mistakes. There were sixteen panels of interest to librarians, exploring how different types of libraries engage with Wikipedia, including case studies.

This year, WikiConference USA will take place at the National Archives Building in Washington, D.C., October 9–11. We’ve watched the connections between Wikipedia and higher education strengthen, and we’re excited to see discussions that highlight the power, challenges, and opportunities that have emerged as a result.

The event is open to anyone with an interest in Wikipedia or related Wikimedia projects. One goal of the conference is to build better connections and understanding between academia and the Wikimedia movement in the United States. That’s why conference organizers are seeking Wikipedia volunteers, researchers, academics, librarians, and curatorial staff to come together and share ideas and perspectives.

Anyone can submit a conference session proposal. A list of themes and timelines for proposals, as well as instructions on how to submit, is on the WikiConference USA submissions page. You can find a submission template here.

The WikiConference USA planning committee is also making scholarships available to cover the cost of travel and stay in Washington, D.C. Both presentation and scholarship requests must be submitted by August 31, 2015. This is an ideal opportunity for students who have made outstanding contributions to Wikipedia and want to learn more or get more deeply involved in the movement. More information on scholarships can be found here.

Please note that while the Wiki Education Foundation is a sponsor of WikiConference USA, all conference programming and scholarship decisions will be made exclusively by volunteers from the scholarship committee.

We hope to see you in Washington, D.C.!

Photo: “2014 WikiConferenceUSA 06” by AWang (WMF), own work, licensed under CC BY-SA 3.0 via Wikimedia Commons.

by Eryk Salvaggio at August 25, 2015 07:00 PM

Thank you to OpenSym 2015

This week, the Wiki Education Foundation hosted an open house and poster session for about 60 OpenSym conference attendees. During the poster session, people mingled, traded ideas, and learned about research and practices from a variety of participants and attendees. Our Summer Fellow, Andrew Lih, and Wikipedia Content Expert Adam Hyland, also attended the conference.

An emergent theme was how to best encourage users with various levels of experience to make meaningful contributions to open projects.

Benjamin Mako Hill, from the University of Washington, presented research on Wikipedia's page protection data. He examined when Wikipedia pages were "locked" because of controversial edits. Eva Zangerle of the University of Innsbruck looked at the relationship between Wikipedia and Twitter. The work revealed when and how Twitter communities in certain languages linked to Wikipedias in their own and other languages. Michelle Purcell, from Drexel University, examined the process of feature requests in open source projects.

We also heard about technology infrastructure projects, which could someday have an impact on recognizing and categorizing types of edits. Combined with assessment tools, this research could help create tools that identify edits from new users, and react with helpful advice or suggestions.

Finally, we also came across some interesting work exploring gender biases across language Wikipedias. Max Klein's gender gap poster used Wikidata to conduct research on biographies of women. He compared the number of biographies to gender inequality indices. The result suggested correlations between gender inequality and biographical coverage of women on Wikipedia. Based on inequality data from English-speaking countries, English Wikipedia should cover more women than it does now, his poster concluded.

These were just some of the inspiring researchers and conversations at OpenSym this year. You can access most papers or abstracts at the OpenSym website, http://www.opensym.org/

by Eryk Salvaggio at August 25, 2015 04:30 PM

August 24, 2015

Wiki Education Foundation

Monthly Report for July 2015


  • The Wiki Education Foundation released its annual plan for fiscal year 2015–16. The plan reports on the first year of our work, and looks ahead to what we’ll set out to achieve in our next fiscal year (July 1 through June 30). Our biggest goals include scaling the impact of our Classroom Program, the Year of Science initiative, and deepening connections between academia and Wikipedia.
  • Sue Gardner and Shadi Bartsch-Zimmer were elected to the Wiki Education Foundation board. Gardner was the Executive Director of the Wikimedia Foundation from 2007 to 2014. Dr. Bartsch-Zimmer is the Inaugural Director of the University of Chicago’s Stevanovich Institute on the Formation of Knowledge.
  • We’ve launched a new course management tool, the Dashboard. The Dashboard fully handles the course setup and assignment design process, and gives an even better picture of what student editors are doing throughout the term. Pages are created on Wiki Ed’s website, wikiedu.org, and mirrored on Wikipedia. This gives Wiki Ed the option of developing the platform independently, and to make relatively quick adjustments as we scale, adopt new programs, and improve our understanding of best practices.
  • Our final numbers for the spring 2015 classroom program are in. We supported 117 courses, more than ever before, and up from 98 courses in the fall. We supported 2,326 student editors who contributed roughly 2.5 million words to 3,429 articles, which were read by 101 million readers — that’s more than the population of Germany.


As outlined in our Annual Plan, we’ve re-organized our former Programs and Research & Development departments into three separate departments: Programs, Program Support, and Program Innovation, Analytics, and Research.

The Programs department houses our core programmatic work: the Classroom Program (led by Classroom Program Manager Helaine Blumenthal), our Educational Partnerships (led by Educational Partnerships Manager Jami Mathewson and assisted by Outreach Manager Samantha Erickson), and Community Engagement work, including the Visiting Scholars program (led by Community Engagement Manager Ryan McGrady).

Program Support work includes our technology development (led by Product Manager for Digital Services Sage Ross), communications work (led by Communications Manager Eryk Salvaggio), and support for new editors (led by Wikipedia Content Experts Adam Hyland and Ian Ramjohn). LiAnna Davis moves to become Director of Program Support, and we are hiring for a new Director of Programs.

We’re slated to begin hiring for the Director of Program Innovation, Analytics, and Research in the fall, with LiAnna overseeing this area in the meantime. As a programs-focused organization, we’re excited to be expanding our offerings.

Educational Partnerships

Jami Mathewson at the ASPB Conference 2015

Jami attended the annual conference for the International Association for Feminist Economics (IAFFE) in Berlin, Germany. Wiki Ed and IAFFE members have worked together in the classroom, and have collaborated to find and improve content gaps in economics. Jami, alongside Wiki Education Foundation Board Chair Diana Strassmann and University of Utah Economics Professor Gunseli Berik, hosted a presentation focused on teaching with Wikipedia as a means to improve public scholarship in feminist economics. Conference attendees came from a variety of disciplines, and through discussions, many have committed to teach with Wikipedia in their next course.

Wiki Ed has been discussing a Year of Science collaboration with the Association for Women in Mathematics. The members of this association are a great fit, as their students would edit math articles as well as biographies about women mathematicians.

Jami and Samantha attended the American Society of Plant Biologists (ASPB)’s annual conference in late July, which served as Wiki Ed’s first in-person roll-out of our new course design and management tools to new instructors. The feedback was overwhelmingly positive, and we’ve already had ASPB members sign up to teach with us in the fall term. ASPB is recommending Wikipedia assignments to its members, as they wish to expand the quality, depth and breadth of content and experiential learning opportunities on Wikipedia about plant science.

Classroom Program

Our final numbers for the spring 2015 term are in. In the spring, we supported 117 courses, more than ever before and up from 98 courses in the fall. We supported 2,326 student editors who contributed roughly 2.5 million words to 3,429 articles, which were read by 101 million readers — that’s more than the population of Germany!

Status of the Classroom Program for the summer 2015 term as of July 31:

  • 14 Wiki Ed-supported courses had Course Pages (5, or 33%, were led by returning instructors)
  • 214 student editors were enrolled
  • 165 (or 77%) students successfully completed the online training
  • Students edited 1370 articles and created 41 new articles

After a busier than usual summer term at Wiki Ed, we are gearing up for the fall 2015 term. We’re anticipating a large number of classes for the fall, and the new Dashboard (see Digital Infrastructure section, below) should help us as we scale.

Student work highlights:

  • Behavioral Communication was expanded from 529 to 1509 words by students in the University of Detroit Mercy's Social Psychology class. They turned a poorly referenced article written with a distinctly non-encyclopedic tone into a non-biased and well-referenced article.
  • Students in Amherst College's Rotherwas Fellows course added well-written and well-sourced material to Rotherwas Room. They also uploaded media and incorporated images from their own library collections.

Community Engagement

July marked the launch of the Wikipedia Visiting Scholars program. A Visiting Scholar is an official university position for experienced Wikipedia editors who are granted remote access to library research resources. These Visiting Scholars use those resources to improve Wikipedia content in one of the university’s focus areas. Ryan announced five open positions at five educational institutions in a Wiki Ed blog post on the program. In preparation for the announcement, Ryan and Eryk continued to develop informational materials for our website and for the program’s Wikipedia page.

The five open Wikipedia Visiting Scholars positions are:

  • DePaul University: focused on Chicago history, Catholic social justice studies, and/or Vincentian Studies (including French history during the Napoleonic Era)
  • McMaster University: emphasizing peace and war (with a particular emphasis on the Holocaust and resistance), Bertrand Russell, Canadian literature, and/or Canadian popular culture
  • Smithsonian Institution: particularly Modern African art and artists
  • University of Pittsburgh: focusing on Pittsburgh and Pennsylvania history, Colonial American history, historic American songs, and/or philosophy of science
  • University of Washington: interested especially in labor and working classes in the Pacific Northwest, Pacific Northwest history, Pacific Northwest literature, and/or Pacific Northwest architecture

Each institution is working on its own selection schedule, but every position has received applications from qualified Wikipedians.

Ryan has continued to develop the program and its processes, working with Jami and Samantha on outreach strategies and Sage on adapting Wiki Ed’s tools for Visiting Scholars.

Program Support

Members of the Program Support team during the all-staff meeting

The Program Support team spent much of July focusing on preparation for the new term, including rolling out a new course page system and preparing updated training and support materials for next term’s participants.


Eryk spent much of July updating training materials for instructors ahead of the fall 2015 term and preparing materials for staff of the Wikipedia Visiting Scholars and Partnerships programs.

Eryk has also been working closely with Executive Assistant to the ED Renee LeVesque, Wikimedia DC and Wikimedia NYC volunteers, and the communications staff at the National Archives to organize and promote WikiConference USA. The conference will be held at the historic and symbolic National Archives building October 9–11.


Digital Infrastructure

Sage and the design team at WINTR have been focused on the rollout of our new dashboard.wikiedu.org course page system, which we are using for all courses beginning with the fall 2015 term. Our usability testing with new instructors provided a tremendous amount of information on what we should focus on improving. Most of our development work this month has gone into improving the course creation process. In late July, we opened up the new system for new and returning instructors to set up their courses, and we’re continuing to polish up the user experience, fix bugs, and solve any key unanticipated use cases that come up as instructors put the system through its paces.

Sage also traveled to the annual Wikimania conference, in Mexico City, to connect with the wider Wikimedia world, and to work with developers interested in building on Wiki Ed’s dashboard platform. Our hope has always been that our technology — free software that anyone can use and build upon — would find its way into other Wikipedia projects and other languages. During the hackathon, Sage worked with several Wikimedia Foundation developers to bring our dashboard work to users beyond Wiki Ed’s own programs. The Outreach Dashboard is now up and running for trial projects — like edit-a-thons and other events, or courses outside of the US and Canada — on English Wikipedia. We also began putting together a roadmap for internationalization, so that Wikimedians in many different languages may eventually have access to their own Wikipedia dashboards. While Wiki Ed won’t take on that work directly, we hope to work with others to build a strong community platform.

Program Innovation, Analytics, and Research

Summer Seminar

We kicked off a short pilot to investigate whether instructors are interested in contributing content in their areas of expertise during the summer. Our first Summer Seminar focuses on psychology: instructors will meet weekly for one month to learn how to edit Wikipedia articles. In July, Helaine and Ian set up the course page for the Summer Seminar in Psychology using Wiki Ed’s new course dashboard. Enrollment for the program closed on July 24, with 20 participants signed up on the course page. Participants began preliminary work during the last week of July, with the first online session scheduled for August 6. Helaine and Ian are working closely together to make sure the course covers a range of topics relevant to editing psychology articles on Wikipedia, including an outline for the weekly online sessions.

Summer Research Fellowship

Andrew Lih joined the Wiki Education Foundation as our inaugural Summer Research Fellow in late July. The Summer Research Fellowship is a month-long pilot for a potential future program in which we’ll host professors or graduate students to help us answer questions about our programmatic work. This summer, Andrew will create a strategy and select case studies for outlining how university libraries, museums, and archives could work with instructors, students, and/or the community of Wikipedia editors as part of the Year of Science.

Andrew is an ideal person to be our first Fellow. As User:Fuzheado, Andrew has been editing Wikipedia since 2003 and was one of the first to use Wikipedia as a teaching tool that year. He’s taught numerous courses where he assigned students to contribute content to Wikipedia, including several affiliated with Wiki Ed’s Classroom Program. His work connecting students to museums in Washington, D.C., as part of his Wikipedia assignment can be found in our Case Studies brochure. He is also a member of the GLAM-Wiki US Consortium Advisory Group.

Finance & Administration / Fundraising

Finance & Administration

Our second-year funding commitment from Stanton Foundation had already been received prior to June 30, 2015. As a result, we are comfortably starting our second year.


The month of July started our new fiscal year. Expenses for the month, and year-to-date, were $264,022 (91%) versus the plan of $291,420. The variance of $27k was mainly the result of delayed timing of expenditures in recruitment ($17k), outside contract services ($6k), and the All Staff meeting ($5k), along with savings in promotional items ($7k). These savings and timing delays helped to offset increased travel expenditures for fundraising and governance.


On July 23, we welcomed Victoria Hinshaw in the new Development Associate role. Victoria will work closely with Senior Manager of Development, Tom Porter, to conduct individual and institutional prospect research, assist with daily development activities, develop and maintain prospect tracking systems, and assist in the planning of fundraising events. Additionally, Tom Porter has activated key Wiki Education Foundation board members to connect to potential funders.


The board welcomes Sue Gardner and Shadi Bartsch-Zimmer as new members. Sue Gardner is the former Executive Director of the Wikimedia Foundation, where she oversaw the creation of the Public Policy Initiative, which later evolved into the Wikipedia Education Program and the Wiki Education Foundation. Shadi Bartsch-Zimmer is the inaugural director of the University of Chicago’s Stevanovich Institute on the Formation of Knowledge, which examines the historical, social, and intellectual circumstances that give rise to different kinds of knowledge across cultures and eras.

Office of the ED

  • Current priorities:
    • Securing funding for upcoming major programmatic initiatives
    • Filling the Director of Programs position
    • Setting up a monthly report card to track key organizational health and progress indicators
  • In early July, staff gathered in San Francisco for Wiki Education Foundation’s second all-staff meeting. Over five days, staff built a shared understanding of the general team mandates and each team’s key focus areas for the coming year, kicked off an organization-wide documentation of team processes, learned the basics of program evaluation, and set individual milestones for the first two quarters of fiscal year 2015–16. Frank and LiAnna also provided every individual on staff with a performance review covering the last fiscal year. As a social event, staff attended an “A History of the World in Ten Cheeses” presentation at the Cheese School of San Francisco.
  • Also in July, Shadi Bartsch-Zimmer hosted a meeting with Diana Strassmann, John Willinsky, Bob Cummings, Richard Knipel, and Frank Schulenburg at the University of Chicago. Board members and the ED discussed opportunities for future cooperation between Wiki Ed and the Stevanovich Institute on the Formation of Knowledge and continued their work on Wiki Ed’s vision statement.
  • After signing the official partnership agreement with the National Archives, Wiki Education Foundation officially announced its support for WikiConference USA 2015. Proposals for workshops, tutorials, panels, or presentations can be submitted on the conference website.

Visitors and guests

  • Sabine Blankenship, Science Liaison Officer at the German Consulate General in San Francisco
  • Mathias Haas, Trend Researcher
  • Andrew Lih, Summer Fellow

by Eryk Salvaggio at August 24, 2015 08:38 PM

The Roundup: A room with some page views

Did You Know… that the Rotherwas Room, once used as a private dining parlor for nobles in 17th-century England and for public poetry readings by Robert Frost, is now open to visitors in the Mead Art Museum?

Students from Nicola Courtright’s Amherst College Rotherwas Fellows Summer 2015 course proposed that question as a “Did You Know?” on Wikipedia, and it made Wikipedia’s front page, gathering more than 1,600 page views. Students created the article, drawing from historical books and newspaper articles, and contributed photographs.

The Rotherwas Room itself dates to the 17th century and was reassembled at Amherst College’s Mead Art Museum in 1948. It serves as an example of the Jacobean style of architecture popular at the time.

The article is a great example of integrating photography into a written Wikipedia assignment, which can create richer articles and deepen the assignment’s media literacy impact. You can read more about photos as a stand-alone assignment, or as a complement to a writing assignment, here.

Thanks to these students for contributing both text and photos of a local landmark to share with the world!

Photo: “Rotherwas Room East View” by Stephen Petegorsky, Mead Art Museum. Licensed under CC BY-SA 4.0 via Wikimedia Commons.

by Eryk Salvaggio at August 24, 2015 03:30 PM

Wikimedia UK

Welcoming Karla Marte to our team


Karla Marte at the Wikimedia UK office

This post was written by Daria Cybulska, Head of Programmes and Evaluation

We are excited to introduce to everyone a new addition to the team, Karla Marte, who is joining us as our Administration and Programme Assistant.

Having recently refocused our activities, we are now entering an exciting new phase in which we want to build on large-scale partnerships with external organisations. Strong reporting and administration will be a key element of that.

Karla will be providing both core administration and financial support for Wikimedia UK activities, with a focus on its programmes and reporting. The role spans the whole organisation, though, providing administrative help wherever needed, which will make her much appreciated! She will also be the first port of contact for the organisation, so you will definitely come across her soon.

Karla has a very interesting background, coming to us from the Dominican Republic. Here she’s introducing herself:

“I recently relocated to the UK from the Dominican Republic. I have worked for an international development donor agency in the Caribbean region for the last ten years providing a range of administrative, financial and programmatic support to its education, health, youth, democracy and governance, and environmental programmes. I have an undergraduate degree in Business Administration and a masters degree in Management and Productivity.

“In my spare time I enjoy reading, baking, watching films, exploring museums and historic places and am a big follower of Formula 1.

“I am really looking forward to using what I have learnt from my previous work experience to further Wikimedia UK’s objectives.”

Please join us in welcoming Karla to the team!

by Daria Cybulska at August 24, 2015 10:08 AM

Tech News

Tech News issue #35, 2015 (August 24, 2015)


August 24, 2015 12:00 AM

August 22, 2015

Gerard Meijssen

#Wikidata - recent #changes

Databases change all the time. The expectation is that these changes make things different, better. This is true for all the online resources Wikidata connects to.

There are several good reasons to refer to an external database:
  • to indicate that the external source is about the same subject
  • to acknowledge the external source served as the source for a statement
  • to indicate whether shared values match
As databases change all the time, there is little value in indicating that a database shared the same value at a given date and time. Consider for instance the item for Mr Sundar Pichai: apparently he attended both the Indian Institute of Technology Kharagpur and Stanford University twice. When two sources state that he went there, one source may know what academic degree was achieved at the end of the study while the other does not. When you only verify whether the information in the two sources matches, both sources match. One source may not care what degree was achieved or when, and the other does. When you quote them as the source for a statement, you expect them to fully endorse its current content. Mr Pichai went to each educational institution once. Having two statements for the same thing completely defeats the objective of Wikidata: being usable.

Having references for statements makes sense when the statements are exactly the same. When they are not, there is arguably little point beyond indicating that all values for a source match. This could be done by showing the source in green. It is a lot more reassuring to see all sources in green than a lot of references that give no assurance that the values are indeed the same.
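To make the argument concrete, here is a hedged sketch in Python (hypothetical data structures, not Wikidata’s actual API or schema) of collapsing two sourced claims about the same fact into a single statement that carries both references, rather than storing duplicate statements:

```python
# Illustrative only: two external sources report the same "educated at"
# fact, one with a degree qualifier. Instead of keeping two duplicate
# statements, we keep one statement and attach both references to it.

def merge_claims(claims):
    """Collapse claims that share (property, value) into single statements,
    pooling their references and keeping the richer qualifier set."""
    merged = {}
    for claim in claims:
        key = (claim["property"], claim["value"])
        if key not in merged:
            merged[key] = {
                "property": claim["property"],
                "value": claim["value"],
                "qualifiers": dict(claim.get("qualifiers", {})),
                "references": list(claim.get("references", [])),
            }
        else:
            stmt = merged[key]
            stmt["references"].extend(claim.get("references", []))
            # A source that omits a qualifier (e.g. the degree earned)
            # does not contradict one that states it.
            for q, v in claim.get("qualifiers", {}).items():
                stmt["qualifiers"].setdefault(q, v)
    return list(merged.values())

claims = [
    {"property": "educated at", "value": "IIT Kharagpur",
     "qualifiers": {"degree": "B.Tech"}, "references": ["source A"]},
    {"property": "educated at", "value": "IIT Kharagpur",
     "qualifiers": {}, "references": ["source B"]},
]
merged = merge_claims(claims)
```

After merging, the item holds one statement with two references, which is what the green-highlight idea above would then summarize.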

by Gerard Meijssen (noreply@blogger.com) at August 22, 2015 06:13 AM

August 21, 2015

Wikimedia Foundation

As Odia Wikipedia turns 13, what happens next?

Mrutyunjaya Kar, Administrator, Odia Wikipedia. Photo by Jasanpictures, freely licensed under CC BY-SA 3.0.

Odia Wikipedia, one of several Indian-language Wikipedia projects, celebrated thirteen years of free knowledge contribution on June 3.

Launched in 2002—just a year after the English Wikipedia was launched—Odia Wikipedia has grown to be the largest content repository in the Odia language available in Unicode on the Internet. With over 8,900 articles and about 17 active editors (also known as “uikiali”) spread across various parts of India and abroad, the project has become more than just an encyclopedia. The voluntary editor community has put its efforts into acquiring valuable content, re-licensing them under Creative Commons (CC) licenses, and building tools for acquiring more encyclopedic content from various sources.

Statistics showing monthly page views, active editors, and new editors on Odia Wikipedia for February–July 2015. Image by Subhashish Panigrahi, freely licensed under CC BY-SA 4.0.

The community has also engaged over 2,500 people through various outreach programs, such as the Wikipedia Education Program. For its thirteenth anniversary, the community released a character-encoding converter that promises to unlock massive amounts of content, from government portals to journals, newspapers, and magazines that store their text in various legacy encoding systems other than Unicode. These legacy encodings have been a major roadblock to searching for and reusing digital content in the Odia language.
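Converters of this kind are typically table-driven: each legacy glyph code is mapped to its Unicode equivalent. A minimal sketch, with placeholder mappings rather than the real Odia legacy-font tables the community’s tool would ship with:

```python
# Illustrative sketch of a table-driven legacy-to-Unicode converter.
# The mapping below uses placeholder glyph codes ("L1", "L2", ...);
# a real converter would carry the full tables for each legacy font.

LEGACY_TO_UNICODE = {
    "L1": "\u0b15",  # Odia letter KA
    "L2": "\u0b16",  # Odia letter KHA
    "L3": "\u0b3e",  # Odia vowel sign AA
}

def convert(tokens):
    """Convert a sequence of legacy glyph tokens to a Unicode string,
    passing through anything the table doesn't know."""
    return "".join(LEGACY_TO_UNICODE.get(t, t) for t in tokens)
```

Real converters also have to reorder characters, since legacy fonts store vowel signs in visual order while Unicode uses logical order; that step is omitted from this sketch.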

During our thirteenth anniversary celebrations, the community spent a day assessing community needs, addressing issues, and identifying priority areas to focus on in the future. This was arguably the first time almost the entire community had gathered in a physical space. It allowed Mrutyunjaya Kar, Odia Wikipedia administrator and the most active Odia Wikimedian, and me to design a needs assessment survey. Participating Wikimedians were asked to brainstorm the problems they face in two major areas: editing and outreach.

What challenges the Odia community is facing: an infographic based on the community survey during the Odia Wikipedia 13 event. Image by Subhashish Panigrahi, freely licensed under CC BY-SA 4.0.

Two-fifths of participants cited problems with the rendering of Odia characters on different operating systems, ignorance of or insufficient documentation about enabling Odia encoding, input methods, and keyboard layouts, and other font- and keyboard-related issues as the major reasons for low readership and contribution on Wikipedia.

More than a third instead blamed the lack of good-quality content in the Odia language, whether on Wikipedia or on the Internet as a whole, along with the scarcity of English content related to the Odia language and Odisha. A quarter blamed other technical issues, including the lack of mobile input in Odia, for low editorship, while eight percent of survey participants pointed to low interest among native speakers in contributing in Odia.

Aditya Mahar, Wikimedian. By Jnanaranjan sahu, released under CC BY-SA 3.0.

Aditya Mahar, the second-most active editor on the Odia Wikipedia, feels that the biggest setback for the Odia Wikipedia is a lack of interest in contributing in Odia language. He says many Odia speakers feel that Odia is not needed to acquire and share knowledge.

“Like many others, I have been very eager to learn and share more about my home state and culture,” he says. “That’s why I started to contribute to the Odia Wikipedia, to tell my people about our rich cultural heritage in my language.”

He adds that he is concerned by the way many have been alienating Odia with the excuse of learning English to connect to the rest of the world. “I want my future generation to find everything they want to learn in Odia—from the history of Odisha, our art, architecture, Odia language and people, and our cultural extravaganza,” he says.

Mrutyunjaya Kar, an administrator on the Odia Wikipedia and its most active contributor, feels that Wikipedia is like a marathon and that there is a great need for fresh blood in the community. Asked if the community will ever die out, Mrutyunjaya feels that even as a small community, the Odia Wikipedia community is always going to thrive, even if, in the worst case, only one active Wikipedian remains; still, he says, new users are badly needed to pass the baton to. For him, community support and bonding with fellow editors is the most important thing in leading a community.

Mrutyunjaya adds that creating many short articles, known as “stubs”, is a necessary evil and essential for a project like the Odia Wikipedia, as they are drafted as a by-product of collaborative editing.

“It takes little longer for a small community like Odia to expand and enhance quality of stubs though,” he says. “Many-a-times, new editors learn about Wikipedia editing while creating stubs. However, all the active Wikipedians agreed during the thirteenth anniversary not to create or promote many stubs.”

Sailesh Patnaik, Odia Wikimedian
Subhashish Panigrahi, Odia Wikimedian and Programme Officer, Access To Knowledge (CIS-A2K)

by Subhashish Panigrahi and Sailesh Patnaik at August 21, 2015 06:06 PM

Content Translation Update

August 20 Update: Fixes for Stability, Publishing Errors and Reference Issues

This week’s ContentTranslation software update was a bit different, focusing on stability rather than features or front-end bug fixes:

  • A bug in the Flow extension caused publishing failures in ContentTranslation. Although the articles were subsequently published, the contenttranslation tag was missing from the revision history, the edit didn’t appear in RecentChanges, and confusing error messages were shown. This problem has now been resolved in collaboration with the Flow project developers.
  • Deleting references from the source article while a translation was in progress caused “503” errors. This has been fixed. (bug report)
  • The configuration of the Apertium machine translation services has been updated so that they remain consistently stable and no longer fill up memory, which had caused machine translation to stop working. (bug report)

The developers are completing work on the first phase of the feature that will show article suggestions on the dashboard, as well as fixes for several bugs in support for the Norwegian language. A few general fixes to the user interface are also expected next week.

by runa at August 21, 2015 12:08 PM

August 20, 2015

Wikimedia Foundation

When cultural heritage gets a digital life

Coding da Vinci featured 20 different projects and added 600,000 files to the Wikimedia Commons. Photo by Thomas Nitz/Open Knowledge Foundation Deutschland, freely licensed under CC BY 2.0.

An additional 600,000 free files are now available for the Wikimedia Commons thanks to Coding da Vinci, a recent cultural data hackathon held at Berlin’s Jewish Museum. They range from century-old films to recordings of mechanical pianos, World War II photographs, scans of dried flowers, and other art and heritage, all sourced from German museums, archives, and libraries.

Other achievements included 65 million pieces of metadata, such as the Integrated Authority File (GND) and the inventory of the Deutsche Digitale Bibliothek. About 180 people came to Coding da Vinci’s 5 July award ceremony, despite heat that reached 34°C, for presentations from the competition’s 20 projects.

You can find all of this information in the competition’s press report, along with the five jury prizes and “everybody’s darling”, the plant identification app Floradex, which won the audience prize. This blog post instead focuses on the project that, in my personal opinion, epitomizes a very special quality of Coding da Vinci.

The Imperii-Viz project

This deed (a transcription) was proof of Götz von Berlichingen‘s ownership of Hornberg Castle. Photo by Castellan/Burg Hornberg archives, public domain.

Documents similar to the deed above, a remnant from the Holy Roman Empire, are on display in museums around the world. They can be attractive to look at, but very few of us can actually read them—and among the few who can decipher the content, fewer still can understand it.

In Europe, a good number of these documents—most of them deeds of legal transactions—have survived into the present day, despite wars, fires, mold, and voracious bugs, and the sheer amount of time that has passed since they were written. Some are carefully preserved in archives, and non-medievalist historians and similar specialists rarely get to see them; their age alone makes them too precious and fragile.

Thanks to the Academy of Sciences and Literature in Mainz, many of these documents are now becoming accessible to a wider audience. The academy is home to the nearly 200-year-long research project “Regesta Imperii” (RI), in which all administrative documents issued by Roman-German kings and emperors are summarized in a database as what are known as regesta, similar in function to a book’s dust jacket. At present it contains 130,000 entries, enough that a large team of specialists from various fields was required.

One visit to RI’s website is enough to realize you have to be a specialist yourself not to get lost in the thousands of entries. Goethe once observed that you only see what you know. If you don’t know what you’re looking for because you can’t imagine what these historical documents might hold, then you won’t even be able to think of a research question. When this happens, the documents remain purely decorative: one deed quickly starts to look like a thousand others, and the visitor’s attention soon drifts away.

The dialectic takes hold

The Imperii-Viz team present their app at Coding da Vinci. Photo from Thomas Nitz/Open Knowledge Foundation Deutschland, freely licensed under CC BY 2.0.

In order to reach and hold onto these non-specialists, the academy decided to conduct an experiment: what would happen if they enlisted curious hackers to play around with their database? They released the data under a free license specifically for Coding da Vinci, making it available for use in an app for the very first time. For programmers, 130,000 data sets make for a very attractive offer, and five young IT students from Stuttgart and Leipzig took up the challenge. They called the result Imperii-Viz.

In the web-based app, the RI data sets are expanded with images from Wikimedia Commons and text on the emperors and kings from Wikipedia. When the user selects a king, a heat map appears showing the European regions where that king most frequently issued such deeds.
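Under the hood, such a heat map reduces to counting how many deeds the selected ruler issued in each place. A minimal sketch with hypothetical field names (not the actual Regesta Imperii schema):

```python
from collections import Counter

# Hypothetical regesta records; the field names are illustrative,
# not the real RI database schema.
regesta = [
    {"ruler": "Friedrich II", "place": "Palermo"},
    {"ruler": "Friedrich II", "place": "Palermo"},
    {"ruler": "Friedrich II", "place": "Nuremberg"},
    {"ruler": "Otto I", "place": "Magdeburg"},
]

def issuance_counts(records, ruler):
    """Count how often the given ruler issued documents in each place:
    these per-region counts are the weights a heat map layer is built from."""
    return Counter(r["place"] for r in records if r["ruler"] == ruler)
```

The counts would then be geocoded and rendered as intensity values on a map; the standardization of ruler names that Dr. Kuczera mentions below is exactly what makes this kind of grouping reliable.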

Dr. Andreas Kuczera, a scientific researcher at RI, is very positive about the results of the experiment:

The Imperii-Viz app is really interesting. It supports a new approach we should be taking to our database, viewing it from the perspective of big data. That’s new for us. The app isn’t just making these documents available to non-professionals; it’s also helping us researchers to formulate new questions. We definitely want to continue working with the Imperii-Viz team. The first lesson we learned is that we need to standardize the names of all the rulers so the data sets can be used in a more consistent way. We now have to implement this lesson. Discussions on topics like this with the hackers at Coding da Vinci were really valuable for us.

This assessment reflects perfectly, I believe, the dialectical quality of Coding da Vinci: the dialogue and exchange of experiences between two worlds. In an age of increasingly structured data, cultural institutions can use the technical know-how of programmers to build bridges between us and our cultural heritage—thus making our world a more versatile and richer place, and helping us anchor our present lives in history.

Coding da Vinci was organized by Wikimedia Deutschland together with its partners the Deutsche Digitale Bibliothek (DDB), the Service Center for Digitization, and the Open Knowledge Foundation; several other reports from the competition are available.

Barbara Fischer
Curator for GLAM Partnerships at Wikimedia Deutschland

by Barbara Fischer at August 20, 2015 07:10 PM

August 19, 2015

Wikimedia Foundation

News on Wikipedia: Explosions in China, Brazilian protests, and more


Here are some of the global news stories covered on Wikipedia this week:

Bomb blasts in Bangkok

The Erawan Shrine, seen here in May, was targeted. Image by Cantab12, freely licensed under CC-BY-SA 4.0.

A TNT bomb was detonated outside the Erawan Shrine on the busy Ratchaprasong intersection in Bangkok on Monday (August 17). The attack, thought to have been politically motivated, killed twenty people and injured more than 100. Most of those killed were tourists, though several Thai nationals are among the dead. Thai Prime Minister Prayuth Chan-ocha called it the “worst-ever attack” in the country’s history. Twenty-three countries have so far issued travel advisories in the wake of the attack, as Thai police track down the as-yet-unidentified suspect.

Learn more in the related Wikipedia article: 2015 Ratchaprasong bombing

Brazilians protest Dilma

Protests have been ongoing in Brazil throughout 2015; here, the people of São Paulo demonstrate in March. Image by Agência Brasil, freely licensed under CC-BY 3.0 Brazil.

On August 16, protests took place in more than 200 cities across all 26 of Brazil’s states. Demonstrators were again demanding the impeachment of president Dilma Rousseff, whose presidency has gradually become more unpopular among some sections of the public. Though turnout was smaller than at similar protests earlier in the year, sources suggest around 200,000 people took part around Brazil. A July poll found that Rousseff’s approval rating had dropped to 7.7 percent, with almost two thirds of respondents wishing to see her impeached.

Learn more in these related Wikipedia articles: 2015 protests in Brazil, Dilma Rousseff

Massive explosions in Tianjin

The blasts caused extensive damage to vehicles and buildings. Image by Voice of America, in the public domain.

A series of massive explosions near the Chinese city of Tianjin last week have killed at least 114 people, including more than twelve firefighters. The cause of the blasts is not yet known, though initial reports suggest they were the result of an industrial accident at a dangerous goods containment station near the Port of Tianjin. The explosions also caused extensive damage to nearby buildings, destroying several, and were visible from space. Thousands of residents were evacuated due to the chemicals released in the blast, thought to include sodium cyanide, and most are now staying in temporary shelters.

Learn more in the related Wikipedia article: 2015 Tianjin explosions

Spieth and Day celebrate golfing successes

Jason Day, pictured in 2011, won his first major at the 2015 PGA Championship. Image by Keith Allison, freely licensed under CC-BY-SA 2.0.

Australian golfer Jason Day won his first major at the 2015 PGA Championship on Sunday, August 16. He recorded a score of –20, the lowest score relative to par ever recorded in a major. Thanks in part to this win, he rose to number three in the Official World Golf Ranking. It was Jordan Spieth of the United States who made the most headlines on the day, though: despite finishing three strokes behind Day, Spieth still climbed to number one in the ranking, superseding Rory McIlroy and becoming the eighteenth different golfer to earn this ranking since 1986.

Learn more in these related Wikipedia articles: 2015 PGA Championship, Jordan Spieth, Jason Day

Indonesian domestic flight crashes

The plane involved in the incident, pictured here in 2008. Image by YSSY guy, freely licensed under CC-BY-SA 3.0.

Trigana Air Service Flight 257, a 45-minute domestic flight from Sentani to Oksibil in the eastern Indonesian province of Papua, crashed on Sunday, August 16, thirty minutes after takeoff. The plane was carrying 49 passengers and five crew, all of whom died on impact. It is Trigana Air Service‘s deadliest crash in the airline’s 25-year history, and the third-deadliest in eight months in Indonesia. The cause of the crash is being investigated; analyst Mary Schiavo suggested that “pilots don’t have enough training in their landing sequences and they need more training and more oversight”.

Learn more in the related Wikipedia article: Trigana Air Service Flight 257

Photo montage credits: “Jason Day 2011 cropped.jpg” by Keith Allison, freely licensed under CC-BY-SA 2.0; “Protestos de 15 de março de 2015 em São Paulo-3.jpg” by Agência Brasil, freely licensed under CC-BY 3.0 Brazil; “Tianjin_explosion_scene_20150813_(20).jpg” by Voice of America, in the public domain; “PKYRN.JPG” by YSSY guy, freely licensed under CC-BY-SA 3.0; “Erawan_Shrine,_Ratchaprasong,_May_2015.jpg” by Cantab12, freely licensed under CC-BY-SA 4.0; collage by Andrew Sherman.

To see how other news events are covered on the English Wikipedia, check out the ‘In the news’ section on its main page.

Joe Sutherland, Communications Intern, Wikimedia Foundation

by Joe Sutherland at August 19, 2015 09:33 PM

August 18, 2015

Wiki Education Foundation

Wiki Education Foundation at OpenSym 2015

Researchers and practitioners of open collaboration will flock to San Francisco from August 19 to 21. They’ll be in town for OpenSym, a conference dedicated to open collaborative projects, such as Wikipedia.

For the Wiki Education Foundation, this is right in our neighborhood. That’s not only because we’re also focused on open access collaboration in classrooms. It’s also literally in our neighborhood, as the Golden Gate Club — where OpenSym is held — is just footsteps away from our office in the Presidio.

This is a great opportunity to meet with the social scientists, digital humanities researchers, and computer scientists who make up the multidisciplinary attendee list of OpenSym. Our Wikipedia Content Expert Adam Hyland will attend the conference, and we’re hosting a Welcome Reception for registered attendees on Wednesday, August 19.

We’re pleased to be joining the OpenSym conference in 2015, and look forward to contributing to the spirit of open, multi-disciplinary research and collaboration.

Photo: “Below Golden Gate Bridge” by Wa17gs, own work. Licensed under CC BY-SA 4.0 via Wikimedia Commons.

by Eryk Salvaggio at August 18, 2015 07:07 PM

Wikimedia Foundation

Despite headlines, frequent edits don’t cause inaccuracy

Wikipedia articles on controversial scientific topics, like ‘Global warming,’ receive more edits. Contrary to recent media reports, this does not make them more inaccurate. Photo by Sir48, freely licensed under CC BY-SA 3.0.

Wikipedia is the encyclopedia anyone can edit. This open, collaborative model is what makes it one of the world’s most popular sources of information. It is also what makes Wikipedia reliable and accurate, as everyone can review changes and additions to its articles. Although vandalism and inaccuracies can occur, its community of volunteer editors has established mechanisms to ensure that the vast majority of inaccurate content is addressed within minutes.

Last week, a study was published in the open-access journal PLOS One: “Content Volatility of Scientific Topics in Wikipedia: A Cautionary Tale.” The study prompted a flurry of discussion around the accuracy of scientific articles on Wikipedia. The Wikimedia community has longstanding support for academic research about Wikipedia. However, the media coverage of this particular study has drawn some questionable conclusions.

According to the study, articles on politically controversial scientific topics on English Wikipedia tend to receive higher edit rates and larger edits than scientific articles considered to be politically uncontroversial. The authors cite three topics they identified as politically controversial—acid rain, global warming, and evolution—and four they identified as politically uncontroversial: heliocentrism, general relativity, continental drift, and the standard model in physics.*

It didn’t surprise us to learn that articles considered to be controversial are frequently edited. The nature of controversy, after all, is that it generates discussion and public attention. For example, while the properties of water (H2O) have been well established, the causes of the Arctic sea ice decline are the subject of ongoing scientific inquiry and political debate.

Unfortunately, the study also jumped to conclusions about what this means for Wikipedia’s reliability, overstating findings and inferring facts not in evidence. Much of the press about the study has repeated the assertion that controversial articles are also more likely to be inaccurate, despite a lack of strong supporting evidence: the study only references a handful of anecdotal examples of inaccuracies. Instead, the study simply seems to confirm that the articles chosen as controversial are, in fact, controversial and thus frequently edited. One of the authors has since responded that they intended no claim about a relationship between higher edit rates and lower accuracy.

In fact, several prior studies have found the opposite to be true, demonstrating that more edits are correlated with higher quality articles. For example, a 2007 study published in the peer-reviewed journal First Monday found “a strong correlation between number of edits, number of distinct editors, and article quality in Wikipedia.” Similarly, in 2013 researchers observed that the number of contributions to high-quality articles is about one order of magnitude higher than that of low-quality articles, according to the book Confidentiality and Integrity in Crowdsourcing Systems.

In addition, the study covered a very small sample size, using just seven of the 35 million articles available across Wikipedia’s many languages.

Wikipedia’s community of volunteer editors takes its commitment to accuracy very seriously. Many of them have personal academic or data science interests. In fact, a robust discussion critiquing the methodology of this study has taken place publicly on Wikipedia.

The aim of Wikipedia is to make the sum of all knowledge available to every person in the world. External research and observation are critical to helping Wikipedia grow and improve. But in true Wikipedian spirit, we believe any research should be assessed and reported with rigor and care. It is the same approach Wikipedia editors use to keep building Wikipedia as a reliable, accurate, and neutral resource for all.

*We’ll note that we found the inclusion of heliocentrism in the category of politically uncontroversial amusing. Hundreds of years later, we hope Galileo would appreciate the nod.

Katherine Maher, Chief Communications Officer, Wikimedia Foundation
Juliet Barbara, Senior Communications Manager, Wikimedia Foundation

by Katherine Maher and Juliet Barbara at August 18, 2015 07:00 PM

Wikimedia UK


This post was written by Rebecca O’Neill


Just a few of the books Rebecca used during 100wikidays

It is hard to believe that I completed the #100wikidays challenge on 9 August, as the time absolutely flew by. The challenge, as many people know, is to write an article a day for 100 days straight, and draws on the idea of the 100 days of happiness. Within a few days of the challenge being mentioned to me by Asaf from the Foundation, I had fallen down the rabbit hole and created a To Do list in my user space. Unlike the project’s originator, Vassia, I could not place my faith in finding a subject on each day, or in letting the article subject find me; I needed a plan of attack. As I’m involved in Wikimedia Community Ireland, I had become familiar with the list of Irish National Monuments through our running of Wiki Loves Monuments, and knew that many did not have articles. That was my jumping-off point. From there I went to my own areas of interest: Irish naturalists from around the turn of the twentieth century, and Irish museums. I chose these areas as I worked for a number of years in the Natural History Museum in Dublin and had become intrigued by the social history and people behind the specimens. My excuse on the museums is a childhood spent in local museums dotted across the county as my parents attempted to entertain children and visitors over the years. Soon enough I had a list of almost 100 potential articles right there.

Although I was not entirely new to creating new articles, I certainly had not created many, so I had a lot of the new(ish) editor fears of deletion or criticism. As a woman, you sometimes come primed to expect a little pushback, and as I began to focus on women more and more I wondered whether I would ever have the notability of my articles contested. I was pleasantly surprised. All of my articles are surviving as of right now, and I’m delighted to say that some have been improved upon since I created them. There was no greater pleasure for me than to see an article on an Irish botanical artist or collector edited by someone else adding to the story. It meant that I’m not the only one on Wikipedia who cares!

Soon the Irish naturalists and botanists I was writing about led me to the list of Irish botanical illustrators, which had its fair share of red links. It was finding this that led me to searching the Dictionary of Irish Biography for female entries that mentioned the word “artist”. Suddenly the floodgates opened. Having been an art student in a previous life, I have some interest in and limited knowledge of art history, and even I was shocked to find the obvious omissions from Wikipedia of Irish female artists. I had found a niche that felt more like a lacuna. If I had fallen down a rabbit hole with 100wikidays, I was through the looking glass now, with a seemingly endless list of artists to write about! Every artist seemed to alert me to at least one or two more red links. As it turns out, 100 days was never going to be enough. It looks as if a second challenge may be on the horizon for me, and rather than just having a general Irish theme it may be 100 Irish women, as there seems to be no end in sight.

Many of the red links languishing in my To Do list are still National Monuments and museums. Non-promotional, non-tourist-driven, and comprehensive sources were hard to come by. My hope is to find homes for some of these smaller or more obscure monuments and institutions within other articles on their localities. Some people I have listed are perhaps not suited to Wikipedia and may be retired from the list, though I hold out hope for some of those early geologists and botanists yet! Doing the challenge has definitely made me a more confident Wikipedian; it has made me feel more like a “real” Wikipedian too, rather than just an enthusiast. I have met some wonderful people both on wiki and in real life through it, and it has made editing more of a daily habit for me. That said, I have taken a short break from editing to get PhD and other work done, but it is only a matter of time before another 100 days begins. Having written about everything from the stump of a windmill, to a butter museum, to an almost literal flying nun, I feel like this might only be the beginning.

by Rebecca O'Neill at August 18, 2015 01:08 PM

August 17, 2015

Wikimedia Foundation

How Wikipedia responds to breaking news

Wikipedia is capable of covering news like any news agency. Photo by Kai Mörk, freely licensed under CC BY 3.0 (Germany).

For almost fifteen years, the scope of topics that Wikipedia covers has grown steadily. Now, the free online encyclopedia covers everything from music, film and video games to geography, history, and the sciences. It also contains articles on topics trending in the news, updated by tens of thousands of volunteer editors as swiftly as the news breaks.

To investigate aspects of this phenomenon, such as the speed with which breaking news is covered on Wikipedia, the verifiability of information added over time, and the distribution of edits among Wikipedia’s editors, I selected an article for further analysis in the form of a dissertation.[1]

Comparing page views and daily edit counts for the article highlights key elements in the story’s development. Image by Joe Sutherland, freely licensed under CC BY-SA 4.0.

The article selected was “Shooting of Michael Brown“, which covered the killing of 18-year-old Michael Brown in Ferguson, Missouri, by police officer Darren Wilson. The incident attracted significant press attention, fuelled by protests in the St. Louis suburb. I observed the article’s history until January 12, 2015.

The resulting data was split into two “peaks” in the development of this story: the initial media scramble after protests began in mid-August, and the Ferguson grand jury’s decision not to indict Wilson for the teenager’s death in late November.[2] Each “peak” represented 500 individual “revisions” of the article in question. The use of peaks in this case allowed for cross-case analysis—that is, a direct comparison between two case studies.

Speed of editing

Graphing the speed of editing across both peaks of development. Image by Joe Sutherland, freely licensed under CC BY-SA 4.0.

Notably, pageviews and edit rates didn’t line up as one might expect. Instead, there was a great flurry of edits a few days after the article was created, presumably as the editing community learned of the article’s existence or heard about the event. The speed of editing was incredibly fast during this initial period of rioting and press attention, though these speeds were highly inconsistent. The mean editing rate across this period was 18.57 edits per hour, more than eleven times the overall average for the article.

Media coverage, however, seems to have a much more acute impact on pageviews: when the grand jury’s decision not to indict Darren Wilson was announced in November, almost half a million people visited the article in a single day. A somewhat surprising observation was that this second peak resulted in much slower rates of editing. The mean for this period was just 7.21 edits per hour, roughly two and a half times slower than in the first. Editing speeds were also very inconsistent, mirroring the first peak; they varied widely throughout both peaks and were largely unpredictable.

In terms of text added to the article, the first peak—which was observed over a much shorter period of time—saw an average of 501.02 bytes of text added per hour, some 3.6 times quicker than the rate of the second peak. By then, however, the article was much longer, and the likely explanation is that there wasn’t much left to add by that point.
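Both rates are straightforward to derive from revision metadata. A minimal sketch in Python; the timestamps and article sizes below are made up for illustration, and real figures would come from the article’s revision history:

```python
from datetime import datetime, timezone

# Made-up revision metadata: (UTC timestamp, article size in bytes).
revisions = [
    ("2014-08-16T09:38:00", 5000),
    ("2014-08-16T11:38:00", 5400),
    ("2014-08-16T15:38:00", 6100),
    ("2014-08-17T09:38:00", 7300),
]

def ts(stamp):
    # Parse an ISO 8601 timestamp as UTC.
    return datetime.fromisoformat(stamp).replace(tzinfo=timezone.utc)

# Elapsed hours between the first and last revision in the sample.
hours = (ts(revisions[-1][0]) - ts(revisions[0][0])).total_seconds() / 3600

# Mean rate of edits, and of bytes added, over that window.
edits_per_hour = (len(revisions) - 1) / hours
bytes_per_hour = (revisions[-1][1] - revisions[0][1]) / hours

print(f"{edits_per_hour:.2f} edits/hour, {bytes_per_hour:.1f} bytes/hour")
```

Dividing by the first-to-last elapsed time (rather than a fixed calendar window) matches how a per-peak mean would naturally be computed, though the study’s exact method may differ.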

Use of sources

Judging the article’s accuracy is a very difficult task, which would by its very nature be subjective and require an in-depth knowledge of what happened in Ferguson that afternoon. Instead, I looked at the verifiability of the article—specifically, the volume of sources per kilobyte of text, referred to for this study as the article’s “reference density”.

“Reference densities” over each peak. Image by Joe Sutherland, freely licensed under CC BY-SA 4.0.

Ten samples were taken systematically for this research from each peak, and their references tallied. This was used in conjunction with the page’s size in kilobytes to find the reference density.

In both peaks, the reference density steadily increased over time. It was significantly higher overall in the earlier peak, when the article was shorter and rapidly-changing information required more verification. This rise in reference density over time likely indicates Wikipedia editors’ desire to ensure information added is not removed as unverifiable.
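The metric itself is a simple ratio. A sketch of one way to compute it, counting `<ref>` tags in a toy snippet of wikitext (a real measurement would use a full article revision, and the study’s exact counting rules may differ):

```python
import re

# A toy sample of wikitext; the citations are placeholders.
wikitext = (
    'Brown was shot at about noon.<ref>{{cite news|title=...}}</ref> '
    'Protests began the next day.<ref name="stl" /> '
    'A grand jury was convened.<ref>{{cite web|url=...}}</ref>'
)

def reference_density(text):
    """References per kilobyte of wikitext (the study's 'reference density')."""
    # Count opening <ref> tags, including named and self-closing forms.
    refs = len(re.findall(r"<ref[\s>/]", text))
    kilobytes = len(text.encode("utf-8")) / 1024
    return refs / kilobytes

print(f"{reference_density(wikitext):.1f} refs/kB")
```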

The majority of sources used in the article were from publications which focus on print media. This is more obvious in the second peak than the first, when the local newspaper the St. Louis Post-Dispatch became much more common among the article’s sources.

Origins of sources used within the article per peak. Image by Joe Sutherland, freely licensed under CC BY-SA 4.0.

Relatedly, it was discovered that a high volume of the sources were from media based in the state of Missouri, obviously local to the shooting location itself. The proportion falling into this category actually increased by the second peak, from just over 18 percent to just over a fifth of all sources. Other local sources which were regularly used in the article included the St. Louis American and broadcasters KTVI and KMOV.

It was the state of New York which provided the majority of sources, however; this seems to indicate that editors tend towards big-name, reputable sources such as the New York Times and USA Today, which both placed highly on ranking lists. Notably, the state of Georgia was almost exclusively represented by national broadcaster CNN, yet still made up around 10 percent of all sources used.

Range of contributors

Finally, the editing patterns of users were examined to judge the distribution of edits among a number of groups. To do this, users were placed into categories based on their rates of editing—which, for the purposes of this study, was defined as their mean edits per day. Categories were selected to divide editors as evenly as possible for the analysis, and six bots were excluded to prevent the skewing of results.

Edits/day | Category | Count | % of editors | With status | % with status
40+ | Power users | 27 | 4.49% | 20 | 74.07%
10–40 | Highly active users | 73 | 12.15% | 38 | 52.05%
5–10 | Very active users | 67 | 11.15% | 26 | 38.81%
1–5 | Active users | 105 | 17.47% | 19 | 18.10%
0.1–1 | Casual users | 92 | 15.31% | 4 | 4.35%
0.01–0.1 | Infrequent users | 62 | 10.32% | 0 | 0%
<0.01 | Very infrequent users | 13 | 2.16% | 0 | 0%
IPs | Anonymous users | 162 | 26.96% | 0 | 0%
Total/average | | 601 | 100% | 107 | 17.80%
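The categorisation amounts to a simple bucketing function over mean edits per day. A sketch following the table’s brackets; how exact boundary values (e.g. a rate of exactly 10) are assigned is my own assumption, as the study does not spell it out here:

```python
def activity_category(edits_per_day):
    """Bucket a registered editor by mean edits per day.

    Brackets follow the table above; a rate landing exactly on a
    boundary is assigned to the higher bracket (an assumption).
    """
    brackets = [
        (40, "Power users"),
        (10, "Highly active users"),
        (5, "Very active users"),
        (1, "Active users"),
        (0.1, "Casual users"),
        (0.01, "Infrequent users"),
    ]
    for lower_bound, name in brackets:
        if edits_per_day >= lower_bound:
            return name
    return "Very infrequent users"

# Anonymous (IP) editors would be handled separately, as in the table.
for rate in (55.0, 2.5, 0.004):
    print(rate, "->", activity_category(rate))
```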

Clearly, the majority of users in the highly active and power users brackets hold some kind of status, whether that be the “rollback” tool given out by administrators, or elected roles such as administrator or bureaucrat. This at least implies that more daily edits can translate roughly into experience or trust on the project.

Looking at data added per category, highly active users have been responsible for the vast majority of the total content added to the article—over half of the total. However, breaking it down into mean content added per edit for each category provided some intriguing results.

Mean content added per edit, in bytes, per experience category. Image by Joe Sutherland, freely licensed under CC BY-SA 4.0.

While the highly active users take this crown too, it is a much closer race. Perhaps unintuitively, “casual” editors—those with fewer than one edit per day, but more than 0.1—added an average of 95.81 bytes per edit, and the category directly below that added 93.70 bytes per edit. This suggests that article editing is not just done by the heavily-active users on Wikipedia, but by a wide range of users with vastly different editing styles and experience.

Edits to the article were most commonly made by a very small group of users. Indeed, 58 percent of edits made to the article were by the top ten contributors, while over half of contributors made just one edit. Text added to the article followed the same pattern, though more pronounced: the same top ten contributed more than two-thirds of the article content. This lends weight to theories that Wikipedia articles tend to be worked on by a core “team”, while other editors contribute more minor edits and vandalism reversion.
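These concentration figures fall out of a tally of per-user edit counts. A sketch with entirely hypothetical editors and counts (real figures would be tallied from the article’s revision history):

```python
from collections import Counter

# Hypothetical per-user edit counts for illustration only.
edits_by_user = Counter({
    "EditorA": 120, "EditorB": 90, "EditorC": 60, "EditorD": 40,
    "EditorE": 30, "EditorF": 25, "EditorG": 20, "EditorH": 15,
    "EditorI": 12, "EditorJ": 10, "EditorK": 3, "EditorL": 1,
    "EditorM": 1, "EditorN": 1,
})

# Share of all edits made by the ten most prolific contributors.
top_ten = sum(count for _, count in edits_by_user.most_common(10))
total = sum(edits_by_user.values())

# How many contributors made exactly one edit.
single_edit = sum(1 for count in edits_by_user.values() if count == 1)

print(f"Top-ten share of edits: {top_ten / total:.0%}")
print(f"Single-edit contributors: {single_edit} of {len(edits_by_user)}")
```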

Overall, the study shows that Wikipedia works on breaking news much like a traditional newsroom—verifiability is held in high regard, and a “core group” of editors tend to contribute a vast majority of the content. Editing rates, however, do not match up as obviously with peaks of media activity, which is worth investigating in future more qualitatively.

If you’re interested in reading the full thesis, it’s available from my website. For more academic research into Wikipedia, consider subscribing to the monthly Wikimedia Research newsletter.

Joe Sutherland, Wikimedia Foundation communications intern


    1. Others have done research into this area; their work, methods and outcomes heavily influenced this study. In particular, Brian Keegan‘s work was instrumental in guiding the direction for this research. His 2013 study into breaking news, co-authored with Darren Gergle and Noshir Contractor, covers a far wider range than this thesis did.
    2. The first peak depicted is the 500 edits made between 09:38 UTC on 16 August 2014 and 17:54 UTC on 18 August 2014 (a period of 2 days, 8 hours and 16 minutes); the second is between 00:57 UTC on 23 November 2014 and 22:36 UTC on 01 December 2014 (a period of 8 days, 21 hours and 39 minutes).


by Joe Sutherland at August 17, 2015 08:36 PM

Wiki Education Foundation

The Roundup: Helping out

The “helper theory” in social psychology is based on the idea that when you help someone, you benefit, too. The theory suggests that helpers draw various benefits from their support, including the meaningful development of abilities through teaching others.

The theory seems like it could explain some of the benefits we see when students help articles improve on Wikipedia. So we were happy to see that the article on helper theory was developed by student editors in Dr. Sheryl Boglarksy’s Social Psychology course at the University of Detroit-Mercy!

Those students expanded the article from a short stub with just 107 words and three references, to a deeper article with 946 words and 15 quality references.

Thanks to Dr. Sheryl Boglarksy and her class of student editors for this great contribution to Wikipedia.

by Eryk Salvaggio at August 17, 2015 04:49 PM

Tech News

Tech News issue #34, 2015 (August 17, 2015)


August 17, 2015 12:00 AM

August 16, 2015

Pete Forsyth, Wiki Strategies

Read between the lines: Wikipedia’s inner workings revealed


Wikipedia’s web features let you learn more about what you’re reading than you can from a traditional book. Photo (c) Julia Spranger, licensed CC BY.

Since it launched in 2001, Wikipedia has become the most widely viewed source of original content in the world. But the way it’s built is utterly different from other publications: newspapers, books, traditional encyclopedias. Most of us have a basic understanding of how a traditional publication is created; for instance, the difference between a reporter and a news editor, or the role of a publisher as a gatekeeper in determining which authors get published. But that understanding won’t help much when it comes to Wikipedia.

Still, there’s plenty you can do to learn how a Wikipedia article has evolved, and to assess things like its accuracy and bias. Wikipedia is built on the principle of transparency; with very few exceptions, every edit to an article is preserved, and publicly visible. So is every discussion about how the article should be written or edited.

But if you want to explore these edits and discussions, you will need a basic understanding of how Wikipedia works. Below is an exercise you can complete in 30-60 minutes, designed to help you build some skills as a critical Wikipedia reader.

Before you get started, review this case study of the evolution of the article on Celilo Falls. Don’t worry about absorbing every detail, but take in the kinds of things Wikipedia editors do. Keep the case study open in a browser tab while you do the following exercises.

  1. Explore that article’s “Talk” page. Read through the beige boxes at the top. Explore some of the links in them. Can you find one or more that were mentioned in the case study? (These beige boxes are typical of an article’s talk page, but they have nothing to do with discussion; it’s just an oddity of Wikipedia that this is where they happen to live.)
    Now explore the rest of the page. Do you see any discussions related to the case study?
  2. Explore that article’s “View history” tab. (See below for some help with this one.) Use the date fields at the top of the page to navigate to one of the dates mentioned in the case study. Can you find any of the edits mentioned in the case study? Do you see any others that are interesting?
    Now look at the line near the top of the view history page labeled “external tools.” Try clicking some. Can you find out how many times the article was viewed in the last month? Who has written the bulk of the article? Can you find when a certain sentence was originally added?
  3. Look at the public logs for this article. (The link is at the top of the view history page.) Logs will reflect certain administrative actions, for instance, protecting an article from anonymous edits. For the Celilo Falls article, you won’t see any entries; but for a more controversial article, you probably will.
  4. Finally, look at the “page information” for this article. (The link is under “Tools” on the lefthand side.) Anything interesting here?

Once you are familiar with each of these links, try them all with another article — pick a topic you know or care about. Run through each screen again, and see if you can learn anything interesting about the article’s evolution. In particular, try step #3 with a highly controversial article (for instance, a major politician or political topic).

Guide to the “view history” screen


The “View History” tab of any Wikipedia article is a vital tool in understanding how it has evolved.

The view history tab of every Wikipedia article provides a great deal of information. The color coding here will help you understand what you’re looking at. (In the upper left of the graphic, notice that it’s the “View history” tab for the Article, not for its Talk page; each has its own history.)

Below the header section, each line beginning with a bullet reflects a revision to the article.

  • The green column has tools for comparing any two revisions of the article.
  • The yellow column states the date of that revision; clicking it will take you to the version of the article as of that date.
  •  The grey column tells you who made the revision, and provides links relevant to that user.
  • The blue column tells you about the size of the article as of that revision (and will indicate whether the user labeled the edit as “minor,” among a few other things). This is probably the least important of the columns.
  •  The pink column shows the edit summary the user provided (and in some cases, additional information).
  • The unhighlighted links at the right give you the ability to easily undo recent edits, or thank the user for making the edit. Try both links; don’t worry, neither will take any action without you confirming it first. (You have to be logged into an account to thank somebody.)

For a more thorough overview of the view history screen, see this 10-minute video, which covers each item and link in detail; and see this brochure (designed to be printed) for a general overview of evaluating Wikipedia article quality.

by Pete Forsyth at August 16, 2015 10:05 PM

Outernet edit-a-thon


The Wired CD, one of the “bins” I uploaded to Outernet this weekend.

I was recently introduced to Thane Richard, founder of Outernet, and was honored to help him think through the design of Outernet’s first edit-a-thon, held this weekend. Much like our Wikipedia Barn Raising (a year ago to the day!), Thane planned an in-person event, but also invited participation from all over the world.

Amusingly, the Wikipedia Twitter feed referred to it as a chance to “build Wikipedia” — but this was actually a different kind of edit-a-thon, designed to build Outernet, an entirely separate project to help bridge the Digital Divide. Outernet puts old satellites to use, broadcasting “bins” of data to (as of now!) the entire world — even places where the Internet doesn’t reach. It’s one-way communication — you can’t upload, or access the entire Internet through it — but once you have their $150 “Lantern,” you can receive the broadcasts for free, and share them for free on a local network.

This weekend’s event was an opportunity to learn Outernet’s procedures for creating and uploading “bins” — basically, a folder of files in a certain theme, unencumbered by restrictive copyright — for future broadcast via Outernet.

On the surface, it seemed like a cool opportunity to package up Wikipedia articles. I started creating PDFs of the articles about the watersheds of Portland, Oregon (most of which are exceptionally high quality, thanks mostly to the efforts of Wikipedia user Finetooth); however, for reasons I will explore in a followup blog post, I soon realized that there was no convenient way to upload these in compliance with Wikipedia’s attribution requirements (which mean naming all the people who have contributed to the articles). So instead, I uploaded a small bin of articles about Open Educational Resources, and another with the music from Wired Magazine’s 2004 CD “Rip. Sample. Mash. Share.”

I uploaded them just after the end of the edit-a-thon, so I haven’t yet gotten any feedback. I hope these are useful — but it’s possible they won’t be, since in an effort to move forward and actually upload something, I mostly disregarded the guidelines about what kind of content was most desired. But even if this was just a “practice run,” I’m happy to have gotten a feel for how Outernet is approaching their excellent work, and to have learned how I can contribute. I could tell from their Etherpad page that a number of people were working on it too; it was fun to work with an ad-hoc global team. I look forward to contributing more substantially to Outernet’s future efforts!

by Pete Forsyth at August 16, 2015 08:41 PM

August 14, 2015

Wiki Education Foundation

Wiki Ed announces ASPB partnership

I’m excited to announce that Wiki Ed has signed a partnership agreement with another academic association—the American Society of Plant Biologists (ASPB). Plant scientists and their students will add important research to Wikipedia, making information available to people outside of the discipline. ASPB’s mission to advance the scientific study of plants aligns with our goal of improving Wikipedia’s science articles, making them the perfect partner for the upcoming Year of Science.

Mission alignment is one of the primary reasons Wiki Ed focuses on building educational partnerships with academic associations. Targeted outreach to experts in a shared discipline means we not only increase Wikipedia’s breadth and depth in that topic area, but we also open the door to systematic support for students. We can create subject-specific handouts, generate content gap lists, and use an association’s resources and expertise to identify women in the field who are underrepresented on Wikipedia.

Outreach Manager Samantha Erickson and I attended ASPB’s conference last month, and we will continue to engage their members to improve Wikipedia’s coverage and quality of plant science. Members can join the Classroom Program by assigning their students to edit Wikipedia, or they can host a Visiting Scholar at their institution’s Biology department or science library.

The Year of Science starts in January 2016, so we’re excited to engage scientists and their students now, as it takes time to develop meaningful, impactful partnerships with like-minded organizations. ASPB is our first partner for the Year of Science, and we are in talks with other experts who have identified Wikipedia as an important resource that needs to reflect the scholarship in their field. We will also continue partnering with non-science academic associations.

If you are interested in starting a Wikipedia initiative or participating in the existing initiatives with the National Women’s Studies Association, the Midwest Political Science Association, the American Sociological Association, or the Association for Psychological Science, please contact me at jami@wikiedu.org.

Jami Mathewson
Educational Partnerships Manager

by Jami Mathewson at August 14, 2015 10:20 PM

Gerard Meijssen

#Wikidata - Mr Sundar Pichai

I heard of a dispute among Wikipedians about the facts of Mr Pichai's education. That was yesterday, so I hoped that some of that discussion would make its way to the item for Mr Pichai.

Mr Pichai's item is indeed in need of serious attention. The stated place of birth should be more specific, and his education has the same school entered twice for no obvious reason. He was born in India, but Wikidata has him as an "Indian American" for whatever reason.

The information shown when you Google Mr Pichai is much better. If Google and Wikidata were to compare each other's records, the Wikidata item would certainly be flagged as problematic.

As a lot of Wikipedians have invested serious attention in Mr Pichai, comparing the Wikipedia article with the Wikidata entry will expose the latter's weakness. I am not particularly interested in Mr Pichai, so I leave it for someone else to sort this out.

by Gerard Meijssen (noreply@blogger.com) at August 14, 2015 04:50 AM

August 13, 2015

Wikimedia Foundation

When countries disappear


The Anuradhapura Kingdom, a former state in Sri Lanka (c. 377 BCE–1017), left several paintings and frescoes behind. This one, from Sigiriya, is the oldest and best preserved from that period. Photograph by Chamal N, public domain.

History shows that there is a long list of countries that have simply ceased to exist, but there is no one way to go about it. The Russian Empire dissolved in violence in 1917, and its successor state (the Soviet Union) gave way to present-day Russia in the 1990s. The modern-day split of Czechoslovakia was peaceful, while nearby Yugoslavia broke apart during a civil war into several new countries. The Republic of Texas was willfully annexed by the United States. The Republic of Vietnam was taken over by its northern counterpart; similarly, the Songhai Empire was briefly annexed by Morocco, and the Anuradhapura Kingdom of Sri Lanka was overtaken by armies from India.

The Songhai Empire once covered much of west Africa; it included Timbuktu, seen here. Engraving from Le Tour du Monde, public domain.

To cover this diverse set of geographical entities, the English Wikipedia has formed WikiProject Former countries. The project was created in 2004 and currently boasts 42 active editors; aside from the countries listed above, it includes several states that were critically important in world history, including Assyria, Sumer, the Mongol Empire, and Nazi Germany.

44 of the project’s articles have attained “featured” status, as determined by a peer review from editor colleagues. These include the encyclopedia’s coverage of the Byzantine and British Empires, both sprawling and continent-spanning, along with an entire series on the Brazilian monarchy of the nineteenth century—including the first and second emperors, as well as the latter’s wife, first son, and second son. Another 77 articles are rated as “good.”

Last month, the Signpost, the English Wikipedia’s community-written news journal, interviewed two members of the former countries project about their background and goals for the future.

OwenBlacker noted that his grade school experiences with history as a topic were less than stellar, something that put him off for many years “until I discovered my uncle’s copy of The Times Atlas of World History and realised that learning more about history meant there were even more maps to look at—always a guilty pleasure. That helped me realise it wasn’t history I disliked, just the parts I [had been taught]. Since then I have read (and looked at maps) of history in great quantity.”

OwenBlacker and his fellow project member, MirkoS18, differed greatly in their choices of favorite topics. Owen chose pre-Napoleonic Europe, a topic in which he wrote the good article Principality of Stavelot-Malmedy—a small, strangely shaped voting Imperial abbey of the Holy Roman Empire. Mirko—who hails from Croatia—focused on Yugoslavia. He noted that this was a very different choice from much of the project's work, as Yugoslavia's breakup occurred only recently, and he believes that “there is almost no one who would be emotionally distant and neutral. … [their objectives] are glorification or vilification.”

The project’s largest need is article writing. Owen noted that its coverage of the Global South is sorely lacking, as shown by articles like the Songhai Empire, “one of the largest empires in Islamic and African history,” which currently has a low article quality rating.

Ed Erhart
Editorial Intern

by Ed Erhart at August 13, 2015 10:25 PM

Content Translation Update

August 13 Update: Fixed Gallery Links, Title Alignment and Reference Templates

Here’s a summary of the changes in Content Translation software in Wikipedia today:

  • On August 12 there was an issue in the machine translation servers, and machine translation didn’t work. It is now fixed, and we are working with the Wikimedia Operations team on making these servers more consistently stable.
  • Links in galleries in the translation interface were opening the target page in the same browser tab and closing the translation interface, which was non-destructive but inconvenient. Now they open in a new tab. (bug report)
  • When the target article title was edited, it was not automatically aligned, and this caused the rest of the paragraphs to be misaligned as well. This is now fixed. (bug report)
  • Another bug in the support for reference templates was fixed: an edge case in which the references list is translated first and the text later. (code change) A similar bug with images was fixed as well. (bug report)

The team continues to search for the reasons for the loss of translated paragraphs and for publishing errors. Please report any such errors if they still happen. It’s a very high-priority issue, but we need more examples for investigation.

On the new features front, the team is working on the first stage of the feature that displays suggestions of articles to translate in the dashboard.

by aharoni at August 13, 2015 08:14 PM

Wikimedia Tech Blog

Content Translation updates from Wikimania 2015


Content Translation session at Wikimania 2015. Photo by Amire80, freely licensed under CC BY-SA 4.0.

Wikimania 2015, the 11th edition of the annual gathering of Wikimedians from around the world, was recently held in Mexico City. The Wikimedia Foundation’s (WMF) Language Engineering team participated in the Hackathon and Wikimania sessions, hosting several talks and two translation workshops. The primary focus was the Content Translation project—interacting with users, understanding their issues, and raising awareness about this new article creation tool.

During the Hackathon and Wikimania, the Language Engineering team members met with Content Translation users. New users were introduced to the tool and generally provided encouraging initial feedback. Deeper discussion revealed several issues, some of which were quickly investigated and resolved. The ‘translathon’ sessions on days two and three of Wikimania were well attended, and some attendees created their first articles using Content Translation. On the first day, 21 new articles were created in just an hour by 16 participants. While several issues surfaced, participants provided suggestions that will help better support article translation. The second translathon was conducted in Spanish and aimed at Central and South American languages. The main conference sessions were not recorded, but you can view the presentation slides.

Upcoming plans

The Language Engineering team follows a three-month development cycle that allows us to plan and showcase outcomes alongside the larger departmental and organizational goals. The results of work done between April and June 2015 can be seen in the Quarterly Review presentation. Highlights included making Content Translation available as a beta feature on all Wikipedias, making it easily accessible for users, and better representation of analytics.

Until the end of September 2015, we plan to do the following:

  • Resolve blocking problems identified by the community, as initial preparation for eventually making Content Translation usable as a non-beta tool.
  • Continue the initiative to engage translators, with newer ways to connect with users and to help them return and translate more articles.
  • Provide support for parallel corpora from Content Translation; the translated content generated through the tool is an important asset for ongoing development of machine translation systems.
  • Begin an initial exploration of how Content Translation can extend support to mobile users, a key focus area for the Wikimedia Foundation.


Interactions and testing sessions

More articles are being published each day with Content Translation. Photo by Pau Giner, freely licensed under CC BY-SA 4.0.

In addition to our usual channels for communication, like the Content Translation talk page and Phabricator, we plan to host another online interaction session shortly. The last session was held in June 2015 simultaneously on Google Hangout and IRC. While we plan to follow a similar format this time, we are open to feedback on what can work best for our participants.

In addition, our UX interaction designer Pau Giner is hosting testing sessions over the next few weeks. Please sign up if you would like to participate in these sessions and provide input about future features of Content Translation.

A round-up of activities from the Language team is available in our monthly report. For Content Translation, we now have a weekly newsletter of the new features and bug fixes.

Runa Bhattacharjee
Language Engineering (Editing)
Wikimedia Foundation

by Runa Bhattacharjee at August 13, 2015 12:29 PM

Weekly OSM

weekly 259-260-261-262 – July


Campus with 3D Buildings [1]

About us

  • Since issue #262, Marc Gemis from Belgium has joined the German Wochennotiz team and adds news from Belgium and from the French community.
  • We are also happy to announce that Ruben Lopez Mendoza from Ayacucho, Peru has joined the Spanish weeklyOSM team. Bienvenido Ruben!


  • OSM may continue to use Bing imagery even after the takeover by Uber.
  • SK53 asks on Twitter how to tag this new electricity tower design.
  • Martijn van Exel announces a Maproulette challenge about missing railroad crossings in the USA.
  • Mapbox published the internal OSM mapping instructions that are used by their data team.
  • Yandex (a Russian search engine with a map portal) now offers a Street View service. The images (not the map and aerial photographs) may be used for mapping. (automatic translation)
  • OpenHistoricalMap builds maps on its own infrastructure using OSM tools. (via @GuerrillaCarto)
  • On GitHub it has been possible for some time to represent a GeoJSON file via an OSM Layer (example). Now a link to the MapBox “Improve this map” feature has been added to help resolve (via iD or OSM Notes) any map errors.
  • On the talk-fr list there has been a discussion about the border between France and Italy in the Mont Blanc area. This was followed by a discussion on the tagging list about how to map border disputes.
  • Marc Zoutendijk explains how you can find strangely tagged POIs with OpenPOIMap. To do this, click “User POIs” in the top right corner of OpenPOIMap and use a query, e.g. amenity=shop.
  • In a twelve-part series, Tlatet analysed the quality of POIs in OSM (specifically retail in England).
  • Andreas Weber and Dennis Nienhüser (FZI Karlsruhe) have created a system for use with a Segway that recognises traffic signs so that they can be uploaded to OSM (video).



OpenStreetMap Foundation

  • The draft minutes of the OSMF board meeting of 2015-06-15 are now available in the OSMF wiki.
  • On July 20 the first public OSMF board meeting was held. The minutes and an audio recording of the meeting will follow soon.

Humanitarian OSM


  • User ccalista explains in her blog how current OSM data can be loaded into OsmAnd via Overpass to test the most recent changes in OsmAnd.
  • Andy Allan proposes (in an issue of the OSM Carto project) to introduce a directive to decide what new features are to be included in the rendering of the main style on osm.org and which are not.
  • Adrien Pavie published OpenLevelUp!, an online map for viewing objects on specific floors of indoor-mapped buildings.
  • The Austrian “Der Standard” reported on the company Pentamap, a spin-off from the Geodetic Institute of the Technical University of Graz. The company works on off-road routing in the Alps for mountain rescuers and hunters, using aerial photography, digital terrain models and OSM data. (automatic translation)
  • [1] The University of Leicester has created a map of the campus based on OSM data and OSM Buildings. (via @GISPowered)
  • The GPSies website has been redesigned. See the blog post here.
  • Mateusz Konieczny presents a new default map style version and asks for feedback (see other diary entries for previous posts on the same topic).
  • On the British OSM mailing list, one topic is a suggestion for a rendering style specifically for the UK.
  • Omar Ureta visualised OSM-based tiles with Stamen-Watercolour style with terrain elevation data of the regional administration.


  • The Verkehrsmanagementzentrale (traffic management center) of Lower Saxony published a new website showing traffic jams and construction sites all over Germany. The map display on tablet and desktop uses OSM maps.


  • The Austrian Federal Office of Metrology and Surveying (Bundesamt für Eich- und Vermessungswesen, BEV Österreich) has published all address details with pinpoint accuracy. Thomas Ruprecht is in contact with the BEV to allow the use of these data and the administrative boundaries in OSM.
  • The Belgian Council of Ministers adopted a new open data strategy. Basically, all the records are to be published from 2020 under CC0 license.


  • As the main client of MapDB has withdrawn, Jan Kotek is looking for new sponsors for the project.
  • Michael Zangl presents the first release of his OpenGL MapView for JOSM (part of the Google Summer of Code).
  • Do you or your company want to sponsor a new QGIS-Feature?
  • Developer Dennis Nienhüser has released version 1.11.3 of Marble for Windows.
  • The Geometalab of the University of Rapperswil is working on an extract-tool for OSM data for various formats: GeoPackage, Shapefile, FileGeodatabase and Garmin (.img).
  • Version 1.1.1 of the iOS-Port of OsmAnd was released.
  • The OSM-based, open source geo search engine Photon now also supports “reverse geocoding”, i.e. returning the address for a given X/Y coordinate.
  • User Amaroussi asks on Twitter how to deal with the imminent end of Flash in respect of Potlatch. Luckily, in the future there will still be iD, Level0, Merkaartor, JOSM, Vespucci and, er, Potlatch!
  • NGA and DigitalGlobe have jointly published Hootenanny, a free and open project to facilitate the handling of large amounts of spatial data. (via un-spider)
  • MapBox published a first release of a minimalist C++11 protobuf decoder and encoder. (via @springmeyer)
  • Mapbox has created a node.js and browser Javascript client for Mapbox services (geolocation, routing, etc.). (via @tmcw)
  • The backend code of the OSRM project can now be compiled under Windows as a DLL. (via @tdihp)
  • Omniscale published a tool called Magnacarto, which can convert map styles from CartoCSS format into Mapnik XML or MapServer mapfile.
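As an aside, the reverse geocoding mentioned for Photon boils down to one HTTP query returning GeoJSON. The sketch below constructs such a request and extracts a name from a sample response; the endpoint and response shape are assumptions based on Photon's public API, not verified against this release, and the sample data is invented.

```python
from urllib.parse import urlencode

# Build a reverse-geocoding request URL (endpoint assumed, see note above).
def reverse_url(lat, lon):
    return "https://photon.komoot.de/reverse?" + urlencode({"lat": lat, "lon": lon})

# Photon answers with GeoJSON; a hypothetical response might be parsed like this:
sample = {"features": [{"properties": {"name": "Brandenburg Gate", "city": "Berlin"}}]}
props = sample["features"][0]["properties"]

print(reverse_url(52.5163, 13.3777))  # the query for a point near the Gate
print(props["name"], props["city"])
```

In a real client you would fetch the URL and read the first feature's properties; here the response is stubbed so the parsing step is visible.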

Did you know …

  • … about the 3D rendering from F4Map?

Other “geo” things

by Andy Townsend at August 13, 2015 10:29 AM

Gerard Meijssen

#Wikidata - #Quality, #probability and #set theory

The problem with any source is that it has errors. It cannot be helped. There is always a certain percentage that is wrong. When you take all the items of Wikidata that have statements, the type of process that added those statements provides an indication of the percentage of errors that were included.

I made thousands of mistakes. In a way I am entitled to have made those mistakes because I made over 2 million edits. Amir made even more edits with his bot; because of the process involved, the percentage of errors in his edits will be lower. When you look only at Wikidata and its items, you can be confident that these errors exist and about what percentage is likely, but there is no way to make an educated guess about what is right or wrong. The only way to improve the data is by sourcing one statement at a time, a process that will introduce its own errors. That is something we know from experience elsewhere.

To add value to Wikidata, we need both quality and quantity. Let us consider the use of external sources that are known to have been created with the best of intentions. Consider one type of information, for example the place of birth. It is highly likely that Wikidata and that external source have many items in common. Once they are defined as being about the same person, we can use the logic of set theory. We can establish the number of records where both have a value for the place of birth, the number of matching items, the number where one has a value and the other does not, and the number of items where there is a mismatch.
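This set-theoretic partition can be sketched in a few lines; the identifiers and place-of-birth values below are invented purely for illustration:

```python
# Hypothetical sample data: item identifier -> place of birth (None = no value).
wikidata = {"Q1": "Chennai", "Q2": "Paris", "Q3": None, "Q4": "Berlin"}
external = {"Q1": "Chennai", "Q2": "Lyon", "Q3": "Madrid", "Q5": "Rome"}

# Items defined as being about the same person in both sources.
common = wikidata.keys() & external.keys()

# Partition: both have a value -> match or mismatch; otherwise one-sided.
both = {q for q in common if wikidata[q] and external[q]}
matches = {q for q in both if wikidata[q] == external[q]}
mismatches = both - matches
one_sided = common - both  # one source has a value, the other does not

print(sorted(matches), sorted(mismatches), sorted(one_sided))
```

Running this on the sample yields one match (Q1), one mismatch (Q2) and one one-sided record (Q3); as the post argues, the mismatch set is where most errors are likely to hide.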

It is probable that most errors will be found where Wikidata and the source do not match. It is certain that even where the two match there will still be factual errors as both can be wrong.

Quality and confidence have much in common. Wikipedia has quality, but we know it has issues. Wikidata has quality, but we know it has issues. The easiest and most economical way to improve the quality of Wikidata is by comparing many sources and concentrating on the differences. It is easy and obvious, and when we ask someone to add a source to a statement, we are confident that the result matters. It matters for both Wikidata and the external source.

This approach is not available to Wikipedia. It cannot easily compare with other sources, and therefore there is no option but to source everything. Given that many statements find their origin in Wikipedia, new insights in Wikidata may prove a point and reveal a need to adapt articles.

Consequently, applying set theory and probability will enhance the quality of Wikidata. It will help drive fact checking in Wikipedia, and it is therefore the best approach to improving quality. Accepting new data from external sources and iterating this process of comparison will ensure that Wikidata becomes much more reliable: reliable because you can expect that the data is there, and reliable because you know that quality has been a priority in what we do.

by Gerard Meijssen (noreply@blogger.com) at August 13, 2015 06:39 AM

August 12, 2015

Wikimedia Foundation

Wikimedia Highlights, July 2015


Here are the highlights from the Wikimedia blog in July 2015.

Konkani Wikipedia goes live


Wikimedian Darshan Kandolkar shares his experience of contributing to Konkani Wikipedia. Video in Konkani. Video by Wikimedia India, freely licensed under CC-BY-SA 4.0.

Goan Konkani Wikipedia went live on June 18 after being in the Incubator for around nine years. The project went through many challenges, but the hard work of the editor community, mostly from the Indian state of Goa, paid off in bringing the project out of incubation.

I have a dream to start a project for the freedom fighters of Goa and involve a diverse set of people, from students to journalists and columnists. I also want to build partnership with educational institutions so we could engage with the students for a longer run and the existing Konkani community could mentor them.

Darshan Kandolkar.

“Becoming involved in making the changes you want to see”: Leigh Thelmadatter

Leigh Thelmadatter, photographed for the 2012 Wikimedia Foundation fundraising campaign. Photo by Karen Sayre for the Wikimedia Foundation, released under the CC BY-SA 3.0 license.

A university teacher and Wikipedia volunteer, Leigh Thelmadatter has helped write articles on topics ranging from Mexican food and drinks to biographies and churches. Today, she works with students to improve Wikipedia’s coverage of Mexico and its culture, and travels across the country in search of information worth sharing.

I think that Wikimedia and similar movements offer at least the idea that we can make more information available more easily to more people.

Leigh Thelmadatter.

Wikidata, coming soon to a menu near you

The logotype for the Wikidata Menu Challenge. Logo by Offnfopt, freely licensed under CC0 1.0

At Wikimedia Sverige (Sweden), we like food, traveling, and open data. So we started thinking: What could we do to make life a bit easier for the frequent flyer? With help from Wikidata, members decided to host a menu challenge for restaurants around the world to expand their menus with key information. Participants used Wikidata to provide labels, translations, and images.

Wikidata is a collection of structured data that can be edited by computers and people alike. A main focus is of course Wikipedia, but the possibilities are unlimited, which was what we wanted to show with this project.

Wikimedians urge the EU to protect freedom of panorama

Images of the London Eye can be shared online under freedom of panorama rights. Photo by Kham Tran, freely licensed under CC BY-SA 3.0.

Recently, the European Parliament considered amended recommendations that would place restrictions on freedom of panorama across all European Union member states. The Wikimedia community mobilized in response.

Update (July 13): On July 9, the European Parliament voted on the Reda Report. The paragraph addressing the Freedom of Panorama was ultimately deleted from the report. This means that for now, nothing has changed: countries that had Freedom of Panorama rights under their domestic laws still have them. Countries that lacked Freedom of Panorama rights under their domestic laws still do not have them.

ACLU files amended complaint on behalf of the Wikimedia Foundation

Blind Justice stands with scales aloft over the Albert V. Bryan United States Courthouse in Alexandria, Virginia. Photo by Tim Evanson, freely licensed under CC BY-SA 2.0.

On March 10th, the Wikimedia Foundation joined a lawsuit against the NSA over its upstream surveillance program. A hearing is scheduled for late September on the government’s recently filed motion to dismiss the lawsuit.

Get the latest Wikipedia updates easily with IFTTT

If This Then That, or IFTTT, introduces new tools to make connecting with Wikipedia’s public data simpler than ever. Logo by IFTTT, public domain.

Wikipedia’s new Channel on IFTTT makes it easier than ever to share free knowledge. Recipes include:

  • Picture of the day: An alert with the Wikimedia Commons picture of the day
  • Article of the day: An interesting article from Wikipedia, chosen daily from among Wikipedia’s best articles
  • Word of the day: The definition of the Wiktionary word of the day
  • New edits to a Wikipedia article: New edits on any Wikipedia page (similar to your watchlist on Wikipedia)
  • New edits from a specific user: New contributions from a specific Wikipedia user
  • New edit with a hashtag in the edit summary: Watch for a hashtag in the edit summary (try a hashtag for your next #editathon!)
  • Article updated in a category: New edits to any Wikipedia page in a category
  • Articles added to a category: Each time an article is added to a category

Andrew Sherman
Digital Communications Intern
Wikimedia Foundation

Photo Montage Credits: “Darshan Kandolkar talks about Konkani Wikipedia.webm” by Wikimedia India, freely licensed under CC-BY-SA 4.0; “File:Leigh Thelmadatter-7697.jpg” by Karen Sayre for the Wikimedia Foundation, released under the CC BY-SA 3.0 license; “London-Eye-2009.JPG” by Kham Tran, freely licensed under CC BY-SA 3.0; “IFTTT Logo.svg” by IFTTT, public domain; “File:Albert_V_Bryan_Federal_District_Courthouse_-_Alexandria_Va.jpg” by Tim Evanson, freely licensed under CC BY-SA 2.0; Collage by Andrew Sherman.

For versions in other languages, please check the wiki version of this report, or add your own translation there!

by Andrew Sherman at August 12, 2015 09:36 PM

Wiki Education Foundation

Hacking and collaboration at Wikimania

In July, I was in Mexico City for Wikimania, the global Wikimedia community’s annual conference. I was there to kickstart the adaptation of Wiki Ed’s Dashboard system for use beyond our programs, and to catch up with the latest developments in the Wikimedia technology ecosystem. I’ll give a quick overview of what I was up to, which includes some work to help groups adapt our tools for other projects, some interesting quality-assessment tools for articles, and some news about our plagiarism detection infrastructure.

The Wiki Education Foundation’s programs focus on the United States and Canada. However, the tools we’re building could be very useful to the broader Wikimedia community. They could be adapted for education programs in other countries and languages, as well as other outreach projects, such as Art+Feminism.

During the Wikimania Hackathon — two days of developers collaborating on whatever projects catch their fancy — I worked with two Wikimedia Foundation software engineers, Andrew Green and Adam Wight. We took the first steps toward making dashboards available for other projects. By the end of the hackathon, we had a new version up and running at outreachdashboard.wmflabs.org, which some English Wikipedia edit-a-thons and other outreach programs plan to try out soon.

The next steps for the general community version of the dashboard will be to see how it performs for English Wikipedia projects. We’ll fix any problems that turn up, and then move toward full internationalization. The dashboard interface is already being translated at translatewiki.net. For ideal use across Wikipedia languages, though, we need to update the dashboard’s core to track languages and handle data from multiple language versions of Wikipedia at once. (If you’re interested in hacking on it, and you know some Ruby on Rails, get in touch!)

Another, more serendipitous bit of exploration at Wikimania was based on the “revision scoring as a service” project. Wikimedia Foundation researcher Aaron Halfaker and several collaborators have been developing artificial intelligence techniques to evaluate Wikipedia revisions and diffs. Those AIs make predictions about whether changes will be reverted, and what rating an article would have on the “Wikipedia 1.0” quality scale. I applied the rating predictions to articles in some of Wiki Ed’s courses from recent terms. I converted the predictions into a numerical score, and charted the change in quality from the beginning of the course to the end. We came up with some really interesting results. Here are the revision scoring model’s estimates for the quality of “Antitheatrical prejudice”, a student editor’s project from a fall 2014 theater history course:

graph of antitheatrical prejudice article

The chart tells an interesting story. In the first revisions, the student created a short stub — just a single sentence. Then, in one big edit, it was expanded to a full article. The next significant change in the quality estimate is a drop. What happened there? As it turns out, the images in that initial version were deleted, resulting in a big hit to estimated quality. In subsequent revisions, the student editor did a lot of copyediting, which didn’t affect the estimate very much. Eventually, the student editor found some appropriately licensed images to illustrate the article (and bumped up the quality estimate to a new high).

This kind of data could help surface major events in the evolution of an article, and to identify edits worth investigating more closely. We shouldn’t think of it in terms of article quality per se, as it doesn’t understand the meaning, or even the grammar, of the text. But as a measure of structural completeness, it seems to do a pretty good job.
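One simple way to collapse the model's per-class probabilities into a single number, sketched here as an assumption rather than Wiki Ed's exact method, is to weight the predicted quality classes ordinally and take the expected value:

```python
# Ordinal weights for the "Wikipedia 1.0" quality classes (an assumed scale).
WEIGHTS = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "GA": 4, "FA": 5}

def quality_score(probabilities):
    """Expected quality on a 0-5 scale from per-class probabilities."""
    return sum(WEIGHTS[cls] * p for cls, p in probabilities.items())

# Hypothetical prediction for one revision (probabilities sum to 1):
prediction = {"Stub": 0.05, "Start": 0.60, "C": 0.25, "B": 0.07, "GA": 0.02, "FA": 0.01}
print(round(quality_score(prediction), 2))  # → 1.44
```

Charting this score across a course's revisions gives the kind of quality-over-time curve shown above; a big image deletion or expansion shifts the probability mass between classes and thus moves the expected value.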

Finally, I had a chance to discuss the future of our plagiarism-checking infrastructure with James Heilman and Eran Rosenthal, who have been working on the EranBot copy-and-paste detection system. The next version will involve a queryable database of checked revisions, which could be used not only by EranBot, but by other services such as the Wiki Ed Dashboard. I’ll be tackling that project in the coming months, and we’re hoping to build on the exciting work that Eran has started.

On all these fronts, I’m really excited about the collaboration that’s coming!

Sage Ross
Product Manager, Digital Services

Photo of Wikimania Hackathon by Ralf Roletschek / fahrradmonteur.de [GFDL 1.2], via Wikimedia Commons.

by Sage Ross at August 12, 2015 08:02 PM

August 11, 2015

Wikimedia Foundation

News on Wikipedia: Google reshuffles into Alphabet, Japan remembers, and more


Here are some of the global news stories covered on Wikipedia this week:

Google to reorganise

Google’s announcement came as a shock to the tech industry. Image by Jijithecat, freely licensed under CC-BY-SA 4.0.

California-based Google announced plans on August 10 to reorganise its operations under a new umbrella organization, Alphabet Inc. Google co-founders Larry Page and Sergey Brin will serve as CEO and President, respectively, of Alphabet, while Sundar Pichai is set to become CEO of Google. The new organization’s subsidiaries will include Google Inc., Calico, Google Ventures, Google Capital, Google X and Nest Labs. Alphabet will continue to trade using Google’s ticker symbols. The announcement was greeted with surprise in the tech press.

Learn more in these related Wikipedia articles: Alphabet Inc., Google

Japan remembers

The Bell of Peace in Peace Memorial Park, Hiroshima, serves as a reminder of the bombing. Image by Fg2, in the public domain.

Japan this week commemorated the atomic bombings of the cities of Hiroshima and Nagasaki seventy years ago, on August 6 and August 9. The bombings, arguably the most infamous incidents of the Pacific War, took place after Japan refused Allied demands for unconditional surrender, two months after the surrender of Nazi Germany. They marked the only two occasions on which nuclear weapons have been used against human targets, and killed at least 190,000 people in the two cities. There is still debate over whether the bombings were ethically justified.

Learn more in these related Wikipedia articles: Atomic bombings of Hiroshima and Nagasaki, Debate over the atomic bombings of Hiroshima and Nagasaki

New Suez Canal opens

The Suez Canal has been a vital naval passageway for almost 150 years; here, a US aircraft carrier navigates the canal in 1981. Image by W. M. Welch of the U.S. Navy, in the public domain.

An expansion to the Suez Canal, an artificial sea-level waterway in Egypt, was opened on August 6. The extension, dubbed the New Suez Canal, adds a new “lane” for ships to pass each other in opposite directions, something not previously possible due to the narrow nature of the canal. The development should cut ships’ waiting times from 11 hours to 3 hours, and almost double the daily capacity of the canal. Six new tunnels are planned for underneath the canal to connect the Sinai peninsula to the rest of Egypt. The budget of 60 billion Egyptian pounds ($8.4 billion) was funded largely by citizens’ public subscription.

Learn more in the related Wikipedia article: New Suez Canal

Typhoon Soudelor makes landfall

Typhoon Soudelor ripped trees from the ground as it passed over Taiwan. Image by FramaKa, freely licensed under CC-BY-SA 4.0.

Typhoon Soudelor made landfall over Hualien, Taiwan on August 7 as a Category 3-equivalent typhoon. Around 4.85 million households lost power at the storm’s peak, and eight people are now known to have died in Taiwan. The storm moved on to eastern China the next day, bringing the heaviest rains in over a century to parts of the country; 21 people are known to have died there, and several are still missing. Altogether, the typhoon caused some $1.38 billion worth of damage, and has since weakened to a tropical depression as it passes over China.

Learn more in the related Wikipedia article: Typhoon Soudelor (2015)

North Korea moves clocks back

North Korea argues the move is a step away from imperialism. Image by J.A. de Roo, freely licensed under CC-BY-SA 3.0.

North Korea announced on August 5 that the country will move its clocks back half an hour to “Pyongyang Time” at midnight on Saturday, August 15. The Korean peninsula originally used UTC+08:30, but upon annexation by the Empire of Japan in 1910, it moved to UTC+09:00 to match Japan Standard Time. North Korea says the move back by thirty minutes is a step away from the “imperialism” imposed upon the country by Japan. The intended date of the change, August 15, is known as Jogukhaebangui nal (“Fatherland Liberation Day”) in North Korea, celebrating independence from Japan.

Learn more in the related Wikipedia article: Pyongyang Time

Photo montage credits: “Googleplex-Patio-Aug-2014.JPG” by Jijithecat, freely licensed under CC-BY-SA 4.0; “Taipei after Typhoon Soudelor 2015 13.jpg” by FramaKa, freely licensed under CC-BY-SA 4.0; “The statues of Kim Il Sung and Kim Jong Il on Mansu Hill in Pyongyang (april 2012).jpg” by J.A. de Roo, freely licensed under CC-BY-SA 3.0; image by W. M. Welch of the U.S. Navy, in the public domain; “HiroshimaPeaceBell7129.jpg” by Fg2, in the public domain. Collage by JSutherland (WMF).

To see how other news events are covered on the English Wikipedia, check out the ‘In the news’ section on its main page.

Joe Sutherland
Communications Intern
Wikimedia Foundation

by Joe Sutherland at August 11, 2015 10:48 PM

Jeroen De Dauw

Wikibase DataModel Services

I’m happy to announce the immediate availability of a new Wikibase library: Wikibase DataModel Services (which I’ll refer to in this blog post as DMS).

Rationale behind the library

The main motivation for introducing this new library is to reduce technical debt and draw more solid architectural boundaries in the Wikibase code.

Near the beginning of the Wikibase project we had all our code in the Wikibase.git repository. This could be subdivided into three big parts: Wikibase Repository (MediaWiki extension), Wikibase Client (MediaWiki extension) and Wikibase Lib. The latter was introduced to hold code needed by both Wikibase Client and Wikibase Repository. While on the surface that was a reasonable idea, a lot of the things that could go wrong with it unfortunately did. No real boundaries were created in the code, resulting in a tightly coupled blob that even included global state and circular dependencies with Wikibase Client and Wikibase Repository. There was also no contract for what should or could go into Wikibase Lib. Things needed by only one of the MediaWiki extensions were added, on the grounds of them being potentially useful elsewhere, while similar code was left in one of the extensions.

About a year into the project we realized that things were going in the wrong direction. Even though there was some disagreement on the extent of the problem, a consensus to get rid of Wikibase Lib emerged over time. The first big step was the extraction of Wikibase DataModel, which resulted in the moved-out code improving in quality significantly more than what was left behind. This set the stage for the creation of additional components, such as Wikibase DataModel Serialization, which has replaced a chunk of Wikibase Lib code. Unfortunately, much of the code in Wikibase Lib does not form a whole as cohesive as the serialization component. This has led to such code not being moved out and, indeed, has resulted in many new classes being added to Wikibase Lib over time, all but negating the extraction work.

This is not that hard to understand when considering the dilemma faced by people introducing new code needed by both Wikibase Client and Wikibase Repository. Either it needs to go into Wikibase Lib, or into a component further down the dependency graph such as Wikibase DataModel. I’ve certainly added several classes to Wikibase DataModel because that seemed to be the best place to put them at the time, polluting the component with DataModel-related services that nevertheless are not needed for defining the data model itself. A third place where code inappropriately found its home is the MediaWiki extensions themselves. Code dealing with domain logic, or otherwise application-independent, is best kept devoid of framework binding.

All of this taken together suggested the creation of a new library sitting between Wikibase DataModel and the MediaWiki extensions: a library to collect the functionality for which no cohesive component can be created, or for which such creation is not justified. One concern comes up right away: won’t this new general library become a dumping ground and ball of mud like Wikibase Lib? To avoid any such fate, we carefully defined the requirements code must satisfy before it is allowed into the component. Such code must…

  • Use Wikibase DataModel
  • Not belong to a more specific component (such as the Serialization component)
  • Not introduce heavy dependencies into this component (database, framework, big libraries, etc.)
  • Not be presentation code

Current state: Wikibase DataModel Services 1.1

This library is of particular interest to third parties as it makes code that used to be bound to the Wikibase Client and Wikibase Repository MediaWiki extensions reusable. The following list contains the newly available classes and interfaces:

  • DataValue\ValuesFinder
  • Entity\EntityPrefetcher
  • Entity\EntityRedirectResolvingDecorator
  • Entity\NullEntityPrefetcher
  • EntityId\EntityIdFormatter
  • EntityId\EntityIdLabelFormatter
  • EntityId\EscapingEntityIdFormatter
  • EntityId\PlainEntityIdFormatter
  • EntityId\SuffixEntityIdParser
  • Lookup\EntityLookup
  • Lookup\EntityRedirectLookup
  • Lookup\EntityRetrievingDataTypeLookup
  • Lookup\EntityRetrievingTermLookup
  • Lookup\LabelDescriptionLookup
  • Lookup\LanguageLabelDescriptionLookup
  • Lookup\TermLookup
  • Statement\StatementGuidValidator
  • Term\PropertyLabelResolver
  • Term\TermBuffer

These have all been moved from Wikibase Lib. DMS also contains code that used to be in Wikibase DataModel and was moved out in version 4.0:

  • Entity diffing and patching functionality in Services\Diff
  • EntityIdParser and basic implementations in Services\EntityId
  • ItemLookup, PropertyLookup and PropertyDataTypeLookup interfaces
  • Statement GUID parser and generators in Services\Statement
  • ByPropertyIdGrouper

Into the future!

The 20 or so classes and interfaces we moved from Wikibase Lib are just the start. We’re taking an incremental approach to moving the code over, to avoid having to maintain two copies and synchronize changes from Wikibase Lib to their moved counterparts. New releases with additional functionality can thus be expected in the near future.

As with the other Wikibase libraries, contributions are very welcome and can be made without much setup work or the need to understand our entire codebase. You can find instructions on how to install the library and run its tests in its README file. Changes relevant to users of the library are always mentioned in the RELEASE-NOTES.

by Jeroen at August 11, 2015 03:48 PM

Gerard Meijssen

#Wikidata - #Pen #awards

The many chapters of PEN International confer many awards. Mr Mazin Darwish now has his 2014 award, and would it not be fun to have a query that shows all the people who have ever been awarded one of the many, many "PEN awards"?

First, all the chapters have to be part of PEN International; then all the awards have to be conferred by a PEN chapter; and finally all the people have to be recognised as honored with one of the PEN awards.

This is something that is of interest, it is rewarding and, why not.

It is much better than following the "instructions" on solving the "garbage" that is the honorary university degrees and doctorates. I am told to find sources for people who have an honorary doctorate or whatever and add sources that provide credence to such a statement. It may be a solution but it is a solution that does not scale.

To be honest, I cannot be bothered. When Wikidata in its infinite wisdom does not have a way to deal with contaminated data, it has a bigger problem: it makes me doubt all existing statements. All that is needed to cope with such issues is a way to flag data as "suspicious".

With known "no good" data, you invite people to participate in providing a solution. The proposed solution however is not my cup of tea; it is not what I do. I cannot be bothered.

by Gerard Meijssen (noreply@blogger.com) at August 11, 2015 10:49 AM

Wikimedia Foundation

My father’s railroad photographs now benefit the world, free of charge

Michael Barera’s father and grandfather were both prolific railroad photographers, and now their imagery is accessible to the world. Here, the Chessie Steam Special with ex-Reading Railroad #2101 pauses at Plymouth, Michigan (July 1977). Photo by Lawrence and David Barera, freely licensed under CC BY-SA 2.0.

Once when I was young, growing up in the 1990s, my father pulled his collection of railroad slides out from the basement, set up his projector, and shared a glimpse into American railway history with our family. I was too young to remember the slides distinctly, but I do remember being really impressed by them.

Many years later, a sequence of seemingly unrelated events would lead me back to these slides and a vision for digitizing them. In 2013, while I was the Wikipedian in Residence at the Gerald R. Ford Presidential Library, I met Edward Vielmetti for a conference panel on the relationship between wikis and libraries. Before the panel, he introduced me to ArborWiki, a LocalWiki for all things related to Ann Arbor, Michigan. Then, while I was attending an ArborWiki meetup in 2014, I met David Erdody, who runs an analog-to-digital media conversion service called A2Digital. After learning that he had the equipment and expertise necessary to digitize slides, I immediately thought back to my father’s collection and the possibility of digitizing it.

An Ann Arbor Railroad Steam Special in Ann Arbor, Michigan (circa 1950). Photo by Lawrence and David Barera, freely licensed under CC BY-SA 2.0.

The slides themselves were taken both by my father (David) and his father (my grandfather, Lawrence). Most were created in the Midwestern United States, especially Michigan, Indiana, and Illinois, chiefly during the 1960s and 1970s (although one photograph of an Ann Arbor Railroad Steam Special dates back to circa 1950). Their featured subjects are largely passenger trains, and due to their dates of creation they document both the last decade of private passenger rail service in the United States and the early years of Amtrak.

According to my father, the majority of the photographs were taken by my grandfather, who was an avid amateur photographer; however, both my father and my grandfather would often go railfanning together, making it impossible to discern who took each individual photograph in most cases. For this reason, all of the digitized photographs credit “Lawrence and David Barera” as the photographer. However, because my father is my grandfather’s legal heir, he controlled all of the copyrights to the entire collection, including for those photographs taken by his father.

I eventually decided to have the slides digitized as a Father’s Day gift for my Dad, after which I agreed to terms with Erdody and handed off all of the slides to him. Initially, I thought about this project as simply a way to make the slides conveniently accessible to my father, and after receiving the digital surrogates from Erdody I began uploading them to Flickr for this purpose in May 2014. While doing this, I realized that there was tremendous potential for the further sharing of these digitized photographs, so I asked my father if he would be willing to release them under a Creative Commons license so I could also upload them to Wikimedia Commons. He graciously agreed to this proposal and released them freely under the CC BY-SA 2.0 license, which is conveniently supported by Flickr.

Chicago South Shore and South Bend interurban #102 street running in South Bend, Indiana (August 1962). Photo by Lawrence and David Barera, freely licensed under CC BY-SA 2.0.

My father’s motivation for freely licensing these images was rooted in the fact that his slides had been underused prior to their digitization; in his own words, they had been “tucked away with other family artifacts” and only ever brought out of storage “every dozen years or so”. Further explaining his rationale, he noted that “I was proud of the quality of most of the photos, and thought there was no better way to honor the work of my father than to make his photos available for public use.”

In less than a year since they were uploaded, my father’s donation of 146 original images (now 151 total files, including retouched derivatives) to Wikimedia Commons has certainly benefited the Wikimedia community, as over 10% have already been added to Wikipedia articles (chiefly but not exclusively on English Wikipedia). Interestingly but not surprisingly, my father’s decision to freely license these images has also benefited him directly in the form of both subject identification and color correction, largely thanks to Wikimedians Mackensen and MagentaGreen, respectively. By voluntarily releasing his collection of railroad slides into the commons, my father has benefited from the volunteered efforts of other users while also enriching the content of both Wikimedia Commons and Wikipedia.

An Amtrak RTG Turboliner at Ann Arbor, Michigan (May 1975). Photo by Lawrence and David Barera, freely licensed under CC BY-SA 2.0.

In light of my experience with this digitization project, I believe that motivations for freely licensing older analog personal photographs are very similar to those for contemporary digital photographs, including the motivations that catalyzed my own personal photographic contributions to Wikimedia Commons back in the mid to late 2000s. The economics of their creation appear to be essentially the same, necessitating only a camera and the desire and ability to take photographs, often as a hobby; I believe that this makes the amateur analog photographer’s decision to freely release his or her images very similar to the equivalent decision made by contemporary amateur digital photographers.

The major challenge, however, is the cost and equipment required to digitize these images before they can be uploaded or freely licensed. While the cost is not insignificant, it was not prohibitive in my case—$0.50 per slide, which made it a feasible and affordable gift idea.

From my personal experience I would certainly recommend A2Digital, although according to Erdody it is a “strictly local service”; while he told me that he would be willing to “receive materials by mail from anywhere”, he also described the idea of mailing slides or similar analog materials back and forth for digitization as “very risky” (emphasis in the original). As a protective measure, he recommends that his customers deliver their materials to him by hand, which is precisely what I did. While this worked perfectly well for me as an Ann Arbor resident, it simply will not suffice for the rest of the world.

The Santa Fe‘s Grand Canyon Limited at Joliet, Illinois (August 1963). Photo by Lawrence and David Barera, freely licensed under CC BY-SA 2.0.

Due to the fragility of the medium in question, successfully digitizing slides nonetheless requires, as Erdody terms it, “a grassroots solution”. Asking your local library or historical society about how they digitize slides or negatives is probably the best place to start. Although not terribly common, according to Erdody, some libraries do provide lists of their digitization vendors; an example is the state-run Library of Michigan in Lansing, which maintains this webpage on the subject.

Perhaps the easiest way to locate such a service, however, is to simply search the Internet for “slide digitization” and the name of your city, town, or the nearest metropolitan area. However you find a slide digitizer, I’d highly recommend that you explore the possibility of digitizing any slides you may have of potential interest to Wikipedia and other Wikimedia projects.

In terms of the final results in my case, I think that my father said it best: “I know my Dad would be pleased and proud to know that his work was finally being enjoyed and appreciated by railfans (and others) all over the world.”

Michael Barera
English Wikipedia editor

This blog post was originally published in the Signpost, a news journal about the English Wikipedia and the Wikimedia community. It was lightly edited and several images removed for publication on the Wikimedia Blog.

by Michael Barera at August 11, 2015 08:51 AM

Gerard Meijssen

#Wikidata - #Free Mazin Darwish

It is satisfying to learn that Mr Darwish has been freed. He had been jailed since February 2012, and the BBC reports that he was freed. It mentions that he is the director of the Syrian Centre for Media and Freedom of Expression (SCM) and has received many international awards.

Wikidata already knew about a few of those awards; finding more was a matter of reading the three Wikipedia articles. It is just a matter of doing the research. One of the awards Mr Darwish received was the PEN Pinter Prize in 2014. However, the Wikipedia article calls it the "Pinter International Writer of Courage Award". This award is not listed on the "List of PEN awards".

There is a reason to celebrate. Mr Darwish is free. It is satisfying to see that a lot of information is already there. Working on the data that exists on Mr Darwish connects him with more people sharing similar connections.

Every day there is someone who is worthy of attention. I can do this, you can do this. It is how Wikidata gains relevance. Relevance because it is information available for use in any language including Arabic.

by Gerard Meijssen (noreply@blogger.com) at August 11, 2015 08:26 AM

This month in GLAM

This Month in GLAM: July 2015

by Admin at August 11, 2015 12:39 AM

August 10, 2015


Outreach – What is it good for?

A few months ago I started transferring images of species from Flickr to Wikimedia Commons, provided they had an acceptable license. However rewarding this might be for the grand scheme of free knowledge and information, it feels terrible to see images which have more restrictive licenses.

I took it upon myself to contact a few of these Flickr photographers and asked them nicely if they would agree to change their licenses to a more open and free one (of course I used the ‘can be used on Wikipedia‘ angle). One of them actually responded and agreed to change the licenses on all of their ~11,000 images of species.

These images can now be found in Category:Photographs by Bernard Dupont.

This only goes to show that outreach is one of the best ways to help the free knowledge movement. People want to help out and have their material shared and used; all it takes is a push in the right direction.

by Jonatan Svensson Glad (Josve05a; @JonatanGlad) at August 10, 2015 08:09 PM

Wiki Education Foundation

Debut of our new tools in-person at ASPB conference

Wiki Ed has been eager to show off the course design and monitoring tools that we launched last month. We had our first chance to do so in late July, when Outreach Manager Samantha Erickson and I attended the American Society for Plant Biologists (ASPB)’s annual conference in Minneapolis.

ASPB is recommending Wikipedia assignments to its members, as it wishes to expand the quality, depth, and breadth of content and experiential learning opportunities on Wikipedia about plant science. ASPB Education Coordinator Katie Engen believes we can increase our impact on Wikipedia content by bringing more instructors who are members of ASPB into Wiki Ed’s programs. During the conference, she helpfully introduced us to members whose courses she thinks will be a good fit for our Classroom Program.

We met with about 75 instructors at ASPB’s education booth, many of whom tried out our course creation tools and dashboards on a laptop. People were excited, and many explicitly mentioned an interest in getting students to contribute to public scholarship, especially when they realized the dashboard highlights page views of the articles students have created or expanded. Seeing our tools in action helped instructors see how easily they can set up a Wikipedia assignment and track student work. Watching people react to these tools has convinced me that they’re going to make an enormous positive impact on our courses.

This was a conference about plant science, so we highlighted the upcoming Wikipedia Year of Science initiative. Instructors told us that Wikipedia’s coverage of plants is fairly good, but it can use improvement from a communications perspective—particularly in making existing articles more accessible to non-scientists. Students completing a Wikipedia assignment follow key practices of a communication-intensive assignment by identifying their audience, writing drafts, reviewing their peers’ work, and revising their contributions. We think this is a great way for students to improve Wikipedia’s coverage of plant science during the Year of Science, and we look forward to collaborating with ASPB to improve Wikipedia.

Jami Mathewson
Educational Partnerships Manager

by Jami Mathewson at August 10, 2015 05:41 PM

Content Translation Update

Content Translation Now Creates Articles With Cleaner Wiki Syntax

One of the frequent complaints about Content Translation is that it creates articles with dirty wiki syntax, which experienced Wikipedians need to clean up.

We are glad to report that we have just deployed a change that makes the wiki syntax considerably cleaner by using the scrubWikitext technology. It is already used in VisualEditor for cleaning up wiki syntax. It is being actively developed and is now shared with Content Translation, so Content Translation users will benefit from all the future improvements made to VisualEditor in this regard.

by aharoni at August 10, 2015 03:17 PM

Wikimedia Foundation

Sharing a million photographs

One millionth image—the head of a Dodo. Illustration from “Atlas de Zoologie” (1844) by Paul Gervais. Originally scanned by the Natural History Museum, London; public domain.

Over the past three years, my volunteer time has been devoted to releasing free cultural and historical imagery on Wikimedia Commons. My part-time hobby—relying on a cheap netbook and an old but trusty Mac mini home desktop—has now reached 1,000,000 diverse and high-quality images for public reuse, carrying with it a massive long-term educational impact. The milestone makes this a good moment to highlight a handful of interesting projects, give an insight into the experience of being a Wikimedia Commons batch uploader, and point to the methods used, to help anyone interested in having a go.

Watercolour painting, India, 1825. Image courtesy of the Los Angeles County Museum of Art, public domain.

Los Angeles County Museum of Art

The LACMA upload of 25,000 high-resolution photographs of museum artwork started in July 2013, and 500 volunteers have since contributed to the categorization and reuse of the images. The upload relied on a custom Python program to take information from the LACMA website and create the image page text; some catalogue entries had several useful photographs of the same museum object. Most of the programming time was spent debugging how to get the Japanese and Chinese characters used in the museum’s descriptions to display correctly on Commons. Anyone using Python to take data from international websites is going to face the challenge of changing formats between different web standards and languages.
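A minimal sketch of the page-generation step such a program performs. The {{Information}} template is the real Commons convention for file-description pages, but the catalogue fields and example values below are invented for illustration; keeping everything as Unicode text is what lets CJK characters in museum descriptions survive intact.

```python
# -*- coding: utf-8 -*-
# Hedged sketch: build the wikitext of a Commons file-description page
# from scraped catalogue fields (field values here are invented examples).

def make_description_page(description, date, source, author):
    """Assemble an {{Information}} page, keeping all fields as Unicode
    so Japanese/Chinese characters display correctly on Commons."""
    lines = [
        "{{Information",
        "|description={{en|1=%s}}" % description,
        "|date=%s" % date,
        "|source=%s" % source,
        "|author=%s" % author,
        "|permission=",
        "|other versions=",
        "}}",
    ]
    return "\n".join(lines)

# Invented example record, mixing scripts the way museum catalogues do:
page = make_description_page(
    description="Hanging scroll, 掛軸 (example description)",
    date="circa 1825",
    source="http://collections.example.org/item/1234",
    author="Unknown",
)
```

The resulting string would then be passed to whichever upload tool or API client performs the actual file upload.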

As this was my first serious attempt to use Python scripts to release a large number of images, it took about three months of experimenting and testing before I felt safe enough to do a final run; I was helped by the “beta” version of Commons, which is a safe “non-production” space where your uploads and changes will not harm other projects.

A current experiment reusing these uploaded images attempts to test the public impact of posting the collection to this Flickr group (using Flickr’s free programming interface). The experiment checks which images are most popular or reused compared to those hosted on Wikimedia Commons, and whether this form of sharing results in viewers following the links back to Commons. The hope is to demonstrate the value of co-releasing Wikimedia Commons media, with good-quality metadata, on Flickr and/or other free channels hosting images, audio and video.

A kiss celebrating “marriage equality decision day”, 26 June 2015. Photo by Elvert Barnes, freely licensed under CC BY-SA 2.0.

LGBT Free Media Collective

The idea of the LGBT Free Media Collective was to encourage more uploads of LGBT-related free media of historic and cultural interest, as LGBT culture is under-represented on Wikimedia Commons; there are few archives of photographs with expired copyright that are relevant. The volunteer network and its IRC channel, which started in this 2012 event, was a precursor to formalizing the Wikimedia LGBT+ official user group and supporting the series of highly successful Wiki Loves Pride events around the world—an annual event that continues today.

Uploads included several thousand photographs from a large number of Flickr accounts, ensuring excellent global coverage of the LGBT Pride movement’s impact. Methods ranged from the simple Flickr2commons tool to custom uploads relying on the Flickr API.
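A custom upload along these lines typically starts by asking the Flickr API for a user’s freely licensed photos. The sketch below assumes Flickr’s standard REST endpoint, its flickr.photos.search method, and its numeric licence IDs (4 = CC BY 2.0, 5 = CC BY-SA 2.0); the API key and user ID are placeholders you would supply yourself.

```python
# Hedged sketch: build a Flickr API search restricted to Commons-compatible
# licences, and filter results defensively on the licence field.

FLICKR_REST = "https://api.flickr.com/services/rest/"

# Flickr licence IDs considered Commons-compatible here
# (4 = CC BY 2.0, 5 = CC BY-SA 2.0).
FREE_LICENSES = ("4", "5")

def build_search_params(api_key, user_id, page=1):
    """Parameters for a flickr.photos.search call limited to free licences."""
    return {
        "method": "flickr.photos.search",
        "api_key": api_key,        # placeholder: your own Flickr API key
        "user_id": user_id,        # placeholder: the account being transferred
        "license": ",".join(FREE_LICENSES),
        "extras": "license,url_o,description",
        "per_page": "500",
        "page": str(page),
        "format": "json",
        "nojsoncallback": "1",
    }

def free_photos(photos):
    """Keep only results whose licence field is in the allowed set."""
    return [p for p in photos if p.get("license") in FREE_LICENSES]
```

The parameters would then be sent to FLICKR_REST with an HTTP GET; only the photos passing the licence filter are queued for transfer to Commons.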

If you have taken photographs of LGBT+ cultural events, this gives an easy way of sharing them and getting better public reuse than just keeping your best photographs on Flickr or Facebook.

Saint Mark, 1495. Photo by Wellcome Images, freely licensed under CC BY 4.0.

Medical history scans from Wellcome Images

At the end of 2014, 100,000 high-resolution images from the Wellcome Library—an institution devoted to the history of medicine—were released on Wikimedia Commons after several meetings and discussions I had with the library over a period of two years. The library changed from defaulting to a non-commercial copyright restriction to allowing full free use for all of its scanned historic medical images. The in-person meetings had another benefit as well: when it came time to mass upload the images, they were kind enough to provide me with a hard disk of over 300 GB of files, avoiding file-by-file manual downloads (and their website’s bot-resisting formatting). For an upload of this size, it is possible for the Wikimedia Foundation to upload files directly; in this particular case, a disk was lost in the mail, so to avoid any other mishaps I used my home broadband connection and netbook. The files took around two months to upload.

You can read more about this project in my blog post from last January. More recently, Wikimedia UK gave the Wellcome Library a “Wiki partnership of 2015” award, partially in recognition of the importance of this project.

Waterless toilets, Valley View University, Ghana. Photo by Wolfgang Berger, freely licensed under CC BY 2.0.

Sustainable Sanitation Alliance (SuSanA)

This upload is still ongoing: it runs a couple of times a year, on request, as more photographs are published on SuSanA’s Flickr account. Over 10,000 photographs have been released on Wikimedia Commons, with over 200 volunteers helping to categorize them. This partnership is a good example of how Wikimedia volunteers can work, at zero cost, to the benefit of unrelated organizations, NGOs (non-governmental organizations), charities and others, increase the educational value of Wikimedia Commons, and help illustrate health-related Wikipedia articles. SuSanA is a loose network, or alliance, of organizations active in the field of sustainable sanitation. The photographs show, for example, how to build low-cost, hygienic and sustainable toilet facilities in developing countries, ensuring that some of the poorest people in the world are safer from disease, have access to unpolluted water, and can recycle their excreta to create more fertile soil for farming.

You can find the source code, free to reuse, on GitHub.

Golden Gate Bridge, 1984. Photo by Library of Congress, public domain.

Historic American Buildings Survey

The largest single upload project happened when I was exploring different high-quality photograph collections in the Library of Congress archives.

The HABS archive is maintained by the U.S. National Park Service: a continuous set of survey records and photographs, spanning over a hundred years, for buildings and sites of historic interest in America. 300,000 photographs were uploaded, along with their map coordinates, with significant testing going into ensuring suitable categories were added for the different sites and that most of the information was carried over from the Library of Congress catalogues. Fortunately there is a consistent system of site numbering (the National Register of Historic Places), and this reduces confusion over how best to name or structure categories.

The GLAMwiki Toolset was newly available to perform the huge number of image file uploads, so this became a flagship example of how large uploads could use the tool. Most of the “real work” is structuring the metadata that creates the image text pages, but not having to pass all the images through my home broadband is a great improvement!
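That metadata-structuring step can be sketched roughly as follows. The record fields, the example survey number and the category naming scheme are all invented for illustration; they are not the actual HABS or Commons conventions, only a picture of how grouping records by a consistent site number lets every photograph of a site share one category.

```python
# Hedged sketch: group catalogue records by their survey number so each
# historic site gets a single category applied to all of its files.

from collections import defaultdict

def group_by_site(records):
    """records: dicts with invented 'survey_number', 'site_name'
    and 'filename' keys, one per photograph."""
    sites = defaultdict(list)
    for rec in records:
        sites[(rec["survey_number"], rec["site_name"])].append(rec["filename"])
    return sites

def category_for_site(survey_number, site_name):
    # Hypothetical naming scheme for illustration only.
    return "Category:%s (survey %s)" % (site_name, survey_number)

# Two invented records for the same site end up under one category:
records = [
    {"survey_number": "CA-31", "site_name": "Golden Gate Bridge",
     "filename": "ggb_1984_1.tif"},
    {"survey_number": "CA-31", "site_name": "Golden Gate Bridge",
     "filename": "ggb_1984_2.tif"},
]
sites = group_by_site(records)
```

Each grouped site then contributes one category line to the generated image text pages, which is what keeps archive shots of the same building discoverable together.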

The images have been useful and popular, especially due to the cross-over with Wiki Loves Monuments, so that new photographs can be compared with archive shots from decades earlier. An amazing 1,400 Commons volunteers have supported the project with categorization and improvements.


Getting to a million educational images—4% of all the files on Wikimedia Commons—has been a personal marathon. I have been busy creating new tools, learning how to write in Python, navigating the Wikimedia Commons API and improving project guidelines. Still, this is a hugely rewarding hobby: you have the chance to become part of an open knowledge community, the experience of working with major educational and cultural institutions, and the satisfaction of seeing your volunteer time directly improve educational resources world-wide. For these selfish reasons, I hope this is just my first million!

If these case studies have whetted your appetite for contributing to Wikimedia Commons, there is a set of helpful links here. When you are ready to try uploading larger collections of files, please first read the guide to batch uploading to avoid frequent pitfalls!

Ashley Van Haeften
Wikimedia Commons volunteer

Editor’s note: this post was updated after publication due to community comments.

by Ashley Van Haeften (Fæ) at August 10, 2015 08:52 AM

Sumana Harihareswara

How To Improve Bus Factor In Your Open Source Project

Someone in one of my communities was wondering whether we ought to build a new automated tool to give little tasks to newcomers and thus help them turn into future maintainers. I have edited my replies to him into the How To Build Bus Factor For Your Open Source Project explanation below.

In my experience (I was an open source community manager for several years and am deeply embedded in the community of people who do open source outreach), getting people into the funnel for your project as first-time contributors is a reasonably well-solved problem, i.e., we know what works: showing up at OpenHatch events, making sure the bugs in the bug tracker are well-specified, setting up a "good for first-timers" task tag and/or webpage and keeping it updated, personally inviting people who have reported bugs to help you solve them, etc. If you can invest several months of one-on-one or two-on-one mentorship time, participate in Google Summer of Code and/or Outreachy internship programs. If you want to start with something that's quantitative and gamified, consider using Google Code-In as a scaffold to help you develop the rest of these practices.

You need to quickly thank and give useful feedback to people who are already contributing, even if that feedback will include criticism. A fast first review is key, and here's a study that backs that up. Slide 8: "Most significant barrier to engaging in onramping others is unclear communications and unfriendly community. Access to the right tools has some effect." Slide 26:

"Contributors who received code reviews within 48 hours on their first bug have an exceptionally high rate of returning and contributing.
Contributors who wait longer than 7 days for code review on their first bug have virtually zero percent likelihood of returning.
Showing a contributor the next bug they can work on dramatically improves the odds of contributing."
(And "Github, transparency, and the OTW Archive project" discusses how bad-to-nonexistent code review and bad release management led to a volunteer dropping out of a different open source project.)

In my opinion, building bus factor for your project (growing new maintainers for the future) is also a solved problem, in that we know what works. You show up. You go to the unfashionable parts of our world where the cognitive surplus is -- community colleges, second- and third-tier four-year colleges, second- and third-tier tech hubs, boring enterprise companies. You review code and bug reports quickly, you think of every contributor (of any sort) as a potential co-maintainer, and you make friendly overtures to them and offer to mentor them. You follow OpenHatch's recommendations. You participate in Google Summer of Code and/or Outreachy internship programs.

Mentorship is a make-or-break step here. This is a key reason projects participate in internship programs like GSoC and Outreachy. For example, Angela Byron was a community college student who had never gotten involved in open source before, and then heard about GSoC. She thought "well it's an internship for students, it'll be okay if I make mistakes". That's how she got into Drupal. She's now a key Drupal maintainer.

Dreamwidth, an open source project, started with two maintainers. Early on, they made the hard decision to slow down feature development and instead pay off technical debt and teach newcomers. Now they are a thriving, multimaintainer project. "dreamwidth as vindication of a few cherished theories" is perhaps one of my favorite pieces on how Dreamwidth did what it did. Also see "Teaching People to Fish" and this conference report.

Maintainers must review code, and that means that if you want someone to turn into a maintainer in your project, you must help them learn the skill of code review and you must help them get confident about vetoing and merging code. In my experience, yes, a good automated test suite does help people get more confident about merging changes in. But maintainers also need to teach candidates what their standards ought to be, and encourage them (many contributors' first thought when someone says "would you want to comaintain this project with me?" is "what? me? no! I'm not good enough!"). Here's a rough example training.

If you want more detailed ways to think about useful approaches and statistics, I recommend Mel Chua's intro to education psychology for hackers and several relevant chapters in Making Software: What Really Works and Why We Believe It, from O'Reilly, edited by Greg Wilson & Andy Oram. You'll be able to use OpenHub (formerly Ohloh) for basic stats/metrics on your open source project, including numbers of recent contributors. And if you want more statistics for your own project or for FLOSS in aggregate, the open source metrics working group would also be a good place to chat about this, to get a better sense of what's out there (in terms of dashboards and stats) and what's needed. (Since then: also see this post by Dawn Foster.)

We know how to do this. Open source projects that do it, that are patient with the human factor, do better, in the long run.

August 10, 2015 02:52 AM

Tech News

Tech News issue #33, 2015 (August 10, 2015)

2015, week 33 (Monday 10 August 2015)

August 10, 2015 12:00 AM

August 09, 2015

Gerard Meijssen

#Wikidata - #Google on Morton Mintz

Contrary to what Google has to say about him, Mr Mintz was not born in 1946. That is impossible if the Wikipedia article is correct in stating that he "in his early years (1946–1958) reported for two St. Louis, Missouri newspapers, the Star-Times and the Globe-Democrat".

This is exactly one of those exceptions where sources matter. Typically Google will have it right; in my opinion it only becomes relevant to provide a source for the provided information when sources differ.

Mr Mintz was born on 26 January 1922; he wrote this himself in an essay on ahbj.org. I expect that Google will pick up on this information. It will be interesting to learn where they pick it up from. Will it be from Wikidata, from the source I provided above, or from this blog post?

I will be most happy when it is Wikidata. It raises the question of whether Google would be interested in reporting on differences it is aware of. For me such flags would improve our quality rapidly. Concentrating on differences will have a huge impact, and not only at Wikidata.

by Gerard Meijssen (noreply@blogger.com) at August 09, 2015 10:02 PM

#Wikidata - #corroboration and #sourcing

The problem with sources available on statements in Wikidata is that even when they are by definition the source of a statement, they are not what we understand a source to be. When I use tools to add statements to Wikidata based on lists and categories from a Wikipedia, that Wikipedia is my source. My tools do not help me add this fact, so I do not add Wikipedia as a source. Other tools do, and consequently some 20 million statements are sourced in this way.

When no source is available, a statement can be corroborated by finding identical information in an external source. The difference is important: the external source does not prove the veracity or origin of the fact, it merely indicates that the information does not differ. Corroboration matters because it improves the likelihood that a statement is correct. It adds a notion of quality.

Wikidata items often refer to many external sources. Only when a fact that is new to Wikidata is added as a statement from one of these sources IS the external source the source.

Some external sources provide information with the authority of a respected organisation. When the RKD Netherlands Institute for Art History indicates that Nora van de Vlier received the Willink van Collen Prize in 1954, I would consider it a source and happily accept it as a source for a new statement in Wikidata. When such information comes from DBpedia or Freebase, I would appreciate more references at a later date.

When it is not the original source, the only thing I care to know is that there is no discrepancy between the data provided and the data available at the external sources. When external data is pushed into Wikidata as a reference, it could easily be considered fraudulent. It is certainly clutter.

by Gerard Meijssen (noreply@blogger.com) at August 09, 2015 10:00 PM

Wikimedia India

Natural World Edit-a-thon Bangalore


Wikipedians in Bengaluru and neighbouring cities with an interest in the natural world conducted an edit-a-thon in the city to improve Wikipedia articles on the plants, animals, geology and other aspects of nature. Helping identify and obtain reliable sources, and learning to structure an article, paraphrase, cite and polish articles were also on the agenda. The workshop was aided by wildlife enthusiasts User:Chinmayisk and User:Shyamal. Noted naturalists Dr. Subramanya and Dr. Vijay Barve, Wikimedia Program Director Ravishankar and Wikipedian Dinesh Kumar interacted with the participants.

Excerpts of the interview from The Hindu article: http://www.thehindu.com/features/metroplus/society/on-how-wiki-editathon-enhances-articles-related-to-nature/article7507471.ece

Talking about the workshop, Chinmayi said, “The focus was on enhancing wiki articles related to the natural world. We discovered that most articles about flora and fauna are either faulty or have become outdated. In this workshop, we have brought together a host of naturalists and wiki enthusiasts. We hope that the workshop will help them learn more about using edit functions on Wikipedia and get a forum to share nature-related information and media. In this workshop, the participants are building new pages and updating older ones in real time. Most participants are regular users of Wikipedia, but are editing articles for the first time.”

Wikimedia project manager Ravishankar contends, “We are also looking at popularising the use of Wikipedia in regional languages. In the realm of wildlife, we find that many Indian animals do not have much information uploaded on wiki pages. This anomaly is caused since English is not the first language of most Indians. If we make Wikipedia in local languages more popular, more information will make it to the public domain. The response to this edit-a-thon has been very encouraging. We want to conduct such events across the country. Having a broad range of contributors and content is necessary to achieve our goal. These events bring more contributors and help add more content.”

Wildlife enthusiast and one of the organisers of the event, Shyamal says, “Most people who participate in such events are motivated to learn about the natural world. We lack a lot of information on nature, flora and fauna. Wiki needs authentic scholarly information and these workshops aim at putting this information out in the public domain. Most of the information is available only in expensive and rare books or science journals. This is an attempt to ensure that a lot of this information comes in the public domain.”

Subramanian Sankar, an avid birder, came from Chennai to attend the event. “It is very informative. I have learnt a lot. I think that these workshops will go a long way in improving the quality of information available on Wikipedia.”

The list of articles that were edited during the workshop

On the whole the workshop had a positive impact and was well appreciated by those who attended it.

by poornima at August 09, 2015 05:37 PM

Brion Vibber

Video decoding in the JavaScript platform: “ogv.js, how U work??”

We’ve started deploying my ogv.js JavaScript video/audio playback engine to Wikipedia and Wikimedia Commons for better media compatibility in Safari, Internet Explorer and the new Microsoft Edge browser.

“It’s an older codec, but it checks out. I was about to let them through.”

This first generation uses the Ogg Theora video codec, which we started using on Wikipedia “back in the day” before WebM and MP4/H.264 started fighting it out for dominance of HTML5 video. In fact, Ogg Theora/Vorbis were originally proposed as the baseline standard codecs for HTML5 video and audio elements, but Apple and Microsoft refused to implement them and the standard ended up dropping a baseline requirement altogether.

Ah, standards. There’s so many to choose from!

I’ve got preliminary support for WebM in ogv.js; it needs more work but the real blocker is performance. WebM’s VP8 and newer VP9 video codecs provide much better quality/compression ratios, but require more CPU horsepower to decode than Theora… On a fast MacBook Pro, Safari can play back ‘Llama Drama’ in 1080p Theora but only hits 480p in VP8.

Llama drama in Theora 1080p

That’s about a 5x performance gap in terms of how many pixels we can push… For now, the performance boost from using an older codec is worth it, as it gets older computers and 64-bit mobile devices into the game.

But it also means that to match quality, we have to double the bitrate — and thus bandwidth — of Theora output versus VP8 at the same resolution. So in the longer term, it’d be nice to get VP8 — or the newer VP9 which halves bitrate again — working well enough on ogv.js.

emscripten: making ur C into JS

ogv.js’s player logic is handwritten JavaScript, but the guts of the demuxer and decoders are cross-compiled from well-supported, battle-tested C libraries.

Emscripten is a magical tool developed at Mozilla to help port large C/C++ codebases like games to the web platform. In short, it runs your C/C++ code through the well-known clang compiler, but instead of producing native code it uses a custom LLVM backend that produces JavaScript code that can run in any modern browser or node.js.

Awesome town. But what are the limitations and pain points?

Integer math

Readers with suitably arcane knowledge may be aware that JavaScript has only one numeric type: 64-bit double-precision floating-point.

This is “convenient” for classic scripting in that you don’t have to worry about picking the right numeric type, but it has several horrible, horrible consequences:

  1. When you really wanted 32-bit integers, floating-point math is going to be much slower
  2. When you really wanted 64-bit integers, floating-point math is going to lose precision if your numbers are too big… so you have to emulate with 32-bit integers
  3. If you relied on the specific behavior of 32-bit integer multiplication, you may have to use a slow polyfill of Math.imul

Luckily, because of #1 above, JavaScript JIT compilers have gone to some trouble to optimize common integer math operations. That is, JavaScript engines do support integer types and integer math; you just don’t know for sure at the source level when you have an integer.

Did I say “luckily”? 😛

So this leads to one more ugly consequence:

  4. In order to force the JIT compiler to run integer math, emscripten output coerces types constantly — “(x|0)” to force to 32-bit int, or “+x” to force to 64-bit float.

This actually performs well once it’s through the JIT compiler, but it bloats the .js code that we have to ship to the browser.
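As an illustration, here is a hand-written sketch of that coercion style (illustrative only, not actual emscripten output; `addInt32` is a made-up name):

```javascript
// Sketch of asm.js-style integer coercions, illustrating the points above.

// 32-bit integer add: "|0" tells the JIT that operands and result are
// 32-bit ints, so it can skip floating-point math entirely.
function addInt32(x, y) {
  x = x | 0;
  y = y | 0;
  return (x + y) | 0; // wraps on overflow, like C
}

// 64-bit doubles lose integer precision above 2^53, which is why 64-bit
// C integers must be emulated with pairs of 32-bit halves.
var big = Math.pow(2, 53);
var precisionLost = (big + 1 === big); // true

// 32-bit multiplication overflows the 53-bit mantissa too; Math.imul
// (or a slow polyfill on old browsers) gives C-style wrapping semantics.
var wrapped = Math.imul(0x7fffffff, 2); // -2, as in C
```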

The heap is an island

Emscripten provides a C-like memory model by using Typed Arrays: a single ArrayBuffer provides a heap that can be read/written directly as various integer and floating point types.


Because all pointers are indexes into the heap, there’s no way for C code to reference data in an external ArrayBuffer or other structure. This is obviously an issue when your video codec needs to decode a data packet that’s been passed to it from JavaScript!

Currently I’m simply copying the input packets into emscripten’s heap in a wrapper function, then calling the decoder on the copy. This works, but the extra copy makes me sad. It’s also relatively slow in Internet Explorer, where the copy implementation using Uint8Array.set() seems to be pretty inefficient.

Getting data out can be done “zero-copy” if you’re careful, by creating a typed-array subview of the emscripten heap; this can be used for instance to upload a WebGL texture directly from the decoder. Neat!
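The copy-in / zero-copy-out pattern can be sketched with a plain typed array standing in for the emscripten heap (helper names here are hypothetical; real code uses emscripten's `Module._malloc` and `HEAPU8`):

```javascript
// Simulated model of the emscripten heap: one ArrayBuffer, viewed as bytes.
var heap = new ArrayBuffer(1024);
var HEAPU8 = new Uint8Array(heap);
var nextPtr = 16; // toy bump allocator standing in for Module._malloc
function malloc(size) { var p = nextPtr; nextPtr += size; return p; }

// Copy IN: a packet from the demuxer lives outside the heap, so it must
// be copied to a "pointer" the compiled C code can see.
function copyPacketIn(packet) {
  var ptr = malloc(packet.length);
  HEAPU8.set(packet, ptr); // the extra copy that makes me sad
  return ptr;
}

// Zero-copy OUT: a subarray view aliases heap bytes directly — usable,
// for instance, to upload a decoded frame plane as a WebGL texture.
function viewFrameOut(ptr, len) {
  return HEAPU8.subarray(ptr, ptr + len); // a view, not a copy
}

var packet = new Uint8Array([1, 2, 3, 4]);
var ptr = copyPacketIn(packet);
var view = viewFrameOut(ptr, 4);
view[0] = 42; // writes through the view are visible in the heap
```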

But, that trick doesn’t work when you need to pass data between worker threads.

Workers of the JavaScript world, unite!

Parallel computing is now: these days just about everything from your high-end desktop to your low-end smartphone has at least two CPU cores and can perform multiple tasks in parallel.

Unfortunately, despite half a century of computer science research and a good decade of marketplace factors, doing parallel programming well is still a Hard Problem.

Regular JavaScript provides direct access to only a single thread of execution, which keeps things simple but can be a performance bottleneck. Browser makers introduced Web Workers to fill this gap without introducing the full complexities of shared-memory multithreading…

Essentially, each Worker is its own little JavaScript universe: the main thread context can’t access data in a Worker directly, and the Worker can’t access data from the main context. Neither can one thread cause the other to block… So to communicate between threads, you have to send asynchronous messages.

This is actually a really nice model that reduces the number of ways you can shoot yourself in the foot with multithreading!
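The isolation model can be sketched with a toy in-process stand-in for Workers (hypothetical helpers; real `postMessage` delivers asynchronously via the structured clone algorithm, approximated here with a JSON round-trip):

```javascript
// Toy model of Worker isolation: contexts share no state; messages are
// cloned in transit, so mutating a sent object never affects the
// receiver's copy. (Real Workers also deliver asynchronously.)
function makeContext(name) {
  return {
    name: name,
    onmessage: null,
    postMessage: function (msg, target) {
      var copy = JSON.parse(JSON.stringify(msg)); // stand-in for structured clone
      if (target.onmessage) target.onmessage({ data: copy });
    }
  };
}

var main = makeContext("main");
var worker = makeContext("decoder");

var received = null;
worker.onmessage = function (e) { received = e.data; };

var packet = { type: "decode", bytes: [1, 2, 3] };
main.postMessage(packet, worker);

// Mutating the sender's object after sending cannot reach the copy the
// "worker" received — the two universes stay separate.
packet.bytes[0] = 99;
```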

But, it maps very poorly to C/C++ threads, where you start with shared memory and foot-shooting and try to build better abstractions on top of that.

So, we’re not yet able to make use of any multithreaded capabilities in the actual decoders. :(

But, we can run the decoders themselves in Worker threads, as long as they’re factored into separate emscripten subprograms. This keeps the main thread humming smoothly even when video decoding is a significant portion of wall-clock time, and can provide a little bit of actual parallelism by running video and audio decoding at the same time.


The Theora and VP8 decoders currently have no inherent multithreading available, but VP9 does, so that’s worth looking out for in the future…

Some browser makers are working on providing an “opt-in” shared-memory threading model through an extended ‘SharedArrayBuffer’ that emscripten can make use of, but this is not yet available in any of my target browsers (Safari, IE, Edge).

Waiting for SIMD

Modern CPUs provide SIMD instructions (“Single Instruction Multiple Data”) which can really optimize multimedia operations where you need to do the same thing a lot of times on parallel data.

Codec libraries like libtheora and libvpx use these optimized instructions explicitly in key performance hotspots when compiling to native code… but how do you deal with this when compiling via JavaScript?

There is ongoing work in emscripten and by at least some browser vendors to expose common SIMD operations to JavaScript; I should be able to write suitable patches to libtheora and libvpx to use the appropriate C intrinsics and see if this helps.

But, my main targets (Safari, IE, Edge) don’t support SIMD in JS yet so I haven’t started…

GPU Madness

The obvious next thing to ask is “Hey what about the GPU?” Modern computers come with amazing high-throughput parallel-processing graphics units, and it’s become quite the rage to GPU accelerate everything from graphics to spreadsheets.

The good news is that current versions of all main browsers support WebGL, and ogv.js uses it if available to accelerate drawing and YCbCr-RGB colorspace conversion.
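As a rough illustration, here is the per-pixel arithmetic such a shader performs, written as scalar JavaScript with textbook full-range BT.601 coefficients (a sketch only; the actual shader in ogv.js may use different range scaling):

```javascript
// Scalar sketch of full-range BT.601 YCbCr -> RGB conversion — the same
// arithmetic a WebGL fragment shader runs per pixel on the GPU.
function clamp255(v) { return Math.max(0, Math.min(255, Math.round(v))); }

function yCbCrToRgb(y, cb, cr) {
  return [
    clamp255(y + 1.402 * (cr - 128)),                            // R
    clamp255(y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)), // G
    clamp255(y + 1.772 * (cb - 128)),                            // B
  ];
}

// Neutral chroma (cb = cr = 128) leaves luma unchanged: gray stays gray.
var gray = yCbCrToRgb(128, 128, 128); // [128, 128, 128]
```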

The bad news is that’s all we use it for so far — the actual video decoding is all on the CPU.

It should be possible to use the GPU for at least parts of the video decoding steps. But, it’s going to require jumping through some hoops…

  • WebGL doesn’t provide general-purpose compute shaders, so we’d have to shovel data into textures and squish computation into fragment shaders meant for processing pixels.
  • WebGL is only available on the main thread, so if decoding is done in a worker there’ll be additional overhead shipping data between threads.
  • If we have to read data back from the GPU, that can be slow and block the CPU, dropping efficiency again.
  • The codec libraries aren’t really set up with good GPU offloading points in them, so this may be Hard To Do.

libvpx at least has a fork with some OpenCL and RenderScript support — that’s worth investigating. But I have no idea whether this is really feasible in WebGL.


In the meantime, I’ve got lots of other things to fix in Wikipedia’s video support so will be concentrating on that stuff, but will keep on improving this as the JS platform evolves!

by brion at August 09, 2015 02:27 PM

Gerard Meijssen

#Quality for #Wikidata and for external #sources

There are always arguments to be found for not accepting Wikidata as a quality resource. Many Wikipedians ignore Wikidata because they do not trust the quality of its data. They require sources, because sources are why they trust a fact in Wikipedia to be good.

The practical problem is that Wikidata has some 15 million items, and most have one or more statements. Given the notion of sources as a requirement, each statement should be sourced. Given the rate at which new information enters Wikidata, sourcing all statements is not going to happen anytime soon, and consequently an alternative that demonstrates quality is needed.

One of Wikidata's best practices is publishing external sources for our items. This already adds a feeling of quality because it allows a person to see what those external sources have to say. It takes some software and a workflow to leverage this sense of quality and solidify it into a measurable quality improvement.

Obviously both Wikidata and external sources have their issues. Where they all agree, there is the least need to work on improving quality. Where Wikidata has no data, it is obvious to add data and use the external source as a reference. It becomes interesting when there is a difference.

The first thing to do is flag a differing statement as suspicious. It signals to software and people that there is a need for attention. People can research the issue and come to the conclusion that
  • Wikidata is correct
  • the external source is correct
  • both are incorrect
In all these circumstances the flag for the statement will be changed, the statement may be changed, and in every case a source is to be provided. This is when true sources make the biggest difference, because the flag does not go away, and with quality sources at these points of obvious need, the quality of Wikidata is easier to appreciate.

by Gerard Meijssen (noreply@blogger.com) at August 09, 2015 08:55 AM

August 08, 2015

Gerard Meijssen

#Wikidata - Frances Oldham Kelsey

When someone like Mrs Kelsey dies, it is wonderful to read about her in the Wikipedia article. There are always some factoids that can be added to the Wikidata item.

What struck me most is not the story of how Mrs Kelsey spared the USA the effects of thalidomide on babies, but the importance of an article in the Washington Post. Thanks to an article by Morton Mintz there was an outcry that resulted in the passing of the Kefauver Harris Amendment, which required drug manufacturers to provide proof of the effectiveness and safety of their drugs before approval.

Another fun fact is that the FDA named an award after Mrs Kelsey; she was the first to receive the FDA Kelsey Award. There is not much to find about this award because there is a controversy around it: people protested that Mrs Kelsey's good name was abused by political appointees to the FDA who wanted to diminish its powers.

For Mrs Kelsey only a few awards were added in Wikidata. There are more awards but, as always, there is more to do.

by Gerard Meijssen (noreply@blogger.com) at August 08, 2015 03:26 PM

Wikimedia Foundation

Wikimedia Foundation Quarterly Report, April–June 2015

Image by Wikimedia Foundation, freely licensed under CC BY-SA 3.0.

The Wikimedia Foundation’s report for last quarter gives an overview of how we fared on 118 goals by 35 different teams, alongside some key overall metrics.
Download the PDF version (2 MB) or read it as a wiki page.

The Wikimedia Foundation’s quarterly report for the fourth quarter of the 2014/15 fiscal year (April–June) has been published as a PDF on Wikimedia Commons, was presented at our monthly Metrics and Activities meeting yesterday, and is now also available as a wiki page.

PDF by Wikimedia Foundation, freely licensed under CC BY-SA 3.0

This is the third report since we switched from a monthly cycle to align with our quarterly goal-setting process. The report’s purpose is to help our movement and supporters understand how we spend our time and what we accomplish. We are continuing to optimize the report’s format and the organization’s quarterly review process that the report is based on, to bring you better information at lower overhead for the teams that take time out from their work to tell you how they have been doing.

This issue includes some new pieces of information, e.g. the approximate size of each team (in FTE, on average during this quarter), and for each objective, the number of team members who were involved with a significant amount of their time. The overall metrics scorecard now contains new, more reliable uptime numbers for both readers and contributors.

As before, we are including an overview slide summarizing successes and misses across all teams, broken down by department. In a mature 90 day goal setting process, the “sweet spot” is for about 75% of goals to be a success. Organizations that are meeting 100% of their goals are not typically setting aggressive goals. In this quarter, 87 of the 118 objectives were met (74%).

The report’s format is still evolving (as is the quarterly goals review process), and we welcome feedback here in the comments or on Meta-wiki.

Terence Gilbey, Chief Operating Officer, Wikimedia Foundation
Tilman Bayer, Senior Analyst, Wikimedia Foundation

by Tilman Bayer and Terence Gilbey at August 08, 2015 06:44 AM


Wikimania – can volunteers organize conferences?

Wikimania is the annual international conference for Wikimedia contributors. About 1,000 people convene for the three-day main conference, in which five conference tracks run concurrently for eight hours a day. Conference tracks cover such topics as presenting individuals’ projects, reviewing community organizing plans, promoting access to information sources, developing tutorial infrastructure, legal issues, software demonstrations, regional outreach, metrics reporting, and reviewing research. Before the main conference there is a two-day preconference, termed a hackathon, in which people meet in small groups for meetings, workshops, training, and more personal discussion. I went to the conference in DC in 2012, Hong Kong in 2013, London in 2014, and Mexico City in 2015.

An issue arose at the Mexico conference. I only know gossip and not real insider details, but the facts are that the conference was supposed to be held at the Vasconcelos Library but instead was held at a Hilton hotel. Wikipedians love libraries, and in the election process which chose Mexico as the host city, a major factor persuading the community was the organizing team’s enthusiasm for the library. Two months before the conference, the venue was changed from the public library to the Hilton. I did not notice the change announcement and was surprised closer to the event when I saw that the location had changed. Reasons cited for the change were the inability to secure close enough hotels for attendees and uncertainty about the library’s Wi-Fi capacity. These things may be so, and perhaps the library was always an inappropriate choice of venue. Still, I regret that so many volunteers did about a year of work planning an event at this library only for the venue to suddenly change. How much volunteer work was expended on the original plan? Why was that venue not identified as inappropriate sooner? Considering that volunteers are supposed to organize things like venue location: was volunteer labor somehow insufficient to accomplish the task, and could the paid staff who did the emergency move of the event have been more diligent in the original assessment and saved volunteer time?

The situation is that the mythology around the Wikimedia movement is that volunteers do everything. In reality, paid staff do a lot and serve in the most essential roles. The mythology partly developed because from 2001 to 2008 the Wikimedia Foundation and the community had almost no money, and no external organizations were funding Wikimedia contributors. Since about 2008 the situation has changed a lot, but there are few evaluations of the changes, and still fewer publications about them. From the Wikimedia Foundation perspective, its funding has gone from nothing in 2001 to over USD 65 million this year. I mention this in my “Value of a Wikipedian” post. Another change is that more organizations are willing to hire their own Wikipedians. I was the first person hired to do Wikipedia work full time indefinitely. It was a crazy concept when I was hired, and most people would still say that it is a strange idea, but at this point a lot of organizations are doing it. Since I came to New York, I have come to realize that a lot of editing about television and movies is done by paid editors, and this is especially taboo. Still, on Wikipedia there is a lot of demand for good information on popular television shows, and people seem to appreciate Wikipedia’s coverage of this. The concept is so boring that except in the case of the most popular television shows no one would think to do this, but for many shows there are enough fans to appreciate reading the content on Wikipedia if paid staff put it there. In a lot of ways, paid contributions are creeping into Wikipedia without any history of community discussion to address the implications.

I say this to give some context to what in any other nonprofit movement would be a non-issue. The Wikimania conference is imagined to be a community-run event, but leaving a conference entirely to volunteers is too burdensome for the volunteers and too risky for the community movement. There is a community memory that in 2010 in Poland, the volunteers managing the Wikimania conference became overwhelmed. As the story goes, the Wikimedia Foundation stepped in, had staff take over some essential roles during the conference, and hired local event coordinators to make it go well. In 2011 the conference in Israel went well because the Israeli chapter is known for good business sense, having an office with good fundraising and management practices, and otherwise being a volunteer organization with effective staff support. In 2012 the Wikimania coordinators in DC paid USD 30,000 to hire an event consultant, and the WMF granted that because “event consultant” is a role available for hire in the United States and because the consultant actually managed finance, legal contracts, and event coordination while giving volunteers final sign-off on everything, without having a cozy relationship with the volunteers. In 2013 the volunteers in Hong Kong got a lot of criticism for not reporting the finances of the conference — see for example “Hong Kong’s Wikimania 2013—failure to produce financial statement raises questions of probity”. I know that Hong Kong did not hire an event planner in the way that one was hired for DC, and it is my opinion that if they had, and if their event planner had managed their accounting, then there would have been no community objection to their reporting of the event.
Based on my incomplete information, had the Hong Kong team not depended on volunteers to do accounting — a tedious and time-consuming task to ask of any volunteer — and instead asked for USD 30,000 for a consultant to produce the report and accounting, then they would have gotten the money and high praise for their management of the event, because I think it was the best-managed Wikimania I have yet attended. They managed to have volunteers everywhere greeting everyone at so many parts of the process, and the volunteers collectively seemed to me like a trained army on the edge of all activity, continually directing me into the experience they had designed and keeping to a tight schedule. The London conference was great, but then the London Wikimedia chapter is the second-best funded after Germany’s and has about 10 staff. They also held the conference in an expensive conference venue that required funding its own staff to coordinate the event, in contrast to, for example, the DC and Hong Kong events in universities, which depended heavily on volunteers to complement the few staff services, and the complete Hilton services in Mexico.

In 2014 I helped to organize WikiConference USA in New York with other volunteers. Organizing conference programming is a fun activity for volunteers – doing event management was tedious. As volunteers, we liked advertising the event in some channels, reviewing program submissions, soliciting and reviewing scholarship applications, and recruiting volunteers to be on hand on the day of the event. The duties we did not enjoy, and which we would have preferred to turn over to paid staff, included negotiating with the venue and caterers; managing the written agreements about finance and safety; coordinating a travel team to dispense money to scholarship recipients; the accounting; the metrics part of the grant reporting to the Wikimedia Foundation; comprehensive communication in the manner of communications professionals rather than in the style of grassroots volunteers; and responding to harassment. We got a stalker during that event and it spoiled the mood. We managed the conference for about USD 30,000 because the venue was a school which donated space we would elsewhere have paid USD 60,000 for. About USD 10,000 of the 30,000 went to food and incidentals, and the other USD 20,000 to scholarships. There were about 10 of us on the organizing team, and I suppose we each spent about 30 hours meeting in person to plan the event, plus perhaps as much time alone doing things online. This was for a three-day conference for roughly 300–500 people, I forget how many. Wikimania must be on the same scale.

Is it worth having volunteers spend their time in this way? The money is less of an object these days. Volunteer time is scarce, and anyone who would consider volunteering to convene a Wikimedia conference is likely also a person whose time could be spent where expertise is scarce – for example, actually presenting Wikimedia culture instead of only creating a space for others to do so. Professional event coordinators are at least 2–3x more efficient at organizing events than a volunteer team would be, and they also anticipate bureaucratic reporting standards intuitively, where volunteers might not anticipate the need at all.

I was thinking – until now, Wikimania conferences have been awarded through an Olympic-style bidding process in which groups of volunteers in different cities around the world bid for the right to host the conference. The outcome of a winning bid is something like USD 300,000 to host the conference, with more money available for special needs on request, amounting to maybe USD 100,000 more. The restriction is that volunteers are discouraged from hiring paid staff to present the conference, and the event is expected to be as volunteer-run as possible. I wonder whether the Wikimedia Foundation, seeing this history of difficulty, is thinking of quashing the idea that volunteers should present conferences. I have never heard anyone suggest this, but I have heard from volunteer organizers how much work it is, and from my own experience – especially with WikiConference 2014, but also with other conferences – I know the work involved and the inefficiency volunteers face in doing things that professionals do easily. I think it would be more reasonable for the Wikimedia Foundation to hire event staff to manage almost all parts of the event, if only to free the volunteers’ time for more personal engagement. A local Wikipedia team should coordinate some hospitality functions, like staffing the registration desk and having volunteers around to answer questions about the neighborhood, and should take the lead in selecting the keynote speakers, scheduling programming, and recruiting Wikipedians to participate. Historically an online volunteer committee has selected the program submissions to be featured and also chosen the scholarship recipients. I want these things to continue, but as for event coordination – paid staff, in the Western World / United States tradition, ought to be used.

I worry about two side issues.

One is that the Hilton is an expensive American hotel chain with horrible business ethics. They charge about USD 300 a night for rooms, so the ~100 scholarship recipients and the ~100 Wikimedia Foundation staff who attended the conference paid roughly that rate for 5 nights each. USD 300 * 200 people * 5 nights is USD 300,000, which is the typical conference scale and probably about the price including venue space, catering, and rate negotiation. It bothers me that this money went to an American company and not a local business. It also bothers me that this rate is so far removed from the local economy. A recent economic report says 46% of people in Mexico made less than USD 157 in a month, so one night in this hotel costs about two months’ wages at the local rate. In Mexico City, where the conference was held, the report says 76% of people make USD 157 or less. How did the local Wikipedia contributors feel about hosting a conference in a venue so far removed from local culture and norms? How would the international guests have felt staying in a local hotel instead of an American one?

The other issue is that almost all of the conference presentations showcased the work of paid staff, when many people think of the Wikimedia movement as a volunteer initiative. There were five days of conference. The first two days were hackathon days, in which Wikimedia Foundation staff controlled everything that went on the schedule. This is the first year that happened: in previous years I posted to the hackathon schedule without a problem, but this year my posts were not allowed. There were lots of empty rooms reserved, people could meet during the first two days, and scholarship recipients were present, but posting to the schedule itself was prohibited. In the other three days of the conference, I count 150 talks. Among these, 48 talks were presentations including paid staff of the Wikimedia Foundation. The Wikimedia Foundation did not participate in the Spanish-language talks, of which there were 26, so 48/124 makes 39% of the English-language talks paid presentations by Wikimedia Foundation staff. Another 50 of the English-language talks were given by people paid to present by some organization other than the WMF (including staff of chapters, or paid Wikipedians like me), which really just leaves 26 talks, or about 17% of the total, given in English by volunteers this year. That 17% applies to the three days available to the community, not the two hackathon days where programming was managed by WMF staff. I may have miscounted the paid talks, since I might have failed to identify some presenters as paid and counted them as volunteers.
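Using the author’s own counts (which he acknowledges may be imperfect), the arithmetic behind these shares can be sketched as follows; note the volunteer share computes to roughly 17% of the 150 talks:

```python
# Talk counts as reported in the post (the author's estimates).
total_talks = 150
spanish_talks = 26       # no WMF staff participated in Spanish-language talks
wmf_paid_talks = 48      # talks including paid Wikimedia Foundation staff
other_paid_talks = 50    # presenters paid by chapters or other organizations

# English-language talks are the remainder after the Spanish talks.
english_talks = total_talks - spanish_talks                            # 124
volunteer_english = english_talks - wmf_paid_talks - other_paid_talks  # 26

# WMF share is computed against English-language talks only,
# since the WMF did not present in Spanish.
wmf_share = wmf_paid_talks / english_talks            # about 0.39
volunteer_share = volunteer_english / total_talks     # about 0.17

print(f"English-language talks: {english_talks}")
print(f"WMF-paid share of English talks: {wmf_share:.0%}")
print(f"Volunteer English talks: {volunteer_english} ({volunteer_share:.0%} of all talks)")
```

This makes the denominators explicit: the 39% figure is a share of the 124 English-language talks, while the volunteer figure is a share of all 150 talks.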

I would have preferred that the Wikimedia volunteer community fill most of the speaking slots – perhaps 66% of them – with anyone who is paid to present getting a smaller allotment. I want to emphasize volunteers because the community and the Wikimedia Foundation put so much emphasis on volunteer contributions and say that the Wikimedia movement is volunteer-driven. I think there is a perception, even within the Wikimedia community, that the community speaks for itself, yet somehow this year the Wikimedia community was the audience at its own conference. For future Wikimanias I would like all talks tagged as volunteer-presented, WMF-presented, or presented by other paid staff.

I am grateful to the volunteers who contributed to putting this conference on, and I specifically sought out talks by volunteers.

by bluerasberry at August 08, 2015 03:11 AM