June 24, 2017

Gerard Meijssen

#Wikipedia - Sister projects in search results

The Wikipedia Signpost reports that the Discovery team has extended the results for search on Wikipedia. What is new is that the English Wikipedia now includes results from Wikisource, Wiktionary, Wikiquote, and Wikivoyage, and that is indeed welcome news.

There is one puzzling part in the announcement: "Wikidata and Wikispecies are not within the scope of this feature." It is puzzling because Wikidata results have augmented search on many Wikipedias, including the English Wikipedia, for years, thanks to the people who added the little bit of magic Magnus provided.

As you can see in the screenshot of the search for Wilbur R. Leopold, an award was conferred on him, and the origin of this factoid is the article on the award. Thanks to Wikidata, information is available for Mr Leopold. There are so many references in Wikidata that have no article in any Wikipedia or other project that, from a search perspective, they are probably the next frontier.

When wiki links, red links, and even black links can be associated with Wikidata items, it becomes even easier to add precision to the search results. Adding these links is the low-hanging fruit for improving quality in Wikimedia projects anyway.

by Gerard Meijssen (noreply@blogger.com) at June 24, 2017 09:31 AM

Wikimedia Foundation

Wikimedia Research Newsletter, March 2017

“Wikipedia, work, and capitalism. A realm of freedom?”

Review by Dorothy Howard

Arwid Lund
(Photo by LittleGun, CC BY-SA 4.0)

In his first book, Wikipedia, Work, and Capitalism. A Realm of Freedom?,[1] Arwid Lund, lecturer in the program of Information Studies (ALM: Archives, Libraries and Museums) at Uppsala Universitet, Sweden, investigates the ideologies that he believes are shared by participants in peer-production projects like Wikipedia. The author typologizes the ways that Wikipedians understand their activities, including “playing v. gaming” and “working v. labouring” (113–115), to explore his hypothesis that “there is a link between how Wikipedians look upon their activities and how they look upon capitalism.” (117) Lund characterizes peer-production projects by their shared resistance to information capitalism—things like copyright and pay-walled publishing, which they see as limiting creativity and innovation. His thesis is provocative: he claims that the anti-corporatist ideologies intrinsic to peer production and to Wikipedia are unrealistic, because capitalism always finds a way to monetize free content. Overall, the book touches on many issues not usually discussed within the Wikipedia community, but it might be a useful entry point for those who want to consider the social impacts of the project.

Lund uses a combination of social critique and qualitative interviews conducted in 2012 to provide supporting evidence for his thesis. One recurrent theme is that Wikipedia is part of a larger trend toward gamification, a design technique developed in human–computer interaction (HCI) that uses features associated with “play” to motivate interaction and engagement with an interface. One example he gives is that editors report finding Wikipedia’s competitive and confrontational elements to be game-like. (143–144) He also claims that Wikipedians’ descriptions of their work–play balance change as they take on more levels of responsibility and professionalism in the community, such as adminship. Still, it is highly questionable whether the eight interviews, which mainly focus on the Swedish Wikipedia, are a sufficient sample to support such general claims.

The culture of Wikipedia valorizes altruism in its embrace of volunteering for the project to produce information for the greater good. Lund argues that Wikipedians’ belief in the altruistic aspect of the project makes it easy for them to depoliticize their work and to ignore how Wikipedia participates in the corporate information economy. To him, Wikipedia is symptomatic of the devaluation of digital work, when in past generations making an encyclopedia might have been a source of income and employment opportunities for contributors.

So, he argues, contributors believe that peer production represents a space of increased autonomy, democracy, and creativity in the production of ideas. But in his view, attempts at a “counter-economy,” “hacker communism,” or “gift economies” (239, 303) are prone to manipulation, because we cannot create utopian bubbles within capitalism that are immune to its influence. Still, peer-production projects operate as if the creation of value outside the capitalist system were possible. Lund argues that Wikipedia cannot avoid competition with proprietary companies that see Wikipedia as a threat and have an interest in harvesting its content for their own benefit. (218) Yet it would be nice if he brought in more examples to support this claim. The reader is left wondering who these corporate interests are and what exactly they derive from Wikipedia. Having this information would help us understand where Lund is coming from.

Marxist linguist V.N. Volosinov, one of the references for Lund’s analysis

Although the word “work” in the title might suggest that Lund focuses on wage labour, the author’s aims are broader, and he uses the word to connote a variety of aspects of social, value-producing activities (20), namely the production of “use-value,” the Marxist term for the productive social activity of creating things that are deemed useful and thus of value to be bought and sold in the market (even if producers don’t consider their work to be commodities). He draws from Marxist thinkers and semioticians, among them V.N. Volosinov, Terry Eagleton, and Louis Althusser, to unpack different approaches to describing why Wikipedians might feel like they are playing when they are really working. (107–108) Marxists call such assumptions “false consciousness,” but the concept is difficult because it requires us to analyze manifest and latent (discursive and non-discursive) awareness. It would have been useful for Lund to look at how the fields of anthropology or psychology talk about ideology; both fields have extensively researched the topic. More stringent ethnographic or qualitative methods might also have made his argument more convincing. But, based on the references he provides, it seems that the book’s target audience may be media theorists and social scientists, people who are already familiar with Marxist political economy.

Lund makes a compelling case that capitalism instrumentalizes freely produced knowledge for its own monetary gains. Meanwhile, he says, Wikipedia’s design and its heavily ideological agenda make it difficult for the community to address the issue. The book is an interesting contribution to ongoing conversations about how Wikipedia and projects motivated by copyleft principles can be defined from a social perspective.

How does unemployment affect reading and editing Wikipedia? The impact of the Great Recession

Review by Tilman Bayer

A discussion paper titled “Economic Downturn and Volunteering: Do Economic Crises Affect Content Generation on Wikipedia?”[2] investigates how “drastically increased unemployment” affects contribution to and readership of Wikipedia. To study this question statistically, the authors (three economists from the Centre for European Economic Research (ZEW) in Mannheim, Germany) regarded the Great Recession that began in 2008 as an “exogenous shock” that affected unemployment rates in different European countries differently and at different times. They relate these rates to five metrics for the language version of Wikipedia that corresponds to each country:

(1) aggregate views per month, (2) the number of active Wikipedians with a modest number of monthly edits ranging from 5 to 100, (3) the number of active Wikipedians with more than 100 monthly edits, (4) edits per article, and (5) the content growth of a corresponding language edition of Wikipedia in terms of words


For each of these, the Wikimedia Foundation publishes monthly numbers. Since the researchers did not have access to country-level breakdowns of this data (which, for privacy reasons, is not published for every country/language combination, apart from some monthly or quarterly overviews that the authors may have overlooked, and which only start in 2009 anyway), “to study the relationship of country level unemployment on an entire Wikipedia, we need to focus on countries which have an (ideally) unique language”. This excluded some of the European countries that were most heavily affected by the 2008 crisis, e.g. the UK, Spain and Portugal, but still left them with 22 different language versions of Wikipedia to study.

An additional analysis relates district-level (Kreise) employment data from Germany to activity on the German Wikipedia. None of the five metrics are available at that geographical resolution, so the authors resorted to geolocating the (public) IP addresses of anonymous edits (which for several large German ISPs is usually more precise than in many other countries).

In both parts of the analysis, the economic data is related to the Wikipedia participation metrics using a relatively simple statistical approach (difference in differences), whose robustness is, however, vetted by various means. Still, since in some cases the comparison covered only the 9 months before and after the start of the crisis (instead of an entire year or several years), this leaves open the question of seasonality (e.g. it is well known that Wikipedia pageviews generally drop in the summer, possibly due to factors like vacationing that might differ depending on the economic situation).
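For readers unfamiliar with the method, the difference-in-differences logic can be sketched in a few lines of Python. The numbers and variable names below are purely hypothetical illustrations, not the paper's data or its actual regression specification:

```python
# Minimal 2x2 difference-in-differences sketch (hypothetical numbers):
# compare the change in a Wikipedia activity metric before vs. after
# the crisis in "treated" (crisis-hit) vs. control language editions.

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Return the difference-in-differences estimate from group means."""
    def mean(xs):
        return sum(xs) / len(xs)
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical monthly edits-per-article averages:
treated_pre  = [1.0, 1.1, 0.9]
treated_post = [1.4, 1.5, 1.3]
control_pre  = [1.0, 1.1, 1.2]
control_post = [1.1, 1.2, 1.3]

print(did_estimate(treated_pre, treated_post, control_pre, control_post))
```

The estimate is the change in the treated group minus the change in the control group, which nets out shocks common to both; the paper's actual analysis layers controls and robustness checks on top of this basic comparison.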

Summarizing their results, the authors write:

we find that increased unemployment is associated with higher participation of volunteers in Wikipedia and an increased rate of content generation. With higher unemployment, articles are read more frequently and the number of highly active users increases, suggesting that existing editors also increase their activity. Moreover, we find robust evidence that the number of edits per article increases, and slightly weaker support for an increased overall content growth. We find the overall effect to be rather positive than negative, which is reassuring news if the encyclopedia functions as an important knowledge base for the economy.


While leaving open the precise mechanism of these effects, the researchers speculate that “it seems that new editors begin to acquire new capabilities and devote their time to producing public goods. While we observe overall content growth, we could not find robust evidence for an increase in the number of new articles per day […]. This suggests that the increased participation is focused on adding to the existing knowledge, rather than providing new topics or pages. Doing so requires less experience than creating new articles, which may be interpreted as a sign of learning by the new contributors.”

The paper also includes an informative literature review summarizing interesting research results on unemployment, leisure time and volunteering in general. (For example, that “conditional on having Internet access, poorer people spend more time online than wealthy people as they have a lower opportunity cost of time.” Also some gender-specific results that, combined with Wikipedia’s well-known gender gap, might have suggested a negative effect of rising unemployment on editing activity: “Among men, working more hours is even positively correlated with participation in volunteering” and on the other hand “unemployment has a negative effect on men’s volunteering, which is not the case for women.”)

It has long been observed that Wikipedia relies on the leisure time of educated people, most notably by Clay Shirky, who coined the term “cognitive surplus” for it, the title of his 2010 book. The present study provides important insights into a particular aspect of this (although the authors caution that economic crises do not uniformly increase spare time, e.g. “employed people may face larger pressure in their paid job”, reducing their available time for editing Wikipedia). The paper might have benefited from a look at the available demographic data about the life situations of Wikipedia editors (e.g. in the 2012 Wikipedia Editor Survey, 60% of respondents were working full-time or part-time, and 39% were school or university students, with some overlap).


How complete are Wikidata entries?

Author’s summary by Simon Razniewski

While human-created knowledge bases (KBs) such as Wikidata usually provide high-quality data (high precision), it is generally hard to assess their completeness. A conference paper titled “Assessing the Completeness of Entities in Knowledge Bases”[3] proposes to assess the relative completeness of entities in knowledge bases by comparing the extent of their information with that of other, similar entities. It outlines the building blocks of this approach and presents a prototypical implementation, which is available on Wikidata as Recoin (https://www.wikidata.org/wiki/User:Ls1g/Recoin).
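The underlying idea, comparing an entity's properties against those typical for similar entities, can be illustrated with a toy score. This is a hypothetical sketch under made-up data, not Recoin's actual algorithm:

```python
# Toy relative-completeness score: an entity is compared against the
# properties most frequently used by its peer entities. Hypothetical
# property sets; not the actual Recoin scoring method.
from collections import Counter

def relative_completeness(entity_props, peer_prop_sets):
    """Fraction of the peers' frequent properties that the entity has."""
    freq = Counter(p for props in peer_prop_sets for p in props)
    n = len(peer_prop_sets)
    # "Frequent" here means present in at least half of the peers.
    frequent = {p for p, c in freq.items() if c >= n / 2}
    if not frequent:
        return 1.0
    return len(frequent & set(entity_props)) / len(frequent)

# Hypothetical Wikidata-style property sets for "human" entities
# (P569 date of birth, P570 date of death, P19 birthplace, P106 occupation):
peers = [
    {"P569", "P570", "P19", "P106"},
    {"P569", "P19", "P106"},
    {"P569", "P570", "P106"},
]
print(relative_completeness({"P569", "P106"}, peers))
```

An entity covering only two of the four properties common among its peers would score 0.5, flagging it as relatively incomplete even though every statement it does have may be accurate.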

“Cardinal Virtues: Extracting Relation Cardinalities from Text”

Author’s summary by Simon Razniewski

Information extraction (IE) from text has largely focused on relations between individual entities, such as who has won which award. However, some facts are never fully mentioned, and no IE method has perfect recall. Thus, it is beneficial to also extract the cardinalities of these relations, for example, how many awards someone has won. This paper[4] introduces the novel problem of extracting cardinalities and discusses the specific challenges that set it apart from standard IE. It presents a distant-supervision method using conditional random fields. A preliminary evaluation that compares information extracted from Wikipedia with that available on Wikidata shows a precision between 3% and 55%, depending on the difficulty of the relations.

Conferences and events

See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.

Compiled by Tilman Bayer
  • “Learning by comparing with Wikipedia: the value to students’ learning”[5] From the paper: “The main purpose of this research work is to describe and evaluate a learning technique that actively uses Wikipedia in an online master’s degree course in Statistics. It is based on the comparison between Wikipedia content and standard academic learning materials. We define this technique as ‘learning by comparing’. […] The main result of the paper shows that […] active use of Wikipedia in the learning process, through the learning-by-comparing technique, improves the students’ academic performance. […] The main findings on the students’ perceived quality of Wikipedia indicate that they agree with the idea that the encyclopaedia is complete, reliable, current and useful. Although there is a positive perception of quality, there are some quality factors that obtain better scores than others. The most valued quality aspect was the currentness of the content, and the least valued was its completeness.”
  • “Use and awareness of Wikipedia among the M.C.A students of C. D. Jain college of commerce, Shrirampur : A Study”[6]
  • “Comparative assessment of three quality frameworks for statistics derived from big data: the cases of Wikipedia page views and Automatic Identification Systems”[7] From the abstract: “We apply these three quality frameworks in the context of ‘experimental’ cultural statistics based on Wikipedia page views”
  • “Discovery and efficient reuse of technology pictures using Wikimedia infrastructures. A proposal”[8] From the abstract: “With our proposal, we hope to serve a broad audience which looks up a scientific or technical term in a web search portal first. Until now, this audience has little chance to find an openly accessible and reusable image narrowly matching their search term on first try ..”
  • “Extracting scientists from Wikipedia”[9] From the abstract: “… we describe a system that gathers information from Wikipedia articles and existing data from Wikidata, which is then combined and put in a searchable database. This system is dedicated to making the process of finding scientists both quicker and easier.”
  • “Where the streets have known names”[10] From the abstract: “We present (1) a technique to establish a correspondence between street names and the entities that they refer to. The method is based on Wikidata, a knowledge base derived from Wikipedia. The accuracy of this mapping is evaluated on a sample of streets in Rome. As this approach reaches limited coverage, we propose to tap local knowledge with (2) a simple web platform. … As a result, we design (3) an enriched OpenStreetMap web map where each street name can be explored in terms of the properties of its associated entity.”


  1. Lund, Arwid (2017). Wikipedia, Work, and Capitalism. Springer: Dynamics of Virtual Work. ISBN 9783319506890. 
  2. Kummer, Michael E.; Slivko, Olga; Zhang, Xiaoquan (Michael) (2015-11-01). Economic Downturn and Volunteering: Do Economic Crises Affect Content Generation on Wikipedia?. Rochester, NY: Social Science Research Network. 
  3. Ahmeti, Albin; Razniewski, Simon; Polleres, Axel (2017). Assessing the Completeness of Entities in Knowledge Bases. ESWC. 
  4. Mirza, Paramita; Razniewski, Simon; Darari, Fariz; Weikum, Gerhard (2017). Cardinal Virtues: Extracting Relation Cardinalities from Text. ACL. 
  5. Meseguer-Artola, Antoni (2014-05-26). “Aprenent mitjançant la comparació amb la Wikipedia: la seva importància en l’aprenentatge dels estudiants”. RUSC. Universities and Knowledge Society Journal 11 (2): 57–69. ISSN 1698-580X. doi:10.7238/rusc.v11i2.2042.  (“Learning by comparing with Wikipedia: the value to students’ learning”, in English with Catalan abstract)
  6. Pathade, Prasad R. “Use and awareness of Wikipedia among the M.C.A students of C. D. Jain college of commerce, Shrirampur : A Study” (PDF). International Multidisciplinary e-Journal. ISSN 2277-4262. 
  7. Reis, Fernando; di Consiglio, Loredana; Kovachev, Bogomil; Wirthmann, Albrecht; Skaliotis, Michail (June 2016). Comparative assessment of three quality frameworks for statistics derived from big data: the cases of Wikipedia page views and Automatic Identification Systems (PDF). European Conference on Quality in Official Statistics (Q2016). Madrid. p. 16. 
  8. Heller; Blümel; Cartellieri; Wartena. “Discovery and efficient reuse of technology pictures using Wikimedia infrastructures. A proposal”. Zenodo. doi:10.5281/zenodo.51562. 
  9. Ekenstierna, Gustaf Harari; Lam, Victor Shu-Ming (2016). “Extracting Scientists from Wikipedia”. From Digitization to Knowledge 2016. p. 8. 
  10. Almeida, Paulo Dias; Rocha, Jorge Gustavo; Ballatore, Andrea; Zipf, Alexander (2016-07-04). “Where the Streets Have Known Names”. In Osvaldo Gervasi, Beniamino Murgante, Sanjay Misra, Ana Maria A. C. Rocha, Carmelo M. Torre, David Taniar, Bernady O. Apduhan, Elena Stankova, Shangguang Wang (eds.). Computational Science and Its Applications — ICCSA 2016. Lecture Notes in Computer Science. Springer International Publishing. pp. 1–12. ISBN 9783319420882.  Closed access

Wikimedia Research Newsletter
Vol: 7 • Issue: 3 • March 2017
This newsletter is brought to you by the Wikimedia Research Committee and The Signpost

by Tilman Bayer at June 24, 2017 05:52 AM

June 23, 2017

Wiki Education Foundation

Wiki Ed joins plant biologists for ASPB’s annual meeting

This weekend, Outreach Manager Samantha Weald and I head to Honolulu to join plant biology faculty and students at the annual meeting of the American Society of Plant Biologists (ASPB). When we partnered with ASPB, we committed to increasing the number of students we support as they improve Wikipedia’s coverage of plant biology topics. We’re looking forward to meeting university instructors face-to-face to share the pedagogical benefits for students and to show how we make it easy to get involved.

Since beginning our partnership in the fall 2015 term, Wiki Ed has trained more than 700 plant biology students to edit Wikipedia. Together, they have added 337,000 words to articles that have received nearly 9 million views. Students have improved Wikipedia’s information about beech bark disease, carnivorous plants, the environmental impact of agriculture, North American azaleas, DNA sequencing, and many more. We’re proud of the great work they’ve done, like the students in Jennifer Blake-Mahmud’s course at Rutgers University who wrote about sexual reproduction and sexually transmitted diseases in plants.

If you’re attending the conference, we’ll be at the education booth in the exhibit hall Sunday through Tuesday. We’ll also be attending the education symposium on Monday, June 26th, from 1:30–3:15pm. There, Dr. Sarah Wyatt, who has participated in Wiki Ed’s Classroom Program, will join her graduate students to share their experiences editing Wikipedia as an assignment. Please come join us at any of these events, or email us at contact@wikiedu.org to schedule an independent meeting. We look forward to bringing more plant biologists to Wikipedia!

by Jami Mathewson at June 23, 2017 09:54 PM

Wikimedia Foundation

Wikimedia Foundation v. NSA: Why we’re here and where we’re going

The National Security Agency’s headquarters in Fort Meade, Maryland. Photo by Trevor Paglen/Creative Time Reports, public domain.

For the last two years, the Wikimedia Foundation has been fighting in the United States federal courts to protect the fundamental rights and freedoms of Wikimedia users from overly broad government surveillance. We challenged the U.S. National Security Agency’s (NSA) “Upstream” mass surveillance of the internet, which vacuums up international text-based online communications without individualized warrants or suspicion. Now, in the wake of an important court ruling in our favor, we take a closer look at Wikimedia Foundation v. NSA.

On May 23, 2017, the U.S. Fourth Circuit Court of Appeals ruled that the Wikimedia Foundation has adequately alleged standing to challenge the NSA’s Upstream surveillance of internet traffic and may proceed to the next stage of the case. Specifically, the court found that the Foundation has adequately alleged the suspicionless seizure and searching of its internet communications through Upstream surveillance.  The Fourth Circuit’s decision is an important, but still intermediate, victory for online privacy and free expression.  In this blog post, we’ll provide some background on the case and the practices it challenges, look at the most recent ruling, and discuss our next steps.

How we got here

In March 2015, we joined eight other co-plaintiffs (represented pro bono by the American Civil Liberties Union (ACLU)) to file a lawsuit challenging the NSA’s Upstream surveillance practices. However, the events that precipitated our case began much earlier.

In 1978, Congress passed the Foreign Intelligence Surveillance Act (FISA), which regulates the collection of communications that fall into the category of “foreign intelligence information” on U.S. soil. FISA required the government to show probable cause to a court that a particular surveillance target was a “foreign power” or an agent thereof.

However, in 2008, the Foreign Intelligence Surveillance Act Amendments Act (FAA) amended FISA to authorize the government to monitor communications of non-U.S. persons for “foreign intelligence information” without establishing probable cause or making any individualized showing to a court. And in 2013, public disclosures of NSA documents revealed the massive scope of the surveillance practices allegedly authorized by the FAA, including “Upstream” surveillance.

Upstream surveillance involves installing devices at major “chokepoints” along the internet backbone. The NSA then seizes international text-based communications passing through these chokepoints and combs through those communications for so-called “selectors” associated with tens of thousands of targets. Although the NSA claims that Section 702 of FISA authorizes Upstream surveillance, we believe that its scope exceeds what is actually allowed by the statute. This broad surveillance also infringes several provisions of the U.S. Constitution, including: the First Amendment, which protects freedom of speech and association; the Fourth Amendment, which protects against unreasonable searches and seizures; and Article III, which grants specific powers to the judicial branch of government.

At the District Court, the government asserted that the Wikimedia Foundation and our co-plaintiffs lacked standing. Standing is a legal concept that determines whether a party has alleged a specific injury and has a right to bring a claim in court. The government argued that we lacked standing because we had not plausibly alleged that the NSA actually intercepted and searched our communications. The District Court considered only the standing issue, agreed with the government, and granted its motion to dismiss. We then appealed to the Fourth Circuit, explaining how and why we have standing in this case.   

Why we’re here

Privacy and free expression rights are fundamental to the Wikimedia Foundation’s vision of empowering everyone to share in the sum of all human knowledge.  Our mission depends on maintaining the privacy and confidentiality of user communications and activities, so as to encourage trust among community members and foster the creation and dissemination of free educational content.

In supporting Wikipedia and the other Wikimedia projects, we communicate with hundreds of millions of individuals who read or contribute to the repository of human knowledge. These communications often contain personally identifying, sensitive, or confidential information about users, our staff, and the Foundation. Suspicionless searches of these communications are like searching the patron records of the largest library in the world.

We strive to keep this information confidential, and we always put user privacy first. The Wikimedia Foundation privacy policy limits what data we collect, how much we share, and how long we retain it. We never sell user information or share it with third parties for marketing purposes. In June 2015, we implemented HTTPS across the projects, which treat anonymous and pseudonymous participation as one of their key principles of operation.

We filed this lawsuit as another step in our efforts to stand up for Wikimedia users, and protect their ability to read and share knowledge freely and confidentially.

Where we’re going

The Fourth Circuit’s recent decision is a major step in the fight against mass surveillance. Notably, all three judges on the panel found that the Wikimedia Foundation had established standing, by alleging sufficient facts to defeat the government’s motion to dismiss. By a 2-1 vote, however, the panel upheld the lower court’s finding that our eight co-plaintiffs did not have standing. The third judge on the panel would have found that all nine plaintiffs had standing.

This important ruling is the most recent in a series of cases in which U.S. courts have permitted challenges to mass surveillance to go forward. This past October, the U.S. Third Circuit Court of Appeals held that an individual plaintiff had standing to challenge the NSA’s PRISM program, which collects internet communications directly from service providers. The Electronic Frontier Foundation (EFF)’s Jewel v. NSA case has also scored some legal victories, most recently when a District Court allowed EFF to conduct discovery from the government. Two other courts ruled against the U.S. government’s bulk collection of call records, which helped prompt the U.S. Congress to enact some positive reforms.  

These cases partially reverse a previous trend in which courts were less skeptical of government snooping. In 2013, the Supreme Court ruled 5-4 that the plaintiffs in Clapper v. Amnesty International did not have standing to challenge mass surveillance on the ground that their claims were too speculative. However, the global surveillance disclosures following the Clapper decision have revealed a great deal about the true scope and scale of the U.S. government’s suspicionless surveillance practices. This information has prompted courts to conclude that Clapper doesn’t foreclose every challenge to government surveillance in the name of national security. Encouragingly, some plaintiffs—like the Foundation in this case—are increasingly afforded the opportunity to reach the merits of their claims, though many continue to face an uphill battle.  

These victories come as Section 702 of FISA is scheduled to sunset in December 2017. Section 702 sets out the process the U.S. government must follow for obtaining authorization to target non-U.S. persons reasonably believed to be abroad, including their communications with persons in the U.S. It also broadened the scope of surveillance beyond foreign powers and their agents to, for example, any information the government believes relates to the “foreign affairs” of the United States. No particularity or probable cause is necessary.  

Regardless of the outcome in the courts, this year’s reauthorization debate represents an important opportunity for reform. Upstream’s many statutory and constitutional deficiencies must be fixed, and we welcome a public conversation about the importance of protecting internet users’ privacy and expressive freedoms.

Even though we have won this appeal, our fight against the NSA’s overbroad surveillance practices is far from over. We are closely reviewing the opinion with our counsel at the ACLU and our co-plaintiffs to determine the next steps, and we will continue to publish updates to keep Wikimedia users informed.

Want to learn even more about Wikimedia Foundation v. NSA?

An updated timeline, frequently asked questions, and more resources about this case can be found at our Wikimedia Foundation v. NSA resources page.  To view any of the legal documents or decisions in this case, check out the ACLU’s page.  Finally, if you want to add your voice to the cause, consider talking about your support on social media, or sharing the ACLU’s infographic about the case.

Jim Buatti, Legal Counsel
Aeryn Palmer, Legal Counsel

Thanks to Allison Davenport and Nick Gross for their assistance in preparing this blog post. Special thanks to all who have supported us in this litigation, including the ACLU’s Patrick Toomey and Ashley Gorski; the Knight Institute’s Jameel Jaffer and Alex Abdo; Aarti Reddy, Patrick Gunn, and Ben Kleine of our pro bono counsel Cooley, LLP; and the Wikimedia Foundation’s Zhou Zhou.

by Jim Buatti and Aeryn Palmer at June 23, 2017 03:48 PM

June 22, 2017

Wiki Education Foundation

Announcing our Annual Plan

Wiki Ed’s Board of Trustees approved our Annual Plan and Budget for Fiscal Year 2017–2018 at their June meeting. I’m pleased to share the full document here, which both recaps the work we did last year as well as highlights what we plan to do next year (Wiki Ed’s fiscal year runs July 1 to June 30).

Last fiscal year was one of enormous growth for Wiki Ed. Among our achievements:

  • Our Year of Science initiative culminated with more than 6,300 students engaged in improving Wikipedia’s underdeveloped science content while improving their writing, information literacy, critical thinking, collaboration, and online communications skills. The science students enrolled in our Classroom Program created 637 articles and improved more than 5,670.
  • With more than 65% of its students being female, our Classroom Program continues to be the single most effective tool for boosting women’s authorship on Wikipedia.
  • Students in our program have now added the equivalent of 75% of the last print edition of Encyclopædia Britannica to Wikipedia.
  • Our technical investments in our Dashboard platform have enabled us to scale our impact without staffing additions.
  • Our research study showed that both students and instructors overwhelmingly value the Wikipedia-based assignment over a “traditional” paper assignment for developing learning outcomes in every category queried: critical thinking, digital literacy, technical skills, evaluating online source reliability, knowledge of the class topic, and writing for a general audience. Moreover, students found themselves more motivated and satisfied, and were generally very positive about the Wikipedia assignment.

We look forward to continuing to expand our impact next year. Among our key initiatives laid out in the plan:

  • Wiki Ed will kick off a multi-year Future of Facts campaign. We will dedicate distinct resources to recruiting, onboarding, and supporting higher education courses in politically relevant subject areas like public policy, political science, law, history, sociology, and environmental science, as well as interdisciplinary courses that will work on these topic areas. Students in these courses will write Wikipedia articles in these topic areas, citing reliable sources, thereby improving the public’s access to information on topic areas relevant to an informed citizenry. We will also recruit and support Visiting Scholar positions in these subject areas, in which we pair an experienced Wikipedia editor who writes in politically relevant topic areas with a university that provides access to sources in that subject.
  • We will begin to develop technology for what we’re calling Guided Editing. One of the biggest pain points about our student editors for existing Wikipedia editors is that sometimes they struggle to get the tone right for an encyclopedia article. Issues with plagiarism, too few citations, and failure to meet the manual of style on Wikipedia also frustrate existing editors. We want to create a Guided Editing experience that uses artificial intelligence to review students’ edits as they make them, making suggestions to avoid plagiarism, citation issues, tone problems, and manual of style errors, before the students’ edit is made on the live article namespace on Wikipedia. Creating this Guided Editing system will enable us to address many of the most common challenges student editors face. In 2017–18, we will begin the early stages of what we expect to be a major technical project.

To read more about our work last year and this year, I encourage you to read our Annual Plan, which describes our plans in more detail. As always, we will also continue to share our progress through monthly reports to our board, which we also share on our website.

by Frank Schulenburg at June 22, 2017 04:56 PM

Wikimedia Foundation

Why does Venezuelan photographer Wilfredo Rodríguez donate his work to the world?

Photo by User:The Photographer (Wilfredo Rodríguez), public domain.

Wilfredo Rodríguez was born in a modest Venezuelan house, not unlike the one seen above, on the small Caribbean island of Margarita. The island’s small size, however, does not measure up to its beauty and cultural heritage—and that’s something Rodríguez, better known on Wikimedia projects by his username “The Photographer,” helps the world understand.

Rodríguez joined the Wikimedia movement over a decade ago, during which time he has contributed over 40,000 images to Wikimedia Commons, the free educational media repository. Unlike many others, he has actively decided to release many of his photos into the public domain, giving up the right to be credited when his work is shared.

His photos have been displayed in some of the world’s largest exhibitions, and hundreds of his photos are promoted by the Wikimedia community as featured photos and quality images, the highest quality markers on the project.

Rodríguez’ early childhood gives us a clue as to what he would eventually bring to that open movement.

Head of an Iguana in Venezuela. Photo by The Photographer (Wilfredo Rodríguez), public domain.

Photo by The Photographer (Wilfredo Rodríguez), public domain.

When Rodríguez was growing up, a spring would continually bubble up from a hill near his house, creating a small river. He found himself fascinated by the natural phenomenon and the animals nearby that depended on it for survival, so much so that he took to carrying a notebook to sketch the trees and iguanas that lived there. In fact, his interest was so keen that his childhood nickname was ‘iguana’.

His childhood was, however, not entirely spent underneath these trees: “I do not remember feeling any lack or need during my childhood,” he says, “but in my adulthood I discovered that my parents sometimes didn’t eat to feed me and my sister during the 1980s crisis in Venezuela. It was a very difficult period for my country, in which I managed to continue studying.”

Photo by The Photographer (Wilfredo Rodríguez), public domain.

Despite the ongoing crisis, Rodríguez was able to attend a local university and chose to specialize in systems engineering, a less preferred yet promising field of study when considering the best job opportunities. Rodríguez excelled in the field and graduated with the second highest scores in his class. Given his photographic interests today, it does not take a rocket scientist to guess what his family graduation gift to him was.

“My first camera was a great financial burden for my family,” he says, “but it was a graduation gift. I remember that it was a generic Chinese brand and could only take three-megapixel photos. Still, it allowed me to do incredible things when using it in combination with magnifying loupes and some improvised binoculars.”

Photo by The Photographer (Wilfredo Rodríguez), public domain.

Though he contributes mostly as a photographer today, Rodríguez’s first steps in the Wikimedia movement displayed a decidedly different area of interest. Rodríguez had first volunteered for Kiwix, the free software that allows searching and reading Wikipedia without an internet connection.

“I believe that something needs to be done to bring this knowledge to the remote areas of my country,” says Rodríguez, as he hopes to mitigate the impact of government censorship and what Human Rights Watch has called the “humanitarian crisis” happening in his country. “Those areas without the internet are in dire need of help,” he believes.

As part of the Kiwix project, Rodríguez installed the software in hundreds of information centers in Venezuela with the support of Emmanuel Engelhart, the developer behind Kiwix, and César Wilfredo, a fellow Wikipedian from Venezuela.

Rodríguez then began to turn his efforts to taking photos of his country and uploading them to Wikimedia Commons. This step was, in his words, “a way to protest” by finding a way to show the challenges of daily life there. “I always thought,” he says, “that what I was trying to show was more important than my life, because what I was doing was going to remain for future generations.” This sometimes included rather dangerous journeys:

I remember climbing the Bolivar and Humboldt peaks at 5000 meters (16,400 ft) above sea level to capture the melting of the mountains’ remaining glaciers. … It took eight days of hiking and climbing with 60-kilogram (130 lb) backpacks. The trip was very difficult and extreme. I had prepared for almost a year, but we still had serious issues with food after one of the members of the team left the group leaving us without food. We continued for 3 more days practically without eating.

I took some good shots; however, above 5000 meters, it is difficult to take pictures because of the lack of oxygen.

The most dangerous thing I did was to travel to ranches in Caracas, though, which was potentially fatal because of crime and anti-government protests taking place at the time of my trip.

Photo by The Photographer (Wilfredo Rodríguez), public domain.

It is not an easy decision for a photographer to give up their rights in a photo by sharing it in the public domain, especially when there is a long and arduous adventure behind it, but Rodríguez had a different view about this: “I think the message is more important than the author. I know that for some photographers it is important that they receive credit for the photo, and I respect that opinion, but I don’t think about my work this way.”

As a result, Rodríguez has had his photos exhibited at the French National Museum of Natural History, featured in Encyclopedia Britannica, and shown in local galleries in Maracaibo, among many other places.

Photo by The Photographer (Wilfredo Rodríguez), public domain.

Photo by The Photographer (Wilfredo Rodríguez), public domain.

Photo by The Photographer (Wilfredo Rodríguez), CC BY-SA 4.0.

Wilfredo Rodríguez. Photo by The Photographer (Wilfredo Rodríguez), CC BY-SA 4.0.

Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation

by Samir Elsharbaty at June 22, 2017 04:30 PM

June 21, 2017

Weekly OSM

weeklyOSM 361



pole_climber’s Colour Coded OSM Notes – displayed based on last edit date 1 | © pole_climber, uMap and map data © OpenStreetMap contributors ODbL


  • User Chetan_Gowda writes a diary about Microsoft’s release of 9.8 million building footprints, covering 102 cities in 44 US states, to the OpenStreetMap community. He finds the general quality of the data to be good and encourages local communities to evaluate its quality for import. He is asked in the comment section why he considers the data to be of good quality.
  • Christoph Hormann explains how to accurately map coastlines and tidal zones in OSM, thanks to recent and detailed aerial imagery.
  • Jochen Topf announces another forthcoming multipolygon challenge, concerning “old-style” multipolygons with the same tags on both the relation and the outer way. As these are rendered differently with the 4.0 release of OSM Carto, they may now be noticeable.
  • Volker Schmidt asks how to map a “Verkehrsübungsplatz” – “Traffic training area”. Here is an example.
  • Joachim suggests mapping places as areas as well. The idea is to introduce a new tag boundary=place and, if necessary, to use boundary relations.
  • Malte Heynen proposes to tag foot and cycleways next to highways with their street classification. Routing engines could then prefer ways far away from the traffic. According to the proposal you would change cycleway=sidepath to cycleway=sidepath:primary.
  • Even though the proposal “Language information for name” by Lukas Sommer was accepted by 57% (26 votes), he will rewrite it, “to explain better, more detailed, what missing language information means for Unicode text strings, and how an example use case could look like.“
  • São Jorge Island in the Azores disappeared, reported Topo Lusitania on June 9th on the Portuguese mailing list. The issue had still not been resolved as of the drafting of this edition of weeklyOSM.
  • User Polyglot wants to start doing regular Hangouts on Air about mapping of public transport using JOSM and the PT_Assistant plugin.
  • User marthaleena explains how she is mapping her hometown Visakhapatnam (Andhra Pradesh, India).
  • Pascal Neis published a blog post as a summary of his German talk (slides and video) (de) at the FOSSGIS & OpenStreetMap conference 2017 with the title “Detecting vandalism in OpenStreetMap”.


  • Antoine Riche set up a map of regular OSM meetings in France; these are usually held monthly.
  • Ilya Zverev announced the “Call for Nominees” for the OpenStreetMap Awards 2017. There are some changes compared to last year’s process.
  • As part of its twinning with the rural commune of Dianguirdé in the north-west of Mali, the town of Ivry-sur-Seine mapped the village and its surroundings with the support of CartONG.
  • A user published parts of a letter from the “German Association of Energy and Water Management e.V.” in the German OSM Forum. Hydrants shown in the OpenFireMap (i.e. in OpenStreetMap) were a thorn in the side of the association, which fears for the security of critical infrastructure. The association disapproved of the letter’s publication. Readers may recall similar issues in the United Kingdom.
  • Simon Poole sent a reminder that an OpenStreetMap organisation entity exists on Transifex, where you can now easily move your project for more visibility to volunteer translators.

OpenStreetMap Foundation

  • OSMF published the minutes of the Engineering Working Group’s meeting on May 30th.
  • Martijn van Exel informs the Talk-FR mailing list that the French OpenStreetMap association has made an application for local chapter status with OSMF. He asks the community to share any questions, comments or concerns.


  • Sev_osm reports on Twitter about two workshops on using free tools for mapping and geoscience (OSM, QGIS, geOrchestra), taking place from June 12th to 24th in Niamey, Niger. This initiative is supported by “Les Libres Géographes (LLG)“, Projet EOF and OSM communities from Niger, Benin, Burkina Faso and Togo.
  • On June 15th, the Mapbox team in Peru conducted an OpenStreetMap workshop for 4th and 5th grade students at the Mariscal Caceres School in Ayacucho, Peru. Students were amazed to see how easily different geographic data can be added in OpenStreetMap.
  • OSM Peru participated in the “Conference on the reconstruction of peasant cooperatives” (after the devastating floods of the past few months) on June 6th in Lima, Peru.

Humanitarian OSM

  • The Drucker Institute at Claremont Graduate University has named the 50 semifinalists for the 2017 Drucker Prize, and HOT is one of them. The winner of the $100,000 Drucker Prize will be announced on September 29th.
  • HOT Indonesia hosts a mapathon with students from the University of Indonesia’s (UI) Department of Geography.
  • Fatima Alher tweets about a mapathon that took place on June 17th in Niamey (see “Events”). The task aimed at mapping the surroundings of Diffa, Niger, an area affected by the Boko Haram abuses. See this announcement on the HOT mailing list for details.
  • HOT has partnered with the Global Earthquake Model (GEM) and ImageCat on a Challenge Fund focused on developing a global exposure database for multi-hazard risk analysis. The Challenge Fund, formed by the Global Facility for Disaster Reduction and Recovery (GFDRR) and the UK’s Department for International Development (DFID) is aimed at building local and global resilience through innovation in order to better identify risk and enable more effective decision-making.


  • A major new release, 4.0, of CartoCSS is in the process of being rolled out on the OSMF servers. There is no official announcement yet, but there are many hints in the mailing lists, here and here. One of the big steps forward in increasing the flexibility of the style is the incorporation of Lua pre-processing.
  • HeiGIT @ GIScience Heidelberg released a dedicated stable disaster version of OpenRouteService (ORS) to support humanitarian logistics for Africa, South America and Indonesia with data from OSM.
  • Lokaler Editor, a browser-based tool for creating print maps on the web, is now in beta. It helps journalists create maps in their own design and lets them add their own notes. The service uses a freemium model with, among other features, SVG export, and should launch in autumn. The source code will be opened in 2018. User Spanholz named other possibilities for this tool: “I shared it also with my local police forces. They can now easily create maps for festivals or emergency situations, which can accurately show the area while highlighting more necessary informations.” There is a tutorial video as well.
  • [1] pole_climber published a very detailed blog post on how to create a colour coded map about OSM notes, displayed based on last edit date.


  • Milan municipality imported a few drinking fountains’ locations from OSM, prompting (it) (automatic translation) a detailed licensing discussion on the talk-it mailing list.


  • Jean-Maxime Fillau recently contributed to the OSRM project by adding an option to compute a route that ensures the vehicle arrives on the correct side of the road for the destination. This is particularly useful for large vehicles such as delivery trucks and fire engines.


  • GraphHopper is looking for freelancers or companies interested in implementing customizations of the GraphHopper routing engine or jsprit, or in integrating the GraphHopper Directions API into an application or process.


Software Version Release date Comment
Komoot Android * var 2017-06-14 No info.
Mapillary Android * 3.60 2017-06-14 More robust EXIF reading, fix of stray images upload problems.
Mapillary iOS * 4.7.1 2017-06-14 Two bugs fixed.
OSRM Backend 5.7.4 2017-06-14 Bugfix release.
StreetComplete 0.12 2017-06-16 New languages: Ukrainian and Finnish, warning dialog if user tries to add information not at his location, bugfixes.
Kurviger Free * 1.1.0-2 2017-06-17 Changes in navigation, geocoding and UI.
Simple GIS Client * 9.1 2017-06-18 Changes for the US Census API service, better memory management for big OSM datasets.

Provided by the OSM Software Watchlist. Timestamp: 2017-06-19 14:20:12+02 UTC

(*) unfree software. See: freesoftware.

Did you know …

  • … what to do when you find OSM maps used without attribution?
  • … the three-language platform of the Belgian OSM Community?
  • … that Berlin city map on berlin.de portal uses OSM?
  • … that Italian OSM data, updated daily, are available for download on Edmund Mach Foundation servers?

OSM in the media

Other “geo” things

  • An international research team published an article in Nature about crowdsourced validation of land use data collected by remote sensing. They ran four campaigns on the GeoWiki platform, analysed the variations in accuracy, and made the datasets available for further research.
  • The University of Heidelberg is looking for a Software Developer.
  • ResearchNReports published the Global Cloud GIS Market Research Report Forecast 2017-2021, a valuable source of insightful data for business strategists. It provides the Cloud GIS industry overview with growth analysis and historical & futuristic cost, revenue, demand and supply data (as applicable).
  • MapMyIndia claims to be India’s most comprehensive GPS navigation & tracking solutions provider, engaging the user on multiple platforms – mapping India years before Google Maps. While Google Maps is an app that allows users to visualise data, the data provided by MapMyIndia allows users to analyse a detailed array of geo-demographic data.
  • User ff5722 asks if there are really over 4.5 million km of roads in China. Several different estimation methods appear to confirm this figure. OSM has only mapped 1.2 million km of Chinese roads, so it cannot yet be used to provide a more detailed analysis.

Upcoming Events

Where What When Country
Essen 8. FOSSGIS Hacking Event im Linuxhotel 2017-06-23-2017-06-25 germany
Essen SommerCamp 2017 2017-06-23-2017-06-25 germany
Bremen Bremer Mappertreffen 2017-06-26 germany
Graz Stammtisch Graz 2017-06-26 austria
Dijon Rencontres mensuelles 2017-06-27 france
Prague Missing Maps Mapathon in Prague, Paralelní polis 2017-06-27 czech republic
Leuven Leuven Monthly OSM Meetup 2017-06-28 belgium
Montpellier Rencontre mensuelle 2017-06-28 france
Digne-les-Bains {UDOS} 2017 : Université d’été du développement de logiciel libre 2017-06-28-2017-06-30 france
Brest Mapathon Missing Maps à l’UBO Campus Victor Ségalen 2017-06-29 france
Amstetten Stammtisch Ulmer Alb 2017-06-29 germany
Nantes Mapathon Missing Maps à l’ENSA 2017-06-29 france
Paris Rencontre mensuelle 2017-06-29 france
Dusseldorf Stammtisch Düsseldorf 2017-06-30 germany
Rostock Rostocker Treffen 2017-07-04 germany
Stuttgart Stuttgarter Stammtisch 2017-07-05 germany
Essen Ideenaustausch mit dem RVR 2017-07-05 germany
Salzburg AGIT2017 2017-07-05-2017-07-07 austria
Toyonaka 【西国街道#07】豊中の歴史マッピングパーティ 2017-07-08 japan
Kampala State of the Map Africa 2017 2017-07-08-2017-07-10 uganda
Passau Niederbayerntreffen 2017-07-10 germany
Rennes Réunion mensuelle 2017-07-10 france
Champs-sur-Marne (Marne-la-Vallée) FOSS4G Europe 2017 at ENSG Cité Descartes 2017-07-18-2017-07-22 france
Boston FOSS4G 2017 2017-08-14-2017-08-19 united states
Aizu-wakamatsu Shi State of the Map 2017 2017-08-18-2017-08-20 japan
Patan State of the Map Asia 2017 2017-09-23-2017-09-24 nepal
Boulder State of the Map U.S. 2017 2017-10-19-2017-10-22 united states
Buenos Aires FOSS4G+State of the Map Argentina 2017 2017-10-23-2017-10-28 argentina
Lima State of the Map LatAm 2017 2017-11-29-2017-12-02 perú

Note: If you would like to see your event here, please add it to the calendar. Only data that is in the calendar will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Anne Ghisla, Nakaner, Peda, PierZen, Polyglot, SK53, Spec80, derFred, jcoupey, jinalfoflia, keithonearth, vsandre, wambacher.

by weeklyteam at June 21, 2017 09:50 PM

Wiki Education Foundation

Sex in the Tree of Life

The demarcations of human sexuality have become a major issue in the culture wars, but for plants, sexual diversity is the norm. There are plants with “perfect” flowers that are completely hermaphroditic, with fully functional pollen and eggs produced in the same flower. There are monoecious plants, which produce both male and female flowers. There are dioecious species, with individuals that only produce male or female flowers. And then things start to get complicated. Gynodioecy is the phenomenon in some plant species in which individuals are either female or hermaphroditic. A student in Jennifer Blake-Mahmud’s Sex in the Tree of Life class converted the short, one-paragraph article on gynodioecy into a substantial, informative article.

Some of this diversity is on display in the genus Silene, a widespread genus of small wildflowers. In addition to monoecious, dioecious and gynodioecious species, Silene includes trioecious and andromonoecious species. Some even display more than one type of sex determination. If you want to know more about all of this, check out the sex determination in Silene article that a student in the class created.

When people think about sexually transmitted infections, they rarely think about plant disease, but that’s precisely how Microbotryum violaceum infection of Silene latifolia is usually classified. Silene latifolia is a small flowering plant that is native to Europe, western Asia and North Africa. It was introduced into North America and has become widespread. Microbotryum violaceum is a smut fungus, a parasitic fungus which takes over the anthers of infected plants and uses them for spore production. Pollinators visit the flowers and transmit the fungal spores instead of pollen (thus making the infection sexually transmitted). This article is also the handiwork of a student in the class.

And then there are figs. Figs have a complex reproductive cycle in which fig wasps lay their eggs in fig flowers. The wasp larvae parasitize the flowers, and female wasps emerge covered with pollen and go off to find another fig to pollinate, lay their eggs in, and die. Almost every fig species (and there are about 800 of them) is pollinated by a single species of fig wasp. Both fig and fig wasp are completely dependent on one another in order to reproduce, and unsurprisingly, pairs of fig and fig wasp species have coevolved. If you want to learn more about this, you can check out the reproductive coevolution in Ficus article, which was also created by a student in the class.

While natural selection and sexual selection are thought of as the main drivers of evolutionary change, social selection has been proposed as an alternative to sexual selection. While sexual selection applies to mate choice, and puts the choice in the hands of only one gender, social selection is transactional – one individual offers something in exchange for the opportunity to reproduce. Thanks to the work of a student in this class, you can now learn more about this model of evolutionary change on Wikipedia.

To learn more about how to get involved, send us an email at contact@wikiedu.org or visit teach.wikiedu.org.

Image: 20140427Silene latifolia2.jpg, by AnRo0002, CC0 1.0, via Wikimedia Commons.

by Ian Ramjohn at June 21, 2017 06:03 PM

June 20, 2017

Wikimedia Cloud Services

Watroles returns! (In a different place and with a different name and totally different code.)

Back in the dark ages of Labs, all instance puppet configuration was handled using the puppet ldap backend. Each instance had a big record in ldap that handled DNS, puppet classes, puppet variables, etc. It was a bit clunky, but this monolithic setup allowed @yuvipanda to throw together a simple but very useful tool, 'watroles'. Watroles answered two questions:

  1. What puppet roles and classes are applied to a given instance?
  2. What instances use a given puppet class or role?

#2 turned out to be especially important -- basically any time an Op merged a patch changing a puppet role, they could look at watroles to get a quick list of all the instances that were going to break. Watroles was an essential tool for keeping VMs properly puppetized during code refactors and other updates.

Alas, the puppet ldap backend fell into disrepair. Puppetlabs stopped maintaining it, and Labs VMs were left out of more and more fancy puppet features because those features were left out of ldap. So... we switched to a custom API-based puppet backend, one that talks to Horizon and generally makes VM puppet config more structured and easier to handle (as well as supporting project-wide and prefix-wide puppet config for large-scale projects.)

That change broke Watroles, and the tool became increasingly inaccurate as instances migrated off of ldap, and eventually it was turned off entirely. A dark age followed, in which puppet code changes required as much faith as skill.

Today, at last, we have a replacement. I added a bunch of additional general-purpose queries to our puppet configuration API, and we've added pages to the OpenStack Browser to display those queries and answer both of our previous questions, with bonus information as well:

  1. What puppet roles and classes are applied to a given instance?
  2. What instances, prefixes, or projects use a given puppet class or role?
  3. Which puppet classes are currently in use on Cloud VMs?

The data on those pages is cached and updated every 20 minutes, so it won't update instantly when a config is changed, but it should nonetheless provide all the information needed for proper testing of new code changes.
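The second question — which instances use a given puppet class — is essentially a reverse index over the per-instance class lists. A minimal sketch of that inversion (the instance and class names here are made up for illustration and are not the actual API or data):

```python
from collections import defaultdict

# Hypothetical per-instance puppet configuration, keyed by instance name.
# In practice this data would come from the puppet configuration API.
instance_classes = {
    "tools-worker-01": ["role::labs::tools::worker", "base::puppet"],
    "tools-worker-02": ["role::labs::tools::worker", "base::puppet"],
    "deployment-db-01": ["role::mariadb", "base::puppet"],
}

def invert(configs):
    """Build a reverse index: puppet class -> sorted list of instances using it."""
    index = defaultdict(list)
    for instance, classes in configs.items():
        for cls in classes:
            index[cls].append(instance)
    return {cls: sorted(names) for cls, names in index.items()}

index = invert(instance_classes)
# Question 2: which instances would break if this role's puppet code changes?
print(index["role::labs::tools::worker"])  # ['tools-worker-01', 'tools-worker-02']
```

The cached pages on the OpenStack Browser answer the same question at larger scale, with prefixes and projects folded into the index as well.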

by Andrew (Andrew Bogott) at June 20, 2017 07:21 PM

Wikimedia Foundation

Community digest: ‘Faras in Wikipedia’ project scoops an award in the Sybilla Museum Event of the Year competition; news in brief

Photo by Natalia Szafran-Kozakowska/Wikimedia Poland, CC BY-SA 4.0.

In Greek mythology, sibyls were oracle women with divine inspiration who could prophesy future events and were symbols for wisdom and insightfulness.

They are also the namesake for the ‘Sybillas’, annual national awards for the most innovative and valuable work by museum professionals across Poland, presented by the National Institute of Museology and the Ministry of Culture.

This year, Faras in Wikipedia was one of the candidates in the Digitization and New Media category. The category prize went to the National Maritime Museum in Gdańsk for its effort to contribute the museum’s digitized collection to Wikimedia Commons; the only runner-up was the National Museum in Warsaw with its Faras in Wikipedia project.

The winners, and up to four shortlisted candidates, are selected in 10 categories like exhibitions, conservation and restoration, and education. Over 250 projects were submitted to the competition.

Faras in Wikipedia was coordinated by Wikipedians, volunteers, and National Museum staff. The museum uploaded a wide range of images to Wikimedia Commons that display the newly renovated Faras Gallery, individual wall paintings and artifacts from the Faras Cathedral, and documentary photos of the archaeological excavation work conducted in Faras in the 1960s.

The images were used to illustrate several dozen new Wikipedia articles about the Gallery, the treasures it hosts, and the achievements of Polish archaeologists. They were written by a group of volunteers—students and specialists in archaeology, cultural studies and art history.

The participating volunteers got to know more about the history of the gallery and its artworks. Maria Drozdek, the Wikipedian in residence at the National Museum, taught them basic editing skills on Wikipedia at the same time.

The project has significantly improved the quality of Wikipedia articles about the Faras Gallery at the National Museum in Warsaw, Polish archeologist Kazimierz Michałowski, artworks from the Faras Cathedral, such as Saint Anne or Bishop Petros with Saint Peter the Apostle and many others. Contributions were added in Polish, English, Russian, Belarusian, and other languages and Aleksandra Sulikowska-Bełczowska, curator of the Nubian Collection at the NMW, helped proofread the material added by volunteers.

The project concluded with the ‘Digital Museum’ conference, which discussed the Faras project, a bilingual publication, and other open GLAM projects.

The Sybillas acknowledge equal participation in the project by both the museum and the participating Wikipedians. Drozdek was the first Wikipedian-in-residence to be hired by a Polish museum. The success of the projects at the National Museum surprised the GLAM Wikipedia community, which is looking forward to future projects that can share knowledge about the treasures of Polish art and culture.

Marta Malina Moraczewska,
Wojciech Pędzich
Wikimedia Poland

In brief

Photo by Elitre, CC BY-SA 4.0.

Bologna hosts an editathon on videogames and science-fiction movies: Last month, the Cineteca di Bologna, a movie and videogame archive in Italy, hosted an editathon (editing workshop) where participants worked on improving Wikipedia articles about science-fiction movies and videogames. The event was attended by experienced Wikipedia editors as well as gamers and film enthusiasts, and resulted in improvements to many articles.

Wikimedians in Poland get together for their annual meeting: Nearly 70 Wikimedians gathered in Bydgoszcz, Poland for the annual Wikimedia Poland (Polska) conference, which took place from 2 to 4 June. Participants attended a variety of workshops, presentations and panel discussions. Part of the program focused on practical editing workshops, while the largest part focused on organizing live events. Wikimedia Poland’s general assembly saw volunteers, board members and staff discussing the chapter’s achievements, challenges, projects and future plans.

Photo by Nevenka Mancheva, CC BY-SA 4.0.

Macedonian Wikipedians hold a series of architecture editathons: Last year, Wikimedians in Macedonia partnered with the architectural design center in Macedonia to organize editing workshops with a special focus on architecture. The workshops aim to improve Macedonian Wikipedia’s content about architecture. The most recent event in the series was a two-day editathon on 23 and 24 May 2017, where attendees spent the first day learning editing basics and the second day improving Wikipedia pages.

Wikimania 2017: The annual conference of the Wikimedia movement will be held this year on 9-13 August at Le Centre Sheraton Montréal in Canada. The venue will host most of the conference, hackathon, meetups, and pre-events. Most of the foundation staff and scholarship recipients will be housed there as well. Early bird discounted registration for the conference is now open until 10 July and the draft program for the conference has been posted on Wikimania 2017 website.

Wikimedia Affiliations Committee is open to candidates: The Affiliations Committee, the committee responsible for guiding volunteers in establishing Wikimedia chapters, thematic organizations, and user groups, is looking for new members.

The Committee members help review applications from new groups, answer questions and provide advice about the different Wikimedia affiliation models and processes, review affiliate bylaws for compliance with requirements and best practices, and advise the Wikimedia Foundation Board of Trustees on issues connected to chapters, thematic organizations and Wikimedia user groups. More information about the Committee, membership requirements and how to apply can be found on Wikimedia-l.

Compiled and edited by Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation

by Marta Malina Moraczewska, Wojciech Pędzich and Samir Elsharbaty at June 20, 2017 07:18 PM

Wiki Education Foundation

Wikipedia in the art history classroom

Anne McClanan is a Professor of Art History and Digital Humanities at Portland State University, where she has incorporated Wikipedia assignments into several classes since 2011.

Since I first taught with Wikipedia-based research assignments in 2011, the process has gotten a great deal simpler for both teachers and students. My reasons for having the students create Wikipedia entries, rather than write traditional research papers, will be familiar to others who have followed this path. It allows students to hone their research skills and their ability to write to context, and to engage in an endeavor that feels (and is) more purposeful. They create a lasting contribution to the openly available resources on the topic, and take very evident pride in this work. Moreover, since students in art history classes tend to be mostly female, it offers a way to impart, to a cohort otherwise not well represented in the world's largest OER, a skillset that can lead to future participation once the class is over.

In the range of content I teach at a large public university, I have discovered that the courses that present the best fit for Wikipedia writing assignments are those that are a little outside of the main canon of greatest hits in art history (for example, Sienese art had better opportunities than Florentine art, Byzantine art better than Gothic, etc.). I've used these assignments most frequently for Byzantine art history, since it had the largest disconnect between notable topics with abundant scholarly writing and their coverage on Wikipedia. I realized how much opportunity there was when an initial survey of a standard textbook, Robin Cormack's Byzantine Art, found that only half of the key works illustrated from the period had Wikipedia entries. My students have also created new or expanded existing Wikipedia entries for courses in Gothic Art, Trecento Sienese Art, and Digital Humanities.

The Wiki Education course dashboard structures the students' training in how to create and edit entries, but parallel to that, additional assignment materials are housed on my course LMS site. Before the term begins, I scout out topics ripe for new entries or expansion. Other professors I know have students discover potential topics on their own, but given the ten-week quarter system at my school, that exploration phase just doesn't seem feasible. The students follow a research trajectory similar to that of traditional papers, in which they submit preliminary bibliographies in week 3, full drafts with peer review in week 7, and revised versions in week 9; then, only after getting the thumbs-up from me on the revised version, do they post their material to Wikipedia. Looking at examples from the winter 2017 quarter, students created new entries on more comprehensive topics such as Byzantine glass as well as on specific objects and sites. Some instead expanded existing entries such as those on Lazarus Zographos and Daphni Monastery.

Honing the ability to critically evaluate information is a central pedagogical goal in my classes, so I require student research to be grounded in peer-reviewed sources, with the same rigor as in my other courses’ non-Wikipedia research assignments. Teaching the students how to think critically about their sources is time-consuming, and inevitably I have individual meetings with many students who haven’t undertaken this kind of research before. In these sessions and final evaluations, though, students report feeling much more motivated to push hard on their research because they know that it will have an audience of the whole world on Wikipedia. Moreover, when I encounter students later, sometimes years after the class, they often say that they still check their Wikipedia entry, and when I hear that I know all of the work involved was well worth it.

Image: Byzantine Glass and Silver-Stain Bracelet.jpg, by The Metropolitan Museum of Art, CC0, via Wikimedia Commons.

by Guest Contributor at June 20, 2017 06:40 PM

Wikimedia Cloud Services

Official Debian Stretch image now available

Debian Stretch was officially released on Saturday[1], and I've built a new Stretch base image for VPS use in the WMF cloud. All projects should now see an image type of 'debian-9.0-stretch' available when creating new instances.

Puppet will set up new Stretch instances just fine, and we've tested and tuned up several of the most frequently used optional puppet classes so that they apply properly on Stretch. Stretch is /fairly/ similar to Jessie, so I'd expect most puppet classes that apply properly on Jessie to work on Stretch as well, but I'm always interested in the exceptions -- if you find one, please open a Phabricator ticket.

The WMF and the Cloud team are committed to long-term support of this distribution. If you are starting a new project or rebuilding a VM, you should start with Stretch to ensure the longest possible life for your work.

by Andrew (Andrew Bogott) at June 20, 2017 06:25 PM

Wikimedia Performance Team

The journey to Thumbor, part 1: rationale

We are currently in the final stages of deploying Thumbor to Wikimedia production, where it will generate media thumbnails for all our public wikis. Up until now, MediaWiki was responsible for generating thumbnails.

I started the project of making Thumbor production-ready for Wikimedia a year and a half ago and I'll talk about this journey in a series of blog posts. In this one, I'll explain the rationale behind this project.


Security

The biggest reason to change the status quo is security. Since MediaWiki is quite monolithic, the deployments of MediaWiki on the server fleet responsible for generating thumbnails aren't as isolated as they could be from the rest of our infrastructure.

Media formats are a frequent security breach vector, so it has always been an objective of ours to isolate thumbnailing more than we currently can with MediaWiki. We run the command-line tools responsible for media conversion inside firejail, but we could do more to fence off thumbnailing from the rest of what we do.

One possibility would have been to rewrite the MediaWiki code responsible for thumbnailing, turning it into a series of PHP libraries that could then be run without MediaWiki to perform the thumbnailing work we are currently doing, while untangling the code enough that the thumbnailing servers can be more isolated.

However, such a rewrite would be very expensive, and when we can afford to, we prefer to use ready-made open source solutions with a community of their own rather than writing new tools. It seemed to us that media thumbnailing was far from being a MediaWiki-specific problem and that there ought to be open source solutions tackling it. We undertook a review of the open source landscape for this problem domain, and Thumbor emerged as the clear leader in that area.


Maintenance

The MediaWiki code responsible for thumbnailing currently doesn't have any team ownership at the Wikimedia Foundation. It's maintained by volunteers (including some WMF staff acting in a volunteer capacity). However, the number of contributors is very low and technical debt is accumulating.

Thumbor, on the other hand, is a very active open-source project with many contributors. A large company, Globo, where this project originated, dedicates significant resources to it.

In the open source world, joining forces with others pays off, and Thumbor is the perfect example of this. Like other large websites leveraging Thumbor, we've contributed a number of upstream changes.

Maintenance of Wikimedia-specific Thumbor plugins remains, but those represent only a small portion of the code, the lion's share of the functionality being provided by Thumbor.

Service-oriented architecture

For operational purposes, running parts of the wiki workflow as isolated services is always beneficial. It enables us to set up the best fencing possible for security purposes, where Thumbor only has access to what it needs. This limits the amount of damage possible in case of a security vulnerability propagated through media files.

From monitoring, to resource usage control and upstream security updates, running our media thumbnailing as a service has significant operational upsides.

New features

Third-party open source projects may have features that would have been low priority on our list to implement, or considered too costly to build. Thumbor sports a number of features that MediaWiki currently doesn't have, which might open exciting possibilities in the future, such as feature detection and advanced filters.

At this time, however, we're only aiming to deploy Thumbor to Wikimedia production as a drop-in replacement for MediaWiki thumbnailing, targeting feature parity with the status quo.


Performance

Where does performance fit into all of this? For one, Thumbor's clean extension architecture means that the Wikimedia-specific code footprint is small, making improvements to our thumbnailing pipeline a lot easier. Running thumbnailing as a service means that it should be more practical to test alternative thumbnailing software and parameters.

Rendering thumbnails as WebP to user agents that support it is a built-in feature of Thumbor and the most likely first performance project we'll leverage Thumbor for, once Thumbor has proven to handle our production load correctly for some time. This alone should save a significant amount of bandwidth for users whose user agents support WebP. This is the sort of high-impact performance change to our images that Thumbor will make a lot easier to achieve.
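The negotiation behind this feature is simple in principle: a user agent advertises WebP support in its Accept request header, and the server can then choose the thumbnail format. A minimal sketch of that logic, not Thumbor's actual implementation (the function name and fallback parameter are made up for illustration):

```python
def pick_thumbnail_format(accept_header, original_format="jpeg"):
    """Serve WebP to user agents that advertise support for it in the
    Accept header; fall back to the original format otherwise."""
    # Split "image/webp,image/apng,*/*;q=0.8" into bare media types,
    # dropping quality parameters like ";q=0.8".
    accepted = [part.split(";")[0].strip().lower()
                for part in accept_header.split(",")]
    if "image/webp" in accepted:
        return "webp"
    return original_format

# A Chrome-style Accept header that advertises WebP support:
print(pick_thumbnail_format("image/avif,image/webp,image/apng,*/*;q=0.8"))
# -> webp
print(pick_thumbnail_format("image/png,image/*;q=0.8"))  # -> jpeg
```

In production the chosen format would also have to be part of the cache key (e.g. via a Vary: Accept response header), so WebP and JPEG variants are cached separately.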


Conclusion

All of these factors contributed to us betting on Thumbor. Soon it will be put to the test of Wikimedia production, where not only the scale of our traffic but also the huge diversity of media files we host make thumbnailing a challenge.

In the next blog post, I'll describe the architecture of our production thumbnailing pipeline in detail and where Thumbor fits into it.

by Gilles (Gilles Dubuc) at June 20, 2017 04:29 PM

T Shrinivasan

Minutes – Intro to Wikipedia – Villupuram

Last Sunday, I gave a talk on Wikipedia at Villupuram GLUG. We had around 30 participants, including a few school teachers, writers, and media people.

I started the session with a game: I began a story with one line and asked everyone to continue it, one line after another. Thus the story was built collaboratively.

This experience helped them understand how wiki pages are written. We then explored Wikipedia's history, the Wikimedia Foundation, the many languages, and various wiki projects like Wikisource, Wiktionary, etc.

I explained the issues around copyright, Creative Commons licenses, and Wikimedia Commons.

Then, poet Ramamurthy expressed his thoughts about Wikipedia and public contributions.

Journalist Ko.senguttuvan shared his thoughts on copyright. He urged everyone to document their knowledge so that the next generation can use it, and he presented me a book he wrote about his journalism experiences.

Teacher Dhilip narrated his efforts to enhance government schools with various ICT activities.

Then we started the practical session. I asked everyone to create a wiki account, and we explored pages, history, talk pages, the visual editor, and the language tools. I then asked them to edit the page for Villupuram and add a few points.

This was a very introductory session; I hope they got some idea of the wiki ecosystem.

Thanks to the organizers, Puduvai GLUG and Villupuram GLUG. They are doing great work there, introducing Free Software every Sunday.

Special thanks to Karkee, Khaleel and Sathish for their great efforts on this event.

A few photos are here – https://goo.gl/photos/JwXmZwYN1QCvnDfH8

Thanks to the Dinamani newspaper for writing about the event.

by tshrinivasan at June 20, 2017 04:10 PM


A libel story

A visit to the Biligirirangan Hills just as the monsoons were setting in led me to look up the life of one of the local naturalists who wrote about this region, R.C. Morris. One of the little-known incidents in his life is a case of libel arising from a book review. I had not heard of such a case before, but it seems that such cases are on the increase and pose a big risk for anyone who publishes critical reviews. There is a nice guide to avoiding trouble, and there is a plea within academia to create a safe space for critical discourse.

This is a somewhat short note, and if you are interested in learning more about the life of R.C. Morris, do look up the Wikipedia entry on him or this piece by Dr Subramanya. I recently added links to most of his papers in the Wikipedia entry, and perhaps the one that really had an impact on me was on the death of fourteen elephants from eating kodo millet - I suspect it is a case of aflatoxin poisoning! Another source to look for is the book Going Back by Morris' daughter and pioneer mountaineer Monica Jackson. I first came to know of the latter book in 2003 through the late Zafar Futehally, whose family were friends of the Morrises. He lent me this rather hard-to-find book when I had posted a note to an email group (a modified note was published in the Newsletter for Birdwatchers, 2003 43(5):66-67 - one point I did not mention, and which people find rather hard to believe, is that my friend Rajkumar actually got us to the top of Honnametti in a rather old Premier Padmini car!).

I came across the specific libel case against Morris in a couple of newspaper archives - this one, from the Straits Times, 27 April 1937, can be readily found online:

Statements  Made In Book Review.

Major Leonard Mourant Handley, author of "Hunter's Moon," a book dealing with his experiences as a big-game hunter, was at the Middlesex Sheriff's Court awarded £3,000 damages for libel against Mr. Randolph Camroux Morris. Mr. Morris did not appear and was not represented. The libel appeared in a review of "Hunter's Moon" by Mr. Morris that appeared in the journal of the Bombay Natural History Society. Mr. Valentine Holmes said Major Handley wrote the book, his first, in 1933, and it met with amazing success.

Mr. Morris, in his review, declared that it did not give the personal experiences of Major Handley. Mr. Morris wrote: "There surely should be some limit to the inaccuracies which find their way into modern books, which purport to set forth observations of interest to natural scientists and shikaris.

"The recent book, 'Hunter's Moon,' by Leonard Handley, is so open to criticism in this respect, that one is led to the conclusion that the author has depended upon his imagination and trusted to the credulity of the public for the purpose of producing a 'bestseller' rather than a work of sporting or scientific value."

Then followed some 38 instances of alleged inaccuracies.

Mr. Holmes said that at one time Mr. Morris was a close friend of Major Handley, but about 1927 some friction arose between Mrs. Morris and Mrs. Handley. In evidence, Major Handley said that, following the libel, a man who had been a close friend of his refused to nominate him for membership of a club. The Under-Sheriff, Mr. Stanley Ruston, said there was no doubt that the motive of the libel lay in the fact that Major Handley had seized some of the thunder Mr. Morris was providing for his own book.

Naturally this forced me to read the specific book, which is also readily available online.

The last chapter deals with the hunter's exploits in the Biligirirangans, which he translates as the "blue [sic] hills of Ranga"! It is also worth examining Morris' review of the book in the Journal of the Bombay Natural History Society, which is merely signed with his initials. I wonder if anyone knows the case history and whether it was appealed or followed up. I suspect Morris may have just quietly ignored it, if the court notice was ever delivered to his faraway estate up in Attikan or Honnameti.

The review is fun to read as well...

Meanwhile, here is a view of the Honnametti rock which lies just beside the estate where Morris lived.
Honnametti rock

Memorial to Randolph Camroux Morris
Grave of Mary Macdonald, wife of Leslie Coleman, who in a way
founded the University of Agricultural Sciences. Coleman was perhaps the first
to teach the German language in Bangalore to Indian students.

Sidlu kallu or lightning-split-rock, another local landmark.

by Shyamal L. (noreply@blogger.com) at June 20, 2017 04:06 PM

Magnus Manske

ORCID mania

ORCID is an increasingly popular service to disambiguate authors of scientific publications. Many journals and funding bodies require authors to register their ORCID ID these days. Wikidata has a property for ORCID; however, only ~2,400 items have an ORCID property at the moment of writing this blog post. That is not a lot, considering Wikidata contains 728,112 scientific articles.

Part of the problem is that it is not easy to get ORCID IDs and their connections to publications in an automated fashion. It appears that several databases, public or partially public, each contain part of the puzzle required for determining the ORCID for a given Wikidata author.

So I had a quick look and found that, on the ORCID website, one can search for a publication DOI and retrieve the list of authors in the ORCID system that "claim" that DOI. That author list contains variations on author names ("John", "Doe", "John Doe", "John X. Doe", etc.) and their ORCID IDs. Likewise, I can query Wikidata for a DOI and get an item about that publication; that item contains statements with authors that have an item ("P50"). Each of these authors has a name.

Now, we have two lists of authors (one from ORCID, one from Wikidata), both reasonably short (say, twenty entries each), that should overlap to some degree, and they are both lists of authors for the same publication. They can now be joined via name variations, excluding multiple hits (there may be two “John Doe”s in the author list of a publication; this happens a lot with Asian names), as well as excluding authors that already have an ORCID ID on Wikidata.
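This joining step can be sketched in a few lines of Python. The function name and the sample data are made up for illustration; the real bot also has to fetch both lists from the two services first:

```python
def match_authors(orcid_authors, wikidata_authors, existing_orcids):
    """Join two author lists for the same publication by name.

    orcid_authors:    dict mapping ORCID iD -> set of name variants
    wikidata_authors: dict mapping Wikidata item ID -> author name
    existing_orcids:  Wikidata items that already have an ORCID statement
    """
    matches = {}
    for item, name in wikidata_authors.items():
        if item in existing_orcids:
            continue  # already linked on Wikidata, nothing to do
        # every ORCID author whose name variants include this exact name
        hits = [orcid for orcid, variants in orcid_authors.items()
                if name in variants]
        if len(hits) == 1:  # skip ambiguous names (two "John Doe"s, etc.)
            matches[item] = hits[0]
    return matches

# Example: authors of one publication, as seen by each database
orcid_authors = {
    "0000-0001-0000-0001": {"John Doe", "John X. Doe", "Doe"},
    "0000-0001-0000-0002": {"Jane Roe", "Jane"},
}
wikidata_authors = {"Q111": "John X. Doe", "Q222": "Jane Roe"}
print(match_authors(orcid_authors, wikidata_authors, existing_orcids={"Q222"}))
# -> {'Q111': '0000-0001-0000-0001'}
```

Requiring exactly one hit per name is what excludes both the duplicate-name case and any name that appears in neither list.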

I have written a bot that takes random DOIs from Wikidata, queries them in ORCID, and compares the author lists. In a first run, 5,000 random DOIs yielded 123 new ORCID connections; manual sampling of the matches looked quite good, so I am adding them via QuickStatements (sample of edits).

Unless this meets with “social resistance”, I can have the bot perform these edits regularly, which would keep Wikidata up-to-date with ORCIDs.

Additionally, there is an "author name string" property, which stores just the author name, for authors that do not have an item yet. If the ORCID list matches one of these names, an item could automatically be created for that author, including the ORCID ID and the association to the publication item. Please let me know if this would be desirable.
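Such an automatic creation could be expressed as tab-separated QuickStatements V1 commands. A hedged sketch: P31 ("instance of"), Q5 ("human"), and P496 ("ORCID iD") are the relevant Wikidata identifiers, and the ORCID shown is the sample ID from ORCID's own documentation:

```
CREATE
LAST	P31	Q5
LAST	P496	"0000-0002-1825-0097"
```

The first line creates a new item; each LAST line then adds a statement to the item just created.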

by Magnus at June 20, 2017 09:27 AM

June 19, 2017

Wiki Education Foundation

What students learn from contributing to Wikipedia

Since 2010, more than 36,000 students in the U.S. and Canada have edited Wikipedia as a class assignment. It’s easy to quantify their impact to Wikipedia: they’ve added more than 30 million words (or two-thirds of the last print edition of Encyclopædia Britannica) on a range of academic subjects that were either underdeveloped or entirely missing. But what does contributing to Wikipedia mean for the students? That question has been more difficult to answer. Until now.

To better understand the types of skills students obtain from contributing to Wikipedia as a course assignment, the Wiki Education Foundation sponsored Dr. Zach McDowell, of the University of Massachusetts-Amherst, to conduct a study of our program participants during the Fall 2016 term. After careful analysis of both quantitative and qualitative data, the study found that Wikipedia-based assignments enhance students’ digital literacy and critical research skills, foster their ability to write for a public audience, promote collaboration, and motivate them more than traditional assignments. Students also gain a valuable understanding and appreciation for a source of information they use every day: Wikipedia.

Digital Literacy and Critical Research

In an age when fake news is increasingly prevalent, it is critical that students learn how to differentiate reliable sources of information from the unreliable. The study found 96% of instructors thought the Wikipedia assignment was more or much more valuable for teaching students digital literacy than traditional assignments, and 85% thought the Wikipedia assignment was more or much more valuable for teaching students about the reliability of online sources. As one student participant in a focus group said about learning to write for Wikipedia and having to understand sourcing guidelines, “It raises an awareness of what is good information, what is bad information … you have much more of a questioning mentality and you’re a lot more conscious of the validity of the information that you read.”

Writing for a Public Audience

According to the study, 79% of instructors thought the Wikipedia assignment was more or much more valuable for teaching students to write clearly for the general public. While most of our students may never have to write for a large public audience in their future careers, they will have to share information with their colleagues, managers, stakeholders, and other professional constituents in a clear and concise manner. When students contribute to Wikipedia, they recognize that their work may be read by a broad and diverse audience. They are compelled to ensure that their contributions are comprehensible to a wide variety of readers. In the words of another student in a focus group, “I think it’s going to help you in pretty much any field that you go to because anywhere you go, you’re going to have to write things. You’re going to have to do research and present things in a way that people can understand whether they’re part of your field or not. … I think having that skill of getting a bunch of information and then putting it together in a way that’s understandable to a big amount of people is important.”


Collaboration

The study found that when students contribute to Wikipedia, they learn to be accountable for their own words, but come to understand that results are achieved most effectively through a cooperative spirit. They become adept at receiving as well as offering criticism, and they learn that relinquishing some level of ownership over their work is a path to improvement. There are few professions where individuals work in a vacuum, and contributing to Wikipedia gives students the courage both to accept input and to offer up their own viewpoints. Said another study participant: “I always thought of research as a very solitary thing, like someone in a library basement looking through books and stuff. So, knowing that Wikipedia has this whole community of people who are researching and adding to things just changes how I think about it, I think. I never really thought of it as a collaborative endeavor and now I know that it can be, it’s kind of interesting to see it that way.”


Motivation

Students spend countless hours writing dozens of papers throughout their college careers — papers that are typically read only by the instructors and that never see the light of day again once the term is over. When students contribute to Wikipedia, their motivation is twofold, the study found. Because they know that their contributions will be available for the general public to read, they feel compelled to ensure the accuracy and reliability of their work. At the same time, they have a sense of pride that something they produced may help others and live well beyond the classroom. “When you think about someone else reading your work, you don’t want there to be errors in it, you want it to be relevant. I think it just encouraged me to look back at everything and get input from other people and rewrite and rewrite and rewrite,” a focus group participant said. Responsibility and pride encourage students to produce work that is meticulous, well-researched, and thorough. Not only can they share their work with friends, family, and colleagues, but they can truly say that they’ve published something — a feat most undergraduates rarely get to experience.

Appreciation of Wikipedia’s Policies

The study found that students’ perceptions of Wikipedia improved after contributing as part of a Wikipedia-based assignment. “I do see it as way more credible of a source than I did before. It definitely proved to me that you have to be legit. The monitoring of information is a lot more prevalent than I thought,” said one focus group participant. Wikipedia is a site that most students use on a regular basis, whether they have been discouraged from doing so or not. When students contribute to Wikipedia, they learn how to use the site more effectively. They can identify good Wikipedia entries as well as those that may be in need of improvement. They understand how to use the site as a starting point for research and how to judiciously use the information they glean from Wikipedia.

Students can play a critical role in improving academic content on Wikipedia. Unlike the population at large, they have access to countless library resources that are often behind prohibitive paywalls. This study confirms that contributing to Wikipedia as part of a course assignment can play a significant role in helping students to develop critical academic as well as professional skills, and that students are more motivated and derive more satisfaction from contributing to Wikipedia as compared to traditional writing assignments.

For more details, read the full report on Student Learning Outcomes using Wikipedia-based Assignments. Thank you again to Dr. McDowell for your diligence and commitment to this project. The data for the study is also released under an open license, and we encourage other scholars interested in the learning outcomes of Wikipedia-based assignments to conduct further research and inquiries using this robust data set. We hope that this research is just the beginning of many studies to come.

by Helaine Blumenthal at June 19, 2017 05:38 PM

Wikimedia Foundation

Writing Wikipedia articles teaches information literacy skills, study finds

Photo by Jami Mathewson/Wiki Education Foundation, CC BY-SA 4.0.

Instructors in more than 90 countries worldwide assign their students to edit Wikipedia as a class assignment. Today, the Wiki Education Foundation (Wiki Ed) is releasing the results from the most comprehensive study ever undertaken to evaluate student learning outcomes from Wikipedia assignments. The study concludes that Wikipedia assignments provide students valuable digital/information literacy, critical research, teamwork, and technology skills, and students are more motivated by these assignments than they are by traditional writing assignments.

Wiki Ed is an independent nonprofit organization that supports college and university faculty in the United States and Canada to assign their students to edit Wikipedia articles. In 2016–17, Wiki Ed sponsored Dr. Zachary McDowell at the University of Massachusetts Amherst (now at the University of Illinois at Chicago) to conduct a research study on the student learning outcomes for students in Wiki Ed’s program. With approval from the University of Massachusetts Amherst Human Research Protection Office, Dr. McDowell conducted a mixed-methods research study using surveys and focus groups on students and instructors participating in Wiki Ed’s program in the fall 2016 term. That term, more than 6,000 students in more than 270 courses edited Wikipedia as a class assignment.

See the full report on Wikimedia Commons.

Among the study’s findings:

  • 96% of instructors thought the Wikipedia assignment was more or much more valuable for teaching students digital literacy than traditional assignments are
  • 85% of instructors thought the Wikipedia assignment was more or much more valuable for teaching students the reliability of online sources
  • 79% of instructors thought the Wikipedia assignment was more or much more valuable for teaching students to write clearly for the general public
  • Wikipedia assignments shift students’ perceptions of Wikipedia’s reliability to show more trust in Wikipedia
  • Students are more motivated to complete Wikipedia assignments, particularly because they perceive their work to be useful beyond the classroom
  • Students’ skill development from Wikipedia assignments maps well to the Association of College and Research Libraries’ Information Literacy Framework, particularly:
    • Authority is constructed and contextual
    • Information creation as a process
    • Information has value
    • Scholarship as conversation

These findings demonstrate the value that comes from learning to edit Wikipedia for the first time, something critical for program leaders within the Wikimedia movement. A large student learning outcomes study gives program leaders who are trying to convince new instructors, administrators, or organizations the data behind the skills that can come from learning to edit Wikipedia articles.

The full research report and all of the data, codebooks, and other documentation from the study are freely licensed under CC BY-SA 4.0. We encourage others to conduct additional analysis on the data, and hope to continue to advance our understanding of student learning outcomes from Wikipedia-based assignments with future research.

LiAnna Davis, Director of Programs and Deputy Director
Wiki Education Foundation

by LiAnna Davis at June 19, 2017 05:33 PM

Brion Vibber

Brain dump: JavaScript sandboxing

Another thing I’ve been researching is safe, sandboxed embedding of user-created JavaScript widgets… my last attempt in this direction was the EmbedScript extension (examples currently down, but code is still around).

User-level problems to solve:

  • “Content”
    • Diagrams, graphs, and maps would be more fun and educational if you could manipulate them more
    • What if you could graph those equations on all those math & physics articles?
  • Interactive programming sandboxes
  • Customizations to editor & reading UI features
    • Gadgets, site JS, shared user JS are potentially dangerous right now, requiring either admin review or review-it-yourself
    • Narrower interfaces and APIs could allow for easier sharing of tools that don’t require full script access to the root UI
  • Make scriptable extensions safer
    • Use same techniques to isolate scripts used for existing video, graphs/maps, etc?
    • Frame-based tool embedding + data injection could make export of rich interactive stuff as easy as InstantCommons…

Low-level problems to solve

  • Isolating user-provided script from main web context
  • Isolating user-provided script from outside world
    • loading off-site resources is a security issue
    • want to ensure that wiki resources are self-contained and won’t break if off-site dependencies change or are unavailable
  • Providing a consistent execution environment
    • browsers shift and change over time…
  • Communicating between safe and sandboxed environments
    • injecting parameters in safely?
    • two-way comms for allowing privileged operations like navigating page?
    • two-way comms for gadget/extension-like behavior?
    • how to arrange things like fullscreen zoom?
  • Potential offline issues
    • offline cacheability in browser?
    • how to use in Wikipedia mobile apps?
  • Third-party site issues
    • making our scripts usable on third-party wikis like InstantCommons
    • making it easy for third-party wikis to use these techniques internally

Meta-level problems to solve:

  • How & how much to review code before letting it loose?
  • What new problems do we create in misuse/abuse vectors?

Isolating user-provided scripts

One way to isolate user-provided scripts is to run them in an interpreter! This is potentially very slow, but allows for all kinds of extra tricks.


I stumbled on JS-Interpreter, used sometimes with the Blockly project to step through code generated from visual blocks. JS-Interpreter implements a rough ES5 interpreter in native JS; it’s quite a bit slower than native (though some speedups are possible; the author and I have made some recent tweaks improving the interpreter loop) but is interesting because it allows single-stepping the interpreter, which opens up to a potential for an in-browser debugger. The project is under active development and could use a good regression test suite, if anyone wants to send some PRs. 🙂

The interpreter is also fairly small, weighing in around 24kb minified and gzipped.

The single-stepping interpreter design protects against infinite loops, as you can implement your own time limit around the step loop.
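That guard is easy to sketch. Below, `interp` stands in for a JS-Interpreter-style object whose `step()` executes one step and returns true while more code remains; treat the exact API, the option names, and the return shape as assumptions for illustration, not the real library surface.

```javascript
// Sketch of a guarded step loop around an interpreter-like object.
// interp.step() is assumed to run one step and return true while more
// code remains (JS-Interpreter works roughly this way, but the exact
// API here is an assumption).
function runWithLimits(interp, { maxSteps = 1e6, maxMillis = 100 } = {}) {
  const deadline = Date.now() + maxMillis;
  let steps = 0;
  while (interp.step()) {
    steps += 1;
    if (steps >= maxSteps) return { ok: false, reason: 'step limit' };
    if (Date.now() > deadline) return { ok: false, reason: 'time limit' };
  }
  return { ok: true, steps };
}
```

An infinite loop in sandboxed code then surfaces as a normal `{ ok: false }` result instead of a frozen tab.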

For pure-computation exercises and interactive prompts this might be really awesome, but the limited performance and lack of any built-in graphical display means it’s probably not great for hooking it up to an SVG to make it interactive. (Any APIs you add are your own responsibility, and security might be a concern for API design that does anything sensitive.)


An old project that’s still around is Google Caja, a heavyweight solution for embedding foreign HTML+JS using a server-side Java-based transpiler for the JS and JavaScript-side proxy objects that let you manipulate a subset of the DOM safely.

There are a number of security advisories in Caja’s history; some of them are transpiler failures which allow sandboxed code to directly access the raw DOM, others are failures in injected APIs that allow sandboxed code to directly access the raw DOM. Either way, it’s not something I’d want to inject directly into my main environment.

There’s no protection against loops or simple resource usage like exhausting memory.

Iframe isolation and CSP

I’ve looked at using cross-origin <iframe>s to isolate user code for some time, but was never quite happy with the results. Yes, the “same-origin policy” of HTML/JS means your code running in a cross-origin frame can’t touch your main site’s code or data, but that code is still able to load images, scripts, and other resources from other sites. That creates problems ranging from easy spamming to user information disclosure to simply breaking if required offsite resources change or disappear.

Content-Security-Policy to the rescue! Modern browsers can lock down things like network access using CSP directives on the iframe page.

CSP’s restrictions on loading resources still leave an information disclosure via navigation — links or document.location can be used to navigate the frame to a URL on a third domain. This can be locked down with CSP’s child-src directive on the parent document — or an intermediate “double” iframe — to only allow the desired target domain (say, “*.wikipedia-embed.org” or even “item12345678.wikipedia-embed.org”). Then attempts to navigate the frame to a different domain from the inside are blocked.
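For reference, a sketch of what such a policy string might look like, assuming one private subdomain per widget instance. The directive set is illustrative, not a complete production policy:

```javascript
// Build a CSP header value locking an embed frame to its own private
// subdomain: nothing loads by default, and child-src keeps child frames
// on the same host. Illustrative sketch only.
function buildEmbedCsp(instanceHost) {
  const self = 'https://' + instanceHost;
  return [
    "default-src 'none'",  // deny all resource loads by default
    'script-src ' + self,
    'style-src ' + self,
    'img-src ' + self,
    'child-src ' + self,   // no frames pointed at third domains
  ].join('; ');
}
```

The parent page would serve this on the embed document (or apply child-src on its own headers, per the “double iframe” approach above).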

So in principle we can have a rectangular region of the page with its own isolated HTML or SVG user interface, with its own isolated JavaScript running its own private DOM, with only the ability to access data and resources granted to it by being hosted on its private domain.

Further interactivity with the host page can be created by building on the postMessage API, including injecting additional resources or data sets. Note that postMessage is asynchronous, so you’re limited in simulating function calls to the host environment.
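A minimal sketch of the host side of that channel; the `{ type, payload }` message shape and handler names are made-up conventions, but the `event.origin` check is the essential part:

```javascript
// Build a 'message' event handler that ignores anything not coming
// from the sandboxed frame's origin, then dispatches by message type.
// The message shape is an assumption for this sketch.
function makeMessageHandler(allowedOrigin, handlers) {
  return function onMessage(event) {
    if (event.origin !== allowedOrigin) return null; // drop foreign messages
    const msg = event.data || {};
    const handler = handlers[msg.type];
    return handler ? handler(msg.payload) : null;
  };
}
```

In a browser you would register it with `window.addEventListener('message', …)`; since postMessage is asynchronous, simulating a function call back to the host needs correlation ids on requests and replies.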

There is one big remaining security issue, which is that JS in an iframe can still block the UI for the whole page (or consume memory and other resources), either accidentally with an infinite loop or on purpose. The browser will eventually time out from a long loop and give you the chance to kill it, but it’s not pleasant (and might just be followed by another super-long loop!)

This means denial of service attacks against readers and editors are possible. “Autoplay” of unreviewed embedded widgets is still a bad idea for this reason.

Additionally, older browser versions don’t always support CSP — IE is a mess, for instance. So defenses against cross-origin loads either need to somehow prevent loading in older browsers (poorer compatibility) or risk the information exposure (poorer security). However, the most popular browsers do enforce it, so applications are unlikely to be built that rely on off-site materials just to function; preventing that reliance is one of our goals.

Worker isolation

There’s one more trick, just for fun, which is to run the isolated code in a Web Worker background thread. This would still allow resource consumption but would prevent infinite loops from blocking the parent page.

However you’re back to the interpreter’s problem of having no DOM or user interface, and must build a UI proxy of some kind.

Additionally, there’s a complication with running Workers in iframes: if you apply sandbox=allow-scripts you may not be able to load JS into a Worker at all.
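The termination side of that idea is simple to sketch. Only `terminate()` is assumed on the worker object; the timer functions are injectable purely so the logic can be exercised outside a browser:

```javascript
// Watchdog for a sandboxed Worker: terminate it if it has not finished
// by the deadline. Call done() when the worker posts its result.
function watchdog(workerLike, timeoutMs, setTimer = setTimeout, clearTimer = clearTimeout) {
  const id = setTimer(() => workerLike.terminate(), timeoutMs);
  return {
    done() { clearTimer(id); }, // worker finished in time; cancel the kill
  };
}
```

As noted above, this stops runaway loops from blocking the page but does not cap memory allocation inside the worker.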

Non-JavaScript languages

Note that if you can run JavaScript, you can run just about anything thanks to emscripten. 😉 A cross-compiled Lua interpreter weighs in around 150-180kb gzipped (depending on library inclusion).

Big chart

Here, have a big chart I made for reference:

Offline considerations

In principle the embedding sites can be offline-cached… bears consideration.

App considerations

The iframes could be loaded in a webview in apps, though consider the offline + app issues!

Data model

A widget (or whatever you call it) would have one or more sub resources, like a Gadget does today plus more:

  • HTML or SVG backing document
  • JS/CSS module(s), probably with a dependency-loading system
  • possibly registration for images and other resources?
    • depending on implementation it may be necessary to inject images as blobs or some weird thing
  • for non-content stuff, some kind of registry for menu/tab setup, trigger events, etc

Widgets likely should be instantiable with input parameters like templates and Lua modules are; this would be useful for things like reusing common code with different input data, like showing a physics demo with different constant values.
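As a sketch of that instantiation model (every name here is hypothetical), a widget definition could carry its sub-resources plus declared parameter defaults, with each invocation merging caller parameters over the defaults, much like template arguments:

```javascript
// Hypothetical widget registry: definitions carry sub-resources and
// default parameters; instantiate() merges per-use parameters over the
// defaults, like invoking a template or Lua module with arguments.
const widgets = new Map();

function defineWidget(name, { resources = [], defaults = {} } = {}) {
  widgets.set(name, { resources, defaults });
}

function instantiate(name, params = {}) {
  const def = widgets.get(name);
  if (!def) throw new Error('unknown widget: ' + name);
  return {
    resources: def.resources,
    params: Object.assign({}, def.defaults, params),
  };
}
```

This is how the physics-demo case above would work: one shared definition, many instances differing only in constant values.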

There should be a human-manageable UI for editing and testing these things. 🙂 See jsfiddle etc for prior art.

How to build the iframe target site


  • Subdomain per instance
    • actually serve out the target resources on a second domain, each ‘widget instance’ living in a separate random subdomain ideally for best isolation
    • base HTML or SVG can load even if no JS. Is that good or bad, if interactivity was the goal?
    • If browser has no CSP support, the base HTML/CSS/JS might violate constraints.
    • can right-click and open frame in new window
    • …but now you have another out of context view of data, with new URLs. Consider legal, copyright, fairuse, blah blah
    • have to maintain and run that second domain and hook it up to your main wiki
    • how to deal with per-instance data input? Pre-publish? postMessage just that in?
      • injecting data over postMessage maybe best for the InstantCommons-style scenario, since sites can use our scripts but inject data
    • probably easier debugging based on URLs
  • Subdomain per service provider, inject resources and instance data
    • Inject all HTML/SVG/JS/CSS at runtime via postMessage (trusting the parent site origin). Images/media could either be injected as blobs or whitelisted by URL.
    • The service provider could potentially be just a static HTML file served with certain strict CSP headers.
    • If injecting all resources, then could use a common provider for third-party wikis.
      • third-party wikis could host their own scripts using this technique using our frame broker. not sure if this is good idea or not!
    • No separate content files to host, nothing to take down in case of legal issues.
    • Downside: right-clicking a frame to open it in new window won’t give useful resources. Possible workarounds with providing a link-back in a location hash.
    • Script can check against a user-agent blacklist before offering to load stuff.
    • Downside: CSP header may need to be ‘loose’ to allow script injection, so could open you back up to XSS on parameters. But you’re not able to access outside the frame so pssssh!

Abuse and evil possibilities

Even with the security guarantees of origin restrictions and CSP, there are new and exciting threat models…

Simple denial of service is easy — looping scripts in an iframe can lock up the main UI thread for the tab (or whole browser, depending on the browser) until it eventually times out with an error. At which point it can potentially go right back into a loop. Or you can allocate tons of memory, slowing down and eventually perhaps crashing the browser. Even tiny programs can have huge performance impact, and it’s hard to predict what will be problematic. Thus a script on a page could make it hard for other editors and admins to get back in to fix the page… For this reason I would recommend against autoplay of arbitrary unreviewed code in Wikipedia articles.

There’s also possible trolling patterns: hide a shock image in a data set or inside a seemingly safe image file, then display it in a scriptable widget bypassing existing image review.

Advanced widgets could do all kinds of fun and educational things like run emulators for old computer and game systems. That brings with it the potential for copyright issues with the software being run, or for newer systems patent issues with the system being emulated.

For that matter you could run programs that are covered under software patents, such as decoding or encoding certain video file formats. I guess you could try that in Lua modules too, but JS would allow you to play or save result files to disk directly from the browser.

WP:BEANS may apply to further thoughts on this road, beware. 😉

Ideas from Jupyter: frontend/backend separation

Going back to Jupyter/IPython as an inspiration source; Jupyter has a separation between a frontend that takes interactive input and displays output, and a backend kernel which runs the actual computation server-side. To make for fancier interactive displays, the output can have a widget which runs some sort of JavaScript component in the frontend notebook page’s environment, and can interact with the user (via HTML controls), with other widgets (via declared linkages) and with the kernel code (via events).

We could use a model like this which distinguishes between trusted (or semi-trusted) frontend widget code which can do anything it can do in its iframe, but must be either pre-reviewed, or opted into. Frontend widgets that pass review should have well-understood behavior, good documentation, stable interfaces for injecting data, etc.

The frontend widget can and should still be origin-isolated & CSP-restricted for policy enforcement even if code is reviewed — defense in depth is important!

Such widgets could either be invoked from a template or lua module with a fixed data set, or could be connected to untrusted backend code running in an even more restricted sandbox.

The two main ‘more restricted sandbox’ possibilities are to run an interpreter that handles loops safely and applies resource limits, or to run in a worker thread that doesn’t block the main UI and can be terminated after a timeout… but even that may be able to exhaust system resources via memory allocation.

I think it would be very interesting to extend Jupyter in two specific ways:

  • iframe-sandboxing the widget implementations to make loading foreign-defined widgets safer
  • implementing a client-side kernel that runs JS or Lua code in an interpreter, or JS in a sandboxed Worker, instead of maintaining a server connection to a Python/etc kernel

It might actually be interesting to adopt, or at least learn from, the communication & linkage model for the Jupyter widgets (which is backbone.js-based, I believe) and consider the possibilities for declarative linkage of widgets to create controllable diagrams/visualizations from common parts.

An interpreter-based Jupyter/IPython kernel that works with the notebooks model could be interesting for code examples on Wikipedia, Wikibooks etc. Math potential as well.

Short-term takeaways

  • Interpreters look useful in niche areas, but native JS in iframe+CSP probably main target for interactive things.
  • “Content widgets” imply new abuse vectors & thus review mechanisms. Consider short-term concentration on other areas of use:
    • sandboxing big JS libraries already used in things like Maps/Graphs/TimedMediaHandler that have to handle user-provided input
    • opt-in Gadget/user-script tools that can adapt to a “plugin”-like model
    • making those things invocable cross-wiki, including to third-party sites
  • Start a conversation about content widgets.
    • Consider starting with strict-review-required.
    • Get someone to make the next generation ‘Graphs’ or whatever cool tool as one of these instead of a raw MW extension…?
    • …slowly plan world domination.

by brion at June 19, 2017 04:18 AM

Tech News

Tech News issue #25, 2017 (June 19, 2017)


June 19, 2017 12:00 AM

June 18, 2017

Gerard Meijssen

#Wikidata - John P. A. Ioannidis and his awards

I am a self-confessed award junkie. Awards are imho important because they are an indication of who is notable and who is less so.

Three awards are associated with Professor Ioannidis in Wikidata. One award was also conferred on Hans Rosling, and this gives me added confidence in Mr Ioannidis and other recipients of the Chanchlani Global Health Research Award.

Professor Ioannidis throws cold water on much of scientific practice and consequently on its practitioners. One of his papers has the title: Why most published research findings are false, and it is inherently a challenge as well to what we write in the Wikipedias and Wikidata.

At Wikidata a wholesale import is happening of papers, science facts and their authors. This is a great idea, particularly when papers that dismiss the nonsense get a prominent place. The result will be that the Neutral Point Of View gets another twist; it balances what we include with actual science.

by Gerard Meijssen (noreply@blogger.com) at June 18, 2017 09:04 PM

June 17, 2017

T Shrinivasan

Intro to wikipedia – a talk at Villupuram

Tomorrow, Sunday, 18-06-2017, I am giving a talk at Villupuram on “Introduction to Wikipedia”.

Inviting you all to the event. Thanks to the Villupuram GNU Linux Users Group and the Puduvai GNU Linux Users Group for organizing this event.

Date : 18-06-2017 Sunday

Time : 10.00 am

Venue : Bodhi IAS Academy,
Shanthi Nilayam
No:10, Vishvalingam Layout,
Villupuram – 605602

Contact : 995 253 408 Three , 750 227 341 Eight



by tshrinivasan at June 17, 2017 05:06 PM

Gerard Meijssen

#Wikidata vs #GeoNames - the first to throw a stone

Wikidata has some vocal people vilifying GeoNames. They insist that no data from GeoNames is included in Wikidata because "the quality is so bad". In my last post I wrote down assertions about Wikidata. One of them is that "Never mind how "bad" an external data source is, when they are willing to cooperate on the identification and curation of mutual differences, they are worthy of collaboration".

I wrote an email to Marc Wick, the founder of GeoNames, and with his permission I can publish our mail exchange.

Hoi,
The import of data from GeoNames into Wikipedia has been controversial. People say that the quality of the GeoNames data is not "good enough". It resulted in the deletion of thousands of articles from the Swedish Wikipedia. I am not Swedish, I did not follow their discussions, but the problem is it sours collaboration with other parties because "their data might not be 100%".
This happened in the past, I care for the future. In Wikidata we do link to GeoNames (example Almere [1]).
There are several ways in which we can help each other and potentially even benefit from a collaboration. Wikidata is licensed with a CC-0 license and therefore GeoNames can have all our data and do with it as they please.
My initial proposal is for a comparison of the shared data. The data where GeoNames differs from Wikidata is potentially problematic. Concentrating on these differences together will improve both our and your data.
Would you be interested?
       Gerard Meijssen
His answer is everything I could hope for:
Hi Gerard
Thanks a lot for your email. A couple of weeks ago I have started to parse the wikidata extract and look for the matching attributes. Unfortunately I got interrupted and have not yet looked at the result of the parsing. I will continue as soon as I find the time.
The goal is to add the wikidata identifier to the alternatenames table with pseudo language code 'wkdt'. What I have noted so far is that sometimes the geonameids in wikidata go to the wrong concept. For instance going to the city feature when the article is speaking about the administrative division or vice versa. This is one of the things I would like to check before adding the wikidataid as alternatename. GeoNames also has links to wikipedia.
I don't think wikipedia should import all geonames features, not all of them are relevant enough to justify a wikipedia article.
Best Regards 
Not only is there an interest to collaborate; Marc is checking the links in Wikidata referring to GeoNames and as can be expected he finds issues. As I asserted, this is to be expected and collaboration is the only way forward for optimal results.

by Gerard Meijssen (noreply@blogger.com) at June 17, 2017 08:19 AM

June 16, 2017

All Things Linguistic

Digital tools help revitalize rare languages

Digital tools help revitalize rare languages:

I’m quoted in an episode of CBC Spark about digitally disadvantaged languages: 

While the internet remains a challenge for many little-used language speakers, it is helping small groups get an online foothold. It’s much easier to publish a blog online or even self-publish a dictionary with online tools, Gretchen says.

But she adds that without a large online body of words, things like spell-check, which majority language-users take for granted, make things difficult.

“English still dominates, but things are getting flatter,” she says.

The story also features interviews with Khelsilem about Squamish revitalization online, such as a talk show podcast in Squamish, and with Osgur Ó Ciardha and Peadar Ó Caomhánaigh about Pop Up Gaeltacht, using social media to organize Irish-language meetups. 

You can read the associated story or listen on the website now, or if you’d like to hear it on the radio proper, I’m told it’s airing on CBC Radio One Sunday afternoon at 1:05 pm local time in most parts of Canada and again on Wednesday June 21 at 2:05 pm. It’ll also be syndicated on Sirius XM 169 and on ABC (Australia).  

June 16, 2017 11:55 PM

Wiki Education Foundation

Making knowledge public using educational technology

For the ninth year, faculty at Xavier University of Louisiana (XULA) have come together to experiment with new pedagogy in their classrooms. Their group, the Faculty Community of Teaching Scholars (FaCTS), is funded by the Andrew W. Mellon Foundation and provides a stipend for participants to explore that year’s theme. The theme for academic year 2017–18 is “Making knowledge public using educational technology.” Dr. Megan Osterbur, who participates in Wiki Ed’s Classroom Program, helped organize this year’s group of selected applicants and saw a clear alignment with Wikipedia assignments. After all, Wikipedia serves as educational technology for student editors and is as public as knowledge gets in 2017.

Megan Osterbur, Assistant Professor of Political Science at Xavier University of Louisiana

We are thrilled to work with several FaCTS participants as they incorporate Wikipedia assignments into their various departments. As part of the grant, Megan arranged to bring me and Outreach Manager Samantha Weald to their one-week intensive planning period in May. We joined for the first two days, when participants did a deep dive into the makings of Wikipedia. We collaborated to design meaningful assignments, and we showcased Wiki Ed’s tools and trainings available for instructors and students. We discussed at length how Wikipedia is perhaps the most relevant platform for public knowledge for today’s students.

According to Megan, when we think about public knowledge, we must consider three things: audience, accessibility, and purpose.


FaCTS workshop

When students create knowledge, we ask them to consider their intended audience. When we bring technology into the knowledge-creation process, we must also ask: Who is the actual audience, based on the platform and technology you’re using?

Regarding Wikipedia, the answer comes easily to students, as they’ve been visiting the website for years. Because of their familiarity, they understand that anyone and everyone may read Wikipedia as they look for information about a topic. They inherently understand Wikipedia’s audience, and, thus, the language they must use to communicate effectively. This understanding comes much more easily to them than when writing a mock journal article or traditional term paper.


For knowledge to be truly public, it must be accessible by the public. If academics pursue a career in research and higher education to inform the public and share something they love — knowledge and lifelong learning — with the world, are they achieving their goals through traditional means? Megan compelled us to think about published work and who can access it. For journal articles and academic books or textbooks, the group argued that the audience is limited to peers within academia, some students in higher education, and authors’ family members who may not fully grasp the texts’ jargon. One FaCTS participant joked that “nobody interesting” has access to her published work. Megan pointed out that she knows “the beautiful leather-bound dissertation [her] mom has has more dust on it than fingerprints.”

One of Wikipedia’s primary appeals to instructors as a publishing platform for students is the wide accessibility by the public. College students, high school students, grandparents, children, professors, professionals, and everyone in between read Wikipedia, so student editors can share important academic research and knowledge with a broad audience. That said, in order for a knowledge source to be truly accessible, readers must have the critical media and digital literacy to comprehend the content and consider its reliability. Another advantage of the Wikipedia assignment is that by engaging with this learning process, students build their media literacy skills. In the end, they are not only creating knowledge for others but are better knowledge consumers for themselves. Wikipedia, academic journals, textbooks, and other knowledge sources become even more accessible to students thanks to the skills they build while contributing to Wikipedia.


FaCTS workshop

The FaCTS participants discussed how purpose plays a role in the importance of public knowledge. With Megan’s facilitation, we identified the following purposes for public knowledge:

  • Egalitarian approaches to information, knowledge, and learning.
  • Alignment with an institutional mission. For example, XULA aims to create social justice and global leaders. By creating accessible information for the world, including people who do not have access to institutionalized education, their students create a more just society and build their own leadership skills.
  • Economic necessity: If we keep knowledge private and restricted, the masses are less informed and we may end up with a tertiary economy with no knowledge base and very little economic mobility.
  • Building a foundation for life-long learning: By providing mechanisms for people to continually learn, they can increase their quality of life over time.
  • Critical consumption & multi-mode literacy in the context of information cacophony: Learners engage with new information that does not assume authority but juxtaposes the new information with prior knowledge and information schemas. When attaining new information, they can assess where it fits in with other knowledge schemas they already have. Importantly, information can also serve the purpose of developing multi-modal literacy, helping consumers understand charts, graphics, and other non-text-based information.

As a resource accessed by hundreds of millions of people every month, Wikipedia-editing helps students create knowledge with purpose. They identify academic information that is missing on Wikipedia, research the topics, and fill in the content gaps. As Wikipedia’s readers search for information on the topic, they’re able to learn from sources they may not otherwise have access to, either because it’s behind a paywall or because they don’t have the advanced literacy within the field to comprehend the available literature. These readers take that information into their daily and professional lives, fulfilling the purpose of sharing knowledge.

We learned a lot about how instructors are marrying these two concepts of public knowledge and educational technology, and we’re looking forward to working with even more students at XULA. If you’re an instructor who’s interested in increasing the public knowledge in your field by asking students to edit Wikipedia, contact us at contact@wikiedu.org.

by Jami Mathewson at June 16, 2017 05:57 PM

Weekly OSM

weeklyOSM 360


Jean-Christophe Becquet produces wooden maps with a laser cutter 1 | © OpenStreetMap contributors CC-BY-SA 2.0 | © CC-BY-SA


  • François Lacombe published a presentation on the topic, “Mapping of energy distribution networks”. (fr)
  • User Jothirnadh explains how to edit a road segment while preserving route relations. His diary entry provides plenty of details and clear screenshots for each step.
  • Michał Brzozowski posts some statistics about MAPS.ME edits. He remarks how power users fix many editing mistakes, but devote considerable time and resources that could otherwise be applied to OSM improvements. Roland Olbricht expresses appreciation for MAPS.ME edits.
  • Yuri Astrakhan aka nyurik released a video on the topic “OSM+Wikidata”. More information on the wiki.
  • Tijmen Stam suggests introducing a tag man_made=tunnel equivalent to the widely-used man_made=bridge. Responses are positive and include suggested use-cases.
  • In Talk-ES user dcapillae mentions (automatic translation) a conflict happening in Galicia, North-West of Spain when someone decided to “close” a note. This practice is discouraged by many.
  • Vincent Privat has summarized the suggestions from the SotM-FR concerning “indoor mapping” in a ticket.


  • The OSGeo-Live team is looking (DE) (automatic translation) for a maintainer of JOSM and Mapnik, in order to keep them as part of the Live-DVD. Astrid Emde provides detailed information on tasks and expected workload.
    Additional help is needed to compile new documentation based on LearnOSM material. You’re welcome to contribute, ideally before July 3rd.
  • Creator13 collected a set of weird errors in the French land registry. Despite these mismatches, he is glad that the dataset’s import saved a great amount of manual editing.
  • GOwin compares OSM and Google Maps “glitches” in four areas of the world. He identifies OSM’s strength in community involvement versus centralised map management and cites Kibera project as a great example of map maintenance done by local people.
  • PlacesForBikes used OSM data to compare bike networks in 299 cities across the U.S. The analysis aims to identify whether bikers can travel on low-stress segments of the bike network, and unveiled interesting findings.
  • Members of the Russian community (ru) (automatic translation) travelled to Tula in order to map together.
  • People for Bikes’ “Bike Network Analysis” gave Chicago a low score, but this is strongly related to the model and assumptions underlying the analysis itself. Both the data and the analysis are open: geodata comes from OSM and the U.S. Census, while the analysis’ source code is available online.
  • Edward Betts presents a tool to add Wikidata tags to OSM elements, and provides a summary of its functionality and usage.
  • Bryan Housel is looking for translators for the iD editor.


  • At SotM-FR, Michaël Louchin introduced LizMap, an integrated tool to create maps using QGIS and publish them online.
  • The second edition of “l’université d’été développement de logiciel libre et open source (UDOS)” is taking place from June 28th to 30th in Digne-les-Bains, France. Among others, there are workshops on OpenLayers3, LeafletJS and OpenStreetMap with Overpass API. (fr)
  • The Regionalverband Ruhr wants to convert its cross-city map work, which is in part OSM-based, and is looking (OSM-wiki) to communicate with the community. For several years, the RVR has provided high-resolution aerial photographs for mapping.
  • Stefan Keller will be talking about “Using OpenStreetMap for Tourism and Transport” at Opendata.ch (June 27, Lucerne, Switzerland) in the track “Open Tourism & Transport Data”.
  • Julien Coupey shares the presentation on Vehicle Routing Open-Source Optimization Machine he did for the State of the Map France 2017.

Humanitarian OSM

  • The Humanitarian OpenStreetMap Team (HOT) has recently signed a cooperation agreement with the International Organization for Migration (IOM), the UN’s Migration Agency. Both organisations agreed to work closer together in the time of need, with the shared commitment to defending the rights and well-being of refugees, forcibly displaced communities and migrants.
  • OpenStreetMap Togo extends (fr) (automatic translation) the Open Source Tour to Sotouboua municipality, where it has organised an OSM introduction for development agents (associations, NGOs and public service personnel).


  • “La Dépêche” published an OSM-based interactive map to display the results in every district for the French parliamentary elections.
  • Jean-Christophe Becquet presents on the French mailing list the project Cartosaixy. The main objectives: encouraging and animating the contribution to digital commons in rural areas, valuing free data by producing custom maps for the needs of small municipalities and experimenting with an innovative method of producing wooden maps with a laser cutter.
  • Sarah Hoffmann, aka Lonvia, the maintainer of Waymarked Trails, announced a collaboration with OpenTopoMap.


  • DoorDash recently updated their application and are using Mapbox to let their customers track couriers, food delivery and much more.
  • The NZZ shows all stages of the “Tour de Suisse” – OSM-based, of course. 😉


  • OSM-legal mailing list answers a question regarding OSM data processing for routing calculations on a private machine for commercial purposes.
  • Seán Lynch, creator of OpenLitterMap, asks OSM-legal mailing list for advice on terms and conditions, and liability of the site in case the user gets injured while mapping litter.


  • Mapbox Streets now supports Arabic and Portuguese languages, making their maps more accessible to 300 million internet users and contributing back to Wikidata and OpenStreetMap.


  • The Qt Automotive Suite (Qt 5.9) now ships with vector-based maps (provided by Qt Location Mapbox GL plugin) and APIs for navigation, customized design and the ability to layer in your own data.


Software Version Release date Comment
Kurviger Free * 10.0.28 2017-06-05 Some enhancements.
guide4you 2.0.1 2017-06-06 Many changes, please read release info.
Naviki Android * 3.60.1 2017-06-06 Bugfix release.
Magic Earth * 2017-06-07 Many changes. Please read release info.
MapContrib 1.8.2 2017-06-07 Bugs fixed.
Mapillary iOS * 4.7.0 2017-06-08 Six new features and enhancements.
Mapbox GL JS v0.38.0 2017-06-09 One new feature, 11 bugs fixed and five development workflow improvements.
iD 2.2.2 2017-06-12 Many changes, please read release info.
Komoot Android * var 2017-06-12 No infos.
Komoot iOS * 9.1.2 2017-06-12 Design of the komoot collections completely reworked, improved tour planning, reduced storage space.
OpenLayers 4.2.0 2017-06-12 Many changes, please read release info.
SQLite 3.19.3 2017-06-18 Bugfix release.

Provided by the OSM Software Watchlist. Timestamp: 2017-06-13 00:32:52+02 UTC

(*) unfree software. See: freesoftware.

Mapbox’s mobile Navigation SDKs for iOS and Android have received big updates: iOS (0.4.0) and Android (0.3.1). They now provide the tools you need to add turn-by-turn navigation to your app or build a completely custom navigation app from scratch.

OSM in the media

  • Ulrich Waltemath created (de) (automatic translation) a database of bus stops for Harz district, Germany. He added precise locations to OSM, agreed, however, to share the database (complete with pictures) with the bus transport company for internal use only, at least for now.

Other “geo” things

  • The Economist reviews the landscape of digital cartography for consumers, OSM and related projects are mentioned as an alternative to Google Maps.
  • ISPRS’ 4th International Workshop on GeoInformation Science will take place in Safranbolu, Turkey, on the 14th and 15th of October 2017. The workshop will focus on multi-dimensional and multi-scale spatial data modeling. Papers can be submitted until the 15th of July.
  • Mapillary hosted a blog post from Harriette Stone about post-earthquake survey methods. She lists the pros and cons of data collection by car, on foot or with drones, and suggests how to mitigate the downsides of each method.
  • Viae Romanae Maiores – Tabula reticuli: main roads of Ancient Roman Empire, rendered subway-style.

Upcoming Events

Where What When Country
Tokyo 東京!街歩き!マッピングパーティ:第9回 旧芝離宮恩賜庭園 2017-06-17 japan
Procida Unaquantum conference Procida 2017-06-17-2017-06-18 italy
Istanbul First Missing Maps Mapathon in Turkey for the World Refugee Day, Impact Hub 2017-06-18 turkey
Bonn Bonner Stammtisch 2017-06-20 germany
Lüneburg Mappertreffen Lüneburg 2017-06-20 germany
Nottingham Nottingham Pub Meetup 2017-06-20 united kingdom
Scotland Edinburgh 2017-06-20 united kingdom
Melbourne Missing Maps Mapathon in Melbourne, Kathleen Syme Community Centre 2017-06-20 australia
Karlsruhe Stammtisch 2017-06-21 germany
Lübeck Lübecker Mappertreffen 2017-06-22 germany
Wateringen Missing Maps Mapathon gemeente Westland, Rode Kruis en GeoGilde, Gemeentehuis Wateringen 2017-06-22 the netherlands
Essen 8. FOSSGIS Hacking Event im Linuxhotel 2017-06-23-2017-06-25 germany
Essen SommerCamp 2017 2017-06-23-2017-06-25 germany
Bremen Bremer Mappertreffen 2017-06-26 germany
Dijon Rencontres mensuelles 2017-06-27 france
Prague Missing Maps Mapathon in Prague, Paralelní polis 2017-06-27 czech republic
Leuven Leuven Monthly OSM Meetup 2017-06-28 belgium
Digne-les-Bains {UDOS} 2017 : Université d’été du développement de logiciel libre 2017-06-28-2017-06-30 france
Amstetten Stammtisch Ulmer Alb 2017-06-29 germany
Nantes Mapathon Missing Maps à l’ENSA 2017-06-29 france
Dusseldorf Stammtisch Düsseldorf 2017-06-30 germany
Salzburg AGIT2017 2017-07-05-2017-07-07 austria
Kampala State of the Map Africa 2017 2017-07-08-2017-07-10 uganda
Champs-sur-Marne (Marne-la-Vallée) FOSS4G Europe 2017 at ENSG Cité Descartes 2017-07-18-2017-07-22 france
Boston FOSS4G 2017 2017-08-14-2017-08-19 united states
Aizu-wakamatsu Shi State of the Map 2017 2017-08-18-2017-08-20 japan
Patan State of the Map Asia 2017 2017-09-23-2017-09-24 nepal
Boulder State of the Map U.S. 2017 2017-10-19-2017-10-22 united states
Buenos Aires FOSS4G+State of the Map Argentina 2017 2017-10-23-2017-10-28 argentina
Lima State of the Map LatAm 2017 2017-11-29-2017-12-02 perú

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Anmaca, Anne Ghisla, Nakaner, Peda, Polyglot, Rogehm, SK53, SeleneYang, YoViajo, derFred, jcoupey, jinalfoflia, juanblas09, vsandre, wambacher.

by weeklyteam at June 16, 2017 01:39 PM

Wikimedia Foundation

Discussing fake news and the NSA lawsuit at Yale

Photo by Nick Allen, CC BY-SA 3.0.

Over the years, the Wikimedia Foundation and Yale Law School have established an ongoing research and educational affiliation and collaboration during which both Yale students and faculty and the Foundation have participated in symposia, presentations, and conferences, hosted by either Yale or Wikimedia. Furthermore, students and new graduates from Yale have held internships and fellowships at Wikimedia and Yale researchers have engaged in collaborative research with Wikimedia.

Given Wikimedia’s mission and interests, one of Wikimedia’s fundamental activities is to directly contribute to the research and education mission of Yale and other institutions of higher education. The ongoing affiliation and collaboration between the parties has benefited both sides by allowing for the advancement of public policy research, the exchange of new ideas about the regulations and laws of online platforms, and mentorship opportunities for promising students and new graduates from Yale to better understand the Wiki-model of knowledge creation.

As a continuation of this affiliation, members of the Wikimedia legal team visited Yale for two events this past spring.

On March 7, Jacob Rogers, an attorney from the Wikimedia Foundation, attended a panel at Yale Law School addressing the topic of fake news and false information online. The panel brought together a number of academics, news publishers, and online platforms to discuss the nature of the problem and look into proposed solutions. The discussion focused on the complexity of the problem, with falsehoods potentially arising from misunderstandings and honest attempts to get things right in a complex world as well as from intentionally malicious misleading information. One solution discussed was the use of crowdsourcing, and the panelists looked at Wikipedia’s model to try to understand how some crowdsourced projects succeed with motivated, empowered volunteers. The panel also discussed different approaches to automation and the idea of combining automated tools with human review to reach a better result than either can accomplish alone.

On March 21, James Buatti and Zhou Zhou, attorneys at the Wikimedia Foundation, delivered a presentation on the Wikimedia v. NSA lawsuit to the Yale community. The presentation provided background for some of the known U.S. government surveillance programs, information about the history of litigation against these programs, the ongoing status of Wikimedia v. NSA lawsuit, and previewed possible new changes and implications for the government surveillance under the new administration. The Foundation attorneys had an extended discussion with audience members afterwards, who, among other things, expressed appreciation for the presentation’s explanation of the differences and similarities of the various government surveillance programs. The audience also provided their own valuable insights on how surveillance power might be balanced with the need to protect civil liberties within a proper and practical legal and policy framework.

Events like these recent ones at Yale help the Wikimedia Foundation share our unique values and processes with the outside world and build support, by educating prominent members of the legal and policy community, for the causes we believe in and fight for. As such, we look forward to continuing our affiliation and collaboration with Yale for many years to come.

Jacob Rogers, Legal Counsel
Zhou Zhou, Legal Counsel
Wikimedia Foundation

For more information about Wikimedia’s perspective on public policy issues and to stay engaged, please visit the Foundation’s public policy page and join the Wikimedia movement’s mailing list. Academic and research institutions who wish to connect with members of the Wikimedia legal and public policy team about these topics are also welcome to email the team at policy@wikimedia.org.

by Jacob Rogers and Zhou Zhou at June 16, 2017 01:16 PM

June 15, 2017

Wikimedia Tech Blog

Investigating a mysterious performance improvement

Photo by USGS Bee Inventory and Monitoring Lab, public domain.

Late last month, Jon Robson, a software developer at the Wikimedia Foundation, pinged me about a performance improvement that his team had noticed. On our mobile site, large wiki articles appeared to load faster in our testing environment, according to WebPageTest, a web performance tool that we use to measure how fast above-the-fold content is displayed.

You can see the way the improvement looked in the image below, which shows a sudden drop in page load time. (This is called the SpeedIndex, where a lower number indicates a faster load time.)

Pat Meenan, the author of WebPageTest, describes SpeedIndex as the “average time at which visible parts of the page are displayed. It is expressed in milliseconds and dependent on size of the view port.”

The actual mathematical definition is a bit more complicated, but essentially it captures how fast an end user sees content above the fold for a given screen size and internet connection speed. We run different profiles in WebPageTest representing different kinds of devices and internet speeds, because sometimes performance changes only affect certain devices, certain connection speeds, or even certain types of wiki pages.
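To make the idea concrete, here is a minimal Python sketch of how a SpeedIndex-style number falls out of a visual-completeness timeline: it is the area above the completeness curve. The sample timelines are invented for illustration, and this is not WebPageTest’s actual implementation, which derives completeness from captured video frames.

```python
# Sketch of the SpeedIndex idea: integrate (1 - visual completeness) over
# time. Lower is better. Sample timelines below are made up for illustration.

def speed_index(samples):
    """samples: (time_ms, completeness) pairs sorted by time, with
    completeness in [0, 1] and the last sample fully complete."""
    total = 0.0
    for (t0, c0), (t1, _) in zip(samples, samples[1:]):
        # Area of the "not yet rendered" strip between consecutive frames.
        total += (1.0 - c0) * (t1 - t0)
    return total

# A page that paints most of its content early scores lower (better):
fast = [(0, 0.0), (500, 0.8), (1000, 1.0)]
slow = [(0, 0.0), (500, 0.1), (1000, 1.0)]
print(speed_index(fast))  # 600.0
print(speed_index(slow))  # 950.0
```

Note how painting useful content earlier lowers the number even when the total load time is unchanged, which is exactly the kind of change this investigation turned up.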

In the case of the WebPageTest test spotted by Robson, the drop in page load time was taking place on mobile phone-sized screens using a simulated 3G network.

It was time for an internal investigation: why was this happening? What was the root cause?

As we do for every suspected performance improvement or regression from an unknown cause, we filed a task for it, tagged the Performance-Team, and began digging a little deeper. (If you ever come across something odd related to performance, you can create a task directly and tag the Performance-Team—which will get our attention.)

Comparing synthetic testing and real user metrics

When a change like this happens in a testing environment, we first verify whether or not a similar change can be seen in our real user metrics. Specifically, we look at Navigation Timing in Grafana, which indicates the loading milestones the browser reports.

This is because we can’t use SpeedIndex to measure the page load time of real users. Our real user metrics are limited by the JavaScript APIs available in the browser to measure page load speed, which are very basic compared to what WebPageTest can do in a testing environment. There’s no way for us to tell, for example, when people can see all images and text above the fold from the client-side code.

We are able, however, to understand when a web browser starts painting anything on a page. To do this, we use firstPaint, a simple metric reported by some browsers that tells us the point in time when the browser starts painting the page. Though firstPaint measures a different metric than SpeedIndex, there’s an important overlap in the way they measure page statuses: they use the same timeline. In other words, they start measuring what happens on a page at the same start time. This also means it’s common for a SpeedIndex change in the testing environment to come with a simultaneous variation in real user metrics like firstPaint. When this happens, it makes our investigation easier because we know it’s not an issue in our telemetry, but a real effect. (When there’s no correlation between synthetic testing metrics and real user metrics, we try different tactics.)

This fundamental difference also means that some performance improvements can improve the site loading time on SpeedIndex, while not changing any metrics on firstPaint or any Navigation Timing metrics. In these cases, we know performance has improved, but we can’t measure how much site load time improved in the real world for people browsing Wikipedia.

This is exactly what happened in this mysterious incident: SpeedIndex metrics improved, but real user metrics didn’t. That doesn’t mean that the site didn’t load faster for our real users—but it’s necessary to understand that Navigation Timing, which measures some milestones in the page load, is only a partial view of performance, and that we can’t always measure performance changes using real user data.

Comparing WebPageTest runs

The next logical step in our investigation was to compare how the page performed on WebPageTest both before and after the performance change. You can see our synthetic tests, which run continuously on our public WebPageTest instance. Here are the steps:

First you want to click on the test history section, which brings you to this view:

Next, click on the show tests from all users checkbox. You should now see all our test runs:

We continuously test a number of pages for the desktop and mobile site, using various simulated internet connection speeds and other settings. Finding the tests you’re interested in within this historical view requires some manual labour: you need to search for the labels by hand, because the search box only applies to the URL.

WebPageTest does support a great feature to compare different runs from the history view, but we won’t get into that here, as the difference in speed is visible from the screenshots of the runs alone. After combing through the history view, I found two runs of the same test: loading the Sweden article on the English Wikipedia while browsing the mobile site on Chrome with a simulated 3G connection, before and after the SpeedIndex drop.

Here’s how it looked before:

And here’s how it looked after:

Notice any difference?

It’s obvious that the content above the fold has changed. The new version displays mostly text above the fold, whereas the old version also contains images. This explains the SpeedIndex improvement: it’s faster to load text than an image, which means that users get content they can consume above-the-fold faster. This is more dramatic on slow connections, which is why this performance improvement showed up on our synthetic testing that simulated a 3G connection.

But was this a deliberate or accidental change?

The next part of the investigation was to determine whether that was an accidental change or a deliberate one. The first place we examined was the Wikimedia Server Admin Log. Whenever changes are deployed to Wikimedia’s production servers, log entries are added there. Deployments can be individual patches or our weekly deployment train. This part of the investigation is straightforward: we went through the log, looking for anything that happened around the time of the performance change.

And sure enough, we found this log entry around the time of the performance change:

18:31 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: mobileFrontend: Move first paragraph before infobox T150325 (duration: 00m 41s)

The task quoted in that log entry, T150325: Move first paragraph before infobox on stable, is a deliberate change to improve the user experience by showing the first section of an article at the top rather than the infobox. While making this change, Foundation engineer Sam Smith (@phuedx) also improved the performance for users on slow internet connections. They will now see the first section of an article above the fold, which they can start reading, instead of a mostly empty infobox whose images are still loading.

So our mystery has been solved, and users on slower connections saw a performance improvement, as well. As for me, well, I put down my deerstalker, calabash pipe, and magnifying glass until the next whodunit and headed back to improving performance on Wikimedia projects.

Gilles Dubuc, Senior Software Engineer (Contractor), Performance
Wikimedia Foundation

Want to work on cool projects like this? See our current openings.

by Gilles Dubuc at June 15, 2017 03:47 PM

Jeroen De Dauw

Simple is not easy

Simplicity is possibly the single most important thing on the technical side of software development. It is crucial to keep development costs down and external quality high. This blog post is about why simplicity is not the same thing as easiness, and common misconceptions around these terms.

Simple is not easy

Simple is the opposite of complex. Both are a measure of complexity, which arises from intertwining things such as concepts and responsibilities. Complexity is objective, and certain aspects of it, such as Cyclomatic Complexity, can be measured with many code quality tools.
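For instance, a toy version of a Cyclomatic Complexity counter fits in a few lines of Python. Real tools (radon, lizard, etc.) are more thorough; the set of branch-point node types below is a deliberate simplification for illustration.

```python
import ast

# Toy cyclomatic-complexity counter: start at 1 and add one per branch point.
# The node set is simplified; real tools count more constructs.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source):
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))

simple = "def f(x):\n    return x + 1\n"
branchy = ("def f(x):\n"
           "    if x > 0:\n"
           "        return x\n"
           "    for i in range(x):\n"
           "        x -= 1\n"
           "    return x\n")
print(cyclomatic_complexity(simple))   # 1
print(cyclomatic_complexity(branchy))  # 3
```

The point of such a measure is that it is objective: two readers may disagree about how hard the second function is, but not about how many independent paths run through it.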

Easy is the opposite of hard. Both are a measure of effort, which unlike complexity, is subjective and highly dependent on the context. For instance, it can be quite hard to rename a method in a large codebase if you do not have a tool that allows doing so safely. Similarly, it can be difficult to understand an OO project if you are not familiar with OO.

Achieving simplicity is hard

I’m sorry I wrote you such a long letter; I didn’t have time to write a short one.

Blaise Pascal

Finding simple solutions, or brief ways to express something clearly, is harder than finding something that works but is more complex. In other words, achieving simplicity is hard. This is unfortunate, since dealing with complexity is also hard.

In recent decades the cost of software maintenance has become much greater than the cost of its creation, so it makes sense to make maintenance as easy as we can. This means avoiding as much complexity as we can during the creation of the software, which is a hard task. The cost of the complexity does not suddenly appear once the software goes into an official maintenance phase, it is there on day 2, when you need to deal with code from day 1.

Good design requires thought

Questions about whether design is necessary or affordable are quite beside the point: design is inevitable. The alternative to good design is bad design, not no design at all.

— Vaughn Vernon in Domain-Driven Design Distilled

Some people in the field conflate simple and easy in a particularly unfortunate manner. They reason that if you need to think a lot about how to create a design, it will be hard to understand the design. Clearly, thinking a lot about a design does not guarantee that it is good and minimizes complexity. You can do a good job and create something simple or you can overengineer. There is however one guarantee that can be made based on the effort spent: for non-trivial problems, if little effort was spent (by going for the easy approach), the solution is going to be more complex than it could have been.

One high-profile case of such conflation can be found in the principles behind the Agile Manifesto. While I don’t fully agree with some of the other principles, this is the only one I strongly disagree with (unless you remove the middle part). Yay Software Craftsmanship manifesto.

Simplicity–the art of maximizing the amount of work not done–is essential

Principles behind the Agile Manifesto

Similarly, we should be careful not to confuse the ease of understanding a system with the ease of understanding how or why it was created the way it was. The latter, while easier than the actual task of creating a simple solution, is still going to be harder than working with said simple solution, especially for those who lack the skills used in its creation.

Again, I found a relatively high-profile example of such confusion:

If the implementation is hard to explain, it’s a bad idea. If the implementation is easy to explain, it may be a good idea.

The Zen of Python

I think this is just wrong.

You can throw all books in a library onto a big pile and then claim it’s easy to explain where a particular book is – in the pile – though actually finding the book is a bigger challenge. It’s true that you need more skills to use a well-organized library effectively than you need to go through a pile of books randomly. You need to know the alphabet, be familiar with the concept of genres, etc. Clearly an organized library is easier to deal with than our pile of books for anyone that has those skills.

It is also true that sometimes it does not make sense to invest in the skill that allows working more effectively, and that sometimes you simply cannot find people with the desired skills. This is where the real bottleneck is: learning. Most of the time these investments are worth it, as they allow you to work both faster and better from that point on.

See also

In my reply to the Big Ball of Mud paper I also talk about how achieving simplicity requires effort.

The main source of inspiration that led me to this blog post is Rich Hickey’s 2012 Rails Conf keynote, where he starts by differentiating simple and easy. If you don’t know who Rich Hickey is (he created Clojure), go watch all his talks on YouTube now; they are well worth the time. (I don’t agree with everything he says, but it tends to be interesting regardless.) You can start with this keynote, which goes into more detail than this blog post and adds a bunch of extra goodies on top. <3 Rich

Following the reasoning in this blog post, you cannot trade software quality for lower cost. You can read more about this in the Tradable Quality Hypothesis and Design Stamina Hypothesis articles.

There is another blog post titled Simple is not easy, which as far as I can tell, differentiates the terms without regard to software development.

by Jeroen at June 15, 2017 05:59 AM

June 14, 2017

Benjamin Mako Hill

Community Data Science Workshops Post-Mortem

Earlier this year, I helped plan and run the Community Data Science Workshops: a series of three (and a half) day-long workshops designed to help people learn basic programming and tools for data science tools in order to ask and answer questions about online communities like Wikipedia and Twitter. You can read our initial announcement for more about the vision.

The workshops were organized by myself, Jonathan Morgan from the Wikimedia Foundation, long-time Software Carpentry teacher Tommy Guy, and a group of 15 volunteer “mentors” who taught project-based afternoon sessions and worked one-on-one with more than 50 participants. With overwhelming interest, we were ultimately constrained by the number of mentors who volunteered. Unfortunately, this meant that we had to turn away most of the people who applied. Although it was not emphasized in recruiting or used as a selection criteria, a majority of the participants were women.

The workshops were all free of charge and sponsored by the UW Department of Communication, who provided space, and the eScience Institute, who provided food.

The curriculum for all four sessions is online:

The workshops were designed for people with no previous programming experience. Although most of our participants were from the University of Washington, we had non-UW participants from as far away as Vancouver, BC.

Feedback we collected suggests that the sessions were a huge success, that participants learned enormously, and that the workshops filled a real need in the Seattle community. Between workshops, participants organized meet-ups to practice their programming skills.

Most excitingly, just as we based our curriculum for the first session on the Boston Python Workshop’s, others have been building off our curriculum. Elana Hashman, who was a mentor at the CDSW, is coordinating a set of Python Workshops for Beginners with a group at the University of Waterloo and with sponsorship from the Python Software Foundation using curriculum based on ours. I also know of two university classes that are tentatively being planned around the curriculum.

Because a growing number of groups have been contacting us about running their own events based on the CDSW — and because we are currently making plans to run another round of workshops in Seattle late this fall — I coordinated with a number of other mentors to go over participant feedback and to put together a long write-up of our reflections in the form of a post-mortem. Although our emphasis is on things we might do differently, we provide a broad range of information that might be useful to people running a CDSW (e.g., our budget). Please let me know if you are planning to run an event so we can coordinate going forward.

by Benjamin Mako Hill at June 14, 2017 05:47 PM

Community Data Science Workshops in Seattle

Photo from the Boston Python Workshop – a similar workshop run in Boston that has inspired and provided a template for the CDSW.

On three Saturdays in April and May, I will be helping run three day-long project-based workshops at the University of Washington in Seattle. The workshops are for anyone interested in learning how to use programming and data science tools to ask and answer questions about online communities like Wikipedia, Twitter, free  and open source software, and civic media.

The workshops are for people with no previous programming experience and the goal is to bring together researchers as well as participants and leaders in online communities.  The workshops will all be free of charge and open to the public given availability of space.

Our goal is that, after the three workshops, participants will be able to use data to produce numbers, hypothesis tests, tables, and graphical visualizations to answer questions like:

  • Are new contributors to an article in Wikipedia sticking around longer or contributing more than people who joined last year?
  • Who are the most active or influential users of a particular Twitter hashtag?
  • Are people who participated in a Wikipedia outreach event staying involved? How do they compare to people that joined the project outside of the event?

If you are interested in participating, fill out our registration form here. The deadline to register is Wednesday March 26th.  We will let participants know if we have room for them by Saturday March 29th. Space is limited and will depend on how many mentors we can recruit for the sessions.

If you already have experience with Python, please consider helping out at the sessions as a mentor. Being a mentor will involve working with participants and talking them through the challenges they encounter in programming. No special preparation is required.  If you’re interested,  send me an email.

by Benjamin Mako Hill at June 14, 2017 05:46 PM

Consider the Redirect

In wikis, redirects are special pages that silently take readers from the page they are visiting to another page. Although their presence is noted in tiny gray text (see the image below), most people use them all the time and never know they exist. Redirects exist to make linking between pages easier, they populate Wikipedia’s search autocomplete list, and they are generally helpful in organizing information. In the English Wikipedia, redirects make up more than half of all article pages.

Over the years, I’ve spent some time contributing to Redirects for Discussion (RfD). I think of RfD as an ultra-low-stakes version of Articles for Deletion, where Wikipedians decide whether to delete or keep articles. If a redirect is deleted, viewers are taken to a search results page and almost nobody notices. That said, because redirects are almost never viewed directly, almost nobody notices if a redirect is kept either!

I’ve told people that if they want to understand the soul of a Wikipedian, they should spend time participating in RfD. When you understand why arguing about and working hard to come to consensus solutions for how Wikipedia should handle individual redirects is an enjoyable way to spend your spare time — where any outcome is invisible — you understand what it means to be a Wikipedian.

That said, wiki researchers rarely take redirects into account. For years, I’ve suspected that accounting for redirects was important for Wikipedia research and that several classes of findings were noisy or misleading because most people haven’t done so. As a result, I worked with my colleague Aaron Shaw at Northwestern earlier this year to build a longitudinal dataset of redirects that can capture the dynamic nature of redirects. Our work was published as a short paper at OpenSym several months ago.

It turns out that taking redirects into account correctly (especially if you are looking at activity over time) is tricky, because redirects are stored as normal pages by MediaWiki except that they happen to start with special redirect text. Like other pages, redirects can be updated and changed over time, and frequently are. As a result, taking redirects into account for any study that looks at activity over time requires looking at the text of every revision of every page.
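In code, the per-revision check boils down to matching the redirect magic word at the start of the wikitext. The Python sketch below covers only the common English case; MediaWiki localises the magic word and the real syntax has more corner cases, so treat this as an illustration rather than a complete parser.

```python
import re

# MediaWiki recognises wikitext beginning with "#REDIRECT [[Target]]"
# (case-insensitive) as a redirect page. Simplified: ignores localised
# magic words and some syntax variants.
REDIRECT_RE = re.compile(r'^\s*#REDIRECT\s*\[\[([^\]|#]+)', re.IGNORECASE)

def redirect_target(wikitext):
    """Return the redirect target of a revision, or None if it is not a redirect."""
    m = REDIRECT_RE.match(wikitext)
    return m.group(1).strip() if m else None

print(redirect_target("#REDIRECT [[Seattle]]"))        # Seattle
print(redirect_target("'''Seattle''' is a city..."))   # None
```

Run over every revision of a page, a check like this yields the on/off timeline of redirect status that a longitudinal dataset needs.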

Using our dataset, Aaron and I showed that the distribution of edits across pages in English Wikipedia (a relationship that is used in many research projects) looks pretty close to log-normal when we remove redirects, and very different when we don’t. After all, half of all articles are really just redirects, and because they are just redirects, these “articles” are almost never edited.
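The shape argument can be illustrated with the standard library alone. The edit counts below are invented for the example and are not taken from our dataset; the point is only that a mass of once-edited redirect pages drags the parameters of a log-normal fit off.

```python
import math
import statistics

# Hypothetical edit counts, invented for illustration: "real" articles spread
# over orders of magnitude, while redirect pages sit at about one edit each.
articles = [5, 12, 40, 7, 3, 90, 22, 15, 8, 60]
redirects = [1] * 10

def log_stats(counts):
    """Mean and standard deviation of log edit counts: the parameters
    a log-normal fit would estimate."""
    logs = [math.log(c) for c in counts]
    return statistics.mean(logs), statistics.stdev(logs)

without_redirects = log_stats(articles)
with_redirects = log_stats(articles + redirects)
# Including redirects drags the mean toward zero and inflates the spread,
# distorting the fitted distribution.
print(without_redirects)
print(with_redirects)
```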

Another puzzling finding that’s been reported in a few places — and that I repeated myself several times — is that edits and views are surprisingly uncorrelated. I’ll write more about this later but the short version is that we found that a big chunk of this can, in fact, be explained by considering redirects.

We’ve published our code and data and the article itself is online because we paid the ACM’s open access fee to ransom the article.

by Benjamin Mako Hill at June 14, 2017 05:45 PM

The Wikipedia Adventure

I recently finished a paper that presents a novel social computing system called the Wikipedia Adventure. The system was a gamified tutorial for new Wikipedia editors. Working with the tutorial creators, we conducted both a survey of its users and a randomized field experiment testing its effectiveness in encouraging subsequent contributions. We found that although users loved it, it did not affect subsequent participation rates.

Start screen for the Wikipedia Adventure.

A major concern that many online communities face is how to attract and retain new contributors. Despite its success, Wikipedia is no different. In fact, researchers have shown that after experiencing a massive initial surge in activity, the number of active editors on Wikipedia has been in slow decline since 2007.

The number of active, registered editors (≥5 edits per month) to Wikipedia over time. From Halfaker, Geiger, and Morgan 2012.

Research has attributed a large part of this decline to the hostile environment that newcomers experience when they begin contributing. New editors often attempt to make contributions that are subsequently reverted by more experienced editors for not following Wikipedia’s increasingly long list of rules and guidelines for effective participation.

This problem has led many researchers and Wikipedians to wonder how to more effectively onboard newcomers to the community. How do you ensure that new editors to Wikipedia quickly gain the knowledge they need in order to make contributions that are in line with community norms?

To this end, Jake Orlowitz and Jonathan Morgan from the Wikimedia Foundation worked with a team of Wikipedians to create a structured, interactive tutorial called The Wikipedia Adventure. The idea behind this system was that new editors would be invited to use it shortly after creating a new account on Wikipedia, and it would provide a step-by-step overview of the basics of editing.

The Wikipedia Adventure was designed to address issues that new editors frequently encountered while learning how to contribute to Wikipedia. It is structured into different ‘missions’ that guide users through various aspects of participation on Wikipedia, including how to communicate with other editors, how to cite sources, and how to ensure that edits present a neutral point of view. The sequence of the missions gives newbies an overview of what they need to know instead of having to figure everything out themselves. Additionally, the theme and tone of the tutorial sought to engage new users, rather than just redirecting them to the troves of policy pages.

Those who play the tutorial receive automated badges on their user page for every mission they complete. This signals to veteran editors that the user is acting in good-faith by attempting to learn the norms of Wikipedia.

An example of a badge that a user receives after demonstrating the skills to communicate with other users on Wikipedia.

Once the system was built, we were interested in knowing whether people enjoyed using it and found it helpful. So we conducted a survey asking editors who played the Wikipedia Adventure a number of questions about its design and educational effectiveness. Overall, we found that users had a very favorable opinion of the system and found it useful.

Survey responses about how users felt about TWA.
Survey responses about what users learned through TWA.

We were heartened by these results. We’d sought to build an orientation system that was engaging and educational, and our survey responses suggested that we succeeded on that front. This led us to ask the question – could an intervention like the Wikipedia Adventure help reverse the trend of a declining editor base on Wikipedia? In particular, would exposing new editors to the Wikipedia Adventure lead them to make more contributions to the community?

To find out, we conducted a field experiment on a population of new editors on Wikipedia. We identified 1,967 newly created accounts that passed a basic test of making good-faith edits. We then randomly invited 1,751 of these users via their talk page to play the Wikipedia Adventure. The rest were sent no invitation. Out of those who were invited, 386 completed at least some portion of the tutorial.

We were interested in knowing whether those we invited to play the tutorial (our treatment group) and those we didn’t (our control group) contributed differently in the first six months after they created accounts on Wikipedia. Specifically, we wanted to know whether there was a difference in the total number of edits they made to Wikipedia, the number of edits they made to talk pages, and the average quality of their edits as measured by content persistence.

We conducted two kinds of analyses on our dataset. First, we estimated the effect of inviting users to play the Wikipedia Adventure on our three outcomes of interest. Second, we estimated the effect of playing the Wikipedia Adventure, conditional on having been invited to do so, on those same outcomes.
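In rough terms, the two estimates correspond to an intent-to-treat comparison and an adjustment by the uptake rate. A hypothetical sketch (not the code from the paper; the edit counts below are made up):

```python
# Hypothetical sketch: estimate (1) the effect of being invited (intent to
# treat) as a difference in mean edit counts, and (2) the effect of actually
# playing, by scaling the ITT estimate by the fraction of invited users who
# played (386 / 1751 in the experiment).

def mean(xs):
    return sum(xs) / len(xs)

def estimate_effects(invited_edits, control_edits, n_played, n_invited):
    itt = mean(invited_edits) - mean(control_edits)
    uptake = n_played / n_invited
    return itt, itt / uptake  # (invitation effect, effect on players)

# Made-up per-user edit counts, for illustration only:
itt, played_effect = estimate_effects([4, 0, 7, 1], [3, 2, 5, 0], 386, 1751)
```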

To our surprise, we found that in both cases there were no significant effects on any of the outcomes of interest. Being invited to play the Wikipedia Adventure therefore had no effect on new users’ volume of participation either on Wikipedia in general, or on talk pages specifically, nor did it have any effect on the average quality of edits made by the users in our study. Despite the very positive feedback that the system received in the survey evaluation stage, it did not produce a significant change in newcomer contribution behavior. We concluded that the system by itself could not reverse the trend of newcomer attrition on Wikipedia.

Why would a system that was received so positively ultimately produce no aggregate effect on newcomer participation? We’ve identified a few possible reasons. One is that perhaps a tutorial by itself would not be sufficient to counter hostile behavior that newcomers might experience from experienced editors. Indeed, the friendly, welcoming tone of the Wikipedia Adventure might contrast with strongly worded messages that new editors receive from veteran editors or bots. Another explanation might be that users enjoyed playing the Wikipedia Adventure, but did not enjoy editing Wikipedia. After all, the two activities draw on different kinds of motivations. Finally, the system required new users to choose to play the tutorial. Maybe people who chose to play would have gone on to edit in similar ways without the tutorial.

Ultimately, this work shows us the importance of testing systems outside of lab studies. The Wikipedia Adventure was built by community members to address known gaps in the onboarding process, and our survey showed that users responded well to its design.

While it would have been easy to declare victory at that stage, the field deployment study painted a different picture. Systems like the Wikipedia Adventure may inform the design of future orientation systems. That said, more profound changes to the interface or modes of interaction between editors might also be needed to increase contributions from newcomers.

This blog post, and the open access paper that it describes, is a collaborative project with Sneha Narayan, Jake Orlowitz, Jonathan Morgan, and Aaron Shaw. Financial support came from the US National Science Foundation (grants IIS-1617129 and IIS-1617468), Northwestern University, and the University of Washington. We also published all the data and code necessary to reproduce our analysis in a repository in the Harvard Dataverse. Sneha posted the material in this blog post over on the Community Data Science Collective Blog.

by Benjamin Mako Hill at June 14, 2017 05:45 PM

WMF Release Engineering

New feature: Embed videos from Commons into Phabricator markup

I just finished deploying an update to Phabricator which includes a simple but rather useful feature:

T116515: Enable embedding of media from Wikimedia Commons

You can now embed videos from Wikimedia Commons into any Task, Comment or Post. Just paste the Commons URL, and the standard Commons player is embedded in an iframe as a playable video.

by mmodell (Mukunda Modell) at June 14, 2017 04:36 AM

Wikimedia Performance Team

Looking back: improvements to edit save time

The WMF's financial year and its annual plan are coming to an end, and one of the Performance team's goals this past year was to reduce the amount of time it takes to save an edit on a wiki.

This set of metrics, which we call Save Timing, is publicly tracked on Grafana. It's recorded for all Wikimedia wikis. It's a critical performance pain point for editors, as edits on large wiki pages can sometimes take seconds to save.

We distinguish the amount of time the backend takes to process an edit from the amount of time the end user actually experiences when saving it (collected client-side). We'll focus on the latter, as this is what people really experience; backend traffic can also come from bots, jobs, etc., where long execution times atypical of human edits affect the metrics.

Let's look at the evolution of frontend save timing since the beginning of the financial year, on July 1st 2016.

The 99th percentile, which represents the slowest editors' experience, dropped significantly:

Going from 22.4 to 16.82 seconds (weekly average), a 25% improvement.

So did the median:

Going from 953 to 813 milliseconds (weekly average), a 15% improvement.
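As a reminder of what these two numbers mean, the median and the 99th percentile are just two points on the distribution of per-edit save times. A toy illustration (not the actual Grafana pipeline), using nearest-rank percentiles over made-up samples:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

save_times_ms = [640, 720, 810, 950, 1200, 2400, 16800]  # made-up samples
print(percentile(save_times_ms, 50))  # 950 (median: the typical edit)
print(percentile(save_times_ms, 99))  # 16800 (p99: dominated by slow saves)
```

The p99 is far larger than the median because a few very large pages take many seconds to save, which is why it is tracked separately.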

@aaron deserves most of the credit for this tremendous performance improvement that editors experience every day. Performance is a never-ending goal and we hope to achieve even better save timing in the future thanks to our continued work in this area.

by Gilles (Gilles Dubuc) at June 14, 2017 01:56 AM

June 13, 2017

Jeroen De Dauw

OOP file_get_contents

I’m happy to announce the immediate availability of FileFetcher 4.0.0.

FileFetcher is a small PHP library that provides an OO way to retrieve the contents of files.

What’s OO about such an interface? You can inject an implementation of it into a class, so that the class does not know the details of the implementation and you can choose which implementation to provide. Calling file_get_contents does not allow changing the implementation, as it is a procedural/static call making use of global state.

Library number 8234803417 that does this exact thing? Probably not. The philosophy behind this library is to provide a very basic interface (FileFetcher) that while insufficient for plenty of use cases, is ideal for a great many, in particular replacing procedural file_get_contents calls. The provided implementations are to facilitate testing and common generic tasks around the actual file fetching. You are encouraged to create your own core file fetching implementation in your codebase, presumably an adapter to a library that focuses on this task such as Guzzle.
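The same idea translated into Python rather than PHP (an illustrative analogue, not the library's actual code; the ConfigReader consumer is hypothetical): a class receives a fetcher through its constructor, so production code can use a real implementation while tests inject an in-memory one.

```python
from abc import ABC, abstractmethod

class FileFetcher(ABC):
    """The injected interface: deliberately tiny, just enough to
    replace procedural file_get_contents-style calls."""
    @abstractmethod
    def fetch_file(self, file_url: str) -> str: ...

class SimpleFileFetcher(FileFetcher):
    def fetch_file(self, file_url: str) -> str:
        with open(file_url) as f:  # stands in for file_get_contents
            return f.read()

class InMemoryFileFetcher(FileFetcher):
    def __init__(self, files: dict):
        self._files = files
    def fetch_file(self, file_url: str) -> str:
        return self._files[file_url]

class ConfigReader:
    """Hypothetical consumer: it only knows the interface, never the
    concrete implementation."""
    def __init__(self, fetcher: FileFetcher):
        self._fetcher = fetcher
    def read(self, file_url: str) -> str:
        return self._fetcher.fetch_file(file_url)

reader = ConfigReader(InMemoryFileFetcher({"app/config.json": "{}"}))
print(reader.read("app/config.json"))  # {}
```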

So what is in it then? At its heart, the library provides several trivial implementations of the FileFetcher interface:

  • SimpleFileFetcher: Adapter around file_get_contents
  • InMemoryFileFetcher: Adapter around an array provided to its constructor
  • ThrowingFileFetcher: Throws a FileFetchingException for all calls (added in 4.0)
  • NullFileFetcher: Returns an empty string for all calls (added in 4.0)
  • StubFileFetcher: Returns a stub value for all calls (added in 4.0)

It also provides a number of generic decorators:

Version 4.0.0 brings PHP7 features (scalar type hints \o/) and adds a few extra handy implementations. You can add the library to your composer.json (jeroen/file-fetcher) or look at the documentation on GitHub. You can also read about its inception in 2013.

by Jeroen at June 13, 2017 02:24 PM

Gerard Meijssen

#Wikidata some assertions

Wikidata is no different from any community: there are differences of opinion. Everybody has his or her own perspective, but there are assertions that can be made that have a more universal resonance.

The assertions below represent the underlying arguments I use in my blog posts and in the discussions I take part in. They are the ones I feel are not necessarily "political" and do not have a negative impact.
  1. There is no data store without problems; this includes Wikipedia and Wikidata.
  2. The data we hold is best understood by applying set theory. The data in Wikidata consists of many subsets; probably the most valuable subset for the WMF is the interwiki links.
  3. The error rate in each subset can be assessed and is by definition different from the overall Wikidata error rate.
  4. The absence of data often indicates a bias in the data Wikidata holds. A good example is the lack of data relevant to the global south.
  5. Given the huge influx of data from Wikipedia, the biggest imports have been from English Wikipedia, and this is one reason for the existing biases in Wikidata.
  6. An absence of data prevents the application of tools. Tools may suggest writing a Wikipedia article; tools may compare data with other sources.
  7. Concentrating on the differences between Wikidata and any other data source is the best way of improving the quality of existing data in either data set.
  8. Having an application for the data in Wikidata is the best way of improving the usefulness of a subset of data.
  9. Each contributor to Wikidata works on the data set(s) of his or her own choice; these data sets interact in the whole of Wikidata. This may raise issues, and that cannot always be avoided.
  10. Examples of problematic data must be seen in the light of the total data set they are part of. Statistically they may be irrelevant.
  11. No matter how "bad" an external data source is, when its maintainers are willing to cooperate on the identification and curation of mutual differences, it is worthy of collaboration.
  12. Wikidata improves continually; as such it is "purrfect", but it will never be perfect.
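Assertion 3 is easy to illustrate with made-up numbers: subsets with very different error rates combine into one overall rate that describes none of them.

```python
# Made-up illustration: per-subset error rates versus the overall rate.
# Each subset maps to (items, erroneous items); all figures are invented.
subsets = {
    "interwiki links": (1_000_000, 1_000),   # 0.1% error rate
    "global-south people": (50_000, 2_500),  # 5% error rate
}

for name, (items, errors) in subsets.items():
    print(f"{name}: {errors / items:.2%}")

total_items = sum(i for i, _ in subsets.values())
total_errors = sum(e for _, e in subsets.values())
print(f"overall: {total_errors / total_items:.2%}")
```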

by Gerard Meijssen (noreply@blogger.com) at June 13, 2017 06:07 AM

June 12, 2017

Wiki Education Foundation

Students expand coverage of country-specific environmental issues

If you’re interested in reading a broad overview of an environmental topic, there’s a very good chance you’ll find an article about it on Wikipedia. If you want in-depth information about the topic as it pertains to a specific country, however, you’ll probably only be able to find information about a small number of developed countries. It is in this kind of situation that student editors are well positioned to make a difference by filling in gaps in coverage. Wikipedia has a series of articles in the format “Environmental issues in [country].” The series is still far from complete, but it is substantially better than it was before student editors in Tiffany Linton Page’s Advanced Studies in Development Studies course got to work creating and expanding many countries’ articles.

Environmental issues in [country] articles on Wikipedia as of June 7, 2017. Blue links represent existing articles; red links do not yet exist.

The United Arab Emirates (UAE) is a small coastal country that is largely desert. Urban development, agriculture and wildlife habitat all compete for the country’s limited land base. The combination of increasing population, rising sea levels and increased aridity all put pressure on this limited resource. While these environmental problems exist in many parts of the world, the mainstay of the economy of the UAE, fossil fuel production, is a major factor behind many of these problems. You can now read about these issues and how they interrelate in the article on environmental issues in the United Arab Emirates, which was created by a student in this class.

Students in the class created new articles on environmental issues in Kuwait, Yemen, Israel, Sri Lanka, Bangladesh, Mongolia and Georgia. Others expanded existing articles on environmental issues in Pakistan, Uruguay, Haiti, and Colombia. Climate change, population growth and water pollution are problems in most of these countries. Deforestation, desertification, and mangrove loss pose major problems in only a subset of them. Thanks to the work by these student editors, the picture is far more complete.

Other students in the class focused their work on subnational geographical entities or on single problems; some created environmental impact of development in the Sundarbans and mangrove deforestation in Myanmar, while others expanded entries on deforestation in Cambodia, electronic waste in China, and the geography of Uzbekistan. Two other new articles were created by the class: climate change and indigenous persons and Emissions Trading Scheme in South Korea. Other existing articles expanded by the class include deep ecology, green development, Consejo Nacional de Areas Protegidas, payment for ecosystem services, the Paris Agreement, sustainable procurement, underdevelopment, criticisms of globalization, and poaching. Through their contributions to Wikipedia, these students expanded the body of knowledge readily available on important topics that have been, for the most part, poorly represented.

To learn more about how to get involved, send us an email at contact@wikiedu.org or visit teach.wikiedu.org.

Image: Flooding after 1991 cyclone.jpg, by Val Gempis, public domain, via Wikimedia Commons.

by Ian Ramjohn at June 12, 2017 05:40 PM

Wikimedia Foundation

How will external forces hinder or help the future of the Wikimedia movement?

Photo by Thomas Bresson, CC BY-SA 3.0.

What are the key trends and ideas that will influence the success of the Wikimedia movement in the coming 15 years? This is the question we are delving into as we work alongside the Wikimedia Foundation in its strategic planning process.

We started the work by homing in on the five topics we think are most important to consider for Wikimedia 2030. We offer them here for your consideration and input. Besides researching these themes over the coming weeks, we will also be talking to dozens of nonprofit organizers, tech field leaders, journalists, and researchers to hear their thoughts and speculations about the world Wikimedia projects and participants will inhabit in 2030, and how best to prepare.

This post is the first of several invitations to learn about and get involved in our research project. Future posts will share the information and ideas we are synthesizing, offer the opportunity for dialogue, and provide links to key research and readings we find useful. Aggregate trends and insights will be offered at Wikimania 2017 and will be shared afterwards in a final report.

The five research themes we are exploring:

  • Demographics: Who is in the world in 2030? What places will the most people call home? Will there be more people over or under the age of 30? Will bots outnumber people? We will give a satellite-high overview of global population trends, focusing on how the biggest growth may be happening within places where Wikimedia has significant headroom for participation and expansion. We will cover trends in technology, literacy, open society, educational attainment, and other key factors as they pertain to our central research themes and to the Wikimedia movement’s future.
  • Future of the commons:  What are threats–and what are hopes–for the free flow of knowledge? Many forces are at play that could lead to the contraction or expansion of the open web—from people inhabiting ever-smaller, disconnected filter bubbles online, to at the other extreme, demanding a more open, free, and interconnected information commons. We will sketch out different scenarios around the issues most important to the Wikimedia community: access, censorship, privacy, copyright, and intermediary liability. We will also identify some of the most powerful actors that might shape these futures, from governments to nonprofit standard-setters to corporate agents to malicious individuals.
  • Platforms and content: How will people’s media consumption change, who will be producing that media, and how will they do it? New technologies for communication and information sharing continue to emerge daily. By 2030, what will users expect in terms of the nature of their media consumption and production experiences? Media prognosticators promise inventions that will engage all five senses and turn our brains into a joystick in the process. What interfaces will people regularly use to access and create content? Where will they go to find information, entertainment, distraction or connections, and how will they expect to interact with these activities? How widely available will new tools be? We will consider a range of hardware, software, and content possibilities, from the imminent to the speculative, and examine what these might mean for ways the Wikimedia community evolves.
  • Future of reference and reading: What new information-seeking and creation behaviors are going to emerge? If we used to go to the shelf for the encyclopedia, and now we reach for the phone, what will we be doing in 2030? We’ll take a look at new forms of literacy beyond text and images, the transformation of formal and informal education settings, and problems related to verification. How will people collaborate around complex topics and come to shared understandings in an immersive and densely networked future? How will students employ technology for school work, and who will be creating content, as technology makes it possible for non-experts to create animation, or design games? What skills will adults need to continue to learn in a rapidly transforming world?
  • Misinformation: What can be done to make the knowledge we seek more trustworthy? And what is the next fake news frontier? Traditionally, Wikimedians have relied on transparency of the editing process and hyperlinks to sources to help readers decide if a given entry is comprehensive and fact-based. What would a hyper-transparency look like, where more layers are revealed, showing not just a link to a source but allowing ways to see the context of that source, including its provenance, and how it fits within the universe of sources? How will corporate or government censorship or algorithmic models shape public conversation, for bad and for good? And what will the bots be up to?

We welcome your comments and your contributions of links to relevant readings and research we can consider. In the meantime, we are honing our interview list, reading a lot and getting started on sourcing these critical questions.

Our perspective? As a service and a movement that millions rely on every day, Wikimedia’s future vitality is important to everyone on the planet. We are energized by our involvement in futures-facing research that can help guide Wikimedia strategy.

Jessica Clark, Dot Connector Studio
Sarah Lutman, Lutman & Associates

Dot Connector Studio is a Philadelphia-based media research and strategy firm focused on how emerging platforms can be used for social impact.

Lutman & Associates is a St. Paul-based strategy, planning, and evaluation firm focused on the intersections of culture, media, and philanthropy.

by Jessica Clark and Sarah Lutman at June 12, 2017 04:42 PM

Gerard Meijssen

#Causegraph, another way of looking at #Wikidata

Causegraph is a tool to visualize and analyze cause/influence relationships using Wikidata. If you have not seen it yet, give it a spin.

Randomly looking at the galaxy of relations, I found one Charles Frédéric Bassenge; he is in Wikidata because he is the father of Pauline Runge, and she is in Wikidata because she has an entry in WikiTree. What amazes me most is the quality of the data for the father despite his absence in WikiTree.

Causegraph works on the basis of there being a direct relation between two persons. For Jacob Palis, the doctoral students and doctoral advisers are included, but not the other TWAS award winners.

What is really good is that it is regularly updated. It would be even better if it were a Labs tool. This might enable real-time updates .. <grin> there is always a wish for more and better </grin>

by Gerard Meijssen (noreply@blogger.com) at June 12, 2017 12:40 PM

This month in GLAM

This Month in GLAM: May 2017

by Admin at June 12, 2017 05:44 AM

Tech News

Tech News issue #24, 2017 (June 12, 2017)

2017, week 24 (Monday 12 June 2017)

June 12, 2017 12:00 AM

June 11, 2017

Gerard Meijssen

How #Wikipedia gets into @Africa

This is a map showing how fiber is getting into Africa. The blind spots are where the Internet does not go. The red lines are where the future of the Wikimedia movement lies.

by Gerard Meijssen (noreply@blogger.com) at June 11, 2017 09:18 PM

#Wikidata - Premio Almirante Álvaro Alberto

The Premio Almirante Álvaro Alberto is named after admiral Álvaro Alberto da Mota e Silva. They are both notable for their own reasons.

The award was mentioned in an article on the German Wikipedia for César Camacho. The award was not known to Wikidata and was added. The website of the conferring organisation gives me the impression that it is the "National Council for Scientific and Technological Development" and part of the Brazilian ministry of sciences. When you look for it in Wikidata, it is embarrassing.

The admiral is probably a child of his time. He was both a military man and a very relevant scientist. As a military man he held the rank of vice admiral, and as a scientist he was twice president of the academy of sciences. He was also very much involved in the Brazilian nuclear program.

When you consider the notability of Brazil, it is astounding how little is known in Wikidata. Many politicians have been added for Brazil; national senators and deputies. 

Brazil is, I think, one of the top twenty countries in the world; when you consider any and all of the "lesser" countries, it is obvious that we know even less. If Wikipedia, and by inference Wikidata, is to be about the sum of all knowledge, there is a lot of white space where all our tools have no impact.

by Gerard Meijssen (noreply@blogger.com) at June 11, 2017 04:42 PM

June 10, 2017

Weekly OSM

weeklyOSM 359


New Walkthrough in iD

An example of the revised walkthrough in iD 1 | © Mapbox, OpenStreetMap contributors CC-BY


  • Bryan posted a blog entry, A friendlier introduction to editing OpenStreetMap, which focuses on improving the iD editor walkthrough, a tutorial that guides new users through some basic editing tasks and teaches them the skills and confidence to improve OpenStreetMap.
  • GeoMappando wrote (it) (automatic translation) a great introduction to OSM, that includes data quality, licensing, database layout and the tagging system. This post is the beginning of a series: stay tuned for the second post about database exports and file formats.
  • The proposal to extend the ‘aeroway’ tagging with spaceport infrastructure has been accepted with 15 to 3 votes.
  • The tagging list discusses an appropriate tagging of restrictions for vehicles fueled by liquefied or compressed gas. This applies to underground parking facilities as well as the Chunnel, for instance.
  • This thread on the tagging mailing list reviews tags for truck parking.


  • Ouizi notes (fr) that OpenStreetMap is more detailed than any other map provider.
  • User piligab from Peru published an OSM diary, “My first year at Mapbox working with OpenStreetMap”. She talks about her experience working at Mapbox and making OSM the best map of the world. By the way, she regularly contributes to the Spanish version of weeklyOSM as well. 😉
  • Chris Hill noticed new notes around Hull created by Street Complete. He installed the app and expressed dismay that it adds unnecessary information, such as surface=asphalt, and does not inform users that their answers are added to OSM.
  • Various events took place for the students throughout Avignon as a part of the Education OSM program for this year’s SotM-FR. More information here.
  • The first #Geobeers get together of the Paraguayan OSM community took place in Asunción to discuss new projects.

OpenStreetMap Foundation

  • After a failure of the current hard disks, the company Metanet donated 8 SAS disks to the Swiss OpenStreetMap association. This ensures the operation of the server for the next 1 to 2 years. A big thank you to Metanet!
  • Simon Poole released the first draft of the geocoding guidelines. Feedback can be given until the end of June.


  • This year’s AGIT, the “largest yearly conference and fair about geoinformation”, will take place from July 5th to 7th in Salzburg (automatic translation). OSM will be featured.
  • Some nice items from the SotMFR 2017 in Avignon:
  • State of the Map US 2017 will be held in Boulder, Colorado, this year over the weekend of October 19th-22nd. Check out their website for more information.

Humanitarian OSM

  • David Luswata of HOT US reports on the LEGIT team completing the field mapping in Zwedru City.
  • The Global Facility for Disaster Reduction and Recovery (GFDRR) is currently seeking a short-term consultant (8 months) who will work in Kampala, Uganda. Note the deadline: June 9th, 2017.
  • Melanie Eckle published the meeting minutes of the HOT board meeting that happened on 1st of June 2017.
  • The Open Data for Resilience Initiative (OpenDRI) is hiring a full-time consultant for their office in Washington D.C.


  • uMap prepared custom Avignon maps on the occasion of SoTM-France.


  • The development (coding) phase of Google Summer of Code (GSoC) started on May 30th. OSM has five accepted projects. In addition students from OSGeo, KDE and Green Navigation will work on applications using OSM data.
  • Andy Allan describes his work on “Factory Refactoring”, a significant change in OSM website codebase that makes the test suite more reliable.
  • OSM data are now available on Amazon Web Services (AWS), as snapshots, historical archives and changesets.
  • Anita Graser reports about a prototype for pedestrian navigation using OSM data. Novel features include: routing through areas, and navigation by landmark (“turn left at the school”).


Software Version Release date Comment
Naviki iOS * 3.60 2017-05-16 Stability in recording increased, reports revised and bug fixes.
Mapillary iOS * 4.6.17 2017-05-31 Bugfix release.
Naviki Android * 3.60 2017-05-31 Stability in recording increased, reports revised and bug fixes.
Locus Map Free * 3.24.1 2017-06-01 Please read release info.
Komoot Android * var 2017-06-02 No infos.
Kurviger Free * 10.0.27 2017-06-02 Map styles, avoiding unpaved roads and other improvements.
Mapillary Android * 3.59 2017-06-02 Improved images upload process and some fixes.
OsMo Android 2.4.11 2017-06-03 OsmAnd integration, new remote commands and some bugfixes.

Provided by the OSM Software Watchlist. Timestamp: 2017-06-05 17:28:17+02 UTC

(*) unfree software. See: freesoftware.

Other “geo” things

  • A recent post on the Data Is Beautiful subreddit, which went viral, shows community-made animations transforming the subway maps of 15 cities into their actual geography.
  • Wired publishes an article about air quality mapping at hyper-local scale and transboundary impact of global air pollution.
  • Mapbox developer Antonio Zugaldia worked on a connection for Amazon Alexa for navigation and traffic information.
  • Apple plans to improve (automatic translation) their maps by trained crowdworkers. A comment on Heise wonders why Apple wouldn’t use the much better OSM data instead, supporting the OSM community in return, which could mean a win-win deal.
  • The European Space Agency (ESA) announced the release of information, satellite imagery and associated geodata under the CC BY-SA 3.0 IGO licence, as well as an unprecedented prize for Copernicus Masters, the largest international competition in the commercial use of Earth observation data.
  • Last Thursday Japan sent the second of four satellites of its Quasi-Zenith Satellite System into space. They will send GPS-compatible signals and augmentation values. Because of their elliptical, geosynchronous orbits, the satellites will stay in the zenith over Japan for a long time and will provide highly accurate positioning, to around 10 centimeters.

Upcoming Events

Where What When Country
Berlin #CompletetheMap – Mapillary photo mapping 25/05/2017-15/06/2017 germany
Russia Tula Mapping Party, Tula 10/06/2017-11/06/2017
Suita 【西国街道#05・初心者向け】万博探索マッピングパーティ 10/06/2017 japan
Tokyo 第1回 東京!街歩かない!マッピングバーティ 10/06/2017 japan
Manila San Juan City Mapa-thon by MapAm❤re – Juan more time!, San Juan 10/06/2017 philippines
Passau Mappertreffen 12/06/2017 germany
Rennes Réunion mensuelle 12/06/2017 france
Nantes Rencontres mensuelles 13/06/2017 france
Lyon Rencontre mensuelle libre 13/06/2017 france
Freiberg Stammtisch Freiberg 15/06/2017 germany
Leipzig Stammtisch Leipzig 15/06/2017 germany
Zittau OSM-Stammtisch Zittau 16/06/2017 germany
Tokyo 東京!街歩き!マッピングパーティ:第9回 旧芝離宮恩賜庭園 17/06/2017 japan
Bonn Bonner Stammtisch 20/06/2017 germany
Lüneburg Mappertreffen Lüneburg 20/06/2017 germany
Nottingham Nottingham Pub Meetup 20/06/2017 united kingdom
Scotland Edinburgh 20/06/2017 united kingdom
Karlsruhe Stammtisch 21/06/2017 germany
Lübeck Lübecker Mappertreffen 22/06/2017 germany
Essen 8. FOSSGIS Hacking Event im Linuxhotel 23/06/2017-25/06/2017 germany
Essen SommerCamp 2017 23/06/2017-25/06/2017 germany
Bremen Bremer Mappertreffen 26/06/2017 germany
Salzburg AGIT2017 05/07/2017-07/07/2017 austria
Kampala State of the Map Africa 2017 08/07/2017-10/07/2017 uganda
Champs-sur-Marne (Marne-la-Vallée) FOSS4G Europe 2017 at ENSG Cité Descartes 18/07/2017-22/07/2017 france
Boston FOSS4G 2017 14/08/2017-19/08/2017 united states
Aizu-wakamatsu Shi State of the Map 2017 18/08/2017-20/08/2017 japan
Patan State of the Map Asia 2017 23/09/2017-24/09/2017 nepal
Boulder State of the Map U.S. 2017 19/10/2017-22/10/2017 united states
Buenos Aires FOSS4G+State of the Map Argentina 2017 23/10/2017-28/10/2017 argentina
Lima State of the Map LatAm 2017 29/11/2017-02/12/2017 perú

Note: If you would like to see your event here, please add it to the calendar. Only events that are in the calendar will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Anne Ghisla, Peda, Polyglot, Rogehm, SK53, derFred, jcoupey, jinalfoflia, keithonearth, kreuzschnabel, wambacher.

by weeklyteam at June 10, 2017 06:21 AM

Gerard Meijssen

#Wikidata - #diversity of #science - Professor Govind Swarup

Professor Swarup received the TWAS Prize, an annual award instituted in 1985 by The World Academy of Sciences to recognise excellence in scientific research in the global South. It follows that when attention is given to scientists like Mr Swarup, it should become easy to link to other scientists, particularly those from the global South.

With twenty-two awards, Mr Swarup does not disappoint. Many of the awards are from India; one of the conferring organisations, the Indian Science Congress Association, lists 41 awards. Its rule that a scientist may now receive only one of its awards in a lifetime indicates just how many scientists the ISCA recognises.

Making the list of TWAS Prize winners more complete by adding their awards helps to improve the diversity of scientists represented. It is not only women who have not been fully recognised; scientists from the global South have been overlooked as well.

by Gerard Meijssen (noreply@blogger.com) at June 10, 2017 05:12 AM

June 09, 2017

Wikimedia Foundation

Ten community-led projects awarded Project Grants

The tomb of Bibi Jawindi in Uch Sharif, Punjab, Pakistan, captured for Wiki Loves Monuments 2016. Photo by Usamashahid433, CC BY-SA 4.0.

We are excited to announce the successful grantees from the third round of the Wikimedia Foundation’s Project Grants program.

Project Grants support individuals, groups and organizations to implement new experiments and proven ideas, whether focused on building a new tool or gadget, organizing a better process on your wiki, researching an important issue, coordinating an editathon series or providing other support for community-building.

We launched the Project Grants program in 2016 to pilot new program designs created in response to community feedback. In early 2017, after two rounds of funding, we conducted a community survey to understand how the changes have impacted our grantees. Our report on the results of that survey are now available on Meta.

Project Grants are reviewed by a volunteer committee currently made up of 17 Wikimedians who come from over 13 different wikis and collectively speak over 15 languages. Outside of our Project Grant committee work, members edit, review, and translate content; help govern local chapters; write software; organize off-wiki events; facilitate workshops; work as sysops and bureaucrats; verify copyright and licensing permissions; draft and discuss project policies; and recruit and welcome new users to Wikimedia projects. Many members also serve as advisors to new grantees, helping to answer questions, connect them to relevant resources, and comment on monthly and midpoint reports.

In this latest round, 32 eligible proposals were submitted for the committee’s review. The committee has recommended that ten projects be funded to receive $224,900, divided into three themes: online organizing, offline outreach, and software. Here is what we are funding:

Online organizing: two projects funded

The ceiling in the Sioni Cathedral in Tbilisi, seen from Wiki Loves Monuments 2016. Photo by Diego Delso, CC BY-SA 4.0.

  • Wiki Loves Monuments 2017 coordination: This year, the Wiki Loves Monuments International Team will continue to encourage content on diverse cultural heritage sites in this annually-run contest. By addressing the need for partnerships and development of tools to support participants, this project hopes to bolster local communities to contribute and collaborate on the Wikimedia projects, with a focus on Commons and Wikidata.
  • Contest toolkits and prize funds: Led by a prolific English Wikipedian, Dr.Blofeld, this project aims to equip prospective contest organizers with toolkits and design ideas, enabling them to customize their own campaigns. In addition, the project will include a large contest to boost the geographic diversity of the representation of women on English Wikipedia.

Offline outreach: seven projects funded

  • Wiki Loves Monuments in Perú: Recently-recognized affiliate Wikimedians of Peru User Group will organize a Wiki Loves Monuments contest through outreach efforts with the Ministry of Culture in Peru as well as with newer participants outside of Lima to contribute national and cultural heritage to the movement.
  • Multimedia Documentation of Traditional Trades and Crafts of Eastern, Northern and Up-Country Sri Lanka: Through extensive outreach in underrepresented regions in Sri Lanka, this project plans to expand knowledge of the traditional industries, agricultural trades, and crafts of Sri Lanka. With integrated support from the Noolaham Foundation, engagement with local communities through their livelihoods will be a significant step toward documenting cultural heritage in Tamil Wikipedia, Commons, Wiktionary, and Wikibooks.
  • Wikimedian in Residence at UNESCO 2017–18: As a follow-up to a previous Foundation grant with the United Nations Educational, Scientific and Cultural Organization (UNESCO), John_Cummings and Navino Evans will facilitate long-term infrastructure to support ongoing content donations, including media and text for Wikipedia and Commons, and structured data for Wikidata.  Using the UNESCO partnership as a model, they will educate and encourage other scientific and cultural institutions to contribute open license material.
  • Engaging with Academic Librarians and Sororities to Address the Gender Gap: A returning grantee from the Inspire Campaign, West Virginia University Libraries’ Wikipedian in Residence for Gender Equality will foster partnerships with three other academic institutions. The project will develop a scalable model for cross training student life and university library staff to promote Wikipedia editing as an option for sororities to meet their community service requirements.
  • UG GLAM Macedonia/Wikipedian in Residence: Through education and training, two Wikipedians in Residence will enable GLAM institutions in Skopje to contribute public domain material to Macedonian Wikimedia projects. This will be an opportunity for the Macedonian community to collaborate with the State Archives in the Republic of Macedonia and City Library “Braka Miladinovci,” and to target areas for volunteers to explore and release new content to the movement.

Software: one project funded

EveryPolitician will populate Wikidata with structured data reflecting interrelationships of heads of government. Such information has many applications, including empowering citizen activists who fight corruption in leadership around the world. Screenshot, public domain/CC0.

  • EveryPolitician: UK-based organization mySociety aims to populate Wikidata with well-structured, consistent data on elected officials from around the world. EveryPolitician combines data from multiple primary and secondary sources, comprising over 3.6 million data points on almost 73,000 politicians in 233 countries and territories. Through technical infrastructure and volunteer workflows, the project will establish ongoing updates to Wikidata about political leadership around the world, providing access to crucial information for citizens seeking to engage and advocate with their elected representatives.

Analysis of trends

Wikimedians in Residence

Wikimedians in Residence (WiRs) play an important role in our movement.   They serve as critical liaisons between mission-aligned partner organizations and our extensive community of volunteer contributors.  Through these partnerships, high quality content curated and maintained by the hosting organization becomes accessible online through the Wikimedia projects.  Ideally, the hosting organization funds the WiR’s work, though the Wikimedia Foundation occasionally offers supplemental funding support when the hosting organization is not able to fully cover costs and the specific opportunity is strategically valuable for the Wikimedia movement.  Foundation-funded WiRs do not directly create content; instead, they organize and empower volunteers with the resources available through the hosting organization in order to generate meaningful new content on our projects.  The goal is to leverage the partnership to build a platform that assures sustainable outcomes long after the WiR has completed their service.  WiRs might do this in many ways, including training organizational staff to upload content, implementing infrastructure to enable ongoing content donations, and creating online and offline opportunities for volunteers to engage in content creation and curation using those donations.

This year, we received six requests for Wikimedians in Residence and we have funded five of them. Two veteran WiRs will continue their existing work: John Cummings, now working with Navino Evans, will solidify workflows that will make Wikimedia projects ongoing recipients of UNESCO’s extensive data and collections; Kelly Doyle, based in the West Virginia University Libraries, will extend her reach to three more campuses, establishing a model to make editing Wikipedia a standardized component of sorority life across the United States. In addition, several new WiRs will serve at El Colegio de México, Goge Africa, the State Archives in the Republic of Macedonia and City Library “Braka Miladinovci”.

Wiki Loves Monuments

The oldest and perhaps best-known international photo contest in the Wikimedia movement, Wiki Loves Monuments (WLM) has been inspiring and galvanizing volunteers since its origin in 2010.  Every year, it drives widespread photo-documentation of the world’s built cultural heritage.  In addition to attracting jaw-droppingly beautiful photo contributions to our projects, the contest serves an important role in supporting Wikimedian communities.  Because it provides a clear, accessible procedure that volunteer groups with widely varying levels of experience can follow, WLM supports national-scale contests in more countries each year.  This offers both new and veteran groups a relatively simple opportunity to participate in an international activity with richly diverse cultural results.  Cumulatively, these results are significant:  according to the international organizing team, WLM has now brought together the largest collection of monument data in the world.

This year, we funded two requests for WLM activities:  The international coordinating team will support the umbrella infrastructure that makes the contest as a whole possible.  In addition, we will welcome a national contest in Perú.

We received many compelling proposals this year that the committee decided not to fund. We encourage applicants who were not successful in this round of funding to refine and resubmit their proposals in upcoming rounds or to pilot a smaller project in Rapid Grants. Return proposals that have been revised in response to community and committee feedback are warmly welcomed. The open call for Project Grants 2017 Round Two will launch on August 28, 2017, with applications due September 26, 2017.

We look forward to reviewing your suggestions and future submissions, but for now we say congratulations to the successful grantees and encourage you to follow their progress as they begin work in the coming weeks.

Marti Johnson, Program Officer, Individual Grants
Morgan Jue, Community Resources Contractor
Wikimedia Foundation

by Marti Johnson and Morgan Jue at June 09, 2017 07:44 PM

Michael Kim, investor and civic leader, joins the Wikimedia Endowment Advisory Board

Photo courtesy of Michael Kim.

Michael Kim—the founder and Managing Partner of Cendana Capital, an investment firm focused on early stage venture capital, and former trustee of the Asian Art Museum and San Francisco Employee Retirement System—has been appointed to the Wikimedia Endowment Advisory Board.

Michael joins Wikipedia founder Jimmy Wales, venture capitalist Annette Campbell-White, philanthropist and professor Peter Baldwin, and business leader Niels Christian Nielsen as the fifth member of the board that is entrusted with overseeing the Wikimedia Endowment, a permanent source of funding to ensure Wikipedia thrives for generations to come.

In addition to his expertise in investment strategy and venture capital, Michael is actively involved in arts and public service in San Francisco, where he and his family reside. In 2004, he was appointed by then-San Francisco Mayor Gavin Newsom to a five-year term as a Trustee of the San Francisco Employee Retirement System, a $20 billion pension fund, and served as President of the board and Chairman of the Investment Committee.

“Wikipedia is one of the most important cultural assets and public trusts of our time,” says Michael Kim. “Building lasting support for Wikipedia and its sister projects ensures and sustains innovation, growth and learning for generations to come, and I’m thrilled to do my part to help.”

Michael served as a Trustee of the Asian Art Museum Foundation for ten years, where he also was a member of the investment committee that oversaw the Museum’s $100 million endowment.  He is the former Chairman of the Advisory Board of the Symphonix League of the San Francisco Symphony. Michael formerly served on the boards of Lead21, an organization that enables entrepreneurs to advocate free market public policy, and the Pacific Research Institute, a San Francisco-based think tank that champions freedom, opportunity and personal responsibility for all individuals by advancing free market policy solutions.

“Michael is an ideal candidate for the Board, given his deep experience in the management and governance of institutional assets with the highest level of fiduciary duty and stewardship, coupled with his long-standing commitment to helping global non-profit organizations,” says current Endowment Board Member Annette Campbell-White. “His thoughtfulness and strategic acumen will be a tremendous addition to the long-term sustainability of Wikipedia, which means continued access to free knowledge for all.”

Michael is an honors graduate of Cornell University, where he served on the Cornell Council. He also has an MS from Georgetown University’s School of Foreign Service—where he serves on the MSFS Advisory Board—and an MBA from the Wharton School of Business.

Lisa Seitz-Gruwell, Chief Advancement Officer
Wikimedia Foundation

Endowment Board members are selected based on active involvement in philanthropic endeavors, prior nonprofit board experience, fundraising and investment expertise, and a strong commitment to the Wikimedia Foundation’s mission.

by Lisa Gruwell at June 09, 2017 05:11 PM

Content Translation Update

June 9 CX Update: More comfortable namespace selection


CX updates have been rare in the last few months because most of the current work is on converting Content Translation’s editing component to the visual editor.

This week, however, a significant user-visible update was deployed: It is now easier to select the namespace in which the translation will be published:


It was always possible to publish a page to any namespace by typing the namespace name in the target title field, but many people indicated in their feedback that the selection should be made easier for people who are not so familiar with namespaces.

This is now possible by clicking the new “gear” button near the “Publish translation” button at the top. The default is “New page”, which will publish the page to the same namespace in which the source page is found. “Personal draft” will publish the page to a sub-page in the translator’s user space. The “Community draft” option is available only in wikis that defined a draft namespace; in the English Wikipedia this is “Draft:”, so selecting this option will add “Draft:” before the current title.
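The mapping from these three options to a target title can be sketched roughly as follows. This is an illustrative Python sketch only: the option names, the `draft_namespace` parameter and the function itself are hypothetical, and the real logic lives inside Content Translation’s own code.

```python
def target_title(option, title, username, draft_namespace="Draft"):
    """Map a publishing option to a full target page title
    (hypothetical sketch of the behaviour described above)."""
    if option == "new-page":
        # Same namespace as the source page: the title is used as-is.
        return title
    if option == "personal-draft":
        # A sub-page of the translator's user space.
        return f"User:{username}/{title}"
    if option == "community-draft":
        # Only on wikis that define a draft namespace,
        # e.g. "Draft:" on the English Wikipedia.
        return f"{draft_namespace}:{title}"
    raise ValueError(f"unknown option: {option}")
```

For example, with these assumed names, `target_title("community-draft", "Example", "Alice")` would yield `"Draft:Example"`.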

There will be several more tweaks coming soon to this feature.

by aharoni at June 09, 2017 02:28 PM

June 08, 2017

Wikimedia Tech Blog

Wikimedia’s open source software community launches Code of Conduct for technical spaces

Photo by Tyssul Patel, public domain/CC0.

We are proud to announce that the Wikimedia technical community has approved a Code of Conduct (CoC) that promotes a respectful, diverse, and welcoming environment in Wikimedia technical spaces. The CoC is a policy that creates clear expectations for how community members should interact, encouraging respectful and productive dialogue. It also describes how people can easily report behavior that does not meet these expectations.

Codes of conduct have become more popular recently in technology organizations and online communities, which have long grappled with how to ensure that everyone feels safe and respected in technical spaces on and offline. Like many other online communities, the Wikimedia technical community has been affected by harassment and other toxic behavior.  Harassment harms individuals, limits the potential for creativity and open collaboration, and discourages new contributors. Many in the Wikimedia movement, including the Wikimedia Foundation’s Board of Trustees, have made a commitment to help create a healthier and more inclusive Wikimedia community. The new code of conduct is an important step in mitigating harassment and creating a space where everyone feels welcome to participate in the Wikimedia technical community.

How we built it

To address the problem, professionals and volunteers in the community developed a policy through an open, collaborative drafting process.  This took place both online and at events like Wikimania conferences and the 2016 Developer Summit.  In other communities, drafting a code of conduct often involves fewer people, and decisions might be made by a project leader or  governing board. We instead used a deeply participatory approach, as has been used for other policy discussions in the Wikimedia movement. More than 140 editors participated in the public discussions, collectively contributing 2,718 edits to the discussion page. Others provided anonymous feedback.

Work began at a public Wikimania session in July 2015, in Mexico City. Developing policies to address harmful behavior in this community was a daunting task. Although codes of conduct have become increasingly common in free/open source software projects, Wikimedia’s technical spaces posed several specific challenges. For example, the CoC needed to address the needs and concerns of volunteers as well as Wikimedia Foundation employees. It needed to be enforceable, to ensure that technical community members would have a safe and welcoming space to contribute.  Finally, those who would be enforcing it needed to be trained in commonly encountered abusive dynamics, so that they could address CoC violations effectively and without further escalating the situation.  It was important, for instance, to include language deterring false or retaliatory reports. This is part of how we sought to protect victims from potential misuse of the policy.

We benefited from existing work, building on policies such as the Contributor Covenant, Wikimedia’s Friendly Space Policy, and the Citizen Code of Conduct.  We also benefited from expert advice and the support of the Support & Safety, Talent & Culture, and Legal teams at the Wikimedia Foundation. We expanded on these existing policies in order to meet our community’s specific needs.  Through detailed conversations, we resolved complicated issues, while focusing on how to make the Wikimedia technical community a better place for everyone to participate.

The Wikimedia technical community approved the CoC this March, concluding a 19-month process.  The Code of Conduct Committee recently began their work, after a community feedback process.  The Committee’s job is to receive reports, assess them, and determine how to respond.  For instance, they might issue warnings or enact temporary bans.

Reactions and reuse

“For over a year, Wikimedia Foundation staff and volunteer contributors have invested time and energy to develop a code of conduct that meets the unique needs of Wikimedia technical spaces and reflects the value our movement shares in respectful, open collaboration,” said Victoria Coleman, Chief Technology Officer of the Wikimedia Foundation. “This work is critical to creating welcoming, inclusive spaces for participation across the Wikimedia projects.”

Community members have welcomed the new policy.  “I applaud Wikimedia for posting a Code of Conduct and appointing a Committee to handle concerns,” said Anna Liao, a MediaWiki developer and Outreachy participant. “If I am ever the target of unacceptable behaviour or I witness it amongst others, there is a pathway to address these issues.”

Moritz Schubotz, a volunteer developer working on MediaWiki’s Math functionality, added that some situations “require the creation and enforcement of this CoC, to keep our working space nice and pleasant.”

The CoC is meant to set behavioral norms and create cultural change.  It shows how we seek to grow as a community, and we hope it increases people’s comfort and desire to join and participate more.

“No matter how open the community is, it should have a code of conduct,” technical volunteer Greta Doçi told us. “It promotes moral behavior, prevents negative legal effects, encourages positive relationships, and acts as a reference for solving ethical dilemmas.”

We encourage others, within the Wikimedia movement or elsewhere, to consider how a code of conduct or anti-harassment policy can strengthen their own community.  The policy itself is also open source for anyone to reuse and adapt.

Matthew Flaschen, Senior Software Engineer, Collaboration, Wikimedia Foundation
Moriel Schottlender, Software Engineer, Collaboration, Wikimedia Foundation
Frances Hocutt, Wikimedia community member and former Foundation staff

by Matthew Flaschen, Moriel Schottlender and Frances Hocutt at June 08, 2017 08:50 PM

Wiki Education Foundation

Women Screenwriters on Wikipedia

Liz Clarke is Assistant Professor of Media Arts & Cultures at the University of New Brunswick. In this post she shares her experience incorporating a Wikipedia assignment into her course on The History of Women Screenwriters, which she taught while at Concordia University.

In my film history courses a primary concept that I teach students is that the historical narratives we learn are shaped by who writes the narratives, what evidence is available, and what narratives have become dominant. As a feminist historian of film, this often means casting a critical eye on the ways that women are obscured from historical narratives. In the film industry in particular, while there is still a dearth of women working in production roles, they have never been entirely absent. Yet knowledge of women’s contributions to film is still hampered by the lack of visibility of their work. Incorporating an assignment where students were required to create Wikipedia entries for women and trans screenwriters allowed me to illustrate for students that Wikipedia still has holes in the information it provides, and that these holes can be linked to larger structures of historical knowledge and historical evidence.

In my course on The History of Women Screenwriters (Concordia University, Montreal, Winter 2016), a major intervention was required in the research materials available on the numerous women writers who have worked in film production since the origins of narrative film. The assignment required students to find a female screenwriter who either did not have a Wikipedia page or only had a stub article. The results were varied but almost all were successful. A variety of countries and time periods were covered in our 60-student class: Anna Frijters, a silent film writer from Belgium; Nina Agadzhanova, who wrote an early version of what would become Battleship Potemkin; Jennifer Konner, the co-showrunner for HBO’s Girls; Melanie Dimantas, a contemporary Brazilian screenwriter; Sumie Tanaka, a Japanese screenwriter who worked heavily in film during the 1950s.

The Wikipedia assignment in my course served a two-fold purpose: first, it allowed my students to become contributors to the information available for others to access online. Second, it taught students to think critically about who writes history and how the demographics of contributors can alter the content online. As part of the assignment I required students to write a reflection paper after completing and migrating their articles into the Wikipedia mainspace. I tailored the reflection papers to ask them to discuss either what they had learned about gender bias on Wikipedia, what they felt about contributing to a public repository of information, or how their understanding of and interaction with Wikipedia had changed after becoming editors. The reflection papers revealed that the assignment worked on a variety of levels. First, it helped students better understand how Wikipedia articles are created, and how they can use Wikipedia with more critical awareness. Many students stated that they found it useful to learn how Wikipedia articles are created, rather than simply being told by professors not to use the site. A number of students also admitted that they were initially worried about the research involved because Wikipedia was often a “first stop” for them when trying to find information. However, the necessity of finding original sources, and the time we spent in class discussing how to do such research, helped many develop confidence in their research skills. In my own experience with this assignment, I also noticed a significantly higher level of writing, precisely because students were aware that their work would be public. The only challenge I came across was answering the usual question: “what word count are you expecting?” Articles ranged in size, making it difficult to pinpoint a specific word count.
Because the final product would necessarily be more varied than the process itself, I decided not to weight the grade of the article higher than other important parts of the process (the reflection paper, the peer reviews, etc.), so that students would focus on crafting a strong Wikipedia article rather than on word count alone. Finally, many students expressed enthusiasm about filling gaps in Wikipedia’s coverage of women screenwriters. They felt as though they had made a difference, rather than simply learning about the problem of women’s lack of representation in film history and on Wikipedia.

by Guest Contributor at June 08, 2017 04:44 PM

Wikimedia UK

Increasing diverse content on Wikimedia projects with UK music festivals and labels

Lady Leshurr at Field Day 2017 – image by Jwslubbock

I’ve been doing some outreach to various UK music festivals and labels to encourage them to release content on their artists and to consider giving Wikimedia community members press passes to take photographs at their events.

Last weekend I did some photography at Field Day 2017, taking photos of artists like Loyle Carner, Mura Masa, Omar Souleyman, Gaika, Lady Leshurr and Sinkane, most of whom did not have photos on Commons already. You can see all the photos here.

There are lots of other festivals where Black and Minority Ethnic (BME) artists make up a large proportion of the performers; perhaps the most prominent is Afropunk Festival, in London on July 22-23. Artists like Lianne la Havas, Danny Brown, NAO, Corinne Bailey Rae, Little Simz, Saul Williams and Nadia Rose are performing at the new Printworks venue in Elephant and Castle, South London.

Afropunk’s organisers are happy to have Wikimedia photographers present, so if you would be interested in coming along to take photos, please get in touch with me at john.lubbock@wikimedia.org.uk. You can also help contribute to improving content on Wikimedia projects by adding to the WikiProject Black British Music page, which lists artists who need their articles improving or creating in the first place.

Sinkane at Field Day 2017 – image by Jwslubbock

We are blessed in the UK with an incredibly diverse and vibrant culture comprised of the hundreds of diaspora communities who live here. Britain grew rich and powerful by exploiting the peoples it colonised, but now we have the opportunity to open up knowledge and information so that it is accessible to everyone in the world. We also have the opportunity to engage with diaspora groups and work in partnership with them, encouraging them to use Wikipedia as a way to make accurate information about their history and culture available to everyone.

That’s why I started the Kurdish Wikipedia Project, and why Wikimedia UK is working with Kurdish cultural organisations to train Kurdish people to edit Wikipedia and improve its coverage of Kurdish history and culture. At the moment, there are only 28 people on Wikidata listed as Kurdish, compared to thousands of people belonging to groups with more developed Wikipedia communities.
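Figures like the 28 can be checked against the Wikidata Query Service with a SPARQL query. The Python sketch below only builds the query string: P172 is Wikidata’s “ethnic group” property and Q5 is “human”, while Q36216 is assumed here to be the item for Kurds (verify the ID before relying on it). Actually running the query needs a network call, shown only as a comment.

```python
def kurdish_people_query(ethnic_group="Q36216"):
    """Build a SPARQL query counting humans (Q5) whose
    ethnic group (P172) is the given item (ID assumed, verify)."""
    return f"""
    SELECT (COUNT(?person) AS ?count) WHERE {{
      ?person wdt:P31 wd:Q5 ;              # instance of: human
              wdt:P172 wd:{ethnic_group} . # ethnic group
    }}
    """

# To actually run it (requires network access):
# import requests
# r = requests.get("https://query.wikidata.org/sparql",
#                  params={"query": kurdish_people_query(), "format": "json"})
# print(r.json()["results"]["bindings"][0]["count"]["value"])
```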

Wikidata timeline showing all the Kurdish people with Wikidata items.

People in the music industry I have spoken to recognise that articles about their artists are often not very good, but they usually don’t understand how they can go about improving them without a conflict of interest, or why copyright makes it hard for them to release content to illustrate articles. I spoke to representatives from two music labels a couple of weeks ago, but found that content releases would be difficult: the labels would have to get permission from the photographers who granted them the rights to use photos of their artists, and those photographers might not be happy to release them under open licences.

So that’s why we would like to encourage our community to get out there and help increase the diversity of content on Wikimedia. Perhaps you have photos of places outside Europe where little content exists currently on Commons? Perhaps there is a festival or cultural event you would like to go to but need help getting a press pass or with expenses? We can help.

Lots of organisations will be happy to give someone a press pass once they understand the content will be used to improve the Wikipedia articles about their event or artists. Tell us what events you would like to attend and we can see if we can get you a press pass.

Everyone can take part in improving the diversity of the content on Wikimedia projects. If we are to create the best, most accurate encyclopaedia in the world, it cannot only reflect the interests and culture of European people. So tell us your ideas, and let’s make Wikipedia more diverse.

by John Lubbock at June 08, 2017 02:01 PM

Wacky Wiki Races!

By Martin Poulter, Wikimedian-in-Residence at Bodleian Libraries

Wikipedia has more than five million articles in its English language version. No article is an island: with few exceptions, they have multiple incoming links as well as multiple links to other articles. Articles connect in a web, or like the cells in a brain. Take two widely different articles—say, Genghis Khan and Resonator guitar—and there is likely a path from one to the other, but it will take quick thinking and ingenuity to find it. This is the idea behind Wikipedia racing.

A race can involve any number of players. At their computers, they “get on the starting line” by finding the start article on Wikipedia; in this case Genghis Khan. Once everybody is ready, the target article Resonator guitar is revealed, ideally on a screen to avoid it being misheard. There are variations of the rules, but in a straightforward example, the winner is the first to reach the target, only by following links in the body of the article. They cannot use the category links at the foot of the page, nor the links in the left sidebar, and definitely not the Wikipedia search box. They are allowed to use ctrl-F (command-F on Macs) to search the current page, as well as copy and paste. So if you see the word “guitar” on a page but it isn’t linked, you can save some keystrokes by copying and pasting it into the browser’s search box.

The Gregory Brothers—YouTube stars known for their hugely successful comedy songs—have made a series of Wikipedia racing videos which they call “Wiki-Wars”. They add post-match interviews, over-the-top graphics, and hilarious in-character commentary.

Ewan McAndrew and I ran a session on games at this year’s Open Educational Resources conference and discussed Wikipedia racing as an educational activity. It helps that players can reflect and discuss at the end of each round: the browser history (click and hold the back button) shows the articles visited in sequence. So players can easily retrace their path and analyse why their strategy won or lost.

In his keynote at the EduWiki 2013 conference, David White observed that assessment in schools and even universities usually assumes a scarcity of information; a scarcity that Wikipedia and other online resources have ended. Much more relevant to today’s world are overwhelming excesses of information and of options, where a person has to quickly evaluate the situation and make a choice. White challenged the audience to devise assessments that encourage the skills of leadership, including asking questions rather than just answering them.

While I wouldn’t be happy to see students sitting Wikipedia races for their university grades, it’s an activity that tests the skills White was talking about. Since the Open Educational Resources conference took place in the London district of Holloway, we got our audience to race from the Open educational resources article to Holloway, London. Success often involves moving from the starting article to a broader, more abstract concept, then zooming in to specifics to reach the target. London can be thought of as an aggregation of boroughs and districts; as an example of a large city, a capital city, or a city built on a river; or as the location of many notable events. Any of these facts might help with the race. A good racer will think of an article at multiple levels of abstraction at the same time.

A wiki race is not a situation where the teacher has “the answer” and the learners either find it or not. There will be an astronomical number of “correct” answers in the form of pathways from one article to the other, but most are prohibitively long. The players need to devise a strategy, carry it out quickly, and change tack if they do not make progress. They may well discover a path that is quicker than any the teacher had thought of.

Subject knowledge certainly helps in wiki racing, but not decisively. If you know that one of the central documents of the OER movement is the Paris OER Declaration, then you have a short-cut from Open educational resources to Paris and thence to London. If you don’t know this but can skim an article, find links, and judge which ones will take you towards the target, you can still win.

Having observed races on video and in real life, what stands out is a common theme in the psychology of problem-solving. People can get stuck in an inappropriate mental set: a set of assumptions and labels that they bring to the problem. Getting stuck in two-dimensional thinking for a puzzle that requires three-dimensional thinking is an example. Progress involves changing a mental set that is no longer useful: people who can jump between mental sets can be very effective problem-solvers. In wiki racing, people can hatch a plausible strategy but find that the link they expect to see isn’t there. The rational thing to do is to backtrack and try another path, but it is easy for people to get stuck on the idea that their strategy should work. These are the players who read through the same article again and again while others leap on to other articles.

Variations of the game and tips for customising are documented on a Wikipedia project page. You can choose widely different articles to make the game a test of information skills, or have similar articles (e.g. species, politicians) to make it more of a test of subject knowledge. You can make the race more difficult by forbidding the use of certain articles, or make it easier by allowing category links.

Our experience was that people found the game powerfully absorbing: it was hard to get people to stop and do something else! The feedback suggests that we showed people a different role for an educational resource such as Wikipedia: not like a book to be read from beginning to end, but like a public space in which you can run around, explore, and play games with other learners.

by Martin Poulter at June 08, 2017 01:08 PM

Wikimedia Performance Team

Improving time-to-logo performance with preload links

One of the goals of the Wikimedia Performance Team is to improve the performance of MediaWiki and the broader software stack used on Wikimedia wikis. In this article we’ll describe a small performance improvement we’ve implemented for MediaWiki and recently deployed to production for Wikimedia. It highlights some of the unique problems we encounter on Wikimedia sites and how new web standards can be leveraged to improve performance.

Logo as CSS background

The MediaWiki logo is defined as a CSS background image on an element. This is historically for caching reasons, because MediaWiki deployments tend to cache pages as a whole and changing the logo would thus require invalidating all pages if the logo were a regular <img> tag. By having it as a CSS background, updating the logo only requires invalidating the stylesheet where it resides. This constraint has significant implications for when the logo loads.

In the loading sequence of a web page, browsers will give a relatively low priority to CSS background images. In practice, assuming an empty browser cache, this means that the MediaWiki logo loads quite late, after most images that are part of the page content have been loaded. To the viewer, this results in the page loading somewhat out of order: images that aren’t necessarily in view are loaded first, and the logo is one of the last images to be loaded. This breaks the de facto expectation that a web page’s content loads from top to bottom.

This phenomenon extends the average duration of an imaginary metric one could call time-to-logo. The point in time when the logo appears is an important mental milestone, as it’s when a visitor has visual confirmation that they’ve landed on the right website. The issue of high time-to-logo due to the CSS background limitation is felt even more on slow internet connections, where the logo can take seconds to appear - long after the page’s text and other images lower on the page than the logo have been loaded.

The preload link

We have been looking for a solution to this problem for some time, and a relatively new browser feature has enabled us to develop a workaround. The preload link keyword, developed by the W3C, allows us to inform the browser early that the logo will be needed at some point on the page. This feature can be combined with CSS media queries, which in our case means that the browser will only preload the right version of the logo for the current pixel density/zoom. This is essential, as we don’t want to preload a version of the logo that the page won’t need. Browser cache is also respected, meaning that all we’re doing is loading the logo a lot earlier than it naturally would load, which is exactly what we were looking for. In fact, the browser now knows that it needs to load the logo a lot sooner than it would have if we displayed the logo as an <img> element without preload.

The preload links for the site logo have been deployed to production for all Wikimedia wikis. They can easily be spotted in the response header of pages that display the logo (the vast majority - if not all - pages on wikis for desktop users). This is actually leveraging a little-known browser feature where <link> tags can be passed as response headers, which in this situation allows us to inform the browser even sooner that the logo will be needed.

Link: </static/images/project-logos/enwiki.png>;rel=preload;as=image;media=not all and (min-resolution:1.5dppx),</static/images/project-logos/enwiki-1.5x.png>;rel=preload;as=image;media=(min-resolution:1.5dppx) and (max-resolution:1.999999dppx),</static/images/project-logos/enwiki-2x.png>;rel=preload;as=image;media=(min-resolution:2dppx)
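For readers more used to markup than raw headers, the first entry in that header corresponds to a preload link element in the page’s head. This is a sketch for illustration only; as described above, Wikimedia actually sends it as a response header so the browser learns about the logo even earlier:

```html
<link rel="preload" as="image"
      href="/static/images/project-logos/enwiki.png"
      media="not all and (min-resolution:1.5dppx)">
```

The media attribute here means “standard-density screens only”; the 1.5x and 2x entries in the header carry their own media queries so that exactly one version of the logo is preloaded.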

Measuring the impact

To confirm the expected impact of logo preloading, we recorded a before and after video using synthetic testing with Sitespeed.io, on a simulated slow internet connection, for a large page (the Barack Obama article on English Wikipedia), where the problem was more dramatic. The left pane is the article loading without logo preloading, the right pane is with logo preloading enabled. Focus your attention on the top-left of the article, where the Wikipedia logo is expected to appear:

Unfortunately current JavaScript APIs in the browser aren’t advanced enough to let us measure something as fine-grained as time-to-logo directly from users, which means that we can only speculate about the extent to which it had an impact in the real world. The web performance field is making progress towards measuring more user-centric metrics, such as First Meaningful Paint, but we’re still very far from having the ability to collect such metrics directly from users.

In our case, the difference seen in synthetic testing is dramatic enough that we have a high level of confidence that it has made the user experience better in the real world for many people.

The preload link isn’t supported by all major web browsers yet. When more browsers support it, MediaWiki will automatically benefit from it. We hope that wikis as large as Wikipedia relying on this very useful browser feature will be an incentive for more browsers to support it.

by Gilles (Gilles Dubuc) at June 08, 2017 11:07 AM

Wikimedia Scoring Platform Team

Join my Reddit AMA about Wikipedia and ethical, transparent AI

(This post was copied from https://lists.wikimedia.org/pipermail/ai/2017-May/000163.html)

Hey everybody,

TL;DR: I wanted to let you know about an upcoming experimental Reddit AMA ("ask me anything") chat we have planned. It will focus on artificial intelligence on Wikipedia and how we're working to counteract vandalism while also making life better for newcomers.

We plan to hold this chat on June 1st at 21:00 UTC/14:00 PDT in the /r/iAMA subreddit[1]. I'd love to answer any questions you have about these topics, and I'll send a follow-up email to this thread shortly before the AMA begins.

For those who don't know who I am, I create artificial intelligences[2] that support the volunteers who edit Wikipedia[3]. I've been fascinated by the ways that crowds of volunteers build massive, high quality information resources like Wikipedia for over ten years.

For more background, I research and then design technologies that make it easier to spot vandalism in Wikipedia—which helps support the hundreds of thousands of editors who make productive contributions. I also think a lot about the dynamics between communities and new users—and ways to make communities inviting and welcoming to both long-time community members and newcomers who may not be aware of community norms. For a quick sampling of my work, check out my most impactful research paper about Wikipedia[4], some recent coverage of my work from *Wired*[5], the master list of my projects on my WMF staff user page[6], the documentation for the technology team I run[10], or the home page for Wikimedia Research[9].

This AMA, which I'm doing with the Foundation's Communications department, is somewhat of an experiment. The intended audience for this chat is people who might not currently be a part of our community but have questions about the way we work—as well as potential research collaborators who might want to work with our data or tools. Many may be familiar with Wikipedia but not the work we do as a community behind the scenes.

I'll be talking about the work I'm doing with the ethics of AI and how we think about artificial intelligence on Wikipedia, and ways we’re working to counteract vandalism on the world’s largest crowdsourced source of knowledge—like the ORES extension[7], which you may have seen highlighting possibly problematic edits on your watchlist and in RecentChanges.

I’d love for you to join this chat and ask questions. If you do not use Reddit, or prefer not to, we will also be taking questions on ORES' MediaWiki talk page[8] and posting answers to both threads.

  1. https://www.reddit.com/r/IAmA/
  2. https://en.wikipedia.org/wiki/Artificial_intelligence
  3. https://www.mediawiki.org/wiki/ORES
  4. http://www-users.cs.umn.edu/~halfak/publications/The_Rise_and_Decline/halfaker13rise-preprint.pdf
  5. https://www.wired.com/2015/12/wikipedia-is-using-ai-to-expand-the-ranks-of-human-editors/
  6. https://en.wikipedia.org/wiki/User:Halfak_(WMF)
  7. https://www.mediawiki.org/wiki/Extension:ORES
  8. https://www.mediawiki.org/wiki/Talk:ORES
  9. https://www.mediawiki.org/wiki/Wikimedia_Research
  10. https://www.mediawiki.org/wiki/Wikimedia_Scoring_Platform_team

Principal Research Scientist @ WMF
User:EpochFail / User:Halfak (WMF)

by Halfak (Aaron Halfaker, EpochFail, halfak) at June 08, 2017 02:59 AM

June 07, 2017

Wikimedia Foundation

Building communities to support free knowledge: Addis Wang

Photo by Victor Grigas, CC BY-SA 3.0.

Every November, people from all around the world volunteer their time as part of Wikipedia Asian Month to write Wikipedia articles about the world’s largest continent. A participant in the contest who writes a few Wikipedia articles can get postcards featuring famous Asian monuments, assuming that the article meets certain quality standards.

The plan to encourage volunteer editors with simple yet relevant rewards has blossomed into thousands of new articles added to Wikipedia.

And who conceptualized all of this and made it a reality? Addis Wang first thought of Wikipedia Asian Month in 2015 and worked to coordinate the project with other Wikipedians.

The Chinese volunteer joined Wikipedia in 2008 at the age of 16; he has always been concerned with Wikipedia’s presence in his country, China, and the Asian region. Wang extensively edited the Chinese Wikipedia for years until he came to believe that more sustainable change comes when the editing experience is shared with others. That’s when he turned his efforts toward outreach activities that encourage more people to edit, build healthier Wikipedia communities, and connect Wikipedians who don’t know each other.

“I feel most proud when a small effort I make can uniquely help knowledge spread,” says Wang. “I’ve created thousands of articles on the Chinese Wikipedia, but my current project, Wikipedia Asian Month, is the one I’m most proud of. It shows how Wikipedians around world collaborate and contribute to one common goal. It also has significantly assisted the development of many local communities and small language Wikipedia projects, especially those in Asia.”

According to Wang, a strong encyclopedia needs a community to support its existence, and those community members will not have the motivation to volunteer before learning about the project and understanding its needs.

“We want to share our knowledge with more people and want more people to edit,” Wang explains. “But we can’t say, ‘Hey, come edit Wikipedia’ to someone who doesn’t know that Wikipedia even exists.”

To help raise awareness about Wikipedia in his country, where local for-profit competitors host most of the local online content, Wang co-founded the first community user group for the Wikimedia movement in China in 2013. The group held regular meetups for existing Wikipedians and other activities for those interested in learning about Wikipedia. In addition, one of their very first activities was hosting a Wiki Loves Monuments contest, which is held globally but organized by country. The Chinese Wikimedia community joined Wiki Loves Monuments for the first time this year thanks to the efforts of Wang and others.

Moving to study at Ohio State University in 2012 might have separated Wang from the Wikipedia community in China, but that was no excuse for him to quit contributing. Instead, Wang quickly started to work with his fellow students on establishing a Wikipedian community both on and off campus in Ohio.

“A long time ago, Wikipedian Kevin Payravi started a Wikipedia club that caught my attention, so I joined him,” Wang recalls. “We are also looking for opportunities to encourage more campus-based communities to practice what we’re doing.”

The student group aimed to invite students and educators to Wikipedia editing events; it has quickly grown and collaborated on starting a community user group in the state of Ohio in 2016. The new group has embarked on organizing several projects since its beginning, including Wiki Loves Monuments 2016 in the United States.

For Wang, Wikipedia has introduced a learning revolution, and that’s what keeps him willing to support it day after day.

“Wikipedia allows people from all over the world to share the world’s knowledge, and you can simply access it for free,” says Wang. “Because we use Wikipedia every day, it’s hard to imagine how we could find information without it. Online content is filled with unverified information and soft advertisements, books are limited and going to a library is time-consuming. This also depends on whether or not there is a decent library in your community, which is still uncommon, even today. That’s why I support Wikipedia and believe that it is so important.”

Interview by Jonathan Curiel, Senior Development Communications Manager
Profile by Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation

by Jonathan Curiel and Samir Elsharbaty at June 07, 2017 02:27 PM

June 06, 2017

Wikimedia Scoring Platform Team

Status update (June 3rd, 2017)

Hey folks,

I'll be starting to post updates here on the phame blog from now on, but if you'd prefer to be notified via the mailing lists we used to post to, that's OK. I'll make sure that the highlights and the link to these posts gets pushed there too.

We had a big presence at the Wikimedia Hackathon 2017 in Vienna. We kicked off a lot of new language focused collaborations and we deployed a new Item Quality model for Wikidata.

French and Finnish Wikipedias now have advanced edit quality prediction support!

ORES is available through api.php again via rvprop=orescores and rcprop=oresscores.

Wiki labels now has a new stats reporting interface. Check out https://labels.wmflabs.org/stats

We had a major hiccup when failing over to CODFW, but we worked it out and ORES is very happy again.

See the sections below for details.

Labeling campaigns

We deployed a new edit quality labeling campaign to English Wiktionary(T165876) and we're looking for someone who can work as a liaison for this task. We've also deployed secondary labeling campaigns to Finnish Wikipedia(T166558) and Turkish Wikipedia(T164672). These secondary campaigns help us improve ORES accuracy.

Outreach & comms

We hosted a session at the Wikimedia Hackathon to tell people about ORES and show how to work with us to get support for your local wiki(T165397). We also worked with the Collaboration Team to announce that ORES Review Tool would not be enabled by default and the New Filters would be deployed as a beta feature(T163153).

New development

Lots of things here. In our modeling library, we implemented the basics of Greek and Bengali language assets so that we can start working on prediction models(T166793, T162620). After talking to people at the Wikimedia Hackathon about peculiar language overlap, we implemented a regex exclusions strategy(T166793) that will allow us to clearly state that "ha" is not laughing in Hungarian or Italian, but it is in a lot of other contexts.

We also spent some time exploring the overlap of the "damaging" and "goodfaith" models on Wikipedia(T163995). We were able to show that there's useful overlap that will allow editors working on newcomer socialization to find goodfaith newcomers who are running into trouble. The Collaboration Team adjusted the thresholds in New Filters in response to our analysis(T164621).

Using data from Wiki labels(T157495), we trained a basic item quality model for Wikidata(T164862) and demonstrated it at the Wikimedia Hackathon(T166054). We used data from Wiki labels(T130261, T163012) to build advanced edit quality models for French and Finnish Wikipedia(T130282, T163013) and those are now deployed in ORES(T166047).

We implemented a new stats reporting interface in Wiki labels(T139956) and announced it (T166529). This interface makes it easier for people managing campaigns in Wiki labels to track progress. It's a long time coming. Props to @Ladsgroup for doing a bunch of work to make it happen.

Finally, we implemented a new "score_revisions" utility that makes it quick and easy to generate scores for a set of revisions using the ORES service(T164547). This is really useful for researchers who want lots of scores and would like to avoid taking down ORES. Personally, I've been using it to audit ORES.

Maintenance and robustness

We did a major deployment of ORES in mid-April(T162892) that had some serious problems in CODFW, but not EQIAD, which was super confusing (T163950), so we re-routed traffic to EQIAD(350487). While investigating, we found out that some timeouts(T163944) and server errors(T163171, T163764, T163798) were due to the same problem: there were two servers in CODFW that we didn't know existed, so they weren't getting new deployments and were poisoning our worker queue with old code!

We also fixed a couple of regressions that popped up in the ORES Review Tool while new work was being done on New Filters (T165011, T164984). We fixed some weird tokenization issues due to diacritics in Bengali not being handled correctly(T164767).

We re-enabled ORES in api.php(T163687). Props to @Tgr for making this happen.

We fixed some issues with ORES swagger documentation(T162184) and some UI issues in Wiki labels related to button colors(T163222) and confusing error messages(T138563).


We finished off some data-flow diagrams for ORES(T154441). As part of transitioning to a Wikimedia Foundation team (Scoring Platform! Woot!), we've moved all the documentation for ORES and our team to Mediawiki.org(T164991). Also, as part of the Tech Ops experimentation with failovers across datacenters, we updated our grafana metrics tracking to split metrics by datacenter(T163212). This helped us quite a bit with diagnosing the deployment issues we discussed in the last section.

That's all folks. I hope you enjoyed the new format!

by Halfak (Aaron Halfaker, EpochFail, halfak) at June 06, 2017 11:56 PM

Wikimedia Foundation

Sacrificing freedom of expression and collaboration online to enforce copyright in Europe?

Photo by Kain Kalju, CC BY 2.0.

Respect for copyright laws is a fundamental part of Wikipedia’s culture, and embedded in the free online encyclopedia’s five central pillars. Contributors diligently monitor new edits for compliance with copyright and collaboratively resolve disputes over the permissible use of a work or its status. When a rightsholder finds that their work is used without permission on a Wikimedia Project, we encourage them to talk to the community of editors to address the issue. If this does not lead to a resolution, under the Digital Millennium Copyright Act (the US analogue to the EU’s E-Commerce Directive), rightsholders can notify the Wikimedia Foundation, as the content host, of the alleged copyright infringement. The fact that we only received twelve such notices (only four of which were valid) in the second half of 2016 is a testament to the diligence of Wikipedia editors and the accuracy of human-based detection of copyright infringement.

Yet, because European lawmakers see copyright infringement as a problem on other platforms, they are currently debating a proposal for a new EU copyright directive that—if applied to Wikipedia—would put the site’s well-functioning system in peril. Article 13 of the proposal would require “information society services” that store large amounts of content uploaded by users (as Wikimedia does) to take measures to prevent uploads of content that infringes copyright. A radical new “compromise amendment” would apply this requirement to all services—not just ones hosting “large amounts” of content. The Commission’s proposal suggests that hosts implement upload filters, or, as they call them, “effective content recognition technologies”. Some large, for-profit platforms already have such filtering technologies in place, but we strongly oppose any law that would make them mandatory. Filtering technologies have many flaws, and a requirement to implement them would be detrimental to the efficient and effective global online collaboration that has been Wikipedia’s foundation for the past 16 years.

First, filters are often too broad in their application because they aren’t able to account for the context of the use of a work. Automated content detection generally has no knowledge of licenses or other agreements between users, platforms, and rightsholders that may be in place. Such filtering systems also fail to make good case-by-case decisions that would take into consideration copyright laws in various countries that may actually allow for the use of a work online. As a result, a lot of culturally or otherwise valuable works are caught as “false positives” by the detection systems and consequently taken off the platforms. In fact, automated takedowns are such a prevalent phenomenon that researchers have seen the need to document them in order to provide transparency around these processes that affect freedom of expression online, and maybe even the rule of law. Moreover, such filter systems also have been shown to create additional opportunities for attacks on users’ privacy.

Second, mandatory filtering technology that scans all uploads to a platform can be used for all kinds of purposes, not just copyright enforcement. Automatic content filters can also monitor expression and target illicit or unwanted speech, for instance under the guise of anti-terrorism policies. In other words: they can be repurposed for extensive surveillance of online communications. While intended to address copyright infringement, Art. 13 of the proposed copyright directive would actually lay the groundwork for mass-surveillance that threatens the privacy and free speech of all internet users, including Wikipedians who research and write about potentially controversial topics. All Europeans should be as concerned about these threats as the EU Wikimedia communities and we at the Wikimedia Foundation are.

Third, the broad and vague language of Art. 13 and the compromise amendment would undermine collaborative projects that rely on the ability of individuals around the world to discuss controversial issues and develop content together. Free knowledge that is inclusive, democratic, and verifiable can only flourish when the people sharing knowledge can engage with each other on platforms that have reasonable and transparent takedown practices. People’s ability to express themselves online shouldn’t depend on their skill at navigating opaque and capricious filtering algorithms. Automatic content filtering based on rightsholders’ interpretation of the law would—without a doubt—run counter to these principles of human collaboration that have made the Wikimedia projects so effective and successful.

Finally, automatic content detection systems are very expensive. YouTube spent USD 60 million to develop ContentID. Requiring all platforms to implement these filters would put young startups that cannot afford to build or buy them at a tremendous disadvantage. This would hurt, not foster, the digital single market in the European Union, as it would create a tremendous competitive advantage for platforms that already have implemented such filters or are able to pay for them. The result would be diminished innovation and diversity in the European interior market and less choice for European internet users.

As currently written, Art. 13 would harm freedom of expression online by inducing large-scale implementation of content detection systems. For many Europeans, Wikipedia is an important source of free knowledge. It is built by volunteers who need to be able to discuss edits with other contributors without opaque interference from automatic filters. Therefore, we urge the European Parliament and the Council to avert this threat to free expression and access to knowledge by striking Art. 13 from the proposed directive. If the provision stays, the directive will be a setback to true modernization of European copyright policy.

Jan Gerlach, Public Policy Manager
Wikimedia Foundation

by Jan Gerlach at June 06, 2017 03:00 PM

Brion Vibber

Brain dump: x86 emulation in WebAssembly

This is a quick brain dump of my recent musings on feasibility of a WebAssembly-based in-browser emulator for x86 and x86-64 processors… partially in the hopes of freeing up my brain for main project work. 😉

My big side project for some time has been ogv.js, an in-browser video player framework which uses emscripten to cross-compile C libraries for the codecs into JavaScript or, experimentally, the new WebAssembly target. That got me interested in how WebAssembly works at the low level, and how C/C++ programs work, and how we can mishmash them together in ways never intended by gods or humans.

Specifically, I’m thinking it would be fun to make an x86-64 Linux process-level emulator built around a WebAssembly implementation. This would let you load a native Linux executable into a web browser and run it, say, on your iPad. Slowly. 🙂

System vs process emulation

System emulators provide the functioning of an entire computer system, with emulated software-hardware interfaces: you load up a full kernel-mode operating system image which talks to the emulated hardware. This is what you use for playing old video games, or running an old or experimental operating system. This can require emulating lots of detail behavior of a system, which might be tricky or slow, and programs may not integrate with a surrounding environment well because they live in a tiny computer within a computer.

Process emulators work at the level of a single user-mode process, which means you only have to emulate up to the system call layer. Older Mac users may remember their shiny new Intel Macs running old PowerPC applications through the “Rosetta” emulator for instance. QEMU on Linux can be set up to handle similar cross-arch emulated execution, for testing or to make some cross-compilation scenarios easier.

A process emulator has some attraction because the model is simpler inside the process… If you don’t have to handle interrupts and task switches, you can run more instructions together in a row; elide some state changes; all kinds of fun things. You might not have to implement indirect page tables for memory access. You might even be able to get away with modeling some function calls as function calls, and loops as loops!

WebAssembly instances and Linux processes

There are many similarities, which is no coincidence as WebAssembly is designed to run C/C++ programs similarly to how they work in Linux/Unix or Windows while being shoehornable into a JavaScript virtual machine. 🙂

An instantiated WebAssembly module has a “linear memory” (a contiguous block of memory addressable via byte indexing), analogous to the address space of a Linux process. You can read and write int and float values of various sizes anywhere you like, and interpretation of bytewise data is up to you.

Like a native process, the module can request more memory from the environment, which will be placed at the end. (“grow_memory” operator somewhat analogous to Linux “brk” syscall, or some usages of “mmap”.) Unlike a native process, usable memory always starts at 0 (so you can dereference a NULL pointer!) and there’s no way to have a “sparse” address space by mapping things to arbitrary locations.
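From the JavaScript side, that growth request is just a method call on the memory object — a minimal sketch (page counts here are arbitrary):

```javascript
// Sketch: growing a WebAssembly linear memory from JS, loosely analogous
// to a brk-style "move the end of the address space" request.
const memory = new WebAssembly.Memory({ initial: 1 }); // 1 page = 64 KiB
console.log(memory.buffer.byteLength); // 65536

// grow() appends pages at the end and returns the previous size in pages,
// much like sbrk returning the old program break.
const oldPages = memory.grow(2);
console.log(oldPages, memory.buffer.byteLength); // 1 196608
```

Note that growing detaches the old `buffer` and exposes a new, larger one, so any typed-array views over the memory have to be recreated after a grow.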

The module can also have “global variables” which live outside this address space — but they cannot be dynamically indexed, so you cannot have arrays or any dynamic structures there. In WebAssembly built via emscripten, globals are used only for some special linking structures because they don’t quite map to any C/C++ construct, but hand-written code can use them freely.

The biggest difference from native processes is that WebAssembly code doesn’t live in the linear memory space. Function definitions have their own linear index space (which can’t be dynamically indexed: references are fixed at compile time), plus there’s a “table” of indirect function references (which can be dynamically indexed into). Function pointers in WebAssembly thus aren’t actually pointers to the instructions in linear memory like on native — they’re indexes into the table of dynamic function references.

Likewise, the call stack and local variables live outside linear memory. (Note that C/C++ code built with emscripten will maintain its own parallel stack in linear memory in order to provide arrays, variables that have pointers taken to them, etc.)

WebAssembly’s actual opcodes are oriented as a stack machine, which is meant to be easy to verify and compile into more efficient register-based code at runtime.

Branching and control flow

In WebAssembly control flow is limited, with one-way branches possible only to a containing block (i.e. breaking out of a loop). Subroutine calls are only to defined functions (either directly by compile-time reference, or indirectly via the function table).

Control flow is probably the hardest thing to make really match up from native code — which lets you jump to any instruction in memory from any other — to compiled WebAssembly.

It’s easy enough to handle craaaazy native branching in an interpreter loop. Pseudocode:

loop {
    instruction = decode_instruction(ip)
    instruction.execute() // update ip and any registers, etc
}

In that case, a JMP or CALL or whatever just updates the instruction pointer when you execute it, and you continue on your merry way from the new position.

But what if we wanted to eke more performance out of it by compiling multiple instructions into a single function? That lets us elide unnecessary state changes (updating instruction pointers, registers, flags, etc when they’re immediately overridden) and may even give opportunity to let the compiler re-optimize things further.

A start is to combine runs of instructions that end in a branch or system call (QEMU calls them “translation units”) into a compiled function, then call those in the loop instead of individual instructions:

loop {
    tu = cached_or_compiled_tu(ip)
    tu.execute() // update registers, ip, etc as we go
}

So instead of decoding and executing an instruction at a time, we’re decoding several instructions, compiling a new function that runs them, and then running that. Nice, if we have to run it multiple times! But…. possibly not worth as much as we want, since a lot of those instruction runs will be really short, and there’ll be function call overhead on every run. And, it seems like it would kill CPU branch prediction and such, by essentially moving all branches to a single place (the tu.execute()).
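A toy version of that dispatch loop, using an entirely made-up two-instruction machine just to show the caching shape (all names hypothetical):

```javascript
// Toy dispatch loop: straight-line runs ending in a branch (or halt) are
// "compiled" into one closure and reused on later visits to the same ip.
const program = [
  { op: 'add', k: -1 },     // acc += k
  { op: 'jnz', target: 0 }, // if acc != 0, jump to target
  { op: 'halt' },
];

const tuCache = new Map();

function compileTu(startIp) {
  const run = [];
  let ip = startIp;
  for (;;) {                // gather instructions up to the first branch
    const ins = program[ip++];
    run.push(ins);
    if (ins.op === 'jnz' || ins.op === 'halt') break;
  }
  const endIp = ip;         // fall-through address if the branch isn't taken
  return (state) => {
    for (const ins of run) {
      if (ins.op === 'add') state.acc += ins.k;
      else if (ins.op === 'jnz') state.ip = state.acc !== 0 ? ins.target : endIp;
      else state.halted = true;
    }
  };
}

function execute(state) {
  while (!state.halted) {
    let tu = tuCache.get(state.ip);
    if (!tu) { tu = compileTu(state.ip); tuCache.set(state.ip, tu); }
    tu(state);
  }
}

const state = { ip: 0, acc: 5, halted: false };
execute(state);
console.log(state.acc, tuCache.size); // 0 2
```

Even in this toy, only two TUs get compiled for the whole run — but every iteration still funnels through the same `tu(state)` call site, which is exactly the branch-prediction worry above.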

QEMU goes further in its dynamic translation emulators, modifying the TUs to branch directly to each other in runtime discovery. It’s all very funky and scary looking…

But QEMU’s technique of modifying trampolines in the live code won’t work as we can’t modify running code to insert jump instructions… and even if we could, there are no one-way jumps, and using call instructions risks exploding the call stack on what’s actually a loop (there’s no proper tail call optimization in WebAssembly).


What can be done, though, is to compile bigger, better, badder functions.

When emscripten is generating JavaScript or WebAssembly from your C/C++ program’s LLVM intermediate language, it tries to reconstruct high-level control structures within each function from a more limited soup of local branches. These then get re-compiled back into branch soup by the JIT compiler, but efficiently. 😉

The binaryen WebAssembly code gen library provides this “relooper” algorithm too: you pass in blocks of instructions, possible branches, and the conditions around them, and it’ll spit out some nicer branch structure if possible, or an ugly one if not.

I’m pretty sure it should be possible to take a detected loop cycle of separate TUs and create a combined TU that’s been “relooped” in a way that it is more efficient.

BBBBuuuuutttttt all this sounds expensive in terms of setup. Might want to hold off on any compilation until a loop cycle is detected, for instance, and just let the interpreter roll on one-off code.

Modifying runtime code in WebAssembly

Code is not addressable or modifiable within a live module instance; unlike in native code you can’t just write instructions into memory and jump to the pointer.

In fact, you can’t actually add code to a WebAssembly module. So how are we going to add our functions at runtime? There are two tricks:

First, multiple module instances can use the same linear memory buffer.

Second, the tables for indirect function calls can list “foreign” functions, such as JavaScript functions or WebAssembly functions from a totally unrelated module. And those tables are modifiable at runtime (from the JavaScript side of the border).

These can be used to do full-on dynamic linking of libraries, but all we really need is to be able to add a new function that can be indirect-called, which will run the compiled version of some number of instructions (perhaps even looping natively!) and then return back to the main emulator runtime when it reaches a branch it doesn’t contain.
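The JS-side table patching looks something like this. The module here is a hand-assembled minimal one that just returns 42, standing in for a freshly compiled TU:

```javascript
// A minimal hand-assembled WebAssembly module exporting one function
// "f" of type () -> i32 that returns 42 (a stand-in for a compiled TU).
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // \0asm magic + version
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,       // type 0: () -> i32
  0x03, 0x02, 0x01, 0x00,                         // function 0 has type 0
  0x07, 0x05, 0x01, 0x01, 0x66, 0x00, 0x00,       // export "f" = func 0
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x41, 0x2a, 0x0b, // body: i32.const 42; end
]);
const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes));

// The table is created and patched from the JS side, at runtime:
const table = new WebAssembly.Table({ initial: 4, element: 'anyfunc' });
table.set(0, instance.exports.f); // swap the "new TU" into slot 0
console.log(table.get(0)());      // 42
```

An emulator module that imports this table can then `call_indirect` into slot 0 without itself being recompiled — the function arrived from a completely separate module.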

Function calls

Since x86 has a nice handy CALL instruction, and doesn’t just rely on convention, it could be possible to model calls to already-cached TUs as indirect function calls, which may perform better than exiting out to the loop and coming back in. But they’d probably need to be guarded for early exit, for several reasons… if we haven’t compiled the entirety of the relooped code path from start to exit of the function, then we have to exit back out. A guard check on IP and early-return should be able to do that in a fairly sane way.

function tu_1234() {
    // loop
    do {
        // calc loop condition -> set zero_flag
        ip = 1235
        if !zero_flag {
            ip = 1236
            // CALL 4567
            tu = cached_or_compiled_tu(4567)
            tu.execute()
            if ip != 1236 {
                // only partway through: back to the emulator loop,
                // possibly unwinding a long stack 🙂
                return
            }
        }
        // more code
    } while (…)
}

I think this makes some kind of sense. But if we’re decoding instructions + creating output on the fly, it could take a few iterations through to produce a full compiled set, and exiting a loop early might be … ugly.

It’s possible that all this is a horrible pipe dream, or would perform too badly for JIT compilation anyway.

But it could still be fun for ahead-of-time compilation. 😉 Which is complicated… a lot … by the fact that you don’t have the positions of all functions known ahead of time. Plus, if there’s dynamic linking or JIT compilation inside the process, well, none of that’s even present ahead of time.

Prior art: v86

I’ve been looking a lot at v86, a JavaScript-based x86 system emulator. v86 is a straight-up interpreter, with instruction decoding and execution mixed together a bit, but it feels fairly straightforwardly written and easy to follow when I look at things in the code.

v86 uses a set of aliased typed arrays for the system memory, another set for the register file, and then some variables/properties for misc flags and things.

Some quick notes:

  • a register file in an array means accesses at different sizes are easy (al vs ax vs eax), and you can easily index into it from the operand selector bits from the instruction (as opposed to using a variable per register)
  • is there overhead from all the object property accesses etc? would it be more efficient to do everything within a big linear memory?
  • as a system emulator there’s some extra overhead to things like protected mode memory accesses (page tables! who knows what!) that could be avoided on a per-process model
  • 64-bit emulation would be hard in JavaScript due to lack of 64-bit integers (argh!)
  • as an interpreter, instruction decode overhead is repeated during loops!
  • to avoid expensive calculations of the flags register bits, most arithmetic operations that would change the flags instead save the inputs for the flag calculations, which get done on demand. This still is often redundant because flags may get immediately rewritten by the next instruction, but is cheaper than actually calculating them.
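The aliasing trick from the first bullet is just multiple typed-array views over one buffer; on a little-endian host the x86 sub-registers line up for free (sketch):

```javascript
// Sketch: one buffer, three views. On little-endian hosts (x86, and ARM
// in practice), al/ax/eax all start at byte offset 0 of the same slot.
const buf = new ArrayBuffer(8 * 4);   // room for 8 32-bit registers
const reg32 = new Uint32Array(buf);
const reg16 = new Uint16Array(buf);
const reg8  = new Uint8Array(buf);

const EAX = 0;
reg32[EAX] = 0x12345678;
console.log(reg16[EAX * 2].toString(16)); // "5678"     (ax)
console.log(reg8[EAX * 4].toString(16));  // "78"       (al)

// Writing al updates eax in place, no masking code needed:
reg8[EAX * 4] = 0xff;
console.log(reg32[EAX].toString(16));     // "123456ff"
```

The index arithmetic (`* 2`, `* 4`) is exactly what lets the operand selector bits from a decoded instruction pick a register directly.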

WebAssembly possibilities

First, since WebAssembly supports only one linear memory buffer at a time, the register file and perhaps some other data would need to live there. Most likely want a layout with the register file and other data at the beginning of memory, with the rest of memory after a fixed point belonging to the emulated process.

Putting all the emulator’s non-RAM state in the beginning means a process emulator can request more memory on demand via Linux ‘brk’ syscall, which would be implemented via the ‘grow_memory’ operator.

64-bit math

WebAssembly supports 64-bit integer memory accesses and arithmetic, unlike JavaScript! The only limitation is that you can’t (yet) export a function that returns or accepts an i64 to or from JavaScript-land. That means if we keep our opcode implementations in WebAssembly functions, they can efficiently handle 64-bit ops.

However WebAssembly’s initial version allows only 32-bit memory addressing. This may not be a huge problem for emulating 64-bit processes that don’t grow that large, though, as long as the executable doesn’t need to be loaded at a specific address (which would mean a sparse address space).

Sparse address spaces could be emulated with indirection into a “real” memory that’s in a sub-4GB space, which would be needed for a system emulator anyway.
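That indirection could be as simple as a page map consulted on every access — a purely hypothetical sketch (real code would want something much faster on the hot path, and JS numbers lose precision above 2^53 anyway):

```javascript
// Sketch: emulate a sparse address space by lazily allocating pages.
const PAGE_SIZE = 65536;   // 64 KiB pages, purely illustrative
const pages = new Map();   // page index -> Uint8Array

function pageFor(addr) {
  const idx = Math.floor(addr / PAGE_SIZE);
  let page = pages.get(idx);
  if (!page) { page = new Uint8Array(PAGE_SIZE); pages.set(idx, page); }
  return page;
}
function read8(addr)     { return pageFor(addr)[addr % PAGE_SIZE]; }
function write8(addr, v) { pageFor(addr)[addr % PAGE_SIZE] = v; }

// An address that was never explicitly mapped still works, backed by
// one small page allocated on demand:
write8(0x400000, 0x7f);    // a typical non-PIE load address
console.log(read8(0x400000)); // 127
```

One `Map` lookup plus a divide per byte access is the cost of the sparseness — which is why staying under 4 GiB and using linear memory directly is so much more attractive.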

Linux details

Statically linked ELF binaries would be easiest to model. Dynamic linking would be more complex: you’d need to pass a bundle of files in, do fix-ups, etc.

Questions: are executables normally PIC as well as libraries, or do they want a default load address? (Which would break the direct-memory-access model and require some indirection for sparse address space.)

Answer: normally Linux x86_64 executables are not PIE, and want to be loaded at 0x400000 or maybe some other random place. D’oh! But… in the common case, you could simplify that as a single offset.

Syscall on 32-bit is ‘int $80’, or ‘syscall’ instruction on 64-bit. Syscalls would probably mostly need to be implemented on the JS side, poking at the memory and registers of the emulated process state and then returning.
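Sketching the JS side of that syscall boundary (the syscall numbers are the real x86-64 Linux ones; the state layout and field names are made up for illustration):

```javascript
// Hypothetical emulated-process state: rax holds the syscall number and
// rdi/rsi/rdx the first arguments, per the x86-64 Linux convention.
function handleSyscall(state) {
  switch (state.rax) {
    case 1: { // write(fd, buf, count)
      const data = state.memory.subarray(state.rsi, state.rsi + state.rdx);
      if (state.rdi === 1) process.stdout.write(Buffer.from(data));
      state.rax = state.rdx;      // report all bytes written
      break;
    }
    case 12:                      // brk(addr) -- would map to grow_memory
      state.rax = state.brk;      // stub: just report the current break
      break;
    case 60:                      // exit(code)
      state.exited = true;
      state.exitCode = state.rdi;
      break;
    default:
      state.rax = -38;            // -ENOSYS for anything unimplemented
  }
}
```

Returning `-ENOSYS` for everything unknown is the classic way to get a new binary loader limping: most programs probe for optional syscalls and fall back gracefully.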

To do network i/o we’d probably need to be able to block and return to the emulator… so, much like a function call bowing out early because an uncompiled branch was taken, we’d potentially need an “early exit” from the middle of a combined TU when it makes a syscall that ends up being async. On the other hand, if a syscall can be done sync, it might be nice not to pay that penalty.

Could also need async syscalls for multi-process stuff via web workers… anything that must call back to main thread would need to do async.

For 64-bit, JS code would have to …. painfully … deal with 32-bit half-words. Awesome. 😉


WebAssembly initial version has no facility for multiple threads accessing the same memory, which means no threads. However this is planned to come in future…

Processes with separate address spaces could be implemented by putting each process emulator in a Web Worker, and having them communicate via messages sent to the main thread through syscalls. This forces any syscall that might need global state to be async.

Prior art: Browsix

Browsix provides a POSIX-like environment based around web techs, with processes modeled in Web Workers and syscalls done via async messages. (C/C++ programs can be compiled to work in Browsix with a modified emscripten.) Pretty sweet ideas. 🙂

I know they’re working on WebAssembly processes as well, and were looking into synchronous syscalls via SharedArrayBuffer/Atomics as well, so this might be an interesting area to watch.

Could it be possible to make a Linux binary loader for the Browsix kernel? Maybe!

Would it be possible to make graphical Linux binaries work, with some kind of JS X11 or Wayland server? …mmmmmmaaaaybe? 😀

Closing thoughts

This all sounds like tons of fun, but may have no use other than learning a lot about some low-level tech that’s interesting.

by brion at June 06, 2017 08:59 AM

Wikimedia Scoring Platform Team

Status update (September 28th, 2016)

(This post was copied from https://lists.wikimedia.org/pipermail/ai/2016-September/000102.html)


This is the 23rd weekly update from revision scoring team that we have sent
to this mailing list.

New development

  • We implemented and demonstrated a linguistic/stylometric processing strategy that should give us more signal for finding vandalism and spam[1]. See the discussion on the AI list[2].
  • As part of our support for the Collaboration Team, we've been producing tables of model statistics that correspond to a set of thresholds[3]. This helps their designers work on strategies for reporting prediction confidence in an intuitive way.

Maintenance and robustness

  • We had a major downtime event that was caused by our logs being too verbose. We've recovered and turned down the log level[4].
  • We made sure that halfak gets pinged when ores.wikimedia.org goes down[5].


  • We created a database on Wikimedia Labs that provides access to a dataset containing a complete set of article quality predictions for English Wikipedia[6]. See our announcements[7,8,9].
  1. https://phabricator.wikimedia.org/T146335 -- Implement a basic scoring strategy for PCFGs
  2. https://lists.wikimedia.org/pipermail/ai/2016-September/000098.html
  3. https://phabricator.wikimedia.org/T146280 -- Produce tables of stats for damaging and goodfaith models
  4. https://phabricator.wikimedia.org/T146581 -- celery log level is INFO causing disruption on ORES service
  5. https://phabricator.wikimedia.org/T146720 -- Ensure that halfak gets emails when ores.wikimedia.org goes down
  6. https://phabricator.wikimedia.org/T106278 -- Setup a db on labsdb for article quality that is publicly accessible
  7. https://phabricator.wikimedia.org/T146156 -- Announce article quality database in labsdb
  8. https://lists.wikimedia.org/pipermail/ai/2016-September/000091.html
  9. https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_149#ORES_article_quality_data_as_a_database_table

Aaron from the Revision Scoring team

by Halfak (Aaron Halfaker, EpochFail, halfak) at June 06, 2017 05:41 AM