weeklyOSM 437

14:37, Saturday, 08 2018 December UTC



Reference point for mobile devices 1 | © Kanton Zürich, Baudirektion, Amt für Raumentwicklung

About us


  • PoliMappers, the YouthMappers’ chapter at Politecnico di Milano (Italy), is organising the second edition of PoliMappers Adventures. During the month of December, you can complete a daily quest related to the world of OpenStreetMap. Quests range from mapping and humanitarian help to many other subjects. Follow them on Twitter for more updates.
  • CartoRoute ("MapRoad") is a collaboration between Michelin and the OSM Ivory Coast association that aims to map the main road network across the country: road status, surface, speed limits, traffic signs, and the dining areas, hotels, and gas stations in the area surrounding the roads, etc. The pilot phase of the project has started in the commune of Plateau (Abidjan), where 11 local OSM contributors collected data over two days using OsmAnd, as related in this Twitter moment.
  • The tag cycleway:surface has already been used over 500 times but has only now been properly documented in the OSM Wiki. The tag allows you to specify the surface of the cycleway alone, for example where a cycleway runs along a road and the two surfaces differ.
  • The new Organised Editing Guidelines have been approved by the board during the previous OSMF board meeting on November 15th. The guidelines, which were drafted by the Data Working Group, aim to improve the documentation, transparency and the practices of organised mapping initiatives.
  • The lack of maintenance of the electoral boundaries in Germany led to the proposal (de) (automatic translation) to delete the data. The proposal received a lot of support but also some opposition.
  • The voting for mapping ‘tramtrack on highway’ failed. However, the originator Jukka Nikulainen is preparing a new proposal based on the feedback he received during the proposal process.


  • Can Ünen (OSM unen), university lecturer in Istanbul, Turkey, is the new Mapper of the Month. As usual, OSM Belgium published a background interview with the winner.
  • South Korean mapper GPIOIPG ran a survey on whether OSM public transport station names should have the suffix -역 (i.e. station). Usually, the official names do not include the suffix. Apparently, the suffix is widely used by people and even appears on some signs, but ‘station’ isn’t attached to station names in other countries. In a post in his user diary, GPIOIPG writes about the result of his survey, which he advertised on the forum and the mailing list, and provides some more background information.
  • Nakaner has loaded all the e-mails from all public mailing lists on lists.openstreetmap.org into a database. Two entries (1, 2) in his user blog contain the number of messages per year and per mailing list and the list of the most active authors.
  • SunCobalt notes that the number of OSMF members increased sharply shortly before both the current and the previous election, and visualises this in a graph. Honi soit qui mal y pense.

OpenStreetMap Foundation

  • The official manifestos and candidates’ responses for this year’s OSMF board election have now been published. Christoph Hormann has written a summary and Paul Norman has written an evaluation guide. There are discussions in the Forum as well as on the osmf-talk mailing list.
  • The Membership Working Group (MWG) reports on its recent activity. The MWG is currently busy with the new membership fee waiver program and has to deal with countries that have no suitable money transfer options for paying the membership fee. Like weeklyOSM and other OSMF working groups, the MWG is looking for volunteers.
  • Mikel Maron, OSMF board member, announced the launch of Welcome Mat. The website Welcome Mat was started to help external organisations to understand OSM’s difficult structure, to explain how OSM works, and how and where to engage.


  • FOSS4G Italy announced (automatic translation) a call for paper and workshop proposals for the third Italian FOSS4G 2019, which will be held in Padua from 20th to 24th February. Proposals are due by 13th December.

Humanitarian OSM

  • OpenGov Hub hosted a mapathon with HOT staff in Washington, DC on December 4th.
  • HOT provides an update on their Microgrants 2018 program that was launched in April 2018. Eight communities received microgrants in order to improve OSM and help to minimise the impact of disasters.


  • OpenStreetBrowser now has a new category under "Leisure, Sport and Shopping". According to skunk’s blog post, the new category "Swimming and Bathing" includes all kinds of swimming and bathing facilities and saunas.



  • Heidelberg University’s GIScience Research Group introduced API Playground, which allows you to explore OpenRouteService API services, parameters, and responses.
  • Matthias made a prototype of an external voice controller for JOSM. The tool speech2JOSM is available on GitHub.


  • A new stable version of JOSM has been released. Version 18.11 saves the height of each panel on the right when JOSM is closed, comes with a dedicated button for Download as new layer instead of a tick box, and improves a lot of other features. The most notable change is probably the fix for the glitched GPS traces caused by the OSM website API change.

Did you know …

  • The MapOSMatic instance at osm-baustelle.de recently exceeded 40,000 rendered maps. The website further enhanced its functionality with user interface improvements and hyperlink support in multipage PDF outputs.
  • Stefan Keller, Computer Science Professor at the University of Applied Sciences of Rapperswil, has created a Fog Map. The map shows how to find your way to the sun on grey, foggy November days. The website Bluewin.ch provides (de) (automatic translation) some background information and tests if the map works. Unfortunately, the map only covers Switzerland.
  • … the service kinderkiez.net? The service, which is available in English and German, lets you create a children’s play mat with a map based on OSM.
  • … the long list of links to places where you can find a suitable tag if you can’t find one in your editor’s presets?

Other “geo” things

  • In Switzerland, an old international boundary post has been relocated to the grounds of the Swiss National Museum in Zurich. It now provides the first reference point for mobile devices (automatic translation) via QR codes or a website (OSM-Link).
  • The iXpoint company won (de) (automatic translation) the Baden-Württemberg Challenge of the European Satellite Navigation Competition (ESNC) 2018 with its OSM-based pedestrian routing.
  • Gaël Musquet, founder and former spokesman for OpenStreetMap France (French local chapter of the OSM Foundation), has been awarded (automatic translation) the French National Order of Merit, along with one of the VLC developers. He sees it as a recognition for the whole community and the leading French figures in open-source.
  • Medium.com explains how to get ADS-B position data from airplanes onto an OSM based map using Python and the Cartopy library.

Upcoming Events

Where | What | When | Country
Alice | PoliMappers Adventures 2018: One mapping quest each day | 2018-12-01 to 2018-12-31 | everywhere
Niamey | OSM and GIS training camp at CNF | 2018-12-03 to 2018-12-07 | Niger
Tångstad | Foundation board elections voting opens | 2018-12-08 |
Ouagadougou | Mapathon Independance Day at La Ruche | 2018-12-10 | Burkina Faso
Rennes | Réunion mensuelle | 2018-12-10 | France
Lyon | Rencontre mensuelle pour tous | 2018-12-11 | France
Zurich | Jubilee Stammtisch Zurich with Fondue | 2018-12-11 | Switzerland
Salzburg | Maptime Salzburg | 2018-12-12 | Austria
Mannheim | Mannheimer Mapathons – now in Ludwigshafen! | 2018-12-12 | Germany
Helsinki | Missing Maps Mapathon at Finnish Red Cross HQ – Dec 2018 | 2018-12-13 | Finland
Munich | Münchner Stammtisch | 2018-12-13 | Germany
Berlin | 126. Berlin-Brandenburg Stammtisch | 2018-12-14 | Germany
online via IRC | Foundation Annual General Meeting | 2018-12-15 | everywhere
Cologne Bonn Airport | Bonner Stammtisch | 2018-12-18 | Germany
Lüneburg | Lüneburger Mappertreffen | 2018-12-18 | Germany
Nottingham | Pub Meetup | 2018-12-18 | United Kingdom
Reutti | Stammtisch Ulmer Alb | 2018-12-18 | Germany
Rennes | Recensement des panneaux publicitaires | 2018-12-23 | France
Leipzig | OpenStreetMap assembly | 2018-12-27 to 2018-12-30 | Germany
Heidelberg | State of the Map 2019 (international conference) | 2019-09-21 to 2019-09-23 | Germany

Note: If you would like to see your event here, please add it to the calendar. Only data that is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by PierZen, Polyglot, Rogehm, SK53, SunCobalt, TheSwavu, derFred, jinalfoflia.

It is with deep regret that I share that Eileen Hershenov, General Counsel and Board Secretary, will be departing from her position at the Wikimedia Foundation. On behalf of the executive team, Foundation Board, and myself, I want to thank Eileen for her critical contributions to advance our legal, public policy, and advocacy work during her time with us at the Foundation.

Eileen will be departing from her position in early December, and Tony Sebro, current Deputy General Counsel, will act as Interim General Counsel as we look to fill the position permanently. I thank Tony for his willingness to support this transition period and know and trust the Legal Department will be under excellent leadership during this time.

In her departing email to staff, Eileen said, “I leave hoping that I will be able to work with many of you again in some capacity in the future.  We work on such amazing projects, and for such an important mission. How lucky to work for something that matters so much to so many people, and to be fighting—in the face of sometimes daunting technological, social and political headwinds—to maintain and extend the foundational values and principles necessary to any truly open, free and egalitarian society.” In her email, Eileen also noted that while she had hoped to be at the Foundation for many years, family reasons necessitated her move back to New York.

During her tenure, Eileen led an experienced team of attorneys and public policy experts to defend the Wikimedia projects, communities, and Foundation and advance our mission. She was instrumental in our ongoing work to lift the block of Wikipedia in Turkey, representing Wikimedia in Ankara within days of joining us. She advanced our ongoing litigation against the NSA and saw us through when we became the sole plaintiff of the case.

Eileen led the organization’s policy direction during a period of increased global focus on privacy, while improving our existing policies to protect the rights of our readers and editors around the world. She also worked with our Chief Financial Officer, Jaime Villagomez, to mature our organization-wide risk identification and assessment as part of strategic planning. For her team, she has been a loyal and committed leader, deeply focused on developing the incredible team she inherited, while recruiting additional excellent people to join our mission.

While Eileen is leaving too soon, she leaves us with a tremendously thoughtful, capable, and strategic department, in which I have complete confidence while we conduct a search for a new General Counsel in the coming months. I am also grateful that Eileen has agreed to continue to work with us on some projects over the remainder of this year and perhaps into the next.

With much gratitude, I thank Eileen for her service to our organization, our movement, and our mission. I will miss her counsel, but am grateful for all she did during her time here. I know I speak for my colleagues at the Foundation in looking forward to her continued presence in the Wikimedia community.

Katherine Maher, Executive Director
Wikimedia Foundation

To bridge all the gaps

16:33, Tuesday, 04 2018 December UTC

We hear time and again that, as a classroom project, the Wikipedia assignment is a huge motivator for students. In our Fall 2016 research study into student learning outcomes, for example, we found that “in addition to their value in learning digital/information literacy, critical research, teamwork, and technology skills, Wikipedia-based assignments also help increase students’ motivation to complete work over traditional writing assignments.” In my recent webinar with a group of instructors from Indiana University of Pennsylvania, I heard this feedback echoed:

“Social equity editing can be very motivating,” says Matthew Vetter, longtime Wiki Education advocate and Assistant Professor of English at Indiana University of Pennsylvania. For example, Dr. Vetter recently worked with his graduate students to complete a citation analysis across five Wikipedia articles related to computers, writing, digital literacy, and digital rhetoric in order to show how Wikipedia’s gender gap manifests in the absence of cited research by non-male scholars. He then asked his students to edit Wikipedia to improve representation of women and women’s research in these areas. But after using the Wikipedia assignment and Wiki Education’s tools in more than 10 courses, Dr. Vetter has also found that “some students aren’t ready… it can be overwhelming for them to confront knowledge as a thing in flux.”

In her first term teaching the Wikipedia assignment this fall, Jialei Jiang noticed that “some of my students have said they want to bridge all the gaps.” This is one of the great outcomes of teaching with Wikipedia… that students begin to see issues with sourcing and content more often, and are prepared to deal with those issues. “My students are always so surprised when they are told they will be doing this,” says Dr. Vetter, “We’re taking something they heard was unrealistic [using and editing Wikipedia] and flipping that narrative.”

Webinar participants also mentioned feeling “very fulfilled while teaching this assignment” and that it allows them to “teach students collaboration and genre specific work.” To get involved, visit teach.wikiedu.org or join our upcoming professional development course.

To read more about this workshop, visit the Digital Rhetoric Collaborative blog. Reach out to contact@wikiedu.org with questions.

WikiCite conference 2018

14:19, Tuesday, 04 2018 December UTC
Group photo – WikiCite 2018 (can you spot Jason?) – image by Satdeep Gill, Wikimedia Commons CC BY-SA 4.0

By Jason Evans, National Wikimedian at the National Library of Wales

Imagine a world in which anyone could use an open citation database to support free knowledge, with rich information about every citable source.

Any Wikipedian or Wikipedia advocate will tell you that one of the great strengths of Wikipedia is its citations. In fact, a Wikipedia article is only as strong as its citations. They provide evidence for the statements made in an article but they also provide a gateway to reliable secondary sources for deeper learning.

In recent years Wikipedia has been overtaken as the fastest growing Wikimedia project by Wikidata – a linked open database of facts – or the Wikipedia of data, if you like. Wikidata has grown at a tremendous rate, as people and institutions use it as a hub for their data, joining up the world’s open data in an interconnected web. Quite organically, it began to act as a platform for sharing bibliographic and citation data, to the point that 40% of Wikidata’s 60 million items now describe academic papers and articles.

Watch a video about Wikicite from the 2017 Wikidata convention in Berlin

The emergence of Wikidata has led to the growth of the WikiCite movement, which aims, broadly speaking, to create open structured data for all citations used in Wikipedia.

This was my first WikiCite conference, and what became clear to me from day one was that this is very much a project still exploring its scope and trying to understand its place in the Wikimedia family of projects. But already there is a growing community of librarians, Wikimedians and data scientists keen to explore the potentials of the overarching concept.

Potential benefits of WikiCite are varied and wide reaching, and they serve separate communities in different ways. For example, since Wikidata items can be labelled and described in hundreds of languages, any structured citations on Wikipedia become multilingual, which has clear benefits for smaller language communities. Structured citations would also make it much easier for us to analyze the diversity and quality of citations being used in Wikipedia projects. They would allow us to map works which cite other works, or pick out retracted papers, making it easier to manage the relevance and quality of citations across multiple languages.

Approximately 1% of Wikipedia users click on a citation when they read a Wikipedia article, and this rises to 30% or more for more academic topics such as mathematics and engineering. And whilst these might seem like low numbers, 1% is still around 76 million clicks a month. So structured citations, in a standardised format that links to deeper data about a work (hopefully facilitating access to a digital copy of the work or providing details of physical holdings), will certainly add value to the current system of citations, which essentially consist of strings of textual information.

Implementing this kind of fundamental change to Wikipedia across multiple language editions presents huge technical and social challenges in itself. It has therefore been proposed that any conversion to structured citations should start small, on smaller Wikidata-friendly language versions of Wikipedia, before tackling English Wikipedia, with its nearly 6 million articles.

However, the WikiCite vision is even bigger and more ambitious.

Participants at Wikicite 2018 – image by DarTar, Wikimedia Commons CC BY-SA 4.0

Imagine Wikidata items for every citation on Wikipedia, and then consider the added value of a massive centralised, or ‘federated’ bibliographic commons, where individuals, institutions and organisations can give access to bibliographic corpora, ranging from collections of niche scientific papers to a country’s entire publishing output – a library catalogue for the sum of all human knowledge. That may sound implausible, but Wikipedia didn’t become the 5th largest website in the world by dreaming small.

As you can imagine, this larger ambition has a few potential issues, which is why it is currently referred to as ‘the moonshot option’. There are questions around the technical ability to host, manage and maintain all this data in a standardised and centralised way. And if you decentralise the data to multiple instances of Wikibase (the platform which powers Wikidata), then how do you ensure that all these databases retain the semantic structure required for consistent and seamless communication between instances?


Wikicite presentation on gender diversity by Rosie Stephenson-Goodknight – Wikimedia Commons CC BY-SA 4.0

Another important question which comes out of this conference is: how do we ensure that any development is inclusive of other languages and cultures? Done properly, this initiative should make it possible to have a greater diversity of sources on Wikipedia. For years, the use of Western sources to inform readers about non-Western concepts, languages and societies has been a bugbear for Wikipedia.

In Wales, we have already embarked on a project to share the ‘Sum of all Welsh Literature’ via Wikidata, in a bid to encourage the use of Welsh publications to cite articles about Wales, its people and culture. And we heard of similar projects getting under way in other parts of the world. In Sweden, for example, the local Wikimedia chapter are working with the National Library to openly share data for around 700,000 works from the Swedish Bibliography.

Many challenges lie ahead, but it’s clear from the diversity of people and projects at this conference that WikiCite is very much already happening.

To find out more about the project, check out the WikiCite Wiki page.



Tech News issue #49, 2018 (December 3, 2018)

00:00, Monday, 03 2018 December UTC

weeklyOSM 436

11:00, Sunday, 02 2018 December UTC



3D model of the Heidelberg Castle, one of the most famous ruins in Germany 1 | © Landesamt für Denkmalpflege im Regierungspräsidium Stuttgart 2018


  • John Whelan shares his view about the project Building Canada 2020 and how to proceed with mapping buildings. The release of government data and the expectation that Microsoft will release building footprints, as it did for the US, lead him to conclude that adding metadata might be better than having inexperienced mappers trace buildings using iD.
  • A proposal (de) (automatic translation) has emerged from the discussion in the German forum about rules for the use of relations with type=multipolygon. The intention of the proposal is, simply put, to avoid multipolygon relations if the geometry can be mapped with a single way representing the area, i.e. no inner areas and fewer than 2000 nodes for the area’s outer way. The topic is currently being discussed on the German forum (de) (automatic translation) and German mailing list (de) (automatic translation) and is intended to be a guide to best practice for Germany initially.
  • The voting on the tagging proposal boundary=aboriginal_lands for the official territories of indigenous groups has started eight years after its drafting and is open until 8th December 2018. Alan McConchie created a GitHub issue for the OSM Carto map style as well and proposes a rendering with a brown outline colour.
  • Warin felt that the Wiki for antennas needed some rework to allow a better specification of the antenna type. As he is confused by the values of the key antenna:type in the German part of the wiki and he also does not favour that particular key, he is looking for alternative options.
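
The single-way criterion from the German multipolygon proposal summarised above could be checked roughly as follows. This is only a sketch of the rule as described here: the `Area` class, the `needs_multipolygon` function and the representation of rings as lists of node IDs are hypothetical names invented for this illustration, not part of the proposal or of any OSM tool.

```python
from dataclasses import dataclass, field
from typing import List

# Threshold quoted in the summary above: a single way is only suggested
# when the outer way has fewer than 2000 nodes.
MAX_WAY_NODES = 2000


@dataclass
class Area:
    """Hypothetical in-memory area: each ring is a list of node IDs."""
    outer_rings: List[List[int]]
    inner_rings: List[List[int]] = field(default_factory=list)


def needs_multipolygon(area: Area) -> bool:
    """Return True if the area cannot be mapped as one simple closed way."""
    if area.inner_rings:
        # Holes can only be expressed with a multipolygon relation.
        return True
    if len(area.outer_rings) != 1:
        # Several outer rings cannot be merged into a single way.
        return True
    # A single outer ring still needs a relation once it is too long.
    return len(area.outer_rings[0]) >= MAX_WAY_NODES


if __name__ == "__main__":
    simple = Area(outer_rings=[[1, 2, 3, 4, 1]])
    with_hole = Area(outer_rings=[[1, 2, 3, 4, 1]], inner_rings=[[5, 6, 7, 5]])
    print(needs_multipolygon(simple))
    print(needs_multipolygon(with_hole))
```

Under these assumptions, a plain closed way yields False (no relation needed), while an area with a hole, several outer rings, or an outer ring of 2000 or more nodes yields True.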


  • InfosReseaux wrote a post on his OSM diary about the mapping of infrastructure like power networks.
    The article describes how infrastructure knowledge is changing from proprietary to open knowledge with OSM and how infrastructure mapping makes OSM better.
  • If you have recently used Pascal Neis’ How Did You Contribute website, you may have noticed a new section called Recently utilized presets – Top 3. In a blog post Pascal explains that it is intended as a new quality indicator.

OpenStreetMap Foundation

  • The recent DWG decision on the Crimean peninsula has provoked many discussions:

    • A summary in Russian on Zverik’s OSM blog (ru) (automatic translation).
    • A discussion in the Ukrainian OSM forum (partly English, partly Ukrainian) (automatic translation).
    • A discussion on the talk mailing list started by a post criticising the decision.
    • Several OSM diary entries by Ukrainian mappers objecting to the decision, including this one and this one by Kilkenni (with many comments, including some objecting to the ad-hominem attacks it contains).
    • A discussion over on Reddit.
    • Multiple people have suggested how to map disputed and claimed territories on the Talk and Tagging (1, 2) mailing lists. (proposal by johnparis, proposal by Rory). It was also discussed whether it made sense to store such claims in OSM at all.
    • Someone has sent a formal complaint to the OSMF board of directors.
  • The OSMF board informed the members that it has declined a request to reject the membership application requests related to a "mass sign-up of 100 new accounts on 15 Nov 2018 from India, most coming from one single IP address from a company ‘well known’ to OpenStreetMap". The new members are not eligible to vote this year. However, the decision was followed by a long discussion on the mailing list.
  • Joost Schouppe, candidate for this year’s OSMF board election, analysed the OSMF membership per country in total and relative to the numbers of contributors.
  • The minutes of the Data Working Group meetings of 10 July, 13 September and 15 November have been published in the foundation’s "working group minutes" area.

Humanitarian OSM

  • The Erasmus+ European Youth Humanitarian OpenStreetMap (euYoutH_OSM) students announced in a tweet that on November 23rd some ESJEA teachers began their training in OpenStreetMap in a session guided by teachers Nuno Azevedo and Elizabete Oliveira. The teachers worked with JOSM, putting Terceira Island more and more on the map. euYoutH_OSM also announced that on November 26th the Sports Technician course received training in JOSM. The students on this course learned how to draw buildings and properties and how to label structures yet to be mapped.
  • HOT and Kathmandu Living Labs will cooperate on the mapping of Kathmandu, the capital of Nepal. KLL will map buildings remotely before information is gathered on the ground to complete the assessment. The results will be used to validate the data sets for Nepal that show the exposure of a region to natural hazards.
  • Three data science students are asking for "validated regions" for a machine learning project. The plan is to download a dataset to train and test automated mapping models using the satellite images.


  • The Data Wrangling part of Udacity’s Data Analyst course uses OpenStreetMap as a project. Students are introduced to OSM and its data model and have to export, filter and analyze a large area from OSM.


  • Ingress agent Hinata Kino published (ja) a map of the real-life locations in the Ingress anime. He mapped (ja) the places where events in the anime happen, using a map based on OSM.
  • Wikipedia user Triglav wrote (ja) about a way to embed OSM on Wikipedia pages. You can combine OSM relations with Wikidata and highlight them on Wikipedia.


  • Cadcorp, a British provider of GIS services, particularly to local authorities, has updated its software, which now includes an OSM layer.
  • An alpha version of Atlasr, an open-source map browser, has been released. Atlasr consists of modules such as a map data database, a map renderer, a tile server, a geocoder and routing. It aims to offer a smooth, fast, reliable, pretty, and open-source alternative to Google Maps, based on open data. The software is available on GitHub.


  • Matthias (OSM username !i!) experimented with speech recognition and connected it to JOSM: speech2josm. The prototype is written in Python. Matthias is looking for someone who can turn it into a new JOSM plugin. He writes more in his blog post (de) (automatic translation).


  • Version 4.17.0 of the OSM Carto stylesheet has been released. As Daniel Koć writes in his blog, the changes include an earlier rendering of natural areas, a clean-up of medium zoom rendering, as well as new rendering and new icons for several features.
  • Heidelberg University’s GIScience Research Group has published version 3.2 of its QGIS OSM Tools Plugin. It now includes access to the routing functions of openrouteservice.org.
  • QGIS 3.4 ‘Madeira’ has recently been released. This version is the first long-term release for version 3. The new version includes a ton of new features, listed in the very long changelog.

Did you know …

  • … the highly detailed map of Bexhill-on-Sea in England, created using OpenStreetMap? And now with a topical issue. Highly recommended for all organisations active in tourism.

OSM in the media

  • The Deccan Chronicle reports on an event, hosted at the Indian Institute of Management-Bangalore, where over 300 mappers from 12 countries discussed disaster mapping topics such as the importance of disaster mapping, the role languages play in tagging places, community building and many others.
  • The website The Better India suggests that its readers pick up a new hobby: OSM. The short article covers some basics about OSM and what OSM can do for civil society, and mentions the recent State of the Map in India.

Other “geo” things

  • The Open Data Institute (ODI) calls on the government to break the dominance of commercial online giants over the UK’s geospatial data by moving the business model of public bodies more towards open data and by considering mandating access to the geospatial data of private companies. A broad public debate should discuss the role of public and private organisations in the context of geospatial data.
  • Municipal Dreams tweets a photo of an old map of London’s main sewage system.
  • The German state of Baden-Württemberg has released many 3D models of historic castles, palaces and monasteries under a proprietary license. The repository Sketchfab holds a directory of the 3D models, which are so far not downloadable. The only other disadvantage is that Neuschwanstein Castle is in Bavaria.
  • Paul Ramsey blogged about the response by an ESRI sales representative when Paul spoke to a local GIS user group about the risks of being tied to a single software vendor. (via @anonymaps)

Upcoming Events

Where | What | When | Country
Alice | PoliMappers Adventures 2018: One mapping quest each day | 2018-12-01 to 2018-12-31 | everywhere
Toronto | Mappy Hour | 2018-12-03 | Canada
Niamey | OSM and GIS training camp at CNF | 2018-12-03 to 2018-12-07 | Niger
London | London Missing Maps Mapathon | 2018-12-04 | United Kingdom
Viersen | OSM Stammtisch Viersen | 2018-12-04 | Germany
Praha – Brno – Ostrava | Kvartální pivo | 2018-12-05 | Czech Republic
Stuttgart | Stuttgarter Stammtisch | 2018-12-05 | Germany
Toulouse | Rencontre mensuelle | 2018-12-05 | France
Bochum | Mappertreffen | 2018-12-06 | Germany
Dresden | Stammtisch Dresden | 2018-12-06 | Germany
Nantes | Réunion mensuelle | 2018-12-06 | France
Tångstad | Foundation board elections voting opens | 2018-12-08 |
Rennes | Réunion mensuelle | 2018-12-10 | France
Lyon | Rencontre mensuelle pour tous | 2018-12-11 | France
Zurich | Jubilee Stammtisch Zurich with Fondue | 2018-12-11 | Switzerland
Salzburg | Maptime Salzburg | 2018-12-12 | Austria
Helsinki | Missing Maps Mapathon at Finnish Red Cross HQ – Dec 2018 | 2018-12-13 | Finland
Munich | Münchner Stammtisch | 2018-12-13 | Germany
Berlin | 126. Berlin-Brandenburg Stammtisch | 2018-12-14 | Germany
online via IRC | Foundation Annual General Meeting | 2018-12-15 | everywhere
Cologne Bonn Airport | Bonner Stammtisch | 2018-12-18 | Germany
Lüneburg | Lüneburger Mappertreffen | 2018-12-18 | Germany
Nottingham | Pub Meetup | 2018-12-18 | United Kingdom
Reutti | Stammtisch Ulmer Alb | 2018-12-18 | Germany
Rennes | Recensement des panneaux publicitaires | 2018-12-23 | France
Heidelberg | State of the Map 2019 (international conference) | 2019-09-21 to 2019-09-23 | Germany

Note: If you would like to see your event here, please add it to the calendar. Only data that is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Nakaner, NunoMASAzevedo, Polyglot, Rogehm, SK53, SomeoneElse, SunCobalt, TheSwavu, YoViajo, derFred, jinalfoflia, k_zoar, muramototomoya.

A November to remember

17:51, Friday, 30 2018 November UTC

We hear time and again that people understand the importance of having well-referenced information on Wikipedia, but most simply don’t know how to do something about it. “I did this at an edit-a-thon recently,” one conference attendee said, “and I loved it! How can I do more?” In November, I traveled to three academic conferences in the hope of answering that question for instructors, scholars, and scientists.

Society for Neuroscience

First, I attended the Society for Neuroscience conference in San Diego, where attendees research topics as varied as the lower urinary tract, behavioral neuroscience, and endocrinology. In my experience, many instructors in these kinds of courses crave a writing assignment where students have to practice talking about their field for an audience outside the classroom. Not surprisingly, blog assignments just aren’t cutting it. In a Wikipedia assignment, however, students can be asked to research biographies of people in their field or to summarize research about under-covered issues, and in the process, write about science for a worldwide audience via Wikipedia. This form of science communication can often be interdisciplinary, allowing students to bring together their course work from a variety of perspectives.

National Communication Association

Wiki Education attends the 104th annual National Communication Association convention.

But Wiki Education is also invested in finding ways that scholars and scientists themselves can contribute. That’s why this year we developed our online professional development courses where we train subject matter experts to write articles on Wikipedia in their fields. While at the National Communication Association annual convention in Salt Lake City, I got to talk with three of our current participants. “I’ve always wanted to expand these articles on Wikipedia, but couldn’t figure out where to start. It just seemed too daunting,” Margaret D’Silva, Professor of Communication and Director at the Institute for Intercultural Communication at the University of Louisville, said to me. Luckily, that’s where our course curriculum comes in. It helps break down the daunting process of learning to update content on Wikipedia, all with the support of our trained Wikipedia Experts.

I also had the opportunity to join a panel discussion at NCA about the use of open educational resources. Wiki Education is no stranger to discussions of open educational practice, so we were excited to participate in that session, especially as we continue to offer free teaching tools and resources.

American Anthropological Association

Director of Partnerships Jami Mathewson talks with an attendee at the Wiki Education booth.

My final stop in the circuit was the American Anthropological Association annual meeting. I spent most of my time talking with attendees about our upcoming professional development course in collaboration with the National Archives. The anthropologists I talked with were very excited about the opportunity to update topics about the history of women’s suffrage on Wikipedia. The course isn’t full yet though, so interested participants can apply here to join!

Last month, the Wikimedia Foundation, Wikimedia community members in Jordan, and Hashemite University—one of the most esteemed higher education institutions in Jordan—signed a Memorandum of Understanding (MOU) to advance cultural cooperation and access to free knowledge resources in the Arabic language. It is the first MOU of its type signed with any university in the Middle East.

The partnership was announced at a roundtable discussion hosted at Hashemite University by Katherine Maher, Executive Director of the Wikimedia Foundation, and Professor Kamal Bani-Hani, Hashemite University President. They were joined by Hashemite University staff, students, the Wikimedians of the Levant User Group—the local Wikimedia community of volunteer editors—and other guests.

“This relationship with Hashemite University is a critical step in expanding Arabic language free knowledge resources on Wikipedia and on the broader internet. We’re proud to be working with the staff and students of this university to improve Wikipedia while enriching the academic experience of the students across the university,” said Maher.

Through the partnership, the University’s Arabic Online Content Club – Deanship of Student Affairs will create a standing Wikipedia office at the university to support greater collaboration between both institutions. The office will support students in learning how to contribute to the Wikimedia projects through training by local volunteers. The university will also designate a space in the library to host Wikipedia introductory materials for students and integrate Wikisource, the multilingual digital library of free texts, into the university library search engine. To support these efforts, the Wikimedia Foundation will donate a laptop to the university Wikipedia office.

“This MOU is the product of years of work,” said Mossab Banat, the originator of the MOU and a member of the Levant User Group, “and represents a quantum leap in the Wikipedia Education Program’s capabilities here at the university. We are excited at the potential that this formalized agreement will open for us.”

The Wikimedia projects have been used in educational settings from the inception of Wikipedia. University professors have found that teaching students how to contribute to Wikipedia and other Wikimedia projects is engaging and motivating for students, helping them to learn essential 21st century skills like media literacy, writing and research development, and critical thinking, while content gaps on Wikipedia are filled thanks to students’ efforts. Hashemite University has been supporting Wikimedia in education since 2015 in collaboration with local Wikimedia volunteers. Over 500 students participated in 12 courses in 2017, after Banat proposed the idea at a Deans’ council meeting, and by 2018, 54 students formalized a university club to focus on improving content on the Arabic Wikipedia.

“Our partnership has enabled us to join a growing network of global educational facilities who are using the power of technology to facilitate access to more educational content,” said Prof. Kamal Bani-Hani, President, Hashemite University.

Wikipedia in Jordan is supported by a local community of volunteer editors as part of the Wikimedians of the Levant User Group. Together, volunteers from the user group write and improve articles primarily in Arabic and English Wikipedias and the other Wikimedia projects. Recognized as a formal Wikimedia affiliate in 2015, the Wikimedians of the Levant User Group regularly hosts events to teach more people how to edit Wikipedia and improve the representation of Arabic language content across the Wikimedia projects.

The relationship between the Wikimedia movement and the education sector is overseen by the Wikimedia Foundation in close collaboration with local Wikimedia volunteers around the world. The effort to bring Wikimedia into education has grown to involve programs in all parts of the world, at every education level, and using every Wikimedia project.

When news broke yesterday that Margaret Atwood is writing a sequel to The Handmaid’s Tale, anyone who wished to know why this drew so much attention could visit Atwood’s Wikipedia page. There, they could read about her career, recurring themes in her work and their cultural contexts, film adaptations of her writings, and so much more that positions Atwood as a prominent cultural figure. The Wikipedia article is so informative thanks to the handiwork of National Women’s Studies Association member Dr. Jenn Brandt, a participant in our professional development course that trains subject-matter experts how to contribute their knowledge to Wikipedia.

Take a look at all that Dr. Brandt contributed to the article. Highlighted in purple here are Dr. Brandt’s edits. Before she came in and reworked both the article’s content and organization, it contained almost no information about Atwood’s career. Nor did it outline what influence her work has had in the American cultural and political landscape.

Perhaps most notable of Dr. Brandt’s contributions is the section on Atwood and feminism. Dr. Brandt is the Director of the Women’s and Gender Studies Program and an Assistant Professor of English at High Point University, positioning her perfectly to add such a perspective to a Wikipedia article about an author whose work holds such prominence in cultural discourse because of its feminist themes. Given that feminist activist groups have drawn on Atwood’s The Handmaid’s Tale over the past two years in political debate, documenting the real-world manifestations of Atwood’s influence is an important part of painting an accurate picture of her career. In Atwood’s announcement this week that she will be writing a sequel to her influential work, she nodded to this influence, saying that in addition to wanting to respond to fans’ desire for another book: “The other inspiration is the world we’ve been living in.”

Dr. Brandt took it upon herself to bring Atwood’s biography article up to Wikipedia’s second highest quality standard: that of Good Article. The process for designating an article of this quality requires a detailed back and forth with another editor to determine if the article meets requirements. Dr. Brandt made several more edits to the article through this rigorous feedback process and it has been accepted among the Wikipedia community as a comprehensive representation. In the last day, she has returned to the article to add information about the sequel – seven months after her course wrapped!

Atwood’s article receives a daily average of about 2,500 pageviews and has received more than 400,000 total since Dr. Brandt improved it. That’s a tangible impact that an individual has made for public knowledge. Become a Wiki Scholar yourself and learn how to edit in our unique courses. For more information about our current course offering, see bit.ly/NARAwiki.

Read about Dr. Brandt’s personal experience in our course and how she transferred her new skills to the classroom in this reflective piece.

Image: File:Margaret Atwood – Foire du Livre de Francfort (37735253561).jpg, ActuaLitté, CC BY-SA 2.0, via Wikimedia Commons.

Production Excellence: October 2018

12:47, Thursday, 29 2018 November UTC

How’d we do in striving for operational excellence last month? Read on to find out!

  • Month in numbers.
  • Highlighted stories.
  • Current problems.

📊 Month in numbers

  • 7 documented incidents from 24 September to 31 October. [1]
  • 79 Wikimedia-prod-error tasks closed from 24 September to 31 October. [2]
  • 69 Wikimedia-prod-error tasks created from 24 September to 31 October. [3]
  • 175 currently open Wikimedia-prod-error tasks (as of 25 November 2018).

October had a relatively high number of incidents – compared to prior months and compared to the same month last year (details).


  • An Exception (or fatal) causes user actions to be prevented. For example, a page would display "Exception: Unable to render page" instead of the article content.
  • A Warning (or non-fatal, or error) can produce page views that are technically unaware of a problem, but may show corrupt, incorrect, or incomplete information. Examples – an article would display the code word “null” instead of the actual content, a user looking for Vegetables may be taken to an article about Vegetarians, a user may receive a notification that says “You have (null) new messages.”

I’ve highlighted a few of last month’s resolved tasks below.

*️⃣ Send your thanks for talk contributions

Fixed by volunteer @Mh-3110 (Mahuton).

The Thanks functionality for MediaWiki (created in 2013) wasn’t working in some cases. This problem was first reported in April, with four more reports since then. Mahuton investigated together with @SBisson. They found that the issue was specific to talk pages with structured discussions.

It turned out to be caused by an outdated array access key in SpecialThanks.php. Once adjusted, the functionality was restored to its former glory. The error existed for about eight months, since internal refactoring in March for T186920 changed the internal array.

This was Mahuton’s first Gerrit contribution. Thank you @Mh-3110, and welcome!

T191442 / https://gerrit.wikimedia.org/r/461189

*️⃣ One space led to Fatal exception

Fixed by volunteer @D3r1ck01 (Derick Alangi).

Administrators use the Special:DeletedContributions page to search for edits that are hidden from public view. When an admin typed a space at the end of their search, the MediaWiki application would throw a fatal exception. The user would see a generic error page, suggesting that the website may be unavailable.

Derick went in and updated the input handler to automatically correct these inputs for the user.


*️⃣ Fatal exception from translation draft access

Accessing the private link for ContentTranslation when logged out isn’t meant to work, but the code didn’t account for this. When users attempted to open such a URL while not logged in, the ContentTranslation code performed an invalid operation. This caused a fatal error from the MediaWiki application. The user would see a system error page without further details.

This could happen when opening the link from your bookmarks before logging in, or after restarting the browser, or after clearing one’s cookies.

Fixed by @santhosh (Santhosh Thottingal, WMF Language Engineering team).


🎉 Thanks!

Thank you to everyone who helped by reporting or investigating problems in Wikimedia production; and for devising, coding or reviewing the corrective measures. Including: @Addshore, @Aklapper, @Anomie, @ArielGlenn, @Catrope, @D3r1ck01, @Daimona, @Fomafix, @Ladsgroup, @Legoktm, @MSantos, @Mainframe98, @Melos, @Mh-3110, @SBisson, @Tgr, @Umherirrender, @Vort, @aaron, @aezell, @cscott, @dcausse, @jcrespo, @kostajh, @matmarex, @mmodell, @mobrovac, @santhosh, @thcipriani, and @thiemowmde.

📉 Current problems

Take a look at the workboard and look for tasks that might need your help. The workboard lists known issues, grouped by the week in which they were first observed.


💡 ProTip:

Cross-reference one workboard with another via Open Tasks Advanced Filter and enter Tag(s) to apply as a filter.


Until next time,
– Timo Tijhof


[1] Incidents. – wikitech.wikimedia.org/wiki/Special:AllPages...
[2] Tasks closed. – phabricator.wikimedia.org/maniphest/query...
[3] Tasks opened. – phabricator.wikimedia.org/maniphest/query...

The importance of Artemis Fowl

06:01, Thursday, 29 2018 November UTC

My bookshelf

Artemis Fowl, sitting right of center on my "favorite books" shelf.

Nearly two decades later, the Artemis Fowl movie is finally happening. It's hard for me to overstate how important Artemis Fowl has been to me. One of my friends asked me if I saw the trailer today and I pretty ecstatically said yes. Artemis Fowl Confidential, a website I registered with back in 2008, sent me an email as soon as it was released. Immediate nostalgia.

I read the original Artemis Fowl sometime in elementary school. By the time the final book, Artemis Fowl: The Last Guardian, was released, I had already graduated high school.

Sometime in eighth grade I joined the Wikipedia WikiProject Artemis Fowl - a group of editors dedicated to improving Wikipedia's coverage of Artemis Fowl related articles. I went through the archives and even found the original post, welcoming me to the project. Those were my first friends on Wikipedia...Calvin, Icy, Laptopdude. Miss y'all.

And at some point I learned templates, creating Template:AF Cite Book. Then that turned into the first ever bug I would file in Wikimedia Bugzilla (I still have bug 2700 memorized for some reason).

And then that Wikipedia thing spiraled out of control, and somehow I ended up with an actual, real, job. Definitely due to other things, but just a little bit thanks to Artemis Fowl.

Thanks Eoin, and here's to the next twenty years of Artemis Fowl!

The anatomy of search: The root of the problem

23:58, Wednesday, 28 2018 November UTC

A galloping overview

As we have done before, let’s get a bird’s-eye view of the parts of the search process: text comes in and gets processed and stored in a database (called an index); a user submits a query; documents that match the query are retrieved from the index, ranked based on how well they match the query, and presented to the user. That sounds easy enough, but each step hides a wealth of detail. Today we’ll focus on the last part of the step where “text gets processed”—and look at stemming, stop words, and thesauri.
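As a rough illustration of that bird's-eye view, here is a minimal Python sketch of the whole loop: text is split into terms, stored in an index, and documents are retrieved and ranked for a query. The function names, the toy documents, and the "count matching terms" ranking are all assumptions made for illustration; they are not how on-wiki search is implemented.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; real analyzers do far more."""
    return text.lower().split()

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Rank documents by how many distinct query terms they contain."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Toy documents, invented for this example.
docs = {
    1: "hope springs eternal",
    2: "eternal hope and eternal spring",
    3: "a bucket is not a hat",
}
index = build_index(docs)
print(search(index, "eternal spring"))
```

Document 1 only matches on "eternal" because "springs" and "spring" are stored as different terms, which is exactly the gap that stemming is meant to close.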

Also keep in mind that, as discussed in more depth in the first installment in this series, humans and computers have very different strengths, and what is easy for one can be incredibly hard for the other.

The magic of morphology[1]

For the most straightforward kinds of search, the most important element of the grammar of most languages is morphology—the study of how words are put together from smaller pieces; in English, these pieces are generally prefixes, suffixes, and stems (also called roots in some grammatical traditions). Other elements of grammar, such as phonetics and phonology (relating to sounds) and syntax (how words are built up into sentences) usually don’t matter as much for search—though for some languages, the boundaries between such elements can be a lot blurrier than they are in English.

A breakdown of the morphology and parts of speech of the English word independently.

In addition to prefixes at the beginning and suffixes at the end of words, other languages have infixes in the middle—English has a limited set of slangy infixes, as in the lightly bowdlerized absofreakinglutely or hizouse (see more English infixes from Wiktionary)—and multi-part circumfixes that go around another word.

Other interesting morphological phenomena include:

  • clitics, which are morphological elements that act like words syntactically, but like affixes phonologically.[2]
  • reduplication, which repeats all or part of a word, as in English “schm-reduplication” (e.g., fancy-schmancy) or “contrastive focus reduplication” in which the repeating of an element indicates that it is “real” or “pure” or “prototypical” (as in, “you can put a bucket on your head and call it a hat, but that doesn’t make it a hat-hat”).
Distribution of reduplication as a regular part of languages around the world. It’s not common in European languages, but it’s pretty common everywhere else!
  • noun incorporation, in which a noun gets folded into a verb to limit its scope. English has a wee bit of such incorporation in words like backstabbing or babysitting, but some languages do it much more frequently and with vigor!
  • ablaut and umlaut, in which a vowel change indicates an inflection. There are remnants of these processes in English in paradigms like foot/feet and sing/sang/sung that are now considered irregular. In other languages, like German, this is a common regular process.

So, generic English morphology is rather boring as these things go. That turns out to be good for search in English because the general morphological rules are comparatively simple (despite the interesting irregular exceptions, and the fact that English spelling is atrocious—which does sometimes make things a bit harder than they need to be).

The root of the problem

The reason that morphology matters for search is that we generally want related forms of a word to match each other when we perform a search. For example, English hope, hopes, hoped, and hoping are all reasonable matches if you search for hope. This is generally accomplished through a process called stemming, which tries to reduce a word to an approximation of its stem or root form.

For you hard core text analysis nerds out there, we can distinguish stemming from lemmatization. The goal of stemming is for all words that are related to be reduced to the same stem (even if that stem is not the true root form or even an actual word) and for unrelated words to be reduced to different stems. Lemmatization, on the other hand, is only successful if the result is the “lemma” of a word, or the exact root form of a word, like you’d expect to find in a dictionary. This is the definition of lemma that is used in so many categories on English Wiktionary. Since “lemma” is its own lemma in English, one of its categories is “English lemmas” (and another of its categories is “English autological terms”—neat!).

Trying to generate lemmas is in theory a good way to do stemming because if the results are true lemmas, then words with the same stem are going to mostly be related to each other.[3] However, in practice, because English spelling is atrocious,[4] generating accurate lemmas can be expensive because it often comes down to having an extensive list of exceptions—and not just for the obvious irregular paradigms like be/am/is/are/was/were, child/children, ox/oxen, sing/sang/sung, etc.[5] However, as alluded to above, sometimes a somewhat inaccurate rule will suffice.

Oxen, which is a now irregular plural of ox, takes the historical plural marker -en.

For example, the lemma of begging is beg, but the lemma of egging is egg (yep, it can be a verb). A general stemming rule that replaces -gg and -gging with -g, for example, will give a non-lemma stem for egg and egging of eg, and will conflate names like Flagg and Bragg with the more common words flag and brag, but overall it works pretty well for words ending in g. The stemmer we use on English-language wikis doesn’t actually use this specific rule, though, because English morphological analysis is generally well-developed, and there are very large (but still never complete) dictionaries in use behind the scenes.[6]
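The -gg / -gging rule described above is small enough to write out. This is only a sketch of that one hypothetical rule (the function name g_stem is invented here), and as noted, it is not a rule the stemmer on English-language wikis actually uses.

```python
def g_stem(word):
    """Replace a trailing -gging or -gg with -g; leave other words alone."""
    w = word.lower()
    # Check the longer suffix first so "begging" is not treated as "...gg".
    for suffix in ("gging", "gg"):
        if w.endswith(suffix):
            return w[: -len(suffix)] + "g"
    return w

for word in ["begging", "beg", "egging", "egg", "Flagg", "flag", "Bragg", "brag"]:
    print(word, "->", g_stem(word))
```

Running it shows both effects from the paragraph: egg and egging share the non-lemma stem eg, and Flagg and Bragg collapse onto flag and brag.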

Despite the challenges offered by English spelling, English morphology is comparatively very easy. There aren’t that many forms to contend with: -s, -ed, and -ing cover most inflected forms of nouns and verbs. A few other derivational forms like -ly and -er are used to make related words.[7] Compare that with a few dozen forms of a Spanish verb and a couple thousand forms for a Finnish noun (so many cases! Though many combinations may never come up, they are very regular, which makes them relatively easy to generate and understand).

English is fairly analytic, so we use word order, and other separate words like auxiliaries and prepositions to indicate the grammatical relations in a sentence. Agglutinating and polysynthetic languages can smash a whole sentence into a single “word”. A favorite example among linguists is Ubykh’s “aχʲazbatɕʼaʁawdətʷaajlafaqʼajtʼmadaχ,” which means “if only you had not been able to make him take it all out from under me again for them.” For Ubykh (which, unfortunately, has gone extinct), any morphological software would have to be a lot more complex than what we have for English.

Balancing accuracy and complexity

For stemmers in general there are always potential trade-offs between accuracy and complexity. Fortunately stemming in many languages is an 80/20 kind of problem—so you can get good results with a fairly simple process, while on the other hand you may never reach 100% accuracy.

The simplest stemmers are typically rule-based, and simply try to remove affixes from a word and maybe change the resulting stem a little bit so it is more likely to match other related stems. These stemmers are fast and work on words they’ve never seen before, but also may make more mistakes,[8] and may or may not handle common exceptions (like be/been/is/am/are/was/were or sing/sang/sung in English). A very influential rule-based stemmer for English is the Porter Stemmer.

More complex stemmers or lemmatizers can use a dictionary for exceptions, and then apply rules to anything that’s not in the dictionary. Of course, the dictionary still can’t handle words it doesn’t know, though you can have heuristics to try to match parts of words, too, so that a dictionary entry for national could apply to international, multinational, transnational, binational, supernational, subnational, supranational, etc.
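To make the "dictionary for exceptions, rules for everything else" idea concrete, here is a toy sketch. The exception table, the suffix list, and the minimum stem length are all assumptions chosen for this example; this is neither the Porter Stemmer nor the stemmer bundled with Elasticsearch.

```python
# Irregular forms are looked up directly (a tiny, hand-picked sample).
EXCEPTIONS = {
    "am": "be", "is": "be", "are": "be", "was": "be", "were": "be", "been": "be",
    "sang": "sing", "sung": "sing",
    "children": "child", "oxen": "ox",
}
SUFFIXES = ("ing", "ed", "es", "s")  # tried in this order
MIN_STEM = 3                         # avoid stripping "sing" down to "s"

def strip_suffix(word):
    """Remove one common inflectional suffix, then a trailing -e."""
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= MIN_STEM:
            word = stem
            break
    if word.endswith("e") and len(word) - 1 >= MIN_STEM:
        word = word[:-1]
    return word

def stem(word):
    """Dictionary lookup first; fall back to the rules for unknown words."""
    w = word.lower()
    return EXCEPTIONS.get(w) or strip_suffix(w)

for word in ["hope", "hopes", "hoped", "hoping", "sang", "sings", "were", "oxen"]:
    print(word, "->", stem(word))
```

Note that hope, hopes, hoped, and hoping all come out as hop: the stem is not the lemma, but the related forms still match each other, which is all a stemmer needs.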

A much more complex approach is to use a statistical or machine-learning model, which should be able to generalize from the examples it trained on to new ones that are similar. Such models can find and exploit interesting patterns, but can also give terrible answers when given really unexpected input. For example, Elasticsearch—the search engine that our on-wiki search is built on—endorses a statistical stemmer for Polish that we deployed on Polish-language wikis; it does a good job on the vast majority of Polish words, but sometimes goes a bit off the rails when it gets input consisting of certain numbers or English words—which, of course, occur often in the Polish Wikipedia![9] We’ve since added filters to ignore the most predictable and egregious errors; we now rely on exact matching for those search terms, which we’ll talk about more in a later post on indexes.

As mentioned in footnote 3—you are reading the footnotes, right?—some forms are ambiguous on their own. For example, can can mean “to be able to” or can be a metal cylinder, and does can be a form of do or the plural of doe. Recognizing parts of speech—e.g., nouns, verbs, adjectives, adverbs, etc.—can help in cases like this, since these ambiguous forms differ in their part of speech. However distinguishing others, like putting—which can be a form of put or of putt—would require an incredible level of contextual and real-world information—the kind of thing that is easy for humans and hard for computers: “After putting the ball into the hole, I saw her putting the club into her bag.” This level of natural language processing is often not available, and is generally not worth the massive increase in complexity to bump stemming performance from 99% to 99.9% accuracy.

The stemmer used on English-language wikis comes bundled with Elasticsearch, which provides stemmers and other language processing for almost three dozen languages; third-party plugins are also available for others, like Polish. For languages with a larger search volume and no language analysis plugins from Elasticsearch, we’ve adapted other third-party open-source morphological analysis software for use with Elasticsearch. We’ve adapted software for Slovak and for Bosnian, Croatian, and Serbian; we worked with a developer to build something new for Esperanto; and we were able to apply existing analysis software for Indonesian to Malay.[10] The new stemmers for Bosnian, Croatian, and Serbian; for Slovak; and for Esperanto aren’t perfect, but they do take advantage of the 80/20 nature of stemming, and have improved the quality of search results in those languages.

To be or not to be indexed

Once we’ve got our stemmed forms of the words from our text, we might want to throw some of them away! “Stop words” are words that don’t usually carry a lot of meaning and appear frequently—often function words like prepositions (to, of, at, from) and determiners (a, an, the) or common, generic words (like make, do, be)—and so can be ignored and not put in the index. Back in the very early days of full-text search, there was also a matter of saving space in the search index, because stop words tend to occur frequently; nowadays that’s usually not much of a concern.

The problem is that sometimes a useful phrase can be made up entirely of stop words. Choosing the exact list used for any application is more of an art than a science, but be, not, or, and to are all common enough stop words, which would make the famous quote from Shakespeare, “To be or not to be”, unfindable!

A still from the 1942 film To Be or Not to Be, which would be very hard to find on Wikimedia Commons if we ignored all stop words.
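The stop-word problem is easy to reproduce. The sketch below uses a small stop list assembled from the examples mentioned in this section; the list and the function name are assumptions for illustration, not the list any particular search engine ships with.

```python
# Stop words taken from the examples in this section.
STOP_WORDS = {"a", "an", "the", "to", "of", "at", "from",
              "make", "do", "be", "not", "or"}

def index_terms(text):
    """Return the terms that would survive stop-word removal."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]

print(index_terms("The president of France"))
print(index_terms("To be or not to be"))
```

The first phrase keeps its two meaningful terms, while the second is reduced to an empty list, so there is nothing left in the index to match against.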

One solution is not to have any stop words and use frequency analysis to discount words that would otherwise likely qualify as stop words. For example, lexicalism occurs in only five articles on English Wikipedia, while of occurs in 5.4 million. It’s easy to guess which word is more relevant to our query.
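One common form of this kind of frequency analysis is inverse document frequency (IDF), which the video mentioned under further reading covers as part of TF/IDF. The sketch below plugs in the two document counts quoted above; the total of 5.7 million articles is an assumed round figure for illustration and does not come from this post, and the plain logarithmic formula is only one of several variants in use.

```python
import math

TOTAL_ARTICLES = 5_700_000  # assumed round figure, not from the post

def idf(doc_frequency, total=TOTAL_ARTICLES):
    """Inverse document frequency: rare terms score high, common ones near zero."""
    return math.log(total / doc_frequency)

for term, df in [("lexicalism", 5), ("of", 5_400_000)]:
    print(f"{term}: {idf(df):.2f}")
```

Under these assumptions lexicalism gets a weight of roughly 14 while of gets roughly 0.05, so of contributes almost nothing to the ranking without having to be removed from the index.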

Another approach is to have multiple indexes—some with stop words and some without—and merge their results. We’ll talk more about that in a later post.

A rose by any other name

Some search engines support a thesaurus, which allows you to effectively insert synonyms or other related terms into a document, even though they aren’t actually there. For example, while there is a distinction between an attorney and a lawyer, many people outside the legal profession use them interchangeably. A thesaurus could add attorney wherever lawyer occurs, and vice versa.

Elasticsearch supports a thesaurus, but we don’t currently use one on-wiki. The search system we had before Elasticsearch had exactly one entry in its English thesaurus (film ↔ movie), which makes a lot of sense for English Wikipedia!
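Here is a minimal sketch of what thesaurus expansion at indexing time might look like, using the two synonym pairs mentioned in this section. The data structure and function names are hypothetical; this is not the Elasticsearch synonym mechanism or the older on-wiki implementation.

```python
# Each group lists terms that should match one another.
SYNONYM_GROUPS = [{"lawyer", "attorney"}, {"film", "movie"}]

def build_thesaurus(groups):
    """Map every term to the other members of its synonym group."""
    thesaurus = {}
    for group in groups:
        for term in group:
            thesaurus[term] = group - {term}
    return thesaurus

def expand(tokens, thesaurus):
    """Insert synonyms next to each token, as if they were in the text."""
    expanded = []
    for token in tokens:
        expanded.append(token)
        expanded.extend(sorted(thesaurus.get(token, ())))
    return expanded

thesaurus = build_thesaurus(SYNONYM_GROUPS)
print(expand("my lawyer liked the film".split(), thesaurus))
```

Because the synonyms are stored alongside the original terms, a query for attorney or movie would find this document even though neither word appears in it.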

Building and maintaining a thesaurus that’s any bigger than that has traditionally required careful planning and knowledge of the search domain—which for Wikipedia is pretty much everything. However, some researchers have gotten good results from automatically generated thesauri that use co-occurrence of terms in documents to find related terms; these thesauri are often less about specific synonyms and more about expanding a search with useful related terms.
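To give a feel for the co-occurrence idea, the toy sketch below counts how often pairs of terms appear in the same document and treats frequently co-occurring terms as related. The documents, the threshold, and the raw-count scoring are all assumptions made for illustration; published approaches use much larger corpora and more careful statistics.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count, for each pair of terms, how many documents contain both."""
    pair_counts = Counter()
    for text in docs:
        terms = sorted(set(text.lower().split()))
        pair_counts.update(combinations(terms, 2))
    return pair_counts

def related(term, pair_counts, min_count=2):
    """Terms that share at least min_count documents with the given term."""
    hits = Counter()
    for (a, b), n in pair_counts.items():
        if n >= min_count and term in (a, b):
            hits[b if a == term else a] = n
    return [t for t, _ in hits.most_common()]

# Toy documents, invented for this example.
docs = [
    "film director wins award",
    "movie director wins prize",
    "film director interview",
    "garden soil compost",
]
counts = cooccurrence(docs)
print(related("director", counts))
```

In this tiny corpus director ends up related to film and wins, which are useful expansion terms rather than synonyms, in line with the observation above.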

We don’t currently have any concrete plans to build or enable thesauri for on-wiki search, but it is on our long-term radar.

Further reading / Homework

Amir Aharoni, Language Strategist at the WMF, did a nice presentation titled “The English Language as a Privilege” that touches on a lot of the ways that computers are better at dealing with English, whether because of particular features of English, as we discussed above, or because of the fact that many computer systems were originally developed by English speakers.

If you can’t wait for next time, I put together a poorly edited and mediocrely presented video in January of 2018, available on Commons, that covers the Bare-Bones Basics of Full-Text Search. It starts with no prerequisites, and covers tokenization and stemming, inverted indexes, basic boolean and proximity retrieval operations, TF/IDF and the vector space model of similarity, field-level indexing, using multiple indexes, and then touches on some of the elements of scoring.

Up next

In my next blog post, we’ll look at inverted indexes and maybe touch on queries and retrieval operations.

Trey Jones, Senior Software Engineer, Search Platform
Wikimedia Foundation



1. Or, one could say, “the glamour of grammar”. Glamour has an older and now less common meaning: a magical spell. We know that glamour came to English through Scots, and one possible previous step in its etymology is that it branched off an earlier English meaning of grammar, which is given in Wiktionary as “any sort of scholarship, especially occult learning”. By the way, the letter r changing to an l isn’t at all out of the question; it’s a kind of sound change called dissimilation and it often happens to r’s and l’s. It explains the pronunciation/spelling split in English colonel, and the difference between the related words pilgrim and peregrine (both kinds of travelers or wanderers).

2. Clitics are interesting, but complicated, so their more complex explanation is relegated to a footnote, alas. Some linguists argue differently, but English possessive -’s can be seen as a clitic. Phonologically—that is, sound-wise—it is not a separate word. Like past tense -ed or plural -s when spoken it is clearly part of the word it attaches to. But unlike -ed which must attach to a verb, or plural -s which must attach to a noun, possessive -’s comes at the end of a noun phrase, and so can end up attached to almost any part of speech.

Consider hats that belong to many different people: Robin’s hat (-’s attaches to a proper noun), the child’s hat (attached to a common noun), the boy whose hair is green’s hat (adjective); the girl who walks quickly’s hat (adverb), the person I’m thinking of’s hat (preposition), the person Fred said liked him’s hat (pronoun). Any word that can be placed at the end of a noun phrase can have -’s attached to it. Examples with determiners (like the) and conjunctions (like and) are only hard to come by because noun phrases don’t typically end with those parts of speech. But if we invoke the use–mention distinction and use quoted speech, we can find an example where -’s attaches to the, and create a train wreck of apostrophes in the process: the linguist who said “my favorite word is ‘the’ ” ’s hat.

3. Even with generally accurate lemmas, there are some problem cases. Some words are ambiguous, but only in certain forms. A good example is can, which can be a defective verb meaning “to be able”, a noun referring to a metal cylinder, or a related verb meaning to put things in cans. The forms canning and cans are clearly the metal cylinder kind, but their lemma, can, is ambiguous. Similarly, the verb do and the noun doe (a deer, a female deer) are not easily confused as lemmas, but the form does is, at least in writing. Similarly, putting could be a form of either put or putt. Have I mentioned that English spelling is atrocious?

4. I like to say that there aren’t really any spelling rules in English, only moderately useful suggestions. There are a few famous examples of the… uh… let’s say “idiosyncrasies”… of English spelling that always bear repeating: ghoti—with the gh in laugh, the o in women, and the ti in nation—sounds like “fish”. Mapping ghoughpteighbteau to “potato” is left as an exercise for the reader. Non-native speakers of English who want to torture themselves should try to read a brutal poem by Gerard Nolst Trenité about English spelling, appropriately titled “The Chaos”. Even native speakers of English may find it challenging. It’s available on Wikisource. For some insight into how we ended up with this mess, see more about the history of English orthography on English Wikipedia.

5. English be is an incomplete melding of three(!) different verbs, sing is an old strong verb, and children and oxen show the remnants of an older Germanic system of pluralization from Old English. Eventually, all of these irregularities may disappear—as many others already have—through a process known as leveling.

6. Those English dictionaries include some interesting mappings, too. For example, the stem for french is france, for german it’s germany, for italian it’s italy, for holland it’s dutch, and for filipino it’s philippines. This works out well for certain locutions in English, making “president of france” and “french president” essentially the same (see more on the treatment of of in the section on stop words). The fact that most pairs map the adjective to the country while holland and dutch go the other way doesn’t matter much because the result is only used internally by the search engine.

7. Affixes that give a variant of the same word (hope, hopes, hoped, hoping) are called inflectional affixes, while affixes that give a related but distinct word (like hope → hopeful, quick → quickly, word → wordy, or work → worker) are called derivational affixes. Stemmers generally try to unify words that are inflections of the root form. Whether or not derived forms should be included depends on the goals of whoever is creating the stemmer. It’s like deciding what forms to list in a dictionary. Should related forms—especially those whose meaning is predictable—all be listed under one headword (stem/lemma/root), or should they get their own entries? In English, words definitely belongs under word and hoping belongs under hope, but do wordy and hopefully get their own entries or not? It’s a judgement call.

8. A really common source of mistakes for rule-based stemmers comes from words that look like they have affixes on them, but don’t. For example, surnames like Belcher, Childers, or Johns can have their final -er, -ers, or -s stripped off. More common names, like Jones and Edwards, may be known exceptions and thus left untouched. Ironically, the name Stemmer is likely to be incorrectly but understandably stemmed to stem.
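To make that failure mode concrete, here is a minimal sketch of a naive suffix-stripping stemmer with an exception list. The suffix rules, the exception list, and the function name are all hypothetical and chosen only for illustration; this is not the code of any stemmer discussed here.

```python
# Hypothetical, deliberately naive suffix rules, checked longest-first.
SUFFIXES = ("ers", "er", "s")

# Hypothetical exception list: common names that should be left untouched.
EXCEPTIONS = {"jones", "edwards"}


def naive_stem(word: str) -> str:
    """Lowercase a word and strip one suffix, unless it is a known exception."""
    lowered = word.lower()
    if lowered in EXCEPTIONS:
        return lowered
    for suffix in SUFFIXES:
        # Only strip if at least three letters would remain.
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            stem = lowered[: -len(suffix)]
            # For -er/-ers, undo consonant doubling ("stemm" becomes "stem").
            if suffix != "s" and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return lowered


for name in ("Belcher", "Childers", "Johns", "Jones", "Stemmer"):
    print(name, "->", naive_stem(name))
```

The rules have no way of knowing that Belcher is a surname rather than "one who belches", which is exactly why exception lists get added after the fact.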

9. Some of the examples I found when reviewing the Polish stemmer are listed on MediaWiki. The worst is the stem ć, which is the infinitive verb ending in Polish. Lots of completely unrelated words—like Adrien, Button, Coins, Drag, Espinho, Frau, Girl, Hague, Issue, Judas, Keiki, Laws, Mammal, Nuxalk, Oloś, Pearl, Qaab, Rogue, Shingle, Trask, Uniw, Value, Wheel, XXIII, Yzaga, and Zizinho, just to name a few—get reduced to the ć ending by some overzealous bit of statistical machinery inside the stemmer. Fortunately the words that it makes mistakes on are fairly rare, and we’ve added filters to improve the results.

10. We have one stemmer for Bosnian, Croatian, and Serbian, and one for Indonesian and Malay. The languages in these two groups are closely related and have the same basic morphology even where they differ in vocabulary, spelling, pronunciation, or writing system. As an analogy, US and UK English differ in the spelling of color/colour or realize/realise, and in the vocabulary for truck/lorry, but the rules for inflecting those words are the same as general English words—colors/colours, realizing/realising, trucks/lorries—so the same stemmer can be used without knowing which variety we are dealing with. The situation is similar—but with potentially much more distinctiveness between varieties—for Bosnian, Croatian, and Serbian and for Indonesian and Malay.

Production Excellence: September 2018

18:10, Wednesday, 28 2018 November UTC

How’d we do in our striving for operational excellence last month? Read on to find out!

Month in numbers

  • 1 documented incident since August 9. [1]
  • 113 Wikimedia-prod-error tasks closed since August 9. [2]
  • 99 Wikimedia-prod-error tasks created since August 9. [3]

Current problems


  • [MediaWiki-Logging] Exception from Special:Log (public GET). – T201411
  • [Graph] Warning "data error" from ApiGraph in gzdecode. – T184128
  • [RemexHtml] Exception "backtrack_limit exhausted" from search index jobs. – T201184


  • [MediaWiki-Redirects] Exception from NS_MEDIA redirect (public GET). – T203942

This is an oldie: (Well..., it's an oldie where I come from... 🎸)

  • [FlaggedRevs] Exception from Special:ProblemChanges (since 2011). – T176232


  • An Exception (or fatal error) causes user actions to be aborted. For example, a page would display "Exception: Unable to render page", instead of the article content.
  • A Warning (or non-fatal error) can produce page views that are technically unaware of a problem, but may show corrupt or incomplete information. For example, an article would display the word "null" instead of the actual content. Or, a user may be told "You have null new messages."

The combined volume of infrequent non-fatal errors is high. This limits our ability to automatically detect whether a deployment caused problems. The “public GET” risks in particular can (and have) caused alerts to fire that notify Operations of wikis potentially being down. Such exceptions must not be publicly exposed.

With that behind us... Let’s celebrate this month’s highlights!

*️⃣ Quiz defect – "0" is not nothing!

Tyler Cipriani (Release Engineering) reported an error in Quiz. Wikiversity uses Quiz for interactive learning. Editors define quizzes in the source text (wikitext). The Quiz program processes this text, creates checkboxes with labels, and sends it to a user. When the sending part failed, "Error: Undefined index" appeared in the logs. @Umherirrender investigated.

A line in the source text can: define a question, or an answer, or nothing at all. The code that creates checkboxes needs to decide between "something" and "nothing". The code utilised the PHP "if" statement for this, which compares a value to True and False. The answers to a quiz can be any text, which means PHP first transforms the text to one of True or False. In doing so, values like "0" became False. This meant the code thought "0" was not an answer. The code responsible for sending checkboxes did not have this problem. When the code tried to access the checkbox to send, it did not exist. Hence, "Error: Undefined index".

Umherirrender fixed the problem by using a strict comparison. A strict comparison doesn't transform a value first; it only compares.
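The difference can be sketched in a few lines. This is a Python emulation, not the Quiz extension's code: `php_truthy` is a hypothetical helper that mimics how PHP converts a string to a boolean (the empty string and "0" both become false), and the two filter functions are made up for illustration.

```python
def php_truthy(value: str) -> bool:
    """Emulate PHP's string-to-boolean conversion: "" and "0" are false."""
    return value not in ("", "0")


def answers_loose(lines):
    """Keep lines the way a bare PHP `if ($line)` would: "0" is dropped."""
    return [line for line in lines if php_truthy(line)]


def answers_strict(lines):
    """Keep every line that is not strictly the empty string."""
    return [line for line in lines if line != ""]


lines = ["1", "0", "", "42"]
print(answers_loose(lines))   # the answer "0" goes missing
print(answers_strict(lines))  # only the empty line is skipped
```

With the loose check, one part of the program believes there are two answers while another part sees three, which is the kind of mismatch that ends in "Undefined index".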


*️⃣ PageTriage enters JobQueue for better performance

Kosta Harlan (from the Audiences department's Growth team) investigated a warning for PageTriage. This extension provides the New Pages Feed tool on the English Wikipedia. Each page in the feed has metadata, usually calculated when an editor creates a page. Sometimes, this is not available. Then, it must be calculated on-demand, when a user triages pages. So far, so good. The information was then saved to the database for re-use by other triagers. This last part caused the serious performance warning: "Unexpected database writes".

Database changes must not happen on page views. The database has many replicas for reading, but only one "master" for all writing. We avoid using the master during page views to make our systems independent. This is a key design principle for MediaWiki performance. [5] It lets a secondary data centre build pages without connecting to the primary (which can be far away).

Kosta addressed the warning by improving the code that saves the calculated information. Instead of saving it immediately, an instruction is now sent via a job queue, after the page view is ready. This job queue then calculates and saves the information to the master database. The master synchronises it to replicas, and then page views can use it.
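The pattern described above might be sketched roughly as follows. This is an in-memory toy, not MediaWiki's JobQueue API: `Store`, `view_page`, `run_jobs`, and `compute_metadata` are hypothetical names, and the "databases" are plain dictionaries.

```python
from collections import deque


class Store:
    """Toy stand-in for one writable master and a read-only replica."""

    def __init__(self):
        self.master = {}
        self.replica = {}

    def replicate(self):
        # The master synchronises its contents to the replica.
        self.replica = dict(self.master)


def compute_metadata(page: str) -> dict:
    # Hypothetical metadata calculation.
    return {"length": len(page)}


def view_page(store: Store, page: str, queue: deque) -> dict:
    """Serve a page view using only the replica; never write to the master."""
    cached = store.replica.get(page)
    if cached is not None:
        return cached
    metadata = compute_metadata(page)  # calculated on demand for this view
    queue.append(page)                 # the save is deferred to a job
    return metadata


def run_jobs(store: Store, queue: deque) -> None:
    """Run outside of page views: calculate, save to master, then replicate."""
    while queue:
        page = queue.popleft()
        store.master[page] = compute_metadata(page)
    store.replicate()
```

A page view only ever reads from the replica and appends to the queue, so a secondary data centre could serve it without reaching the master.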

T199699 / https://gerrit.wikimedia.org/r/455870

*️⃣ Tomorrow, may be sooner than you think

After developers submit code to Gerrit, they eagerly await the result from Jenkins, an automated test runner. It sometimes incorrectly reported a problem with the MergeHistory feature. The code assumed that the tests would finish by "tomorrow".

It might be safe to assume our tests will not take one day to finish. Unfortunately, the programming utility "strtotime" does not interpret "tomorrow" as "this time tomorrow". Instead, it means "the start of tomorrow". In other words, the next strike of midnight! The tests use UTC as the neutral timezone.

Every day in the 15 minutes before 5 PM in San Francisco (which is midnight UTC), code submitted to Code Review could have mysteriously failing tests.
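The gap between the two readings of "tomorrow" is easy to see in a small sketch. This is Python rather than PHP, the function names are made up, and the sample timestamp is an arbitrary assumption picked to fall ten minutes before midnight UTC.

```python
from datetime import datetime, timedelta, timezone


def start_of_tomorrow(now: datetime) -> datetime:
    """The behaviour described for strtotime("tomorrow"): the next midnight."""
    next_day = now.date() + timedelta(days=1)
    return datetime(next_day.year, next_day.month, next_day.day, tzinfo=now.tzinfo)


def this_time_tomorrow(now: datetime) -> datetime:
    """What the test code assumed it was getting: exactly 24 hours from now."""
    return now + timedelta(days=1)


# Arbitrary example moment: 23:50 UTC.
now = datetime(2018, 8, 20, 23, 50, tzinfo=timezone.utc)
print(start_of_tomorrow(now) - now)   # 0:10:00
print(this_time_tomorrow(now) - now)  # 1 day, 0:00:00
```

A test started at that moment has ten minutes, not a day, before "tomorrow" arrives.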

– Continue at https://gerrit.wikimedia.org/r/452873

*️⃣ Continuous Whac-A-Mole

In August, developers started to notice rare and mysterious failures from Jenkins. No obvious cause or solution was known at that time.

Later that month, Dan Duvall (Release Engineering team) started exploring ways to run our tests faster. Before, we had many small virtual servers, where each server runs only one test at a time. The idea: Have a smaller group of much larger virtual servers where each server could run many tests at the same time. We hope that during busier times this will better share the resources between tests. And, during less busy times, allow a single test to use more resources.

As implementation of this idea began, the mysterious test failures became commonplace. "No space left on device" was a common error. The test servers' hard disks were full. This was surprising. The new (larger) servers seemed to have enough space to accommodate the number of tests they ran at the same time. Together with Antoine Musso and Tyler Cipriani, Dan identified and resolved two problems:

  1. Some automated tests did not clean up after themselves.
  2. The test-templates were stored on the "root disk" (the hard drive for the operating system), instead of the hard drive with space reserved for tests. This root disk is quite small, and is the same size on small servers and large servers.

T202160 / T202457

🎉 Thanks!

Thank you to everyone who has helped report, investigate, or resolve production errors this past month. Including:

Dan Duvall
Gilles Dubuc
Daniel Kinzler
Greg Grossmeier
Gergő Tisza (Tgr)
Sam Reed (Reedy)
Giuseppe Lavagetto
Brad Jorsch (Anomie)
Tim Starling (tstarling)
Kosta Harlan (kostajh)
Jaime Crespo (jcrespo)
Antoine Musso (hashar)
Roan Kattouw (Catrope)
Adam WMDE (Addshore)
Stephane Bisson (SBisson)
Niklas Laxström (Nikerabbit)
Thiemo Kreuz (thiemowmde)
Subramanya Sastry (ssastry)
This, that and the other (TTO)
Manuel Aróstegui (Marostegui)
Bartosz Dziewoński (matmarex)
James D. Forrester (Jdforrester-WMF)


Until next time,

– Timo Tijhof

Further reading:


[1] Incidents. – https://wikitech.wikimedia.org/wiki/Special:AllPages?from=Incident+documentation%2F20180809&to=Incident+documentation%2F20180922&namespace=0
[2] Tasks closed. – https://phabricator.wikimedia.org/maniphest/query/wOuWkMNsZheu/#R
[3] Tasks opened. – https://phabricator.wikimedia.org/maniphest/query/6HpdI76rfuDg/#R
[4] Quiz on Wikiversity. – https://en.wikiversity.org/wiki/How_things_work_college_course/Conceptual_physics_wikiquizzes/Velocity_and_acceleration
[5] Operate multiple datacenters. – https://www.mediawiki.org/wiki/Requests_for_comment/Master-slave_datacenter_strategy_for_MediaWiki

Commons app Project Grant proposal

17:34, Wednesday, 28 2018 November UTC

Hi folks,

Hope you’ve all been well! 🙂 We (the Commons app team) are applying for a Project Grant to fund the development of v3.0 of the Commons Android app. At the moment, we’re approaching completion of our 2nd Individual Engagement Grant, having implemented several major new features, e.g. a revamped map of “nearby places that need photos” with direct uploads and Wikidata integration, user talk notifications, browsing of other Commons pictures with focus on featured images, and 2FA logins. We currently have 4000+ active installs, and 15,000+ distinct images uploaded via our app have been used in Wikimedia articles. In the last 6 months alone, 21,241 files were uploaded via our app, and only 1738 (8.2%) of those files required deletion. We are also proud to report that we have a vibrant, diverse community of volunteers on our GitHub repository, and that we have increased our global user coverage since our first grant.

It has been a rocky road this year, however. One of the major issues we faced was that a large portion of our codebase is based on sparsely-documented legacy code from the very first incarnation of the app 5 years ago (a long time in the Android development world), leading to unpredictable behavior and bugs. We eventually found ourselves in a position where new features built on top of legacy code were causing other features to not work correctly, and even fixes to those problems sometimes had side effects that caused other problems. (My sincerest apologies to users for the inconvenience this caused them!)

In view of that, our Project Grant proposal focuses on these areas:

  • Increasing app stability and code quality: We plan to overhaul our legacy backend to adhere to modern best practices, reduce complexity and dependencies in our codebase, and introduce test-driven development for the first time.
  • Targeted acquisition of photos for places that need them: The “Nearby places that need photos” feature has come a long way, but there is still plenty of room for improvement. We plan to introduce new quality-of-life features (e.g. by implementing filters and bookmarks) and fix a few outstanding bugs to make it more user-friendly and convenient to use. We will also complete the final link in the chain of collecting photos for Wikipedia articles that lack them by prompting users to add their recently-uploaded photo to the relevant Wikipedia article.
  • Increasing user acquisition in the Global South: We plan to implement a “limited connectivity” mode, allow pausing and resuming of uploads, and put more time and effort into outreach and socializing the app, especially to underrepresented communities.
  • We also wish to continue to assist the Commons community to reduce vandalism and improve usability of images uploaded. This will be done by implementing selfie detection, and a “to-do” system that reminds users if an image lacks a description/categories.

Your feedback is important to us! Please do take a look at our proposal, and feel free to let us know what you think on the Discussion page, and/or endorse the proposal if you see fit. If you would like to be part of the project, new volunteers and additions to our diverse team are always welcome – please visit our GitHub repository and say “Hi”. 🙂

We want to thank everyone who has cheered us on and supported us throughout the years. As a community-maintained app, we wouldn’t be here without you.

At Wiki Education, we believe education is a fundamental right, that knowledge should be freely available for all, and that diverse voices must be involved in the creation and curation of that knowledge. These values are at the basis of our strategy and guide all the work that we do. If they are values that also resonate with you, consider supporting Wiki Education this year on Giving Tuesday. Here’s how we’re making progress towards these ideals.

Education is a fundamental right

We have a track record for systematically improving one of the top resources for public knowledge. Students in our Wikipedia Student Program have added more than 47 million words to Wikipedia since 2010. That’s 68,000 Wikipedia articles created or expanded and a significant percentage of new and improved Wikipedia content. Each year we are able to support thousands more instructors and students in this mission to improve Wikipedia’s information. Your donation will help us expand our impact even further and bring more information to learners around the world.

Knowledge must be freely available for the benefit of all

Students, researchers, and professionals are uniquely positioned to add content to Wikipedia because they have access to institutional sources often restricted behind paywalls. For students in our Wikipedia Student Program, summarizing this paywalled information in relevant Wikipedia articles inspires increased motivation and a sense of digital citizenship. For scholars who take our professional development courses, adding this content affords an opportunity for public scholarship and the dissemination of the latest research in one’s field. Your donation will support greater equality of access to information.

Diverse participation and representation are essential

It matters who writes history. Currently, 80-90% of volunteers who curate information on Wikipedia identify as men. And much of the information on Wikipedia about women and marginalized groups is underdeveloped, or simply doesn’t exist at all. Through our programs, we are not only attempting to close these gaps in content, but also to engage more diverse voices in the production of knowledge on the world’s favorite online reference source. 60% of student editors in our program identify as women, for instance. And our professional development courses equip academics, researchers, and professionals with tools to impact public knowledge on a large scale in specialized areas that are underdeveloped on Wikipedia. Your donation will help us bring more diversity to and create more equitable opportunities for engagement on Wikipedia.

You can donate online at wikiedu.org/donate. If you do, we encourage you to let your friends on social media know by posting “Join me in supporting @WikiEducation on #GivingTuesday. https://wikiedu.org/donate”

Wiki Education is a 501(c)(3) nonprofit organization that publishes its financials online to demonstrate its commitment to strong fiscal practices and transparency. Please make your gift today.

Bring in 'da noise, bring in defunct. It's a zombie party!

22:06, Tuesday, 27 2018 November UTC

Halloween is a full two weeks behind us here in the United States, but it's still on my mind. It happens to be my favorite holiday, and I receive it both gleefully and somberly.

Some of the more obvious and delightful ways I appreciate Halloween include: busting out my giant spider to hang in the front yard; getting messy with gory and gaudy decorations; scaring neighborhood children; stuffing candy in my face. What's not to like about all that, really?

But there are more deeply felt reasons to appreciate Halloween, reasons that aren't often fully internalized or even discussed. Rooted in its pagan Celtic traditions and echoed by similar traditions worldwide, like Día de los Muertos of Mexico and Obon of Japan, Halloween asks us, for a night, to put away our timidness about living and dying. It asks us to turn toward the growing darkness of winter, turn toward the ones we've lost, turn toward the decay of our own bodies, and honor these very real experiences as equal partners to the light, birth, and growth embodied by our everyday expectations. More precisely it asks us to turn toward these often difficult aspects of life not with hesitation or fear but with strength, jubilation, a sense of humor. It is this brave posture of Halloween's traditions that I appreciate so very much.

So Halloween is over and I'm looking back. What does that have to do with anything here at WMF and in Phabricator no less? Well, I want to take you into another dark and ominous cauldron of our experience that most would rather just forget about.

I want to show you some Continuous Integration build metrics for the month of October!

Will we see darkness? Oh yes. Will we see decay? Surely. Was that an awkward transition to the real subject of this post? Yep! Sorry, but I just had to have a thematic introduction, and brace yourself with a sigh because the theme will continue.


You see this past October, Release Engineering battled a HORDE OF ZOMBIE CONTAINERS! And we'll be seeing in our metrics proof that this horde was, for longer than anyone wishes zombies to ever hang around, chowing down on the brains of our CI.

Before I get to the zombies, let's look briefly at a big picture view of last month's build durations... Let's also get just a bit more serious.

Daily 75th, 95th, and 98th percentiles for successful build durations – October 2018

What are we looking at? We're looking at statistics for build durations. The above chart plots the daily 75th, 95th, and 98th percentiles of successful build durations during the month of October as well as the number of job configuration changes made within the same range of time.

These data points were chosen for a few reasons.

First, percentiles are used over daily means to better represent what the vast majority of users experience when they're waiting on CI[1]. It excludes outliers, build durations that occur only about 2 percent of the time, not because they're unimportant to us, but because setting them aside temporarily allows us to find patterns of most common use and issues that might otherwise be obfuscated by the extra noise of extraordinarily long builds.

Next, three percentiles were chosen so that we might look for patterns among both faster builds and the longer running ones. Practically this means we can measure the effects of our changes on the chosen percentiles independently, and if we make changes to improve the build durations of jobs that typically perform closer to one percentile, we can measure the effect discretely while also making sure performance at other percentiles has not regressed.

Finally, job configuration changes are plotted alongside daily duration percentiles to help find indications of whether our changes to integration/config during October had an impact on overall build performance. Of course, measuring the exact impact of these changes is quite a bit more difficult and requires the build data used to populate this chart to be classified and analyzed much further—as we'll see later—but having the extra information there is an important first step.
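For anyone who wants to try this on their own build data, here's a rough sketch of computing daily duration percentiles. To be clear about the assumptions: it uses the simple nearest-rank method, the function names and the sample data are made up, and it is not the spreadsheet calculation behind the chart.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def daily_percentiles(builds, ps=(75, 95, 98)):
    """Group (day, duration_minutes) pairs by day and compute each percentile."""
    by_day = defaultdict(list)
    for day, minutes in builds:
        by_day[day].append(minutes)
    return {
        day: {p: percentile(durations, p) for p in ps}
        for day, durations in sorted(by_day.items())
    }


# Made-up sample: 99 ordinary builds plus one extraordinarily long one.
sample = [("2018-10-01", m) for m in range(1, 100)] + [("2018-10-01", 600)]
print(daily_percentiles(sample))
print(sum(m for _, m in sample) / len(sample))  # the mean, for comparison
```

The single 600-minute outlier drags the mean up noticeably while leaving the 75th, 95th, and 98th percentiles where they were, which is the reason for preferring percentiles here.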

So what can we see in this chart? Well, let's start with that very conspicuous dip smack dab in the middle.

Daily 75th, 95th, and 98th percentiles for successful build durations – dip around 10/14

And for background, another short thematic interlude:

Back in June, @thcipriani of Release Engineering was waiting on a particularly long build to complete—it was a "dark and stormy night" or something, *sighs and rolls eyes*—and during his investigation on the labs instance that was running the build, he noticed a curious thing: There was a Docker container just chugging away running a build that had started more than 6 hours prior, a build that had been thought to be canceled and reaped by Jenkins, a build that should have been long dead but was sitting there very much undead and seemingly loving its long and private binge before the terminal specter of a meat-space man had so rudely interrupted.

"It's a zombie container," @thcipriani (probably) muttered as he felt his way backward on outstretched fingertips (ctrl-ccccc), logged out, and filed task T198517 to which @hashar soon replied and offered a rational but disturbing explanation.

I'm not going to explain the why in its entirety but you can read more about it in the comments of an associated task, T176747, and the links posted therein. I will, however, briefly explain what I mean by "zombie container."

A zombie container for the sake of this post is not strictly a zombie process in the POSIX sense, but means that a build's main process is still running, even after Jenkins has told it to stop. It is both taking up some amount of valuable host resources (CPU, memory, or disk space), and is invisible to anyone looking only at the monitoring interfaces of Gerrit, Zuul, or Jenkins.

We didn't see much evidence of these zombie containers having enough impact on the overall system to demand dropping other priorities—and to be perfectly honest, I half assumed that Tyler's account had simply been due to madness after ingesting a bad batch of homebrew honey mead—but the data shows that they continued to lurk and that they may have even proliferated under the generally increasing load on CI. By early October, these zombie containers were wreaking absolute havoc—compounded by the way our CI system deals with chains of dependent builds and superseding patchsets—and it was clear that hunting them down should be a priority.

Task T198517 was claimed and conquered, and to the dismay of zombie containers across CI:

Two integration/config patches were deployed to fix the issue. The first refactored all Docker based jobs to invoke docker run via a common builder. The second adds to the common docker-run builder the --init option which ensures a PID 1 within the container that will properly reap child processes and forward signals, and --label options which tag the running containers with the job name and build number; it also implements an additional safety measure, a docker-reap-containers post-build script that kills any running containers that could be errantly running at the end of the build (using the added labels to filter for only the build's containers).
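To give a feel for what those options look like together, here's a sketch that only builds the command lines as lists, without running anything. The `--init` and `--label` options of `docker run`, the `--quiet` and `--filter label=...` options of `docker ps`, and `docker kill` are real Docker CLI features; the label keys (`jenkins.job`, `jenkins.build`), the function names, and the image name are assumptions for illustration and not what the actual integration/config patches use.

```python
def docker_run_command(image, job_name, build_number, args=()):
    """Build a `docker run` invocation with an init process and build labels."""
    return [
        "docker", "run",
        "--init",  # PID 1 that reaps child processes and forwards signals
        "--label", f"jenkins.job={job_name}",        # hypothetical label key
        "--label", f"jenkins.build={build_number}",  # hypothetical label key
        image, *args,
    ]


def list_build_containers_command(job_name, build_number):
    """Build a `docker ps` invocation that lists only this build's containers."""
    return [
        "docker", "ps", "--quiet",
        "--filter", f"label=jenkins.job={job_name}",
        "--filter", f"label=jenkins.build={build_number}",
    ]


def kill_command(container_ids):
    """Build a `docker kill` invocation for any containers still running."""
    return ["docker", "kill", *container_ids]


print(" ".join(docker_run_command("example/ci-image", "example-job", 42)))
print(" ".join(list_build_containers_command("example-job", 42)))
```

A post-build reaper in this style would run the `docker ps` command, and pass whatever IDs come back to `docker kill`, so only containers carrying that build's labels are touched.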

Between the deployed fix and periodically running a manual process to kill off long-running containers that were started prior to the fix being deployed, I think we may be out of the woods for now.

Looking again at that dip in the percentiles chart, a few things are clear.

Daily 75th, 95th, and 98th percentiles for successful build durations – dip around 10/14

First, there's a noticeable drop among all three daily duration percentiles. Second, there also seems to be a decrease in both the variance of each day's percentile average expressed by the plotted error bars—remember that our percentile precision demands we average multiple values for each percentile/day—and the day-to-day differences in plotted percentiles after the dip. And lastly, the dip strongly coincides with the job configuration changes that were made to resolve T198517.


Say what? Oh. Right. I guess we didn't adequately measure exactly how much of an improvement in duration there was pre-and-post T198517 and whether or not there was unnoticed/unanticipated regression. Let's pause on that celebration and look a little deeper.

So how does one get a bigger picture of overall CI build durations before and after a change? Or of categories within any real and highly heterogeneous performance data for that matter? I did not have a good answer to this question, so I went searching and I found a lovely blog post on analyzing DNS performance across various geo-distributed servers[2]. It's a great read really, and talks about a specific statistical tool that seemed like it might be useful in our case: The logarithmic percentile histogram.

"I like the way you talk..." Yes, it's a fancy name, but it's pretty simple when broken down... backwards, because, well, English.

A histogram shows the distribution of one quantitative variable in a dataset, in our case build duration, across various 'buckets'. A percentile histogram buckets values for the variable of the histogram by its percentiles, and a logarithmic percentile histogram plots the distribution of values across percentile buckets on a logarithmic scale.

I think it's a bit easier to show than to describe, so here's our plot of build duration percentiles before and after T198517 was resolved, represented as a histogram on a logarithmic scale.

High-to-low percentiles before and after the zombie container issue was resolved

First, note that while we ranked build durations low to high in our other chart, this one presents a high-to-low ranking, meaning that longer durations (slower builds) are ranked within lower percentiles and shorter durations (faster builds) are ranked in higher percentiles. This better fits the logarithmic scale, and more importantly it brings the lowest percentiles (the slowest durations) into focus, letting us see where the biggest gains were made by resolving the zombie container issue.

Also valuable about this representation is the fact that it shows all percentiles, not just the three that we saw earlier in the chart of daily calculations, which shows us that gains were made consistently across the board and there are no notable regressions among the percentile ranks where it would matter—there is a small section of the plot that shows percentiles of post-T198517 durations being slightly higher (slower), but this is among some of the percentiles for the very fastest of builds where the absolute values of differences are very small and perhaps not even statistically significant.
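If you'd like to build the same kind of view from raw durations, a high-to-low percentile ranking can be sketched like this. It assumes the nearest-rank method and made-up function names and data; it is not the Google Sheet calculation used for the plot.

```python
import math


def high_to_low_percentile(durations, p):
    """Duration at percentile p when builds are ranked slowest first."""
    ordered = sorted(durations, reverse=True)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def log_percentile_points(durations, ps=(0.2, 1, 2, 10, 25, 50)):
    """Pair each percentile's position on a log10 axis with its duration."""
    return [(math.log10(p), high_to_low_percentile(durations, p)) for p in ps]


# Made-up data: 1000 builds lasting 1 to 1000 seconds.
durations = list(range(1, 1001))
for x, seconds in log_percentile_points(durations):
    print(f"log10(p) = {x:+.2f}  duration = {seconds}")
```

On the logarithmic axis the step from p1 to p10 takes as much room as the step from p10 to p100, which is what stretches out the slowest builds and brings them into focus.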

Looking at the percentage gains annotated parenthetically in the plot, we can see major gains at the 0.2, 1, 2, 10, 25, and 50th percentiles. Here they are as a table.

percentile | duration w/ zombies (minutes) | duration w/o zombies (minutes) | gain from killing zombies
p0.2 | 43.3 | 39.3 | -9.2%
p1 | 34.0 | 26.5 | -22.2%
p2 | 27.7 | 22.2 | -19.7%
p10 | 17.6 | 12.7 | -27.9%
p25 | 11.0 | 7.2 | -34.4%
p50 | 5.3 | 3.4 | -36.9%

So there it is quite plain, a CI world with and without zombie containers, and builds running up to 37% faster without those zombies chomping away at our brains! It's demonstrably a better world without them I'd say, but you be the judge; we all have different tastes. 8D

Now celebrate or don't celebrate accordingly!

Oh and please have at the data[3] yourself if you're interested in it. Better yet, find all the ways I screwed up and let me know! It was all done in a giant Google Sheet—that might crash your browser—because, well, I don't know R! (Side note: someone please teach me how to use R.)


[1] https://www.dynatrace.com/news/blog/why-averages-suck-and-percentiles-are-great/
[2] https://blog.apnic.net/2017/11/24/dns-performance-metrics-logarithmic-percentile-histogram/
[3] https://docs.google.com/spreadsheets/d/1-HLTy8Z4OqatLnufFEszbqkS141MBXJNEPZQScDD1hQ/edit#gid=1462593305


Thanks to @thcipriani and @greg for their review of this post!

"DOCKER ZOMBIE" is a derivative of https://linux.pictures/projects/dark-docker-picture-in-playing-cards-style and shared under the same idgaf license as original https://linux.pictures/about. It was inspired by but not expressly derived from a different work by drewdomkus https://flickr.com/photos/drewdomkus/3146756158

UCL Arts and Sciences undergraduates working with Wikimedia UK – image by Carl Gombrich, with permission.

Professor Carl Gombrich, Programme Director for UCL’s new interdisciplinary course, BA Arts and Sciences, approached Wikimedia UK early this year to talk about his interest in using a Wikimedia element in the Approaches to Knowledge module of the degree.

This semester, the course began and 150 students are now working on creating chapters for an Open Educational Resources book which will be constructed by the students on Wikibooks, and then published by UCL Press, the Open Access publishing journal that UCL has recently established.

After initially discussing the use of Wikipedia itself as the basis for the course, it was decided that it would be hard to assess the contributions of a large number of students using Wikipedia. Contributions are more likely to get deleted, and the students would likely be looking at improving only a small number of quite core Wikipedia pages related to epistemology. So it was decided to have them collaboratively create a book on Wikibooks, so that students could still gain an insight into how open source platforms like the Wikimedia projects function.

UCL is interested in what working with Wikimedia projects can teach students in terms of research and academic skills, and the media literacy which comes with a deeper understanding of the guidelines for Wikimedia projects. They also liked the idea of being able to make a textbook and the meta-approach of people creating knowledge about knowledge.

Dr Richard Nevell has been helping as a volunteer, and Wikimedian Katie Chan did a training session for staff on Wikipedia and Wikibooks before the course began. Hannah Evans gave an opening lecture for the course before an initial workshop where students got into teams to decide what subject area they would work on.

The groups could choose from:

  • Knowledge and imperialism
  • Knowledge and truth
  • Knowledge and evidence

The groups will write chapters of 1200 words. These will all go on Wikibooks, and the best ones will be collected into a book which will also be published by UCL Press, the UCL Open Access repository. The project will also tie into a UCL education conference on April 1, 2019, where students will be presenting about the work they are doing.

Wikimedia UK is now working with many different universities across the country, and you can read more about what different courses are doing with Wikimedia projects on our website.

Version 2.9 beta

14:31, Tuesday, 27 2018 November UTC

Version 2.9 of the Commons app is out for beta testing on the Google Play Store! \o/ You can register for beta testing here. If you experience any bugs or crashes, or if you find any aspect of the new features to be unwieldy, please do let us know on GitHub. There are several extensive UI changes in this patch, but with your help, we hope to make the transition as smooth as possible for everyone.

New features

New main screen UI

Our new main screen UI features a more prominent Nearby tab and a floating action button for uploads. It also displays the nearest place that needs pictures (if location permissions and GPS are enabled), and alerts you if you have user talk messages that have not been read.

New upload UI with multiple uploads

You can now do multiple uploads within the app itself, without needing to use the stock gallery workaround! You will need to input a title and description for every image, but can select categories once for all. This design decision was reached because we did not want to risk vandalism where someone uploads their entire camera roll in one shot, but we also wanted to provide some conveniences for genuine multiple uploaders. We may reconsider this in the future depending on feedback.

Oh, and we have a new upload UI! 🙂

“Send Log file” option

We have revamped the “send log file” feature, which was not generating anything useful previously. If you are experiencing a bug, it would help us greatly with solving it if you send us your log file shortly after the bug occurs. To do this, go to Settings > Send Log file, and follow the instructions there. As mentioned in the subtext, please note that this may potentially reveal identifying information about yourself to us, so only do this if you are comfortable with that possibility. Logs are sent to a private mailing list that only developers who have signed an NDA with WMF can access, and we have made every effort to sanitize the logs. However, details such as your location may still be revealed in them.

Major bugfixes

Fixed issues with wrong “image taken” date

There was a bug with v2.8.x releases where the upload date was used in the date template of images, instead of the date the image was taken. This has now been fixed.

Fixed default zoom level in Nearby map

Previously, the default zoom level in the Nearby map was too high, meaning that you would need to zoom in a lot before the map was usable. It should now be fixed to a more reasonable level.


Today, Giving Tuesday, the Wikimedia Foundation begins its annual banner campaign on the English Wikipedia, inviting anyone who values Wikipedia to join us on our journey and support its continued growth and evolution as the world’s free knowledge resource. Banners will appear on the English Wikipedia asking our readers to consider contributing to the site with a donation.

Wikipedia is the only top website that is supported entirely by a nonprofit. Our fundraising appeal is an opportunity for us to remind everyone what makes Wikipedia possible. At its core, Wikipedia runs on generosity: the generous volunteers, readers, editors, and donors who find value in Wikipedia and want to give something back to it.

When you give back to Wikipedia, you’re not only supporting its continued growth and longevity. You’re also supporting the sustaining values and vision behind Wikipedia: free, neutral, and fact-based knowledge for every person, everywhere.

While Wikipedia continues to reliably answer billions of the world’s questions each year, here at the Wikimedia Foundation, we focus on solving the challenges we still face. But nearly eighteen years in, we’re still a long way from achieving our vision—Wikipedia does not yet serve the whole world. We’re not even close. So this year, our goal is to improve Wikipedia’s usability. We want to expand access and participation in emerging economies, and welcome more women and non-binary readers and editors to our ranks. We will advocate for online privacy and freedom of information to protect the policies that allow Wikipedia and the broader free internet ecosystem to thrive. As long as the internet exists, we pledge that Wikipedia will strive to make it a better place.

Here’s some of the work we’ve done in the past year, thanks to our generous donors, and a few things we’re hoping to achieve in those to come.

We are expanding access to Wikipedia’s knowledge.

Jakarta is almost 14,000 kilometers from San Francisco. Bangalore is nearly 8,000 kilometers from the Netherlands. That’s why we opened a new datacenter in Singapore, which has decreased the time required to deliver a page throughout the Asia-Pacific region—in Singapore itself, we’ve seen an improvement of over 40 percent. For those with little to no internet access, we’re working to bridge the gap with an offline medical pilot that will deploy Wi-Fi hotspots (“Internet-in-a-Box”) to healthcare providers in all 36 states in Nigeria. These boxes use Kiwix, a way of serving content and software to individuals and other NGOs to support people with little or no internet access, and have already been deployed and studied in the Dominican Republic, Guatemala, and in Syrian refugee camps.

We are making it easier for people to translate content into their own languages.

Many science fiction stories provide for a ‘universal translator’, often using it as a convenient plot device to quickly allow individuals from two or more species to communicate from their first words. Unfortunately, we here on Earth-prime haven’t yet developed such a thing, and so translation across Wikipedias in hundreds of languages is critically important: it allows multilingual editors to re-use efforts made by other volunteer editors, thereby lowering the cost of spreading knowledge across the world.

This is why we built a content translation tool—it simplifies the translation process, making it easier to bring the sum of all knowledge into your own languages. The tool has been used to create more than 350,000 Wikipedia articles since launching in January 2015, and its success led us to construct a new, improved version, which we beta-released this year.

We support efforts to address Wikipedia’s gender bias.

Yes, Wikipedia has a gender bias. While we strive for Wikipedia’s collection of knowledge to contain the full, rich diversity of all humanity, the same requirements that help ensure that what you read on the encyclopedia is accurate and verifiable also systemically disadvantage underrepresented groups of people, including women. We’re working to change this by adding criteria to make our events more inclusive, issuing grants to volunteers to solicit and support ideas to address these issues (including the first Wikipedian-in-Residence for gender equity), funding hundreds of in-person events that support increased gender diversity, and commissioning a report on the current state of our gender equity efforts. We also vocally support the work of volunteer editors like Jess Wade, a physicist working to increase the number of underrepresented scientists and engineers on Wikipedia.

We are working to develop and understand the pros and cons of AI.

Artificial intelligences (AIs) have grown ever more popular in recent years and have proven to be very useful in a wide variety of contexts. We see them as a potential boost for Wikipedia, taking some of the burden of maintaining and curating the encyclopedia off its core of volunteer editors. However, AIs have also proved that they have the potential to exacerbate existing inequalities and limit the diversity of our participants. That’s why we’ve joined the Partnership on AI, a coalition of partners working together to better understand and draft best practices on the impact of artificial intelligence technologies on people and the planet, and designed a tool to make the work of critiquing our AIs easier, by providing standardized ways to agree or disagree with an AI’s judgement.


Thank you to the millions of donors and editors who made the past year’s accomplishments possible. We are far from finished with our work. There is much left to do, and we’re excited to address these new challenges and more as we enter into 2019. If you care about this work, please support it. Visit donate.wikimedia.org to make a contribution today.

Pats Pena, Director of Payments and Operations, Advancement
Wikimedia Foundation

Our thanks to Brandon Black, Pau Giner, and Aaron Halfaker for contributing content above.

Understanding masculinity from a sociological perspective

19:23, Monday, 26 2018 November UTC

November 25th was the International Day for the Elimination of Violence against Women. Part of moving towards a more equitable and safe future for all humans is to increase awareness of and reduce stigma around the violence faced by women around the world. That includes understanding how violence pervades cultures in subtle, as well as obvious, ways. Movements like Me Too in the United States have brought everyday violence into public conversation this year. Understanding how power structures, violence, and gender relate to each other can effect change.

Just one year ago when news of Harvey Weinstein and the actions of other powerful men across industries broke, Google searches related to toxic and healthy masculinity spiked. So when a sociologist in one of our professional development courses decided to improve the Wikipedia article about masculinity as part of our course just a few months later, the article was receiving around one thousand pageviews a day. Dr. Michael Ramirez took a sociologist’s lens to his article improvements and included more information about how conceptions of masculinity have changed over time and how they differ based on cultural context. Understanding how identity forms, the pressures people feel to conform to certain definitions of identity, and how the forces influencing identity can be toxic are each important for reducing the power of toxic expressions of masculinity. And that allows for more people to live safe, healthy lives without perpetuating or experiencing violence.

“We are ‘experts’ and as such, we should use our expertise for the greater good,” writes American Sociological Association member Dr. Ramirez in a reflection about his course experience. “Sociological perspectives are far too often overlooked in the creation of knowledge and understanding of world issues. We can partly remedy this situation by actively incorporating our knowledge into public venues such as Wikipedia.”

We are currently accepting applicants for an upcoming professional development course beginning in January, which will train scholars of diverse backgrounds to improve Wikipedia’s coverage of the history of women’s voting rights. For more information and to apply, visit: bit.ly/NARAwiki

Header image: File:Herakles Farnese MAN Napoli Inv6001 n01.jpg, Farnese Collection, Marie-Lan Nguyen, CC BY 2.5, via Wikimedia Commons.

QUnit anti-patterns

05:27, Monday, 26 2018 November UTC

Today, I’d like to challenge the assert.ok and assert.not* methods. I believe they’ve become an anti-pattern.


Using assert.ok() indicates one of two problems:

  • The software, or testing strategy, is unreliable. (Unsure what value to expect.)
  • The author is using it as a shortcut for a proper comparison.

The former necessitates improvement to the code being tested. The latter comes with two additional caveats:

  1. Less debug information. (Inaccurate actual/expected diff). Without an expected value provided, one can’t determine what’s wrong with the value.
  2. Masking regressions. Even if the API being tested returns a proper boolean and ok is just a shortcut, the day the API breaks (e.g. returns a number, Promise, or other object) the test will not be able to catch this regression.

Common examples:

// Meh...
assert.ok( result );
assert.ok( obj.fn );

// Better.
assert.equal( typeof obj.fn, 'function' );
assert.strictEqual( result, true );


Using assert.not*() indicates one of three problems:

  • The software is unreliable. (Value is non-deterministic.)
  • The test uses an unreliable environment. (E.g. the input data is dynamic or variable, or there is insufficient isolation or mocking.)
  • The author is using it as a shortcut for a proper comparison.

Common example:

var index = list.indexOf( item );

// Meh...
assert.notEqual( index, -1 );

// Better.
assert.equal( index, 2 );

// Even better?
assert.propEqual( list, [
	// (the expected items, in order)
] );

I’ve yet to see the first use of these assert methods that wouldn’t be improved by writing it a different way. I admit there are limited scenarios where assert.notEqual can’t be avoided in the short-term, for example when the intent is to detect a difference between two unpredictable return values. When calling a method such as Math.random() twice, one could use notEqual to assert the two return values differ. I still have my doubts about the value of such a test, though. It’ll certainly be annoying when it randomly does produce the same value twice and causes a test failure. In the mission of test coverage, my recommendation would be to instead assert that calling the method did not throw an exception, and perhaps assert the type and length of the return value, not its contents.

Also available on Medium.com

Tech News issue #48, 2018 (November 26, 2018)

00:00, Monday, 26 2018 November UTC

weeklyOSM 435

18:16, Sunday, 25 2018 November UTC



SotM Asia 2018 group photo

State of the Map Asia 2018 brought together the mapping enthusiasts from around 12 countries across the globe. It was two days of interesting sessions and workshops. Find more information on the website and event tweets on this hashtag: #SotMAsia18.


  • Dave Swarthout wrote on the tagging mailing list that he just learnt about the power of multipolygon relations. He was applying his new knowledge by mapping bays in Alaska. However, he discovered the disadvantages of mapping very large structures, in his case the Cook Inlet that stretches over 290 km. He asked whether it would be easier to draw a simple shape rather than using a relation with dozens of member ways. However, the following lengthy discussion also brought a simple place node into play and focused on many other problems, mainly rendering. In the end the whole discussion did not result in solving the problem Dave brought up.
  • How to deal with multilingual names is a common question in OSM and there has been a lot of previous discussion about how best to deal with them. Here’s a discussion (de) (automatic translation) about bilingual “city limit” traffic signs in Austria from the Austrian mailing list last week.
  • Simon Poole estimates that half of all addresses in Switzerland have been mapped in OSM already.
  • According to a blog post from Anton Khorev the different floor numbering traditions from country to country make it difficult to understand values in level=. In his comprehensive post, he explains the issue and why it is not as easy to solve as it might appear.
  • The voting for office=diplomatic (formerly Consulate) is underway. It intends to distinguish between diplomatic, consular and other types of government-to-government liaison offices. The proposal was prepared by an expert in this area, an ambassador.
  • Jukka Nikulainen has brought his proposal for mapping tram tracks on highways to voting. Tram tracks are usually mapped on separate OSM ways, even when they are embedded in the highways. For cyclists this presents a danger, so it’s interesting to know where this is the case, so such streets can be avoided for bicycle routing.
  • François Lacombe is asking for comments on his proposal for additional attributes on pipeline and waterway valves.
  • Joseph Eisenberg is seeking advice on how to tag Neighborhood Gateway Signs as they exist in Indonesia and other parts of the globe like Portland and San Diego.


  • Pascal Neis, since October 2018 Professor for Geo-Government at Mainz University of Applied Sciences, made the average number of OSM contributors per country of the twelve-month rolling period available for download as a csv file. According to his tweet, this is the data he is using for osmstats.neis-one.org.
  • The Czech OSM community informs us about the establishment of the organisation OpenStreetMap Česká republika z.s.. The organisation has 14 founding members and will apply to become a local OSMF chapter once fully operational. Tomáš Kašpárek, Marián Kyral and Jakub Těšínský were elected as members of the first board.
  • Jakob Miksch prepared an appealing description of OSM, covering the data model, tags, editing, visualisation and the licence, on his blog. Interesting for beginners!
  • From the 5th to the 9th of November, OSM and GIS training sessions were held for international humanitarian organisations in Mali. They were organised at the OCHA building by the OpenStreetMap Mali community with support from the international volunteers of La Francophonie in Mali.
  • OSM Diaries published a short YouTube video about Gregory Marler’s OSM editing history. In the video he compares editing OSM ten years ago with now.
  • OpenStreetMap Benin – with the support of its International Volunteer of La Francophonie – organised a 3-day workshop on OpenStreetMap digital mapping for young women from various socio-professional backgrounds. The training took place at the Francophone Digital Campus in Cotonou.

OpenStreetMap Foundation

  • Rob Nickerson asked the OSMF representation about the status of the Fee Waiver Program, which was decided in 2014 with target date for implementing it by the end of that same year.
  • Paul Norman officially announced the OSMF Fee Waiver Program on the OSMF mailing list on November 15th. The fee waiver program allows those to join who couldn’t previously due to a lack of reasonably priced money transfer options or for whom the cost is prohibitive compared to incomes in their part of the world.
  • The minutes of the Licence Working Group meeting of November 8 have been published. One of the topics was the Directive on dealing with DMCA copyright complaints.
  • The sponsorship for the server that is used to build JOSM and offers the download ends soon. The German local OSMF chapter FOSSGIS has received a request for funding (de) (automatic translation) for a new one. A new server is needed anyway as the old one has reached its limits.
  • Michael Collinson published the official set of questions to the candidates for the upcoming OSMF board elections. The seven candidates are asked to submit their answers and manifesto via email by 30 November. They will be published together.
  • Christoph Hormann describes his impressions of the OSMF board meeting on 15 November in a blog post that he titled The most surreal and memorable OSMF board meeting yet. He names the person he thinks is in charge of softening the Organised Editing Guidelines and criticises that a policy draft from an anonymous company remains locked away.
  • The Data Working Group of the OSM Foundation published its decision that the Crimean peninsula belongs to Russia – OSM border data-wise. The decision lays down rules for the (non-) use of addr:country=* as well as requirements for changeset comments in the region. This decision replaces the previous rules.


  • SotM Asia happened this weekend! It was attended by around 280 participants from 12 countries around the globe for two days filled with enthusiasm, excitement, interesting sessions, panels, and workshops related to using OpenStreetMap for conservation, mapping, data visualisation and everything maps. The organising team is thankful to one and all for making this event a success! The SotM Asia Twitter account gives you an idea of the presentations and talks held as well as many visual impressions. It is planned to upload the presentations on a dedicated YouTube channel, so stay tuned!
  • FOSDEM 2019 in Brussels, a long-standing free-software conference that attracts 6000+ developers, will take place on the 2nd and 3rd of February 2019. For a few years the conference has included a Geospatial devroom, which is now calling for talk proposals.
  • More than 110 people, invited by Doctors Without Borders, came (fr) (automatic translation) on Friday 16 November to the UPNA to map two crisis protected areas in Caracas and Niger by satellite imagery.

Humanitarian OSM

  • Melanie Eckle has published the minutes of the HOT board meeting of 8th November. It was the first board meeting that was open to the membership.
  • HOT is using open source tools to fight against malaria in Guatemala. In an article HOT explains how they partner with the Clinton Health Access Initiative and the Guatemala Ministry of Health. HOT added more than 1,600 buildings to OpenStreetMap in the area of Escuintla to help the coordination of indoor residual spraying.
  • On 28th November CartONG and the refugee community SINGA Grenoble invite (fr) (automatic translation) you to a special mapathon in La Coop. The mapping is done in pairs so that people with a migrant background or hardly any technical know-how can expand their knowledge together.


  • With the newest update, OpenStreetBrowser supports drawing patterns (e.g. arrow heads) on ways, which is now used on the cycle routes map and the public transport map.
    The cycle routes map gets the directions from the ways’ role (forward/backward), the public transport map from the connection to the previous/next way (as defined in PT Schema v2). To prepare for the Christmas season, there’s a new “Christmas features” category.


  • Gmaps, claiming to be the number one map module for ExpressionEngine, changed its name using the formula Gmaps – Google = Maps. Google’s fee increase and the cuts to its free services caused a shift by the developers towards open source and the support of a couple of map providers, including OSM.


  • Richard Fairhurst wrote that his bike route-planning website cycle.travel now also covers Scandinavia and parts of Eastern Europe.
  • The surveyor app StreetComplete reached 1000 stars on GitHub.


  • The GIScience HD GitHub repository already contains over 50 open source repositories and it’s still growing. Most are somehow related to OSM, from routing, processing, managing, analysing to visualising, etc. Who wants to participate? https://github.com/GIScience
  • If you’ve always wanted the upload of edits to be faster, now’s your chance to help. The upload of changes has been completely reimplemented and testers are wanted (de) (automatic translation) to test the code with iD, Potlatch or JOSM. mmd also published instructions on how to help with testing in English, so not speaking German is no longer an excuse 😉


  • Martijn van Exel upgraded MapRoulette to version 3.1.1 on maproulette.org. In his user blog Martijn explains the new features, which include the availability of Mapillary images on MapRoulette, rebuilding tasks and a public Leaderboard for each challenge.
  • Wambacher has updated his comprehensive listing of OSM related software. Most recent updates include Naviki Android 3.1810.1, OpenStreetCam Android, Traccar Server 4.2 and Vespucci 11.2.0.
  • Potlatch 2 is still alive and actively maintained. Richard Fairhurst announced two small updates.

Did you know …

  • … the service Shareloc that allows you to create and share OpenLayers maps?
  • … the German city Schnöggersburg? The city, which is used as a training ground for armed forces, was featured by a news magazine (de) (automatic translation) in 2012. A mapper “rediscovered” the city which sparked a new discussion (de) (automatic translation) in the German forum about tagging and mapping of military installations.

Other “geo” things

  • Doug Rinckes launched a Plus code grid service on grid.plus.codes. The Plus code system, or the Open Location Code as it is officially called, was originally developed by Google. The system is seen as controversial in OSM and has caused heated discussions on GitHub, the mailing list and other places.
  • The New York Times published an article titled A Map of Every Building in America that provides different perspectives on the development of settlements in the USA and the arrangement of buildings in that context. The article discusses maps at different scales, from a nationwide figure-ground diagram to a map of the squiggly suburbs in Mesa, Arizona. A behind-the-scenes article in the New York Times covers the story of how the maps were drawn using the building footprints that Microsoft released this year (and are available for use in OSM as well).
  • The biking website road.cc has tested the bar-mounted GPS Hammerhead Karoo. In the article the author, Dave Atkinson, explains why the device is the best available GPS for its purpose; nevertheless, he thinks there is still room for improvement.

Upcoming Events

Where | What | When | Country
Melbourne | FOSS4G SotM Oceania 2018 | 2018-11-20 to 2018-11-23 | Australia
Lübeck | Lübecker Mappertreffen | 2018-11-22 | Germany
Alajuela | ES:State of the Map Costa Rica | 2018-11-23 to 2018-11-25 | Costa Rica
Manila | 【MapaTime!】 | 2018-11-24 | Philippines
Dublin | Monthly Mapping Party | 2018-11-24 | Ireland
Ivrea | Incontro mensile | 2018-11-24 | Italy
Graz | Stammtisch Graz | 2018-11-26 | Austria
Bremen | Bremer Mappertreffen | 2018-11-26 | Germany
Arlon | Espace public numérique d’Arlon – Formation Contribuer à OpenStreetMap | 2018-11-27 | Belgium
Reutti | Stammtisch Ulmer Alb | 2018-11-27 | Germany
Düsseldorf | Stammtisch | 2018-11-28 | Germany
San José | Civic Hack Night & Map Night[1] | 2018-11-29 | United States
Tångstad | Foundation board elections discussion period opens | 2018-11-30 |
Minamishimabara-shi | 南島原マッピングパーティ #1 「世界文化遺産『原城』をマッピングしよう!」 | 2018-12-01 | Japan
Toronto | Mappy Hour | 2018-12-03 | Canada
London | London Missing Maps Mapathon | 2018-12-04 | United Kingdom
Praha – Brno – Ostrava | Kvartální pivo | 2018-12-05 | Czech Republic
Stuttgart | Stuttgarter Stammtisch | 2018-12-05 | Germany
Toulouse | Rencontre mensuelle | 2018-12-05 | France
Bochum | Mappertreffen | 2018-12-06 | Germany
Dresden | Stammtisch Dresden | 2018-12-06 | Germany
Tångstad | Foundation board elections voting opens | 2018-12-08 |
Rennes | Réunion mensuelle | 2018-12-10 | France
Lyon | Rencontre mensuelle pour tous | 2018-12-11 | France
Zurich | Jubilee Stammtisch Zurich with Fondue | 2018-12-11 | Switzerland
online via IRC | Foundation Annual General Meeting | 2018-12-15 | everywhere
Heidelberg | State of the Map 2019 (international conference) | 2019-09-21 to 2019-09-23 | Germany

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it, where appropriate.

This weeklyOSM was produced by LuxuryCoop, Nakaner, PierZen, Polyglot, Rogehm, SunCobalt, TheSwavu, YoViajo, derFred, geologist, jinalfoflia.

Hacking vs Editing, Wikipedia & Declan Donnelly

21:20, Thursday, 22 2018 November UTC

On the 18th of November 2018 the Wikipedia article for Declan Donnelly was vandalised. Vandalism isn’t new on Wikipedia; it happens to all sorts of articles every day. A few minutes after the vandalism the change made its way to Twitter, and from there on to some media outlets such as thesun.co.uk and metro.co.uk the following day, with yet another scaremongering and misleading headline using the word “hack”.

“I’m A Celebrity fans hack Declan Donnelly by changing his height on Wikipedia after Holly Willoughby mocks him”

Hacking has nothing to do with it. One of the definitions of hacking is to “gain unauthorized access to data in a system or computer”. What actually happened is someone, somewhere, edited the article, which everyone is able and authorized to do. Editing is a feature, and it’s the main action that happens on Wikipedia.

The word ‘hack’ used to mean something, and hackers were known for their technical brilliance and creativity. Now, literally anything is a hack — anything — to the point where the term is meaningless, and should be retired.

The word ‘hack’ is meaningless and should be retired – 15 June 2018 by MATTHEW HUGHES

The edit that triggered the story can be seen below. It adds a few words to the lead paragraph of the article at 22:04 and was reverted at 22:19 giving it 15 minutes of life on the site.

The resulting news coverage increased the traffic to the article quite dramatically, going from just 500-1000 views a day to 27,000-29,000 for the following 2 days, then slowly subsiding to 12,000 and 9,800 on day 4. This is similar to the uptick in traffic caused by a YouTube video I spotted some time ago, but realistically these upticks happen pretty much every day for various articles for various reasons.

Wikimedia pageviews tool for Declan Donnelly article

I posted about David Cameron’s Wikipedia page back in 2015 when another vandalism edit made some slightly more dramatic changes to the page. Unfortunately the page views tool for Wikimedia projects doesn’t have readily available data going back that far.

Maybe one day people will stop vandalising Wikipedia… Maybe one day people will stop reporting everything that happens online as a “hack”.

The post Hacking vs Editing, Wikipedia & Declan Donnelly appeared first on Addshore.

Incident Documentation: An Unexpected Journey

18:06, Thursday, 22 2018 November UTC


The Release Engineering team wants to continually improve the quality of our software over time. One of the ways in which we hoped to do that this year is by creating more useful Selenium smoke tests. (From now on, test will be used instead of Selenium test.) This blog post is about how we determined where the tests should focus and the relative priority.

At first, I thought this would be a trivial task. A few hours of work. A few days at most. A week or two if I've completely underestimated it. A couple of months later, I know I have completely underestimated it.

Things I needed to do:

  • Define prioritization scheme.
  • Prioritize target repositories.

Define Prioritization Scheme

In general:

  • Does a repository have stewards? (Do the stewards want tests?)
  • Does a repository have existing tests?

For the last year:

  • How much change happened in a repository? Simply put: more change can lead to more risk.
  • How many incidents is a repository connected to? We wanted to make sure we didn't miss any obvious problematic areas.

Does a Repository Have Stewards?

This was a relatively simple task. The best source of information is the Developers/Maintainers page.

Does a Repository Have Existing Tests?

This was also easy. The Selenium/Node.js page has a list of repositories that have tests in Node.js. I already had all repositories with Node.js and Ruby tests on my machine, so a quick search for webdriverio (Node.js) and mediawiki_selenium (Ruby) found all the tests. To be really sure I had found all repositories with tests, I cloned all repositories from Gerrit.

$ ack --json webdriverio
27:        "webdriverio": "4.12.0"
$ ack --type-add=lock:ext:lock --lock mediawiki_selenium
42:    mediawiki_selenium (1.7.3)

To make extra sure I had not missed any repositories, I also used MediaWiki code search (mediawiki_selenium, webdriverio) and GitHub search (org:wikimedia extension:lock mediawiki_selenium, org:wikimedia extension:json webdriverio).
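
The two searches above can be approximated per repository with a few lines of JavaScript. This is only a sketch, not what was actually run: `detectTestFrameworks` is a hypothetical helper, and it assumes the contents of a repository's package.json and Gemfile.lock have already been read into strings.

```javascript
// Hypothetical helper: report which Selenium frameworks a repository appears
// to use, based on its package.json and Gemfile.lock (either may be missing).
function detectTestFrameworks({ packageJson = null, gemfileLock = null } = {}) {
  const languages = [];
  if (packageJson !== null) {
    const manifest = JSON.parse(packageJson);
    const deps = { ...manifest.dependencies, ...manifest.devDependencies };
    if ('webdriverio' in deps) {
      languages.push('JavaScript');
    }
  }
  // Gemfile.lock lists resolved gems as "name (version)" on indented lines.
  if (gemfileLock !== null && /^\s+mediawiki_selenium \(/m.test(gemfileLock)) {
    languages.push('Ruby');
  }
  return languages;
}

const detected = detectTestFrameworks({
  packageJson: '{"devDependencies": {"webdriverio": "4.12.0"}}',
  gemfileLock: 'GEM\n  specs:\n    mediawiki_selenium (1.7.3)\n',
});
console.log(detected.join(', ')); // JavaScript, Ruby
```

Parsing package.json instead of searching its text avoids false positives from a stray mention of webdriverio outside the dependency lists.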

This is the list.

Repository Language
mediawiki/core JavaScript
mediawiki/extensions/AdvancedSearch JavaScript
mediawiki/extensions/CentralAuth Ruby
mediawiki/extensions/CentralNotice Ruby
mediawiki/extensions/CirrusSearch JavaScript
mediawiki/extensions/Cite JavaScript
mediawiki/extensions/Echo JavaScript
mediawiki/extensions/ElectronPdfService JavaScript
mediawiki/extensions/GettingStarted Ruby
mediawiki/extensions/Math JavaScript
mediawiki/extensions/MobileFrontend Ruby
mediawiki/extensions/MultimediaViewer Ruby
mediawiki/extensions/Newsletter JavaScript
mediawiki/extensions/ORES JavaScript
mediawiki/extensions/Popups JavaScript
mediawiki/extensions/QuickSurveys Ruby
mediawiki/extensions/RelatedArticles JavaScript
mediawiki/extensions/RevisionSlider Ruby
mediawiki/extensions/TwoColConflict JavaScript, Ruby
mediawiki/extensions/Wikibase JavaScript, Ruby
mediawiki/extensions/WikibaseLexeme JavaScript, Ruby
mediawiki/extensions/WikimediaEvents PHP
mediawiki/skins/MinervaNeue Ruby
phab-deployment JavaScript
wikimedia/community-tech-tools Ruby
wikimedia/portals/deploy JavaScript

How Much Change Did Happen for a Repository?

After reviewing several tools, I've found that we already use Bitergia for various metrics. There is even a nice list of top 50 repositories by the number of commits. The tool even supports limiting the report from a date to a date. Exactly what I needed.

Bitergia > Last 90 days > Absolute > From 2017-11-01 00:00:00.000 > To 2018-10-31 23:59:59.999 > Go > Git > Overview > Repositories (raw data: P7776, direct link).

This is the top 50 list (excludes empty commits and bots).

Repository Commits
mediawiki/extensions 11300
operations/puppet 7988
mediawiki/core 4590
operations/mediawiki-config 4005
integration/config 1652
operations/software/librenms 1169
pywikibot/core 927
mediawiki/extensions/Wikibase 806
apps/android/wikipedia 789
mediawiki/services/parsoid 700
mediawiki/extensions/VisualEditor 692
operations/dns 653
VisualEditor/VisualEditor 599
mediawiki/skins 570
mediawiki/extensions/MobileFrontend 504
mediawiki/extensions/ContentTranslation 491
translatewiki 486
oojs/ui 469
wikimedia/fundraising/crm 457
mediawiki/extensions/BlueSpiceFoundation 414
mediawiki/extensions/CirrusSearch 357
mediawiki/extensions/AbuseFilter 306
phabricator/phabricator 302
mediawiki/services/restbase 290
mediawiki/extensions/Flow 232
mediawiki/extensions/Echo 223
mediawiki/vagrant 221
mediawiki/extensions/Popups 184
mediawiki/extensions/Translate 182
mediawiki/extensions/DonationInterface 180
analytics/refinery 178
mediawiki/extensions/PageTriage 177
mediawiki/extensions/Cargo 176
mediawiki/tools/codesniffer 156
mediawiki/extensions/TimedMediaHandler 152
mediawiki/extensions/UniversalLanguageSelector 142
mediawiki/vendor 140
mediawiki/extensions/SocialProfile 139
analytics/refinery/source 138
operations/software 137
mediawiki/services/restbase/deploy 136
operations/debs/pybal 123
mediawiki/extensions/CentralAuth 116
mediawiki/tools/release 116
mediawiki/services/cxserver 112
mediawiki/extensions/BlueSpiceExtensions 110
mediawiki/extensions/WikimediaEvents 110
labs/private 108
operations/debs/python-kafka 104
labs/tools/heritage 96

I got similar results by running git rev-list for all repositories (script, results: P7834).
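
For illustration, counting commits in a date range with git rev-list could look roughly like this. It is a sketch and not the script linked above: `revListArgs` and `countCommits` are hypothetical names, and the code assumes a local clone and a git binary on the PATH. Unlike the Bitergia report, this count does not filter out bots or empty commits.

```javascript
const { execFileSync } = require('child_process');

// Build the arguments for counting commits in a date range.
// --count, --since and --until are standard git rev-list options.
function revListArgs(since, until, ref = 'HEAD') {
  return ['rev-list', '--count', `--since=${since}`, `--until=${until}`, ref];
}

// Run git inside a local clone and return the commit count as a number.
function countCommits(repoPath, since, until) {
  const output = execFileSync('git', revListArgs(since, until), {
    cwd: repoPath,
    encoding: 'utf8',
  });
  return Number.parseInt(output.trim(), 10);
}

// Show the command line that would be used for the period in this post.
console.log(
  ['git', ...revListArgs('2017-11-01 00:00:00', '2018-10-31 23:59:59')].join(' ')
);
```

Calling `countCommits('/path/to/clone', '2017-11-01 00:00:00', '2018-10-31 23:59:59')` would then return the number for a single repository.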

How Many Incidents Is a Repository Connected To?

This proved to be the most time-consuming task.

I have started by reviewing existing incident documentation. Take a look at a few incidents. Can you tell which incident report is connected to which repository? I couldn't. (If you can, please let me know. I need your help.)

Incident reports are a wall of text. It was really hard for me to connect an incident report to a repository. An incident report has a title and text, example: 20180724-Train. Text has several sections, including Actionables. Text contains links to Gerrit patches and Phabricator tasks. (From now on, I'll use patches instead of Gerrit patches and tasks instead of Phabricator tasks.)

A patch belongs to a repository. Wikitext [[gerrit:448103]] is patch mediawiki/extensions/Wikibase/+/448103, so the repository is mediawiki/extensions/Wikibase. That is the strongest link between an incident and a repository.

A task usually has patches associated with it. Wikitext [[phab:T181315]] is task T181315. Gerrit search bug:T181315 finds many connected patches, many of them in operations/puppet and one in mediawiki/vagrant. That is a useful, but not a strong, link between an incident and a repository. Some tasks have several related patches, so this provides a lot of data.

A task also usually has several tags. Most of them are not useful in this context, but tags that are components (and not for example milestones or tags) could be useful, if the component can be linked to a repository. It is also not a strong link between an incident and a repository, and it usually does not provide a lot of data.

In the end, I wrote a tool with an imaginative name, Incident Documentation. The tool currently collects data from patches and tasks in the Actionables section of the incident report. It does not yet collect data from task components. That is tracked as issue #5.
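
To make the link extraction concrete, here is a minimal sketch of pulling [[gerrit:…]] and [[phab:…]] links out of wikitext. It is not the Incident Documentation tool's actual code: `extractLinks` and the sample sentence are made up for illustration.

```javascript
// Collect Gerrit change numbers and Phabricator task IDs from wikitext.
// Only the [[gerrit:…]] and [[phab:…]] interwiki forms are recognised.
function extractLinks(wikitext) {
  const patches = new Set();
  const tasks = new Set();
  // Capture the prefix and the link target up to "]]" or a "|" label.
  const pattern = /\[\[(gerrit|phab):([^\]|]+)/g;
  for (const [, prefix, target] of wikitext.matchAll(pattern)) {
    const value = target.trim();
    if (prefix === 'gerrit' && /^\d+$/.test(value)) {
      patches.add(value);
    } else if (prefix === 'phab' && /^T\d+$/.test(value)) {
      tasks.add(value);
    }
  }
  return { patches: [...patches], tasks: [...tasks] };
}

const actionables =
  'Fixed in [[gerrit:448103]], follow-up in [[phab:T181315|the task]].';
console.log(extractLinks(actionables));
// { patches: [ '448103' ], tasks: [ 'T181315' ] }
```

Each patch number would still need a Gerrit lookup to find its repository, and each task a bug:T… search to find its patches.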

Incident Review 2017-11-01 to 2018-10-31

After reviewing the Actionables section of each incident report, along with related patches and tasks, here are the results. Please note that this table only connects incident reports and repositories. It does not show how many patches from a repository are connected to an incident report. That is tracked as issue #11.

Repository Incidents
operations/puppet 22
mediawiki/core 6
operations/mediawiki-config 4
mediawiki/extensions/Wikibase 4
wikidata/query/rdf 2
operations/debs/pybal 2
mediawiki/extensions/ORES 2
integration/config 2
wikidata/query/blazegraph 1
operations/software 1
operations/dns 1
mediawiki/vagrant 1
mediawiki/tools/release 1
mediawiki/services/ores/deploy 1
mediawiki/services/eventstreams 1
mediawiki/extensions/WikibaseQualityConstraints 1
mediawiki/extensions/PropertySuggester 1
mediawiki/extensions/PageTriage 1
mediawiki/extensions/Cognate 1
mediawiki/extensions/Babel 1
maps/tilerator/deploy 1
maps/kartotherian/deploy 1
integration/jenkins 1
eventlogging 1
analytics/refinery/source 1
analytics/refinery 1
All-Projects 1
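
A table like the one above counts each repository once per incident report, however many of its patches the report links to. A rough sketch of that counting step follows; the function name and the sample data are invented for illustration and do not come from real incident reports.

```javascript
// Count how many incident reports mention each repository at least once.
function countIncidentsPerRepository(incidents) {
  const counts = new Map();
  for (const { repositories } of incidents) {
    // A Set ensures a repository is counted once per incident,
    // no matter how many of its patches the report links to.
    for (const repository of new Set(repositories)) {
      counts.set(repository, (counts.get(repository) || 0) + 1);
    }
  }
  // Sort by incident count, highest first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Made-up sample data.
const sampleIncidents = [
  { title: 'incident-a', repositories: ['operations/puppet', 'operations/puppet', 'mediawiki/core'] },
  { title: 'incident-b', repositories: ['operations/puppet'] },
];
console.log(countIncidentsPerRepository(sampleIncidents));
// [ [ 'operations/puppet', 2 ], [ 'mediawiki/core', 1 ] ]
```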

Selecting Repositories

This table is sorted by the amount of change. The only column that needs explanation is Selected. It shows whether a test makes sense for the repository, taking into account all available data. Repositories without stewards and repositories with existing tests are excluded.

| Repository | Change | Stewards | Coverage | Incidents | Selected |
|---|---|---|---|---|---|
| mediawiki/extensions | 11300 | | | | |
| operations/puppet | 7988 | SRE | | 22 | |
| mediawiki/core | 4590 | Core Platform | JavaScript | 6 | |
| operations/mediawiki-config | 4005 | Release Engineering | | 4 | |
| integration/config | 1652 | Release Engineering | | 2 | |
| operations/software/librenms | 1169 | SRE | | | |
| pywikibot/core | 927 | | | | |
| mediawiki/extensions/Wikibase | 806 | WMDE | JavaScript, Ruby | 4 | |
| apps/android/wikipedia | 789 | | | | |
| mediawiki/services/parsoid | 700 | Parsing | | | |
| mediawiki/extensions/VisualEditor | 692 | Editing | | | |
| operations/dns | 653 | SRE | | 1 | |
| VisualEditor/VisualEditor | 599 | Editing | | | |
| mediawiki/skins | 570 | Reading | | | |
| mediawiki/extensions/MobileFrontend | 504 | Reading | Ruby | | |
| mediawiki/extensions/ContentTranslation | 491 | Language engineering | | | |
| translatewiki | 486 | | | | |
| oojs/ui | 469 | | | | |
| wikimedia/fundraising/crm | 457 | Fundraising tech | | | |
| mediawiki/extensions/BlueSpiceFoundation | 414 | | | | |
| mediawiki/extensions/CirrusSearch | 357 | Search Platform | JavaScript | | |
| mediawiki/extensions/AbuseFilter | 306 | Contributors | | | |
| phabricator/phabricator | 302 | Release Engineering | | | |
| mediawiki/services/restbase | 290 | Core Platform | | | |
| mediawiki/extensions/Flow | 232 | Growth | | | |
| mediawiki/extensions/Echo | 223 | Growth | JavaScript | | |
| mediawiki/vagrant | 221 | Release Engineering | | 1 | |
| mediawiki/extensions/Popups | 184 | Reading | JavaScript | | |
| mediawiki/extensions/Translate | 182 | Language engineering | | | |
| mediawiki/extensions/DonationInterface | 180 | Fundraising tech | | | |
| analytics/refinery | 178 | Analytics | | 1 | |
| mediawiki/extensions/PageTriage | 177 | Growth | | 1 | |
| mediawiki/extensions/Cargo | 176 | | | | |
| mediawiki/tools/codesniffer | 156 | | | | |
| mediawiki/extensions/TimedMediaHandler | 152 | Reading | | | |
| mediawiki/extensions/UniversalLanguageSelector | 142 | Language engineering | | | |
| mediawiki/vendor | 140 | | | | |
| mediawiki/extensions/SocialProfile | 139 | | | | |
| analytics/refinery/source | 138 | Analytics | | 1 | |
| operations/software | 137 | SRE | | 1 | |
| mediawiki/services/restbase/deploy | 136 | Core Platform | | | |
| operations/debs/pybal | 123 | SRE | | 2 | |
| mediawiki/extensions/CentralAuth | 116 | | Ruby | | |
| mediawiki/tools/release | 116 | | | 1 | |
| mediawiki/services/cxserver | 112 | | | | |
| mediawiki/extensions/BlueSpiceExtensions | 110 | | | | |
| mediawiki/extensions/WikimediaEvents | 110 | | PHP | | |
| labs/private | 108 | | | | |
| operations/debs/python-kafka | 104 | SRE | | | |
| labs/tools/heritage | 96 | | | | |

Since some of the repositories connected to incidents are not in the top 50 Bitergia report, I've used git rev-list to sort them. Numbers are different because Bitergia excludes empty commits and bots (script, results: P7834).

| Repository | Change | Stewards | Coverage | Incidents | Selected |
|---|---|---|---|---|---|
| mediawiki/extensions/WikibaseQualityConstraints | 910 | WMDE | | 1 | |
| mediawiki/extensions/ORES | 364 | Growth | JavaScript | 2 | |
| wikidata/query/rdf | 204 | WMDE | | 2 | |
| mediawiki/extensions/Babel | 146 | Editing | | 1 | |
| mediawiki/services/ores/deploy | 84 | Growth | | 1 | |
| maps/kartotherian/deploy | 80 | | | 1 | |
| mediawiki/extensions/PropertySuggester | 67 | WMDE | | 1 | |
| maps/tilerator/deploy | 61 | | | 1 | |
| mediawiki/extensions/Cognate | 47 | WMDE | | 1 | |
| All-Projects | 37 | | | 1 | |
| eventlogging | 26 | | | 1 | |
| integration/jenkins | 19 | Release Engineering | | 1 | |
| mediawiki/services/eventstreams | 16 | | | 1 | |
| wikidata/query/blazegraph | 10 | WMDE | | 1 | |

Prioritize Repositories

Change column uses Bitergia numbers. Numbers in italic are from git rev-list.

| Repository | Change | Stewards | Coverage | Incidents | Selected |
|---|---|---|---|---|---|
| mediawiki/extensions/VisualEditor | 692 | Editing | | | |
| mediawiki/extensions/ContentTranslation | 491 | Language engineering | | | |
| mediawiki/extensions/AbuseFilter | 306 | Contributors | | | |
| phabricator/phabricator | 302 | Release Engineering | | | |
| mediawiki/extensions/Flow | 232 | Growth | | | |
| mediawiki/extensions/Translate | 182 | Language engineering | | | |
| mediawiki/extensions/DonationInterface | 180 | Fundraising tech | | | |
| mediawiki/extensions/PageTriage | 177 | Growth | | 1 | |
| mediawiki/extensions/TimedMediaHandler | 152 | Reading | | | |
| mediawiki/extensions/UniversalLanguageSelector | 142 | Language engineering | | | |
| mediawiki/extensions/WikibaseQualityConstraints | 910 | WMDE | | 1 | |
| mediawiki/extensions/Babel | 146 | Editing | | 1 | |
| mediawiki/extensions/PropertySuggester | 67 | WMDE | | 1 | |
| mediawiki/extensions/Cognate | 47 | WMDE | | 1 | |

The same table grouped by stewards.

| Repository | Change | Stewards | Coverage | Incidents | Selected |
|---|---|---|---|---|---|
| mediawiki/extensions/VisualEditor | 692 | Editing | | | |
| mediawiki/extensions/Babel | 146 | Editing | | 1 | |
| mediawiki/extensions/ContentTranslation | 491 | Language engineering | | | |
| mediawiki/extensions/Translate | 182 | Language engineering | | | |
| mediawiki/extensions/UniversalLanguageSelector | 142 | Language engineering | | | |
| mediawiki/extensions/AbuseFilter | 306 | Contributors | | | |
| phabricator/phabricator | 302 | Release Engineering | | | |
| mediawiki/extensions/Flow | 232 | Growth | | | |
| mediawiki/extensions/PageTriage | 177 | Growth | | 1 | |
| mediawiki/extensions/DonationInterface | 180 | Fundraising tech | | | |
| mediawiki/extensions/TimedMediaHandler | 152 | Reading | | | |
| mediawiki/extensions/WikibaseQualityConstraints | 910 | WMDE | | 1 | |
| mediawiki/extensions/PropertySuggester | 67 | WMDE | | 1 | |
| mediawiki/extensions/Cognate | 47 | WMDE | | 1 | |
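
The exclusion rule and the grouping by stewards can be sketched in a few lines. This is a simplification: the field names are assumptions, and the real selection also took other data into account, so passing this filter is necessary but not sufficient (operations/puppet passes it, for example, but is not a good fit for Selenium tests).

```javascript
// Exclusion rule from the text: skip repositories without stewards
// and repositories that already have tests (a non-empty coverage field).
function isCandidate(repo) {
  return Boolean(repo.stewards) && !repo.coverage;
}

// Group the remaining candidate repositories by steward.
function groupBySteward(repos) {
  const groups = {};
  for (const repo of repos.filter(isCandidate)) {
    (groups[repo.stewards] = groups[repo.stewards] || []).push(repo.name);
  }
  return groups;
}

// A few rows taken from the tables above.
const rows = [
  { name: 'mediawiki/extensions/VisualEditor', stewards: 'Editing', coverage: '' },
  { name: 'mediawiki/extensions/Babel', stewards: 'Editing', coverage: '' },
  { name: 'mediawiki/extensions/Echo', stewards: 'Growth', coverage: 'JavaScript' },
  { name: 'pywikibot/core', stewards: '', coverage: '' },
];
console.log(groupBySteward(rows));
// { Editing: [ 'mediawiki/extensions/VisualEditor', 'mediawiki/extensions/Babel' ] }
```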


  • There are some repositories that do not fit the Selenium/end-to-end testing model (eg: operations/puppet or operations/mediawiki-config) but could benefit from other testing mechanisms or deployment practices.
  • A test could prevent an outage if it runs:
    • Every time a patch is uploaded to Gerrit. That way it could find a problem during development. That is already done for repositories that have tests.
    • After deployment. That way it could find a problem that was not found during development. Ideally, deployment would first be made to a test server in production, and a test would run against that server. If the test fails, further deployment would be cancelled. This is not yet done.
  • Automattic runs tests targeting WordPress.com production:

We decided to implement some basic e2e test scenarios which would only run in production – both after someone deploys a change and a few times a day to cover situations where someone makes some changes to a server or something.
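
The "deploy to a test server, run a test, cancel on failure" idea from the list above can be sketched as a small gate function. Nothing like this exists yet according to the text, so every name here (`gatedDeploy`, `deploy`, `runSmokeTest`, the server names) is hypothetical.

```javascript
// Deploy to a test server first; continue only if the smoke test passes.
// deploy and runSmokeTest are caller-supplied functions (hypothetical names).
function gatedDeploy({ testServer, remainingServers, deploy, runSmokeTest }) {
  deploy(testServer);
  if (!runSmokeTest(testServer)) {
    // The test failed, so the remaining servers are never touched.
    return { deployed: [testServer], cancelled: true };
  }
  remainingServers.forEach((server) => deploy(server));
  return { deployed: [testServer, ...remainingServers], cancelled: false };
}

const deployLog = [];
const outcome = gatedDeploy({
  testServer: 'test-1',
  remainingServers: ['app-1', 'app-2'],
  deploy: (server) => deployLog.push(server),
  runSmokeTest: () => false, // simulate a failing smoke test
});
console.log(outcome, deployLog);
// { deployed: [ 'test-1' ], cancelled: true } [ 'test-1' ]
```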

Next steps:

  • I will contact owners of selected repositories (see Prioritize Repositories section) and offer help in creating the first test.
  • I will add results from the Incident Documentation tool to incident reports as a new Related Repositories section. The section will link to the tool and explain how it got the data. It will also ask for edits if the data is not correct.
  • I will reach out to people who created (or edited) incident reports and ask them to populate the Related Repositories section. This might have mixed results. For best results, the section will already be populated with the data from the Incident Documentation tool.
  • I will add a Related Repositories section to the incident report template.

Incident Documentation tool improvements:

  • There are several ways to link from a wiki page to a patch or task. The tool for now only supports [[gerrit:]] and [[phab:]]. Tracked as issue #6.
  • Gerrit patches and Phabricator tasks from Actionables section do not provide enough data. The entire incident report should be used. I have limited it first because I was collecting data manually (and Actionables looked like the most important part of the incident report), later because of #6. Tracked as issue #4.
  • Find Gerrit repository from task component. Tracked as issue #5.
  • A table with the number of patches from each repository would be helpful. Tracked as issue #11.
  • A report with folder/file names from a repository that are mentioned the most. Especially useful for big repositories like operations/puppet and mediawiki/core. Tracked as issue #12.

A libel story

10:08, Thursday, 22 2018 November UTC

A visit to the Biligirirangan Hills just as the monsoons were setting in led me to look up the life of one of the local naturalists who wrote about this region, R.C. Morris. One of the little-known incidents in his life is a case of libel arising from a book review. I had not heard of such a case before but it seems that libel cases are a rising risk for anyone who publishes critical reviews. There is a nice guide to avoiding trouble and there is a plea within academia to create a safe space for critical discourse.

This is a somewhat short note and if you are interested in learning more about the life of R.C. Morris - do look up the Wikipedia entry on him or this piece by Dr Subramanya. I recently added links to most of his papers in the Wikipedia entry and perhaps one that really had an impact on me was on the death of fourteen elephants from eating kodo millet - I suspect it is a case of aflatoxin poisoning! Another source to look for is the book Going Back by Morris' daughter and pioneer mountaineer Monica Jackson. I first came to know of the latter book in 2003 through the late Zafar Futehally, who was a family friend of the Morrises. He lent me this rather hard-to-find book when I had posted a note to an email group (a modified note was published in the Newsletter for Birdwatchers, 2003 43(5):66-67 - one point I did not mention and which people find rather hard to believe is that my friend Rajkumar actually got us to the top of Honnametti in a rather old Premier Padmini car!).
R C Morris at a typewriter in camp. Photo by Salim Ali.

I came across the specific libel case against Morris in a couple of newspaper archives - this one in the Straits Times, 27 April 1937, can be readily found online:

Statements Made In Book Review.

Major Leonard Mourant Handley, author of "Hunter's Moon," a book dealing with his experiences as a big game-hunter, was at the Middlesex Sheriff's Court awarded £3,000 damages for libel against Mr. Randolph Camroux Morris. Mr. Morris did not appear and was not represented. The libel appeared in a review of "Hunter's Moon" by Mr. Morris that appeared in the journal of the Bombay Natural History Society. Mr. Valentine Holmes said Major Handley wrote the book, his first, in 1933, and it met with amazing success.

Mr. Morris, in his review, declared that it did not give the personal experiences of Major Handley. Mr. Morris wrote: "There surely should be some limit to the inaccuracies which find their way into modern books, which purport to set forth observations of interest to natural scientists and shikaris.

"The recent book, 'Hunter's Moon,' by Leonard Handley, is so open to criticism in this respect, that one is led to the conclusion that the author has depended upon his imagination and trusted to the credulity of the public for the purpose of producing a 'bestseller' rather than a work of sporting or scientific value."

Then followed some 38 instances of alleged inaccuracies.

Mr. Holmes said that at one time Mr. Morris was a close friend of Major Handley, but about 1927 some friction arose between Mrs. Morris and Mrs. Handley. In evidence, Major Handley said that, following the libel, a man who had been a close friend of his refused to nominate him for membership of a club. The Under-Sheriff, Mr. Stanley Ruston, said there was no doubt that the motive of the libel lay in the fact that Major Handley had seized some of the thunder Mr. Morris was providing for his own book.

Naturally this forced me to read the specific book, which is also readily available online.

The last chapter deals with the hunter's exploits in the Biligirirangans, which he translates as the "blue [sic] hills of Ranga"! It is also worth examining Morris' review of the book in the Journal of the Bombay Natural History Society, which is merely marked under his initials. I wonder if anyone knows of the case history and whether it was appealed or followed up. I suspect Morris may have just quietly ignored it, if the court notice was ever delivered to the faraway estate of his up in Attikan or Honnametti.

The review is fun to read as well...

Meanwhile, here is a view of the Honnametti rock which lies just beside the estate where Morris lived.
Honnametti rock

Memorial to Randolph Camroux Morris
Grave of Mary Macdonald, wife of Leslie Coleman, who in a way founded the University of Agricultural Sciences. Coleman was perhaps the first to teach the German language in Bangalore to Indian students.

Sidlu kallu or lightning-split-rock, another local landmark.

Image: File:Karen Kwon.jpg, Lightmatterchem, CC BY-SA 4.0, via Wikimedia Commons.

Youngah (Karen) Kwon is a graduate student at Columbia University and a member of the American Chemical Society who recently completed our Wikipedia professional development course. With a background in physical chemistry and material science, Karen expanded Wikipedia articles on women who have made major contributions to the sciences. Read her reflections about the experience below.

Lately, I have become increasingly frustrated by the way both women and science are discredited. How can I act as a counterbalancing force, I often wondered, while working as a full-time chemistry graduate student? I wanted to contribute, for example, by going to the Wikipedia edit-a-thons on women in science, but repeatedly found myself tied up with hefty lab duties, even on weekends.

So, imagine my excitement when I received a newsletter about Wikipedia Fellows from the American Chemical Society. The Society was recruiting people to learn how to write and improve Wikipedia articles on women scientists through Wiki Education’s unique online course. Multiple friends of mine forwarded me this same opportunity, knowing I have a deep interest in the gender gap in science. To my friends and me, editing articles on Wikipedia – the fifth most visited website in the world – to improve the visibility of women scientists and their work sounded like the perfect opportunity. It also didn’t hurt that all the work could be done remotely via the web; I didn’t have to leave my graduate school work behind.

Once I started Wiki Education’s Women in Science course, it didn’t take me too long to notice the shocking gender gap in Wikipedia, both in the contents and in the number of Wikipedians. Only 17% of the biographies on Wikipedia are about women¹ and less than 20% of the Wikipedia editors are women.² These numbers provided me with extra motivation as I navigated the unfamiliar territory of Wikipedia editing. The wonderful Wiki Education team facilitated the learning process, holding my hands throughout and pointing me in the right direction whenever I got lost. It was also helpful that I could always reach out to the team and other Fellows in the course through a messaging app. With everyone’s help, I was able to write two new biographies of women in science and make numerous edits.

Equipped with better Wikipedia editing knowledge and skills, I came to appreciate the Wikipedia features that I was previously unaware of: the Talk page, where all the behind-the-scenes Wikipedian discussions take place, and the View History page, which offers the history of the article’s development. It was fascinating to learn how each Wikipedia article evolved with time. More importantly, I learned about the vibrant culture of the Wikipedian community, expressing themselves using user pages and forming WikiProjects to achieve common goals. I was excited and encouraged to find Wikipedians all over the globe collaborating to share free knowledge and also to participate in that endeavor myself.

At times, however, I was disappointed that the larger societal inequalities, including the gender gap, are well-reflected on Wikipedia. I was especially troubled by two particular guidelines on Wikipedia that determine who gets to have an article: the guideline on credible sources³ and the guideline on academic notability.⁴ As Wikipedia is first and foremost an encyclopedia, it certainly should require its articles to meet these guidelines. Otherwise, people would be able to create a Wikipedia page for, let’s say, their next door neighbor. Yet, it should also be aware that our society as a whole does not value the work of female scientists the same way it does the work of male scientists. There are a number of studies that demonstrate how gender biases undermine women’s work in science. To list a few, (1) the same National Institutes of Health (NIH) grant application gets a lower score when a female-sounding name is attached to it,⁵ (2) astronomy papers receive fewer citations when the first authors have female-sounding names, even when the quality of the papers was controlled to be the same,⁶ and (3) science faculty choose to hire male students as lab managers over female students, even when the application materials are the same.⁷ These gender biases in every step of science eventually create a remarkable difference between female and male scientists. As a result, there is less media coverage of female scientists and the work they do. Subsequently, women scientists also do not win as many awards as their male counterparts, the Nobel Prize being the prime example.⁸ These, in turn, lead to fewer biographies of women scientists on Wikipedia, due to the lack of credible sources and proofs of notability.

While the intent of these guidelines is well-meaning, Wikipedians, more than 80% of whom are men themselves, should always be mindful of the gender biases in applying them to biographies of women. Lack of evidence does not necessarily indicate the lack of quality when it comes to evaluating women’s work. Look no further than the recent discussion surrounding the Wikipedia page on Dr. Donna Strickland, the 2018 winner of the Nobel Prize in Physics. The Wikipedia article on her was created only after she won the Nobel Prize in October, five months after the initial draft was submitted and rejected.⁹ At the time of the draft’s submission, Dr. Strickland did not have any records of winning prestigious awards, and the references on her work that the initial draft provided were not robust enough to meet the Wikipedia guidelines. Therefore, it was completely reasonable that the editor who handled the submission initially declined the article. However, even though the gender bias that prevails in the society is first to blame, Wikipedians should also be conscious of the societal gender bias in play and be wary of it when applying the guidelines for the betterment of the platform. Otherwise, there will be many more incidents like Dr. Strickland’s page in the future.

Overall, I am glad that I was chosen to be a part of the Women in Science course and worked as a Wikipedia Fellow in this tumultuous time. Learning how to edit and create Wikipedia pages and experiencing the culture of Wikipedia was such a joy, especially since all my efforts were put towards a cause that I deeply care about. Even though I pointed out the possible obstacles in closing the gender gap in Wikipedia, I still believe that those obstacles should not be an excuse to stop the effort. I will continue to edit Wikipedia so that all deserving women scientists have Wikipedia pages. At the same time, I urge everyone on this platform to ponder on how the Wikipedia guidelines could affect this battle against the gender gap going forward, as well as to push the society to better acknowledge women’s accomplishments.

  1. http://whgi.wmflabs.org/gender-by-language.html#bokeh-alltime-plot
  2. https://en.wikipedia.org/wiki/Wikipedia:Wikipedians; https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0065782; https://en.wikipedia.org/wiki/Gender_bias_on_Wikipedia
  3. https://en.wikipedia.org/wiki/Wikipedia:Biographies_of_living_persons#Reliable_sources
  4. https://en.wikipedia.org/wiki/Wikipedia:Notability_(academics)
  5. https://www.statnews.com/2016/07/29/women-in-science/
  6. http://www.sciencemag.org/careers/2017/05/female-authors-get-fewer-citations-astronomy
  7. http://www.pnas.org/content/109/41/16474
  8. https://www.nature.com/articles/d41586-018-06879-z
  9. https://en.wikipedia.org/wiki/User:Bradv/Strickland_incident

Image: File:Walking Through Columbia University (5892973734).jpg, Alex Proimos, CC BY 2.0, via Wikimedia Commons.

The Popups MediaWiki extension previously used HTML UI templates inflated by the mustache.js template system. This provided good readability but added an 8.1 KiB dependency* for functionality that was only used in a few places. We replaced Mustache with ES6 syntax without changing existing device support or readability and now ship 7.8 KiB less of minified uncompressed assets to desktop views where Popups was the only consumer.


Given that ES6 template literals provided similar readability** and are part of JavaScript itself, we considered this to be a favorable and sustainable alternative to Mustache templates. Additionally, although the usage of template strings requires transpilation, adding support enabled other ES6 syntaxes to be used, such as let / const, arrow functions, and destructuring, all of which Extension:Popups now leverages in many areas.

We compared the sizes before and after transpiling templates and they proved favorable:

| | index.js (gzip) | index.js | ext.popups | ext.popups.main | ext.popups.images | mediawiki.template.mustache | Total |
|---|---|---|---|---|---|---|---|
| Before | 10.84 KiB | 32.88 KiB | 96 B | 52.5 KiB | 3.1 KiB | 8.1 KiB | 65224 B |
| After | 11.46 KiB | 35.15 KiB | 96 B | 52.7 KiB | 3.1 KiB | 0.0 KiB | 57193 B |

Where “index.js (gzip)” is the minified gzipped size of the resources/dist/index.js Webpack build product as reported by bundlesize, “index.js” is the minified uncompressed size of the same bundle as reported by source-map-explorer and Webpack performance hints, and the remaining columns are the sum of minified uncompressed assets for each relevant module as reported by mw.loader.inspect() with the last column being a total of these inspect() modules.

The conclusions to draw from this table are that transpiling templates does minimally increase the size of the Webpack bundle but that the overhead is less than that of the mustache.js dependency so the overall effect is a size improvement. Additionally, note that the transpiled bundle now encompasses the HTML templates which source-map-explorer reports as contributing a 2.53 KiB minified uncompressed portion of the 35.15 KiB bundle. (Previously, templates were part of ext.popups.main but only via ResourceLoader aggregation; now templates are part of index.js.) Allowing for rounding errors and inlining, this brings the approximate overhead of transpilation itself to nearly zero, 35.15 KiB - 32.88 KiB - 2.53 KiB ≈ 0, which suggests transpiling as a viable solution for improving code elsewhere that must be written in modern form without compromising on compatibility or performance.
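
The arithmetic behind these conclusions can be checked in a few lines, using only the figures quoted in the table and the paragraph above (the variable names are ours):

```javascript
const KIB = 1024;

// Totals of the mw.loader.inspect() modules, in bytes.
const totalBefore = 65224;
const totalAfter = 57193;
const savedKiB = (totalBefore - totalAfter) / KIB;

// Minified uncompressed index.js sizes and the template portion, in KiB.
const bundleBefore = 32.88;
const bundleAfter = 35.15;
const templates = 2.53;
const transpileOverheadKiB = bundleAfter - bundleBefore - templates;

console.log(savedKiB.toFixed(1)); // 7.8
console.log(transpileOverheadKiB.toFixed(2)); // -0.26
```

The slightly negative overhead reflects the rounding and inlining effects mentioned above. The result is effectively zero.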

We used the Babel transpiler with babel-preset-env to translate only the necessary JavaScript from ES6 to ES5 for grade A browsers. The overhead for this functionality may be nonzero in some cases but is expected to diminish in time and always be less than the size of the mustache.js dependency. Please note that while most ES6 syntaxes are supported, the transpiler does not provide polyfills for new APIs (e.g., Array.prototype.includes()) unless configured to do so via babel-polyfill. As polyfills add more overhead and are related but independent of syntax, API changes were not considered in this refactoring.

Manual HTML escaping of template parameters was a necessary part of this change. This functionality is built into the double-curly brace syntax of mustache.js but is now performed using mw.html.escape(). These calls are a blemish on the code but appear only in the templates themselves and would be replaced transparently in a UI library with declarative rendering (such as Preact). We also anticipate that the template literal syntax would transition neatly to such a library. We don't know that Extension:Popups will ever want to use a UI library and accept these shortcomings may always exist.
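
As a rough illustration of manual escaping inside a template literal, consider the sketch below. It is not Popups code: `escapeHtml` is a stand-in written for this example because mw.html.escape() only exists inside MediaWiki, and `renderLink` is a hypothetical template.

```javascript
// Stand-in for mw.html.escape(), which only exists inside MediaWiki.
function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#039;' };
  return String(text).replace(/[&<>"']/g, (char) => entities[char]);
}

// Hypothetical template: every interpolated value is escaped by hand,
// which Mustache's double-curly syntax used to do implicitly.
function renderLink({ url, label }) {
  return `<a href='${escapeHtml(url)}'>${escapeHtml(label)}</a>`.trim();
}

console.log(renderLink({ url: "/wiki/O'Reilly", label: "<b>O'Reilly</b>" }));
// <a href='/wiki/O&#039;Reilly'>&lt;b&gt;O&#039;Reilly&lt;/b&gt;</a>
```

Escaping the single quote matters here because the attribute values are themselves delimited by single quotes.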

*As reported by mw.loader.inspect() on March 22nd, 2018.
**The Mustache version of previews:

<div class="mwe-popups" role="tooltip" aria-hidden>
  <div class="mwe-popups-container">
    <a href="{{url}}" class="mwe-popups-discreet"></a>
    <a dir="{{languageDirection}}" lang="{{languageCode}}" class="mwe-popups-extract" href="{{url}}"></a>
      <a class="mwe-popups-settings-icon mw-ui-icon mw-ui-icon-element mw-ui-icon-popups-settings"></a>
  </div>
</div>

The ES6 version of the same template explicates dependencies but must manually escape them. The HTML snippet is quite similar, but a call to trim() is made so that parsing the result only creates a single text Node.

 * @param {ext.popups.PreviewModel} model
 * @param {boolean} hasThumbnail
 * @return {string} HTML string.
export function renderPagePreview(
        { url, languageCode, languageDirection }, hasThumbnail
) {
        return `
                <div class='mwe-popups' role='tooltip' aria-hidden>
                        <div class='mwe-popups-container'>
                                ${hasThumbnail ? `<a href='${url}' class='mwe-popups-discreet'></a>` : ''}
                                <a dir='${languageDirection}' lang='${languageCode}' class='mwe-popups-extract' href='${url}'></a>
                                        <a class='mwe-popups-settings-icon mw-ui-icon mw-ui-icon-element mw-ui-icon-popups-settings'></a>
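The effect of the trim() call mentioned above can be seen in isolation. This is a minimal sketch with a hypothetical renderBox function and class name, not code from Extension:Popups: a multi-line template literal begins and ends with whitespace, and trim() removes it so the string starts and ends with the markup itself.

```javascript
// A multi-line template literal starts with a newline and ends with
// indentation. trim() strips both, leaving only the element markup.
// The label is assumed to be a trusted, already-escaped string.
function renderBox(label) {
  return `
    <div class='box'>${label}</div>
  `.trim();
}

const html = renderBox('hi');
```

Without the trim() call, the leading and trailing whitespace would remain in the string and be parsed along with the element.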

Students make trans history more visible on Wikipedia

18:50, Tuesday, 20 2018 November UTC

November 20 is Transgender Day of Remembrance, a day when communities and organizations in more than 20 countries worldwide raise awareness about the prejudice and threat of violence that trans people face around the world.

One way to draw awareness and reduce violence is through education and visibility. Today we’re recognizing work that student editors in our Wikipedia Student Program have done to make trans people and trans history more visible on the world’s number one source of online information: Wikipedia.

After learning how to edit Wikipedia in their course at Loyola Marymount University last year, one student improved Wikipedia’s article about the term trans woman. The student added background information about the history of its terminology, as well as contextualizing information about violence that trans women experience in the United States.

A student at Xavier University of Louisiana added names of transgender people who hold political offices to Wikipedia’s list of the first LGBT holders of political offices. And students at Rice University contributed information about physical and mental health, as well as access to care, to the article about transgender health care.

These students alone added 11,600 words to Wikipedia. Student editors can make a real impact on public knowledge through Wikipedia.

Interested in teaching with Wikipedia? Visit teach.wikiedu.org.

Image: File:Transgender Pride flag.svg, public domain, via Wikimedia Commons.

Citizens of a well-functioning democracy should have an understanding of voting rights. In the United States, the 19th Amendment to the Constitution was passed in 1920 to give women the right to vote. To our young people today, it can be shocking to learn that just within the last century, more than half of the country was denied that fundamental right based only on their sex. But young people are not alone in having a limited understanding of the long struggle for voting equality. We can all learn from this complex period in United States history. The public needs access to high-quality information about the history of women’s suffrage.

That’s why, in March 2019, the National Archives Museum will launch an exhibit, Rightfully Hers: American Women and the Vote, commemorating the 100th anniversary of the 19th Amendment. Visitors will learn about the history of suffrage in the United States, basic civics, suffragists, why voting matters, the women who were disenfranchised even after the 19th Amendment, and struggles that persist today.

When people learn about the centennial of the 19th Amendment through the National Archives or other means and want to learn more, their first stop will be Wikipedia. Wikipedia has a wealth of information about many topics, but some subject areas are better developed than others. Topics related to military history, for example, contain the most detailed and up-to-date information on Wikipedia. That’s because these topics are of interest to many of the volunteers who devote their time to creating and expanding Wikipedia’s content. Other articles, like the one about the Nineteenth Amendment to the United States Constitution, don’t receive as much attention and are thus of lower quality. The only thing that’s stopping Ida B. Wells from having as high quality of an article as Robert E. Lee is finding someone interested in improving her article. You can be that person.

In collaboration with the National Archives, we’re offering a professional development course to train scholars to improve Wikipedia articles related to women’s suffrage. Become a Wiki Scholar and ensure the public has access to the highest quality information about the people, events, laws, organizations, debates, and other subjects related to the history of women’s voting rights in the United States.

Wiki Scholars will get face time with Wikipedia experts, learning how to add knowledge to Wikipedia successfully. They will join the online community of Wikipedians, utilize emerging modes of knowledge transmission, and make a broader impact with their scholarship by reaching millions. This immersive course presents opportunities for collaboration across disciplines and topic areas. You’ll build connections with like-minded scholars who are just as passionate about equitable, open knowledge as you are!

Pulling together Wikipedia experts, detailed Wikipedia training, and tips to navigate the National Archives’ extensive digital collections, this is the only skills-development course of its kind worldwide. Give voice to the millions of women who have struggled for a political voice in the century both before and after the 19th Amendment was adopted. Become a Wiki Scholar, add this history to Wikipedia, and educate the world.

Applications for this unique professional development experience are due by December 8, 2018. Accepted applicants will engage in the virtual course from January to March 2019. For course information and to submit an application, visit: http://bit.ly/NARAwiki

Image: File:National Archives Building DC.JPG, Jared Kofsky, CC BY-SA 3.0, via Wikimedia Commons.
