Much like Asgard, Wikipedia is not a place but a people. Specifically, an extensive community of volunteers from around the world who donate their time each day to building, curating, and watching over the largest collection of knowledge ever assembled.

Unlike Asgard, the people who contribute to Wikipedia rarely meet outside of the internet, and so we here at the Wikimedia Foundation set out in 2013 to make it easier for users to “thank” each other for taking actions on the site.

Fast forward six years, and the feature is still live on all Wikimedia projects; two clicks are all that is required to thank a user for making any particular change. Assuming that you’ve created and logged into a Wikimedia account, you can see the “thanks” option on the “history” tab accessible from any Wikimedia page.

• • •

Earlier this year, the Wikimedia Foundation’s Research team, in collaboration with researchers from Gunn High School and the University of Toronto, set out to determine how effective the thanks feature has been.

We posed a variety of questions as part of this study:

  • How often is the thanks feature used in Wikimedia projects?
  • Are there projects in which the thanks feature is used more often?
  • Are there specific groups of editors who use the thanks feature more (or less) than others?
  • What is the impact of thanking an editor’s revision?

Here’s what we learned:

  • In the largest languages, thanks are typically sent upwards (from less experienced to more experienced editors). However, the most experienced editors send and receive thanks less frequently, relative to their total edit count, than all other editor groups.
  • Some projects break from trends of “thanks” usage seen in the other studied languages. Thanks on the Norwegian Wikipedia, for instance, are typically sent downwards, not upwards. Additionally, a greater percentage of editors on that Wikipedia have interacted with the thanks feature than the overall average of 5%.
  • A controlled test indicates that receiving a single thank can increase a person’s edit count by a factor of more than 1.5 over the next day. This increased editing effect fades within the following month, but remains strong for the week after a thank.

Finally, here are some possible directions for future studies:

  • Expanding this study to include smaller languages and further examining how “thanks” usage differs across projects.
  • Examining whether the impact of receiving thanks is cumulative. It appears that receiving a thank has a strong short-term effect. Does receiving multiple thanks have a longer term effect?
  • Advocating increased usage of the Thanks feature! If it can increase engagement and foster positive interaction between editors, it should be taken full advantage of!

Swati Goel, Gunn High School
Ashton Anderson, Department of Computer Science, University of Toronto
Leila Zia, Head of Research, Wikimedia Foundation

If you’d like to learn more, all of this study’s code, raw and intermediate data, and analysis is available. You can also read the paper published in WikiWorkshop 2019: In Companion Proceedings of The Web Conference 2019 (WWW ’19).

Cloud-vps Puppetmasters Moved to VMs, thanks to Krenair

10:58, Wednesday, 25 September 2019 UTC

Last week, we completed a piece of long-neglected work relating to Puppet, the tool that manages the configuration of every virtual machine in our cloud. Historically, each VM has received its configuration from a physical, production server (the 'puppetmaster'). This meant that there was a constant chatter of traffic back and forth between each VM and unrelated networks and hardware sitting in Wikimedia production. Now, the puppetmasters are located on VMs, so all of that chatter is internal to Cloud Services.

Generally, we like to think of the cloud as an isolated sandbox, a safe place for volunteering and experimentation. Any tight links between the cloud and production require extra vigilance; as we sever those links we can worry a bit less about issues (security and otherwise) bleeding back and forth between the cloud and the public wikis.

A notable thing about this move is that nearly all the work was done by a technical volunteer, @Krenair. Krenair updated the code that runs the in-cloud puppetmasters, built out the server cluster, and designed the migration flow that transferred control over from the old controllers. It was his hard work (done on top of his unrelated day job) that moved this task from long-neglected to a box with a check mark.

Quite a few Cloud Services projects are partially (or, in some cases, completely) maintained and managed by technical volunteers. Not only does this allow us to run infrastructure well beyond the capacity of our small team, it's also a clear success in the mission of the Technical Engagement team (of which Cloud Services is a part). We work to build technical capacity in the community, and when volunteers start doing our job for us, we know we've succeeded. Almost all levels of access are available to trusted volunteers, and getting permission to hack on the WMCS infrastructure is not as hard as you might think. Come and join us!

With #DBpedia to the (data) cleaners

04:41, Wednesday, 25 September 2019 UTC
The people at DBpedia are data wranglers. What they do is make the most of the data provided to them by the Wikipedias, Wikidata and a generous sprinkling of other sources. They are data wranglers because they take what is given to them and make the data shine.

Obviously, it takes skill and resources to get the best result, and obviously, some of the data gathered does not pass the smell test. The process the data wranglers use includes a verification stage, as described in this paper. They have two choices for when data that should be the same is not: they either have a preference, or they go with the consensus, i.e. the result that shows up most often.
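The consensus strategy can be sketched in a few lines. This is an illustrative toy, not DBpedia's actual verification code; the function name, the preferred-source shortcut, and the sample data are all hypothetical.

```python
from collections import Counter

def resolve_conflict(values, preferred_source=None):
    """Pick a single value from conflicting source values.

    Mirrors the two strategies described above: either trust a
    preferred source outright, or fall back to the consensus
    (the value reported most often).
    """
    if preferred_source is not None and preferred_source in values:
        return values[preferred_source]
    # Consensus: count how often each value occurs across sources
    counts = Counter(values.values())
    value, _ = counts.most_common(1)[0]
    return value

# Three sources disagree on a birth year; the consensus (1952) wins.
observed = {"enwiki": 1952, "dewiki": 1952, "frwiki": 1925}
print(resolve_conflict(observed))  # 1952
```

Note that the consensus rule silently discards the minority value; as the post says, those discrepancies are exactly what gets left for the cleaners.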

For data wranglers this is a proper choice. There is another option, for another day: these discrepancies are left for the cleaners.

With the process well described and the data openly advertised as available, the cleaners will come. First come people akin to the wranglers, who have the skills to build the queries and the tools to slice and dice the data. When these tools are discovered, particularly by those who care about specific subsets, people will dive in and change things where applicable. They will seek out references and make the judgments necessary to improve what is there.

The DBpedia data wranglers are part of the Wikimedia movement and do more than build something on top of what the wikis produce; DBpedia and the Wikimedia projects work together to improve our movement's quality. With the processed data generally available, this will become even more effective.

Today, the Court of Justice of the European Union (CJEU) has issued a landmark privacy ruling regarding Europeans’ right to request search engines delist search results about themselves. We are excited that the court has considered the effect of such delistings on other fundamental rights like freedom of expression in its decision, but concerned about the increasing reliance of both companies and now courts on geographical barriers to limit access to information on the internet. Before we address the court’s decision, however, let’s start with some background.

Over the past decade, the protection of privacy online has been a focus of European regulators, and Europe has seen a number of laws passed in an effort to protect citizens’ privacy. Often, these laws will lay out a principle or right, but the exact contours of that right end up being determined later by courts. The CJEU has just issued one such clarifying decision about a process called “delisting” or “de-referencing.” Delisting, sometimes referred to as the “right to be forgotten,” is a process through which a person can request that information about themselves be removed from search engine results returned for that person’s name. But how far does this right to delist extend?

This is the exact question tackled by the CJEU in Google v. CNIL. After delisting was recognized as a right in Europe, Google put in place processes for European citizens to request that certain information about them available online be delisted. Through this process, Google would ensure that these results were delisted in its European domains, but did not apply the delisting outside of Europe. This meant that if you were searching from the United States, these results would still be available.

However, in 2015, the French data protection authority (“CNIL”) informed Google that delisting requests must be honored worldwide, not just in the European Union. When Google proposed an alternative solution, the CNIL appealed to France’s highest court, where the Wikimedia Foundation intervened to offer our perspective on delisting and access to knowledge. The court sent questions regarding the scope of delisting to the CJEU, and the Wikimedia Foundation again submitted observations on the matter. Now, almost a year after the arguments before the CJEU, the court has found that search engines are not required under EU law to carry out delisting requests across all versions of the search engine, only those which correspond with EU Member states. While not closing off the possibility of global delistings in certain circumstances, the court has essentially approved Google’s current processes for delisting, which uses a process called “geoblocking” to identify where a user is searching from based on their IP address.
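The geoblocking approach the court effectively endorsed can be sketched as follows. This is a hypothetical illustration of the general mechanism, not Google's implementation: the member-state list is abbreviated, and the IP lookup table stands in for a real geolocation database such as a GeoIP service.

```python
# Abbreviated for the sketch; the real list covers all EU member states.
EU_MEMBERS = {"FR", "DE", "ES", "IT", "NL"}

# Delisting requests: result URLs that must be suppressed for EU searchers.
DELISTED = {"https://example.org/article-about-x"}

def lookup_country(ip_address):
    """Stand-in for a real IP geolocation database."""
    fake_db = {"203.0.113.7": "FR", "198.51.100.9": "US"}
    return fake_db.get(ip_address, "??")

def filter_results(results, ip_address):
    """Suppress delisted results only for searches originating in the EU."""
    country = lookup_country(ip_address)
    if country in EU_MEMBERS:
        return [r for r in results if r not in DELISTED]
    return results

results = ["https://example.org/article-about-x", "https://example.org/other"]
print(filter_results(results, "203.0.113.7"))   # French IP: delisted result removed
print(filter_results(results, "198.51.100.9"))  # US IP: all results shown
```

The sketch also makes the fragmentation concern concrete: the same query returns different results depending solely on where the searcher's IP address places them.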

We applaud this recognition that delisting should not extend beyond EU borders to the rest of the globe. In the Wikimedia Foundation’s brief to the court, we expressed our concern that the practice of delisting search results could harm both free expression and access to knowledge online. We are happy to see this argument acknowledged in the CJEU’s decision, which says, “The processing of personal data should be designed to serve mankind. The right to the protection of personal data is not an absolute right; it must be considered in relation to its function in society and be balanced against other fundamental rights, in accordance with the principle of proportionality.”

Despite this, there are still some troubling aspects of this decision. Primarily, we remain concerned about the inequality in access to knowledge that results from any form of delisting orders. Wikipedia is founded on a premise of providing access to knowledge for all and Wikipedias are differentiated by language, not geography. What the CJEU’s decision means is that if someone requests the delisting of an article on Spanish Wikipedia, users in Mexico will still see this page in their search results, but users in Spain will not. Because these delisting decisions are often targeted toward only a small portion of the information contained on a page, this means that entire communities will lose the ability to easily search for information solely because of where they are located.

This highlights a larger trend of internet fragmentation, a growing concern for the interconnected global community that is the Wikimedia movement. As individual countries demonstrate an increased desire to regulate the internet, often in contradictory ways, this can lead to the internet looking very different depending on where you are located.  Although the Wikimedia communities generally strive to be respectful of national laws, this will become increasingly difficult to navigate as countries place more granular requirements on online content. Volunteers come to Wikipedia to share their knowledge and learn from others, no matter where they are located.

Geoblocking, as envisioned here, is far superior to global delisting, which would essentially make it impossible to find certain information through a search engine. However, the type of geographical fragmentation it envisions will present challenges for movements that cross borders.

In the end, the internet is a global resource that connects people across the world every day to share their perspectives, creations, and knowledge. Let’s try to keep it that way.

Allison Davenport, Technology Law and Policy Fellow, Legal
Wikimedia Foundation

Our thanks go to Claire Rameix-Séguin and SCP Baraduc-Duhamel- Rameix for their representation of the Wikimedia Foundation in this matter.

Semantic MediaWiki 3.1.0 released

09:05, Tuesday, 24 September 2019 UTC

September 23, 2019

Semantic MediaWiki 3.1 (SMW 3.1.0), the next feature version after 3.0 has now been released.

This new version brings many enhancements and new features, most notably a reworked embedded query update mechanism, constraint schema handling for enhanced validation of annotations, support for attachment link tracking, support for sequence mapping of annotations, and, last but not least, replication monitoring for users of the Elasticsearch data store.

See also the version release page for information on further improvements and new features. Additionally, this version fixes many bugs and brings stability and performance improvements. Automated software testing was again expanded to ensure software stability.

Please refer to the help pages on installing or upgrading Semantic MediaWiki to get detailed instructions on how to do this.

In science fiction, the Encyclopedia Galactica is a compendium of a galaxy’s worth of knowledge.

Wikipedia isn’t quite there yet—for one, we’ve barely left Earth. However, that doesn’t mean we aren’t trying to put together a planet’s worth of knowledge and ensure that all of its inhabitants can learn from it in their own languages.

That’s where our content translation tool comes in. The tool simplifies translating Wikipedia articles into different languages by automating many of the tedious steps inherent in manually translating Wikipedia’s articles. As of last month, more than half a million articles have been created with the help of the content translation tool since its introduction.

Why did we build this tool? It’s simple: translating Wikipedia’s knowledge into new languages can help reduce our knowledge gap. For example, English-speaking users can access more than five million articles. Speakers of Bengali, a language of 260 million people, have access to a mere 75,000.

The content translation tool has not been easy to construct, maintain, and grow. We’ve spent four years constructing and fine-tuning it to best fit Wikipedia’s decentralized model, ensuring that it fits into all of Wikimedia’s many individual wikis which have grown and developed their own custom infrastructure to suit their local contexts.

More recently, the Wikimedia Foundation has worked to modernize the content translation tool with a rich-text interface and improve its output by utilizing artificial intelligence. The tool now provides users with an initial machine translation for them to improve prior to publishing, and incorporates safeguards which help ensure that all untouched machine translations are reviewed. We’ve partnered with Google to ensure that these automatically translated items are as high-quality as possible, and we’ve ensured that none of our users’ data is being passed to Google in the process.
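One way to picture the safeguard described above is a check on how much of the initial machine translation survives unmodified in the published text. This is purely an illustrative sketch, not the Content Translation tool's actual implementation; the threshold value and function names are invented for the example.

```python
import difflib

def unmodified_ratio(machine_text, published_text):
    """Fraction of the machine translation that survives verbatim."""
    matcher = difflib.SequenceMatcher(None, machine_text, published_text)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(machine_text), 1)

def needs_review(machine_text, published_text, threshold=0.95):
    """Flag a translation whose machine output was barely touched."""
    return unmodified_ratio(machine_text, published_text) >= threshold

mt = "The cat sat on the mat and looked at the moon."
edited = "The cat sat on the mat, gazing up at the moon."
print(needs_review(mt, mt))      # True: published completely untouched
print(needs_review(mt, edited))  # lower ratio once the editor has reworked it
```

A real system would also weigh per-paragraph edits and editor history, but the core idea is the same: untouched machine output gets routed to human review rather than published silently.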

We like what we’re seeing from the newly improved content translation tool. In the last year, it has been used to translate nearly 150,000 articles, a year-over-year increase of more than twenty percent. Moreover, our data shows that translated articles are less likely to be deleted than articles created from scratch.

What’s next for the tool? As part of the Wikimedia Foundation’s medium-term plan, released earlier this year, the Foundation’s language team will focus on two key areas: ensuring that the full suite of translation tools can be used by as many different language Wikimedia wikis as possible, and expanding the kinds of contributions that the content translation tool supports. They’ll take on the former before tackling the latter, starting with the Malayalam, Bengali, Tagalog, Javanese, and Mongolian languages.

Pau Giner, Lead UX Designer, Product Design
Wikimedia Foundation

For more information about our content translation tool’s history and future, please see an expanded blog post on Wikimedia Space, the new site built for news, questions, and conversations within the Wikimedia movement.

weeklyOSM 478

11:20, Monday, 23 September 2019 UTC



The still incomplete map “The Saints of Europe” 1 | data © OpenStreetMap contributors


  • Martijn Van Exel invited us, on Twitter, to view the MapRoulette presentation he gave at SotM US.
  • Dan Stowell tweeted that OSM-UK’s quarterly project has reached 100,000 solar panels. In his blog he summarises where both rooftop solar panels and larger solar farms have been mapped in the UK.
  • Joseph Eisenberg has started a new proposal for aerodrome=* to allow a better classification of airports. With the possible values international, commercial, general_aviation, private and airstrip, he plans to replace aeroway=airstrip and aerodrome:type=*.
  • Antoine Jaury suggests reusable_packaging:offer= and reusable_packaging:accept= for shops accepting or proposing customer-owned reusable containers to reduce packaging waste. He asks for comments on his proposal.
  • Francesco Ansanelli proposes highway=tourist_bus_stop for a bus stop which is reserved for tourist buses and drafted a proposal.
  • Simon Poole reported (automatic translation) that after consultation with the AGIS, he has included their aerial photographs from March 2019 in the SOSM Mapproxy configuration.
  • hüggelzwerg recommends the addition of public_transport=stop_area_group into the public transport tagging scheme PTv2 and summarises (automatic translation) the reasons and advantages in his user diary. A public_transport=stop_area_group relation would include a number of public_transport=stop_area members and would link them together at large public transport hubs.
  • Microsoft’s dataset with approximately 125 million building footprints in the US is currently available in the RapiD editor as an experimental feature. If you zoom into the USA you can see the suggested buildings and roads.
    More background is available at the website for RapiD and the GitHub page for the building footprint dataset.
  • Voting for Joseph Eisenberg’s proposal for campsite properties is underway and ends on 23 September 2019.
  • Roland Olbricht is moderating a session at State of the Map on tagging governance. On the OSM-talk mailing list he asked for people’s views and feedback about which issues are important.


  • On Talk-AT there was a discussion (automatic translation) about a Golem article on how Lyft automatically tries to detect errors in OSM via algorithms (which we covered last week). However, the article was not completely clear: the errors are only detected automatically, not edited automatically. For corrections, MapRoulette challenges are used.
  • Sergey Golubev is trying (automatic translation) to find out in which cities of Russia the population is growing and in which it is not.
  • Valery Trubin continues a series of interviews with Russian mappers. This time he talked to two newcomers (Ivan, aka BANO.notIT, and Victor Vyalichkin), who came to OSM earlier this year. Ilya Zverev draws attention (automatic translation) to the unexpected observations of these users.

OpenStreetMap Foundation

  • Preliminary results of the OSM community survey have been released. Five of the OSMF Board members reflect on their views of the results in the OSMF Blog post.


  • Betaslb reported (automatic translation) in her user diary on a news item published by the Azorean local newspaper Diário Insular: the EuYoutH_OSM group will be in Heidelberg, Germany in a week to train in OSM and to attend SotM 2019.
  • Videos of presentations at the 2019 State of the Map US are now available on the programme page of their website.

Humanitarian OSM


  • The still incomplete map The Saints of Europe shows towns and cities of Europe that have been named after saints.


  • There seem to be issues with CC-BY licensed data sources. OSM insists on having a waiver or explicit permission for using such licensed data. Graeme Fitzpatrick asked whether he can use Australian QTopo maps and learnt that while OSM requires permission, the data owner, the Queensland Department of Natural Resources, Mines and Energy, thinks CC-BY is sufficient for using the data in OSM – a situation that does not seem to be unique. Posts from Andrew Harvey and Simon Poole of OSM’s License Working Group provide some more background on this topic.
  • Jez Nicholson updated the Copyright Infringement section on the United Kingdom Tagging Guidelines and asks others for their input as the Ordnance Survey dominates the numbers of mentions.


  • Poorly laid out paths are a typical problem in the landscaping of public spaces. As a result, unplanned paths appear on lawns and flowerbeds. The service Ant road planner will help to avoid this. It is based on OSM data.


  • The Russian public organisation Greenhouse of social technologies, which recently released a plugin for WordPress shMapper (we wrote about it in issue 462), published (automatic translation) an article on Habr about how they developed this application. They also made a short video (ru) showing how to use this plugin.
  • There is another way (automatic translation) to bring together Strava Global Heatmap and OpenStreetMap. However, as Simon Poole reminds us: we do not currently have permission to use Strava data for OSM.
  • The developer of the service “Sight Safari“, an app which can build interesting sightseeing routes in unfamiliar cities, published (automatic translation) an article on Habr, where they explained how they developed this application.


  • Tobias Zwick released StreetComplete v14.0. Among other enhancements and new quest types, the real highlight is that it now supports the splitting of ways.

Did you know …

OSM in the media

  • Russian news agency TASS started a special project Mercator: It’s a flat, flat world!, which tells how these well-known maps were created.

Other “geo” things

  • Hollewegs are “roads or tracks that are significantly lower than the land on either side”. Dirk Kloosterboer used OpenStreetMap and elevation data, to find that streets named holloway may be found in hilly areas, but not very often in high mountains.
  • One billion smartphone owners can now use the European satellite navigation system, Galileo, the operator announced in a press release. The system, which suffered a complete one-week failure in July 2019, is expected to be fully operational this year and completed in 2021, with 24 operational and 6 active spare satellites, following the start of the new Ariane 6 rocket; the first Ariane 6 test flight is scheduled for 2020, which might cause another delay.
  • John Wyatt Greenlee created a map to answer the question of where people paid their rent in eels in England, during the 10th to 17th centuries, as part of the English Eel-Rents project.
  • The INTERGEO, a trade fair and conference for geodesy, geoinformation and land management, will take place 17 to 19 September in Stuttgart, Germany for the 25th time. More information about the event can be found in the INTERGEO report 2019.
  • Adam Van Etten described how In-Q-Tel are extracting roads topology from satellite imagery using computer vision.
  • The tourist information office of the city of Oldenburg wants to switch (automatic translation) from Google Maps to an “own database” as a result of Google’s step to charge the organisation for its services from July 2018.
  • An article summarises the most important information about Trump’s Dorian map. It includes an interview with Mark Monmonier, author of How to Lie With Maps (already published in its third edition).
  • The worldwide Friday for Future movement, triggered by the strike of 16-year-old activist Greta Thunberg, was supported by many people around the world on 20 September in the form of peaceful demonstrations. In Germany alone, over 1.4 million people gathered in many cities to demonstrate for action on climate change. An interactive map (de) of the planned demonstrations in Germany, based on OSM, was produced a few days ago.

Upcoming Events

Where What When Country
Heidelberg Erasmus+ EuYoutH OSM Meeting 2019-09-18 to 2019-09-23 Germany
Heidelberg State of the Map 2019 [1] 2019-09-21 to 2019-09-23 Germany
Bremen Bremer Mappertreffen 2019-09-23 Germany
Nottingham Nottingham pub meetup 2019-09-24 United Kingdom
Mannheim Mannheimer Mapathons 2019-09-25 Germany
Singen Stammtisch Bodensee 2019-09-26 Germany
Lübeck Lübecker Mappertreffen 2019-09-26 Germany
Riga Latvian OSM Meetup 2019-09-26 Latvia
Düsseldorf Stammtisch 2019-09-27 Germany
Dortmund Mappertreffen 2019-09-27 Germany
Nagoya 第2回まちマップ道場-伊勢湾台風被災地を訪ねる- 2019-09-28 Japan
Strasbourg Rencontre périodique de Strasbourg 2019-09-28 France
Kameoka 京都!街歩き!マッピングパーティ:第12回 穴太寺(あなおじ) 2019-09-29 Japan
Mainz Stammtisch 2019-09-30 Germany
Rome Incontro mensile 2019-09-30 Italy
London London Missing Maps Mapathon 2019-10-01 United Kingdom
Stuttgart Stuttgarter Stammtisch 2019-10-02 Germany
Brno State of the Map CZ+SK 2019 2019-10-02 to 2019-10-03 Czech Republic
Bochum Mappertreffen 2019-10-03 Germany
Nantes Réunion mensuelle 2019-10-03 France
Fujisawa 湘南マッピングパーティ 2019-10-05 Japan
Ballaghadereen Map Ballaghadereen 2019-10-05 Ireland
Budapest OSM Hungary Meetup reboot 2019-10-07 Hungary
Lyon Rencontre mensuelle pour tous 2019-10-08 France
Munich Münchner Stammtisch 2019-10-08 Germany
Salt Lake City SLC Mappy Hour 2019-10-08 United States
Hamburg Hamburger Mappertreffen 2019-10-08 Germany
Cologne Köln Stammtisch 2019-10-09 Germany
Arlon Espace public numérique d’Arlon – Formation Consulter OpenStreetMap 2019-10-09 Belgium
San José Civic Hack & Map Night 2019-10-10 United States
Berlin 136. Berlin-Brandenburg Stammtisch 2019-10-11 Germany
Berlin Berlin Hack Weekend Oktober 2019-10-12 to 2019-10-13 Germany
Greater Manchester Joy Diversion 8 2019-10-12 United Kingdom
Santa Fe State of the Map Argentina 2019 2019-10-12 Argentina
Prizren State of the Map Southeast Europe 2019-10-25 to 2019-10-27 Kosovo
Dhaka State of the Map Asia 2019 2019-11-01 to 2019-11-02 Bangladesh
Wellington FOSS4G SotM Oceania 2019 2019-11-12 to 2019-11-15 New Zealand
Grand-Bassam State of the Map Africa 2019 2019-11-22 to 2019-11-24 Ivory Coast

Note: If you would like to see your event here, please add it to the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Polyglot, Rogehm, SK53, Silka123, SunCobalt, TheSwavu, YoViajo, derFred, geologist, muramototomoya, ᚛ᚏᚒᚐᚔᚏᚔᚋ᚜ 🏳️‍🌈.

Free Medical Images Collection

06:54, Monday, 23 September 2019 UTC

These days I am illustrating Wikipedia articles with images related to medicine. Sometimes the existing image(s) on an article are too old, so I want to replace them with a newer, higher-resolution image. Some articles do not have images at all. A major problem for me was finding the right image for a given article. Wikipedia accepts images/media that are CC-BY-SA or less restrictive, so I had to go through the existing image repositories to find those with the right license for Wikipedia. I decided to tabulate some of the image repositories that have medical content, along with the license they are shared under. I hope this will be useful not only for me, but for everyone else who is looking for free images related to medicine. Please note that this is not a comprehensive list; I have only included the repositories that I know of.

Source License Notes
Creative Commons search CC-varied Datasets from these collections are found on CC-search.
All Free Photos Free photos of all kinds
Burst Images Public Domain Free photos of all kinds
Medpix All Rights Reserved Medpix is a repository of medical cases run by the NIH, USA. The images are free for personal use, but need permission from the authors for any use other than personal. Contact the authors directly for permission.
Radiopedia CC-BY-NC-SA Collection of radiology images. Copyright rests with the author of the image.
Flickr Commons CC varied Media from Flickr Commons also shows up on CC search.
British Library Images from British Library, UK
ASH Image bank Fair Use A collection of hematology images. Login needed, free account creation.
Centre for Disease Control and Prevention Mostly Public Domain Images related to healthcare, diseases, health promotion etc.
Brain Biodiversity Bank All rights reserved Atlas of the human brain. Radiology images and 3D movies available. Free re-use permitted; contact the authors for re-use permissions.
US National Library of Medicine Fair Use Contains images related to medicine. Obtain permission from the website for re-use; permission is granted on a case-by-case basis. Some images are CC.
National Eye Institute CC varied Some images are CC-BY. Results can be found from CC-search.
Duke University Digital Repository CC-BY-NC-SA Contains advertisements and handouts of medical products
Visible Body All Rights Reserved Some content is available without subscription. Contains 3D anatomy resources.
3D Embryo Atlas CC-BY-NC-ND Media related to embryology
Bio Atlas Use with attribution Contains high resolution histology and histopathology images of humans and animals
CAOM Histopathology slides, pages are slow to load. From Poznan
Brain-Maps Histo- and gross images of brains of humans and animals
Cancer Digital Archive Image repository of oncopathology
Aurora M-scope Most images in Public Domain Contains histopathology slides. Needs a special software for opening the files in high resolution.
Heidelberg University All Rights Reserved Contains educational images related to pathology
Pathobin A platform for uploading pathology slides. Copyright lies with the uploader.
National Institute of Health, USA Public Domain Images are on Flickr, hence available using CC-search.
Europeana CC varied Contains media related to history of medicine and natural history
Fossil Forum Collection of fossils. Individual uploaders hold the copyright. Fair use permitted.
Medillsb Varied Website of the association of medical illustrators. Contact individual authors for re-use.
Medical Graphics DE CC-BY-ND Illustrations related to medicine.
LifeScienceDB CC-BY-SA Create your own photos and videos of human anatomy
Neuroanatomy CC-BY-SA-NC Neuroanatomy media. From University of British Columbia. Contains 360 degree views of the brain, MRIs etc.
Dollar Street CC-BY-SA Collection of everyday objects, people, families showing socioeconomic status of people around the world.
Cell Image Library CC-varied Mostly public domain images of cells.
Heal Collection CC varied Images for medical education.
Stanford Medical Library CC varied Images related to medicine from Stanford.
National Cancer Institute CC-varied Contains media related to cancer.
Histology Atlas CC-BY-NC-ND Histology images
Audilab CC-BY-NC-SA 3D images related to anatomy
Sketchfab CC-BY-NC-SA Illustrations related to human body
Open Access Biomedical Search Engine Can perform advanced search by License type
Science Images of Australia CC-BY Natural history, medicine images
Library of Congress collection Varied History of medicine
The noun project CC-BY Contains icons for general use and those related to medicine
Somersault Images CC-BY-SA-NC Illustrations related to medicine
Smart Servier CC-BY Illustrations related to medicine
Ghorayeb Images CC-BY-NC-ND Collection of images from ENT
Ecure Me All Rights Reserved Illustrations and photos of diseases
University of California All Rights Reserved Images of clinical signs and symptoms
University of Iowa All Rights Reserved Images of dermatological conditions
Internet Pathology Laboratory All Rights Reserved Images related to pathology
Atlas of endoscopy All Rights Reserved Images related to endoscopy/gastroenterology

Tech News issue #39, 2019 (September 23, 2019)

00:00, Monday, 23 September 2019 UTC
← previous | 2019, week 39 (Monday 23 September 2019) | next →
When Wikidata was created, it was created with a purpose: it replaced the Wikipedia-based interwiki links, did a better job, and still does the best job at that. Since then the data has expanded enormously; Wikidata can no longer be defined by its links to Wikipedia, as those are now only a subset.

There are many ongoing efforts to extract information from the Wikipedias. The best organised project is DBpedia; it continuously improves its algorithms to get more and higher-grade data, and it republishes the data in a format that is both flexible and scalable. Information is also extracted from the Wikipedias by the Wikidata community, with plenty of tools like PetScan and the awarder, and plenty of people working on single items one at a time.

Statistically, on the scale of Wikidata, individual efforts make little or no impression, but within subsets the effects may be massive. Take, for instance, Siobhan working on New Zealand butterflies and other critters. Siobhan writes Wikipedia articles as well, strengthening the ties that bind Wikidata to Wikipedia. Her efforts have been noticed, and Wikidata is becoming increasingly relevant to and used by entomologists.

There are many data sets; because of its wiki links, every Wikipedia is one as well. The notion that one is bigger or better does not really matter. It is all in the interoperability, all in the usability of the data. Wikipedia wiki links are highly functional but not interoperable at all. More and more Wikipedias accept that cooperation will get better quality information to their readers. Once the biggest of them accept the shared data as a resource to curate, comparing data sets will improve quality for all.

Performance perception: the effect of late-loading banners

01:42, Saturday, 21 September 2019 UTC

Unlike most websites, Wikipedia and its sister projects are ad-free. This is actually one of the reasons why our performance is so good: we don't have to deal with slow and invasive third parties.

However, while we don't have ads, we do display announcement and fundraising banners frequently at the top of wikis. Here's an example:

Those are driven by JS and as a result always appear after the initial page render. Worse, they push down content when they appear. This is a long-standing technical debt issue that we hope to tackle one day, and one of the most obvious issues we deal with that may impact performance perception. How big is the impact? With our performance perception micro-survey asking our visitors about page performance, we can finally find out.

Perception distribution

We can look at the distribution (Y axis) of positive and negative survey answers based on when the banner was injected into the DOM, in milliseconds (X axis).

We see the obvious pattern that positive answers to the micro-survey question (did this page load fast enough?) are more likely if the banner appeared quickly. However, by looking at the data globally like this, we can't separate the banner's slowness from the page's. After all, if your internet connection and device are slow, both the page itself and the banner will be slow, and users might be responding based on the page, ignoring the banner. This distribution might be near identical to one computed for page load time alone, regardless of whether a banner was present.

Banner vs no banner

A simple way to look at this problem is to check the ratio of micro-survey responses for pageviews where a banner was present vs pageviews where there was no banner. Banner campaigns tend to run for specific periods, targeting certain geographies, meaning that a lot of visits don't have a banner displayed at all. Both sample sizes should be enough to draw conclusions.

Corpus User satisfaction ratio Sample size
No banner or answered before banner 86.64% 1,111,542
Banner and answered after banner 87.8% 311,332

For the banner case, we didn't collect whether the banner was in the user's viewport (i.e. was it seen?).

What is going on? It would seem that users are slightly more, or equally, satisfied with page performance when a banner is injected. It would suggest that our late-loading banners aren't affecting page performance perception. This sounds too good to be true. We're probably looking at the data too globally, including all outliers. One of our team's best practices when findings appear too good to be true is to keep digging and try to disprove them. Let's zoom in on more specific data.

Slow vs fast banners

Let's look at "fast" pageloads, where loadEventEnd is under a second. That event happens when the whole page has fully loaded, including all the images.

Corpus User satisfaction ratio Sample size
Banner injected into DOM before loadEventEnd 92.66% 4,761
Banner injected into DOM less than 500ms after loadEventEnd 92.03% 67,588
Banner injected into DOM between 2 and 5 seconds after loadEventEnd 85.33% 859

We can see that the effect on user performance satisfaction starts being quite dramatic as soon as the banner is really late compared to the speed of the main page load.
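The corpus split used in the tables can be sketched as a small helper; `bannerCorpus` is a hypothetical name for illustration, not part of the actual analysis pipeline:

```javascript
// Classify a pageview into the corpora used in the tables above, based on
// when the banner was injected into the DOM relative to loadEventEnd.
// Both arguments are in milliseconds since navigation start.
function bannerCorpus(bannerInjectedAt, loadEventEnd) {
  const delay = bannerInjectedAt - loadEventEnd;
  if (delay < 0) {
    return 'before loadEventEnd';
  }
  if (delay < 500) {
    return 'less than 500ms after loadEventEnd';
  }
  if (delay >= 2000 && delay <= 5000) {
    return 'between 2 and 5 seconds after loadEventEnd';
  }
  return 'other';
}
```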

What if the main pageload is slow? Are users more tolerant of a banner that takes 2-5 seconds to appear? Let's look at "slow" pageloads, where loadEventEnd is between 5 and 10 seconds:

Corpus User satisfaction ratio Sample size
Banner injected into DOM before loadEventEnd 79.13% 3,019
Banner injected into DOM less than 500ms after loadEventEnd 78.45% 2,488
Banner injected into DOM between 2 and 5 seconds after loadEventEnd 76.17% 2,480

While there is a loss of satisfaction, it's not as dramatic as for fast pages. This makes sense, as users experiencing slow page loads probably have a higher tolerance to slowness in general.

Slicing it further

We've established that even for a really slow pageload, the impact of a slow late-loading banner is already visible at 2-5 seconds. If it happens within 500ms after loadEventEnd, the impact isn't that big (less than 1% satisfaction drop). Let's look at the timespan after loadEventEnd in more detail for fast pageloads (< 1s loadEventEnd) in order to find out where things start to really take a turn for the worse.

Here's the user page performance satisfaction ratio, based on how long after loadEventEnd the banner was injected into the DOM:


The reason why the issues caused by late-loading banners don't show up when looking at the data globally is probably that most of the time banners load fast. But when they appear after loadEventEnd, users start to be quite unforgiving, with the performance satisfaction ratio dropping rapidly. For users with an otherwise fast experience, we can't afford for banners to be injected more than 500ms after loadEventEnd if we want to maintain a 90% satisfaction ratio.

Of course, we would like to change our architecture so that banners are rendered server-side, which would get rid of the issue entirely. But in the meantime, loadEventEnd + 500ms seems like a good performance budget to aim for if we want to mitigate the user impact of our current architectural limitations.

How the Wikimedia Foundation is making efforts to go green

14:00, Thursday, 19 September 2019 UTC

We at the Wikimedia Foundation strive to ensure that our work and mission support a sustainable world.

Today, we are releasing a sustainability assessment that chronicles the total carbon footprint of the Foundation’s work and commits us to reducing our emissions.

This plan, over two years in the making, will commit us to becoming more environmentally sustainable and conscious of our environmental impact while we work to make free knowledge available to every human being. You can read the full document on Wikimedia Commons, which holds much of the media used on Wikipedia, and find a short summary of the report below.

• • •

Late last year, the Wikimedia Foundation worked with the Strategic Sustainability Group to research our current practices with regards to the environment, help establish baselines, and advise on a possible roadmap forward.

The consultation included the environmental impact of all direct spending of the Wikimedia Foundation, including its internet services, office, distributed staff and contractors, travel, and major events. The consultation did not include the impact of indirect spending, such as grant-funded activity, cash investments, or endowment investments; nor did it look at the totality of the Wikimedia movement’s emissions.

The sustainability report we’ve published today details that the Foundation caused approximately 2.1 kilotonnes of CO₂-equivalent impact in the calendar year 2018:

  • 56% was due to electricity usage (data centers and other facilities).
  • 26% was due to global air travel.
  • 11% was due to hotel stays.
  • 7% other.

This impact is approximately the same as the emissions of 251 average US homes’ energy use for one year, according to the US Environmental Protection Agency’s greenhouse gas equivalencies calculator.
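As a quick cross-check of that equivalence, using only the figures quoted above (a back-of-the-envelope sketch, not the EPA calculator itself):

```javascript
// 2.1 kilotonnes of CO2-equivalent, expressed as the annual energy use of
// 251 average US homes, implies the per-home figure below.
const totalTonnesCO2e = 2100;   // 2.1 kilotonnes
const equivalentHomes = 251;
const tonnesPerHome = totalTonnesCO2e / equivalentHomes; // ~8.4 tCO2e per home per year
```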

What are some of the environmental strengths of the Foundation?

  • We make best efforts to ensure our servers run on sustainable energy, and we use simple technical architecture where effective, in line with our privacy and non-commercialization values. This means that Wikipedia and the Wikimedia projects are sustained with approximately one-thousandth the number of servers used by websites of comparable traffic.
  • Green building features are already in place through property management.
  • Wide remote workforce policies and practices, with an office location in an urban area with lots of public transit options; moreover, our many remote workers had half the carbon impact of those in San Francisco, numbers which were very low relative to comparable organizations.
  • Our paperless policy and wide use of cloud-based workflows reduce impact.
  • Telecommuting is already a core part of the internal culture.
  • There is strong enthusiasm across the organization, including in senior leadership, for exploring sustainability impacts and opportunities.

What are some of the environmental opportunities for the Foundation?

  • Explore hosting more virtual meetings and events, rather than in-person.
  • Explore carbon offsets as a way to reduce overall carbon footprint in our data centers.
  • Engage stakeholders across all channels: employees are eager to learn more and get involved.  How can we coordinate action with the wider communities and volunteers?

What are the next steps for the Foundation?

  • Create a sustainability policy statement, definition, and context of what sustainability means.
  • Develop a sustainability framework with roles and responsibilities, including operations, events, and technical infrastructure.
  • Identify and track green key performance indicators. Create a reporting template and schedule for aggregating, validating, and communicating results.

For more information about the Wikimedia Foundation’s sustainability efforts, please see our presentation of the report at Wikimania 2019, the annual conference which brings together the community of volunteers who make Wikipedia and the Wikimedia projects possible. You can also ask us a question on the discussion page at

Lydia Hamilton, Director of Operations, Operations 
Deb Tankersley, Program Manager, Technology 
Wikimedia Foundation

The Wikimedia Foundation is excited to announce the appointment of Grant Ingersoll as Chief Technology Officer (CTO). Grant brings two decades of experience in open source software development and natural language processing engineering to the Foundation. He will join the Foundation on 23 September.

The Wikimedia Foundation is the nonprofit organization that operates Wikipedia and the other Wikimedia free knowledge projects. Together, Wikipedia and the Wikimedia projects are visited by around 1.5 billion unique devices every month. The Wikimedia Foundation is driven by its vision of building a world in which every single person can freely share in the sum of all knowledge.

“Grant joins us with a passion for invention, innovation, and a lengthy career in open source that aligns with our values,” said Katherine Maher, Executive Director of the Wikimedia Foundation. “His background and expertise will help Wikimedia advance our investment in the platforms that power the Wikimedia projects and ensure our preparation for the future.”

As Chief Technology Officer, Grant will lead the development and execution of the technical platform strategy for the Wikimedia Foundation and Wikimedia projects. He will lead a diverse and global department of researchers, engineers, security and machine learning experts, analysts, and more to evolve and scale Wikimedia’s platforms and infrastructure.

“I’m incredibly excited to join the Wikimedia Foundation as CTO at a time when providing everyone with open, free, trusted knowledge has never been more important,” Grant said. “Having led teams toward delivering solutions that help people access and use information, I can think of no better place to amplify that mission working with a talented group. I look forward to the technical opportunities and challenges that come with the role, as well as the chance to be a part of a dedicated and vibrant global community.”

Prior to joining Wikimedia, Grant was the CTO and co-founder of Lucidworks, a company delivering AI-powered search solutions for organizations built on open-source software, Apache Lucene and Apache Solr. Grant is still a contributing member of the broader Lucene community of developers. He is one of the original contributors to Lucene and Solr, a co-founder of the Apache Mahout machine learning project, and a long standing member of the Apache Software Foundation. Grant also worked at the Center for Natural Language Processing at Syracuse University in natural language processing and information retrieval.

He earned his bachelor of science from Amherst College in math and computer science and his master’s in computer science from Syracuse University. Grant is also the lead author of Taming Text from Manning Publications. He is based in North Carolina.

Wikipedia's JavaScript initialisation on a budget

23:00, Tuesday, 17 September 2019 UTC

This week saw the conclusion of a project that I’ve been shepherding on and off since September of last year. The goal was for the initialisation of our asynchronous JavaScript pipeline (at the time, 36 kilobytes in size) to fit within a budget of 28 KB.

Chart showing a decline in Startup manifest size from 36.2 kilobytes in 2018 to just under 28 KB in September 2019

The above graph shows the transfer size over time. Sizes are after compression (i.e. the net bandwidth cost as perceived from a browser).

In total, the year-long effort is saving 4.3 terabytes a day of data bandwidth for our users’ page views.

How we did it

The startup manifest is a difficult payload to optimise. The vast majority of its code isn’t functional logic that can be optimised by traditional means. Rather, it is almost entirely pure data, auto-generated by ResourceLoader, representing the registry of module bundles. (ResourceLoader is the delivery system Wikipedia uses for its JavaScript, CSS, and interface text.)

This registry contains the metadata for all front-end features deployed on Wikipedia. It enumerates their name, currently deployed version, and their dependency relationships to other such bundles of loadable code.
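To illustrate, each registry entry is essentially a module name, a version ID, and a list of dependencies. A minimal sketch follows; the module names, version IDs, and array layout are invented for illustration, not the actual ResourceLoader format:

```javascript
// A simplified sketch of what each startup-manifest entry carries:
// module name, deployed version ID, and dependencies (as indices into
// the registry).
const registry = [
  ['jquery', '1am8x', []],
  ['mediawiki.util', '9f2k1', [0]],    // depends on jquery
  ['mediawiki.api', '7q0z3', [0, 1]],  // depends on jquery and mediawiki.util
];

// The startup cost of the registry is roughly its serialised size,
// which grows with every registered module.
const registryBytes = JSON.stringify(registry).length;
```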

I started by identifying code that was never used in practice (task #202154). This included picking up unfinished or forgotten software deprecations, and removing unused compatibility code for browsers that no longer passed our Grade A feature-test. I also wrote a document about Page load performance. This document serves as reference material, enabling developers to understand the impact of various types of changes on one or more stages of the page load process.

Fewer modules

Next, I collaborated with the engineering teams here at the Wikimedia Foundation and at Wikimedia Deutschland to identify features that were using more modules than necessary. For example, by bundling together parts of the same feature that are generally downloaded together, we end up with fewer entry points to hold metadata for in the ResourceLoader registry.

Some highlights:

  • Editing product team (WMF):
    The WikiEditor extension has 11 fewer modules now. Another 31 modules were removed in UploadWizard.
  • Language product team (WMF):
    Combined 24 modules of the ContentTranslation software.
  • Reading product team (WMF):
    Combined 25 modules in MobileFrontend.
  • Community Wishlist team (WMDE):
    Removed 20 modules from the RevisionSlider and TwoColConflict features.

Last but not least, there was the Wikidata client for Wikipedia. This was an epic journey of its own (task #203696). This feature originally had a whopping 248 distinct modules registered on Wikipedia page views. The magnificent efforts of Amir Sarabadani removed over 200 modules, bringing it down to 42 today.

The bar chart above shows small improvements throughout the year, all moving us closer to the goal. Two major drops stand out in particular. One is around two-thirds of the way, in the first week of August. This is when the aforementioned Wikidata improvement was deployed. The second drop is toward the end of the chart and happened this week – more about that below.

Less metadata

This week’s improvement was achieved by two holistic changes that organised the data in a smarter way overall.

First – The EventLogging extension previously shipped its schema metadata as part of the startup manifest. Roan Kattouw (@Catrope) refactored this mechanism to instead bundle the schema metadata together with the JavaScript code of the EventLogging client. This means the startup footprint of EventLogging was reduced by over 90%. That’s 2KB less metadata in the critical path! It also means that, going forward, the startup cost of EventLogging no longer grows with each new event instrumentation. This clever bundling is powered by ResourceLoader’s new Package Files feature, which was expedited in February 2019 in part because of its potential to reduce the number of modules in our registry. Package Files make it super easy to combine generated data with JavaScript code in a single module bundle.

Second – We shrank the average size of each entry in the registry overall (task #229245). The startup manifest contains two pieces of data for each module: its name, and its version ID. This version ID previously required 7 bytes of data. After thinking through the Birthday mathematics problem in the context of ResourceLoader, we decided that the space of possible version IDs could safely be reduced from 78 billion down to “only” 60 million. For more details see the code comments, but in summary it means we’re saving 2 bytes for each of the 1100 modules still in the registry, reducing the payload by another 2-3 KB.
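The trade-off can be sanity-checked with the standard birthday approximation, using the 78 billion and 60 million figures quoted above (a sketch; the actual reasoning lives in the linked task and code comments):

```javascript
// Birthday approximation: the probability of at least one collision among
// n random IDs drawn uniformly from a space of size N is roughly
// 1 - exp(-n * (n - 1) / (2 * N)).
function collisionProbability(n, N) {
  return 1 - Math.exp(-n * (n - 1) / (2 * N));
}

const modules = 1100;                              // modules still in the registry
const pOld = collisionProbability(modules, 78e9);  // 7-byte IDs: ~0.0000077
const pNew = collisionProbability(modules, 60e6);  // 5-byte IDs: ~0.01
```

So even with the smaller ID space, the chance of any two modules sharing a version ID stays around 1%, which is the kind of risk that was deemed acceptable.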

Below is a close-up for the last few days (this is from synthetic monitoring, plotting the decompressed size):

Line graph showing a sudden drop in Startup JS size from 55.6KB to 52.8KB

The change was detected in ResourceLoader’s synthetic monitoring. The above is captured from the Startup manifest size dashboard on our public Grafana instance, showing a 2.8KB decrease in the uncompressed data stream.

With this week’s deployment, we’ve completed the goal of shrinking the startup manifest to under 28 KB. This cross-departmental and cross-organisational project reduced the startup manifest by 9 KB overall (net bandwidth, after compression): from 36.2 kilobytes one year ago down to 27.2 KB today.

We have around 363,000 page views a minute in total on Wikipedia and sister projects. That’s 21.8M an hour, or 523 million every day (User pageview stats). This week’s deployment saves around 1.4 terabytes a day. In total, the year-long effort is saving 4.3 terabytes a day of bandwidth on our users’ page views.
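The bandwidth arithmetic can be reproduced directly from the figures in this post (treating 1 KB as 1000 bytes):

```javascript
// Reproduce the savings estimate from the pageview figures above.
const pageViewsPerMinute = 363000;
const pageViewsPerDay = pageViewsPerMinute * 60 * 24;         // ~523 million

const savedKBPerView = 2.8;                                   // this week's deployment
const savedTBPerDay = savedKBPerView * pageViewsPerDay / 1e9; // ~1.4 TB per day
```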

What’s next

Percentage of bundle metadata size, by component. 26% is for MediaWiki core's bundles, 12% for ContentTranslation bundles, 7% for VisualEditor, 5% for Wikidata.

It’s great to celebrate that Wikipedia’s startup payload now neatly fits into the target budget of 28 KB – chosen as the lowest multiple of 14KB we can fit within subsequent bursts of Internet packets to a web browser.

The challenge going forward will be to keep us there. Over the past year I’ve kept a very close eye (spreadsheet) on the startup manifest — to verify our progress, and to identify potential regressions. I’ve since automated this laborious process through a public Grafana dashboard.

We still have many more opportunities on that dashboard to improve bundling of our features, and (for Wikimedia’s Performance Team) to make it even easier to implement such bundling.

– Timo Tijhof

Further reading:

Tech News issue #38, 2019 (September 16, 2019)

00:00, Monday, 16 September 2019 UTC
← previous | 2019, week 38 (Monday 16 September 2019) | next →

weeklyOSM 477

14:27, Sunday, 15 September 2019 UTC


lead picture

Development Seed announced the launch of, a tool to coordinate mapping and build communities. 1 | © kamicut, Development Seed

About us

  • In our last issue we reported on the vacancies on the OSMF Board of Directors. A late update is that four posts will be available for election, not three as we reported. Heather Leson is retiring from the board before the end of her term.


  • OpenStreetMap Guinea announces in a tweet the collection of data on illegal dumping points in the Commune of Ratoma in Conakry, which are visualised on a thematic uMap map.
  • Ruben suggests extending opening_hours by an additional value for amenities which require an appointment.
  • Jeremiah Rose suggests footway=indoor for marking indoor routes within a building mapped with Simple Indoor Tagging, but has not so far explained how it differs from the existing highway=corridor.
  • Andrew, from Apple’s Maps team, has uploaded potential road and routing related issues in Angola, Egypt, Kenya, Rwanda, South Africa, Tunisia and Uganda to MapRoulette.
  • SK53 tried to unify the tagging for showgrounds and posted the various inconsistent approaches he found in the UK.
  • A German-specific discussion concerns a dispute as to how many lanes a road has if there are no markings. An article in Zeit Online provides (de) (automatic translation) arguments for mappers who think the number of lanes is determined by the physical width.
  • Lyft, an on-demand transportation company based in San Francisco, USA, reports about its efforts to detect, fix and report map errors in OSM.
  • Miami University students and geographic information systems (GIS) professionals have started a “Bahamas Mapathon” to help the Bahamas after the devastating Hurricane Dorian.


  • Ireland’s OSM community has submitted an application to become a local OSMF chapter for mappers in both parts of the island of Ireland, the Republic of Ireland and Northern Ireland. All except one welcome the idea of a local chapter for mappers in the two countries.


  • Brian M. Sperlongano would like to import the boundaries of the Census-designated places (CDP) in Hawaii. However, there are uncertainties about tagging, as CDPs do not meet the criteria for boundary=administrative.
  • At SotM US 2019, Facebook demonstrated their computer vision software to detect buildings suitable for import into OSM with RapiD.

OpenStreetMap Foundation

  • Nuno Caldeira has noticed a new button on Facebook maps that can be used to report map errors. Unfortunately the error report process does not credit OSM, instead referring to “Facebook maps”. Nuno had already reported the incorrect identification to Facebook a year ago and three months ago asked the OSM Foundation to persuade Facebook to attribute it correctly.
  • OSMF’s trademark policy is still a current topic on the mailing list and not everyone agrees with the current procedures.
  • With the Annual General Meeting of the OpenStreetMap Foundation impending, Michael Reichert provides a reminder (automatic translation) about how to renew one’s membership, with some hints to make life easier for the Membership Working Group. He also calls on others to join the Foundation.


  • This year’s State of the Map conference was sold out on 6 September 2019. If you missed out you’ll be able to livestream or watch the recordings, see the programme for details.
  • The Einfachbahn (de) (automatic translation) initiative organised (de) (automatic translation) a mapping party (de) in Frankfurt am Main on 12 September. Two groups practised the acquisition of railway-related geodata in Frankfurt-Höchst and Frankfurt-Rödelheim stations and subsequently edited with JOSM. In addition to mapping, there was a lively exchange of information on railway geodata.
  • FOSS4G SotM Oceania is looking for a host for next year’s event.
  • The next French-German cross-border OSM community meetup will take place (automatic translation) on 28 September 2019 in Strasbourg.
  • The SigInfoLibre blog talks about (automatic translation) the sub-regional workshop for leaders of OpenStreetMap communities in French-speaking Africa. The workshop is being held at the University of Lomé in Togo with the support of the Organisation Internationale de la Francophonie. On the menu: participants talk about OpenStreetMap Governance, project engineering, communication techniques, and free digital mapping techniques.
  • You often hear of the University of Rapperswil, Switzerland, in the context of innovative OSM technologies and applications. So it makes perfect sense that Rapperswil has applied to host the next State of the Map conference from 19 to 21 June 2020.
  • The FOSSGIS 2020 (automatic translation) conference, the annual meeting for open-source geo-software and OpenStreetMap in Germany, will take place from 11 to 14 March in Freiburg. As usual, the last day is a Saturday and is dedicated to OSM.
  • Missing Maps is organising another mapathon at the Médecins Sans Frontières offices in Brussels. The Save the World – Map it titled event will take place on 24 September 2019.
  • Ilya Zverev reports from SotM-US – Wednesday (automatic translation) – Thursday (automatic translation) – Friday (automatic translation). (ru)


  • CartONG, a French NGO specialising in mapping and related services for humanitarian and development organisations, is offering a position for a data specialist based in Chambéry (France).


  • Severin Kann reported, via Twitter, that in Krakow OpenStreetMap is used on trams for real-time position display.


  • [1] Development Seed announced the launch of, a tool to coordinate mapping and build communities. The diary entry provides a fairly comprehensive overview and recommends that you test the beta version, create your own instance, or contribute code to the GitHub-hosted project.
  • Simon Poole has published a feature preview of the upcoming versions of the Vespucci editor for Android devices. The first beta of Vespucci 14 is expected to be released in September.


  • OsmAnd announced the release of version 3.0 of their iOS app. The new version adds a quick action feature to access various map actions from the home screen, the ability to download a missing region map directly from the context menu, and added hints for tags and values to the advanced OpenStreetMap editing screen.

Did you know …

  • … the OSM-Science mailing list? A place to discuss research ideas, to develop surveys, and to communicate with science-oriented OpenStreetMap community members.
  • … there are at least two kinds of gas-driven vehicles? Some run with liquefied petroleum gas (fuel:lpg), some use compressed natural gas (fuel:cng). Don’t get them mixed up when tagging fuel service stations, as the two are incompatible with each other!
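For tagging purposes the distinction boils down to two separate keys on the fuel station; fuel:lpg and fuel:cng are the real OSM keys, while the objects below are just illustrative tag sets:

```javascript
// Two incompatible gas fuels, two separate OSM keys on an amenity=fuel node.
const lpgStation = { amenity: 'fuel', 'fuel:lpg': 'yes' }; // liquefied petroleum gas
const cngStation = { amenity: 'fuel', 'fuel:cng': 'yes' }; // compressed natural gas
```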

OSM in the media

  • LinuxInsider reports on the new OSGeoLive release. OSGeoLive is a Linux distribution that specialises in geospatial applications.

Other “geo” things

  • OpenMaptiler blogs on the highlights from FOSS4G 2019 in Bucharest.
  • Elijah Zarlin from Mapbox presents 15 projects that have been created recently with Mapbox tools.
  • The Pudding have produced a People Map of the US, where city names are replaced by their most Wikipedia’ed resident. There is also a UK version for those who prefer to see the other side of the Atlantic.
  • Google Maps now also includes ridesharing offers “for the first and last mile” in its public transport routing in order to save even more footsteps.
  • The North American Datum of 1983 is the Everest of the classical survey art, completed just before the satellite geodesy revolution swept all aside. Tim Burch discusses the four plate-fixed terrestrial reference frames that will replace the venerable NAD83 in 2022. A combination of correcting the 2m offset in NAD83 from the geocentre and dropping the US survey foot and adopting the international foot, will see some points apparently move by up to 4m.
  • It’s nothing new that Pokémon GO players like to cause controversy in the OSM community. The Silph Road shows that some OSM tags prevent Pokémon from spawning in certain areas of the game. Of course, this is not due to OSM itself, but to the processing of the data by Niantic.

Upcoming Events

Where What When Country
Wuppertal OSM-Treffen Wuppertaler Stammtisch im Hutmacher 18 Uhr 2019-09-11 germany
Leoben Stammtisch Obersteiermark 2019-09-12 austria
Munich Münchner Stammtisch 2019-09-12 germany
Berlin 135. Berlin-Brandenburg Stammtisch 2019-09-12 germany
San José Civic Hack Night & Map Night 2019-09-12 united states
Budapest OSM Hungary Meetup reboot 2019-09-16 hungary
Bratislava Missing Maps mapathon Bratislava #7 2019-09-16 slovakia
Habay Rencontre des contributeurs du Pays d’Arlon 2019-09-16 belgium
Cologne Bonn Airport Bonner Stammtisch 2019-09-17 germany
Lüneburg Lüneburger Mappertreffen 2019-09-17 germany
Reading Reading Missing Maps Mapathon 2019-09-17 united kingdom
Salzburg Maptime Salzburg Mapathon 2019-09-18 austria
Edinburgh FOSS4GUK 2019 2019-09-18-2019-09-21 united kingdom
Heidelberg Erasmus+ EuYoutH OSM Meeting 2019-09-18-2019-09-23 germany
Heidelberg HOT Summit 2019 2019-09-19-2019-09-20 germany
Heidelberg State of the Map 2019 [1] 2019-09-21-2019-09-23 germany
Nantes Journées européennes du patrimoine 2019-09-21 france
Bremen Bremer Mappertreffen 2019-09-23 germany
Nottingham Nottingham pub meetup 2019-09-24 united kingdom
Mannheim Mannheimer Mapathons 2019-09-25 germany
Lübeck Lübecker Mappertreffen 2019-09-26 germany
Düsseldorf Stammtisch 2019-09-27 germany
Dortmund Mappertreffen 2019-09-27 germany
Nagoya 第2回まちマップ道場-伊勢湾台風被災地を訪ねる- 2019-09-28 japan
Strasbourg Rencontre périodique de Strasbourg 2019-09-28 france
Kameoka 京都!街歩き!マッピングパーティ:第12回 穴太寺(あなおじ) 2019-09-29 japan
London London Missing Maps Mapathon 2019-10-01 united kingdom
Stuttgart Stuttgarter Stammtisch 2019-10-02 germany
Brno State of the Map CZ+SK 2019 2019-10-02-2019-10-03 czech republic
Prizren State of the Map Southeast Europe 2019-10-25-2019-10-27 kosovo
Dhaka State of the Map Asia 2019 2019-11-01-2019-11-02 bangladesh
Wellington FOSS4G SotM Oceania 2019 2019-11-12-2019-11-15 new zealand
Grand-Bassam State of the Map Africa 2019 2019-11-22-2019-11-24 ivory coast

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Polyglot, Rogehm, SK53, SunCobalt, TheSwavu, YoViajo, derFred, geologist, keithonearth, roptat.

ALTC Personal Highlights

15:05, Friday, 13 September 2019 UTC

I’ve already written an overview and some thoughts on the ALTC keynotes, this post is an additional reflection on some of my personal highlights of the conference. 

I was involved in three sessions this year: Wikipedia belongs in education with Wikimedia UK CEO Lucy Crompton-Reid and UoE Wikimedian in Residence Ewan McAndrew, Influential voices – developing a blogging service based on trust and openness with DLAM’s Karen Howie, and Supporting Creative Engagement and Open Education at the University of Edinburgh with LTW colleagues Charlie Farley and Stewart Cromar. All three sessions went really well, with lots of questions and engagement from the audience.

It’s always great to see that lightbulb moment when people start to understand the potential of using Wikipedia in the classroom to develop critical digital and information literacy skills. There was a lot of interest in (and a little envy of) UoE’s Academic Blogging Service and centrally supported WordPress platform, so it was great to be able to share some of the open resources we’ve created along the way, including policies, digital skills resources, podcasts, blog posts, open source code and the blogs themselves. And of course there was a lot of love for our creative engagement approaches and open resources, including Board Game Jam and the lovely We have great stuff colouring book.

Stewart Cromar also did a gasta talk and poster on the colouring book, and at one point I passed a delegate standing alone in the hallway quietly colouring in the poster. As I passed, I mentioned that she could take one of the colouring books home with her. She nodded and smiled and carried on colouring. A lovely quiet moment in a busy conference.

It was great to hear Charlie talking about the enduringly popular and infinitely adaptable 23 Things course, and what made it doubly special was that she was co-presenting with my old Cetis colleague R. John Robertson, who is now using the course with his students at Seattle Pacific University.   I’ve been very lucky to work with both Charlie and John, and it’s lovely to see them collaborating like this.

Our Witchfinder General intern Emma Carroll presented a brilliant gasta talk on using Wikidata to geographically locate and visualise the different locations recorded within the Survey of Scottish Witchcraft Database. It’s an incredible piece of work, and several delegates commented on how confidently Emma presented her project. You can see the outputs of Emma’s internship here.

Emma Carroll, CC BY NC 2.0, Chris Bull for Association for Learning Technology

I really loved Kate Lindsay’s thoughtful presentation on KARE, a kind, accessible, respectful, ethical scaffolding system to support online education at University College of Estate Management.  And I loved her Rosa Parks shirt. 

Kate Lindsay, CC BY NC, Chris Bull for Association for Learning Technology

I also really enjoyed Claudia Cox’s engaging and entertaining talk Here be Dragons: Dispelling Myths around BYOD Digital Examinations.  Claudia surely wins the prize for best closing comment…

Sheila MacNeill and Keith Smyth gave a great talk on their conceptual framework for reimagining the digital university which aims to challenge neoliberalism through discursive, reflective digital pedagogy.  We need this now more than ever.

Keith Smyth, CC BY, Lorna M. Campbell

Sadly I missed Helen Beetham’s session Learning technology: a feminist space? but I heard it was really inspiring. I think I can count on one hand the number of times I’ve been able to hear Helen talk; we always seem to be programmed in the same slot! I also had to miss Laura Czerniewicz’s Online learning during university shut downs, so I’m very glad it was recorded. I’m looking forward to catching up with it as soon as I can.

The Learning Technologist of the Year Awards were truly inspiring as always. Lizzie Seymour, Learning Technology Officer, Royal Zoological Society of Scotland at Edinburgh Zoo was a very well deserved winner of the individual award, and I was really proud to see the University of Edinburgh’s Lecture Recording Team win the team award.  So many people across the University were involved in this project so it was great to see their hard work recognised.

UoE Lecture Recording Team, CC BY NC, Chris Bull for Association for Learning Technology

Without doubt though the highlight of the conference for me was Frances Bell‘s award of Honorary Life Membership of the Association for Learning Technology.  Frances is a dear friend and an inspirational colleague who really embodies ALT’s core values of participation, openness, collaboration and independence, so it was a huge honour to be invited to present her with the award.  Frances’ nomination was led by Catherine Cronin, who wasn’t able to be at the conference, so it gave me great pleasure to read out her words.

“What a joy to see Frances Bell – who exemplifies active, engaged and generous scholarship combined with an ethic of care – being recognised with this Honorary Life Membership Award by ALT.

As evidenced in her lifetime of work, Frances has combined her disciplinary expertise in Information Systems with historical and social justice perspectives to unflinchingly consider issues of equity in both higher education and wider society.

Uniquely, Frances sustains connections with people across higher education, local communities and creative networks in ways which help to bridge differences without ignoring them, and thus to enable understanding.

Within and beyond ALT, we all have much to thank her for.” 

I confess I couldn’t look at Frances while I was reading Catherine’s words as it was such an emotional moment.   I’m immensely proud of ALT for recognising Frances’ contribution to the community and for honouring her in this way.

Frances Bell, Honorary Life Member of ALT, CC BY NC, Chris Bull for Association for Learning Technology

And finally, huge thanks to Maren, Martin and the rest of the ALT team for organising another successful, warm and welcoming conference. 

How to protect yourself from npm

23:00, Wednesday, 11 September 2019 UTC

What’s the worst that could happen after npm install?

When you open an app or execute a program from the terminal, that program can do anything that you can do.

In a nutshell: Imagine if your computer were to disappear in front of your eyes and re-appear in front of mine. Still open. Still unlocked. What could I do from this moment on? That is what an unknown program could do.

Upon running npm install, you may be downloading and executing hundreds of unknown programs.

  1. What is at stake?
  2. How does it compare to other package managers?
  3. What can you do about it?

Two surveillance cameras on a lamppost with a clear blue sky behind them.

Photo by Raysonho

Programs from nice people sometimes ask for your permission. This is because a developer chose to make them do so.

There may also be laws that could punish them if they get caught choosing differently.

What about programs whose authors choose differently? Well, such a program could do quite a bit.

  • It could access any of your files, modify them, delete them, or upload them. This also applies to the internal files used by other applications.
  • It could install other programs in the background.
  • It could talk to other devices linked to your home network.
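None of this needs an exploit. npm runs any install lifecycle script (such as postinstall) that a package declares in its package.json, with your full user privileges. A hypothetical sketch of such a manifest, with all names invented:

```json
{
  "name": "innocent-looking-utility",
  "version": "2.1.0",
  "description": "Adds left padding to strings",
  "scripts": {
    "postinstall": "node ./setup.js"
  }
}
```

Here setup.js is whatever program the package author chose to ship; it executes as soon as npm install finishes downloading the package.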

What is at stake

Files you might not be thinking about:

  • The cookies in your web browser.
  • Desktop applications. Chat history, password managers, todo lists, etc. They all use files to store the text and media you send or receive.
  • Digital media. Your photo albums, home videos, and voice memos.
  • SSH private keys, GPG key rings, and other crypto files used by developers.

A red face in a white rectangle made of nanoblocks, resting on a silver Apple keyboard.

Photo by DaraKero_F / CC BY 2.0

Browser cookies

Browser cookies make it so you’re immediately logged in when you open a new tab for Gmail or Twitter. An evil program can copy the browser’s cookies file and share it with the attacker.

The attacker could then read any e-mail stored there that you’ve ever received or sent. They could also delete any. (Got a backup?) They can naturally access future e-mails as well, like the ones you get from “Forgot password” buttons. They could also hide any trace of this (e.g. via filter rules).

This affects any website you use. Social network? Access to any post or DM — regardless of privacy setting. Company e-mail, Google Drive? That too.

Sleeper programs

The evil program may configure itself to always start in the background when you open your laptop. A new friend for life!

It could also add local command-line programs that wrap the popular sudo and ssh commands, to make them do a little extra behind the scenes. The next time you run sudo <something> to perform an administrator action and enter your password, you may have given away full system access. Deploying some code? Running ssh cloud.someplace.special might let the attacker tailgate along with you, opening one shell for themselves and another for you.

Statue of King Louis XIV on a horse with a red blindfold over his eyes. Taken in Paris, France.

Photo by BikerNormand / CC BY-SA 2.0

Local web server

These background programs could also affect you in a myriad of other ways. I won’t detail those today, except to mention they can keep a local web server running. Spotify and Zoom have been seen in the news doing questionable things with their local web servers.

Is this an npm problem?

Maybe. Technically these concerns apply to any method of executing unknown code. Running npm install isn’t very different from pasting a command like curl url… | bash. They both execute a downloaded program from your terminal. The difference is in user expectation.

Upon seeing the url and the bash invocation, you have a choice: Trust the publisher (the url), or trust the script (download, review, then decide whether to run). The result is generally predictable and without hidden dependencies.

Other package managers

What about Debian (apt-get) or Homebrew? Like npm, code published there is unknown to most of us and hard to review. But there is an important difference: peer review. These traditional repositories are curated by a central authority. You don’t have to trust the script or original authors of each package, so long as you trust the publishers and their curation process.

Earth is small compared to Jupiter. Jupiter is roughly 11 times larger.

Image by NASA / Public domain

The scale has changed the game

What about PyPI or Packagist (Composer)? These are like npm. Anyone can publish anything. There is however a difference in scale. PyPI has 194K projects. Packagist is host to 237K packages with 0.5 billion downloads a month. npm has over 1.3 million packages and 30 billion downloads a month. This makes it a much more popular target. [1] [2] [3]

Dependency graphs

There is also a difference in habit: PyPI packages have 7 dependencies on average, with typically 1 level of indirect dependencies. And I would expect most dependencies there to be from authors the user has trusted before. A study published in April found that the average npm package has a whopping 86 dependencies, with 4+ levels of indirect dependencies. [4]

The ESLint package has 118 npm dependencies [5]. Eleventy, a popular static site generator, requires 555 dependencies (Explore dependency graph). Each one of these may run arbitrary shell commands from the terminal, both during the installation process and later when using the tool.

I get it. Now, what can we do about it?

There isn’t a magic bullet to make everything perfectly safe. But, there are a number of things you can do to reduce risk.


For the past year, I’ve been using disposable Docker containers as a way to reduce the risk of compromise. It has controls for network access, and for which directories can be exposed. Docker isn’t a perfect safety net by any means, but it’s a step in the right direction.


Image by Victor Grigas / CC BY-SA 3.0

My base image uses Debian and comes with Node.js, npm, and a few other utilities (such as headless browsers, for automated tests). I use a bash script to launch a temporary container, based on that image. It runs as the unprivileged nobody user, and mounts only the current working directory.

From there, I would run npm install and such. The only thing it interacts with is the source code and local node_modules directory for that specific project. It isn’t given access to any other Git repos, desktop apps, browser cookies, or crypto files. And, once that terminal tab is closed, the container is destroyed.

I’ve published the script I use; I don’t recommend using it outside Wikimedia, however. Create your own instead. The repository explains how it works.

Other options for isolating your environment:

  • Speed and flexibility: Use systemd-nspawn or chroot. This takes more work to set up, but provides a faster environment than Docker. In terms of security it is comparable to Docker. Read more about systemd-nspawn on the ArchWiki.

  • Security and ease of use: Use a virtual machine (e.g. VirtualBox/Vagrant). This is more secure by default and offers a GUI for controlling what to expose. The downside is that VMs are significantly slower.

Fewer dependencies

Finally, you can reduce risk by reducing the number of packages you depend on in your projects (and then shrink-wrap them). Especially development dependencies, as these tend to be explicitly aimed at executing from the CLI.

Question yourself and question others before introducing new dependencies. Perhaps even encourage maintainers of your favourite packages to reduce the size of their dependency graph!


Painted enamel throne table with the seal mark of the 18th century Chinese Qianlong emperor – image from Khalili Collections CC BY-SA 4.0

Wikimedia UK is launching a landmark partnership with the UK-based Khalili Collections – one of the greatest and most comprehensive private collections in the world. Over the course of five decades, UNESCO Goodwill Ambassador Professor Nasser D. Khalili has assembled eight of the world’s finest art collections – each being the largest and most comprehensive of its kind. They comprise:

  • Islamic Art (700-2000)
  • Hajj and the Arts of Pilgrimage (700-2000)
  • Aramaic Documents (535 BC-324 BC)
  • Japanese Art of the Meiji Period (1868-1912)
  • Japanese Kimono (1700-2000)
  • Swedish Textiles (1700-1900)
  • Spanish Damascened Metalwork (1850-1900)
  • Enamels of the World (1700-2000)

Together, the Eight Collections comprise some 35,000 works, many of which have been exhibited at prestigious museums and institutions worldwide.

Panoramic View of Mecca, 1845 – image from Khalili Collections CC BY-SA 4.0

As part of the “Masterpieces of the World” project, the Khalili Collections will initially release a thousand high resolution images on Creative Commons licenses, as well as summaries of its extensive research content relating to artwork and objects from around the world. The Collections plans to continue working with Wikimedia UK to further share knowledge about art on Wikimedia platforms and increase the visibility of cultures and art forms that are currently under-represented on Wikipedia.

“At Wikimedia, we are actively seeking to diversify our cultural content, and the Khalili Collections is one of the most geographically and culturally diverse collections in the world, spanning some two and a half millennia, with masterpieces from Europe, the Middle East, Scandinavia, East Asia, Russia, South Asia, North Africa and beyond”, said Lucy Crompton-Reid, CEO of Wikimedia UK. “We are proud to be partnering with one of the world’s great preservers of global cultural heritage”.

“We are delighted to be working with Wikimedia UK, undeniably a pioneer in delivering free access to cultural knowledge worldwide”, said Professor Nasser D. Khalili, Founder of the Khalili Collections. “The partnership is an important part of our wider, long-standing strategy to make the Collections – and the five decades of expert research dedicated to them – more accessible to art and culture lovers worldwide”.

Initial outputs from the partnership will include new Wikipedia articles on The Khalili Collections (an overview article has just been published and articles for the eight individual collections will be forthcoming), 1000 images which will be freely available for reuse, including on Wikipedia, metadata records about the images on Wikimedia Commons, and content from the collections being showcased on Wikipedia, Commons, and Wikidata.

A Complete Cover for a Damascus Mahmal, Istanbul, 16th century – image from Khalili Collections CC BY-SA 4.0

To achieve this, the Khalili Collections will change the licence on 1,000 of its images from “all rights reserved” to CC-BY-SA. These are very high-quality images depicting treasures from non-Western cultures. Some use state-of-the-art high-resolution digitisation. KC will also freely licence some short summaries of the academic books it has published, allowing them to be used as the basis for Wikipedia articles.

In the longer term, we hope that the success of the initial pilot release of content will lead to further joint work sharing perspectives on the history of the world, as revealed through cultural treasures. Wikimedia UK recently published a report on the long-term impact of our Wikimedians in Residence who work at cultural and educational institutions, and we are looking at the potential of hiring a Wikimedian to work with the Khalili Collections to help make the most of the information and content that the KC is making available.

Wikimedia UK are very excited about this project, as it helps us to meet one of our main goals, to increase the diversity of the content and contributors to the Wikimedia projects. Digital inequality across the world means that Wikipedia is much better at representing the culture and history of European civilisations than those of other continents, and we hope that the release of content by the Khalili Collections will help Wikimedia to fill some of the gaps in its representation of the world.

We have a very long way to go in our quest to make an encyclopaedia which represents the breadth and diversity of the world’s history and culture, but partnerships like this are hugely important in making the art heritage of the world freely available to anybody with an internet connection.

This Month in GLAM: August 2019

07:22, Wednesday, 11 September 2019 UTC

Measuring Wikipedia page load times

00:04, Wednesday, 11 September 2019 UTC

This post shows how we measure and interpret load times on Wikipedia. It also explains what real-user metrics are, and how percentiles work.

Navigation Timing

When a browser loads a page, the page can include program code (JavaScript). This program will run inside the browser, alongside the page. This makes it possible for a page to become dynamic (more than static text and images). When you search on Wikipedia, the suggestions that appear are made with JavaScript.

Browsers allow JavaScript to access some internal systems. One such system is Navigation Timing, which tracks how long each step takes. For example:

  • How long to establish a connection to the server?
  • When did the response from the server start arriving?
  • When did the browser finish loading the page?
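A page script can read these numbers directly. A minimal illustration (not Wikimedia's actual instrumentation) using the Navigation Timing Level 2 entry; in older browsers the same data lives on performance.timing:

```javascript
// Read the Navigation Timing entry for the current page.
// All values are milliseconds relative to the start of navigation.
// Outside a page context (e.g. in Node) the entry list is simply empty.
const [nav] = performance.getEntriesByType('navigation');
if (nav) {
  console.log('Connection setup took:', nav.connectEnd - nav.connectStart);
  console.log('Response started arriving at:', nav.responseStart);
  console.log('Page finished loading at:', nav.loadEventEnd);
}
```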

Where to measure: Real-user and synthetic

There are two ways to measure performance: Real user monitoring, and synthetic testing. Both play an important role in understanding performance, and in detecting changes.

Synthetic testing can give high confidence in change detection. To detect changes, we use an automated mechanism to continually load a page and extract a result (e.g. load time). When there is a difference between results, it likely means that our website changed. This assumes other factors remained constant in the test environment: factors such as network latency, operating system, browser version, and so on.

This is good for understanding relative change. But synthetic testing does not measure the performance as perceived by users. For that, we need to collect measurements from the user’s browser.

Our JavaScript code reads the measurements from Navigation Timing and sends them back to our servers. This is real-user monitoring.

How to measure: Percentiles

Imagine 9 users each send a request: 5 users get a result in 5ms, 3 users get a result in 70ms, and for one user the result took 560ms. The average is 88ms. But, the average does not match anyone’s real experience. Let’s explore percentiles!

Diagram showing 9 labels: 5ms, 5ms, 5ms, 5ms, 5ms, 70ms, 70ms, 70ms, and 560ms.

The first number after the lower half (or middle) is the median (or 50th percentile). Here, the median is 5ms. The first number after the lower 75% is 70ms (75th percentile). We can say that "for 75% of users, the service responded within 70ms". That’s more useful.
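The nearest-rank method computes exactly these numbers. A small sketch (my illustration, not Wikimedia's production code):

```javascript
// Nearest-rank percentile: the p-th percentile is the value at
// position ceil(p/100 * n) in the sorted list (1-based).
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// The nine response times from the example above:
const responseTimes = [5, 5, 5, 5, 5, 70, 70, 70, 560];
percentile(responseTimes, 50); // median: 5
percentile(responseTimes, 75); // 70
```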

When working on a service used by millions, we focus on the 99th percentile and the highest value (100th percentile). Using medians, or percentiles lower than 99%, would exclude many users. A problem with 1% of requests is a serious problem. To understand why, it is important to understand that 1% of requests does not mean 1% of pageviews, or even 1% of users.

A typical Wikipedia pageview makes 20 requests to the server (1 document, 3 stylesheets, 4 scripts, 12 images). A typical user views 3 pages during their session (on average).

This means our problem with 1% of requests could affect 20% of pageviews (20 requests × 1% = 20% = ⅕), and 60% of users (3 pages × 20 requests × 1% = 60% ≈ ⅗). Even worse, over a long period of time, it is likely that every user will experience the problem at least once. This is like rolling dice in a game: with a ⅙ chance of rolling a six, if everyone keeps rolling, everyone should get a six eventually.
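Those multiplications are quick linear estimates (strictly, upper bounds). Treating each request as an independent dice roll gives the exact probabilities, which land in the same ballpark; a sketch of that calculation:

```javascript
// Probability that at least one of n independent requests hits a
// problem occurring on a fraction r of all requests.
const pAffected = (n, r) => 1 - Math.pow(1 - r, n);

const perPageview = pAffected(20, 0.01); // ~0.18 for a 20-request pageview
const perSession = pAffected(60, 0.01);  // ~0.45 for a 3-page session
```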

Real-user variables

The previous section focussed on performance as measured inside our servers. These measurements start when our servers receive a request, and end once we have sent a response. This is back-end performance. In this context, our servers are the back-end, and the user’s device is the front-end.

It takes time for the request to travel from the user’s device to our systems (through cellular or WiFi radio waves, and through wires.) It also takes time for our response to travel back over similar networks to the user’s device. Once there, it takes even more time for the device’s operating system and browser to process and display the information. Measuring this is part of front-end performance.

Differences in back-end performance may affect all users. But, differences in front-end performance are influenced by factors we don’t control. Such as network quality, device hardware capability, browser, browser version, and more.

Even when we make no changes, the front-end measurements do change. Possible causes:

  • Network. ISPs and mobile network carriers can make changes that affect network performance. Existing users may switch carriers. New users come online with a different choice distribution of carrier than current users.
  • Device. Operating system and browser vendors release upgrades that may affect page load performance. Existing users may switch browsers. New users may choose browsers or devices differently than current users.
  • Content change. Especially for Wikipedia, the composition of an article may change at any moment.
  • Content choice. Trends in news or social media may cause a shift towards different (kinds of) pages.
  • Device choice. Users that own multiple devices may choose a different device to view the (same) content.

The most likely cause for a sudden change in metrics is ourselves. Given our scale, the above factors usually change only for a small number of users at once. Or the change might happen slowly.

Yet, sometimes these external factors do cause a sudden change in metrics.

Case in point: Mobile Safari 9

Shortly after Apple released iOS 9 (in 2015), our global measurements were higher than before. We found this was due to Mobile Safari 9 introducing support for Navigation Timing.

Before this event, our metrics only represented mobile users on Android. With iOS 9, our data increased its scope to include Mobile Safari.

iOS 9, or the networks of iOS 9 users, were not significantly faster or slower than Android’s. The iOS upgrade affected our metrics because we now include an extra 15% of users – those on Mobile Safari.

Desktop latency is around 330ms, while mobile latency is around 520ms. Having more metrics from mobile skewed the global metrics toward that category.

Line graph for responseStart metric from desktop pageviews. Values range from 250ms to 450ms. Averaging around 330ms.
Line graph for responseStart metric from mobile pageviews. Values range from 350ms to 700ms. Averaging around 520ms.

The above graphs plot the "75th percentile" of responseStart for desktop and mobile (from November 2015). We combine these metrics into one data point for each minute. The above graphs show data for one month. There is only enough space on the screen to have each point represent 3 hours. This works by taking the mean average of the per-minute values within each 3 hour block. While this provides a rough impression, this graph does not show the 75th percentile for November 2015. The next section explains why.

Average of percentiles

Opinions vary on how bad it is to take the average of percentiles over time. But one thing is clear: The average of many 1-minute percentiles is not the percentile for those minutes. Every minute is different, and the number of values also varies each minute. To get the percentile for one hour, we need all values from that hour, not the percentile summary from each minute.

Below is an example with values from three minutes of time. Each value is the response time for one request. Within each minute, the values sort from low to high.

Diagram with four sections. Section One is for the minute 08:00 to 08:01, it has nine values with the middle value of 5ms marked as the median. Section Two is for 08:01 to 08:02 and contains five values, the median is 560ms. Section Three is 08:02 to 08:03, contains five values, the median of Section Three is 70ms. The last section, Section Four, is the combined diagram from 08:00 to 08:03 showing all nineteen values. The median is 70ms.

The average of the three separate medians is about 211ms, the result of (5 + 560 + 70) / 3. The actual median of all these values combined is 70ms.
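A short sketch makes the difference concrete. The per-minute samples below are invented, but chosen to match the medians in the diagram:

```javascript
// Middle value of an odd-length list.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length / 2)];
}

// Hypothetical samples for the three minutes (medians 5, 560, 70):
const minute1 = [5, 5, 5, 5, 5, 70, 70, 70, 560]; // median 5
const minute2 = [5, 5, 560, 560, 560];            // median 560
const minute3 = [5, 5, 70, 560, 560];             // median 70

const avgOfMedians =
  (median(minute1) + median(minute2) + median(minute3)) / 3;
// avgOfMedians is ~211.7, which is not a percentile of anything.

const combinedMedian = median([...minute1, ...minute2, ...minute3]);
// combinedMedian is 70: the real median of all 19 values.
```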


To compute the percentile over a large period, we must have all original values. But, it’s not efficient to store data about every visit to Wikipedia for a long time. We could not quickly compute percentiles either.

A different way of summarising data is by using buckets. We can create one bucket for each range of values. Then, when we process a time value, we only increment the counter for that bucket. When using a bucket in this way, it is also called a histogram bin.

Let’s process the same example values as before, but this time using buckets.

There are four buckets. Bucket A is for values below 11ms. Bucket B is for 11ms to 100ms. Bucket C is for 101ms to 1000ms. And Bucket D is for values above 1000ms. For each of the 19 values, we find the associated bucket and increase its counter.

After processing all values, the counters are as follows. Bucket A holds 9, Bucket B holds 4, Bucket C holds 6, and Bucket D holds 0.
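The counting step is cheap to implement. A sketch with the same thresholds; the 19 concrete values are reconstructed to match the per-minute medians and bucket counts of the example, not taken from real traffic:

```javascript
// One counter per bucket; processing a value only increments a counter.
const buckets = { A: 0, B: 0, C: 0, D: 0 };

function record(ms) {
  if (ms <= 10) buckets.A += 1;        // below 11ms
  else if (ms <= 100) buckets.B += 1;  // 11ms to 100ms
  else if (ms <= 1000) buckets.C += 1; // 101ms to 1000ms
  else buckets.D += 1;                 // above 1000ms
}

// The 19 example values:
[5, 5, 5, 5, 5, 70, 70, 70, 560,
 5, 5, 560, 560, 560,
 5, 5, 70, 560, 560].forEach(record);
// buckets is now { A: 9, B: 4, C: 6, D: 0 }
```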

Based on the total count (19) we know that the median (10th value) must be in bucket B, because bucket B contains values 10 to 13. And that the 75th percentile (15th value) must be in bucket C because it contains values 14 to 19.

We cannot know the exact millisecond value of the median, but we know the median must be between 11ms and 100ms. (This matches our previous calculation, which produced 70ms.)
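Locating the bucket that holds a given percentile needs only the running totals. Again a sketch of mine, not the production code:

```javascript
// Walk the buckets in order, accumulating counts until we reach
// the 1-based rank of the requested percentile.
function bucketForPercentile(counts, p) {
  const total = Object.values(counts).reduce((a, b) => a + b, 0);
  const rank = Math.ceil((p / 100) * total);
  let seen = 0;
  for (const [name, count] of Object.entries(counts)) {
    seen += count;
    if (rank <= seen) return name;
  }
}

const counts = { A: 9, B: 4, C: 6, D: 0 };
bucketForPercentile(counts, 50); // 'B': the 10th of 19 values
bucketForPercentile(counts, 75); // 'C': the 15th of 19 values
```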

When we use exact percentiles, our goal is for that percentile to be below a certain number. For example, if our 75th percentile today is 560ms, this means that for 75% of users a response takes 560ms or less. Our goal could be to reduce the 75th percentile to below 500ms.

When using buckets, goals are defined differently. In our example, 6 out of 19 responses (32%) are above 100ms (bucket C and D), and 13 of 19 (68%) are below 100ms (bucket A and B). Our goal could be to reduce the percentage of responses above 100ms. Or the opposite, to increase the percentage of responses within 100ms.

Rise of mobile

Traffic trends are generally moving towards mobile. In fact, April 2017 was the first month where Wikimedia mobile pageviews reached 50% of all Wikimedia pageviews. And after June 2017, mobile traffic has stayed above 50%.

Bar chart showing percentages of mobile and desktop pageviews for each month in 2017. They mostly swing equal at around 50%. Looking closely, we see mobile first reaches 51% in April. In May it was below 50% again. But for June and every month since then mobile has remained above 50%. The peak was in October 2017, where mobile accounted for 59% of pageviews. The last month in the graph, November 2017 shows 53% of mobile pageviews.

Global changes like this have a big impact on our measurements. This is the kind of change that drives us to rethink how we measure performance, and (more importantly) what we monitor.


Moving Plants

02:59, Tuesday, 10 September 2019 UTC
All humans move plants, most often by accident and sometimes with intent. Humans, unfortunately, are only rarely moved by the sight of exotic plants. 

Unfortunately, the history of plant movements is often difficult to establish. In the past, the only way to tell a plant's homeland was to look at the number of related species in a region for clues on its area of origin. This idea was firmly established by Nikolai Vavilov before he was sent off to Siberia, thanks to Stalin's crank scientist Lysenko, to meet an early death. Today, the genetic relatedness of plants can be examined by comparing the similarity of DNA sequences (although this is apparently harder than with animals due to issues with polyploidy). Some recent studies on individual plants and their relatedness have provided insights into human history. A 2015 study established the East African origins of the baobabs found in India, and a 2011 study similarly traced the origins of coconuts; these are hopefully just the beginnings. Such studies demonstrate ancient human movements which have never received much attention in most standard historical accounts.
Inferred transfer routes for baobabs - source

Unfortunately there are a lot of older crank ideas that can be difficult for untrained readers to separate out. I recently stumbled on a book by Grafton Elliot Smith, a Fullerian professor who succeeded J. B. S. Haldane but descended into crankdom. The book "Elephants and Ethnologists" (1924) can be found online, and it is just one among several similar works by Smith. It appears that Smith used a skewed and misapplied cousin of Dollo's Law: according to him, cultural innovations tended to occur only once and were then carried along with human migrations. Smith was subsequently labelled a "hyperdiffusionist", a disparaging term used by ethnologists. When he saw illustrations of Mayan sculpture he envisioned an elephant where others saw, at best, a stylized tapir. Not only were they elephants, they were Asian elephants, complete with mahouts and Indian-style goads, and he saw this as definite evidence for an ancient connection between India and the Americas! An idea that would please some modern-day Indian cranks and zealots.

Smith's idea of the elephant as emphasised by him.
The actual Stela in question
 "Fanciful" is the current consensus view on most of Smith's ideas, but let's get back to plants. 

I happened to visit Chikmagalur recently and revisited the beautiful temples of Belur on the way. The "Archaeological Survey of India-approved" guide at the temple did not flinch when he described an object in the hand of a carved figure as being maize; he said maize was a symbol of prosperity. Maize, however, is a crop that was imported into India, by most accounts only after the Portuguese reached the Americas in 1492 and made sea incursions into India from 1498. In the late 1990s, a Swedish researcher identified similar carvings (actually another one, at Somnathpur) from 12th-century temples in Karnataka as being maize cobs. The claim was subsequently debunked by several Indian researchers from IARI and from the University of Agricultural Sciences, where I was then studying. An alternate view is that the object is a mukthaphala, an imaginary fruit made up of pearls.
Somnathpur carvings. The figures to the
left and right hold the purported cobs in their left hands.
(Photo: G41rn8)

The pre-Columbian oceanic trade ideas, however, do not end with these two cases from India. The third story (and historically the first, from 1879) is that of the sitaphal or custard apple. Alexander Cunningham, the founder of the Archaeological Survey of India, described a fruit in one of the carvings from Bharhut, which he identified as a custard apple. The custard apple and its relatives are all from the New World. The Bharhut Stupa is dated to 200 BC, and the custard apple, as was quickly pointed out by others, could only have been in India post-1492. The Hobson-Jobson has a long entry on the custard apple that covers the situation well. In 2009, a study raised the possibility of custard apples in ancient India. Ancient carbonized evidence is hard to evaluate unless one has examined all the possible plant seeds and what remains of their microstructure. The researchers, however, establish a date of about 2000 BC for the carbonized remains and attempt to demonstrate that they look like the seeds of sitaphal. The jury is still out.
The Hobson-Jobson has an interesting entry on the custard-apple
I was quite surprised that there are not many writings on the Internet that synthesize and comment on the history of these ideas. Somewhat oddly, I found no mention of these three cases in the relevant Wikipedia article on pre-Columbian trans-oceanic contact theories (naturally, fixed now with an entire new section).

There seems to be value in someone putting together a collation of plant introductions to India, along with sources, dates, and locations of introduction. Some of the old specimens of introduced plants may well be worthy of further study.
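Such a collation could begin with something as simple as one structured record per species. A minimal sketch in Python follows; the record fields and the sample entry are my own invention, purely illustrative of what such a dataset might capture:

```python
from dataclasses import dataclass, field

# Hypothetical record structure for a collation of plant introductions
# to India. Field names are illustrative, not an established schema.
@dataclass
class Introduction:
    species: str                  # botanical name
    native_range: str             # region of origin
    introduced_by: str = ""       # agent of introduction, if known
    date: str = ""                # date or period of introduction
    location: str = ""            # place of first introduction
    sources: list = field(default_factory=list)  # citations

# One of the open questions from the list below, recorded as a stub entry
entry = Introduction(
    species="Muntingia calabura",
    native_range="Neotropics",
    date="unknown",
)
```

Even stub entries like this make the gaps (missing dates, missing sources) explicit and queryable.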

Introduction dates
  • Pithecellobium dulce - a Portuguese introduction from Mexico to the Philippines, with India on the way, in the 15th or 16th century. The species was described by William Roxburgh from specimens taken in the Coromandel region (i.e., the type locality lies outside the native range).
  • Eucalyptus globulus? - There are some claims that Tipu planted the first of these (see my post on this topic). It appears that the first person to move eucalyptus plants (probably E. globulus) out of Australia was Jacques Labillardière, who was surprised by the size of the trees in Tasmania. The lowest branches were 60 m above the ground and the trunks were 9 m in diameter (27 m circumference). He saw flowers through a telescope and had some flowering branches shot down with guns! (original source in French) His ship was seized by the British in Java around 1795 and released in 1796. All subsequent movements seem to have been post-1800 (i.e., after Tipu's death). If Tipu Sultan did indeed plant the Eucalyptus here, he must have got it via the French through the Labillardière shipment. The Nilgiris were apparently planted up starting with the work of Captain Frederick Cotton (Madras Engineers) at Gayton Park(?)/Woodcote Estate in 1843.
  • Muntingia calabura - when? - I suspect that Tickell's flowerpecker populations boomed after this, possibly with a decline in the Thick-billed flowerpecker.
  • Delonix regia - when?
  • In 1857, Mr New from Kew was made Superintendent of Lalbagh, and in the following years he introduced several Australian plants from Kew, including Araucaria, Eucalyptus, Grevillea, Dalbergia and Casuarina. Mulberry plant varieties were introduced in 1862 by Signor de Vicchy. The Hebbal Butts plantation was established around 1886 by Cameron along with Mr Rickets, Conservator of Forests, who became Superintendent of Lalbagh after New's death - rain trees, ceara rubber (Manihot glaziovii), and shingle trees(?). Apparently Rickets was also involved in introducing a variety of potato (kidney variety) which came to be named "Ricket". - from Krumbiegel's introduction to "Report on the progress of Agriculture in Mysore" (1939) [Hebbal Butts would be the current-day Air Force Headquarters]

Further reading
  • Johannessen, Carl L.; Parker, Anne Z. (1989). "Maize ears sculptured in 12th and 13th century A.D. India as indicators of pre-columbian diffusion". Economic Botany 43 (2): 164–180.
  • Payak, M.M.; Sachan, J.K.S (1993). "Maize ears not sculpted in 13th century Somnathpur temple in India". Economic Botany 47 (2): 202–205. 
  • Pokharia, Anil Kumar; Sekar, B.; Pal, Jagannath; Srivastava, Alka (2009). "Possible evidence of pre-Columbian transoceanic voyages based on conventional LSC and AMS 14C dating of associated charcoal and a carbonized seed of custard apple (Annona squamosa L.)". Radiocarbon 51 (3): 923–930.
  • Veena, T.; Sigamani, N. (1991). "Do objects in friezes of Somnathpur temple (1286 AD) in South India represent maize ears?". Current Science 61 (6): 395–397.
  • Rangan, H. & Bell, K. L. (2015). "Elusive Traces: Baobabs and the African Diaspora in South Asia". Environment and History 21(1): 103–133. doi:10.3197/096734015x1418317996982. [The authors, however, make a mistake in using Achaya, K.T., Indian Food (1994), who in turn cites Vishnu-Mittre's faulty paper for the early evidence of Eleusine coracana in India. Vishnu-Mittre himself admitted his error in a paper that re-examined his specimens - see below.]
Dubious research sources
  • Singh, Anurudh K. (2016). "Exotic ancient plant introductions: Part of Indian 'Ayurveda' medicinal system". Plant Genetic Resources 14(4): 356–369. doi:10.1017/S1479262116000368. [Among the claims here is that Bixa orellana was introduced prior to 1000 AD, on the basis of Sanskrit names assigned to that species, without indicating the basis or original dated sources. The author works in the "International Society for Noni Science"!]
  • The same author has rehashed this content with several references and published it in no less than the Proceedings of the INSA - Singh, Anurudh Kumar (2017). "Ancient Alien Crop Introductions Integral to Indian Agriculture: An Overview". Proceedings of the Indian National Science Academy 83(3). It contains a series of cherry-picked references, many of whose claims were subsequently dismissed by others or remain under serious question. In one case there is a claim for the early occurrence of Eleusine coracana in India, at around 1000 BC. The reference cited is in fact a secondary one; the original work was by Vishnu-Mittre, and when the sample was rechecked by another group of scientists they clearly showed that it was not even a monocot. Vishnu-Mittre himself accepted the error. The original paper was Vishnu-Mittre (1968). "Protohistoric records of agriculture in India". Trans. Bose Res. Inst. Calcutta. 31: 87–106, and the re-analysis of the samples can be found in Hilu, K. W.; de Wet, J. M. J.; Harlan, J. R. (1979). "Archaeobotanical Studies of Eleusine coracana ssp. coracana (Finger Millet)". American Journal of Botany 66 (3): 330–333. Clearly INSA does not have great peer review and has gone with argument from claimed authority.
  • PS 2019-August. Singh, Anurudh K. (2018). "Early history of crop presence/introduction in India: III. Anacardium occidentale L., Cashew Nut". Asian Agri-History 22(3): 197–202. Singh has published another article claiming that cashew was present in ancient India, well before the Columbian exchange, with "evidence" from J.L. Sorenson of a sketch purportedly made from a Bharhut stupa balustrade carving (the original of which is not found here) and a carving from the Jambukeshwara temple with a "cashew" arising singly and placed atop a stalk that rises from below like a lily! He also claims that some Sanskrit words and translations (from texts/copies of unknown provenance or date) confirm ancient existence. I happened to ask whether he had examined his sources carefully and received a rather interesting response, which I find very useful as a classic symptom of the problems of science in India. More interestingly, I learned that John L. Sorenson is well known for his affiliation with the Church of Jesus Christ of Latter-day Saints; apparently part of the Mormon foundations is the claim that Mesoamerican cultures were of Semitic origin, and much of the "research" of their followers has attempted to bolster support for this by various means.

Tech News issue #37, 2019 (September 9, 2019)

00:00, Monday, 9 September 2019 UTC
previous | 2019, week 37 (Monday, 9 September 2019) | next

weeklyOSM 476

16:40, Sunday, 8 September 2019 UTC



The tourism organisation of the Durmitor National Park in Žabljak, Montenegro, recommends the use of OSM 1 | Photo © CC0


  • Stolpersteine (literally “stumbling blocks”) are small brass-plated cubes laid, around Europe, in front of the last-known residences or workplaces of those who were driven out or murdered by the Nazis. Reclus asked (automatic translation) if the 8700 Stolpersteine with a Wikidata entry are linked to OpenStreetMap.
  • Hauke Stieler has made (automatic translation) a map of objects tagged shop=yes in Germany. Rendering of shop=yes was dropped in OpenStreetMap Carto v4.22.0.
  • amilopowers’s proposal to tag the possibility of withdrawing cash in a shop or amenity can now be voted on.
  • Klumbumbus proposes traffic_calming=dynamic_bump for the new type of dynamic traffic calming, whose impact depends on the driver’s speed, and asks for your opinion.
  • Vadim Shlyakhov is proposing leisure=sunbathing to mark outdoor locations where people can sunbathe.


  • The OSM Operations Team announced that anonymous users will soon no longer be able to comment on notes. The reasons and more background information can be found in a GitHub issue, which was opened two years ago.
  • Samuel Darkwah Manu, founder of the Kwame Nkrumah University of Science and Technology YouthMappers and a member of the OSM Ghana community, shares his experience participating in the Open Cities Accra project.
  • OpenStreetMap encourages all mappers to vote for the OpenStreetMap Awards 2019. Voting ends on 18 September so vote now.

OpenStreetMap Foundation

  • For the upcoming 2019 OSMF Board elections, Kate Chapman announced that Mikel Maron will be up for re-election, while Frederik Ramm and Kate Chapman herself will be stepping down and not running again.


  • The State of the Map is looming. In less than two weeks the annual OSM conference, with many interesting sessions, will start in Heidelberg, Germany. The State of the Map takes place from 21 to 23 September, directly after the HOT Summit at the same venue.
  • Lukas and Fabian, from HeiGIT, presented a 90 minute lab about analysing OpenStreetMap data history with the ohsome platform at the FOSS4G 2019 conference in Bucharest. The teaching material and code has been made available as linked snippets in the GIScience HD Gitlab.

Humanitarian OSM

  • HOT reports about the work on setting up an effective solid waste collection system with Open Source Tools in Dar es Salaam.


  • [1] The tourism organisation of the Durmitor National Park in Žabljak, Montenegro, recommends the use of OSM on their mountain bike maps and also on their homepage with the wording “For successful orientation in the region of Durmitor and Sinjajevina, we recommend that you use OpenStreetMap and Open Cycle Maps which are regularly updated and new information is added every day”.

Open Data

  • The German federal state Saxony has released (automatic translation) aerial images, digital topographic maps, elevation and landscape models, and cadastre data to the public. Unfortunately the new data is licensed with a CC-BY-like licence and, hence, not compatible with our requirements. However, orthophotos and a public map with roads, road names, building footprints and house numbers were already and will continue to be available to OSM mappers.
  • The CCC is hosting a video, from the Free and Open Source Software for Geospatial (FOSS4G) event in Bucharest, about how to use OSM and Wikidata together with data science tools and Python.


  • tchaddad explains, in his user diary, how Wikidata queries using SPARQL and the API work, and how Wikidata could be used to improve Nominatim.


  • Paul Norman informed us about an update to Carto, OSM’s main map style. The minor improvements include bug fixes, performance and code cleanup, as well as some visual changes such as a retail colour fill on malls and rendering historic=citywalls the same as barrier=city_wall.
  • Sarah Hoffmann announced a new release of osm2pgsql. The new version (1.0.0) drops support for old-style multipolygons and has received major functionality improvements.

Did you know …

  • … Matt Mikus answered the question of whether Minnesota has more shoreline than California, Hawaii and Florida, combined. With the help of OSM data he found that the answer is yes if you include rivers and streams in your calculation.
  • … about the ski resort CopenHill in the Danish capital Copenhagen? It was built, as a globally unique project, on the green roof of a waste incineration plant to give Danes an opportunity to spend their ski holidays in their own country. In OSM it looks like this.

OSM in the media

  • The Austrian newspaper Der Standard wrote (de) an article about China’s practice of distorting its maps. While Russia ended its efforts to falsify roads, rivers and even city quarters at the end of the 1980s, in light of upcoming satellite-imagery-based maps, China continues its own efforts for unknown reasons. A map law with 68 paragraphs requires that only approved maps, with “correct” borders of course, may be published. Only 14 Chinese companies have a licence to produce and publish maps. The distortion is assumed to range between 50 and 700 metres and can be seen when comparing a satellite image on Google Maps with the road map layer. The article mentions OSM as an alternative of “controversial legality”.

Other “geo” things

  • On Day 3 of Pista ng Mapa, Leigh deployed her DJI Phantom 4 to survey the event venue and its surrounding community. Leigh uploaded all of the drone-derived data to OpenAerialMap, including the elevation models. Maning Sambale demonstrates how to use QGIS to extract heights from the derived DSM/DTM and use them for visualising building polygons from OpenStreetMap.
  • The Africa Geospatial Data and Internet Conference 2019 will be held in Accra, Ghana from 22nd to 24th October. The conference aims to bring people together in discussions on public policy issues relating to geospatial and open data, ICTs and the Internet in Africa.
  • For more than 30 years WGS84 has acted as a “pivot” datum through which one datum can be transformed into another. Michael Giudici explains how, in a world that demands sub-metre accuracy, the WGS84 pivot has outlived its usefulness. The “GDAL Coordinate System Barn Raising” is currently working on improving GDAL, PROJ, and libgeotiff so they can handle time-dependent coordinate reference systems and accurately transform between datums.
  • Katja Seidel investigated (automatic translation) alternatives to GPSies to use after it is merged into AllTrails.

Upcoming Events

Where What When Country
Minneapolis State of the Map U.S. 2019 [1] 2019-09-06-2019-09-08 united states
Taipei OSM x Wikidata #8 2019-09-09 taiwan
Bordeaux Réunion mensuelle 2019-09-09 france
Toronto Toronto Mappy Hour 2019-09-09 canada
Salt Lake City SLC GeoBeers 2019-09-10 united states
Hamburg Hamburger Mappertreffen 2019-09-10 germany
Lyon Rencontre mensuelle pour tous 2019-09-10 france
Wuppertal OSM-Treffen Wuppertaler Stammtisch im Hutmacher 18 Uhr 2019-09-11 germany
Leoben Stammtisch Obersteiermark 2019-09-12 austria
Munich Münchner Stammtisch 2019-09-12 germany
Berlin 135. Berlin-Brandenburg Stammtisch 2019-09-12 germany
San José Civic Hack Night & Map Night 2019-09-12 united states
Budapest OSM Hungary Meetup reboot 2019-09-16 hungary
Bratislava Missing Maps mapathon Bratislava #7 2019-09-16 slovakia
Habay Rencontre des contributeurs du Pays d’Arlon 2019-09-16 belgium
Cologne Bonn Airport Bonner Stammtisch 2019-09-17 germany
Lüneburg Lüneburger Mappertreffen 2019-09-17 germany
Reading Reading Missing Maps Mapathon 2019-09-17 united kingdom
Salzburg Maptime Salzburg Mapathon 2019-09-18 austria
Edinburgh FOSS4GUK 2019 2019-09-18-2019-09-21 united kingdom
Heidelberg Erasmus+ EuYoutH OSM Meeting 2019-09-18-2019-09-23 germany
Heidelberg HOT Summit 2019 2019-09-19-2019-09-20 germany
Heidelberg State of the Map 2019 [2] 2019-09-21-2019-09-23 germany
Nantes Journées européennes du patrimoine 2019-09-21 france
Bremen Bremer Mappertreffen 2019-09-23 germany
Nottingham Nottingham pub meetup 2019-09-24 united kingdom
Mannheim Mannheimer Mapathons 2019-09-25 germany
Lübeck Lübecker Mappertreffen 2019-09-26 germany
Düsseldorf Stammtisch 2019-09-27 germany
Dortmund Mappertreffen 2019-09-27 germany
Nagoya 第2回まちマップ道場-伊勢湾台風被災地を訪ねる- 2019-09-28 japan
Strasbourg Rencontre périodique de Strasbourg 2019-09-28 france
Kameoka 京都!街歩き!マッピングパーティ:第12回 穴太寺(あなおじ) 2019-09-29 japan
Prizren State of the Map Southeast Europe 2019-10-25-2019-10-27 kosovo
Dhaka State of the Map Asia 2019 2019-11-01-2019-11-02 bangladesh
Wellington FOSS4G SotM Oceania 2019 2019-11-12-2019-11-15 new zealand
Grand-Bassam State of the Map Africa 2019 2019-11-22-2019-11-24 ivory coast

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Polyglot, Rogehm, SK53, SunCobalt, TheSwavu, YoViajo, derFred, geologist, jinalfoflia.

2019 Youth Film Festival in Charlottesville

11:13, Sunday, 8 September 2019 UTC

On Saturday 7 September 2019 I attended the 18th Annual Youth Film Festival in Charlottesville, presented by a nonprofit organization called Light House Studio.

I like that there is an organization which provides a channel for youth to produce and publish films locally. Because I am a media access advocate, I liked less that all the films had a tag “copyright Lighthouse Studio”, which communicates that the nonprofit organization acquires the copyright from all creators in the program. I do not necessarily mind them acquiring the copyright but they also assert a conventional copyright license after the manner of a film studio, and I would prefer that either they use a free and open license or permit creators to retain the copyright. The other context I have for my view is that I have seen repeatedly that nonprofit organizations of this sort invest no budget, expertise, or consideration of the long-term management of their media collections, and typically they lose the cataloging metadata of content which they produce. The usual outcome is that the media becomes mostly undiscoverable in a few years, when I would rather it be archived for the long term. This is all speculation based on my past experience, observations, their copyright notice, and their lack of published archiving procedure.

I enjoyed the works. I liked the two documentaries more than the others. One was interviews with local Charlottesville students who were immigrants from Central or South America. Those students said that people in Charlottesville harassed them either for being Latino or for speaking Spanish. This seems believable to me because, having recently moved here, I see strange racism and prejudice continually. Local people typically express the idea that Charlottesville is a friendly place, but they compare it to other towns in the region, which they describe as either ignorant or sometimes proudly hatemongering. I still get surprised when I see historically oppressed demographics here (women, black people, Latinos, LGBT+ and the rest) act deferentially to an oppressive norm.

Another documentary had students visit local nursing homes, ask residents where they would like to virtually visit, and then put virtual reality headsets on them. This was in the genre of videos exposing someone to a technology not of their generation. The people in this video had little awareness of virtual reality and were moved by the experience.

Young people are capable of meaningful media creation and publishing when they have the opportunity to do so. I expect that participants in the program take great inspiration for years from the work they produce. Probably the video production for these films happens in a week, so as a life experience, this entire program seems high impact at relatively low cost. I recognize that a complicated nonprofit network must exist in a community for this to work, including funding to the host organization but also to the youth organizations which make the student participants ready to join these programs.

Some of the homes and locations featured in this program were evidence that at least one young person on the production team was from a wealthy family. I try to notice when there is a nonprofit community resource which offers benefits within easier reach of the wealthy as compared to the underserved. I appreciate that the host organization in this case is seeking diversity, but of course diversity costs money and the rich kids’ families pay the participation fee.

The entire event was great and would compare favorably with anything similar.

Language barriers to @Wikidata

14:49, Saturday, 7 September 2019 UTC
Wikidata is intended to serve all the languages of all the Wikipedias, for starters. It does in one very important way: all the interwiki links (the links between articles on the same subject) are maintained in Wikidata.

For most other purposes Wikidata serves the "big" languages best, particularly English. This is awkward because it is precisely the people reading other languages who stand to gain the most from Wikidata. The question is: how do we chip away at this language barrier?

Giving Wikidata's data an application is the best way to entice people to give Wikidata a second look. Here are two:
  • Commons is being wikidatified and it now supports a "depicts" statement. As more labels become available in a language, finding pictures in "your" language becomes easy and obvious. It just needs an application.
  • Many subjects are likely to be of interest in a language. Why not have projects like the Africa project with information about Africa shared and updated by the Listeria bot? Add labels and it becomes easier to use, link to Reasonator for understanding and add articles for a Wikipedia to gain content.
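As a sketch of how little such an application needs to start with, here is the kind of query it could send to the Wikidata Query Service. The endpoint and the properties (P31, rdfs:label) are real Wikidata conventions; the example class and language below are arbitrary choices of mine:

```python
# Build a SPARQL query for the Wikidata Query Service
# (https://query.wikidata.org/sparql): find items of a given class
# that already carry a label in a chosen language.

def label_query(class_qid: str, lang: str, limit: int = 10) -> str:
    """SPARQL for items that are instances (P31) of class_qid,
    restricted to those with a label in lang."""
    return f"""
SELECT ?item ?label WHERE {{
  ?item wdt:P31 wd:{class_qid} ;
        rdfs:label ?label .
  FILTER(LANG(?label) = "{lang}")
}}
LIMIT {limit}
""".strip()

# e.g. lakes (wd:Q23397) that have a Hindi label
query = label_query("Q23397", "hi")
print(query)
```

An application in "your" language is then just this query, a language code, and a page that renders the results.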
Key is the application of our data. Wikidata includes a lot; the objective is to find the labels, and we will when the results are immediately applicable. It will also help when we consider the marketing opportunities that help foster our goals.


@Wikidata - #Quality is in the network

13:15, Saturday, 7 September 2019 UTC
What amounts to quality is a recurring and controversial subject. For me quality is not so much in the individual statements for a particular Wikidata item, it is in how it links to other items.

As always, there has to be a point to it. You may want to write Wikipedia articles about chemists, artists, or award winners. You may want to write to make the gender gap less in your face, but whom to write about?

Typically, connecting to small subsets is best. However, we want to know about the distribution of genders, so it is very relevant to add a gender. Statistically it makes no difference in the big picture, but for subsets such as the co-authors of a scientist, a profession, or an award, the additional data helps us understand how the gender gap manifests itself.
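To make that point concrete, here is a toy sketch in Python. The data is invented, standing in for a small subset (say, the co-authors of one scientist) pulled from Wikidata gender statements:

```python
from collections import Counter

# Invented toy data: (person, gender) pairs for one small subset,
# e.g. the co-authors of a single scientist.
coauthors = [
    ("A", "male"), ("B", "male"), ("C", "female"), ("D", "male"),
]

def gender_share(people):
    """Fraction of each gender within the subset."""
    counts = Counter(gender for _, gender in people)
    total = sum(counts.values())
    return {gender: n / total for gender, n in counts.items()}

print(gender_share(coauthors))  # {'male': 0.75, 'female': 0.25}
```

Globally such a handful of statements changes nothing, but per subset the gap becomes visible at a glance.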

The inflation of "professions" like "researcher" is such that the label is no longer distinctive; at most it helps with disambiguation from, for instance, soccer stars. When a more precise profession is known, like "chemist" or "astronomer" (both subclasses of researcher), it is best to remove "researcher" as it is implied.

Lists like the members of the "Young Academy of Scotland" have their value when they link as widely as possible. Considering only Wikidata misses the point; it is particularly the links to the organisations and the authorities (ORCiD, Google Scholar, VIAF), but also Twitter, as for this psychologist. We may have links to all of them, the papers, the co-authors. But do we provide quality when people do not go down the rabbit hole?

On a germ trail

04:22, Thursday, 5 September 2019 UTC

Hidden away in the little Himalayan town of Mukteshwar is a fascinating bit of science history. Cattle and livestock mattered a great deal in the pre-engine past, for transport and power on farms and in cities, but especially for people in power. Hyder Ali and Tipu were famed and feared for their ability to move their guns rapidly, most famously making use of bullocks of the Amrut Mahal and Hallikar breeds. The subsequent British conquerors saw the value and maintained large numbers of them, at the Commissariat farm in Hunsur for instance.

The Commissariat Farm, Hunsur
Photo by Wiele & Klein, from: The Queen's Empire. A pictorial and descriptive record. Volume 2.
Cassell and Co. London (1899). [p. 261]
The original photo caption given below, while racy, was most definitely inaccurate;
these animals were not maintained for beef:

It is said that the Turkish soldier will live and fight upon a handful of dates and a cup of water, the Greek upon a few olives and a pound of bread—an excellent thing for the commissariats of the two armies concerned, no doubt! But though Turk and Greek will be satisfied with this Spartan fare, the British soldier will not—not if he can help it, that is to say. Sometimes he cannot help it, and then it is only just to him to admit that he bears himself at a pinch as a soldier should, and is satisfied with what he can get. But what the British soldier wants is beef, and plenty of it : and he is a wise and provident commander who will contrive that his men shall get what they want. Here we see that the Indian Government has realised this truth. The picture represents the great Commissariat Farm at Hunsur in Mysore, where the shapely long-horned bullocks are kept for the use of the army.
Report of the cattle plague commission
led by J.H.B. Hallen (1871)

Imagine the situation when cattle die off in their millions - the estimated deaths of cows and buffaloes in 1870 was 1 million. Around 1871, this rang alarm bells high enough for a committee to examine the situation. Britain had had a major "cattle plague" outbreak in 1865, so the matter was not unknown to the public. The generic term for the mass deaths was "murrain", a rather old-fashioned word that refers to an epidemic disease in sheep and cattle, derived from the French word morine, or "pestilence," with roots in Latin mori, "to die." A commission headed by Staff Veterinary Surgeon J.H.B. Hallen went across what would best be called the "cow belt" of India and noted, among other things, that the cattle in the hills were doing better and that rivers helped isolate the disease. Remarkably, there were two little-known Indian members - Mirza Mahomed Ali Jan (a deputy collector) and Hem Chunder Kerr (a magistrate and collector). The report includes 6 maps with spots where the outbreaks occurred in each year from 1860 to 1866, and the spatial approach to epidemiology is dominant. This is perhaps unsurprising given that the work of John Snow would have been fresh in medical minds. One point in the report that caught my eye was "Increasing civilization, which means in India clearing of jungle, making of roads, extended agriculture, more communication with other parts, buying and selling, &c, provides greater facilities for the spread of contagious diseases of stock." The committee identified the largest number of deaths to be caused by rinderpest. Rinderpest has a very long history, and its attacks in Europe are quite well documented. There had been two veterinary congresses in Europe that looked at rinderpest. One of the early researchers was John Burdon Sanderson (a maternal grand-uncle of J.B.S. Haldane), who noted that the blood of infected cattle was capable of infecting others even before the source individual showed any symptoms of the disease.
He also examined the relationship to smallpox and cowpox through cross-vaccination and examination for resistance. C.A. Spinage, in his brilliant (but European-focused) book The Cattle Plague: A History (2003), notes that rinderpest is a morbillivirus, belonging to the paramyxoviruses, which probably existed in Pleistocene bovids; perhaps the first relative to jump to humans was measles, associated with the domestication of cattle. The English believed that the origin of rinderpest lay in Russia. The Russians believed it came from the Mongols.
Gods slaandehand over Nederland, door de pest-siekte onder het rund vee
[God's lashing hand over the Netherlands, due to the plague disease among cattle]
Woodcut by Jan Smits (1745) - cattle epidemics evoked theological explanations
The British government made a grant of £5,000 in 1865 for research into rinderpest, which was apparently the biggest investment in medical research up to that point in time. This was also a period of epidemic cholera, mainly affecting the working class, and it was noted that hardly any money was spent on it. (Spinage: 328) The result was that a very wide variety of cures were proffered, and Spinage provides an amusing overview. One cure claim came from a Mr M. Worms of Ceylon and involved garlic, onion, and asafoetida. Worms was somehow related to Baron Rothschild, and the cure was apparently tested on some of Rothschild's cattle, with some surprising recoveries. Inoculation, as in smallpox treatments, was tried by many and often resulted in infection and death of the animals.

As for the Indian scene, it appears that the British government did not do much based on the Hallen committee report. There were attempts to regulate the movement of cattle, but the idea that the disease could be prevented through inoculation or vaccination had to wait. In the 1865 outbreak in Britain, one of the control measures was the killing and destruction of infected cattle at the point of import; this finally brought an end to outbreaks in 1867. Several physicians in India tried experiments in inoculation. Natural immunity was noted in India, and animals that overcame the disease were valued by their owners. In 1890 Robert Koch was called into service in the Cape region on the suggestion of Dr J. Beck. In 1897 Koch announced that bile from infected animals could induce resistance on inoculation. Koch was then sent on to India to examine the plague, leaving behind William Kolle to continue experiments in a disused mine building at Kimberley belonging to De Beers. Around the same time, experiments were conducted by Herbert Watkins-Pitchford and Arnold Theiler, who found that serum from cattle that had recovered worked as an effective inoculation. They, however, failed to publish and received little credit. That Koch, a German, beat the English researchers was a cause of hurt pride.

The Brown Institution was destroyed in 1944 by German bombing
Interesting to see how much national pride was involved in all this. The French had established an Imperial Bacteriological Institute at Constantinople, with Louis Pasteur as their leading light; it was mostly headed by Pasteur Institute alumni. Maurice Nicolle and Adil-Bey were involved in rinderpest research there, and they demonstrated that the causal agent was small enough to pass through bacterial filters.

In India, Alfred Lingard was chosen in 1890 to examine the problems of livestock diseases and to find solutions. Lingard had gained his research experience at the Brown Animal Sanatory Institution, whose workers included John Burdon Sanderson. About six years earlier, Robert Koch had caused more embarrassment to the British establishment by identifying the cholera-causing bacterium in Calcutta. Koch had, however, not demonstrated that his bacterial isolate could cause disease in uninfected animals, thereby failing one of the tests for causality that now go by the name of Koch's postulates. There were several critiques by British researchers who had been working for a while on cholera in India; these included David Douglas Cunningham (who was also a keen naturalist and wrote a couple of general natural history books) and T.R. Lewis (who had spent some time with German researchers). The British government (whose bureaucrats were especially worried about quarantine measures for cholera and had a preference for old-fashioned miasma theories of disease) felt the need for a committee to examine the conflict between the English and German claims, and they presumably chose someone with a knowledge of German for it: Emanuel Edward Klein, assisted by Heneage Gibbes. Klein was also from the Brown Animal Sanatory Institution and had worked with Burdon Sanderson. Now Klein, the Brown Institution, Burdon Sanderson, and many of the British physiologists had come under the attack of the anti-vivisection movement.
During the court proceedings that examined the anti-vivisectionists' claims of cruelty to animals, Klein, an East European of Jewish descent with a poor knowledge of English, made rather shocking statements that served as fodder for some science fiction written in that period, with evil characters bearing a close resemblance to Klein! Even Lingard had been accused of cruelty, for feeding chickens with the lungs of tuberculosis patients to examine whether the disease could be transmitted. E.H. Hankin, the man behind the Ganges bacteriophages, had also been associated with the vivisection researchers, and the British Indian press had even called him a vivisector who had escaped to India.

Lingard initially worked in Pune, but he found the climate unsatisfactory for work on anti-rinderpest sera. In 1893 he moved the laboratory to the then-remote mountain town of Mukteshwar (or Muktesar, as the British records have it), where his first laboratory burnt down in a fire. In 1897 Lingard invited Koch and others to visit, and Koch's bile method was demonstrated. The institution, by then given the grand name of Imperial Bacteriological Laboratory, was rebuilt and continues to exist as a unit of the Indian Veterinary Research Institute. Lingard was able to produce rinderpest serum in this facility, producing 468,853 doses between 1900 and 1905, with mortality among inoculated cattle as low as 0.43%. The institute grew to produce 1,388,560 doses by 1914-15. Remarkably, several countries joined hands in 1921 to attack rinderpest and other livestock diseases, and it is claimed that rinderpest is now the second virus (after smallpox) to have been eradicated. The Muktesar institution and its surroundings were also greatly modified with dense plantations of deodar and other conifers. Today this quiet little village, centered around a temple to Shiva, is visited by waves of tourists, and all along the route one can see the horrifying effects of land being converted for housing and apartments.

The Imperial Bacteriological Laboratory c. 1912 (rebuilt after the fire)
The commemorative column, as seen in 2019.
Upper corridor
A large autoclave made by Manlove & Alliott, Nottingham.
Stone marker
A cold storage room built into the hillside
Koch in 1897 at Muktesar
Seated: Lingard, Koch, Pfeiffer, Gaffky

The habitat c. 1910. One of the parasitologists, a Dr Bhalerao,
described parasites from king cobras shot in the area.

The crags behind the Mukteshwar institute, Chauli-ki-Jhali, a hole in a jutting sheet of rock (behind and not visible)
is a local tourist attraction.
Here then are portraits of three scientists who were tainted in the vivisection debate in Britain, but who were able to work in India without much trouble.
E.H. Hankin

Alfred Lingard

Emanuel Edward Klein

The cattle plague period coincides with some of the largest reported numbers of Greater Adjutant storks, and perhaps also with a period when vultures prospered, feeding on the dead cattle. We have already seen that Hankin was quite interested in vultures. Cunningham notes the decline in adjutants in his Some Indian Friends and Acquaintances (1903). The anti-vivisection movement, like other minority British movements such as the vegetarian movement, found friends among many educated Indians, and we know of the participation of such people as Dr Pranjivan Mehta thanks to the work of the late Dr. S. R. Mehrotra. There was also an anti-vaccination movement; we know it caused (and continues to cause) enough conflict in the case of humans, but there appears to be little literature on opposition to vaccinating livestock in India.

Further reading
Thanks are due to Dr Muthuchelvan and his colleague for an impromptu guided tour of IVRI, Mukteshwar.
The Imperial Bacteriologist - Alfred Lingard in this case in 1906 - was apparently made "Conservator" for the "Muktesar Reserve Forest" and the 10 members of the "Muktesar Shikar Club" were given exemption from fees to shoot carnivores on their land in 1928. See National Archives of India document.
Klein, Gibbes and D.D. Cunningham were also joined by H.V. Carter (who contributed illustrations to Gray's Anatomy - more here).

Perf Matters at Wikipedia 2015

19:51, Wednesday, 4 September 2019 UTC

Hello, WANObjectCache

This year we achieved another milestone in our multi-year effort to prepare Wikipedia for serving traffic from multiple data centres.

The MediaWiki application that powers Wikipedia relies heavily on object caching. We use Memcached as a horizontally scaled key-value store, and we’d like to keep the cache local to each data centre. This minimises dependencies between data centres, and makes better use of storage capacity (based on local needs).

Aaron Schulz devised a strategy that makes MediaWiki caching compatible with the requirements of a multi-DC architecture. Previously, when source data changed, MediaWiki would recompute and replace the cache value. Now, MediaWiki broadcasts “purge” events for cache keys. Each data centre receives these and sets a “tombstone”, a marker lasting a few seconds that limits any set-value operations for that key to a minuscule time-to-live. This makes it tolerable for recache-on-miss logic to recompute the cache value using local replica databases, even though they might have several seconds of replication lag. Heartbeats are used to detect the replication lag of the databases involved during any re-computation of a cache value. When that lag is more than a few seconds (a large portion of the tombstone period), the corresponding cache set-value operation automatically uses a low time-to-live. This means that large amounts of replication lag are tolerated.
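The tombstone mechanism can be sketched as a toy in-memory cache. Everything here (class name, constants, method names) is invented for illustration; it is not WANObjectCache's actual API, only the idea: a purge marks the key, and while that mark is live, any set is capped to a tiny TTL so a value computed from a lagged replica cannot stick around.

```javascript
// A toy sketch of tombstone-based purging. All names and TTL values
// are illustrative; this is not MediaWiki's real WANObjectCache.
const TOMBSTONE_TTL = 10; // seconds a purge tombstone remains active
const TINY_TTL = 1;       // seconds, used while a tombstone is active

class ToyWANCache {
  constructor(now = () => Date.now() / 1000) {
    this.now = now;               // injectable clock, in seconds
    this.values = new Map();      // key -> { value, expiresAt }
    this.tombstones = new Map();  // key -> purge timestamp
  }

  // A broadcast purge does not recompute anything; it only marks the key.
  purge(key) {
    this.values.delete(key);
    this.tombstones.set(key, this.now());
  }

  set(key, value, ttl) {
    const purgedAt = this.tombstones.get(key);
    // While the tombstone is live, cap the TTL so a possibly stale
    // recomputation (from a lagged replica) expires almost immediately.
    if (purgedAt !== undefined && this.now() - purgedAt < TOMBSTONE_TTL) {
      ttl = Math.min(ttl, TINY_TTL);
    }
    this.values.set(key, { value, expiresAt: this.now() + ttl });
  }

  getWithSet(key, ttl, computeFromReplica) {
    const entry = this.values.get(key);
    if (entry && entry.expiresAt > this.now()) {
      return entry.value;
    }
    const value = computeFromReplica(); // may read a lagged local replica
    this.set(key, value, ttl);
    return value;
  }
}
```

The point is that the purge broadcast stays tiny and cheap; correctness comes from capping what later writers may do, not from centrally recomputing the value.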

This and other aspects of WANObjectCache’s design allow MediaWiki to trust that cached values are not substantially more stale than a local replica database, provided that cross-DC broadcasting of the tiny in-memory tombstones is not disrupted.

First paint time now under 900ms

In July we set out a goal: improve page load performance so our median first paint time would go down from approximately 1.5 seconds to under a second – and stay under it!

I identified synchronous scripts as the single biggest task blocking the browser between the start of a page navigation and the first visual change seen by Wikipedia readers. We had used async scripts before, but converting these last two scripts to be asynchronous was easier said than done.

There were several blockers to this change, including the use of embedded scripts by interactive features. These were partly migrated to CSS-only solutions. For the remaining features, we introduced the notion of “delayed inline scripts”. Embedded scripts now wrap their code in a closure and add it to an array. Once the module loader arrives, it processes the closures from the array and executes the code within.
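The queue-a-closure pattern might look roughly like this. The global name `deferredQueue` and the function `onLoaderReady` are invented for illustration (this is not MediaWiki's actual code); in a browser the global object would be `window`, and `globalThis` is used here only so the sketch is portable.

```javascript
// Sketch of the "delayed inline scripts" pattern. Names are
// illustrative, not MediaWiki's real globals.

// Emitted early in the HTML, before any async <script>:
globalThis.deferredQueue = globalThis.deferredQueue || [];

// Each embedded script wraps its code in a closure and queues it
// instead of executing immediately:
globalThis.deferredQueue.push(function () {
  // ... feature code that needs the (async) module loader ...
});

// When the async module loader finally arrives, it drains the queue,
// then replaces push() so that any later closures run immediately:
function onLoaderReady() {
  const pending = globalThis.deferredQueue;
  globalThis.deferredQueue = { push: (fn) => fn() };
  pending.forEach((fn) => fn());
}
```

Because the queue is a plain array defined by a one-line inline snippet, nothing on the page has to block on the loader; the loader simply takes over the queue whenever it shows up.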

Another major blocker was the subset of community-developed gadgets that didn’t yet use the module loader (introduced in 2011). These legacy scripts assumed a global scope for variables, and depended on browser behaviour specific to serially loaded, synchronous, scripts. Between July 2015 and August 2015, I worked with the community to develop a migration guide. And, after a short deprecation period, the legacy loader was removed.

Line graph that plots the firstPaint metric for August 2015. The line drops from approximately one and a half seconds to 890 milliseconds.

Hello, WebPageTest

Previously, we only collected performance metrics for Wikipedia from sampled real-user page loads. This is super and helps detect trends, regressions, and other changes at large. But, to truly understand the characteristics of what made a page load a certain way, we need synthetic testing as well.

Synthetic testing offers frame-by-frame video captures, waterfall graphs, performance timelines, and above-the-fold visual progression. We can run these tests automatically (e.g. every hour) for many URLs, on many different browsers and devices, and from different geographic locations. This lets us analyse performance in depth, compare runs over any period of time and across different factors, and take snapshots of how pages were built at a certain point in time.

The results are automatically recorded into a database every hour, and we use Grafana to visualise the data.

In 2015 Peter built out the synthetic testing infrastructure for Wikimedia, from scratch. We use the open-source WebPageTest software. To read more about its operation, check Wikitech.

The journey to Thumbor begins

Gilles evaluated various thumbnailing services for MediaWiki. The open-source Thumbor software came out as the most promising candidate.

Gilles implemented support for Thumbor in the MediaWiki-Vagrant development environment.

To read more about our journey to Thumbor, read The Journey to Thumbor (part 1).

Save timing reduced by 50%

Save timing is one of the key performance metrics for Wikipedia. It measures the time from when a user presses “Publish changes” when editing – until the user’s browser starts to receive a response. During this time, many things happen. MediaWiki parses the wiki-markup into HTML, which can involve page macros, sub-queries, templates, and other parser extensions. These inputs must be saved to a database. There may also be some cascading updates, such as the page’s membership in a category. And last but not least, there is the network latency between user’s device and our data centres.

This year saw a 50% reduction in save timing. At the beginning of the year, median save timing was 2.0 seconds (quarterly report). By June, it was down to 1.6 seconds (report), and in September 2015, we reached 1.0 seconds! (report)

Line graph of the median save timing metric, over 2015. Showing a drop from two seconds to one and a half in May, and another drop in June, gradually going further down to one second.

The effort to reduce save timing was led by Aaron Schulz. The impact that followed was the result of hundreds of changes to MediaWiki core and to extensions.

Deferring tasks to post-send

Many of these changes involved deferring work to happen post-send. That is, after the server sends the HTTP response to the user and closes the main database transaction. Examples of tasks that now happen post-send are: cascading updates, emitting “recent changes” objects to the database and to pub-sub feeds, and doing automatic user rights promotions for the editing user based on their current age and total edit count.
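The idea can be sketched as a minimal deferred-update queue. This is an illustration of the pattern only, not MediaWiki's actual DeferredUpdates class; the class, function, and event names below are invented for the example.

```javascript
// Sketch of post-send deferral: work registered during a request is
// queued, the response is sent, and only then is the queue flushed.
// Names are illustrative, not MediaWiki's real API.
class DeferredUpdates {
  constructor() { this.queue = []; }
  addUpdate(fn) { this.queue.push(fn); }
  // Called only after the HTTP response has been flushed to the client.
  doUpdatesPostSend() {
    while (this.queue.length) {
      this.queue.shift()();
    }
  }
}

const events = [];
const deferred = new DeferredUpdates();

function handleRequest(send) {
  // Work the response actually depends on happens first:
  events.push('parse and save edit');
  // Secondary writes are merely queued, not executed:
  deferred.addUpdate(() => events.push('cascading category update'));
  deferred.addUpdate(() => events.push('emit recent-changes event'));
  send();                        // the user receives the response here
  deferred.doUpdatesPostSend();  // queued work runs afterwards
}
```

The user-visible latency now covers only the first step; the secondary writes still happen on the same server, just after the response is out the door.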

Aaron also implemented the “async write” feature in the multi-backend object cache interface. MediaWiki uses this for storing the parser cache HTML in both Memcached (tier 1) and MySQL (tier 2). The second write now happens post-send.

By re-ordering these tasks to occur post-send, the server can send a response back to the user sooner.

Working with the database, instead of against it

A major category of changes was improvements to database queries. For example, reducing lock contention in SQL, refactoring code in a way that reduces the amount of work done between two write queries in the same transaction, splitting large queries into smaller ones, and avoiding use of database master connections whenever possible.

These optimisations reduced the chances of queries stalling and allowed them to complete more quickly.

Avoid synchronous cache re-computations

The aforementioned work on WANObjectCache also helped a lot. Whenever we converted a feature to use this interface, we reduced the amount of blocking cache computation that happened mid-request. WANObjectCache also performs probabilistic preemptive refreshes of near-expiring values, which can prevent cache stampedes.
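Probabilistic preemptive refresh can be sketched as follows. The exponential weighting and the constants here are illustrative choices for the example, not WANObjectCache's actual parameters; the idea is that as a key nears expiry, each reader has a small but growing chance of recomputing it early, spreading refreshes out instead of having every request stampede the backend the moment the value lapses.

```javascript
// Sketch of probabilistic early refresh (stampede avoidance).
// Constants and the weighting function are illustrative only.
function shouldRefreshEarly(ageSeconds, ttlSeconds, rand = Math.random) {
  const remaining = ttlSeconds - ageSeconds;
  if (remaining <= 0) {
    return true; // fully expired: must recompute
  }
  // Chance rises from ~0 (fresh value) toward 1 (about to expire).
  const chance = Math.exp(-remaining / (0.1 * ttlSeconds));
  return rand() < chance;
}
```

A caller would check this on every cache hit and, when it returns true, recompute the value in the background while still serving the current one.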

Profiling can be expensive

We disabled the performance profiler of the AbuseFilter extension in production. AbuseFilter allows privileged users to write rules that may prevent edits based on certain heuristics. Its profiler would record how long the rules took to inspect an edit, allowing users to optimise them. The way the profiler worked, though, added a significant slowdown to the editing process. Work began in 2016 to create a new profiler, which has since been completed.

And more

Lots of small things. Including fixing the User object cache, which existed but wasn’t working. And no longer caching values in Memcached when computing them is faster than the Memcached round-trip required to fetch them!

We also improved latency of file operations by switching more LBYL-style coding patterns to EAFP-style code. Rather than checking whether a file exists, is readable, and then checking when it was last modified – do only the latter and handle any errors. This is both faster and more correct (due to LBYL race conditions).

So long, Sajax!

Sajax was a library for invoking a subroutine on the server, and receiving its return value as JSON from client-side JavaScript. In March 2006, it was adopted in MediaWiki to power the autocomplete feature of the search input field.

The Sajax library had a utility for creating an XMLHttpRequest object in a cross-browser-compatible way. MediaWiki deprecated Sajax in favour of jQuery.ajax and the MediaWiki API. Yet, years later in 2015, this tiny part of Sajax remained popular in Wikimedia's ecosystem of community-developed gadgets.
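The cross-browser helper was roughly of this shape (an illustrative reconstruction of the era's common pattern, not Sajax's exact source): try the standard constructor first, then fall back to Internet Explorer's ActiveX objects.

```javascript
// Reconstruction of a circa-2006 cross-browser XMLHttpRequest
// factory. Illustrative only; not Sajax's actual code.
function createXhr() {
  if (typeof XMLHttpRequest !== 'undefined') {
    return new XMLHttpRequest(); // modern browsers
  }
  // Internet Explorer 5/6 fallbacks:
  try {
    return new ActiveXObject('Msxml2.XMLHTTP');
  } catch (e) {
    return new ActiveXObject('Microsoft.XMLHTTP');
  }
}
```

Wrappers like jQuery.ajax absorbed exactly this kind of branching, which is why the helper could finally be retired.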

The legacy library was loaded by default on all Wikipedia page views for nearly a decade. During a performance inspection this year, Ori Livneh decided it was high time to finish this migration. Goodbye Sajax!

Further reading

This year also saw the switch to encrypt all Wikimedia traffic with TLS by default. More about that on the Wikimedia blog.

— Timo Tijhof

Older blog entries