Celebrating 600,000 commits for Wikimedia

10:00, Saturday, 30 May 2020 UTC

Earlier today, the 600,000th commit was pushed to Wikimedia's Gerrit server. We thought we'd take this moment to reflect on the developer services we offer and our community of developers, be they Wikimedia staff, third-party workers, or volunteers.

At Wikimedia, we currently use a self-hosted installation of Gerrit to provide code review workflow management, and code hosting and browsing. We adopted this in 2011–12, replacing Apache Subversion.

Within Gerrit, we host several thousand repositories of code (2,441 as of today). This includes MediaWiki itself, plus all the many hundreds of extensions and skins people have created for use with MediaWiki. Approximately 90% of the MediaWiki extensions we host are not used by Wikimedia, only by third parties. We also host key Wikimedia server configuration repositories like puppet or site config, build artefacts like vetted docker images for production services or local .deb build repos for software we use like etherpad-lite, ancillary software like our special database exporting orchestration tool for dumps.wikimedia.org, and dozens of other uses.

Gerrit is not just (or even primarily) a code hosting service, but a code review workflow tool. Per the Wikimedia code review policy, all MediaWiki code heading to production should go through separate development and code review for security, performance, quality, and community reasons. Reviewers are required to use their "good judgement and careful action", which is a heavy burden, because "[m]erging a change to the MediaWiki core or an extension deployed by Wikimedia is a big deal". Gerrit helps them do this by providing clear views of what is changing, supporting itemised, character-level, file-level, or commit-level feedback and revision, allowing series of complex changes to be chained together across multiple repositories, and ensuring that forthcoming and merged changes are visible to product owners, development teams, and other interested parties.

Across all of our repositories, we average over 200 human commits a day, though activity levels vary widely. Some repositories have dozens of patches a week (MediaWiki itself gets almost 20 patches a day; puppet gets nearly 30), whereas others get a patch every few years. There are over 8,000 accounts registered with Gerrit, although activity is not distributed uniformly throughout that cohort.

To focus engineer time where it's needed, a fair amount of low-risk development work is automated. This happens in both creating patches and also, in some cases, merging them.

For example, for many years we have partnered with TranslateWiki.net's volunteer community to translate and maintain MediaWiki interfaces in hundreds of languages. Exports of translators' updates are pushed and merged automatically by one of the TWN team each day, helping our users keep a fresh, usable system whatever their preferred language.

Another key area is LibraryUpgrader, a custom tool to automatically upgrade the libraries we use for continuous integration across hundreds of repositories, allowing us to make improvements and increase standards without a single central breaking change. Indeed, the 600,000th commit was one of these automatic commits, upgrading the version of the mediawiki-codesniffer tool in the GroupsSidebar extension to the latest version, ensuring it is written following the latest Wikimedia coding conventions for PHP.
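The post does not describe how LibraryUpgrader works internally. As a rough illustration only, the core "bump one dev dependency" step for a PHP repository could look like the Python sketch below. It assumes the tool edits the `require-dev` section of a repository's `composer.json` and that the file uses tab indentation. The Composer package name `mediawiki/mediawiki-codesniffer` is real, but the helper name `bump_dev_dependency` and the version numbers are hypothetical.

```python
import json


def bump_dev_dependency(composer_json: str, package: str, new_version: str) -> str:
    """Return `composer_json` with `package` in "require-dev" set to `new_version`.

    Hypothetical helper for illustration; not LibraryUpgrader's actual code.
    Files that do not list the package as a dev dependency are left unchanged.
    """
    data = json.loads(composer_json)  # dicts are insertion-ordered, so key order survives
    dev_deps = data.get("require-dev", {})
    if package in dev_deps:
        dev_deps[package] = new_version
    # Assumes the repository's file uses tab indentation and ends with a newline.
    return json.dumps(data, indent="\t", ensure_ascii=False) + "\n"


if __name__ == "__main__":
    # Hypothetical version numbers, chosen only to show the change.
    before = '{\n\t"require-dev": {\n\t\t"mediawiki/mediawiki-codesniffer": "29.0.0"\n\t}\n}\n'
    after = bump_dev_dependency(before, "mediawiki/mediawiki-codesniffer", "30.0.0")
    print(after)
```

A real upgrade would presumably also need to confirm that the repository's tests and linters still pass before a commit is pushed for review. This sketch covers only the version bump.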

Right now, we're working on upgrading our installation of Gerrit, moving from our old version based on the 2.x branch through 2.16 to 3.1, which will mean a new user interface and other user-facing changes, as well as improvements behind the scenes. More on those changes will be coming in later posts.


Header image: A vehicle used to transport miners to and from the mine face by 'undergrounddarkride', used under CC-BY-2.0.

Are you familiar with occupational epidemiology? It’s the study of whether working conditions are safe for workers. As workplaces determine whether or not it’s safe to open up facilities again and resume “normal” work amidst a global pandemic, organization leaders are ideally making these important decisions with science and employee safety in mind.

Public health students in Tania Carreon-Valencia and Thais Morata’s course at the University of Cincinnati exercised their science communication muscles this spring as they added worker health and workplace safety information to Wikipedia. These topics are at the forefront of our collective consciousness right now as we contemplate (locally and globally) what “returning to work” looks like. And Wikipedia has proven to be a valuable resource during the pandemic as the world seeks updates on what to do.

While there may not be lots of peer-reviewed research yet about the effects of the pandemic on essential workers, it’s still worth keeping these topics up to date as information becomes available. Being aware of the risks of dental aerosols (a new Wikipedia page created by one of these students) might cause workplaces to contemplate how else coronavirus can spread and take precautions for reducing the risk. As this new page will inform you, the instruments that dentists use to probe and clean your teeth create aerosols that can pose a risk to clinicians and other patients. These dental aerosols even have the possibility to transmit diseases by spreading viruses, including SARS-CoV-2, which causes COVID-19. This is why on March 16th, 2020, the American Dental Association advised dentists to postpone all elective procedures. This student’s work has already been viewed more than 1,300 times, showing that even seemingly obscure topics can fill the information needs of many.

Another student improved the Wikipedia page about incident stress: the behavioral, emotional, and physical symptoms a frontline worker might experience after a traumatic event on the job. While no method is completely effective at preventing incident stress, there are ways to reduce its impact on the affected person. Possible steps toward maintaining on-site health include maintaining nutrition and rest; limiting exposure to further stimuli, like noise; ensuring that the employer is prepared to respond to cases of incident stress; and more. Thanks to a student, these steps are now captured in a brand-new “prevention” section of the corresponding Wikipedia page.

And the Wikipedia page about shift work sleep disorder, which consistently receives about 150 views a day, saw quite a few improvements in April. The disorder causes adverse health effects in people whose work schedule disrupts their typical sleeping patterns. A student added that it often goes undiagnosed and that the health effects include increased risk of bone fractures, low fertility, obesity, diabetes, decreased immune functioning, and negative effects on mental health. The page now also makes clear that sleep deprivation may lead to medical errors, workplace accidents, and low productivity. And it includes more methods through which decreased sleep quality can be assessed. The page has received 10,000 visits since this student made these changes.

The Wikipedia writing assignment was internationally recognized as an important tool for science communication around public health by the National Institute for Occupational Safety and Health (NIOSH) in 2019. NIOSH recognizes that Wikipedia makes research “usable” for the general public and lauds the site for policies that make information verifiable for readers. When Wikipedia is one of the leading sources for medical information out there, making sure that information is rooted in the latest science is hugely important. And students are great folks to do that work (with the assistance of their expert instructors and our Wikipedia training materials). Let’s make sure workers know their rights and that employers are up to date on science that can best prepare them to make positive decisions for their employees.


Interested in incorporating a Wikipedia writing assignment into a future course? Visit teach.wikiedu.org for all you need to know to get started. And here are some tips for incorporating the assignment into a virtual course.


Thumbnail image by Gmihail, via Wikimedia Commons (CC BY-SA 3.0 RS).

Monthly Report, March 2020

18:25, Thursday, 28 May 2020 UTC

Highlights

  • As a result of the global COVID-19 pandemic, Wiki Education closed its office in the Presidio and moved all its operations online. In order to deal with the new situation, staff created a contingency and a crisis communications plan for each program. We also instituted a weekly COVID-19 briefing aimed at creating a shared understanding of how the pandemic affects our organization. A “Friday virtual social hour” helps staff deal with being isolated at home.
  • March 2020 also saw dramatic changes to the higher education landscape as the vast majority of courses in the U.S. moved to online platforms as a result of the outbreak of COVID-19. It was a chaotic time for our instructors and students as they all adjusted to this new mode of learning, and Wiki Education was there to help. Wikipedia Student Program Manager Helaine Blumenthal checked in on courses to see if they needed additional help and to let them know that Wiki Education’s support would remain uninterrupted. We were truly heartened to hear from so many of our instructors as we all adjust to these new circumstances both in our professional and personal lives. We are grateful that we can continue to work with our instructors and students during this challenging time, and hope we can provide our students with a meaningful educational experience whether they are on or off campus.
  • We launched the third Scholars & Scientists course in partnership with the Society of Family Planning (SFP) to improve Wikipedia articles related to abortion and contraception. We know that Wikipedia plays a significant role in the research people do about health and medicine, and we are happy to work with SFP to ensure the public has access to the highest quality information about family planning.


For student work highlights; examples of great work from our Scholars & Scientists, Wikidata, and Visiting Scholars Programs; finance and fundraising updates; and more, read our full report here.


Header/thumbnail image by Marcela McGreal (CC BY 2.0) shows protesters in New York, was uploaded to Wikimedia Commons by a student in Amy Carleton’s English course at Northeastern University, and is used in the Wikipedia article Asian American university resource center. Read this month’s report for more examples of great student work.
I have a renewed interest in Commons because the first steps have been made to make it actually useful. According to Wikidata, there are two distinct people named Sarah T. Roberts. One is an epidemiologist; the other works in information and media studies.

At Commons it was a mess: the picture of one Sarah was used to illustrate the infobox of the other. How exactly I fixed it is not that interesting; what is relevant is that I did. I did it because you will find things when there is a label for them in "your" language.

Given that we do not research the use of Commons, or of Wikidata for that matter, why should the WMF give priority to opening up Commons even further? After all, there is no data to support it.
Thanks,
      GerardM

Traditionalism on the wikis

02:01, Thursday, 28 May 2020 UTC

There's a discussion happening on wikimedia-l; I had no idea of the "highest ideal of the Prussian civil servant", but it sounds sensible. Ziko van Dijk writes:

It seems to me that many Wikipedians or Wikimedians think of themselves as being progressive and modern. Our wikis are a tribute to science and enlightenment. Spontaneity and a laissez-faire-attitude are held in high regard; "productive chaos" and "anarchy" are typical for wikis.

When I had a closer look at our values and ideas, I got the impression that the opposite is true. Many attitudes and ideals sound to me more like bureaucracy and traditionalism:

  • being thorough, with regard to content and writing about it
  • community spirit
  • treating everyone equally without regard of the person (the highest ideal of the Prussian civil servant)
  • individual initiative
  • reliability

Enhancing the disability healthcare information on Wikipedia is a powerful way to combat misinformation, discrimination, and prejudice around disability and disability healthcare. The online encyclopedia is the most utilized healthcare resource in the world, with a reach of 500 million readers per month. Policy makers, doctors, and others need to understand the diverse communities they serve and the existing barriers for adults with disabilities, and Wikipedia’s content can help institutions adopt more inclusive practices. But the quality of content on Wikipedia varies widely, and the volunteers who write it may not have access to expensive medical journal articles or an understanding of the evolving field of Disability Studies. The majority of Wikipedia pages related to developmental disabilities need significant improvement. Many get hundreds of page views a day, indicating a demand for content that just isn’t complete.

Wiki Scientist Kathleen Downes was less than impressed with the depiction of spastic cerebral palsy on Wikipedia, so she uploaded a photo of herself as a child. (CC BY-SA 4.0)

Thanks to a grant from the WITH Foundation, our first WITH Wiki Scientists course helped combat these problems. We supported a group of 20 experts as they worked to add more than 11,000 words to Wikipedia about topics like spastic cerebral palsy, diagnostic overshadowing, the Civil Rights Act of 1968, special needs dentistry, the connection between sexual abuse and intellectual disability, and much more. Together, their work has been viewed more than 224,000 times, and it will live on long beyond the course.

When we approached the WITH Foundation last year with the idea to run a Wikipedia training course for disability healthcare professionals, we hoped the course would be interesting and impactful for prospective participants. We worked with the WITH Foundation to share this opportunity with their networks and were excited to receive almost twice as many applications as there were seats available, including 14 from members of the American Academy of Developmental Medicine & Dentistry (AADMD). In our first WITH Wiki Scientists course this spring, we were able to support 9 of those members. Next month, we will present at the upcoming AADMD virtual conference to share these medical professionals’ impact on public scholarship.

We had hoped to use the conference presentation to share this virtual learning opportunity with AADMD members, but the coronavirus pandemic has pushed the virtual presentation beyond the registration deadline. If you’re attending the AADMD virtual conference, please join Director of Partnerships Jami Mathewson on Thursday, June 18, 2020 7:30-8:00 PM EDT. The 30-minute session aims to achieve the following learning outcomes:

1. To understand Wikipedia as a means of public scholarship and increasing access to current academic research about developmental disabilities

2. To learn how medical professionals are making Wikipedia more inclusive for people with developmental disabilities

3. To understand how healthcare providers are applying their new Wikipedia knowledge in their daily professional lives

Whether or not you’re an AADMD member, if you’re interested in participating in our second cohort from June 15th–September 4th, please apply at wikiedu.org/with-AADMD by June 5th. We encourage adults with developmental disabilities to apply and/or spread the opportunity in your networks. Together, we can help ensure medical professionals can provide comprehensive healthcare to everyone.

These students in India have to do a project. The subject is Botswana. Their teacher wants them to find many pictures, so he searched Wikimedia Commons, among other sources, for pictures of Mokgweetsi Masisi, the president of Botswana. He marked the pictures that depict Mr Masisi, and now his pupils will find more pictures of him when they look for मोकेगसेसी मासी.

At the same time, in Japan, students have to do a project about Botswana. Their teacher is pleasantly surprised when he finds so many pictures for モクウィツィ・マシシ...
Thanks,
       GerardM

Commons app v2.13 beta

14:16, Tuesday, 26 May 2020 UTC

Hope you are all safe and well. We’ve just released v2.13 to beta, which includes:

– A new media details UI, which includes the ability to zoom and pan around images
– When the user uploads a picture with a geotag, the app will check for Nearby places that need photos around that location, and if one is found, it will ask the user “Is this a picture of Place X?”
– Modifications to Nearby filters based on user feedback
– Bug and crash fixes for stuff that got broken by the codebase overhaul

Our next release will likely contain structured data integration, bookmarks for the Nearby map, and a couple of other new features.

Stay tuned!

@WikiCommons - meanwhile in a different universe

21:34, Monday, 25 May 2020 UTC
And again there was a discussion that it should not be this hard to find pictures in Commons. The big difference this time is that there is now a wealth of images that have been tagged for what they "depict". They are linked to Wikidata items and they have a wealth of labels in many, many languages. In essence it has always been an objective of Wikidata to share its content in any and all of the 300+ languages supported by a Wikipedia.

The ideas that floated around soon made it into a "proof of concept", and, as so often, it actually worked after a fashion. The first iteration was, in true Wikimedia tradition, English only. The proof of concept got Dutch as its second language; Hay Kranen, the developer, is Dutch. Now there are nine languages, and we are waiting for French to be the tenth.

So what does it do? You can look for pictures in Commons, which has 61 million media files, and when you search for pictures in your language, you will find them as long as Wikidata has a label in your language. This is, for instance, a result in Japanese, and this is the result in German.

What can you do to make it better? Add labels in your language for the things you want to find, and tag media files with what they depict. If nobody has translated the software into your language yet, you can even do that.
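The mechanism described in the last two paragraphs comes down to a join: a search term matches a Wikidata label in some language, and Commons files are tagged as depicting that item. The Python sketch below models this with in-memory dictionaries. It is not the proof of concept's actual code. The item IDs `Q_MASISI` and `Q_DOVE` and the file names are hypothetical placeholders, not real Wikidata or Commons identifiers.

```python
# Hypothetical placeholder item IDs; these are NOT real Wikidata Q-numbers.
LABELS: dict[str, dict[str, str]] = {
    "Q_MASISI": {"en": "Mokgweetsi Masisi", "ja": "モクウィツィ・マシシ"},
    "Q_DOVE": {"en": "collared dove", "nl": "Turkse tortel", "de": "Türkentaube"},
}

# Hypothetical Commons files, each tagged with the items it "depicts".
DEPICTS: dict[str, set[str]] = {
    "File:Masisi_2019.jpg": {"Q_MASISI"},
    "File:Collared_dove.jpg": {"Q_DOVE"},
    "File:Dove_on_roof.jpg": {"Q_DOVE"},
}


def items_with_label(term: str, lang: str) -> set[str]:
    """Items whose label in `lang` matches `term`, ignoring case."""
    return {
        qid
        for qid, by_lang in LABELS.items()
        if lang in by_lang and by_lang[lang].casefold() == term.casefold()
    }


def search(term: str, lang: str) -> list[str]:
    """Files depicting any item whose label in language `lang` is `term`."""
    wanted = items_with_label(term, lang)
    return sorted(name for name, items in DEPICTS.items() if items & wanted)


if __name__ == "__main__":
    print(search("モクウィツィ・マシシ", "ja"))
    print(search("Türkentaube", "de"))
    print(search("Türkentaube", "fr"))  # no French label yet, so nothing is found
```

The sketch also shows why adding labels helps. A search in French finds nothing until someone adds a French label, for example `LABELS["Q_DOVE"]["fr"] = "tourterelle turque"`. After that, the same files become findable without anyone retagging them.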

Why is this so relevant? Have you ever wondered how many pictures you find in one of the smaller languages using Google or Bing? Let me tell you, it is disappointing, to put it politely. Commons is the repository of the media files that illustrate all the Wikipedias, so yes, it covers "almost anything".

The Wikimedia Foundation has this big strategy for its movement to be inclusive. This is a wonderful opportunity to show how agile it is, and that it understands and supports a need that has been expressed for many, many years. The beauty is that the way forward has been shown in something that already works.

ABSOLUTELY, there will be challenges in integrating this functionality where it fulfills a need.

Luckily, it is not necessary for it all to be done in one go. The first step can be as little as taking the "proof of concept" and rewriting it in the preferred language of the WMF, internationalising and localising it, and keeping it stand-alone for now. The people who know about it will use it, and they will be the first to point out what more they want done. A priority will be to retain its KISSable nature.

The objective is to open up Commons. Open it up in any and all languages. For me it is obvious. I will gladly give it my attention in the expectation that both Wikidata and Commons actually find a public and have a purpose beyond what we do for ourselves.
Thanks,
      GerardM

Production Excellence #20: April 2020

16:23, Monday, 25 May 2020 UTC

How are we doing in our drive for operational excellence during these unprecedented times?

📊  Numbers for March and April
  • 3 documented incidents. [1]
  • 60 new Wikimedia-prod-error reports. [2]
  • 58 Wikimedia-prod-error reports closed. [3]
  • 178 currently open Wikimedia-prod-error reports in total. [4]

For more about recent incidents and pending actionables see Wikitech and Phabricator.


📉  Outstanding reports

Take a look at the workboard and look for tasks that could use your help.

→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Breakdown of recent months:

  • April 2019: Two reports closed, 2 of 14 left.
  • May: (All clear!)
  • June: 4 of 11 left (unchanged). ⚠️
  • July: 8 of 18 left (unchanged).
  • August: 2 of 14 reports left (unchanged).
  • September: 7 of 12 left (unchanged).
  • October: Two reports closed, 4 of 12 left.
  • November: One report closed, 4 of 5 left.
  • December: Two reports closed, 4 of 9 left.
  • January 2020: One report closed, 5 of 7 reports left.
  • February: One report closed, 6 of 7 reports left.
  • March: 2 new reports survived the month of March.
  • April: 13 new reports survived the month of April.

At the end of February the total of open reports over recent months was 58. Of those, 12 got closed, but with 15 new reports from March/April still open, the total is now up at 61 open reports.
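The arithmetic in the paragraph above can be checked in two independent ways with a few lines of Python. The per-month figures are copied from the breakdown in this post, and the variable names are only illustrative. The first check takes the end-of-February total, subtracts the older reports that were closed, and adds the new March and April reports that are still open. The second check simply sums the per-month counts.

```python
# Reports still open, by month of origin, copied from this post's breakdown.
OPEN_BY_MONTH = {
    "2019-04": 2, "2019-05": 0, "2019-06": 4, "2019-07": 8,
    "2019-08": 2, "2019-09": 7, "2019-10": 4, "2019-11": 4,
    "2019-12": 4, "2020-01": 5, "2020-02": 6,
    "2020-03": 2, "2020-04": 13,
}

OPEN_AT_END_OF_FEBRUARY = 58  # total of open reports stated for end of February
CLOSED_SINCE = 12             # of those, how many got closed in March/April


def current_total() -> int:
    """Open total: previous total, minus closures, plus new reports still open."""
    new_still_open = OPEN_BY_MONTH["2020-03"] + OPEN_BY_MONTH["2020-04"]
    return OPEN_AT_END_OF_FEBRUARY - CLOSED_SINCE + new_still_open


if __name__ == "__main__":
    print(current_total())              # 58 - 12 + 15 = 61
    print(sum(OPEN_BY_MONTH.values()))  # 61, the same total from the breakdown
```

Both approaches give 61, so the breakdown and the summary sentence agree.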

The workboard overall (which includes pre-2019 tasks) has 178 tasks open. This is actually down for the first time since October: the count was 196 in December, 198 in January, and 199 in February, and is now 178 in April. This was largely due to the Release Engineering and Core Platform teams closing out forgotten reports that have since been resolved or otherwise obsoleted.

💡 Tip: Verifying existing tasks is a good way to (re)familiarise yourself with Kibana. For example: Does the error still occur in the last 30 days? Does it only happen on a certain wiki? What do the URLs or stack traces have in common?

🎉  Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof


Footnotes:
[1] Incidents. – https://wikitech.wikimedia.org/wiki/Incident_documentation
[2] Tasks created. – https://phabricator.wikimedia.org/maniphest/query/HjopcKClxTfw/#R
[3] Tasks closed. – https://phabricator.wikimedia.org/maniphest/query/ts62HKYPBxod/#R
[4] Open tasks. – https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R

Shocking tales from ornithology

03:05, Monday, 25 May 2020 UTC
Manipulative people have always made use of the dynamics of ingroups and outgroups to create diversions from bigger issues. The situation is made worse when misguided philosophies that put economics ahead of ecology are peddled by governments. The pursuit of easily gamed targets such as GDP is preferred over ecological amelioration, since money is a man-made and controllable entity. Nationalism, pride, other forms of chauvinism, the creation of enemies and the magnification of war threats are all effective tools in the arsenal of Machiavelli for misdirecting the masses when things go wrong. One might imagine that the educated, especially scientists, would be smart enough not to fall into these traps, but cases from history dampen hopes for such optimism.

There is a very interesting book in German by Eugeniusz Nowak called "Wissenschaftler in turbulenten Zeiten" (scientists in turbulent times) that deals with the lives of ornithologists, conservationists and other naturalists during the Second World War. Preceded by a series of recollections published in various journals, the book was published in 2010, but I became aware of it only recently while translating some biographies into the English Wikipedia. I have not yet actually seen the book (it has about five pages on Salim Ali as well) and have had to go by secondary quotations in other sources. Nowak was a student of Erwin Stresemann (the subject of the first chapter), and he writes about several European (but mostly German, Polish and Russian) ornithologists and their lives during the turbulent 1930s and 40s. Although Europe is pretty far from India, there were ripples that reached afar. Incidentally, Nowak's ornithological research includes studies on the expansion in range of the collared dove (Streptopelia decaocto), which the Germans called the Türkentaube, literally the "Turkish dove", a name with a baggage of cultural prejudices.

Nowak's first paper of "recollections" notes that: [he] presents the facts not as accusations or indictments, but rather as a stimulus to the younger generation of scientists to consider the issues, in particular to think “What would I have done if I had lived there or at that time?” - a thought to keep as you read on.

A shocker from this period is a paper by Dr Günther Niethammer on the birds of Auschwitz (Birkenau). This paper (read it online here) was published when Niethammer was posted as a guard at the main gate of the concentration camp. You might be forgiven if you thought he was just a victim of the war. But Niethammer was a proud nationalist who volunteered to join the Nazi forces in 1937, leaving his position as a curator at the Museum Koenig at Bonn.
The contrast provided by Niethammer, who looked at the birds on one side while ignoring inhumanity on the other, provided novelist Arno Surminski with a title for his 2008 novel Die Vogelwelt von Auschwitz, i.e. the birdlife of Auschwitz.

G. Niethammer
Niethammer studied birds around Auschwitz and also shot ducks in large numbers, both for himself and to supply the camp commandant, Rudolf Höss (if the name does not mean anything to you, please do go to the linked article or search for the name online). Upon Niethammer's death, an obituary (open access PDF here) was published in the Ibis of 1975: a tribute with little mention of the war years or of the fact that he rose to the rank of Obersturmführer. The Bonn museum journal had a special tribute issue noting the works and influence of Niethammer. Among the many tributes is one by Hans Kumerloeve (starts here online). A subspecies of the common jay was named Garrulus glandarius hansguentheri by the Hungarian ornithologist Andreas Keve in 1967, after the first names of Kumerloeve and Niethammer. Fortunately for the poor jay, this name is a junior synonym of G. g. anatoliae, described by Seebohm in 1883.

Meanwhile, inside Auschwitz, the Polish artist Wladyslaw Siwek was making sketches of everyday life in the camp. After the war he became a zoological artist of repute. Unfortunately, very little about him is readily accessible to English readers on the internet (beyond the Wikipedia entry).
Siwek, artist who documented life at Auschwitz before working as a wildlife artist.
 
Hans Kumerloeve
Now for Niethammer's friend Dr Kumerloeve, who also worked in the Museum Koenig at Bonn. His name was originally spelt Kummerlöwe, and he was, like Niethammer, a doctoral student of Johannes Meisenheimer. Kumerloeve and Niethammer made journeys on a small motorcycle to study the birds of Turkey. Kummerlöwe's political activities started earlier than Niethammer's: he joined the NSDAP (German: Nationalsozialistische Deutsche Arbeiterpartei = The National Socialist German Workers' Party) in 1925 and started the first student union of the party in 1933. Kummerlöwe soon became a member of the Ahnenerbe, a think tank meant to provide "scientific" support to the party's ideas on race and history. In 1939 he wrote an anthropological study on "Polish prisoners of war". At the museum in Dresden that he headed, he thought up ideas to promote politics, and he published them in 1939 and 1940. After the war, it is thought that he went to all the European libraries that held copies of this journal (anyone interested in hunting it down should look for copies of Abhandlungen und Berichte aus den Staatlichen Museen für Tierkunde und Völkerkunde in Dresden 20:1-15) and purged them of his article. According to Nowak, he even managed to get his hands (and scissors) on copies held in Moscow and Leningrad!

The Dresden museum was also home to the German ornithologist Adolf Bernhard Meyer (1840–1911). In 1858, he translated the works of Charles Darwin and Alfred Russel Wallace into German and introduced evolutionary theory to a whole generation of German scientists. Among Meyer's amazing works is a series of avian osteological works which uses photography and depicts birds in nearly-life-like positions (wonder how it was done!) - a less artistic precursor to Katrina van Grouw's 2012 book The Unfeathered Bird. Meyer's skeleton images can be found here. In 1904 Meyer was eased out of the Dresden museum because of rising anti-semitism. Meyer does not find a place in Nowak's book.

Nowak's book includes entries on the following scientists: (I keep this here partly for my reference as I intend to improve Wikipedia entries on several of them as and when time and resources permit. Would be amazing if others could pitch in!).
In the first of his "recollection papers" (his 1998 article), Nowak writes about the reason for writing them: he had noticed that the obituary for Prof. Ernst Schäfer was a whitewash that carefully avoided any mention of his wartime activities. And this brings us to India. In a recent article in Indian Birds, Sylke Frahnert and coauthors have written about the bird collections from Sikkim in the Berlin natural history museum. In their article there is a brief statement that "The collection in Berlin has remained almost unknown due to the political circumstances of the expedition". This might be a bit cryptic for many, but the best read on the topic is Himmler's Crusade: The true story of the 1939 Nazi expedition into Tibet (2009) by Christopher Hale. Hale writes:
He [Himmler] revered the ancient cultures of India and the East, or at least his own weird vision of them.
These were not private enthusiasms, and they were certainly not harmless. Cranky pseudoscience nourished Himmler’s own murderous convictions about race and inspired ways of convincing others...
Himmler regarded himself not as the fantasist he was but as a patron of science. He believed that most conventional wisdom was bogus and that his power gave him a unique opportunity to promulgate new thinking. He founded the Ahnenerbe specifically to advance the study of the Aryan (or Nordic or Indo-German) race and its origins
From there Hale goes on to examine the motivations of Schäfer and his team. He looks at how much of the science was politically driven. Swastika signs dominate some of the photos from the expedition, as if they provided a natural tie with Buddhism in Tibet. It seems that Himmler gave Schäfer the opportunity to rise within the political hierarchy. The team that went to Sikkim included Bruno Beger. Beger was a physical anthropologist, but with less than innocent motivations, although such motives would be much harder to ascribe to the team's other pursuits like botany and ornithology. One of the results from the expedition was a film made by the entomologist of the group, Ernst Krause: Geheimnis Tibet, or secret Tibet. A copy of this 1 hour and 40 minute film is on YouTube. At around 26 minutes, you can see Bruno Beger creating face casts, first as a negative in plaster of Paris, from which a positive copy was made using resin. Hale describes how one of the Tibetans, put into a cast with just straws to breathe through, went into an epileptic seizure induced by claustrophobia and fear. The real horror, however, is revealed when Hale quotes a May 1943 letter from an SS officer to Beger: ‘What exactly is happening with the Jewish heads? They are lying around and taking up valuable space . . . In my opinion, the most reasonable course of action is to send them to Strasbourg . . .’ Apparently Beger had to select some prisoners from Auschwitz who appeared to have Asiatic features. Hale shows that Beger knew the fate of his selection: they were gassed for research conducted by Beger and August Hirt.
SS-Sturmbannführer Schäfer at the head of the table in Lhasa

In all, Hale makes a clear case that the Schäfer mission had quite a bit of political activity beneath the surface. We learn that Sven Hedin, whom Schäfer had idolized in his youth and who was a Nazi sympathizer who funded and supported the mission, was in contact with fellow Nazi supporter Erica Schneider-Filchner and her father Wilhelm Filchner in India, both of whom were later interned at Satara, while Bruno Beger made contact with Subhash Chandra Bose more than once. [Two of the pictures from the Bundesarchiv show a certain Bhattacharya, who appears to be a chemist working on snake venom at the Calcutta snake park; one wonders if he is Abhinash Bhattacharya.]

My review of Nowak's book must be uniquely flawed, as I have never managed to access it beyond some online snippets and English reviews. The war affected the entire region, Nowak's coverage is limited, and there were many other interesting characters, including the Russian ornithologist Malchevsky, who survived German bullets thanks to a fat bird-observation notebook in his pocket! In the 1950s Trofim Lysenko, the crank scientist who controlled science in the USSR, sought Malchevsky's help in proving his own pet theories, one of which was the idea that cuckoos were the result of feeding hairy caterpillars to young warblers!

Issues arising from race and perception are of course not restricted to this period or region. One of the less glorious stories of the Smithsonian Institution concerns the honorary curator Robert Wilson Shufeldt (1850–1934), who in the infamous Audubon affair turned his personal troubles with his second wife, a granddaughter of Audubon, into a matter of race. He also wrote books such as America's Greatest Problem: The Negro (1915), in which we learn of the ideas of other scientists of the period, such as Edward Drinker Cope! Like many other obituaries, Shufeldt's is a classic whitewash.

Even as recently as 2015, the University of Salzburg withdrew an honorary doctorate it had given to the Nobel Prize-winning Konrad Lorenz because of his support for the Nazi regime and its racial ideology. It should not be that hard for scientists to figure out whether they are on the wrong side of history, even if they are funded by the state. Perhaps salaried scientists in India would do well to look more carefully at the legal contracts they sign with their employers, especially the state. The current rules make government employees less free than ordinary citizens, but will the educated speak out, or do they prefer to shackle themselves?

Postscripts:
  • Mixing natural history with war sometimes led to tragedy for the participants as well. Dr Manfred Oberdörffer, who used his cover as an expert on leprosy to visit the borders of Afghanistan with the entomologist Fred Hermann Brandt (1908–1994), was killed in an exchange of gunfire with British forces, although Brandt lived to tell the tale.
  • Apparently Himmler's entanglement with ornithology also led him to dream up "Storchbein Propaganda" - a plan to send pamphlets to the Boers in South Africa via migrating storks! The German ornithologist Ernst Schüz quietly (and safely) pointed out the inefficiency of it purely on the statistics of recoveries!

Tech News issue #22, 2020 (May 25, 2020)

00:00, Monday, 25 May 2020 UTC

weeklyOSM 513

10:33, Sunday, 24 May 2020 UTC

12/05/2020-18/05/2020


Tracking changes on magOSM 1 | © Magellium, OpenLayers | map data © OpenStreetMap contributors

Mapping

  • Mapillary images have been used by the Ukrainian community to map over a thousand speed bumps in Kiev.
  • muramoto tweeted screenshots of two tools. Street-level POI Viewer displays POIs from OSM and Wikipedia over Mapillary images. He also pointed to another tool that allows the calculation of angles and distances, also based on Mapillary and an OSM basemap. You can use the two values to determine heights with the online calculator provided. The project files are available on GitHub.
  • Pascal Neis drew attention to the high share of paid mappers. He specifically mentioned India, where 8 of the top 10 mappers are working for Facebook.
  • Ty S wants to mark areas that have dangerous dogs and has created a proposal for dog_warning=*.
  • User SteveA wanted to use boundary=administrative for a range of local government entities in Connecticut. This led to long and involved discussions on both the talk-us and tagging mailing lists as to what qualifies as an administrative boundary.
  • Bob Gambrel asked on the talk-us list for advice about mapping snowmobile trails.
  • NetWormKido invites (ru) (automatic translation) everyone to join his initiative to draw roads to villages in the Privolzhskiy Federal District of Russia that are currently not connected in OSM to the rest of the world. The task is available on MapRoulette.
  • User b-unicycling is interested in field names in Ireland. As part of a local archaeological society activity in Kilkenny they have been collecting field names using Field Papers. They have also published a uMap showing all existing places in Ireland where the names of individual fields have been added to OSM.

Community

  • fr1 (a user from Russia) conducted (ru) (automatic translation) an experiment. He simultaneously recorded GPS tracks with a regular smartphone and with a newer-generation one that uses a dual-frequency GPS receiver.
  • The podcast Nodes and Ways published its 3rd episode. This episode features Ciarán Staunton, who is speaking about mapping in Ireland, particularly the #osmIRL_buildings campaign.
  • The COVID-19 pandemic has forced humanity to pause many of its activities. The Côte d’Azur University, in collaboration with CartONG, invites (fr) (automatic translation) people all over the planet to add to a map of natural phenomena and acts of solidarity that have arisen during this period. Contributors are asked to remember the best moments too. Thus the project Open map of the global pause was born. Add a photo to it!
  • Rovastar pointed out that daily mapper numbers reached a new peak of 6999 on 12 May. The 7000 barrier was broken two days later for another record high of 7209.
  • In his diary, Sergey Astahov reflected (ru) (automatic translation) on GPS receivers, the movement of lithospheric plates, and how that movement affects OSM.
  • OSM Kosova, in collaboration with FLOSSK, posted on Facebook about a series of virtual workshops held during the past two months about OpenStreetMap and Wikidata with local high school students. They were introduced to the projects and taught how to edit them properly.
  • Valeriy Trubin continues his series of interviews with OSMers. This time he spoke with Dmitry Lebedev (ru) (automatic translation) about using OSM for research and Darafei Praliaskouski (ru) (automatic translation) about the work of the OSM Foundation.

OpenStreetMap Foundation

  • Christoph Hormann (imagico) provided a statistical overview of applications for the OSMF microgrants programme.

Events

  • This year’s AGIT, an Austrian yearly conference and trade fair about geoinformation, will take place (de) (automatic translation) virtually from 6 to 10 July 2020. It is still to be decided if OSM and OSGeo will be featured.
  • The Transatlantic Council of Boy Scouts of America organised a five-day Virtual Mapathon Challenge, allowing Sea Scouts to complete community service requirements by completing tasks on the HOT Tasking Manager.
  • Geomob events organised by OpenCage and Mappery have so far taken place in London, Munich and Barcelona. Since the COVID-19 outbreak, the talks, which always have a geographical theme, have been held online. Speakers from commercial and non-commercial, open and closed source backgrounds report on their work. The next online event will take place on 10 June. All are welcome to attend, but places are limited to 100 people on a first-come, first-served basis. Details of how to sign up, and other news, can be found in the monthly newsletter forthcoming in early June.

Maps

  • Julien Minet introduced OpenArdenneMap, a CartoCSS style optimised for topographic maps at several scales. The style is available on GitHub.
  • Jochen Topf’s recent release of the osm2pgsql flex backend has been discussed generally by Adrien Pavie, and very specifically by Styxman, who is interested in rendering bus routes.
  • The University of Heidelberg has stopped operating its tile server Mapsurfer.NET due to organisational difficulties.

switch2OSM

  • The tourist portal (ru) of the Republic of Mordovia (region in Russia) uses OSM as a basemap. Unfortunately, the site does not attribute OSM properly.
  • A team of Russian urbanists has started (ru) a public GIS project (ru) on analysing public transport routes. So far in Moscow (automatic translation) only, but OSM is the basis of their project. At the moment they are also raising (ru) (automatic translation) funds for further development of the project.

Software

  • [1] The French company Magellium announced (automatic translation) on talk-fr a new ‘Tracking changes’ (fr) web portal for the magOSM project. About twenty themes are available, covering metropolitan France for the past 30 days. On the database side it uses PostgreSQL triggers on osm2pgsql tables to detect and store changes before analysing them. The source code is published under a free licence.
  • We have written earlier about the open source program OpenDroneMap that can be used to assemble orthophotomaps. This article (ru) (automatic translation) explains how to make the app work.

Programming

  • In a blog post, Mikel Maron, lead of the Mapbox Community team, co-founder of HOT and OSMF board member, published an interview with the Tasking Manager’s lead developer, Felix Delattre, on the technical details of and background to the new version of HOT’s widely used tool.
  • In March, Paul Norman reported to the QGIS developers that a particular feature (XYZ tile backgrounds) consumes far more slippy map tiles than necessary. QGIS now represents 5% of all tile requests on the main OSM servers. QGIS developer elpaso has filed a pull request with a fix which should be included in the imminent release of QGIS 3.14. The fix also provides backport patches for two earlier versions: 3.10 and 3.12.
  • OpenMapTiles provided an update on recent developments (with the slightly misleading title of ‘The Future of OpenMapTiles Project’) with their software stack. A significant change is moving away from using MapnikVT to a native PostGIS function, ST_AsMVT, which both simplifies the stack and improves performance. They now also run continuous integration tests on the tile output after each code change is integrated.

Releases

  • Translators from Latin America produced a Spanish version of ‘Mapping routes’, the Trufi Association’s documentation on how to map informal bus routes.
  • HeiGIT, the Heidelberg University’s GIScience Research Group, announced the release of version 1.0 of its API for the history analysis platform for OpenStreetMap called ohsome. The ohsome project aims to make OSM data from the full history of edits more easily accessible.
  • Trail Router, a service which helps users to find new running routes, improved its feature to avoid hills. A blog post details changes that have been made to improve the sensitivity of the option to avoid hills, and a new feature to avoid hills when getting multiple suggestions.
  • Martijn van Exel has fixed his map OSM Then And Now, which compares OSM in early October 2007 with today.

Did you know …

  • … how to map permanent orienteering course markers? A Twitter conversation between Gregory Marler and Ollie O’Brien, orienteer and maintainer of OpenOrienteering Map, provides some useful hints.

OSM in the media

  • The online newspaper New Indian Express reports on how over a thousand volunteer students have been adding to OpenStreetMap through the Mapathon Keralam initiative of the Kerala State IT Mission.

Other “geo” things

  • In the small village of Quiliano (northern Italy) local police had to install (it) (automatic translation) road signs to warn truckers not to follow route instructions from Google Maps, because trucks often get stuck or cause traffic congestion in narrower streets.
  • Freedom of information requests have revealed the official terminology for many parts of bus stops in London. Tim Dunn summarises the key points visually on Twitter.
  • Peter Rushforth informed us about the re-opened call for positions or presentations for the W3C – OGC online workshop on standardising maps. The event is planned for the week from 21 September to 2 October 2020 and will be held in a format that allows global participation.
  • The website IanVisits features an article about a map of London street trees. The TreeTalk map helps you to answer the question ‘What kind of tree is that?’ It is not obvious from the map, but the data comes from the Greater London Datastore which published open data on street trees back in 2016.
  • The Guardian interviewed the Slovakian graphic designer Martin Vargic, who has created a number of fictional maps, including ‘Britannia Under the Waves’, ‘Map of Literature’, ‘Map of Festivals’, and ‘Map of Common Foods’.
  • The Guardian presents five of the best online map apps.
  • Ride with GPS announced support for Garmin Varia, which Garmin developed to create a safer cycling environment. Varia is a first-of-its-kind rearview bike radar and smart bike light system that warns cyclists of vehicles approaching from behind, while also alerting approaching vehicles to a cyclist ahead. Ride with GPS users can now connect these Garmin units to their Ride with GPS mobile apps.
  • Russian mobile operator Beeline has launched (automatic translation) a geoplatform, ‘Save the bees’ (ru). With this platform they want to introduce (ru) landowners and beekeepers to each other so they can exchange information. This would help to prevent the death of bees from chemicals used in fields.
  • More than three thousand new petrol (gas) stations were added (ru) (automatic translation) to Yandex.Zapravki, a service which allows you to pay for your fuel without leaving your car.

Upcoming Events

| Where | What | When | Country |
|---|---|---|---|
| Düsseldorf | Düsseldorfer OSM-Stammtisch | 2020-05-27 | Germany |
| Biella | Incontro mensile | 2020-05-30 | Italy |
| London | Missing Maps ONLINE London Mapathon | 2020-06-02 | United Kingdom |
| Stuttgart | Stuttgarter Stammtisch | 2020-06-03 | Germany |
| Arlon | Atelier ouvert OpenStreetMap | 2020-06-03 | Belgium |
| Rennes | Réunion mensuelle | 2020-06-08 | France |
| Taipei | OSM x Wikidata #17 | 2020-06-08 | Taiwan |
| Lyon | Rencontre mensuelle | 2020-06-09 | France |
| Munich | Münchner Treffen | 2020-06-11 | Germany |
| Zurich | 117. OSM Meetup Zurich | 2020-06-11 | Switzerland |
| Berlin | 144. Berlin-Brandenburg Stammtisch | 2020-06-12 | Germany |
| Cape Town | HOT Summit | 2020-07-01 to 2020-07-02 | South Africa |
| Kandy | 2020 State of the Map Asia | 2020-10-31 to 2020-11-01 | Sri Lanka |

Note: If you would like to see your event here, please add it to the calendar. Only events that are in the calendar will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by AnisKoutsi, NunoMASAzevedo, PierZen, Polyglot, Rogehm, SK53, Silka123, SunCobalt, TheSwavu, YoViajo, derFred.

Today, the Wikimedia Foundation Board of Trustees voted to ratify new trust and safety standards for Wikipedia and all other Wikimedia projects. The standards, as outlined in a new Community Culture Statement, provide direction and priority to address harassment and incivility within the Wikimedia movement and create welcoming, inclusive, harassment-free spaces in which people can contribute productively and debate constructively.

Specifically, the Board has tasked the Foundation with:

  • Developing and introducing, in close consultation with volunteer contributor communities, a universal code of conduct that will be a binding minimum set of standards across all Wikimedia projects;
  • Taking actions to ban, sanction, or otherwise limit the access of Wikimedia movement participants who do not comply with these policies and the Terms of Use;
  • Working with community functionaries to create and refine a retroactive review process for cases brought by involved parties, excluding those cases which pose legal or other severe risks; and
  • Significantly increasing support for and collaboration with community functionaries primarily enforcing such compliance in a way that prioritizes the personal safety of these functionaries.


The Board’s statement formalizes the longstanding efforts of individual volunteers, Wikimedia affiliates, Foundation staff, and others to stop harassment and promote inclusivity on Wikimedia projects.

Please see the Board’s Community Culture Statement below and on Meta-Wiki.

Statement on Healthy Community Culture, Inclusivity, and Safe Spaces

Harassment, toxic behavior, and incivility in the Wikimedia movement are contrary to our shared values and detrimental to our vision and mission. They negatively impact our ability to collect, share, and disseminate free knowledge, harm the immediate well-being of individual Wikimedians, and threaten the long-term health and success of the Wikimedia projects. The Board does not believe we have made enough progress toward creating welcoming, inclusive, harassment-free spaces in which people can contribute productively and debate constructively.

In recognition of the urgency of these issues, the Board is directing the Wikimedia Foundation to directly improve the situation in collaboration with our communities. This should include developing sustainable practices and tools that eliminate harassment, toxicity, and incivility, promote inclusivity, cultivate respectful discourse, reduce harms to participants, protect the projects from disinformation and bad actors, and promote trust in our projects.

Specifically, the Foundation shall:

  • Develop and introduce a universal code of conduct (UCoC) that will be a binding minimum set of standards across all Wikimedia projects.
    • The first phase, covering policies for in-person and virtual events, technical spaces, and all Wikimedia projects and wikis, and developed in collaboration with the international Wikimedia communities, will be presented to the Board for ratification by August 30, 2020.
    • The second phase, outlining clear enforcement pathways, and refined with broad input from the Wikimedia communities, will be presented to the Board for ratification by the end of 2020;
  • Take actions to ban, sanction, or otherwise limit the access of Wikimedia movement participants who do not comply with these policies and the Terms of Use;
  • Work with community functionaries to create and refine a retroactive review process for cases brought by involved parties, excluding those cases which pose legal or other severe risks; and
  • Significantly increase support for and collaboration with community functionaries primarily enforcing such compliance in a way that prioritizes the personal safety of these functionaries.


Until such directives are implemented, the Board instructs the Foundation to adopt and implement policies for reducing harassment and toxicity on our projects and minimizing legal risks for the movement, in collaboration with communities whenever practicable. Until these two phases of the UCoC are complete and operational an interim review process involving community functionaries will be in effect. In this interim period, the Product Committee of the Board of Trustees will also advise the Trust & Safety team.

To that end, the Board further directs the Foundation, in collaboration with the communities, to make additional investments in Trust & Safety capacity, including but not limited to: development of tools needed to assist our volunteers and staff, research to support data-informed decisions, development of clear metrics to measure success, development of training tools and materials (including building communities’ capacities around harassment awareness and conflict resolution), and consultations with international experts on harassment, community health and children’s rights, as well as additional hiring.

The above efforts will be undertaken in coordination and collaboration with appropriate partners from across the movement, seek to increase effective community governance of conduct and behavioral standards, and reduce the long-term need of the Foundation to act. It is the shared goal of the Board and Foundation that these efforts advance a sustainable Wikimedia movement and support, rather than substitute, effective models of community governance.

We urge every member of the Wikimedia communities to collaborate in a way that models the Wikimedia values of openness and inclusivity, step forward to do their part to create a safe and welcoming culture for all, stop hostile and toxic behavior, support people who have been targeted by such behavior, assist good-faith people learning to contribute, and help set clear expectations for all contributors.

The Depicts

17:00, Friday, 22 May 2020 UTC

So Structured Data on Commons (SDC) has been going for a while. Time to reap some benefits!

Besides free-text image descriptions, the first, and likely most used, element one can add to a picture via SDC is “depicts”. This can be one or several Wikidata items which are visible (prominently or as background) on the image. Many people have done so, manually or via JavaScript- or Toolforge-based mass editing tools.

This is all well and good, but what to do with that data? It can be searched for, if you know the magic incantation for the search engine, but that’s pretty much it for now. A SPARQL query engine would be insanely useful for more complex queries, especially if it would work seamlessly with the Wikidata one, but no usable, up-to-date one is in sight so far.

Inspired by a tweet by Hay, and with some help from Maarten Dammers, I found a way to use SDC “depicts” information in my File Candidates tool. It suggests files that might be useful to add to specific Wikidata items.

Now, since proper SDC support is … let’s say incomplete at the moment, I had to go a bit off the beaten path. First, I use the “random” sort in the Commons API search to find files with a “depicts” statement. That way, I get 50 such files with one query. Then, I use the Wikibase API on Commons to get the structured data for these files. The structured data contains the information about which Wikidata item(s) each file depicts.
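The two API calls described above can be sketched in Python roughly as follows. This is a minimal sketch, not the tool's actual code, and it rests on stated assumptions: the public endpoint `https://commons.wikimedia.org/w/api.php`, the CirrusSearch keyword `haswbstatement:P180` (P180 being the Wikidata "depicts" property) combined with `srsort=random`, and the convention that a file's MediaInfo entity ID is `M` followed by its page ID. All helper names are hypothetical. Parsing is kept separate from network access so it can be checked against a canned response.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

COMMONS_API = "https://commons.wikimedia.org/w/api.php"  # assumed endpoint


def random_depicts_search_url(limit: int = 50) -> str:
    """URL for up to `limit` random File: pages that carry a P180 (depicts) statement."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": "haswbstatement:P180",
        "srnamespace": 6,   # File: namespace
        "srsort": "random",
        "srlimit": limit,   # 50 is the usual cap for non-bot clients
        "format": "json",
    }
    return COMMONS_API + "?" + urllib.parse.urlencode(params)


def mediainfo_ids(search_response: dict) -> list[str]:
    """Turn search hits into MediaInfo entity IDs ('M' + page ID)."""
    return ["M%d" % hit["pageid"] for hit in search_response["query"]["search"]]


def wbgetentities_url(entity_ids: list[str]) -> str:
    """URL fetching the structured data for a batch of MediaInfo entities."""
    params = {"action": "wbgetentities", "ids": "|".join(entity_ids), "format": "json"}
    return COMMONS_API + "?" + urllib.parse.urlencode(params)


def depicted_items(entities_response: dict) -> dict[str, list[str]]:
    """Map each MediaInfo ID to the Wikidata item IDs in its P180 statements."""
    result: dict[str, list[str]] = {}
    for mid, entity in entities_response.get("entities", {}).items():
        # MediaInfo JSON uses "statements"; an entity without any may serialize
        # them as an empty list, so fall back to an empty dict.
        statements = entity.get("statements") or entity.get("claims") or {}
        items = []
        for statement in statements.get("P180", []):
            snak = statement.get("mainsnak", {})
            if snak.get("snaktype") == "value":  # skip "somevalue"/"novalue"
                items.append(snak["datavalue"]["value"]["id"])
        result[mid] = items
    return result


def fetch_json(url: str) -> dict:
    """Plain GET with a descriptive User-Agent (hypothetical agent string)."""
    request = urllib.request.Request(url, headers={"User-Agent": "depicts-sketch/0.1 (example)"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Chained together, one batch would look roughly like `ids = mediainfo_ids(fetch_json(random_depicts_search_url()))` followed by `depicted_items(fetch_json(wbgetentities_url(ids)))`. Because a single search already returns at most 50 hits, the whole batch fits into one `wbgetentities` call.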

Armed with these Wikidata item IDs, I use the database replicas on Toolforge to retrieve the subset of items that (a) have no image (P18), (b) have P31 “instance of”, (c) have no P279 “subclass of”, and (d) do not link to any of a number of “unsuitable” items (e.g. templates or given names). For that subset, I get the files the items already use, e.g. as a logo image (so as not to suggest those files for the item again), and then I add an entry to the database that says “this item might use this image”, according to the depicts statements in the respective image (code is here, in case you are interested).
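The filtering rules (a) to (d) and the "already used" check can be sketched in Python as an in-memory filter. The tool itself runs these checks as SQL against the Toolforge database replicas. The `Item` structure, the helper names and the example "unsuitable" QIDs below are illustrative assumptions, not the tool's code. Storing `(item, file)` pairs in a set also means that repeated random batches never add the same suggestion twice.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical "unsuitable" targets, believed to be Q11266439 (Wikimedia template)
# and Q202444 (given name); the tool's real list is not given in the post.
UNSUITABLE = {"Q11266439", "Q202444"}


@dataclass
class Item:
    qid: str
    # property ID -> linked values (item IDs, or file names for file-valued properties)
    claims: dict[str, list[str]] = field(default_factory=dict)
    # files the item already uses, e.g. as a logo image
    used_files: set[str] = field(default_factory=set)


def is_candidate_item(item: Item, unsuitable: set[str] = UNSUITABLE) -> bool:
    linked = {value for values in item.claims.values() for value in values}
    return ("P18" not in item.claims           # (a) no image yet
            and "P31" in item.claims           # (b) has "instance of"
            and "P279" not in item.claims      # (c) no "subclass of"
            and not (linked & unsuitable))     # (d) no link to unsuitable items


def add_candidates(depicts: dict[str, list[str]],
                   items: dict[str, Item],
                   store: set[tuple[str, str]]) -> None:
    """depicts maps a file name to the QIDs it depicts; adds (qid, file) suggestions to store."""
    for file_name, qids in depicts.items():
        for qid in qids:
            item = items.get(qid)
            if item is None or not is_candidate_item(item):
                continue
            if file_name in item.used_files:  # already used by the item, e.g. as logo
                continue
            store.add((qid, file_name))
```

Calling `add_candidates` once per random batch keeps accumulating suggestions in `store`. In a real deployment that set would be a database table with a uniqueness constraint on the pair.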

50 files (a restriction imposed by the Commons API) is not much, especially since many images with depicts statements are probably already used as the image on the respective Wikidata item. So I keep running such random requests in the background and collect the results for the File Candidates tool. At the time of writing, over 12k such candidates exist.

Happy image matching, and don’t forget to check out the other candidate image groups in the tool (including potentially useful free images from Flickr!).

"Semantic-MediaWiki.org" got a new look

13:00, Friday, 22 May 2020 UTC

April 14, 2018

"Semantic-MediaWiki.org" got a new look

"Semantic-MediaWiki.org" the home of Semantic MediaWiki got a new look. The new skin based on Chameleon finally emancipated the website from the standard wiki appearance and aims to provide a professional looking view increasing the user experience. Thanks go to Stephan Gambke, Iván Hernández and Karsten Hoffmeyer for working on this.

“Guadalupe”

11:26, Friday, 22 May 2020 UTC

Here’s a story of how I tried to remove a fake story marginally related to COVID-19 from Wikipedia, and, at least for now, achieved the opposite and contributed to its dissemination and perpetuation.

On a BBC-produced podcast (in Russian), I heard a story about Lupe Hernández, a nurse who allegedly invented hand sanitizer. The story was born in a 2012 Guardian article, which was subsequently quoted by viral Facebook posts and a bunch of news sites in Spanish and many other languages, and was even mentioned in an academic nursing book published by Springer. In the last few months hand sanitizer has become more popular than ever, and so the story has regained popularity.

When contacted for confirmation, the original Guardian story’s author said that “she couldn’t remember the source, and that her notebooks are in storage facility she currently can’t get to”.

The podcast, as well as a thorough LA Times article, concluded that the whole story is probably an urban legend and that the person probably never existed. No one was even sure whether Lupe was a woman or a man, even though the original story said “she”.

The podcast did mention that there is a very short Wikipedia article. I proposed it for deletion. The result of the deletion discussion was that the article was kept and renamed to “Lupe Hernández hand sanitizer legend”.

Before it was renamed in the English Wikipedia to be an article about a legend, it was also translated to Spanish and French, as an article about “Guadalupe Hernández”, a female nurse who invented hand sanitizer, even though zero sources say that her name was actually “Guadalupe”. Sure, you can assume that “Lupe” is short for “Guadalupe”, as some imaginative writers did, but why do we do it on Wikimedia sites?

I’m still of the firm opinion that the subject should be completely removed from Wikipedia in all languages, as well as from Wikidata, but there’s only so much I can do about this. If any of you know French or Spanish, can you please make sure the articles in your languages are not too awful, or perhaps consider proposing them for deletion?

And if you think I’m badly wrong about it all, please do tell me, too.

By now dozens of women have stepped into open source via the Outreach Program for Women, a paid internship program administered by the GNOME Foundation. I recently asked several of them whether they had been able to transition from intern to volunteer.*

Are you succeeding at continuing to volunteer in your open source project? Or are you running into trouble? I'd love to know how people are doing and whether y'all need help.

When you were an OPW intern, you had a mentor and you had committed to a specific project for three months. Volunteering is freer -- you can change your focus every week if you want -- but the training wheels are gone and you have to steer yourself.

(I bet Google Summer of Code alumni have similar experiences.)

I got several answers, and in them I saw some common problems to which I suggest solutions.

  1. Problem: seems as though there are no more specific tasks to do within your project. Solutions: ask your old mentor what they might like you to do next. If they don't respond within 3 days, repeat your question to the mailing list for your open source project. Or switch to another open source project, maybe one your friends are working on!
  2. Problem: finding the time. Solutions: set aside a weekly appointment, just as you might with a therapist or an exercise class. Pair up with someone else from the OPW alum list and set yourself a task to complete during a one-hour online sprint! Or if you know your time is being eaten up by your new job, set yourself a reminder for 3 months from now to check whether you have more free time in December.
  3. Problem: loneliness. Solutions: talk more in the #opw chat channel on GNOME's IRC (irc.gnome.org). Use http://www.pairprogramwith.me/ and http://lanyrd.com/ and https://lwn.net/Calendar/ to find get-togethers in your area, or launch one using http://hackdaymanifesto.com/ and http://meetup.com/.
  4. Problem: motivation. Solutions: consider the effects you're having in the world. Or focus on the bits of work you enjoy for their own sake, whatever those are. Or teach others the things you know, and see the light spark in their eyes.

These are tips for the graduating interns themselves; it would be good for someone, maybe me, to also write a list of tips for the organizers and mentors to nurture continued participation.


* OPW also provides a list of paid opportunities for alumni.

The Persistence of Poverty, which is today’s episode of NPR’s famous “Indicator” podcast, made me think of how small things that happened long ago in the history of Wikipedia and other Wikimedia wiki sites still affect us, for better or worse.

Here are some examples.

Example one: People didn’t want to have full copies of historical documents on the English Wikipedia, because they are not encyclopedic articles. So they created a whole separate wiki for them, called “Primary Sources Wikipedia”, at “ps.wikipedia.org”. It turned out that this would be the URL for the Wikipedia in the Pashto language, which has the ISO 639 code “ps”, so it was renamed to Wikisource, becoming Wikimedia’s first non-Wikipedia wiki. The movement wasn’t even called “Wikimedia” then; the organization was created later. Wiktionary, Wikibooks, Wikimedia Commons, and other projects followed. Wikisource and all of these other projects are awesome, but this history also means that the Foundation and the community now have to hold some challenging discussions about how non-Wikipedia wikis should be branded in the long term.

Example two: A French Wikipedia editor who was curious about Ancient Egypt wanted to insert Egyptian hieroglyphics into Wikipedia articles, and he happened to know some PHP, so he wrote the WikiHiero extension, which is installed on all the wikis. Because it’s an extension that adds its own wiki syntax, Visual Editor shows a button to insert hieroglyphics on every page, including the page about Astronomy on Wikiversity, which doesn’t have much to do with Ancient Egypt. This is not bad; this is mostly very good. What is bad is that Visual Editor doesn’t have a button to insert infoboxes or “citation needed” tags, even though they are far more common than hieroglyphics, because they are implemented as templates and not in PHP, and Visual Editor handles all templates as one generic type of object. (If you are wondering how this can be fixed, the first necessary step in that direction is described on the page Global templates on mediawiki.org.)

Example three: Some people didn’t like that too many wikis were being created in new languages and then staying inactive, so they wanted a proper way for people to prove that they planned to be active editors. So they created the “Incubator” wiki, where people would show they were serious by writing the first bunch of articles. For various technical reasons, using it was more difficult than using a usual Wikipedia, but they probably quietly assumed that everybody who wants to create a Wikipedia in a new language is experienced in editing Wikipedia in English or Italian or some other big language, so almost no one ever bothered to improve it. By now, we know that this assumption was tragically wrong: most people who want to create a Wikipedia in a new language are not experienced in editing in other languages. They are the newest and least experienced editors, yet they get the most complicated user interface. (If you are wondering how this can be fixed, see this page on Phabricator.)

Yes, I’m oversimplifying all of these stories for brevity. And I’m not implying any malice or negligence in any of the cases here. These were good people with good intentions, who made assumptions that were reasonable for the time.

It’s just a shame that the problems they created are proving more difficult to fix as time goes by.

Most people could tell you the significance of Hiroshima and Nagasaki: the first usage of nuclear weapons in warfare. But many would be surprised to learn that the US continued to drop nuclear bombs on islands of the Pacific long after World War II was finished. Students in the Japanese Environmental History class taught by Dr. Elyssa Faison at the University of Oklahoma collaborated to enhance Wikipedia’s coverage of one such incident. In 1954, twenty-three Japanese sailors set out to catch tuna. While their ship, the Daigo Fukuryū Maru (the Lucky Dragon No. 5), was near the Marshall Islands, the sky started glowing in the west, and ash fell like snow from the sky. Unbeknownst to the sailors, and despite being outside of the US-declared “danger zone”, the fishermen had just been exposed to the radioactive fallout of a nuclear test, one that was more than twice as powerful as intended. The sailors immediately fell ill with radiation poisoning; one would later die from the exposure, while the other twenty-two men were hospitalized for over a year. One sailor, Oishi Matashichi, had a stillborn child and later developed liver cancer, both of which he attributed to his radiation poisoning. The ship itself remained highly radioactive at first, with radiation detectable from one hundred feet away.

Daigo Fukuryū Maru, shortly before the 1954 nuclear incident. (Public domain)

Students made substantial revisions to Daigo Fukuryū Maru, adding detail about the health effects on the surviving fishermen and the response of the US government, which initially denied culpability and claimed that the fishermen were actually spies. The US eventually paid Japan more than 15 million dollars in reparations. The fate of the Daigo Fukuryū Maru was also added: initially purchased by the Japanese government, by 1970 the Lucky Dragon No. 5 was sitting in a garbage-filled canal. It was then pulled from the water and put on public display in Tokyo as a symbol of opposition to nuclear weapons. Students even created a brand new biography of survivor Oishi Matashichi, who went on to become an author advocating for nuclear disarmament and attended a 2015 memorial service on the Marshall Islands for the victims of the nuclear testing at Bikini Atoll.

By writing this information into Wikipedia, the students have shared the story of the Lucky Dragon No. 5 with a global audience, helping thousands to understand the far-reaching ripples of nuclear testing in the Pacific.


Interested in incorporating a Wikipedia writing assignment into a future course? Visit teach.wikiedu.org for all you need to know to get started.

By Arturo Borrero Gonzalez and Brooke Storm, Wikimedia Cloud Services

One of the most successful and important products provided by the Wikimedia Cloud Services team at the Wikimedia Foundation is Toolforge. Toolforge is a platform that allows users and developers to run and use a variety of applications that support the Wikimedia movement and mission from a technical point of view. It is a hosting service commonly known in the industry as a Platform as a Service (PaaS), and it is powered by two different backend engines: Kubernetes and GridEngine.

This article focuses on how we made a better Toolforge by integrating a newer version of Kubernetes and, along with it, some more modern workflows.

The starting point in this story is 2018. Yes, two years ago! We identified that we could do better with our Kubernetes deployment in Toolforge. We were using a very old version, v1.4. Using an old version of any software has more or less the same consequences everywhere: you lack security improvements and some modern key features.

Once it was clear that we wanted to upgrade our Kubernetes cluster, both the engineering work and the endless chain of challenges started.

It turns out that Kubernetes is a complex and modern technology, which introduces extra abstraction layers to bring flexibility and some intelligence to a very old systems engineering need: hosting and running a variety of applications.

Our first challenge was to understand what our use case for a modern Kubernetes was. We were particularly interested in some key features:

  • The increased security and controls required for a public user-facing service, using RBAC, PodSecurityPolicies, quotas, etc.
  • Native multi-tenancy support, using namespaces
  • Advanced web routing, using the Ingress API
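
To make the namespace and quota ideas above a little more concrete, here is a minimal Python sketch that builds a per-tool Namespace and ResourceQuota as plain dictionaries and prints them as a Kubernetes List in JSON, which kubectl apply -f can read. The tool-<name> naming scheme and the default limits are assumptions chosen for illustration; the post does not say what Toolforge's actual quotas are.

```python
import json


def tool_namespace_manifests(tool: str, cpu: str = "2", memory: str = "4Gi", pods: int = 4) -> dict:
    """Return a v1 List holding a Namespace and a ResourceQuota for one tool.

    Hypothetical sketch: the "tool-" prefix and the default limits are
    illustrative values, not Toolforge's real configuration.
    """
    namespace = f"tool-{tool}"  # one namespace per tenant gives multi-tenancy
    ns = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": namespace},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": namespace, "namespace": namespace},
        "spec": {
            # Hard caps on the total resources all pods in the namespace may claim.
            "hard": {
                "limits.cpu": cpu,
                "limits.memory": memory,
                "pods": str(pods),
            }
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [ns, quota]}


if __name__ == "__main__":
    print(json.dumps(tool_namespace_manifests("example"), indent=2))
```

In a real deployment, RBAC objects (a Role and RoleBinding scoped to the same namespace) and a pod security policy would sit alongside these, so each tool can only act inside its own namespace.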

Soon enough we faced another challenge inherent to Kubernetes: the documentation. For a newcomer, learning and understanding how to adapt Kubernetes to a given use case can be really challenging. We identified some baffling patterns in the docs. For example, different documentation pages would assume you were using different Kubernetes deployments (Minikube vs kubeadm vs a hosted service). We are running Kubernetes as you would on bare metal (well, in CloudVPS virtual machines), and some documents directly referred to our setup as a corner case.

During late 2018 and early 2019, we started brainstorming and prototyping. We wanted our cluster to be reproducible and easily rebuildable, and in the Technology Department at the Wikimedia Foundation, we rely on Puppet for that. One of the first things to decide was how to deploy and build the cluster while integrating with Puppet. This is not as simple as it seems, because Kubernetes itself is a collection of reconciliation loops, just like Puppet is. So we had to decide what to put directly in Kubernetes and what to control and make visible through Puppet. We decided to stick with kubeadm as the deployment method, as it seems to be the most standard upstream tool for the task. We had to make some interesting decisions by trial and error, like where to run the required etcd servers, what the kubeadm init file would look like, how to proxy and load-balance the API on our bare-metal deployment, what network overlay to choose, etc. If you take a look at our public notes, you can get a glimpse of the number of decisions we had to make.

Our Kubernetes wasn’t going to be a generic cluster; we needed a Toolforge Kubernetes service. This means we don’t use some of the components, and we also add some additional pieces and configurations to it. By the second half of 2019, we were working full-speed on the new Kubernetes cluster. We already had an idea of what we wanted and how to do it.

There were a couple of important topics for discussion, for example:

  • Ingress
  • Validating admission controllers
  • Security policies and quotas
  • PKI and user management
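
Validating admission controllers let the cluster reject objects that break local policy before they are stored. The post leaves the details for a later article, so the following is only a hedged sketch of a validating webhook's decision logic, assuming a hypothetical rule that Ingress hosts must end in .toolforge.org. It reads an admission.k8s.io/v1 AdmissionReview request and returns the matching response; the HTTPS server and the webhook registration are left out.

```python
import json

ALLOWED_SUFFIX = ".toolforge.org"  # hypothetical policy, not confirmed by the post


def review_ingress(review: dict) -> dict:
    """Build an AdmissionReview response for an Ingress admission request.

    Rejects any rule whose host does not end in ALLOWED_SUFFIX. A rule with
    no host (which would match every host) is rejected too.
    """
    request = review["request"]
    ingress = request.get("object") or {}
    rules = ingress.get("spec", {}).get("rules", [])
    bad_hosts = [
        rule.get("host", "")
        for rule in rules
        if not rule.get("host", "").endswith(ALLOWED_SUFFIX)
    ]
    # The response must echo the request uid so the API server can match it.
    response = {"uid": request["uid"], "allowed": not bad_hosts}
    if bad_hosts:
        response["status"] = {
            "code": 403,
            "message": "host(s) not allowed: " + ", ".join(h or "<empty>" for h in bad_hosts),
        }
    return {
        "apiVersion": review.get("apiVersion", "admission.k8s.io/v1"),
        "kind": "AdmissionReview",
        "response": response,
    }


if __name__ == "__main__":
    sample = {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "request": {"uid": "abc", "object": {"spec": {"rules": [{"host": "mytool.toolforge.org"}]}}},
    }
    print(json.dumps(review_ingress(sample)))
```

Keeping the decision in a pure function like this makes the policy easy to unit-test separately from the web server that receives the API server's POST requests.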

We will describe in detail the final state of those pieces in another blog post, but each of the topics required several hours of engineering time, research, tests, and meetings before reaching a point at which we were comfortable moving forward.

By the end of 2019 and early 2020, we felt like all the pieces were in place, and we started thinking about how to migrate the users and their workloads from the old cluster to the new one. This migration plan mostly materialized as a Wikitech page containing concrete information for our users and the community.

The interaction with the community was a key element of our success. Thanks to our vibrant and involved users, we had several early adopters and beta testers who helped us identify early flaws in our designs. The feedback they provided was very valuable for us. Some folks helped solve technical problems, helped with the migration plan or even helped make some design decisions. It is worth noting that some of the changes presented to our users were not easy for them to handle, like new quotas and usage limits. Introducing new workflows and deprecating old ones is always a risky operation.

Even though the migration procedure from the old cluster to the new one was fairly simple, there were some rough edges, and we helped our users navigate them. A common issue was a webservice not being able to run in the new cluster because stricter quotas limited the resources available to the tool. Another example was the new Ingress layer failing to work properly with some webservices’ particular options.

By March 2020, we no longer had anything running in the old Kubernetes cluster, and the migration was completed. We then started thinking about another step towards making a better Toolforge, which is introducing the toolforge.org domain. There is plenty of information about the change to this new domain in Wikitech News.

The community wanted a better Toolforge, and so did we, and after almost 2 years of work, we have it! All the work that was done represents the commitment of the Wikimedia Foundation to supporting the technical community and to pursuing technical engagement in the Wikimedia movement in general. In a follow-up post we will present and discuss some technical details of the new Kubernetes cluster in more depth. Stay tuned!

About this post

Featured image credit: El yunque de frente, Alcalá de Henares, Benjamín Núñez González, CC BY-SA 4.0

Tech News issue #21, 2020 (May 18, 2020)

00:00, Monday, 18 2020 May UTC

Covid-19 Wikipedia pageviews, a first look

23:13, Sunday, 17 2020 May UTC

World events often have a dramatic impact on online services. One past example is the death of Michael Jackson, which, according to the BBC, brought down Twitter and Wikipedia and made Google believe that it was under attack.

Events like the COVID-19 (Coronavirus) pandemic have a less instantaneous effect, but changing trends can still be seen. Cloudflare recently posted about some of the internet-wide traffic changes due to the pandemic and various government announcements, quarantines and lockdowns.

Currently the main English Wikipedia article for the COVID-19 pandemic is receiving roughly 1.2 million page views per day (14 per second). This article has already gone through 4 different names over the past months, and the pageview rate continues to climb.

Wikipedia pageviews tool showing English Wikipedia COVID-19 pandemic article views up to 21 March 2020 (source)

Interestingly, pageviews decreased throughout February compared with both the week after the article was first created and the period since, as the pandemic continued to spread. This decrease in pageviews also lines up with a decrease in general interest according to Google Trends.

Interest over time on Google Trends for Coronavirus – Worldwide, 17/01/2020 – 22/03/2020 (source)

Information is not only available in English, and other language Wikipedia pages are also seeing high and increasing pageviews with Russian leading the way, closely followed by Spanish, German and Chinese.

Taking these other language editions into account we reach roughly 2.4 million daily page views for the pandemic (28 per second), which is double that of the English article alone.

Wikipedia langviews tool showing the COVID-19 pandemic pages of the top 10 language editions on 20 March 2020 (source)

A comprehensive list of current Wikipedia article titles relating to COVID-19 in all languages can be generated using the Wikidata Query Service using a fairly simple query. These ~2,500 page titles can then be used to retrieve further pageview data. A snapshot of the list used can be found here.
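
The final numbers in this post come from the WMF Data Lake, which is not publicly accessible. A public alternative for similar per-title data is the Wikimedia Pageviews REST API. The sketch below builds per-article URLs (using the user agent type, to match the real-users-only counting described in the notes) and sums daily views across several titles, for example the titles returned by the Wikidata query. The User-Agent string is a placeholder, and network access is only needed for fetch().

```python
import json
from typing import Dict, Iterable
from urllib.parse import quote
from urllib.request import Request, urlopen

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(project: str, title: str, start: str, end: str) -> str:
    """URL for daily views by real users (agent=user) of one article; dates are YYYYMMDD."""
    article = quote(title.replace(" ", "_"), safe="")  # titles use underscores, "/" must be escaped
    return f"{API}/{project}/all-access/user/{article}/daily/{start}/{end}"


def fetch(url: str) -> dict:
    # Wikimedia asks API clients to send a descriptive User-Agent (placeholder value here).
    req = Request(url, headers={"User-Agent": "pageviews-sketch/0.1 (example)"})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def combined_daily(payloads: Iterable[dict]) -> Dict[str, int]:
    """Sum the 'views' of every item per timestamp across several API responses."""
    totals: Dict[str, int] = {}
    for payload in payloads:
        for item in payload.get("items", []):
            totals[item["timestamp"]] = totals.get(item["timestamp"], 0) + item["views"]
    return totals
```

For example, combined_daily(fetch(pageviews_url(p, t, "20200314", "20200321")) for p, t in titles) would give one daily total across all (project, title) pairs, where titles is a hypothetical list of ("ru.wikipedia", "...") style tuples built from the query results.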

Looking at this full list of article titles across all language Wikipedias over the last week the topic interest has continued growing, and on the 21st March the topic received 4.5 million page views (52 per second).
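
The per-second figures in parentheses throughout this post are plain averages over the 86,400 seconds in a day; a one-line helper reproduces them:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(daily_views: int) -> int:
    """Average views per second for a daily total, rounded to the nearest whole number."""
    return round(daily_views / SECONDS_PER_DAY)


for label, views in [("English article", 1_200_000),
                     ("all language editions", 2_400_000),
                     ("all related titles, 21 March", 4_500_000)]:
    print(f"{label}: {views:,} views/day is about {per_second(views)} per second")
```

Averaging hides the daily peaks, so the instantaneous rate at busy hours will be higher than these figures.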

COVID-19 related Wikipedia pageviews between the 14th and 21st March 2020

A continued increase in interest can be seen across all continents. The split per continent looks roughly consistent with general Wikipedia viewing figures, though Asia would normally be below North America, and Africa would normally be below South America.

COVID-19 related Wikipedia pageviews split by continent between the 14th and 21st March 2020

Looking at per-country trends for the countries with the largest number of pageviews, most countries appear to be trending up. Germany appears to have shown the most dramatic increase in interest in the past week. The United States and India have the highest pageviews on a single day. Italy actually appears to be trending down.

COVID-19 related Wikipedia pageviews split by country, where the country made over 100k pageviews a day, between the 14th and 21st March 2020

In order to see trends across all Wikimedia sites for a large number of pages, it will be important to account for the historical names of articles. As identified at the top of this post, the English Wikipedia article has passed through 4 different names in the past months, as I expect is also the case for other languages. As a result, simply generating trend data for the current names misses data from before the last name change, which is why the total, continent and country graphs only show the last week.
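
One possible way to handle renamed articles, sketched here under stated assumptions, is to split the date range at each rename and query each historical title only for the days it was the live title. The rename history would have to come from the page's move log (for example via the Action API's list=logevents with letype=move); the titles in the example are placeholders, and views that land on redirects left behind by a move are ignored.

```python
from datetime import date, timedelta
from typing import List, Tuple


def title_segments(history: List[Tuple[str, date]], start: date, end: date) -> List[Tuple[str, date, date]]:
    """Split [start, end] into (title, seg_start, seg_end) pieces.

    history: (title, first day under that title) pairs, sorted by date.
    Each title covers its first day up to the day before the next rename.
    """
    segments = []
    for i, (title, since) in enumerate(history):
        until = history[i + 1][1] - timedelta(days=1) if i + 1 < len(history) else end
        seg_start, seg_end = max(since, start), min(until, end)
        if seg_start <= seg_end:  # skip titles that fall entirely outside the range
            segments.append((title, seg_start, seg_end))
    return segments


# Hypothetical rename history with placeholder titles.
history = [("Title A", date(2020, 1, 5)), ("Title B", date(2020, 2, 11)), ("Title C", date(2020, 3, 11))]
for title, s, e in title_segments(history, date(2020, 2, 1), date(2020, 3, 21)):
    print(title, s.isoformat(), e.isoformat())
```

Each segment can then be fetched separately and the daily series concatenated, giving one continuous trend line per article despite the renames.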

Notes

  • All “page views” within this post refer to views by real users (excluding web crawlers etc).
  • Final aggregate data & splits by country and continent generated using the WMF Data Lake.
  • A typo in the Wikidata Query Service SPARQL query meant that ~300 page titles (out of ~1,700) were not checked during this blog post.

The post Covid-19 Wikipedia pageviews, a first look appeared first on Addshore.

WBStack 2020 Update 2 (May)

23:10, Sunday, 17 2020 May UTC

WBStack is now in its 7th month with 76 user accounts who have created 226 MediaWiki sites running Wikibase, of which 145 are currently online (81 deleted sites). 295,000 edits have now been made in total, which is an increase of 95,000 in the last month, which roughly equates to 2 edits a minute for the month.

The most active site is currently UniTest which is “a Wikibase sandbox with information about the research ecosystem”. Second and third come School of Design and Hercules Demo.

Screenshot of the WESO UniTest Main Page, 17 May 2020

Updates

I have been keenly listening to the discussions going on in the Telegram group and some of the top requests are now available for you to try out. These include:

  • Default skin customization
  • Optional restrictions for who can register an account (using the ConfirmAccount extension)
  • Wikibase string value maximum length increases

All of these are available on the configuration page for a site.

Screenshot of the wbstack site management page, 17 May 2020

Cradle & Magnus Tools

You can now use Cradle, a tool by Magnus Manske, on your WBStack site. You should find a link in the toolbox on all sites. Cradle is a tool that allows Wikibase users to create new Items using a UI form that can be defined on a wiki page by users, or by using a ShEx schema.

Cradle also brings the deployment of the WiDaR tool as a dependency, which will likely make bringing more Magnus tools to WBStack easier in the future if there is more demand.

A screenshot showing an example Cradle form for creating an Item with an ISBN value

All of the Magnus Tools (cradle, widar, quickstatements) now also have sessions that remain across service restarts, so you should get logged out less!

General resilience

The platform had one outage in early May which left sites unusable for a period. This was caused by the first batch deletion of Items using the Nuke extension on a site. All deleted Items were sent simultaneously, instead of being batched, to an API endpoint controlling the query service update process which quickly used up all available CPU and memory. This issue has now been fixed and the platform has since seen multiple much larger deletions with no issues.
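
The post does not show the fix, but the general idea of batching is easy to sketch. In this hypothetical example, send stands in for the call to the query service update endpoint, and the batch size of 50 is an arbitrary choice, not WBStack's real setting.

```python
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []  # start a fresh list so the yielded one is never mutated
    if batch:
        yield batch


def send_deletions(entity_ids: Iterable[str], send: Callable[[List[str]], None], size: int = 50) -> None:
    """Call `send` once per bounded batch instead of once with every deleted Item."""
    for batch in batched(entity_ids, size):
        send(batch)
```

Bounding each request keeps the memory and CPU needed by the receiving endpoint roughly constant, no matter how many Items a single Nuke run deletes.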

In order to ensure that registered users can make use of password reset functionality, and have registered with the correct email address, any user that missed verifying their email will now be forced to do so before continuing to interact with their control panel.

Finally

Thanks again to Rhizome, who run their very own Wikibase, for their support paying the Google Cloud bill in the early stages of this project.

Thanks to Magnus Manske for the changes to Cradle that made it easier to deploy for WBStack. I hope to be able to submit a few more changes for it upstream soon.

If you want to give it a whirl, then contact me, or tweet me.

The post WBStack 2020 Update 2 (May) appeared first on Addshore.

WBStack 2020 Update 1

23:10, Sunday, 17 2020 May UTC

WBStack has now been up and running for 6 months. During that time it has helped 70 people create 178 MediaWiki installs running Wikibase, a SPARQL query service and quickstatements, all at the click of a button, with a total of around 200,000 edits across all sites.

The most active site is currently virus-taxonomy.wiki.opencura.com which was developed during the Virtual Biohackathon on COVID-19 as a staging environment for “improving the taxonomy of viruses on Wikidata”. It currently stands at 20,000 edits and around 7,000 Items.

Screenshot of the virus-taxonomy Wikibase Main Page, 19 April 2020

Thanks again to Rhizome, who run their very own Wikibase, for their support paying the Google Cloud bill in the early stages of this project.

Updates

2020 has so far seen 135 commits to the currently private git repo (38 a month). Today the git repo hit 1,038 commits, the first being on 29 December 2017.

For previous update posts see the 2019 October Introduction, November Review and January 2020 Infrastructure Overview.

MediaWiki & Wikibase

MediaWiki and its extensions have all been updated to include the latest security fixes, bringing the platform to MediaWiki 1.33.3. You can find the release notes here.

MediaWiki has had a large number of commonly used extensions enabled. These include: JsonConfig, Kartographer, Math, Score, PageImages, Scribunto, Cite, TemplateSandbox, CodeEditor, WikiEditor, SecureLinkFixer, Echo, Graph, Poem, TemplateData, AdvancedSearch, ParserFunctions, MobileFrontend, DeleteBatch, MultimediaViewer and EmbedVideo. The MinervaNeue skin was also added.

Wikibase has seen the addition of a few new datatypes that have already been on Wikidata for quite some time, these include Musical Notation and Mathematical Expression.

WikibaseLexeme is also deployed to the platform and enabled by request on some sites. If you want to try out this extension please get in touch as a feature toggle has not yet made its way into the UI.

If you want to read more about the skins, extensions or data types, then I advise that you click one of the many links above.

You’ll also notice a shameless bit of self promotion that has been added to the bottom of all sites. Hopefully we can add this to Wikibase soon (Wikibase task).

For site managers

Site managers in this context are the users that created the site on wbstack.com. Currently that is limited to 1 manager per site, but that will be changing in the future.

Many wbstack.com UI bugs have been fixed, including confusing form input errors when creating accounts and sites. More content is now included on the main dashboard and, if you forget your password, there is now a password reset flow available from the login page.

Site managers can now set a Logo for the site that will automatically be sized and applied to MediaWiki. And to get rid of all of those pesky test sites, there is finally a delete button!

When creating a site you now also have the option to use a custom domain name.

Screenshot of the wbstack site management page, 19 April 2020

Queryservice

The WBStack query service updater has been a terrible hack since day 1. It was a PHP script wrapping around the main Java updater, and the Java updater would be shelled out to based on events from MediaWiki. This is of course slow, and the JVM for the updater could regularly take 30 seconds to fully initialize, all to send a single update to the backend.

Finally, this has been totally rewritten in Java, so you should see fewer delays in your query service. Though currently specific to WBStack, this multi-site updater should make its way into the main Wikidata query git repo for use by others if needed. You can find the current Gerrit patch here.
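
The rewritten updater itself is Java and lives in the linked patch. Purely to illustrate why a long-running multi-site process beats shelling out per event, here is a hypothetical Python sketch of one pass of such a loop: pending change events are de-duplicated, grouped by wiki, and sent as one update per wiki, so start-up cost is paid once rather than per edit. The (wiki, entity id) event shape and the update callback are assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple

Event = Tuple[str, str]  # (wiki domain, entity id): hypothetical event shape


def group_events(events: Iterable[Event]) -> Dict[str, Set[str]]:
    """De-duplicate pending change events and group the entity IDs by wiki."""
    pending: Dict[str, Set[str]] = defaultdict(set)
    for wiki, entity_id in events:
        pending[wiki].add(entity_id)  # repeated edits to one entity collapse to one update
    return dict(pending)


def run_once(events: Iterable[Event], update: Callable[[str, Set[str]], None]) -> int:
    """One pass of a long-running loop: one update call per wiki. Returns the number of calls."""
    grouped = group_events(events)
    for wiki, ids in sorted(grouped.items()):
        update(wiki, ids)
    return len(grouped)
```

A persistent process would call run_once repeatedly on whatever events arrived since the last pass; because it never exits, there is no per-event process or JVM start-up.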

The query service itself has also seen an expansion to the whitelisted SPARQL endpoints. You can see the full list here.

Under the hood

If you want to know more about how all of the moving parts tie together take a look at my 2019 infrastructure post.

Backups of all sites have always existed, but they were taken by hand; now automatic snapshots of all sites are taken every single night!

The main platform is powered by Laravel, and was recently updated from version 5.8 to 6.18. The main wbstack.com UI is written in VueJS using Vuetify, which has been updated from version 1.5 to 2.2, which has led to some UI improvements and layout changes. Other backend services such as Redis, MariaDB, Nginx, and Cert-Manager have also been kept up to date with security fixes and new releases.

The whole platform is powered using Docker images on Kubernetes running in the Google Cloud. One part of deploying to the site involves building Docker images. The build pipeline now makes use of Kaniko build caching, which dramatically reduces the time it takes for needed changes to go live. The “wikibase-docker” images are not used on WBStack, and the images that are used are not really fit for use outside of the infrastructure. However, learnings will continue to filter into “wikibase-docker”.

The future

The future is bright, and although WBStack is still very much in an alpha state, the platform has proved itself to be scalable, manageable, maintainable and of use to people.

In the last month I presented the idea of WBStack at a remote version of EMWCon (Enterprise MediaWiki con). One of the quotes from the slide deck is:

WMDE work around “Wikibase as a Service” is planned in this area during the second half of 2020.

EMWCon 2020 slides

If you are interested in this effort then please contact the Product manager for Wikibase, Samantha Alipio.

Planned changes

One of the biggest changes, which should happen in the next 3 months, will be the upgrade from MediaWiki 1.33 to 1.35. This will bring a variety of features, including some new special pages, a new REST API and a PHP-based Parsoid service. The PHP-based Parsoid service should allow VisualEditor to be deployed more easily on WBStack (something I have been waiting for).

As well as VisualEditor I am keen to try and deploy some sort of collaborative editor for wiki pages. This could be the VisualEditor “CollabPad”, or some other, possibly new extension.

Login is not currently where I want it to be. MediaWiki “user identity” extensions don’t currently offer the level of flexibility that I would like: custom usernames, combined with easy registration and login using common authentication methods such as Google, Twitter or others.

I want to push more settings into the site manager view, such as site language, site name and default skin selection, and also open up the ability to have a site managed by multiple people. Sites should also be more discoverable on the wbstack website itself.

A documentation hub of some description is also long overdue. This could be added to the main wbstack site, or created as a wiki itself, possibly sitting at wiki.wbstack.com. Dog food is good for you after all!

The post WBStack 2020 Update 1 appeared first on Addshore.

weeklyOSM 512

09:52, Sunday, 17 2020 May UTC

05/05/2020-11/05/2020

lead picture

GraphHopper needs feedback – people can influence the route of GraphHopper. 1 | © GraphHopper | map data © OpenStreetMap contributors

Mapping

  • _PG_ published (ru) (automatic translation) in his diary a list of techniques and tips for using JOSM that he has learned recently. Some things in the list may be new even to seasoned users of JOSM.
  • Jan Michel proposed improving vehicle tagging by including electric_bicycle= and speed_pedelec= into the schema for access rights.
  • Joseph Eisenberg wrote about how the proposal for amenity=motorcycle_taxi was not approved in a wiki vote. He makes a number of points, one of which highlights how tag voting is dominated by people from first-world countries.
  • OpenStreetMap contributor IsStatenIsland shows a commendable attention to detail in their account of making the boundary of New York State on Ellis Island more accurate.
  • higa4 describes (ja) (automatic translation) how to analyse and manipulate OSM data tagging with OpenRefine.
  • Micromapping does not stop at playgrounds. In the German forum a discussion started (de) (automatic translation) about the tagging of a tree house in a playground.

Community

  • The portal Ça reste ouvert (fr) is now active in Italy and the user interface has been translated into Italian. restiamoaperti.it (it) allows users to visualise and update businesses’ availability and other specific information during the COVID-19 emergency. All of the data available in the map are based on OSM and are actively updated by the local OSM community. (it) (automatic translation)
  • Maggie Cawley and Alyssa Wright, from OSM US, are interviewed in Sustain Ep. 28, a podcast about sustainable FLOSS.
  • User Silka123 summarised (ru) (automatic translation) a ‘Couchmapping’ project of the Russian community. 37 mappers helped to improve the data for Yegoryevsk, a town in the Moscow region. This couchmapping was different from previous remote mapping activities, as there was much more social interaction: not just conversations but also live streams during the mapping and a lot of knowledge transfer. Hence, it is not surprising that one commenter on Silka123’s blog post describes his personal, positive experience.

OpenStreetMap Foundation

  • Tobias Knerr started a thread on the OSMF mailing list titled ‘Framework for the foundation’s hiring practices’, which initiated an ongoing discussion.
  • The announcement of the application by Geolibres to become a local chapter of the OpenStreetMap Foundation in Argentina prompted a discussion about the legal barriers to becoming a member that exist in the by-laws of local chapters. Craig Allan noted that only the local chapter in Iceland has unlimited open-door membership.
  • The OSMF Data Working Group published activity reports for the first and second quarters of 2019. The reports disclose noteworthy events of vandalism and copyright violations and are an interesting read, particularly regarding a novel type of vandalism termed ‘Anti-Pokemon’.
  • The minutes of the meeting of the OSMF Licensing Working Group on 9 April have been published.

Events

  • The programme for the online conference State of the Map 2020 (4 and 5 July) has been published.

Humanitarian OSM

  • HOT presented the new Tasking Manager. The improved frontend with integrated iD editor and mapping roles such as Validators, Mapper and Project Manager are intended to assist collaborative mapping.
  • Sawan Shariar, from the OpenStreetMap Bangladesh Foundation, blogged about his background, HOT’s role in the humanitarian mapping world and his new role as Data Quality Intern for HOT. The hire of a ‘Legendary Mapper (Highly Active)’ in a position intended to care about data quality will hopefully lower the number of complaints about data quality issues.

Open Data

  • Researchers from the University of Southampton have created the first global, open-access, harmonised spatial datasets of wind and solar installations. The data are based on power infrastructure objects in OpenStreetMap. They analysed global distribution and estimated output through a combination of techniques including other third-party external data. The datasets are available in a range of formats: geopackages, shapefiles, or comma-delimited text. Last but not least their data pipeline is documented and can be re-run anytime. It is worth checking in OpenInfraMap whether the power infrastructure around your area is mapped correctly.
  • Mikel Maron, from Mapbox, described how mobility data and map data can be used to analyse the risks which arise from a ‘re-opening’, or the different types thereof, to help communities and policymakers.
  • The OpenStreetMap Croatia team has processed and published two sets of data from the Zagreb City Office for the Strategic Planning and Development of the City, for which permission has been granted for use in OpenStreetMap. These are topographical data (automatic translation) and POI data (automatic translation) for the city.

Programming

  • [1] GraphHopper needs feedback for a new feature where even people without programming or Java knowledge can influence the routes produced by GraphHopper.
  • Jochen Topf continues his blog posts about osm2pgsql. Last month he wrote about the technical details and changes he made to reduce the ‘technical debt’ in the code. He now continues with a blog post about his work on adding ‘flex output’, an output option for osm2pgsql which allows the user to specify the handling, transformation and database storage options for each OSM object.

Releases

  • Joseph Eisenberg announced the release of v5.2.0 of the OpenStreetMap Carto style sheet, which is the style for OSM’s main map. The changes include adding rendering of man_made=goods_conveyor and waterway=canal with tunnel=flooded, removing rendering of residential, unclassified, cycleway, path, and track highway areas, and many more changes, which are as usual listed on GitHub.
  • With v19 Tobias Zwick has released a large update for StreetComplete. The most important additions are statistics and achievements through which users are introduced to OpenStreetMap, its editors, community and related projects. In the release notes, he mentions that this will probably be the last big update for some time.

Did you know …

  • … that Ed Pratt cycled around the world and used maps.me and OSM to navigate his way? The rest of his videos are enjoyable as well.
  • … about OSM Streak, the gamified web application that encourages you to do small tasks for OpenStreetMap every day? There is also a Telegram channel named @osm_streak.

Other “geo” things

  • Bloomberg’s CityLab features an article on how the forms and functions of maps are changing during the coronavirus pandemic. The article also highlights the efforts of the Library of Congress to collect maps and visualisations of the coronavirus pandemic and includes an interview with John Hessler, from the Library of Congress.
  • These maps reveal how COVID-19 has influenced our mobility patterns.

Upcoming Events

Many meetings are being cancelled – please check the calendar on the wiki page for updates.

Note: If you would like to see your event here, please put it into the calendar. Only data that is in the calendar will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by AnisKoutsi, Nakaner, PierZen, Polyglot, Rogehm, SK53, Sammyhawkrad, SunCobalt, TheSwavu, YoViajo, derFred, hbogner, jinalfoflia.

Last weekend, I attended the “Wikimedia Remote Hackathon 2020”. Because of the pandemic, all events are being moved online.

Like other events, the Wikimedia Hackathon also went remote. It was held on May 9 and 10, 2020. Though it was remote, it was well planned, and the organizing team put in the same effort as for an in-person hackathon.

All the announcements, planning, communication, various sessions, a new Telegram channel for newcomers, plenty of tools, walks with dogs, quick answers over IRC/Telegram for any question, showcases/demos, music, etc. were really well planned. We can learn a lot from the organizing team about how to conduct a remote hackathon.

Here you can get all the details about the event – https://mediawiki.org/wiki/Wikimedia_Hackathon_2020/Remote_Hackathon

I could not decide what to do until the event started. The night before, I thought of collecting the page count stats of the Indic Wikipedia sites and showcasing them with good graphs and charts.

For the past few months, I have been writing custom metrics exporters for Prometheus. We can write a custom exporter that exposes the Wikipedia page stats as Prometheus metrics, set up a Prometheus server to scrape the data from the exporter, and use Grafana to show the graphs.
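The exporter idea can be sketched with only the Python standard library. Prometheus scrapes a plain-text page (usually served at /metrics) in its text exposition format, so the core job of an exporter is turning numbers into lines like the ones below. The metric names (`wiki_articles`, `wiki_pages`, ...) and the `wiki` label are hypothetical choices for illustration, not necessarily what the actual exporter uses; a real exporter would more likely use the official `prometheus_client` library and serve this text over HTTP.

```python
def render_metrics(stats_by_wiki: dict[str, dict[str, int]]) -> str:
    """Render {wiki_host: {stat_name: value}} in the Prometheus text format.

    Metric names and the "wiki" label are hypothetical, for illustration only.
    """
    # Every statistic name seen across all wikis, in a stable order.
    stat_names = sorted({name for stats in stats_by_wiki.values() for name in stats})
    lines = []
    for name in stat_names:
        # Prometheus metric names cannot contain '-', e.g. "queued-massmessages".
        metric = f"wiki_{name.replace('-', '_')}"
        lines.append(f"# HELP {metric} MediaWiki siteinfo statistic '{name}'.")
        lines.append(f"# TYPE {metric} gauge")
        # One sample per wiki, distinguished by the "wiki" label.
        for wiki in sorted(stats_by_wiki):
            if name in stats_by_wiki[wiki]:
                lines.append(f'{metric}{{wiki="{wiki}"}} {stats_by_wiki[wiki][name]}')
    # The exposition format expects a newline after the last line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Values taken from the tawiki response quoted in this post.
    print(render_metrics({"ta.wikipedia.org": {"articles": 129087, "pages": 406044}}))
```

A gauge fits here because page and article counts can go down as well as up (for example after deletions).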

Good idea, right? Now, how do we get the page counts for any Wikipedia site?

I posted a query on the wikitech-l mailing list (wikitech-l AT lists.wikimedia.org).

See the discussion here
https://lists.wikimedia.org/pipermail/wikitech-l/2020-May/093363.html

This is a wonderful place to get answers to any technical queries related to Wikipedia.

All Wikipedia sites provide a nice API for interacting with them.

Basic information on article counts can be fetched from each wiki
using the Action API’s action=query&meta=siteinfo endpoint. See
<https://www.mediawiki.org/wiki/API:Siteinfo> for more information
about this API.

See <https://ta.wikipedia.org/wiki/%E0%AE%9A%E0%AE%BF%E0%AE%B1%E0%AE%AA%E0%AF%8D%E0%AE%AA%E0%AF%81:ApiSandbox#action=query&format=json&meta=siteinfo&siprop=statistics>
for an example usage on tawiki.

This URL returns the stats as JSON:
https://ta.wikipedia.org/w/api.php?action=query&format=json&meta=siteinfo&siprop=statistics
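To query other wikis, the same URL can be built from its parameters. This is a minimal sketch assuming every wiki uses the standard `/w/api.php` path that Wikimedia wikis use; the function name is hypothetical.

```python
from urllib.parse import urlencode


def siteinfo_statistics_url(host: str) -> str:
    """Build the Action API URL that returns siteinfo statistics for one wiki."""
    params = {
        "action": "query",
        "format": "json",
        "meta": "siteinfo",
        "siprop": "statistics",
    }
    # urlencode keeps the dict's insertion order, so for ta.wikipedia.org the
    # result is exactly the hand-written URL above.
    return f"https://{host}/w/api.php?{urlencode(params)}"


print(siteinfo_statistics_url("ta.wikipedia.org"))
# https://ta.wikipedia.org/w/api.php?action=query&format=json&meta=siteinfo&siprop=statistics
```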

This query returns the following response:

{
  "batchcomplete": "",
  "query": {
    "statistics": {
      "pages": 406044,
      "articles": 129087,
      "edits": 2961233,
      "images": 7758,
      "users": 174664,
      "activeusers": 431,
      "admins": 40,
      "jobs": 0,
      "queued-massmessages": 0
    }
  }
}

We can parse this and get the required details.
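Parsing it takes only a few lines with Python's standard `json` module. In this sketch the sample string is the tawiki response above, and `parse_statistics` is a hypothetical helper name.

```python
import json


def parse_statistics(body: str) -> dict[str, int]:
    """Extract the counts from a siteinfo 'statistics' API response."""
    data = json.loads(body)
    # The counts live under query -> statistics; keep only the integer values.
    stats = data["query"]["statistics"]
    return {name: value for name, value in stats.items() if isinstance(value, int)}


SAMPLE = (
    '{"batchcomplete": "", "query": {"statistics": {"pages": 406044, '
    '"articles": 129087, "edits": 2961233, "images": 7758, "users": 174664, '
    '"activeusers": 431, "admins": 40, "jobs": 0, "queued-massmessages": 0}}}'
)

stats = parse_statistics(SAMPLE)
print(stats["articles"], stats["activeusers"])  # 129087 431
```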

That's it. After seeing this answer early in the morning, I could not go back to sleep. I got up and wrote a custom exporter for these metrics for all the Indic Wikipedia sites.
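Collecting the numbers for several wikis could look roughly like this. It is a sketch, not the code from the repository linked below: the list of wikis is a hypothetical subset, the User-Agent string is a placeholder (Wikimedia asks API clients to send a descriptive one), and the fetch function is passed in as a parameter so it can be swapped out for testing.

```python
import json
from typing import Callable
from urllib.request import Request, urlopen

# Hypothetical subset of Indic-language Wikipedias, for illustration only.
INDIC_WIKIS = [
    "ta.wikipedia.org",
    "hi.wikipedia.org",
    "bn.wikipedia.org",
    "te.wikipedia.org",
    "ml.wikipedia.org",
]

STATS_PATH = "/w/api.php?action=query&format=json&meta=siteinfo&siprop=statistics"

# Placeholder User-Agent; replace with a real tool name and contact details.
USER_AGENT = "indic-wiki-stats-sketch/0.1 (contact: you@example.org)"


def fetch_statistics(host: str) -> dict:
    """Download the siteinfo statistics for one wiki (requires network access)."""
    request = Request(f"https://{host}{STATS_PATH}", headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=10) as response:
        return json.load(response)["query"]["statistics"]


def collect_all(hosts, fetch: Callable[[str], dict] = fetch_statistics) -> dict:
    """Return {host: statistics}, skipping any wiki whose request or response fails."""
    results = {}
    for host in hosts:
        try:
            results[host] = fetch(host)
        except (OSError, KeyError, ValueError) as err:
            # URLError, HTTPError and socket timeouts are OSError subclasses;
            # KeyError and ValueError cover unexpected or malformed JSON.
            print(f"skipping {host}: {err}")
    return results
```

Calling `collect_all(INDIC_WIKIS)` returns one statistics dict per wiki that answered. Because each Prometheus scrape would trigger one API request per wiki, caching the results for a few minutes would be a sensible addition.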

Here is the code – http://github.com/tshrinivasan/indicwiki_stats_exporter

Here is the phabricator task – https://phabricator.wikimedia.org/T252212

I used a DigitalOcean droplet to run the exporter, Prometheus, and Grafana.
I built a dashboard and also published it to the public Grafana dashboards. Yay! My first contribution to public Grafana dashboards.

https://grafana.com/grafana/dashboards/12265

Here is the grafana dashboard – http://139.59.47.5:3000/d/kx1Pb36Zz/indic-wiki-stats?orgId=1

I shared this with the wikitech-l list and the Telegram groups for the remote hackathon.

It felt good to get an idea and implement it quickly. I later found that the Wikistats team provides various analytics here – https://stats.wikimedia.org/#/ta.wikipedia.org/content/pages-to-date/normal|line|2-year|~total|monthly

And here is another site to look for such numbers – https://wikistats.wmflabs.org/display.php?t=wp

Still, this custom exporter and Grafana dashboard show comparison graphs across the Indic wikis, which are not available anywhere else.

I will add more stats for the Indic Wikisource, Wikibooks, Wikinews, and Wiktionary sites soon.

Apart from this, I could not join any of the sessions or live demos at the hackathon. I thought all the live sessions would be recorded. Alas, they were only live: there were no recordings, because meet.google.com could not share the streams to YouTube.

I could not take part in the showcase event, but I watched it and was happy to see the great efforts of the other participants.

Thus, the two-day remote Wikimedia Hackathon 2020 came to an end. I am happy that I made a small contribution to this event.

Tons of thanks to the event organizers, the Indic Wiki team, the Noolaham Foundation for the server I used, the wikitech-l mailing list, and all Wikipedia contributors for making the world a little better.

We’re all quickly learning that in a global pandemic, non-COVID-related healthcare looks very different from what we were accustomed to. This is a new age of telehealth, where people access health services through communication technologies rather than in-person visits. As we hear from our Society of Family Planning (SFP) Wiki Scholars, a group of reproductive health experts, their patients often already come to them with a self-diagnosis they’ve made through Googling or searching Wikipedia. As individuals have less access to in-person visits, they’re likely looking to online resources like these more than ever.

That’s why Wiki Scholars in our current Wikipedia training course sponsored by the Society of Family Planning are rising to the challenge of keeping Wikipedia information on family planning up-to-date.

Urgency has always driven this group to edit Wikipedia, one of the leading sources of health information in the world. When one considers Wikipedia’s daily traffic of hundreds of thousands, the question “What do I want my patients to know right now?” becomes “What do I want the world to know right now?” So one Wiki Scholar went straight to Wikipedia’s page about telehealth and added a new section about what abortion care looks like and how laws have changed in our new health landscape. They also updated the medical abortion page to include information about telehealth access.

The telehealth article has received 900 pageviews every day during the last month, three times its typical traffic before the coronavirus pandemic. It now has a section about teleabortion, thanks to an SFP Wiki Scholar.
Wikipedia’s page about medical abortion receives around 350 pageviews every day. A Wiki Scholar added telehealth information here.

Why does this matter?

Globally, medical information on Wikipedia earned a staggering 4.8 billion page views in 2013.¹  In 2014, Wikipedia was found to be a more popular source of health content than the NIH, WebMD, Mayo Clinic, and others.² And not only do patients use Wikipedia, so do doctors.³

So now, as the world goes to Wikipedia to stay up-to-date about the global pandemic, readers are inevitably turning to the more than 155,000 health-related Wikipedia pages to make decisions about their healthcare in uncertain times. Wikipedia, perhaps now more than ever, is important to keep verifiable, representative, and complete.

As the U.S. Department of Health and Human Services’ Office on Women’s Health urges women to make their health a priority during women’s health week, we’d like to commend the physicians, professors, researchers, and other family planning experts and advocates who can dedicate time to making health information more available to the public.

Who’s doing this work?

We often hear from SFP Wiki Scholars that our course on improving Wikipedia felt like a natural fit for their personal and professional goals. Patients already rely on Wikipedia as a supplemental resource, and the scholars remark that they also use it in their own lives. They tell us that Wikipedia has the power to:

  • Simplify contraception,
  • Provide the public with up-to-date information on family planning (especially as realities change so quickly with shelter-in-place orders),
  • Connect healthcare providers with their community using language their community uses,
  • Help researchers become better communicators of their expertise using “layman’s” terms.

Through Wikipedia, practitioners can suddenly take part in healthcare conversations happening in public spheres outside the hospital or clinic. “Why make more of our own closed information systems when people already use Dr. Wikipedia?” one Wiki Scholar pointed out. The importance of non-technical, easy-to-understand information cannot be overstated.

So far, SFP has sponsored three courses, training 64 scholars to do this work. These scholars have added almost 60,000 words to high-impact Wikipedia pages about abortion and contraception, reaching 11 million readers. The latest cohort has continued to add well-researched information to a variety of topics, including about the insertion of intrauterine devices (950 daily pageviews) and the procedure of an anomaly scan in pregnancy (750 daily pageviews). One person added an image of a contraceptive diaphragm to the corresponding page, which receives about 250 pageviews every day.

How to get involved

SFP’s Executive Director Dr. Amanda Dennis was already searching for a way to bring more information to Wikipedia when she attended a conference presentation by our Director of Partnerships Jami Mathewson at the American Sociological Association’s 2018 conference. From there, a flourishing partnership between our organizations was born. As conferences and working environments go virtual, fostering these kinds of connections is more important than ever.

If your institution or organization is passionate about equipping the public with information about a particular topic, get in touch. We can help expand your reach through Wikipedia. Jami works personally with organizations to set up Wikipedia training courses that align with their mission. Our partners recognize the value of giving experts the dedicated time and support to do public engagement work, which is why many of them sponsor seats for their members or staff in our courses. To discuss partnering with Wiki Education, contact Jami at jami@wikiedu.org or visit partner.wikiedu.org for more information.


Thumbnail/header icon by Timofey Rostilov from the Noun Project.

By C. Estelle Smith, University of Minnesota

What does it mean to “keep community in the loop” when building algorithms for Wikipedia?

-C. Estelle Smith

Imagine you’ve just created a profile on Wikipedia and spent 27 minutes working on what you earnestly thought would be a helpful edit to your favorite article. You click that bright blue “Publish changes” button for the very first time, and you see your edit go live! Weeee! But 52 seconds later, you refresh the page and discover that your edit has been wiped off the planet. How would you feel if you knew that an algorithm had contributed to this rapid reversion of all your hard work?

For the sake of illustration, let’s say you were editing a “stub” article about a woman scientist you admire. You can’t remember where you read it, but there’s this great story about how she got interested in computing. So, you spend some time writing up the story to improve her mostly empty bio. Clearly, you’re trying to be helpful. But unfortunately, you didn’t cite your source…and boom!—your work gets blown away. Without any way to understand what happened, you now feel snubbed and unwanted. Will you ever edit again?! 😱

Many edits (like yours) are damaging to Wikipedia, even if they were completed in good faith—e.g. missing citations [ ], bad grammars, mis-speled werds, and incorrect {syntax. And then there are plenty of edits that are malicious—e.g. the addition of offensive, racist, sexist, homophobic, or otherwise unacceptable content. All of these examples make it necessary for human moderators (a.k.a. “patrollers”) to review edits and revert (or fix) the bad ones. However, given the massive volume of edits to Wikipedia each day, it’s impossible for humans to review every edit, or even to identify which edits should be reviewed. 

In order to make it possible(-ish) to build and maintain Wikipedia, the community absolutely requires the help of algorithmic systems. But we need these algorithmic systems to be effective community partners (think R2-D2, cheerfully supporting the Rebel Alliance!) rather than AI overlords (think Terminator…being Terminator). How can we possibly design these systems in a way that supports all of their well-intentioned community stakeholders…including patrollers, newcomers, and everyone in between?

Our team of researchers from the University of Minnesota, Carnegie Mellon University, and the Wikimedia Foundation explored this question in our new open access research paper. We used a method called Value-Sensitive Algorithm Design which has three steps: 

(1) Understand community stakeholders’ values related to algorithms.

(2) Incorporate and balance these values across the full span of the ML development pipeline.

(3) Evaluate algorithms based not only on accuracy, but also on their acceptability and broader impacts.

We argue that if you follow these three steps, you can “keep community in the loop” as you build algorithmic systems, making you more likely to avoid catastrophic and community-damaging consequences. Our paper completes the first step of Value-Sensitive Algorithm Design with respect to a prominent machine learning system on Wikipedia called ORES (Objective Revision Evaluation Service).

ORES is a collection of machine learning algorithms that look at textual changes made by humans and then produce statistical guesses of how likely the edits are to be damaging. These guesses are continuously fed via API in real time all across Wikipedia, as editors and patrollers complete their work in parallel.

For example, one prominent place where ORES’ guesses affect user experience is in the “Recent Changes” feed, which looks like a list that shows every new edit to the encyclopedia chronologically. Patrollers often spend time looking through the Recent Changes list, using a highlighting tool built into the interface. 

If we fed an edit like yours into ORES, it might output guesses like “82% likely to be damaging” and “79% likely to be done in good faith.” The Recent Changes list could use these scores to highlight your edit in red to show that it is “moderately likely to be problematic.” Or, if the patroller wanted, it could highlight your edit in green to show that you likely meant well. 
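The step from scores to highlights can be illustrated with a tiny function. This is a hypothetical sketch, not how Recent Changes is actually implemented: the 0.75 cutoffs are made-up placeholders rather than ORES's real thresholds, and the inputs are simply the two probabilities from the example above.

```python
from typing import Optional


def highlight(damaging: float, goodfaith: float, show_goodfaith: bool = False,
              damaging_cutoff: float = 0.75,
              goodfaith_cutoff: float = 0.75) -> Optional[str]:
    """Return a hypothetical highlight colour for one edit, or None for no highlight.

    damaging and goodfaith are ORES-style probabilities in [0, 1]; the
    cutoffs are illustrative placeholders only.
    """
    if show_goodfaith:
        # The patroller chose to see which editors probably meant well.
        return "green" if goodfaith >= goodfaith_cutoff else None
    # Default view: flag edits that are likely to be problematic.
    return "red" if damaging >= damaging_cutoff else None


# The example edit from this post: 82% likely damaging, 79% likely good faith.
print(highlight(0.82, 0.79))                       # red
print(highlight(0.82, 0.79, show_goodfaith=True))  # green
```

The same edit gets a different colour depending only on which question the patroller asks, which is exactly why these design choices shape how the edit is treated.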

In either case, both the underlying algorithms of ORES and the highlights they generate strongly affect (1) how the patroller interacts with your edit, and (2) whether or not you will continue editing in the future. That’s why, in our study, we wanted to understand what values should guide our design decisions with regard to systems like ORES, and how we can balance these values to lead to the best outcomes for the whole community.

We spoke to dozens of ORES stakeholders, including editors, patrollers, tool developers, Wikimedia Foundation employees, and even researchers, in order to systematically identify which values matter to the community. The infographic above summarizes the results. 

For example, one critical value is “Human Authority.” On Wikipedia, the community believes it is vitally important to avoid giving final decision-making authority to the algorithmic system itself. In other words, please, nobody build Terminator! There should never be an algorithm that gets to call the shots and make the final decision about which edits stay, and which edits go. But we do need community partners like R2-D2 to assist with “Effort Reduction” by pointing us in the right direction.

At the same time, the example of your edit shows that along with “Effort Reduction,” we also need to build systems that foster “Positive Engagement.” In other words, ORES should reduce how much work it takes for patrollers to find bad edits, and it also needs to make sure that well-intentioned community members are having positive experiences, even when their edits aren’t up to snuff. 

So, maybe when ORES detects damaging (but good faith) edits in Recent Changes, those edits could receive special treatment. For example, rather than wiping out your red-highlighted edit without explanation, perhaps your edit could be allowed to stay online for just a few extra minutes. Recent Changes could take a hint from Snuggle, direct a patroller to reach out to you first before reverting, and provide some scaffolded text like, “Hi @yourhandle! Thanks for making your first edit to Wikipedia! Unfortunately, our algorithm detected an issue… It seems like you meant well, so I wanted to see if you could fix this by adding a citation so that I don’t have to revert it?”
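The suggested flow can be sketched as a small triage rule. This only illustrates the idea in this post and is not an existing ORES or Recent Changes feature: the cutoff, the ten-minute grace period, and all names are hypothetical. In keeping with the “Human Authority” value, the function only suggests an action to a patroller and never reverts anything itself.

```python
from dataclasses import dataclass

GRACE_MINUTES = 10  # hypothetical length of the "few extra minutes"
CUTOFF = 0.75       # hypothetical probability cutoff


@dataclass
class Edit:
    handle: str
    damaging: float     # ORES-style probability that the edit is damaging
    goodfaith: float    # ORES-style probability that it was made in good faith
    age_minutes: float  # time since the edit was published


def suggest_action(edit: Edit) -> str:
    """Suggest (never perform) a patroller action; a human makes the call."""
    if edit.damaging < CUTOFF:
        return "no action"
    if edit.goodfaith >= CUTOFF and edit.age_minutes < GRACE_MINUTES:
        # Discuss before reverting: give a well-meaning editor a chance to fix it.
        return "message editor first"
    return "review for revert"


def outreach_message(handle: str) -> str:
    """Scaffolded text adapted from the example in this post."""
    return (f"Hi @{handle}! Thanks for making your first edit to Wikipedia! "
            "Unfortunately, our algorithm detected an issue... It seems like you "
            "meant well, so I wanted to see if you could fix this by adding a "
            "citation so that I don't have to revert it?")


edit = Edit("yourhandle", damaging=0.82, goodfaith=0.79, age_minutes=1)
print(suggest_action(edit))  # message editor first
```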

(Yes, this is challenging the BOLD, Revert, Discuss (B-R-D) paradigm, and suggesting that in some cases, B-D-R may be a more appropriate way to balance community values. Please discuss!)

In the full paper, we share our journey of applying Value-Sensitive Algorithm Design (VSAD) to understand the Wikipedia community’s values, along with 25 concrete recommendations for developers interested in building ML-driven systems in complex socio-technical contexts. As you navigate community-based moderation, we hope our experiences may shed light on approaches to problems you may be experiencing in your community, as well.


Thanks for reading! Please share your thoughts in the comments, or get in touch with me @fauxneme on Wikipedia.

About this post

Featured image credit: Le pei (pôle entrepreneuriat et innovation) est à viva tech startup connect 2016, Ecole polytechnique Université Paris-Saclay, CC BY-SA 2.0

Crowdsourced Indian geology in the 1800s

13:44, Thursday, 14 2020 May UTC
Crowd might be a bit of a stretch for fewer than a hundred contributors, but George Bellas Greenough (1778 – 1855), one of the founders of the Geological Society of London, produced the first geological map of India, which was posthumously published in 1855. Greenough was the first president of the Geological Society of London and was reportedly best known for his ability to compile and synthesize the works of others, and his annual addresses to the Society were apparently much appreciated. He was, however, entirely against the idea that fossils could be used to differentiate strata, and in that he failed to admire William "Strata" Smith, who produced the first geological map of England. One obituarist noted that Greenough was an outspoken critic of theoretical frameworks and a "drag" on the progress of the science of geology!

Not much has been written about the history of the making of the Greenough map of Indian geology - it was begun sometime in 1853, was finally published in 1855, and consisted of four sheets measuring 7 by 5¾ feet. A small number of copies were made, which are apparently collector's items, but hardly any are available online for anyone wishing to study the contents. The University of Minnesota has a set of scanned copies of three-fourths of the map, but if you want to read it you need to download three large files (each of about 300 MB!). I decided to stitch together these images and enhance them a bit, and since the image is legally in the public domain (i.e. its copyright has expired), I have placed it on Wikimedia Commons. There really is a research need to examine the motivations for making this map and how Greenough went about producing it. He apparently had officers of the East India Company providing him with information, and he seems to have sent draft maps on which they commented. There is a very interesting compilation of the correspondence that went into the making of this map. The map has numerous errors, both in the geology and in positions and labelling, but it is definitely something to admire for its period. Thomas Oldham, representing the professional GSI in India, was particularly critical while heading a committee (that included Henry "Cyclone" Piddington) to examine the map.

One has to lament that nobody has subsequently made a nice geological map that shows interesting regional formations, fossil localities and so on. So much for our human-centricity and recentism.

Here is a small overview of the 1855 map. You can find and download the whole image on Wikimedia Commons.


You can zoom into this image and enjoy the details by using this viewer that uses the Flash plugin or this one that is Flash-free.
An even higher resolution stitch can be found here (with the zoom-viewer here)


PS: November 8, 2016 - I just created a Wikipedia entry for Henry Wesley Voysey (with the only known portrait of the man, even though the Oxford Dictionary of National Biography records no likeness of him!). D T Moore wrongly claims that Voysey made the first geological map of India, covering a part of the Hyderabad region (1821), but the two known copies of that map disappeared from Calcutta and London. An older geological map, by Benjamin Heyne, was published in 1814.

April 11, 2018 - thanks to David G. Bate, there is now a complete map in the French digital library. The above image is now complete.