September 22, 2017

Wikimedia Foundation

Community digest: Italy gets ready for State of the Map 2018; news in brief

State of the Map 2017. Photo by Takehiro Watannabe‎, CC BY-SA 4.0.

Milan, Italy will be the host for State of the Map (SOTM), the annual OpenStreetMap community gathering, in 2018.

The conference will be organized by Wikimedia Italy (Italia) and the Polytechnic University of Milan (Politecnico di Milano, PDM), the largest technical university in Italy, after they submitted a joint proposal.

The event will take place from 28 to 30 July 2018 and will be hosted on PDM’s university campus. It will bring together at least 400 participants from 5 continents, including several community members from emerging communities supported by the SOTM Scholarship Program.

PDM has put great effort into fostering the advancement of OpenStreetMap in recent years, developing projects based on OSM and promoting training, research activities, mapathons and initiatives like PoliMappers.

State of the Map 2018 will aim to enhance the collaboration between the Wikimedia and OpenStreetMap communities, who will join forces to showcase their efforts to promote open and free culture.

Wikimedia Italy started laying the groundwork for the event by working with the Wikimedia and OSM communities in Italy. This included a joint effort with Fondazione BEIC to geocode photos from the Paolo Monti photographic archive, which the foundation released under a free license on Wikimedia Commons. The photos helped make a prototype map that shows how libraries and heritage institutions could use OpenStreetMap with Wikimedia projects to provide an alternative way of browsing their collections. At Wikimania 2017, the project was called out as one of the “coolest projects” developed by an independent Wikimedia chapter.

And Wikimedia Italy is continuing to support these data-enrichment projects: one of our main objectives for 2017 is to increase the coverage of house numbers and street names to facilitate routing. Between January and April 2017, we reached a major milestone by adding over a million house numbers in Emilia-Romagna.

In the next few weeks, Wikimedia Italy and PDM will start event organization work with the SOTM Committee: one of the first steps in our plan is launching a contest to choose a logo and build visual identity guidelines for the event. Conference updates will be available on the SOTM and Wikimedia Italy websites.

Francesca Ussani, Communications Manager
Wikimedia Italia

In brief

2018 Wikimedia Conference dates are announced: Wikimedia Germany (Deutschland, WMDE) has announced the dates and plans for hosting the 2018 Wikimedia Conference. WMDE is the independent chapter that supports the Wikimedia movement in the country, and the conference is the annual meeting for all Wikimedia organizations around the world; the chapter has expressed its intention to continue hosting and supporting the conference in Berlin. A report on the lessons learned from hosting the conference in Berlin in 2015–2017 will be published by Wikimedia Germany by the end of October. More details can be found on Wikimedia-l.

Editathon on the Western Ghats biodiversity in India: The Malayalam Wikipedia community and the College of Forestry at Kerala Agricultural University have hosted an editing workshop aimed at improving Malayalam Wikipedia content on the biodiversity of the Western Ghats. The organizers shared their experience with the event.

2018 Wikimedia developer summit basic plans announced: The Wikimedia Developer Summit 2018 will be held on 22 and 23 January in San Francisco, California. The program and call for participation will be shared with the public shortly; the organizers have shared their basic plans and are inviting the developer community to share their thoughts to help make a successful meetup. “We invite technologists, managers and users to study, reflect and propose ways to support the strategic vision we are committed to. We would like you to capture your thoughts in a short position statement and join the conversation,” says Victoria Coleman, the Foundation’s Chief Technology Officer, in an email to Wikimedia-l.

Armenian Wikipedia milestone: Last week, the Armenian Wikipedia community celebrated their 230,000th article on Wikipedia. The Armenian Wikipedia community and Wikimedia Armenia, the independent Wikimedia chapter, have been exerting great efforts to recruit and support new participants in their community, primarily through their WikiCamps, which mix editing and fun to encourage young learners to participate in editing Wikipedia.

Affiliations update: Last week, the Wikimedia Affiliations Committee (AffCom) announced the de-recognition of Wikimedia Macedonia, a now-former Wikimedia independent chapter. The chapter had been notified by AffCom in February about the requirements it needed to meet to keep its chapter recognition. Details on that decision can be found on Wikimedia-l, and information about movement affiliation de-recognition can be found on Meta-Wiki.

New board for Wikimedia Argentina: Last Saturday, the General Assembly of Wikimedia Argentina convened to hold the chapter’s board elections. More details on the election results can be found in an email from Anna Torres, the chapter’s executive director, on the Wikimedia-l mailing list.

Compiled and edited by Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation


by Francesca Ussani and Samir Elsharbaty at September 22, 2017 04:23 PM

Weekly OSM

weeklyOSM 374



Please help in the humanitarian mapping tasks for the earthquake in Mexico (2017-09-19). 1


  • Contributor Cascafico has their own instance of OSM Analytic Tracker for Northeast Italy running on an Orange Pi minicomputer. This is a changeset observer with its own graphical visualisation of the changesets (example).
  • Tom Pfeifer asks on the Tagging mailing list if the contact:*=* tags are being misused for review websites, such as TripAdvisor and Yelp.
  • Contributor dega reports (FR) on talk-ca that Telenav added nonexistent roads. These planned roads were stopped by Environment Canada. In the same changeset, temporary motorway deviations were also added. These problems seem to stem from the interpretation of road navigation data produced by certain software sources, which should be used very carefully!
  • User GOwin reports about HailHydra(nt)!, an initiative he launched in the Davao region (Philippines), in collaboration with the local fire officials. Its goal is the mapping of fire hydrants, fire stations and volunteer brigades. The tools used during the mapathon were OSMHydrant and OsmAnd.
  • Giacomo Servadei continued to work on the PT_Assistant plugin for JOSM, which makes editing public transport on OpenStreetMap a lot more convenient. It provides visual feedback, validation and some extra tools. Now, the plugin also reports to the user any problems with continuity in hiking and bicycle-route relations. And it helps with adding forward/backward roles where these routes fork.
  • User praktikyadav from Mapbox reported in his user blog on a major update to the Mapbox satellite-imagery base-map layer.
  • The proposed feature Language Information (we reported earlier) has been rejected.
  • Simon Poole reported on a major update of the editor presets for suggesting the names of shops and other branded businesses. He requests that users check the suggestions in non-Latin alphabets, as many more have been added.


  • Visit downtown Tarija, a Bolivian city (South America) with images you can use to improve maps.
  • Marena Brinkhurst tweets an imagined 2020 headline about a poster stating ‘No maps are missing’. Let’s hope that «They are exact and of great quality» can be added.
  • Dave Corley rebooted the idea of an OSM Ireland chapter.
  • Yuri Astrakhan starts the name search for his Wikidata+OSM SPARQL query service. The JOSM Wikipedia plugin has also been extended: once installed, you can download data using a SPARQL query that combines conditions based on OSM and Wikidata details.
  • Jo is looking for students with knowledge of Java and the OSM data model. Interest in route relations would be an advantage.
  • This month’s “Mapper of the Month” in Belgium is Jonathan Beliën aka jbelien. Read the interview.


  • Statistics Canada hosted a mapping workshop during the HOT Summit 2017, presenting the results of a pilot project with the local OSM community in Ottawa-Gatineau to import building footprints. The aim is to map every Canadian building by 2020; StatCan counts on municipalities to release building footprints as open data.
  • Vincent Frison plans to import building heights in Nice, France, following the same process he used for Paris two years ago. He asks for feedback on Talk-FR and on the French forum.

OpenStreetMap Foundation

  • On the Talk mailing list, the discussion on the draft directive about the use of the trademark “OpenStreetMap” (“Trademark Policy”) has resumed (we reported). Among others, this directive concerns those whose projects are based on “OpenStreetMap” or “OSM”.


  • The detailed program of SotM-US 2017 in Boulder, Colorado has been published on the event’s website. Mark the dates: October 20th to 22nd!
  • SotM-Asia 2017 will take place in Kathmandu, Nepal on September 23–24. The chosen theme is “from creation to use of OSM data”: the increasing amount of available data about the Asian continent calls for new ideas and tools to process it. More details on registration, location and schedule are available on the event’s website.

Humanitarian OSM

  • Daniel Mietchen writes some detailed notes, including feedback and critiques, whilst using MapSwipe for the first time.
  • [1] In Mexico, mappers and reviewers are still needed. The current cause: a new earthquake, and a new task.
  • Martin Noblecourt reminds us that satellite imagery is not necessarily correctly aligned, and shows how you can deal with the problem.
  • On Talk-US, Jeffrey Ollie mentions that DigitalGlobe has published post-Harvey and post-Irma images, promptly made available as tiles by OpenAerialMap.


  • kocio-pl suggests in a pull request to add boundary=protected_area to OSM Carto.
  • math1985 wants to darken the colour of farmlands in OSM Carto and submitted a pull request for that purpose.


  • Pierre Béland (aka pierzen) reports on Talk-CA that the “Canadian Press” agency is releasing OpenStreetMap-based maps via Mapbox services. These maps should appear in various Canadian newspapers subscribed to the Canadian Press.

Open Data

  • The provincial administration of New Brunswick (Canada) has donated its aerial photos to ESRI.


  • On the Talk mailing list, Nicolas Guillaumin seeks a new maintainer for OSMTracker for Android.
  • StephaneP’s OSM diary shows how to use a low-cost RTK GPS receiver with the open-source software RTKLIB to create an RTK base station and obtain precision of a few centimetres by correcting the satellite signal. To learn more, you can ask questions in the diary.


  • The OpenRouteService backend is now available under the Apache license on GitHub. The backend is a GraphHopper fork: over recent years, ORS was converted from its own routing implementation to a GraphHopper-based system.
  • During his Google Summer of Code project, Bogdan Afonins developed many improvements to JOSM’s search and download dialogs.


  • OSM Carto 4.3.0 is mostly a bugfix release.

Did you know …

OSM in the media

  • The New York Times published an article about the humanitarian needs after Hurricane Irma. The maps showing the information are OSM-based.
  • The magazine We Demain reports that the French organisation “Hackers Against Natural Disasters (HAND)” has sent equipment worth €30,000 to the areas affected by Irma, Saint-Martin and Saint-Barthélemy. Among other things, drones were delivered for mapping. (fr) (automatic translation)

Other “geo” things

  • Nothing for nerds, but a nice advertising video for the GraphHopper API.
  • OSM is also frequently used for visualisation of geographically relevant content: here in the macroplastics project of the University of Oldenburg, Germany.
  • Mercedes-Benz decided to use the proprietary and patented What3words geocoding method in their navigation aids. These codes must not be entered into the OSM database due to their restrictive licence policy!

Upcoming Events

Where What When Country
Liguria Wikigita a Santo Stefano Magra, Santo Stefano di Magra, La Spezia 2017-09-23 italy
Tokyo 東京!街歩かない!マッピングパーティ3 2017-09-23 japan
Patan State of the Map Asia 2017 2017-09-23-2017-09-24 nepal
Taipei OpenStreetMap Taipei Meetup, MozSpace 2017-09-25 taiwan
Bremen Bremer Mappertreffen 2017-09-25 germany
Graz Stammtisch Graz 2017-09-25 austria
Berlin 111.1 Berlin-Brandenburg Sonderstammtisch Intergeo 2017-09-25 germany
Salt Lake City OSM Utah GeoBeers 2017-09-26 united states
Scotland Mapathon Missing maps, Glasgow 2017-09-26 united kingdom
Berlin Intergeo 2017 2017-09-26-2017-09-28 germany
Leuven Leuven Monthly OpenStreetMap Meetup 2017-09-27 belgium
Lyon Mapathon missing maps à Lyon à l’atelier des médias, L’atelier des médias 2017-09-28 france
Lübeck Lübecker Stammtisch 2017-09-28 germany
Brisbane West End Mapping Party, West End 2017-09-30 australia
Ise 伊勢河崎でマッピングパーティ 2017-09-30 japan
La Paz Mapas Digitales para Periodistas, El Alto 2017-09-30 bolivia
Turin Viverone mapping party, Viverone, Biella 2017-10-01 italy
Essen Mappertreffen Essen 2017-10-04 germany
Stuttgart Stuttgarter Stammtisch 2017-10-04 germany
Montreal Les Mercredis cartographie 2017-10-04 canada
Dresden Stammtisch Dresden 2017-10-05 germany
Salt Lake City Mapping Night 2017-10-05 united states
Dortmund Mappertreffen Dortmund 2017-10-08 germany
Fukuchi Machi 福智町の歴史・文化まち歩きプロジェクト~自分のチカラで世界中にタカラ発信!~ 2017-10-08 japan
Rennes Réunion mensuelle 2017-10-09 france
Boulder, Colorado State of the Map U.S. 2017 2017-10-19-2017-10-22 united states
Buenos Aires FOSS4G+State of the Map Argentina 2017 2017-10-23-2017-10-28 argentina
Brussels FOSS4G Belgium 2017 2017-10-26 belgium
Lima State of the Map LatAm 2017 2017-11-29-2017-12-02 perú
Bonn FOSSGIS 2018 2018-03-21-2018-03-24 germany

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Anne Ghisla, Nakaner, PierZen, Polyglot, SK53, SeleneYang, Spanholz, Spec80, YoViajo, derFred, jinalfoflia, techlady.

by weeklyteam at September 22, 2017 02:54 PM

Wikimedia UK

People of the Enlightenment: an opportunity for Wikipedians

Grandjean de Fouchy, who as yet does not have a Wikipedia page. Image CC BY SA via Commons.

By Dr Martin Poulter, Wikimedian in Residence, Bodleian Libraries Oxford

A major scholarly database is offering free access to selected Wikipedians, thanks to an arrangement with Oxford University Press.

Housed at the Bodleian Libraries, Electronic Enlightenment (EE) is the most wide-ranging online collection of edited correspondence of the early modern period, linking people across Europe, the Americas and Asia from the early 17th to the mid-19th century. It gives access to thousands of short biographies and to 70,000 annotated pieces of correspondence.

EE is already available to many people via subscribing institutions that include universities and public libraries. Still, there are Wikipedians who do not have access but would benefit from it. As a result of the Wikimedian In Residence placement at the University of Oxford, they can now apply for free access through the Wikipedia Library (TWL). TWL is a Wikimedia Foundation program that supports Wikipedia’s volunteer editors by facilitating access donations for paywalled resources from leading publishers.

EE is now included in the Oxford University Press Scholarship accounts which also give free access to eight other online resources, including the Oxford Dictionary of National Biography and American National Biography Online.

Wikipedians, usually with an active account and at least 500 edits, can apply through the Wikipedia Library Card Platform. The accounts last one year, but if you still need to use EE or the other Scholarship resources once the year has elapsed, you can apply again.

In another part of the collaboration, EE has shared a dataset with Wikidata covering more than three thousand individuals. This lets us explore those people’s representation on Wikipedia. Of this set, we have identified 2663 EE people who already have English Wikipedia biographies, 287 with no English Wikipedia article but an article in another language version, and 168 with no representation in Wikipedia at all. The latter two sets are listed on a project page. We welcome help in creating new articles for these people, filling in the story of the Enlightenment.

Through EE, I’ve learned about the early feminist Sarah Chapone, for whom I’ve created a Wikisource profile, and discovered my near-namesake Francois-Martin Poultier. It has confirmed that Clotworthy Rowley and Slingsby Bethell are not made-up names from the Goon Show but real British politicians. Like Wikipedia, it is a web of knowledge calling out to be explored.

by John Lubbock at September 22, 2017 11:13 AM

September 21, 2017

Wikimedia Cloud Services

Tool creation added to toolsadmin.wikimedia.org

Toolsadmin.wikimedia.org is a management interface for Toolforge users. On 2017-08-24, a new major update to the application was deployed which added support for creating new tool accounts and managing metadata associated with all tool accounts.

Under the older Wikitech-based tool creation process, a tool maintainer sees this interface:

wikitech screenshot

As @yuvipanda noted in T128158, this interface is rather confusing. What is a "service group?" I thought I just clicked a link that said "Create a new Tool." What are the constraints of this name and where will it be used?

With the new process on toolsadmin, the initial form includes more explanation and collects additional data:

toolsadmin screenshot

The form labels are more consistent. Some explanation is given for how the tool's name will be used and a link is provided to additional documentation on wikitech. More information is also collected that will be used to help others understand the purpose of the tool. This information is displayed on the tool's public description page in toolsadmin:

toolinfo example

After a tool has been created, additional information can also be supplied. This information is a superset of the data needed for the toolinfo.json standard used by Hay's Directory. All tools documented using toolsadmin are automatically published to Hay's Directory. Some of this information can also be edited collaboratively by others. A tool can also have multiple toolinfo.json entries to support tools where a suite of functionality is published under a single tool account.
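
As a rough illustration, a toolinfo.json-style record can be assembled and serialized like this. The tool name, URLs, and author below are hypothetical stand-ins, and the field set is only a sketch of the convention used by Hay's Directory; the exact schema should be checked against the toolinfo.json standard itself:

```python
import json

# Hypothetical toolinfo.json-style record for an imaginary tool named
# "example-tool". Field names follow the general toolinfo.json shape
# (name, title, description, url, keywords, author, repository); verify
# the real schema before publishing.
toolinfo = {
    "name": "example-tool",
    "title": "Example Tool",
    "description": "Demonstrates the shape of a toolinfo.json entry.",
    "url": "https://tools.wmflabs.org/example-tool/",
    "keywords": "example, demo",
    "author": "Example Maintainer",
    "repository": "https://example.org/example-tool.git",
}

# Serialize to the JSON document a directory crawler would consume.
serialized = json.dumps(toolinfo, indent=2)
print(serialized)
```

Since toolsadmin publishes these entries to Hay's Directory automatically, a maintainer normally fills in the equivalent fields through the web form rather than writing the JSON by hand.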

The Striker project tracks bugs and feature ideas for toolsadmin. The application is written in Python3 using the Django framework. Like all Wikimedia software projects, Striker is FLOSS software and community contributions are welcome. See the project's page on wikitech for more information about contributing to the project.

by bd808 (Bryan Davis) at September 21, 2017 07:46 PM

Wiki Education Foundation

Andrew Newell is the Deep Carbon Observatory Visiting Scholar

If you’re like most people, when you want to learn about a scientific topic, Wikipedia is probably your first stop. Increasingly, scientists and science communicators understand the powerful role Wikipedia plays in the public’s understanding of science, and are taking steps to ensure its information is accurate and up-to-date. The Deep Carbon Observatory (DCO) is a great example of this. DCO is eight years into a ten-year interdisciplinary project to explore the quantities, movements, forms, and origins of carbon deep within Earth. Its Engagement Team has committed to disseminating knowledge about deep carbon science with the public and with the broader science community through Wikipedia. As we announced a few months ago, this made them a great fit for the Wikipedia Visiting Scholars program.

Andrew Newell
Image: RockMagnetist.tif, by RockMagnetist, CC BY-SA 4.0, via Wikimedia Commons.

Today, I’m pleased to announce that DCO has selected Andrew Newell as Wikipedia Visiting Scholar.

Andrew is an Associate Research Professor in the Marine, Earth, and Atmospheric Sciences department at North Carolina State University, specializing in rock magnetism. On Wikipedia, Andrew edits as User:RockMagnetist, a long-time contributor and administrator. If you’ve read about geophysics-related subjects on Wikipedia, there’s a very good chance you could find his username somewhere in the articles’ edit histories. Included in his contributions are impressive articles on big, complex subjects like Earth’s magnetic field and momentum. He has also been involved with or helped to create WikiProjects Biophysics, Women Scientists, and Geology.

DCO Engagement Team Leader Rob Pockalny views the collaboration as having “incredible synergistic potential to contribute significant, long-lasting content to numerous Wikipedia topics, while helping to ensure accurate, rich content in science topics spanning earth science, chemistry, physics, and biology.”

In Andrew’s words,

“I began editing Wikipedia when I discovered that the Wikipedia coverage of my research area, indeed much of geophysics, was alarmingly poor—yet some of the articles are read by hundreds of thousands of people per year! Writing Wikipedia articles became a hobby, and at times my curiosity has led me well outside of my core expertise—most recently to the Deep Carbon Observatory. Research at the DCO has a lot of very intriguing themes like mineral evolution and deep life. I see lots of potential for adding material to basic subjects like diamond, mineralogy, and Carbon Cycle.”

For more, read the announcement on the DCO website.

Image: URI Carothers Library.jpg, by Kenneth C. Zirkel, CC BY-SA 4.0, via Wikimedia Commons.

by Ryan McGrady at September 21, 2017 06:31 PM

Wikimedia Foundation

Raising awareness for Wikipedia in Nigeria

Photo by Zack McCune/Wikimedia Foundation, CC BY-SA 4.0.

With an estimated 190 million residents, Nigeria is the most populous country in Africa. A remarkable 60% of Nigerians are school-aged, creating one of the largest student bodies in the world. With internet access in Nigeria quickly growing, local Wikimedians are working together to raise awareness of the platform and how Nigeria’s many students can both use and improve Wikipedia.

Olushola Olaniyan is the acting President of Wikimedia User Group Nigeria. In 2017, he created student clubs for Wikipedia readers in Nigerian universities and launched weekly radio programs that explain Wikipedia over national airwaves.

He also partnered with the Wikimedia Foundation to produce video ads explaining Wikipedia. These ads were produced in Nigeria by Nigerian filmmakers for Nigerian audiences. Olushola invited four fellow volunteers to join him and the Foundation in creatively directing the videos. This “community marketing team” included Sam Oyeyele, Blossom Ozurumba, Kayode Yussuf, and Eyitayo Alimi.

With the video ads launching in Lagos this September, we wanted to share a short Q&A with the Nigerians raising awareness for Wikipedia.

“The video ads show a blend of characters between the old and young generations in Nigeria. It reflects our cultures, it shows that Nigerians are eager to learn new things, it shows that knowledge is not restricted by age.” –Olushola Olaniyan


Just 27% of internet users in Nigeria say they have heard of Wikipedia. Why is awareness so low?

Blossom:  The chief reasons range from lack of access to internet to the non-affordability of internet data in Nigeria.

Kayode: Wikipedia is known majorly within academic environments and even at that, its usage is not encouraged because the lecturers claim information from Wikipedia is not reliable. This serves as a discouragement even within Wikipedia fans (students).

I also believe that the slow adaptation of Wikipedia mobile can also take a blame. Mobile telephony grew very fast in Nigeria and it was easier and cheaper to connect to the internet via mobile phones than with computers. So people usually used apps on mobile phones. I think Wikipedia got its mobile app quite late and this added to the low awareness.


What are Wikimedians in Nigeria doing to increase awareness of Wikipedia and promote usage?

Eyitayo: We are bootstrapping! Doing a whole lot. Organizing meetup events—at least once in a month, sometimes twice—working on partnership deals with media agencies both digital and print, ensuring we get spaces for events, inviting people for training, and following up to ensure they are retained within the community.

Blossom: There is a great work emanating at the south-west region of Nigeria especially Lagos with great leadership shown by the Wikimedia User-group Director Olushola Olaniyan in schools and radio campaigns. In the northern region, there is an unstructured community championing Igbo Language Wikipedia Awareness.

Olushola: We have established Wikimedia Clubs among Nigerian institutions such as University of Ibadan, and Nigerian Institute of Journalism (NIJ), to create social groups of Nigerian students excited about free knowledge. We have established partnerships with two media houses to broadcast Wikipedia-related programs on air, eg Black Face Radio and WFM91.7

Photo by Kaizenify, public domain.


Tell us about the video ads you have directed. What do they show about Nigeria? What did you want to communicate about Wikipedia for Nigerians?

Olushola: The video ads show a blend of characters between the old and young generations in Nigeria. It reflects our cultures, it shows that Nigerians are eager to learn new things, it shows that knowledge is not restricted by age.

Sam: We wanted the video ad to shape the Wikipedia brand in Nigeria, and to actively push the brand to the Nigerian audience.

Blossom: The essence of the video ads is to reinforce Wikipedia awareness in the minds of Nigerians that are already aware of the platform and also subtly create new awareness in the minds of the unaware via pop culture shapers like Mark Angel Comedy’s star, Emmanuella and the ever-green Pete Edochie.

Kayode: I think the videos show a cultural and fun part of Nigeria: they show that we are educated, knowledgeable and fun loving. I also wanted to rewrite what Wikipedia is in the minds of Nigeria. Wikipedia is a free encyclopedia, written collaboratively by the people who use it.


What will Nigerian audiences understand about these videos that international audiences might miss?

Sam: The comedy! It’s no news that comedy is usually culture specific, so I think the international audience might miss the comedic content in the videos: the mannerisms, the facial expressions, the slang, etc. Exceptions will be those familiar with Nigerian entertainment, such as other West Africans and Africans at large.

Actor Pete Edochie asking about Wikipedia in the new Nigerian video ad. Photo by Wikimedia Foundation, CC BY-SA 4.0.

Olushola: The actors! Peter Edochie (popularly called Okonkwo) is a Nigerian award-winning actor, considered one of the most talented actors in Africa. He came to the limelight after featuring in Things Fall Apart, an adaptation from Chinua Achebe‘s book. He is very popular among the older generation of Nigerians.

Emmanuella of Mark Angel Comedy is a young Nigerian comedian who was introduced to the movie industry by her uncle. Her comedy is very popular among the younger generation.

Kayode: Pete Edochie is a well respected actor who is seen as a strong supporter of our culture and traditions; his pictures are often used for memes that communicate chastity, integrity and common sense. Nigerians will easily relate anything that has to do with Pete Edochie as original and traditional. This will mean that the product advertised is local and useful for us in Nigeria.

For the second video, every Nigerian can see themselves in Emmanuella and Mark Angel. These are young self-made comedians who make fun out of our everyday activities. Their message usually is to show the fun side of life. They have grown to become the most subscribed channel on YouTube in Africa and they proved their worth in the ad.

Videos by Wikimedia Foundation, in collaboration with Wikimedia Nigeria User Group and Anakle, CC BY-SA 4.0. You can view them on Commons (1, 2) or Vimeo (1, 2).


What’s next for Wikimedians in Nigeria?

Eyitayo: I look forward to commencing my ‘women who wiki’ project soon. As a major player in the female tech space in Nigeria, this project will enable me to bring more women into the user group community—especially in Lagos. It will also help keep up with organizing the user groups, especially getting meetup venues, as that poses a major challenge to our monthly meetups.

Kayode: I am looking forward to taking Wikipedia to high schools. Young people have a lot of questions about our culture and tradition, they learn daily, and usually they do not know they can put together this knowledge so other people can make use of it. I believe that reaching out to young people will give them the opportunity to create a lot of local content.

Sam: I’ll be leading the Wiki Loves Africa contest in Nigeria, which will commence in October. That’s currently what I’m looking forward to. This year’s contest will be focusing on professions in Africa.

Zack McCune, Global Audience Communications Manager, Communications
Wikimedia Foundation

Interested in the Wikimedia Foundation’s New Readers initiative? Take a look at its Meta-Wiki landing page, and read about it on this blog.

by Zachary McCune at September 21, 2017 04:46 PM

Amir E. Aharoni

The Curious Problem of Belarusian and Igbo in Twitter and Bing Translation

Twitter sometimes offers machine translation for tweets that are not written in the language that I chose in my preferences. Usually I have Hebrew chosen, but for writing this post I temporarily switched to English.

Here’s an example where it works pretty well. I see a tweet written in French, and a little “Translate from French” link:

Emmanuel Macron on Twitter.png

The translation is not perfect English, but it’s good enough; I never expect machine translation to have perfect grammar, vocabulary, and word order.

Now, out of curiosity I happen to follow a lot of people and organizations who tweet in the Belarusian language. It’s the official language of the country of Belarus, and it’s very closely related to Russian and Ukrainian. All three languages have similar grammar and share a lot of basic vocabulary, and all are written in the Cyrillic alphabet. However, the actual spelling rules are very different in each of them, and they use slightly different variants of Cyrillic: only Russian uses the letter ⟨ъ⟩; only Belarusian uses ⟨ў⟩; only Ukrainian uses ⟨є⟩.

Despite this, Bing gets totally confused when it sees tweets in the Belarusian language. Here’s an example from the Euroradio account:

Еўрарадыё euroradio Twitter double.png

Both tweets are written in Belarusian. Both of them have the letter ⟨ў⟩, which is used only in Belarusian, and never in Ukrainian or Russian. The letter ⟨ў⟩ is also used in Uzbek, but Uzbek never uses the letter ⟨і⟩. If a text uses both ⟨ў⟩ and ⟨і⟩, you can be certain that it’s written in Belarusian.
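
The distinctive-letter heuristic described here is simple enough to sketch in a few lines of Python. This is a toy guesser, not a production detector, and it only covers the handful of letters the post mentions:

```python
# Toy Cyrillic language guesser based on distinctive letters:
# ⟨ў⟩ appears in Belarusian (and Uzbek Cyrillic) but never in Russian
# or Ukrainian; ⟨і⟩ appears in Belarusian and Ukrainian but never in
# Uzbek; among the languages discussed, ⟨ъ⟩ is Russian-only and ⟨є⟩
# is Ukrainian-only.
def guess_cyrillic_language(text):
    t = text.lower()
    if "ў" in t:
        # ⟨ў⟩ plus ⟨і⟩ rules out Uzbek, leaving only Belarusian.
        return "Belarusian" if "і" in t else "Belarusian or Uzbek"
    if "є" in t:
        return "Ukrainian"
    if "ъ" in t:
        return "Russian"
    return "undetermined"

print(guess_cyrillic_language("Еўрарадыё паведаміла"))  # Belarusian
```

A real detector would of course combine many such signals statistically, but even this crude check would have been enough to keep Bing from labelling these tweets as Ukrainian or Russian.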

And yet, Twitter’s machine translation suggests to translate the top tweet from Ukrainian, and the bottom one from Russian!

An even stranger thing happens when you actually try to translate it:

Еўрарадыё euroradio Twitter single Russian.png

Notice two weird things here:

  1. After clicking, “Ukrainian” turned into “Russian”!
  2. Since the text is actually written in Belarusian, trying to translate it as if it were Russian is futile. The actual output is mostly a transliteration of the Belarusian text, and it’s completely useless; notice that the letter ⟨ў⟩ cannot even be transliterated.

Something similar happens with the Igbo language, spoken by more than 20 million people in Nigeria and other places in Western Africa:

Screenshot: tweets by Ntụ Agbasa (blossomozurumba) on Twitter.

This is written in Igbo by Blossom Ozurumba, a Nigerian Wikipedia editor, whom I have the pleasure of knowing in real life. Twitter identifies this as Vietnamese—a language of South-East Asia.

The reason for this might be that both Vietnamese and Igbo happen to be written in the Latin alphabet with the addition of diacritical marks, one of the most common of which is the dot below, as in the word ibụọla in this Igbo tweet and the words chọn lọc in Vietnamese. Other than this incidental and superficial similarity, however, the languages are completely unrelated, and identifying a text’s language by this feature alone is unreliable.

If I paste the text of the tweet, “Nwoke ọma, ibụọla chi?”, into translate.bing.com, it is auto-identified as Italian, probably because it includes the word chi, and a word written identically happens to be very common in Italian. Of course, Bing fails to translate everything else in the tweet, but this does show a curious thing: even though the same translation engine is used on both sites, the language of the same text is identified differently.

How could this be resolved?

Neither Belarusian nor Igbo is supported by Bing. If Bing is the only machine translation engine that Twitter can use, it would be better to skip these languages entirely and offer no translation at all than to offer something this strange and meaningless. Of course, Bing could start supporting Belarusian; it has a smaller online presence than Russian and Ukrainian, but their grammar is so similar that it shouldn’t be that hard. But what to do until that happens?

In Wikipedia’s Content Translation, we don’t give exclusivity to any machine translation backend, and we provide whatever we can, legally and technically. At the moment we have Apertium, Yandex, and YouDao, for the languages they support, and we may connect to more machine translation services in the future. In theory, Twitter could do the same and use another machine translation service that does support the Belarusian language, such as Yandex, Google, or Apertium, which started supporting Belarusian recently. This may be more a matter of legal and business decisions than a matter of engineering.

Another thing for Twitter to try is to let users specify which languages they write in. Currently, Twitter’s preferences only allow selecting one language, the language in which Twitter’s own user interface appears. It could also let users state explicitly which languages they write in. This would make language identification easier for machine translation engines. It would also make some business sense, because it would be useful for researchers and marketers. Of course, it must not be mandatory, because people may want to avoid providing too much identifying information.

If Twitter or Bing Translation were free software projects with a public bug tracking system, I’d post this as a bug report. Given that they aren’t, I can only hope that somebody from Twitter or Microsoft will read it and fix these issues some day. Machine translation can be useful, and in fact Bing often surprises me with the quality of its translation, but it has silly bugs, too.

Filed under: Belarusian, Free Software, Igbo, Microsoft, Nigeria, Russian, search, translation, Twitter, Ukraine, Wikipedia

by aharoni at September 21, 2017 09:39 AM

September 20, 2017

Wikimedia Tech Blog

Bashkir becomes the first language collated inside MediaWiki

Photo by Visem, CC BY-SA 4.0.

Have you ever heard of Bashkortostan?

It’s a region of Russia, about 1,000 miles from Moscow. Few people outside of Russia have heard of it, but inside Russia it’s quite well known for its traditional honey and kumis industries and for its many rivers, forests, and mountains, which draw tourists.

The region’s name comes from the Bashkirs—a distinct ethnic group that lives there and speaks its own language, which belongs to the Turkic family. Twelve years ago, the first article was written in the Wikipedia in that language. Today its community of editors is among the most active Wikipedia communities in the languages of Russia.

That community recently asked the MediaWiki software developers to solve a technical problem for them: Category collation in the Bashkir alphabet. Put simply, “collation” is the process of sorting words according to the alphabet. It’s not as simple as it may sound, and it works slightly differently in every language.

Photo by Amir Aharoni, public domain.

Bashkir is written in the Cyrillic alphabet, like Russian, but with several additional letters for Bashkir-specific sounds. These letters have their places all along the alphabet, but MediaWiki sorted all of them incorrectly. For example, in the “Capitals of republics of Russia” category, the entry for Ufa (Өфө), Bashkortostan’s capital, appeared at the end of the list, even though it belongs in the middle.
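The underlying fix is conceptually simple: sort by each letter’s position in the Bashkir alphabet rather than by Unicode code point. A minimal Python sketch of the idea follows; the real fix lives in MediaWiki’s PHP code and handles far more edge cases, and the alphabet string here is a simplification.

```python
# A minimal sketch of alphabet-based collation (not the actual MediaWiki
# implementation): sort by each letter's position in the Bashkir alphabet
# instead of by Unicode code point. Simplified: single-level collation,
# no handling of mixed-script text.

BASHKIR_ALPHABET = "абвгғдҙеёжзийкҡлмнңоөпрсҫтуүфхһцчшщъыьэәюя"
ORDER = {ch: i for i, ch in enumerate(BASHKIR_ALPHABET)}

def bashkir_key(word: str):
    # Characters outside the alphabet sort after all of its letters.
    return [ORDER.get(ch, len(BASHKIR_ALPHABET)) for ch in word.lower()]

cities = ["Учалы", "Өфө", "Сибай"]
print(sorted(cities))                   # code-point order wrongly puts Өфө last
print(sorted(cities, key=bashkir_key))  # alphabet order puts Өфө first
```

The code-point sort misplaces Өфө exactly as the category page did, because the extra Cyrillic letters live in a separate Unicode block; the custom key restores the order a Bashkir reader expects.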

To sort category entries, MediaWiki relies on an external library called ICU—International Components for Unicode—to apply collation for different languages. ICU has collation data for many languages, but not all, and Bashkir is one of the missing ones.

I submitted a request to add the language to ICU, but getting a new language into it can take many months, if not years. We could have just waited for that to happen, but then our colleague Brian Wolff wrote some brilliant code that resolves the issue inside MediaWiki’s own code, making it unnecessary to wait for ICU to be updated.

When the fix was ready, I got it deployed and tested on the Bashkir Wikipedia. And when this started working, the Bashkir Wikipedians were so happy about it that the biggest Bashkir newspaper, simply called Bashkortostan, got interested, and published a story about it.

And yes, it mentions Brian Wolff. Search for “Брайан Вулфф”. (Bashkir is not supported by Google Translate, but it is supported by Yandex.Translate. Machine translation is never perfect, but if you’re curious, you can try using it to get an idea of what the article says.)

Bashkir is the first language for which complete collation is implemented inside of MediaWiki. I am already starting to hear requests to do something like this for other languages, and thanks to Brian’s work it will now be much easier. The fact that Bashkir was the first one shows how an active editing community which cares about its language can get things to happen.

We are doing amazing things that affect the world in ways we don’t even imagine!

Amir Aharoni, Wikimedian

Editor’s note: While Amir is an employee of the Wikimedia Foundation, this post is written in a volunteer capacity.

by Amir E. Aharoni at September 20, 2017 04:54 PM

September 19, 2017

Wikimedia Tech Blog

Admittedly loopy but not entirely absurd—Understanding our Search Relevance Survey

Photo by Albin Olsson, CC BY-SA 3.0.

The Friday before last, Sue Gardner—the former head of this organization, of all people—discovered a survey-based experiment we were running on Wikipedia. She took a screenshot and shared it on Twitter, where she wrote that “the whole thing seems loopy [and] absurd.” The survey asked if the Wikipedia article she was reading, Despatch box, would be relevant “when searching for ‘what does the chancellor of the Exchequer keep in his red box’?”

While admittedly loopy, the question is not entirely absurd.

The Wikimedia Foundation’s Search Platform team hopes this survey question and others like it will allow us to gather useful data on the relevance of search results, which in turn will help improve the quality of search on Wikipedia—and importantly, not just on English Wikipedia.

The particular question Sue tweeted about features one of my favorite queries from the survey, because I didn’t know anything about the subject of despatch boxes before reading the query. There are probably only a couple of pages on English Wikipedia that are really relevant to the query, but we’re asking variations of this question about the Chancellor of the Exchequer on a few dozen pages.

Why? Machine learning!

Search, in two not-so-easy steps

While it’s oversimplifying a bit, you can think of searching for articles as having two major steps—first, find articles that contain words from the query, and second, rank those articles, ideally so that the most relevant result is listed at the top. The first step has some subtlety to it, but it is comparatively straightforward.

The second step is more complex because there is not a clear recipe to follow, but rather a lot of little pieces of evidence to consider. Some bits of evidence include:

  • How common each individual word is overall. (As the most common word in the English language, the is probably less important to a query than the somewhat rarer friggatriskaidekaphobia.)
  • How frequently each word appears in a given article. (Five matches is probably better than four matches, right?)
  • Whether there are any matching words in the title or a redirect. (Well, if you search for just the, an article on English’s definite article does look pretty good.)
  • How close the words are to each other in the article or title. (Why is there a band called “The The”?)
  • Whether words match exactly or as related forms. (The query term resume also matches both resuming and résumé.)
  • How many words are in the query.
  • How many words are in the article. (Okay, maybe five matches in a twenty thousand word article might be worse than four matches in a five hundred word article.)
  • How often an article gets read. (Popular articles are probably better.)
  • How many other articles link to an article. (Did I mention that popular articles are probably better?)

…and lots more!

Not only do you have to figure out how to weight these different pieces of evidence, you also have to figure out how best to measure them. Should popularity be measured over the last hour, week, or decade? A word that appears in three articles is rarer than a word that appears in thirty articles, but is a word that appears in 5,178,346 articles really that much rarer than a word that appears in 5,405,616 articles? (That would be the numbers for “of” and “the” on English Wikipedia at the time of this writing, but the numbers will likely go up before you get to the end of this sentence. Wikipedians are very industrious!)
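One way to see why hand-tuning gets out of hand is to combine just a few of these signals into one formula. Everything below, the feature set, the weights, and the shape of the formula, is invented for illustration and is not the actual scoring used on Wikipedia:

```python
# Invented for illustration: one way to combine a few of the signals
# above into a single hand-tuned score. The features, weights, and
# formula shape are all made up.
import math

def score(doc_freq, total_docs, term_freq, doc_len, title_match, popularity):
    idf = math.log(total_docs / (1 + doc_freq))   # rarer words count for more
    tf = term_freq / max(doc_len, 1)              # matches, normalized by length
    return 2.0 * idf * tf + 5.0 * title_match + 0.1 * math.log(1 + popularity)

# A title match easily outweighs a slightly better body match:
print(score(10, 1000, 3, 100, 1, 50) > score(10, 1000, 4, 100, 0, 50))  # True
```

Every constant here is a judgment call, and nudging one weight to fix a query can quietly regress many others.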

At some point, manually tweaking the scoring formulas becomes too complex to be effective, and it becomes a never-ending game of whack-a-mole, with a fix over here causing a problem over there. There’s also the problem of applying any scoring formula to different projects or to projects in different languages, where the built-in estimates of the relative importance of any of the features in the scoring formula may not quite hold.

Machine learning to the rescue!

The solution—in just one oversimplified step—is to automate the process of determining the scoring function. That is, to use machine learning. Erik Bernhardson—a senior software engineer at the Foundation and technical lead for the search back-end team—got good results from his initial experiments and he, along with our colleague David Causse, and others outside the Foundation have been busy building a machine learning pipeline for search.

The problem is that machine learning needs data, lots of data—preferably all the data you can get your hands on, and then a bunch more data. And maybe a little extra data on top. Seriously though, more data gives the machine learning training process the evidence it needs to build a more nuanced, and thus more accurate model.

Importantly, a machine learning model needs to be trained and evaluated on both good examples and bad examples, otherwise it can learn some screwy things. Without bad examples, the machine learning process can’t know that tweaking something over here to get 5 more right answers also makes 942 wrong answers pop up over there. It needs to be able to see all the metaphorical rodents—uh, I mean, talpids—at once to figure out the best way to whack as many as possible.
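To make that concrete, here is a toy pointwise learner: a tiny logistic regression trained by plain gradient descent on invented feature vectors for relevant and irrelevant (query, article) pairs. None of this is the Foundation’s actual pipeline; it only illustrates the shape of the idea and why the bad examples matter, since without them the gradient has nothing to push scores down against.

```python
# A toy pointwise learning-to-rank sketch: instead of hand-tuning weights,
# learn them with logistic regression and plain gradient descent. The
# feature vectors (idf*tf, title_match, log_popularity) and labels are invented.
import math

examples = [
    ((0.9, 1.0, 2.0), 1),   # relevant results...
    ((0.7, 0.0, 3.0), 1),
    ((0.1, 0.0, 0.5), 0),   # ...and, crucially, irrelevant ones
    ((0.2, 0.0, 0.8), 0),
]

w = [0.0, 0.0, 0.0, 0.0]    # three feature weights plus a bias term

def relevance(x):
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))

lr = 0.5
for _ in range(2000):
    for x, y in examples:
        p = 1 / (1 + math.exp(-relevance(x)))   # predicted P(relevant)
        w = [wi + lr * (y - p) * xi for wi, xi in zip(w, list(x) + [1.0])]

# After training, relevant examples outscore irrelevant ones.
```

Without the negative rows the learner would happily score everything high; the bad examples are what teach it where the boundary is.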

Let’s make some noise data!

We tried generating human-curated training data using a cool tool called the Discernatron, built by Erik and subsequently prettified for use by humans by Jan Drewniak. The problem is that looking at other people’s queries is both very hard and very boring.

The hardest part is trying to figure out what people intended when they searched. A query like naval flags is so broad that it is hard to say what the most relevant article is.[1] A query like when was it found just doesn’t seem to refer to anything in particular—though it might be the name of a book or a play or an episode of an obscure TV show or a song by a band you never heard of (yeah, we’re search hipsters). Another favorite, tayps of wlding difats, is too hard to decipher for many people, but might be possible for a phonetic search to find—it seems to be a phonetic rendering of “types of welding defects”. But even if you do understand a given query, you may not know enough about the subject matter to judge the relevance of any articles.

There’s also the problem of noise in the Discernatron data. Even assuming good faith, you would expect people to disagree about what is relevant and what is not. Some queries are harder to decipher than others, and some people are also better at doing the deciphering. Unfortunately, that means that progress with the Discernatron was even slower than we’d hoped because we needed to get multiple reviews of the same data and come up with a way to determine whether different reviewers were in close enough agreement that their collective ratings constitute a consensus. We never got enough data for training, though we did build a nice data set for validating other approaches against.

Erik, never one to be deterred, started looking for ways to automate the data collection, too. Dynamic Bayesian networks (DBN)—in addition to being in the running for having the article with one of the w-i-d-e-s-t illustrations on the English Wikipedia—are the foundation for an approach that aggregates click-through data from similar searches to infer users’ collective relative preferences among search results. A good example is the TV series The Night Manager. Because the book that the series is based on is an exact title match when querying the night manager, it gets ranked above the TV show. But the DBN model made it abundantly clear that recently, at least, people were much more interested in the TV show.
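A real DBN also models how likely each position was even examined before a click, but stripped of that, the core idea of pooling click-through logs into relative preferences can be sketched very simply. The log data below is invented:

```python
# A much-simplified stand-in for the DBN idea: pool click logs for one
# query and rank results by click share. A real DBN additionally models
# examination probability by position; the log data here is invented.
from collections import Counter

logs = [
    # (query, results shown, result clicked)
    ("the night manager", ["book", "tv series"], "tv series"),
    ("the night manager", ["book", "tv series"], "tv series"),
    ("the night manager", ["book", "tv series"], "book"),
]

clicks, shown = Counter(), Counter()
for _query, results, clicked in logs:
    for r in results:
        shown[r] += 1
    clicks[clicked] += 1

preference = {r: clicks[r] / shown[r] for r in shown}
ranked = sorted(preference, key=preference.get, reverse=True)
print(ranked)  # the TV series wins despite the book's exact-title match
```

Aggregated over enough similar searches, these click shares become training labels for the ranking model.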

Dropping the from the query, the night manager, eliminates the exact title match and changes the rankings just a bit. Screenshots from the English Wikipedia, CC BY-SA 3.0 and/or GNU Free Documentation License.

However, there’s still a problem with the DBN method. It relies on multiple users looking for more or less the same thing to draw its inferences. Not only do we want to know what the most relevant result is for a given query, but also the relative relevance for as many of the other results as possible. For the night manager, the results were overwhelming—everyone wanted the TV series and nothing else—but the ranking of most results is more nuanced: there may be a clear favorite, but also relevant second and third place results, and maybe even relevant fourth and fifth place results. You can’t really get that info from a single user.

Another issue is a concern that maybe there’s a result down in twenty-seventh place that may not be the best of the best, but it’s still what some people are looking for. No one looks at the twenty-seventh result, except those of you who are going to read that and then go look at some random twenty-seventh result just to be contrarian. (Welcome; you are my people.)

Improve the best, but don’t forget the rest

So, we’re never going to get enough clicks to create DBN-based training data covering the long tail of less common queries, and we’re never going to get any clicks to help us find good results that are hiding too far down in the results list.

Nonetheless, the DBN-based models did as well as our current result-ranking scoring methods in recent tests: you can check out the report by data analyst Mikhail Popov. Searchers clicked more overall and clicked more on the first result, so we’re doing something right!

The machine learning models also offer more room for improvement. Future improvements will come from a feedback loop as the machine learning models bring some results up from lower down the list (based at first on the features they exhibit, not the clicks they received), and from new features we add to help identify and/or distinguish more promising results.

Still, we worry about the long tail. If ten thousand people don’t get any good results for one particular query, we’ll hear about it—or one of them will write a new article or create a redirect and solve the problem that way. (Industrious!) But if ten thousand people each fail to get any good results for ten thousand separate queries, we would likely never know, at least not about all of them.

Maybe it’ll turn out that queries and results in the long tail are just like the more common queries we see in the DBN-based training data, merely less popular. In that case, the generalizations the machine learning model makes based on the DBN-based data—exact title matches are always good; it’s okay for most of the query words to be “common” when the query is over 37 words long, but not when it is less than 4 words long; more inexact matches in the title plus lots of exact matches in the first paragraph are better than fewer exact matches in the title and no matches in the first paragraph; “Omphalos hypothesis” should always be the seventh result for every query on Thursdays[2]—will also apply to the long tail. But maybe the long tail is qualitatively different, too.

How the sausage is made

Erik, who is always full of good ideas, decided we could solve our lack of long-tail data by turning the Discernatron inside out. Instead of taking a query and asking users to review a bunch of articles that might be relevant, what if we take an article and ask a user to review a query that might be relevant? Hence the survey seen by Sue!

The survey has been simplified from the Discernatron in many ways. The only rating options are “relevant”, “not relevant”, and “get me out of here”. All of the instructions have been boiled down to the one question (though we are also running A/B/C/D/E/F/G/H/I/J/K/etc. sub-tests to find the best wording for the question). And we hope that by asking someone about an article they’ve been reading for a little while, some of the “real world knowledge” problems will be mitigated.

We’ve already completed a very limited round of surveys to see if people agreed with me (personally!) on the relevance of a few articles for a few queries that I made up—just to make sure the survey-based crowd-sourcing approach works. Mikhail’s analysis shows that (a) a small group of random strangers on the internet were able to roughly infer the intent of my queries and the relevance of possible results, and (b) they like to tell you that you are wrong more than they like to tell you that you are right—irrelevant articles get more engagement than relevant articles. (Though maybe people are being careful; sometimes it is easier to see that things are wildly wrong than possibly right.)

The current round of surveys is testing much more broadly, using the not-quite-so-small corpus of queries and consensus-ranked articles we have from the Discernatron data. Re-using the Discernatron data—much of which is long-tail queries—lets us validate our crowd-sourced survey-based approach by comparing the survey relevance rankings against the Discernatron relevance rankings.

End game

If the survey data and the Discernatron data look similar enough, we can confidently expand the survey to new queries and new articles. Ideally, we can generate enough long-tail data this way to fold it into the DBN-based data used for training new machine learning models, probably with some weighting tricks since it will never match the scope of the DBN-based click-through data. But even if we can’t, we can still use the survey-based data to validate the performance of the DBN-based models on long-tail data.

Once we’re reasonably confident that our survey method works for English, it will be much easier to expand to other languages and other projects. While it will take some careful work to translate the survey question(s) and even more to vet potential queries (they have to be carefully screened by multiple people to make sure no personal information is inadvertently revealed), it’s much less work than has been put into building the Discernatron corpus in English, and for much more reward.

And, of course, our survey needs to include both good examples and bad examples—even the ones that seem more than a little loopy—in part because we need both loopy and non-loopy data for training and validation, and in part because if we could tell loopy and non-loopy apart, we wouldn’t need help from the Wikipedia community to find out what the Chancellor of the Exchequer has in his red box!

Parting thoughts

  • In the spirit of Wikipedia and Wiktionary (my favorite!): Be bold! Try new ideas, new approaches, new features—test them!—and don’t fret if they don’t all work out on the first go, because…
  • Machine learning generally and feature engineering specifically is a black art—and doubly so for search because there is no one right answer to work towards. Whatever was mentioned in that cool paper you read or came as the default settings in the open-source software you downloaded is not likely to be exactly what’s best for you and the data you love. Optimize, optimize, hyperoptimize!
  • Whether you are using manual tuning or machine learning to improve search, good intuition and specific examples are very useful, but proper testing—and data that reflects what your users want[3] and need—is vital. Otherwise you are playing whack-a-mole in the dark.
  • Communication is hard. The Discernatron offers two pages of documentation, but—like all documentation—nobody actually reads it. Our new survey has at most a couple dozen words in which to ask a question and motivate the reader to participate. Getting your point across without being boring or coming off as absurd is a difficult balance.

On Phabricator you can follow the not entirely absurd progress of the machine learning pipeline in general (see task T161632), and the search relevance surveys in particular (see task T171740). You can join in the discussion with questions and suggestions, too!

Trey Jones, Senior Software Engineer, Search Platform
Wikimedia Foundation


  1. More than one person has wisely suggested that the maritime flag article is the obvious answer here. Statistically, though, this was one of the Discernatron queries that reviewers disagreed on most.
  2. These are all made up and nothing as specific as a rule about a particular article is going to be learned by a machine learning model. We also don’t include the day of the week as a feature in the model—though given the differences in user behavior on weekdays and weekends, maybe one day we will (last Thursday, I predict)!
  3. Because—to paraphrase J. B. S. Haldane—what people search for is not only stranger than we imagine, it is stranger than we can imagine.

by Trey Jones at September 19, 2017 05:12 PM

Wiki Education Foundation

Collaborating with the library to increase women on Wikipedia

Tamar Carroll teaches in the Department of History at Rochester Institute of Technology. She’s incorporated Wikipedia editing in several courses.

Several years ago, after reading about other instructors’ experiences teaching with Wikipedia in the volume Writing History in the Digital Age, I was inspired to assign a research project in which students either write a new Wikipedia entry on a notable American woman or substantially edit an existing one. Women are under-represented on Wikipedia both as editors and as subjects of entries, and because I work at a technical institute with a heavily male student body, this seemed like a perfect fit for my U.S. women’s and gender history survey course. After all, we can not only investigate why women’s history is missing, but also work to correct the gap.

Indeed, students have reported a great deal of satisfaction in sharing information about the lives of women who have inspired them with the world, via Wikipedia. One student, who took the course back in Fall 2013 and researched and wrote about Mary Stafford Anthony, sister of the suffrage leader Susan B. Anthony, emailed me recently to tell me that the assignment had been the most meaningful of all the work she had completed as a student at RIT. My students frequently exceed the assignment requirements of adding three substantial paragraphs and five new sources, getting genuinely excited about the contributions they are making to Wikipedia. Along the way, they learn to think critically about how knowledge is constructed and gets circulated, and how to find sources that are both reliable and verifiable.

The success of this assignment rests largely upon my collaboration with Lara Nicosia, the RIT librarian for the College of Liberal Arts, who has enthusiastically and expertly provided professional and technical support for the class. We spend three full class sessions with Lara. In the first session, Lara introduces the students to the concept of encyclopedias in general and Wikipedia in particular, outlining the rules of evidence employed by Wikipedia and its editorial process. Importantly, Lara incorporates hands-on work from the start, getting them used to editing Wikipedia in their sandboxes.

In the second library session, Lara demonstrates the library databases and other resources most useful to the students in conducting biographical research on American women, and the students must apply what they have learned by creating or adding sources to a “Further reading” section of an existing entry. At this point, students are free to choose their own subject to write about, or to consult with Lara and me for suggestions. They spend the next two months conducting research and writing up their findings, turning in bibliographies and a draft to me and doing peer editing in class. Every year, several students also consult one-on-one with Lara for assistance in the research process. We return to the library’s computer lab about two-thirds of the way through the semester, when the students make their entries “live” on Wikipedia, and greatly benefit from Lara’s editing experience in resolving any snafus in the process.

An unanticipated but beneficial result of our collaboration has been Lara’s successful advocacy for adding subscription-based resources like American National Biography and Women and Social Movements in the United States, 1600-2000 to RIT’s library. Being able to show that my classes are using these resources has helped Lara make the case for funding these valuable additions to our library, which benefit scholarship and teaching at RIT more broadly while raising the profile of the humanities on campus. Lara has also integrated Wikipedia programming into the library at RIT, hosting regular events with the goal of building a community of Wikipedia contributors on RIT’s campus.

“From a librarian’s standpoint, everybody complains about Wikipedia and all the problems – then fix it!” Lara says. “That’s the power of the tool. You have that power. People who have access to high quality information and research skills have a social obligation…. to give back by improving Wikipedia. It falls on us to make it the best it can be.”

Incorporating Wikipedia into our 100-level coursework and library programming here at Rochester Institute of Technology has helped to move our students from being passive consumers of news and information to active participants in the evaluation and dissemination of knowledge online, while working to reduce Wikipedia’s gender bias.

Image: Women on Wikipedia Edit-a-thon 2017 (RIT), by Colalibrarian, CC BY-SA 4.0, via Wikimedia Commons.

by Guest Contributor at September 19, 2017 05:03 PM

Wikimedia Foundation

Odisha becomes first state government in India to release its social media under a free license

Photo by the Government of Odisha, CC BY 4.0.

The government of Odisha has become the first state entity in India to release all of its social media posts under a free Creative Commons license, allowing people from around the world to freely re-use the government’s content in projects like Wikipedia.

The pilot project, which covers eight of the state’s accounts on Facebook, Twitter, YouTube, and Instagram,[1] releases a veritable treasure trove of public-interest photos and media. This has already had an impact; on Wikipedia, for instance, volunteers have added government images to articles about Rathajatra, Konark Sun Temple, and others.

“This content release should serve as a model for governments around the world,” said Asaf Bartov, Senior Program Officer at the Wikimedia Foundation. “Releasing the Odisha government’s content under a free license will allow anyone to use, share, and build upon their work.”

In general, you may not re-use a copyrighted work unless you are given permission, such as under a Creative Commons license. Until 14 September, that lack of permission covered anything shared by the Odisha government on social media. This restriction means it can be difficult to share content on Wikipedia—one of the most popular websites in the world, committed to free and open copyright licenses from its earliest days on the internet—and elsewhere, even content created and shared by government bodies for public use and consumption.

This limitation has significant real-world impact, but in Odisha, a state in eastern India, this is quickly changing. Earlier this year, the Odia language Wikipedia community collaborated with the Odisha state’s Youth and Sports Services department to relicense the 2017 Asian Athletics Championships’ website. By doing so, nearly 350 images were added to articles in 35 different language Wikipedias.

Similarly, the Odisha government’s most recent decision to re-license its social media content under the Creative Commons Attribution 4.0 license, abbreviated as CC BY 4.0, will help illustrate a number of topics.

The Odia Wikipedia community would like to thank Manoj Kumar Mishra, Officer on Special Duty to the Chief Minister’s Office, for understanding the value of open content and implementing the change so quickly. The government’s decision, on 14 September 2017, came less than 24 hours after volunteer editors of the Odia language Wikipedia met with Mr. Mishra.

“The present government, under the leadership of Sri Naveen Patnaik, has been focusing on Transparency, Technology and Teamwork—a 3T bulwark to deliver better governance to citizens more efficiently,” the Chief Minister’s office says. “When the Wikipedia volunteers met with us, we had no hesitation to partner and share our content with one of the largest information hubs in the web space. We firmly believe that information is power, and that power must be vested with people. Government programs must reach to people, and knowledge through information can solve many last mile issues. We look forward to stronger information partnerships in the future.”

If you would like to license your work under a Creative Commons license, please visit their website for more information. (To be compatible with Wikipedia and Wikimedia Commons, a free media file repository that hosts many of the images used on Wikipedia, you must allow adaptations and commercial uses.)

Sailesh Patnaik and Mrutyunjaya Kar, Odia Wikipedia community volunteers


  1. The Odisha state government is also planning to re-license its official websites.

by Sailesh Patnaik and Mrutyunjaya Kar at September 19, 2017 12:00 AM

September 18, 2017

Wikimedia Foundation

Wikimedia Foundation signs amicus brief challenging U.S. travel and immigration restrictions at Supreme Court

Photo by MattWade, CC BY-SA 3.0.

Today, the Wikimedia Foundation joined a coalition of companies and organizations in filing an amicus curiae brief with the United States Supreme Court opposing an executive order that places restrictions on travel and immigration to the U.S. based upon national origin. Legal challenges to the order have been ongoing since it was issued earlier this year. Following injunctions against the order from the Courts of Appeals for the Fourth and Ninth Circuits, the U.S. government petitioned the Supreme Court to review Trump v. International Refugee Assistance Project and Trump v. State of Hawaii. In opposing the order with this brief, the Foundation is joined by over 150 others, including Mozilla, Mapbox, and GitHub. The brief details how the order will disrupt the international operations of the Foundation and other signatories. Additionally, it explains how the order violates fundamental principles of U.S. law.

At the Wikimedia Foundation, we understand that knowledge knows no borders. Our mission and purpose is to support the members of the Wikimedia communities who collect and disseminate information worldwide through the Wikimedia projects. By definition, this mission is global, and so is our organization. If we are unable to collaborate with each other across borders, our achievement of that mission is threatened. That is why we joined previous amicus briefs challenging this order, and will continue to oppose efforts to stifle international travel and collaboration.

Wikipedia and the other Wikimedia projects are the culmination of the efforts of thousands of volunteers whose work and perspectives are informed by their unique languages, histories and traditions. The Wikimedia movement consists of contributors, Foundation staff and contractors, board members, chapters, affiliates, and user groups in every corner of the globe. Constraints on international travel therefore present a serious threat to our collective work.

For more information about the importance of these issues, please see our February 6, 2017 and March 15, 2017 blog posts about previous amicus briefs opposing these restrictions, as well as the January 30, 2017 statement by our Executive Director Katherine Maher, which discusses our philosophy of making free knowledge globally available.

Restrictions on international travel and immigration not only go against the operational interests of the Wikimedia Foundation, but are also in opposition to the open collaboration that is crucial to the success of the projects and the sharing of free knowledge worldwide. We encourage the Court to affirm the judgments of the courts below.

Stephen LaPorte, Legal Director
Wikimedia Foundation

Special thanks to the law firm Mayer Brown for drafting the brief, to the other signatories of the brief for their collaboration and support in this matter, and to the Wikimedia Foundation Communications, Legal, Talent and Culture, and Travel teams for their work since the initial order was first issued.

by Stephen LaPorte at September 18, 2017 07:59 PM

Wiki Education Foundation

Roundup: National Hispanic Heritage Month

September 15 marks the beginning of National Hispanic Heritage Month, an annual event that celebrates not only the contributions of Hispanic and Latinx Americans, but also the histories and cultures of those communities and their ancestors from Spain, Mexico, the Caribbean, and Central and South America. The observance lasts until the 15th of the following month and is mirrored by similar events in other countries. While the choice of September 15 may seem arbitrary, the date was chosen carefully: it marks the anniversary of Central America's 1821 declaration of independence from Spain, and it falls close to the independence days of Mexico, Chile, and Belize. As National Hispanic Heritage Month approaches, it seems appropriate to showcase articles that Princeton University students in Rosina Lozano's Latino History class created as part of their coursework.

Students edited the article on the term Latinx, a gender-neutral way to refer to people with ties to Latin America or of Latin American descent. Meant to be more gender inclusive, the term has slowly grown in usage and awareness over time. The students also improved biographies of notable Latinx people.

One of the notable Latinx Americans that students wrote about was Arcadia Bandini de Stearns Baker, a wealthy landowner in Los Angeles. She was also a Californio, a term for a person of Spanish or Castilian ancestry born in what is now California while the area was controlled by Spain or Mexico. As Arcadia Bandini was part of the Californio elite and her family's wealth was considerable, she was an extremely attractive prospect for anyone looking to join, or form alliances within, elite Californio society. Make no mistake, Arcadia Bandini was a force to be reckoned with. She not only ruled Los Angeles society for a period of time, but was also an active businesswoman who used portions of her wealth to benefit those around her. If you've ever enjoyed Palisades Park in California, you have Arcadia Bandini to thank, as she donated the land to Santa Monica, among other charitable acts. When she died in 1912, approximately 2,000 people came to her funeral to honor her, a sign of how important she was and how many lives she touched.

Students also created an article for Stanford professor Aurelio Macedonio Espinosa Sr., a scholar of Spanish and Spanish American folklore and philology and a staunch promoter of the study of the Spanish language and literature. They also created an article about the female members of the Young Lords, a Puerto Rican nationalist group that operated in the United States. The Young Lords began as a turf gang before their president reorganized them into a national civil and human rights movement. Women participated regularly and heavily in the movement, yet they were often passed over for key leadership positions. Even after the dissolution of the Young Lords, its female members frequently do not receive the same recognition as male members in work documenting the movement.

Want to help share knowledge with the world? Contact Wiki Education at contact@wikiedu.org to find out how you can gain access to tools, online trainings, and printed materials to help your class edit Wikipedia.

Image: District celebrates Hispanic Heritage Month (10601934644), by U.S. Army Corps of Engineers, CC BY 2.0, via Wikimedia Commons.

by Shalor Toncray at September 18, 2017 04:06 PM

Wikimedia Foundation

Wiki Loves Archives: Citizen participation helps rescue damaged archival resources

Digitization of the Louis-Roger Lafleur Fonds. Photo by Lea-Kim, CC BY-SA 4.0.

Many archive centers suffer from budget cuts and must make the difficult choice of selecting certain collections to preserve to the detriment of others. Over the last few years, Wikimedia Canada has been involved in a project to create Wikipedias in Aboriginal languages. It was with this in mind that Bibliothèque et Archives nationales du Québec (BAnQ) called on the participation of Montrealers to preserve photographic archives by organizing their first scan-a-thon.

Unknown Aboriginal child, from an undescribed archival image. Photo by Louis-Roger Lafleur, public domain.

Frédéric Giuliano, coordinator-archivist, transferred four archive fonds, also known as collections in some countries, to BAnQ Vieux-Montréal from three BAnQ centers:

These archives illustrate the lives of the Cree and Algonquins of the Abitibi, the Innu from the Côte-Nord region, and the Atikamekw from the Mauricie region. As these archives are quite old, the negatives have become oxidized; several images were already permanently damaged. As such, they needed to be digitized as soon as possible.

For this activity and future scan-a-thons, Wikimedia Canada and the WikiClub of Montreal acquired a specialized scanner, the Epson Perfection V850 Pro (recommended by BAnQ archivists), which can scan film negatives in very high resolution. It was essential to use a scanner at least as good as the one used by the digitization department of Bibliothèque et Archives nationales du Québec.

After a short training session led by an archival specialist, the volunteers took turns placing the negatives in the film holders while others digitized them.

Consultation room of the Gilles-Hocquart building. On sides, scanning and uploading; middle, quality control. Photo by Benoit Rochon, CC BY-SA 3.0.

Quality assurance

At the same time, other volunteers checked that there was no dust, that the images were complete, and performed other quality control. Teams of volunteers on each side of the consultation room (photo) then received the photos on USB sticks, uploaded them to Wikimedia Commons, and in some cases described the images, as not all had original descriptions. Other participants translated the descriptions into several languages, and where a Wikipedia article could be illustrated by one of these images, participants improved the article.

Thus at the end of the day, more than 500 images and documents, dating from as far back as the 1700s, were digitized and uploaded to Wikimedia Commons. The archives will directly benefit from this as well—through the free license on Commons, BAnQ can take and use those previously undigitized photos and descriptions in their own catalog.

Once the fonds were uploaded to Commons, volunteers from around the world digitally restored some of the photos by removing wrinkles, dust, and tears. Digital photo restoration is something of a speciality for some Wikimedians, with perhaps the most notable case coming with an image of the Wounded Knee massacre. The image, uploaded by the Library of Congress, was restored by a Wikimedia editor who found that seemingly random background detritus was actually four different bodies from the massacred Sioux tribe.

Frédéric Giuliano, archivist-coordinator, showing one of the oldest manuscripts of the judicial archives preserved at BAnQ. Photo by Benoit Rochon, CC BY-SA 3.0.

And then?

This experiment was a resounding success, with both participants and BAnQ delighted with the results. Wikimedia Canada and BAnQ are already planning more scan-a-thons!

All of the scanned and uploaded images can be found on Wikimedia Commons.

Thank you to everyone who participated in this project. Preserving our memory is a priority.

Benoit Rochon, President
Wikimedia Canada

by Benoit Rochon at September 18, 2017 05:09 AM

Tech News

Tech News issue #38, 2017 (September 18, 2017)

2017, week 38 (Monday 18 September 2017)

September 18, 2017 12:00 AM

September 17, 2017

Gerard Meijssen

#Wikimedia and its #BLP approach

There is a huge controversy over the policies on "Biographies of Living People" (BLP). Central to all of this is that there is no such policy at Wikidata. Many seasoned Wikipedians are consequently of the opinion that using Wikidata data in Wikipedia violates its BLP policy. At the same time, there are seasoned Wikidatans who oppose a BLP policy similar to the one at Wikipedia. The problem is that Wikidata does need a BLP policy, but it needs to be different, for various reasons.

  • An item in Wikidata can be really rudimentary; Marian Latour, a Dutch author, was created only because she won an award. This is allowed in Wikidata, but such limited information is probably a violation of the English Wikipedia's BLP policy. The information came from the Dutch Wikipedia.
  • The initial data of Wikidata were the interwiki links. This was a huge improvement for the Wikipedias, but there are still many items that have no statements. This is used as an argument not to accept information from Wikidata.
  • Some Wikidata data is retrieved from a Wikipedia, for example information like "who won an award". Given the BLP policy of that Wikipedia, it should be faultless, but it often is not, due to disambiguation issues.
The first issue refers to a red link on the Dutch Wikipedia. When the red link is associated with the Wikidata item, there will not be a new disambiguation issue when a different Marian Latour is introduced. Currently there is only one Marian Latour known to Wikidata.
The second issue is one where Wikidata statistics indicate that statements are slowly but surely being added. They also prove that there is still much to do...
The third issue is the main one. When an article is linked to Wikidata, articles in other languages should link to the same item or to a red link. Solving these issues requires coexistence and preferably collaboration. 

What we need in a Wikipedia is the ability to link a blue or red link to a Wikidata item. A change of link is either blatantly obvious, as for Manuel Echeverria, or it requires a source. Technically, the necessary change in the MediaWiki software could be "opt in", so that only people who care about this approach to quality make use of it.

As far as I am concerned, when some Wikipedians find fault elsewhere and do not reflect on this proposal and the improvements it brings them, that is fine. What is relevant is that this approach allows for the best Wikidata practices and at the same time improves the BLP quality in all Wikimedia projects.

by Gerard Meijssen (noreply@blogger.com) at September 17, 2017 07:50 AM

Wikimedia Foundation

Wikimedia Research Newsletter, June 2017


“Wikum: bridging discussion forums and wikis using recursive summarization”

Summary by Baha Mansurov

The paper[1] proposes a solution to the problem of information overload in online discussions by creating and testing a tool that allows editors to summarize parts of a discussion and combine these summaries into higher-level summaries until a single summary of the discussion is created (see also the related presentation at the September 2016 Wikimedia Research Showcase).
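The recursive structure can be sketched as follows. This is a toy illustration of the idea only, not Wikum's implementation: the `summarize` stand-in simply joins and truncates text, whereas in Wikum each summary is written by a human editor.

```python
# Toy sketch of recursive summarization over a threaded discussion:
# each subtree is condensed into a summary, and summaries of summaries
# are produced until a single root summary remains.

def summarize(texts, limit=80):
    """Placeholder for a human- or model-written summary."""
    joined = "; ".join(texts)
    return joined[:limit]

def recursive_summary(node):
    """node = {'text': str, 'replies': [node, ...]} -> summary string."""
    if not node["replies"]:
        return node["text"]  # a leaf comment is its own summary
    child_summaries = [recursive_summary(child) for child in node["replies"]]
    return summarize([node["text"]] + child_summaries)

thread = {
    "text": "Should we merge the two articles?",
    "replies": [
        {"text": "Yes, they overlap heavily.", "replies": []},
        {"text": "No, the topics are distinct.",
         "replies": [{"text": "Agreed, scope differs.", "replies": []}]},
    ],
}
print(recursive_summary(thread))
```

The key property is that a reader can start from the single root summary and expand downward only into the subthreads they care about.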

Annual “State of Wikimedia Research” summary presentation at Wikimania

The Wikimania 2017 conference in Montreal, Canada featured the “State of Wikimedia Research 2016–2017” presentation, a quick tour of scholarship and academic research on Wikipedia and other Wikimedia projects from the last year (now an annual Wikimania tradition, dating back to 2009). The slides are available online. The highlighted research publications (many previously covered in this newsletter) were grouped into the following topic areas: “Gender gap in participation”, “Gender gap in content”, “Fake news!”, “Using Wikipedia for prediction”, “Syndication”, “Wikipedia and the world”, and “Datasets: research that enables other research”.

Conferences and events

See the research events page on Meta for upcoming conferences and events, including submission deadlines.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.

Compiled by Tilman Bayer
  • “Beyond neutrality: how zero rating can (sometimes) advance user choice, innovation, and democratic participation”[2] From the abstract: “Over four billion people across the globe cannot afford Internet access. […] Enter zero rating. Mobile Internet providers in the developing world now waive the data charges for services like Facebook, Wikipedia, or local job-search sites. Despite zero rating’s apparent benefits, many advocates seek to ban the practice as a violation of net neutrality.
    This Article argues that zero rating is defensible by net neutrality’s own normative lights. Network neutrality is not about neutrality for its own sake, but about advancing consumer choice and welfare, innovation in the development of new services, and democratic participation in the public sphere. Analysis of zero rating should accordingly focus on the question of how it impacts these goals: we ought to embrace zero-rating programs that advance net neutrality’s substantive goals and reserve our skepticism for those services that would sacrifice the network’s generative potential to pursue mere short-term gains. ” (About Wikipedia Zero)
  • “Fun facts: automatic trivia fact extraction from Wikipedia”[3] From the abstract: “we formalize a notion of trivia-worthiness and propose an algorithm that automatically mines trivia facts from Wikipedia. We take advantage of Wikipedia’s category structure, and rank an entity’s categories by their trivia-quality. Our algorithm is capable of finding interesting facts, such as Obama’s Grammy or Elvis’ stint as a tank gunner. In user studies, our algorithm captures the intuitive notion of ‘good trivia’ 45% higher than prior work. Search-page tests show a 22% decrease in bounce rates and a 12% increase in dwell time, proving our facts hold users’ attention.”
  • “The citizen IS the journalist: automatically extracting news from the swarm”[4] From the abstract: “… we describe SwarmPulse, a system that extracts news by combing through Wikipedia and Twitter to extract newsworthy items. We measured the accuracy of SwarmPulse comparing it against the Reuters and CNN RSS feeds and the Google News feed. We found precision of 83 % and recall of 15 % against these sources.”
  • “Production of scientific information on the internet: the example of Wikipedia” (“Produktion von naturwissenschaftlichen Informationen im Internet am Beispiel von Wikipedia”, in German)[5] From the English abstract: “On the internet, lay people cannot only passively receive scientific information, they can also actively produce it. How do lay people process uncertain and contradictory information? […] little is yet known about the factors that influence the production of natural science information by lay people on the Internet. In our article, we discuss a variety of influencing factors and derive predictions about how these factors affect the production behaviors and the resulting text products. Finally, we illustrate our considerations using the online encyclopaedia Wikipedia.”
  • “Building an encyclopedia with a wiki? Looking back at Wikipedia’s editorial policy” (“Construire une encyclopédie avec un wiki ? Regards rétrospectifs sur la politique éditoriale de Wikipédia”, in French)[6] From the English abstract: “[The author] studied the discussions on applying rules to source citation and identified two streams that illustrate the editorial policy known as ‘wiki pole’ and ‘encyclopedia pole’. Although these two epistemological regimes may appear mutually contradictory, in fact this policy aims at finding balance between the wiki’s potential and the requirements of trustworthiness inherent in producing an encyclopedia.”
  • “Persistent Bias on Wikipedia. Methods and Responses”[7] From the abstract: “Techniques for biasing an entry include deleting positive material, adding negative material, using a one-sided selection of sources, and exaggerating the significance of particular topics. To maintain bias in an entry in the face of resistance, key techniques are reverting edits, selectively invoking Wikipedia rules, and overruling resistant editors. Options for dealing with sustained biased editing include making complaints, mobilizing counterediting, and exposing the bias. To illustrate these techniques and responses, the rewriting of my own Wikipedia entry serves as a case study.” (about the article Brian Martin)
  • “Multi-cultural Wikipedia mining of geopolitics interactions leveraging reduced Google matrix analysis”[8] From the abstract: “Wikipedia stores valuable fine-grained dependencies among countries by linking webpages together for diverse types of interactions (not only related to economical, political or historical facts). We mine herein the Wikipedia networks of several language editions using the recently proposed method of reduced Google matrix analysis. […] Our study concentrates on 40 major countries chosen worldwide. Our aim is to offer a multicultural perspective on their interactions by comparing networks extracted from five different Wikipedia language editions, emphasizing English, Russian and Arabic ones. We demonstrate that this approach allows to recover meaningful direct and hidden links among the 40 countries of interest.” (See also earlier coverage of related papers by some of the same authors: ‘Wikipedia communities’ as eigenvectors of its Google matrix” , “How Wikipedia’s Google matrix differs for politicians and artists“)
  • “Enriching Wikidata with frame semantics”[9] From the paper: “To increase the usability of WD [Wikidata] for NLP tasks, we aim at enriching WD with linguistic information by aligning it to the famous lexicon FrameNet … Specifically, we aim to find a mapping between WD facts, e.g. educated at(Person, University) and similar structures in expert lexical resources. […] in addition to the direct result of enriching WD with linguistic information, the alignments can be used to refine the property structure of WD by inducing new general/specific properties. For instance, the property killed by refers to someone (victim) killed by somebody else (killer). However, the property does not distinguish between different kinds of killing, such as execution. In FN such information is already captured through the frames Execution and Killing, where the former frame inherits from the latter. By aligning killed by to both frames, the property killed by can be refined by introducing a new sub-property: executed by.”
  • “Explicit neutrality in voter networks – an analysis of the requests for adminship (RfAs) in Wikipedia” (“Explizite Neutralität in Wählernetzwerken – Eine Analyse der Requests for Adminship (RfAs) in Wikipedia”, in German)[10] Translated from the abstract: “This paper examines requests for adminship (RfAs) in Wikipedia. In particular, we are answering the research question about what increases the probability that someone provides a neutral vote about a potential administrator. … The results indicate a strong tendency toward neutral reciprocity (i.e. a higher probability that user A votes neutral on user B who himself had voted neutral on user A) and neutral balance (i.e. a higher probability that user A votes neutral on another user B, who has received an opposing vote from user C, who in turn had received an opposing vote from user A).”
  • “Keeping Ottawa honest—one tweet at a time? Politicians, journalists, Wikipedians and their Twitter bots”[11] From the abstract: “WikiEdits bots are a class of Twitter bot that announce edits made by Wikipedia users editing under government IP addresses, with the goal of making government editing activities more transparent. This article examines the characteristics and impact of transparency bots, bots that make visible the edits of institutionally affiliated individuals by reporting them on Twitter. We map WikiEdits bots and their relationships with other actors, analyzing the ways in which bot creators and journalists frame governments’ participation in Wikipedia. We find that, rather than providing a neutral representation of government activity on Wikipedia, WikiEdits bots and the attendant discourses of the journalists that reflect the work of such bots construct a partial vision of government contributions to Wikipedia as negative by default.”


  1. Zhang, Amy X.; Verou, Lea; Karger, David (2017). Wikum: bridging discussion forums and wikis using recursive summarization. CSCW ’17. New York, NY, USA: ACM. pp. 2082–2096. ISBN 9781450343350. doi:10.1145/2998181.2998235.  Closed access
  2. Ard, BJ (May 1, 2016). “Beyond neutrality: how zero rating can (sometimes) advance user choice, innovation, and democratic participation”. Maryland Law Review 75 (4): 984. ISSN 0025-4282. 
  3. Tsurel, David; Pelleg, Dan; Guy, Ido; Shahaf, Dafna (December 12, 2016). “Fun facts: automatic trivia fact extraction from Wikipedia”. arXiv:1612.03896 [cs].  (preprint), published version: https://dl.acm.org/citation.cfm?id=3018709 Closed access, author’s copy: http://www.pelleg.org/shared/hp/download/fun-facts-wsdm.pdf
  4. Oliveira, João Marcos de; Gloor, Peter A. (2016). “The citizen IS the journalist: automatically extracting news from the swarm”. In Matthäus P. Zylka, Hauke Fuehres, Andrea Fronzetti Colladon, Peter A. Gloor (eds.). Designing Networks for Innovation and Improvisation. Springer Proceedings in Complexity. Springer International Publishing. pp. 141–150. ISBN 9783319426969.  Closed access
  5. Nestler, Steffen; Leckelt, Marius; Back, Mitja D.; Beck, Ina von der; Cress, Ulrike; Oeberst, Aileen (July 1, 2017). “Produktion von naturwissenschaftlichen Informationen im Internet am Beispiel von Wikipedia”. Psychologische Rundschau 68 (3): 172–176. ISSN 0033-3042. doi:10.1026/0033-3042/a000360. Retrieved 2017-07-29.  Closed access
  6. Sahut, Gilles (January 6, 2017). “Construire une encyclopédie avec un wiki ? Regards rétrospectifs sur la politique éditoriale de Wikipédia”. I2D – Information, données & documents 53 (4): 68–77. ISSN 0012-4508.  Closed access
  7. Martin, Brian (2017). “Persistent Bias on Wikipedia. Methods and Responses”. Social Science Computer Review.  Closed access Author’s copy
  8. Frahm, Klaus M.; Zant, Samer El; Jaffrès-Runser, Katia; Shepelyansky, Dima L. (December 23, 2016). “Multi-cultural Wikipedia mining of geopolitics interactions leveraging reduced Google matrix analysis”. arXiv:1612.07920 [nlin, physics:physics].  (preprint), published version: http://www.sciencedirect.com/science/article/pii/S0375960116321879 Closed access
  9. Mousselly-Sergieh, Hatem; Gurevych, Iryna (2016). “Enriching Wikidata with frame semantics”. Semantic Scholar. 
  10. Putzke, Johannes; Takeda, Hideaki (January 23, 2017). “Explizite Neutralität in Wählernetzwerken – Eine Analyse der Requests for Adminship (RfAs) in Wikipedia”. Wirtschaftsinformatik 2017 Proceedings. Closed access
  11. Ford, Heather; Dubois, Elizabeth; Puschmann, Cornelius (October 12, 2016). “Keeping Ottawa honest—one tweet at a time? Politicians, journalists, Wikipedians and their Twitter bots”. International Journal of Communication 10 (0): 24. ISSN 1932-8036. 


Wikimedia Research Newsletter
Vol: 7 • Issue: 6 • June 2017
This newsletter is brought to you by the Wikimedia Research Committee and The Signpost
Subscribe: Syndicate the Wikimedia Research Newsletter feed Email WikiResearch on Twitter WikiResearch on Facebook[archives] [signpost edition] [contribute] [research index]

by Tilman Bayer at September 17, 2017 03:30 AM

September 16, 2017

Wikimedia Cloud Services

Toolforge provides proxied mirrors of cdnjs and now fontcdn, for your usage and user-privacy

Tool owners want to create accessible and pleasing tools. The choice of fonts has previously been difficult, because directly accessing Google's large collection of open source and freely licensed fonts required sharing personally identifiable information (PII), such as IP addresses and referrer headers, with a third party (Google). Embedding external resources (fonts, CSS, JavaScript, images, etc.) from any third party into webpages hosted on Toolforge or other Cloud VPS projects causes a potential conflict with the Wikimedia Privacy Policy. Web browsers will attempt to load the resources automatically, and this in turn exposes the user's IP address, User-Agent, and other information that is included by default in an HTTP request to the third party. This sharing of data with a third party is a violation of the default Privacy Policy. With explicit consent, Toolforge and Cloud VPS projects can collect and share some information, but it is difficult to secure that consent with respect to embedded resources.

One way to avoid embedding third-party resources is for each Tool or Cloud VPS project to store a local copy of the resource and serve it directly to the visiting user. This works well from a technical point of view, but can be a maintenance burden for the application developer. It also defeats some of the benefits of using a content distribution network (CDN) like Google fonts where commonly used resources from many applications can share a single locally cached resource in the local web browser.

Since April 2015, Toolforge has provided a mirror of the popular cdnjs library collection to help Toolforge and Cloud VPS developers avoid embedding JavaScript resources. We did not have a similar solution for the popular Google Fonts CDN, however. To resolve this, we first checked whether the font files were available via bulk download anywhere, sort of like cdnjs, but they were not. Instead, @zhuyifei1999 and @bd808 have created a reverse proxy and forked a font-searching interface to simplify finding the altered font CSS URLs. You can use these features to find and use over 800 font families.
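For a tool that previously loaded font CSS from fonts.googleapis.com, switching to the mirror is essentially a URL rewrite. A minimal sketch in Python, assuming the proxy is served under `https://tools-static.wmflabs.org/fontcdn/` (the exact mirror path is an assumption here; check the on-wiki docs for the canonical URL):

```python
from urllib.parse import urlparse

# Assumed location of the Toolforge font mirror; verify against the docs.
TOOLFORGE_FONT_MIRROR = "https://tools-static.wmflabs.org/fontcdn/"

def proxy_font_url(url: str) -> str:
    """Rewrite a fonts.googleapis.com URL to the proxied equivalent,
    so the user's browser never contacts Google directly."""
    parsed = urlparse(url)
    if parsed.netloc != "fonts.googleapis.com":
        return url  # leave non-Google URLs untouched
    path = parsed.path.lstrip("/")
    query = f"?{parsed.query}" if parsed.query else ""
    return f"{TOOLFORGE_FONT_MIRROR}{path}{query}"

print(proxy_font_url("https://fonts.googleapis.com/css?family=Roboto"))
# -> https://tools-static.wmflabs.org/fontcdn/css?family=Roboto
```

In a template this amounts to swapping the `href` of the `<link rel="stylesheet">` tag; the font files referenced by the rewritten CSS are then fetched from Wikimedia infrastructure rather than from Google.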

You can use these assets in your tools now!



Onwiki docs

Please give us feedback on how these features could be further improved, submit patches, or just show us how you are using them!

by Quiddity (Nick Wilson) at September 16, 2017 11:27 AM

September 15, 2017

Wiki Education Foundation

Closing the knowledge gap about political science

Earlier this month, Wiki Education joined political scientists at the American Political Science Association’s (APSA) annual meeting here in San Francisco. Outreach Manager Samantha Weald and I spent the week discussing the massive gap between what experts know and what the public understands about political science. So many experts research important, interesting topics about governments and political behavior, yet the information can be difficult to understand if you haven’t spent a career devoted to it. That’s where Wikipedia comes in.

Wikipedia often provides a fast, simple way for curious people to enhance their understanding of the world. Since hundreds of millions of people access the website every month, Wiki Education believes it’s critical that the content reflects the current available research about political theory. We have already proven that students can bridge the gap and add the information they’re studying to Wikipedia, amplifying the impact of their education. Now we invite more political science instructors to make this a priority and join our Classroom Program.

Thanks to our partnership with the Midwest Political Science Association, we already created an editing guide for students on how to edit political science articles on Wikipedia. Students using that guide and the suite of tools we offer to instructors teaching with Wikipedia can join thousands of students who have improved Wikipedia’s articles like Arab Spring, warlords, constitutional patriotism, and vaccination policies.

If you’re interested in joining our initiative to bring better political science to the public and engage students in a meaningful assignment, email contact@wikiedu.org.

by Jami Mathewson at September 15, 2017 03:58 PM

Lorna M Campbell

A new lease of life for your holiday snaps

I’ve been spending most of my evenings this week looking through photographs on old laptops, not because I’ve been overtaken by a fit of nostalgia; the reason I’m trawling through old holiday snaps is that I’m looking out pictures to submit to this year’s Wiki Loves Monuments competition. And as a former archaeologist, monuments feature very heavily among my holiday pics :}

Wiki Loves Monuments is the world’s biggest photography competition, which runs annually during the whole month of September. The rules are simple: all you have to do is upload a high quality picture of a scheduled monument or listed building to Wikimedia Commons through one of the competition upload interfaces. You can browse monuments to photograph using this interactive map, or you can search for monuments using this interface; the latter is the one I’ve been using, but it’s all a matter of preference. The competition is open to amateurs and professionals alike, and you don’t even need a camera to enter; mobile phone pictures are fine as long as they’re of decent quality. You can enter as many times as you like, and you can submit entries taken anywhere in the world as long as you own the copyright and are willing to share them under a CC BY-SA licence.

I’ve been meaning to enter Wiki Loves Monuments for years, and it’s in no small part due to the persuasive powers of my colleague Ewan McAndrew, Wikimedian in Residence at the University of Edinburgh, that I’ve finally got my act together to enter. A little healthy competition with our Celtic cousins also hasn’t done any harm… At the time of writing Wales had 510 entries, Scotland 289, and Ireland 197. You know what you need to do :}

Some of my more energetic colleagues at the University of Edinburgh have been out and about of an evening snapping pictures all over the city and beyond, but I’ve decided to raid my back catalogue instead. So far I’ve unearthed and uploaded pics of Culzean Castle and Camellia House, Mount Stuart, Waverley Station, Teviot Row, St Giles Cathedral, the General Register Office, Sloans Ballroom, University of Glasgow Cloisters, Kibble Palace, and Garnet Hill High School for Girls. My pictures might not win any prizes but it’s a great way to contribute to the Commons and create new open educational resources! If you’ve got old snaps lurking on a laptop or hard drive, why not give them a new lease of life on the Commons too? 🙂

Camellia House, Culzean Castle, CC BY Lorna M. Campbell

Wiki Loves Monuments
Scotland loves Monuments 2017 by Ewan McAndrew
Wanderings with a Wikimedian by Anne-Marie Scott

by admin at September 15, 2017 12:36 PM

Priyanka Nag

Moving on...

Life doesn't necessarily go as per our plans. And when those big plans crash badly, things can get pretty tough to deal with.

What I am writing today is nothing unique or being told for the first time. It's just my version of one of today's most common problems... a heartbreak!

A recent breakup had left me pretty shattered! When we get overly attached to someone, it gets difficult to imagine life without their presence in it... well, that's pretty normal and that's pretty human! But what's important, and what differs from person to person, is how we deal with such situations.

For me, after being stuck in a helpless situation, crossing all the stages of a breakup, and finally reaching the "acceptance" stage, I knew I had to do something about it soon to avoid falling into a severe state of depression. So, I tried a few things:

  • Switch to work.
    I always turn into a workaholic when I need to deal with one of life's crises! For me, this works better than alcohol. Work so hard that there is almost no time to think of anything else. Work till you are so tired that you crash the moment you hit the bed.
  • Work on yourself.
    When we are in love, we often tend to become the person our lovers want us to be... forgetting who we really are or who we would like to be! It's often after a bad breakup that we get some time to give this a thought. I got to this state too. I started thinking about the different things I wanted to try and the changes I wanted to see in myself. I tried a lot of things which I was otherwise scared to experiment with. I got myself inked... I got my hair colored... I joined the gym, and so on. Every one of these gave me an immense sense of achievement! I loved the changes I saw in the mirror.
  • Take a trip.
    I had wanted to go on a solo trip for a while, but never did plan it. A heartbreak can be a really good motivator to make you do things you have been procrastinating on for a while. I took a solo trip to Ladakh. Biking on the Himalayan off-routes for 10 days made me discover a whole new me. The totally carefree me that I found during this trip was definitely a happier and better version of the old me, who would give too much importance to people or society.

Well, no breakup can ever be the right thing to happen in anyone's life... but since it's a part of life, dealing with it the right way is definitely important.

Have I moved on?

Today, I can really say YES. Not that the thoughts and memories of the past don't hurt anymore... of course they do... but I don't cry over them anymore.

And ya... not to forget, I don't intend to hold myself back from falling in love again or stop myself from experiencing that beautiful butterflies-in-the-stomach feeling! One thing I have learnt during this journey... there is no hard line between right and wrong. If something makes me happy, it's got to be right... it's okay if the world disagrees!

by Priyanka Nag (noreply@blogger.com) at September 15, 2017 11:47 AM

September 14, 2017

Weekly OSM

weeklyOSM 373



Disaster OpenRouteService now active in the Caribbean, North America (incl. Mexico) and Bangladesh [1] | Map data © OpenStreetMap contributors, powered by MapSurfer.NET, © Leaflet

About us

  • We are always looking for people to help us improve our newsletter so it can get out faster, have more depth and coverage, and generally improve it for our readers, like you. Please join our team by contacting us now, it’s fun! 😉


  • Joost asks on the tagging mailing list about an alternative to barrier=cattle_grid for its electrical equivalent.
  • John Eldredge noticed that a user was adding the height of peaks in the name=* tag, and asks on the tagging mailing list for community feedback.
  • Penegal announces his long-prepared proposal to refine the tagging of sinkholes.
  • dieterdreist criticized the preset for pedestrian crossings in the iD Editor, since it automatically adds the tag crossing=zebra, which does not apply to all crossings.


  • User vtcraghead asks for contact to local mappers in the Dominican Republic as the talk-do mailing list is “pretty quiet”.
  • OpenStreetMapMx says “Thanks a lot to the digital volunteers mapping #Juchitan”. The southern part of the city is already 90% finished, but the north still needs your help: HOT Tasking Manager
  • After a feedback workshop on technical and organizational capacity building at the end of August, members of OpenStreetMap Benin participated last week in the National Conference of Free Software in Cotonou, where they promoted OpenStreetMap and free geodata.
  • User @cumberdumb noted on Twitter that the speed bumps he mapped in his area were mainly in low-income areas. SK53 notes that in Nottingham there is only a weak correlation between low income and speed bumps.


OpenStreetMap Foundation

  • The license working group (LWG) added point 3.3.6 to the draft of a trademark policy. The LWG tried to address the feedback raised on the talk mailing list (as we reported). Further feedback on the talk mailing list is welcome.
  • The next public OSMF board meeting will take place on September 21st. They will talk about changes to the Travel Policy.


  • The Mali OpenStreetMap community, in collaboration with the University of Ségou, announces its 2nd Capacity Development Camp in Ségou, from 11 to 15 September, with around 25 geography and IT students participating. This year, officers from the Ségou town hall will also be trained, from 18 to 20 September.
  • Violaine Doutreleau announced on the Talk-fr list that CartOng is organizing a Mapathon in collaboration with several OSM contributors from francophone African countries during OSMGeoWeek from November 12 to 18. The objective is to organize or support mapathons in France and in several countries of French-speaking Africa (Niger, Mali, Madagascar, Senegal, Burkina Faso).
  • On the first weekend in September, the Elbe-Labe-Meeting 2017 took place in Germany. Read the report and watch the film.

Humanitarian OSM

  • Hundreds of mappers are currently engaged in HOT efforts to map the regions struck by Hurricane Irma, tracing buildings and roads in Florida and the Caribbean. To assist the recovery, building imports are planned for Tampa and Clearwater, Florida.
  • The 2017 Mexico Earthquake Response is an activation of the OSM community across Latin America and the Humanitarian OSM Team to provide map data assisting the response to the earthquake that devastated Mexico. The response is ongoing and needs mappers and validators for the mapping tasks.
  • [1] Luisa Janine Griesbaum reported in GIScience News Blog of the University of Heidelberg, that GIScience Heidelberg/HeiGIT updated their OpenStreetMap based Disaster OpenRouteService (a special version of OpenRouteService.org) to the following routing regions North America (incl. Mexico), Caribbean, Bangladesh in order to support the mapping and rescue activities. The data is currently updated every 24h (around 21:00 CET – thanks to Geofabrik).
  • HOT Activates for Three Disasters
  • HOT and the United Nations High Commissioner for Refugees (UNHCR) are working together to map refugee settlements in Uganda. A mapathon was organized and 18 international aid agencies participated.
  • Brynne Morris from Mapbox writes a blog post showcasing the recovery and relief maps made for Hurricane Harvey, and calls on the community to contribute to Harvey relief efforts and improve the map in the communities immediately in the storm’s path.
  • MAPS.ME and Humanitarian OpenStreetMap Team Partner to Crowdsource Data for Humanitarian Response


  • OpenStreetBrowser now supports permalinks (as reported last week), which encode your current map position, opened categories and opened map objects. This makes it easier to share data.
  • pnorman would like place=* tags on admin relations to be rendered so that errors in tagging are visible. In the discussion on GitHub, it is proposed that the corresponding border relations in the United States should instead be corrected.


  • Nicolas Dumoulin informs the Talk-fr discussion group about discussions with the “Conseil départemental du Puy-de-Dôme”. A mapping party is planned, and various POIs should be added to the map.
  • Entur is a company owned by the Norwegian government tasked with building a multi-modal public transport planner for all of Norway. The travel planner uses OpenTripPlanner, which in turn uses OSM for foot routing.

Open Data


The JOSM stable release 12712 was published on the 17th of August. Java 9 compatibility, an improved dialogue for Overpass API downloads, and search based on presets are the special highlights. Bogdans Afonins worked on the latter two features as part of GSoC 2017. We want to thank the whole JOSM developer team for their continued efforts to provide us with this great editing tool.

Did you know …

  • … the new overview from Mapbox for cleaning up errors on OSM?
  • … Windy.com, an OSM-based live map which shows wind activity (clouds, temperature, cloud tops, cloud base, waves and much more besides) worldwide? Windy is also available as an application for mobile phones (Apple, Android). Note that Windy.com uses OpenStreetMap data without proper attribution as required by the OpenStreetMap licence; see the entry on the Lacking proper attribution wiki page, as well as the discussion on talk.

OSM in the media

Other “geo” things

  • A map showing Hurricane Irma translated to Europe may have exaggerated its size. SK53 corrected the projection onto Mercator.
  • Glenn Bech shares a 3D print of his OpenStreetMap export of New York; you can have one for yourself using Thingiverse.
  • Map Stylizer, version 1 of a web app to create highly stylized (and not-so-useful) maps.
  • Paul Plowman explains how he discovered two adjacent houses on the same road with the same housenumber.

Upcoming Events

Where What When Country
Zaragoza Mapping Party #Zaccesibilidad Arrabal, Mapeado Colaborativo 2017-09-16 spain
Moscow Big Schemotechnika 11 2017-09-16 russia
Nishinomiya 【西国街道#09・最終回】西宮郷・酒蔵マッピングパーティ 2017-09-16 japan
Nara 防災トレジャーハンター~自分を守り、地域を守る!~(UDC2017) 2017-09-16 japan
Buenos Aires 2do Mapatón OSM Baires @ Instituto Geográfico Nacional 2017-09-16 argentina
Rennes Cartographie collaborative du Musée de Bretagne, pour les Journées européennes du patrimoine 2017-09-16-2017-09-17 france
Nantes Participation aux Journées européennes du patrimoine à l’École de Longchamp 2017-09-16-2017-09-17 france
Montreal Cartopartie dans Hochelaga-Maisonneuve 2017-09-17 canada
Bonn Bonner Stammtisch 2017-09-19 germany
Lüneburg Mappertreffen Lüneburg 2017-09-19 germany
Nottingham Nottingham Pub Meetup 2017-09-19 united kingdom
Scotland Pub meeting, Edinburgh 2017-09-19 united kingdom
Cologne Stammtisch 2017-09-20 germany
Osaka もくもくマッピング! #09 2017-09-20 japan
Paris Mapathon Missing Maps @LLL, Liberté Living Lab 2017-09-21 france
Arizona Josm Workshop at 2017 AGIC Conference, Prescott 2017-09-21 united states
Karlsruhe Stammtisch 2017-09-21 germany
Liguria Wikigita a Santo Stefano Magra, Santo Stefano di Magra, La Spezia 2017-09-23 italy
Tokyo 東京!街歩かない!マッピングパーティ3 2017-09-23 japan
Patan State of the Map Asia 2017 2017-09-23-2017-09-24 nepal
Taipei OpenStreetMap Taipei Meetup, MozSpace 2017-09-25 taiwan
Bremen Bremer Mappertreffen 2017-09-25 germany
Graz Stammtisch Graz 2017-09-25 austria
Berlin 111.1 Berlin-Brandenburg Sonderstammtisch Intergeo 2017-09-25 germany
Salt Lake City OSM Utah GeoBeers 2017-09-26 united states
Berlin Intergeo 2017 2017-09-26-2017-09-28 germany
Lyon Mapathon missing maps à Lyon à l’atelier des médias, L’atelier des médias 2017-09-28 france
Lübeck Lübecker Stammtisch 2017-09-28 germany
Boulder State of the Map U.S. 2017 2017-10-19-2017-10-22 united states
Buenos Aires FOSS4G+State of the Map Argentina 2017 2017-10-23-2017-10-28 argentina
Brussels FOSS4G Belgium 2017 2017-10-26 belgium
Lima State of the Map LatAm 2017 2017-11-29-2017-12-02 perú
Bonn FOSSGIS 2018 2018-03-21-2018-03-24 germany

Note: If you would like to see your event here, please put it into the calendar. Only events entered there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Nakaner, PierZen, Polyglot, SK53, SeleneYang, Spanholz, derFred, doktorpixel14, ec, jinalfoflia, keithonearth, sev_osm.

by weeklyteam at September 14, 2017 10:11 PM

Wikimedia Foundation

Wait, what? Split brain, when two personalities live in one body

Photo by Dean Hochman, CC BY 2.0.

Since the 1940s, neurosurgeons have been performing corpus callosotomy, a surgery that serves as a last resort for treating epilepsy. The procedure involves cutting through the corpus callosum, which functions as the main connection between the two hemispheres of the brain. It was considered dangerous by some and avoided by many, yet it relieved most patients of unbearable epileptic seizures.

Corpus callosotomy “keeps the electrical signals that cause a seizure from crossing over and wreaking havoc,” says Emily Temple-Wood, Wikipedia editor and medical student who was named the Wikipedian of the Year in 2016. “It’s amazing how well these patients adapt and recover, and this is all due to how plastic the brain is.”

Studying those patients helped neuroscientists make sense of how the two halves of the brain work together, what the functions of each are, and what would happen if they worked separately. In the latter case, the brain behaved as if there were two separate minds, a phenomenon later called split-brain. Wikipedia tells us that:

After the right and left brain are separated, each hemisphere will have its own separate perception, concepts, and impulses to act. Having two “brains” in one body can create some interesting dilemmas. When one split-brain patient dressed himself, he sometimes pulled his pants up with one hand (that side of his brain wanted to get dressed) and down with the other (this side didn’t). Also, once he grabbed his wife with his left hand and shook her violently, so his right hand came to her aid and grabbed the aggressive left hand. However, such conflicts are actually rare. If a conflict arises, one hemisphere usually overrides the other.

At the Beyond Belief conference in 2006, neuroscientist Vilayanur S. Ramachandran shocked the audience with a special case where the patient was half atheist, half religious. But how had Ramachandran been able to interrogate the two halves of his patient?

Since the right hemisphere manages the left side of the body and vice versa, Ramachandran found his way to separately communicate with the two sides by whispering into the patient’s right ear to ask a question to the left hemisphere. He did the same with the left ear to communicate with the right hemisphere.

The major concern with that plan was how to get answers from the right hemisphere. Having the communications center (that controls speaking) on the left side means that it is only possible for the left hemisphere to verbally communicate. So, to get answers from the right hemisphere, the patient was shown a piece of paper with yes and no options to choose from using their left hand.

Studying split-brain patients helped tremendously with distinguishing the differences between the two hemispheres’ functions. While the left side is usually responsible for the language computation, the right side is the face-recognizing expert.

Vertumnus, a painting by Giuseppe Arcimboldo, public domain.

When a normal person is shown a painting by the Italian artist Giuseppe Arcimboldo, known for portraits composed of objects (like the one above), they will usually recognize it as a face made of vegetables, fish or flowers. This is not the case for split-brain patients, however: their right hemisphere will recognize a face in the painting, while the left will only see the objects.

“What I find particularly interesting is that consciousness is maintained as a unified state even when the two hemispheres of the brain are disconnected,” Temple-Wood explains. “We don’t understand consciousness very well at all, and it used to be thought that disconnecting the hemispheres would lead to being ‘of two minds’, quite literally. But there’s no connection between something like dissociative identity disorder and callosotomy!”

Michael Gazzaniga, one of the leading researchers in cognitive neuroscience, has dedicated a large part of his life to studying split-brain patients. He was interested in how they act emotionally and physically in comparison to those who do not have a split brain, and, if every one of us has two different minds, “why [do] people, including split-brain patients, have a unified sense of self and mental life”?

Nature magazine featured one of Gazzaniga’s favorite examples: he recalled flashing the word ‘smile’ to a patient’s right hemisphere and the word ‘face’ to the left hemisphere, then asking the patient to draw what he’d seen. “His right hand drew a smiling face,” Gazzaniga recalled.

– “’Why did you do that?’ I asked.
– The patient said, ‘What do you want, a sad face? Who wants a sad face around?’.”

The patient’s left hemisphere had made up a story to verbally justify his drawing; it had no idea why he made the face smiling, because it hadn’t seen the word ‘smile’. “The left-brain interpreter,” Gazzaniga says, “is what everyone uses to seek explanations for events, triage the barrage of incoming information and construct narratives that help to make sense of the world.”

You can read more about split-brain in Wikipedia’s article about it.

Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation

by Samir Elsharbaty at September 14, 2017 07:10 PM

Wiki Education Foundation

Chapter in the history of racism in America added by a Visiting Scholar

In 1843, an African American pioneer, James D. Saules, gained ownership of a farm in Oregon. The previous owner had employed a Wasco man named Cockstock, to whom he promised a horse as payment. When Saules refused to honor the previous owner’s agreement, Cockstock took the horse and threatened the farmers. Saules complained to local officials, who began pursuit of Cockstock. Eventually, he returned to town with a small group of other Wascos and an interpreter to confront the settlers that had been looking for him. When one townsperson went to arrest Cockstock, a melee broke out, leading to the deaths of Cockstock and two settlers.

Local discourse following these events, which came to be known as the Cockstock Incident, veered in a direction one probably wouldn’t expect—a fear that African Americans would incite a violent uprising by Native Americans against white settlers.

In 1844, the territory’s provisional government passed a law purporting to ban slavery in Oregon, which was already prohibited. In addition to mandating, superfluously, that all slaves be freed, it also required that all freed slaves leave its borders within 2-3 years (2 for men, 3 for women) under penalty of lashing. It was the first of three exclusion laws aimed at African Americans in Oregon.

Eryk Salvaggio, Visiting Scholar at the John Nicholas Brown Center for Public Humanities and Cultural Heritage at Brown University
Image: Eryk Salvaggio.jpg, by Eryk Salvaggio, CC BY-SA 4.0, via Wikimedia Commons.

The laws were repealed by 1926, and although it appears the 1844 law was not enforced, its passage marks an important, chilling, chapter in the histories of America, Oregon, and African Americans. Until last month, coverage of these subjects on Wikipedia was scant. That’s when Eryk Salvaggio, Wikipedia Visiting Scholar at Brown University’s John Nicholas Brown Center for Public Humanities and Cultural Heritage, created articles on the Cockstock Incident and Oregon black exclusion laws. Both articles have been featured in the “Did You Know” section of Wikipedia’s Main Page:

“[Did you know] … that an argument over a horse led to a law banning all black settlers from Oregon in 1844?” (August 15)
“[Did you know] … that an 1844 Oregon law required all slaves to be freed—and all freed slaves to leave Oregon?” (September 11)

Tracking down sources about historical subjects can be challenging, since so much is locked behind expensive paywalls or only available through particular institutions. That’s why we like the Wikipedia Visiting Scholars program so much. Wikipedia editors like Eryk gain access to the rich resources available through institutions like Brown, which wants to see those resources put to good use on the world’s most popular source of information.

For more information about the Visiting Scholars program, see the Visiting Scholars page on our website or email visitingscholars@wikiedu.org

Image: Oregon City and Willamette Falls, 1867.jpg, by Carleton Watkins, public domain, via Wikimedia Commons.


by Ryan McGrady at September 14, 2017 04:09 PM

Wikimedia Cloud Services

Introducing the Cloud Services Team: What we do, and how we can help you

24% of Wikipedia edits over a three month period in 2016 were completed by software hosted in Cloud Services projects. In the same time period, 3.8 billion Action API requests were made from Cloud Services. We are the newly formed Cloud Services team at the Foundation, which maintains a stable and efficient public cloud hosting platform for technical projects relevant to the Wikimedia movement. -- https://blog.wikimedia.org/2017/09/11/introducing-wikimedia-cloud-services/

With a lot of help from @MelodyKramer and the Wikimedia-Blog team, we have published a blog post on the main Wikimedia blog. The post talks a bit about why we formed the Wikimedia Cloud Services team and about the purpose of the product rebranding we have been working on. It also gives a shout-out to a very small number of the Toolforge tools and Cloud VPS projects that the Wikimedia technical community makes. I wish I could have named them all, but there are just too many!

by bd808 (Bryan Davis) at September 14, 2017 12:39 AM

September 13, 2017

Wikimedia Foundation

Exploring Wikimedia’s gender gap with six contributors from Scandinavia

Image by Alreadymildneon, CC BY-SA 4.0.

This summer, Sabine Rønsen, a library and information sciences student at the Oslo and Akershus University College of Applied Sciences, conducted six interviews with Wikimedia contributors from Norway and Sweden. Her goal was to gain a deeper understanding of why they contribute to the projects and to see how their experiences as women or gender minority participants affect their experiences editing. The views in this essay are Sabine’s alone; we are publishing her thought-provoking essay, where she shares her higher-level findings, on the Wikimedia Blog. If you are interested in this topic, please see these resources on Meta-Wiki.

By any objective measure, Wikipedia needs more female contributors. Although much effort has been spent in correcting this gender gap, it’s not clear how much impact this is having on the number of women editors. To recruit women more efficiently, one must look at the underlying causes of the gender gap and lack of contributor diversity. It is equally important to understand why those who already contribute actually remain.

I’m a student at Oslo and Akershus University College of Applied Sciences, where I’m studying library and information science. Over this summer, I worked with Wikimedia Norway (Norge). In an effort to make some inroads into these questions, I talked with five female contributors from Norway and Sweden, one of whom is now based in another European country and edits almost exclusively on the English Wikipedia. I also talked to a contributor who identifies as non-binary transgender. My goal was to get deeper insight into their ideas about their own role on Wikipedia, how being a woman or gender minority affects their experience of editing, and their ideas on how to recruit more diversely.

Wikipedia is written by volunteers, people who spend hours and hours adding to the sum of human knowledge. It has become a hobby for many of them, and that is no different for the people I interviewed. Most hadn’t experienced much difficulty being a female editor, attributing this to having carefully maintained gender-neutral user accounts, although all of them had witnessed it happen to others. Still, in their view, the base problems don’t stem from being outed or mistreated as a user; they are structural issues that appear more in content, in how topics are discussed, and in general behaviour.

One of my interviewees told stories of various wiki-meetups where, whenever she or another woman was helping beginners, guys would cut in and take over the conversation. Although well intentioned, this can actually scare away beginners, especially women. Many women and gender minorities experience misogyny, discrimination, or simply being underestimated at work, at home, and elsewhere in their everyday lives. A question they must ask themselves is whether they really want this in their hobby too.

Another two of the women I spoke to are long-time contributors, having edited Wikipedia since the project’s founding. They spoke of a closed environment where vested contributors had developed a strong feeling of ownership over the project.  In and of itself, this isn’t necessarily an issue. A problem arises only when this mentality makes newcomers feel excluded or not qualified to contribute.

None of the interviewees thought the technical aspects of editing were the most difficult part; it was figuring out what the best practices for editing are. Technical hurdles can be overcome with some teaching and practice, but best practices could largely only be discovered through trial and error, and a virulent reaction to a newcomer’s first error can mean that they aren’t willing to try again.

This is where the importance of polite and constructive feedback is clear, versus, as one interviewee put it, the “cross language, articles deleted without explanation, and expectation that one should know all of the rules from their first edit.” In addition, the Swedish editors brought up the impact of various in-person meetups.  These events represented a safe place to ask “stupid” questions, meet people, and discuss the Wikipedia community and its culture.

One of the contributors I spoke to expressed concern that a lot of the rather unfortunate practices at Wikipedia are being cemented in place: that in attempting to be as encyclopedic as possible, we’re at risk of falling victim to the same conservatism and exclusive thinking that has characterized traditional encyclopedias. And this is somewhat of a paradox, because historically encyclopedias have been known for being progressive. Just think of the revolutionary French Encyclopédie. Did you know that, to avoid censorship, its writers often hid statements that were controversial at the time in obscure articles and cunning cross-references?

As for the question of making Wikipedia your hobby despite disheartening experiences, all of the people I interviewed answered this with a resounding yes. To them, the importance of having history written by all segments of society outweighs all other considerations. The way Wikipedia is set up can encourage traditionally underrepresented populations to contribute their own stories. As one pointed out, treating each other decently will not only make recruiting women and those with other gender identities easier, it will make Wikipedia a more attractive place for experts to share their own knowledge—people who are perhaps less accustomed to ‘internet culture.’

And as to why my interviewees all still edit, they named two things. First, they view their work as important. They are helping gift knowledge to the world.

And second? Editing is, quite simply, addictive.

Sabine Rønsen, Wikimedia Norway

by Sabine Rønsen at September 13, 2017 06:05 PM

Wiki Education Foundation

From the librarian perspective: A relationship with Wikipedia as an education tool

Kelee Pacion is the Instruction Coordinator for the Albert R. Mann Library at Cornell University. In this post, she talks about the impact courses she’s taught with Wikipedia have had on her students.

My interest in Wikipedia really stems from a love/hate relationship I used to have with the platform. I always found the notion of everyone agreeing on the truth to be fraught with issues, and Wikipedia was the embodiment of that idea… at least in my mind. Fast forward to an incident where I found myself struggling to identify whether the wasp has any ecological value, to justify changing an entry on Wikipedia. I was struck by how much can be learned while verifying information, and found myself wishing I could ask one of the entomology professors on the Cornell campus.

Serendipitously, I found a Cornell professor who is very interested in science communication, online platforms, and teaching with new technologies. The magic began! Serendipity was with us again when we discovered the amazing classroom platform built by Wiki Education, and we quickly tied our objectives to the plans developed by the experts there. Our work at Cornell has really expanded with the use of Wikipedia in the classroom. We have taught three seminar classes using Wikipedia, and we have more in the works.

But to get to the heart of Wikipedia: the students. Many of the students who have taken the class were amazed at how much went into creating and developing articles. Our students were awed by the idea that they were creating information that would be shared with the world, they were careful about the resources they used, and they learned to be respectful of the Wikipedia editing community. Further, it was a great way to teach library skills: finding, accessing, and engaging with information to synthesize and develop ideas that suit the tone of an online encyclopedia.

Some of our favorite comments from students:

“Wikipedia gets a bad reputation for being a “non-credible” source for information, since anyone can edit it. That being said, after having taken this course, I am starting to realize that if people (such as the students in this seminar) and even others take the time to edit and carefully cite articles on Wikipedia, we can make it into a way more credible source.”

“Before I did not think about how Wikipedia was maintained by a wide range of individuals doing work for free. And with this, I didn’t realize of the existence of pages such as Talk pages and the depth that lays behind Wikipedia. And with this depth, comes community. After creating my user name, I felt a greater sense of attachment. And after overcoming the five pillars, I felt a new range of inspiration and was further drawn into. The class then gave me the opportunity to add to an article and build some Wikipedia credibility (wiki-cred).”

We found that when we allowed students to choose the article they wished to edit, within certain parameters, they were more engaged in the process and developed a sense of belonging to the community. As the class was a one-credit seminar, we chose stub- and start-class articles and tasked students with developing minor extensions to them, giving the students practice with the interface, familiarity with the discourse community, and the ability to evaluate appropriate information sources to help expand their article. We also found that a diverse group of students enrolled in the course, ranging in gender, nationality, and age. Wikipedia was a great platform to share ideas, allow students to engage in discussion within a safe space, and then share those ideas and discussions with the Wikipedia community.

Some of our favorite pages edited by students are adaptive unconscious, adequate stimulus, and fusiform gyrus. Many of the students went above and beyond what we asked them to add to an article, and put in a lot of work for what was a one-credit, pass/fail seminar. Truly amazing work! The success of working with the students, engaging with Wiki Education, and professorial interest has encouraged my co-instructors and me to develop a three-credit, applied science communication course at Cornell, wherein we will highlight the usage and development of Wikipedia articles as a valid means of scientific communication for everyone!

Image: Cornell Mann Library Interior 4, by Bill Price III, CC-BY-SA 3.0, via Wikimedia Commons.

by Guest Contributor at September 13, 2017 04:29 PM

Wiki Playtime

Bridging real and fictional worlds

Wikidata has many, many statements about the real world, but it also describes worlds of fiction or myth. Fictional entities can have almost all the same properties as real ones, but have at least one property that marks them as fictional. They should have instance of fictional character, or a subclass such as fictional human, or even fictional pig. Wikidata presently has more than 40,000 fictional entities, and this query gives an overview of their types.

There are also properties for present in work…

Prospero → present in work → The Tempest

…and from fictional universe

Hermione Granger → from fictional universe → Harry Potter universe

If you are only interested in real entities, fictional characters are superfluous results to be filtered out. One of my first queries was for people who had studied at Oxford University. Several fictional entities came up, including Dorothy L. Sayers’ character Lord Peter Wimsey, whose alma mater is listed as Balliol College. (Here’s the query code.)
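The filtering idea can be sketched offline. Here is a minimal Python illustration (the names and classes are toy data standing in for query results, not live Wikidata; a real query would also follow “subclass of” paths, since fictional human is a subclass of fictional character):

```python
# Classes that mark an entity as fictional (illustrative labels).
FICTIONAL_CLASSES = {"fictional character", "fictional human", "fictional pig"}

# Toy stand-in for "educated at Oxford" query results: entity name paired
# with its "instance of" classes.
alumni = [
    ("Dorothy L. Sayers", {"human"}),
    ("Lord Peter Wimsey", {"fictional human"}),
    ("J. R. R. Tolkien", {"human"}),
]

def real_only(entities):
    """Keep entities whose classes include nothing marked as fictional."""
    return [name for name, classes in entities if not (classes & FICTIONAL_CLASSES)]

print(real_only(alumni))  # ['Dorothy L. Sayers', 'J. R. R. Tolkien']
```

In SPARQL the same exclusion is typically a `FILTER NOT EXISTS` or `MINUS` clause on the fictional classes.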

It’s interesting to explore connections between the real and fictional worlds, for example with the named after property. We can ask Wikidata for things after which substellar objects (planets, moons, and asteroids) are named. (Here’s the query code). Jupiter’s moons are named after lovers or descendants of the Roman/Greek god Jupiter/Zeus. Saturn’s moons have Norse, Greek and Inuit inspirations, while Uranus’ moons are named after characters from Shakespeare.

Titania, Moon of Uranus, photographed by NASA, and Shakespeare’s Titania, as imagined by British artist Henry Meynell Rheam. Public domain images via Wikimedia Commons

Looking outside the solar system, to stars, nebulae and galaxies, I was surprised how few, according to Wikidata, are named after fictional or mythical entities — only 45 results for this query. (Here’s the code.)

Rather than fictional entities that connect to an aspect of the real world, we can ask for real things linked to a given fictional world. A query for things named after Shakespeare’s characters returns 22 astronomical objects and seven other entities. (Here’s the code.)

I learned recently that the James Bond character Pussy Galore is widely regarded as inspired by Blanche Blackwell, the mistress of author Ian Fleming (thanks Melissa Highton). This set me thinking about other links between real and fictional people.

Wikidata has properties for based on and inspired by. These are easily confused (at least given their English labels) and, looking at the data, some users have added the wrong property for the fact they are trying to express. Based on is a property of works: for example a film that is based on a novel. Inspired by is a property of works of fiction or of fictional entities. They can be inspired by a specific real entity; for instance Charles Foster Kane in “Citizen Kane” was inspired by William Randolph Hearst. Alternatively, a character or fictional world can be inspired by a set of works. The Matrix is inspired by Alice in Wonderland but is not based on it.

A caveat about fictional entities in Wikidata: not every character in a book, film, or play will have a Wikidata representation. Characters need to be notable independently of a work they appear in, and this usually means that they appear in multiple notable works. Bilbo Baggins appears in multiple books and films, not to mention the Leonard Nimoy song. Sally Bowles, portrayed by Liza Minnelli in Cabaret, also appears in some other plays, films, and novels.

Let’s ask Wikidata for fictional characters that are based on real people, with descriptions of each. (Here’s the query code)
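For readers who want to adapt it, a query along these lines might look like the sketch below. The identifiers (P31 “instance of”, P279 “subclass of”, P144 “based on”, Q95074 “fictional character”, Q5 “human”) are my best understanding and should be verified on wikidata.org; the query is held in a Python string here simply so its shape is easy to inspect before pasting it into query.wikidata.org.

```python
# Hedged sketch (not copied from the post) of a Wikidata SPARQL query for
# fictional characters based on real people. Verify the P/Q identifiers
# before running it at query.wikidata.org.
QUERY = """
SELECT ?character ?characterLabel ?person ?personLabel WHERE {
  ?character wdt:P31/wdt:P279* wd:Q95074 .  # instance of (a subclass of) fictional character
  ?character wdt:P144 ?person .             # based on ...
  ?person wdt:P31 wd:Q5 .                   # ... a real human
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""
```

The label service clause is what supplies the human-readable descriptions alongside the raw item IDs.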

Fictional Alice and her inspiration, Alice Liddell. Public domain images via Wikimedia Commons

There are multiple ways in which a fictional character can be inspired by a real person. A character in a novel might combine characteristics and life events of multiple real people. When that character is portrayed on stage, or animated, other people might inspire the actors or artists. Disney’s animators used many different models and actresses as reference for the appearances and movements of Pocahontas and of Belle. This explains the initially bizarre Wikidata claims that Pocahontas was inspired by, among others, Kate Moss and Naomi Campbell.

I’ve added about twenty connections to the 180 or so that I found. This is an interesting list, and an educational object in its own right, but it is crying out for more relations to give a more literary, less Western, and specifically less Disney-centric overview of fictional characters.

Appendix: a couple of things I learned from reading about the inspirations for fictional characters.

I thought that Captain Jack Sparrow was based on a real, historical pirate. There are certainly lots of web pages saying so, but a rumour spread by a lot of people is still a rumour. No one connected to the films backs it up. Obviously there are similarities between Sparrow and some actual pirates, but nothing to suggest that one person inspired the character, although Johnny Depp’s portrayal was inspired by Keith Richards.

Another thing “everybody knows” is that Dracula was based on Vlad the Impaler, also known as Vlad Dracula. While there are scholarly sources saying this historical Dracula inspired Bram Stoker’s creation, I learned of the book Dracula: Sense and Nonsense by Elizabeth Miller which, based on Stoker’s own notes, argues that Stoker was likely not even aware of Vlad the Impaler, and chose the word Dracula because it meant “Devil”. This is a case of an expert consensus being overturned by more recent research, so it should be treated as controversial at best, if not disproven.

Bridging real and fictional worlds was originally published in Wiki Playtime on Medium, where people are continuing the conversation by highlighting and responding to this story.

by Martin L Poulter at September 13, 2017 12:57 PM

This month in GLAM

This Month in GLAM: August 2017

by Admin at September 13, 2017 08:21 AM

September 11, 2017

Wiki Education Foundation

Roundup: African Archaeology

For all of his swagger and bravado, Indiana Jones makes a terrible archaeologist. With all due apologies to Harrison Ford and Steven Spielberg, Indiana was always slightly more interested in the treasure and his fetching female companions than in the “who, what, when, where, how, and why” of the historical sites he visited — even when he wasn’t trying to beat his enemies to the finish line. Real archaeologists are more interested in the knowledge they can glean from their finds than in the finds’ monetary value.

Africa is a continent that would be a veritable treasure trove for anyone interested in archaeology, which makes it unsurprising that University of Wisconsin–La Crosse professor Kate Grillo chose African Archaeology as the focus for her students to edit Wikipedia. Expanding content on the UNESCO World Heritage Site Gorée Island was a priority because the excavation of pre- and post-European settlement sites provided archaeologists with invaluable information about the island’s past — something made more difficult now that Gorée is primarily a tourist destination. Much can also be learned from the breathtaking Kalambo Falls near Lake Tanganyika, on the border of Zambia and Tanzania, which is considered to be one of the most important archaeological sites in Africa. This single-drop waterfall has witnessed over two hundred and fifty thousand years of human activity and may even have had people living there continuously since the late Early Stone Age.

New additions to Wikipedia include the article Ifri Oudadane, a site on the Mediterranean coast in the northeastern Rif region of Morocco that holds valuable evidence of how North Africans moved from hunting and gathering to food production. The 2006 research project, led by archaeologists from around the world, marked Ifri Oudadane as one of the first North African sites where the transition from hunter-gatherer groups to food production was investigated. Students also added a large amount of content to, and greatly overhauled, the Nok culture article.

Perhaps most impressive is the students’ work on the article for the ruins of Gedi, which they took from a mere stub to a lengthy article with so much information and so many images that it later helped the article pass the Good Article criteria. Located in Kenya, these are the ruins of a medieval Swahili-Arab coastal settlement that in its heyday may have traded directly or indirectly with China, South Asia, and the Islamic world. The buildings still standing include mosques, a palace, and numerous houses.

Want to help share knowledge with the world? Contact Wiki Education at contact@wikiedu.org to find out how you can gain access to tools, online trainings, and printed materials to help your class edit Wikipedia.

Image: Great Mosque of Gede, by Mgiganteus, CC-BY-SA 3.0, via Wikimedia Commons.

by Shalor Toncray at September 11, 2017 03:54 PM

Wikimedia Tech Blog

Introducing the Cloud Services Team: What we do, and how we can help you

Photo by Martin Kraft, CC BY-SA 3.0.

Earlier this year, members of the Wikimedia Technical Operations Labs team and members of the Community Tech Tool Labs team merged into the Wikimedia Cloud Services (WMCS) team.

In this post, we outline what the new team is responsible for, what tools and projects fall under their umbrella, why and how the rebranding has taken place, and how you can learn more about products and services offered by the Cloud Services team.

What do you do, Wikimedia Cloud Services team?

The WMCS team focuses on four distinct areas:

  1. Providing a stable and efficient public cloud hosting platform for technical projects relevant to the Wikimedia movement.
  2. Developing, creating, and maintaining services that empower the creation and operation of technical solutions to problems of the Wikimedia movement.
  3. Providing public, simple access to content and data produced by the Wikimedia projects to empower new technological solutions.
  4. Delivering technical and community support for users of the products.

What does that mean?

This new team is now in charge of Wikimedia Cloud VPS (formerly known as Wikimedia Labs), Toolforge (previously known as Tool Labs), and Data Services (which includes Wiki Replicas, ToolsDB, Wikimedia Dumps, Shared Storage, Quarry, and PAWS). The team works in partnership with the larger Wikimedia volunteer community to manage the physical and virtual resources that power the environment and provide technical support to volunteer developers and other Wikimedia Cloud Services users.

The new team is the latest in a long series of investments that the Wikimedia Foundation has made in supporting the technical communities who build tools to help the movement. The Wikimedia Labs project was started in 2011 to create an OpenStack powered environment where volunteers could become involved in helping the Technical Operations team. Over the past six years, the Foundation has committed more people and resources to these products and platforms. The scope has expanded beyond the initial vision to also include Tools developers, MediaWiki and application testing, analytics, and academic researchers. Our focus is increasingly on supporting our volunteer contributors and finding ways to attract more volunteers interested in making technical contributions to the Wikimedia movement.

What can I do with the Wikimedia Cloud VPS and Toolforge, and how do these platforms work together?

Image by Bryan Davis/Wikimedia Foundation, CC BY-SA 4.0.

Toolforge is a shared hosting and platform as a service (PaaS) environment for volunteers who want to run bots, web services, cron jobs, or one-time jobs. Members create Tool accounts, which allow shared access by multiple maintainers to develop, deploy, and operate their tools. The Tool’s user account can use our Grid Engine job scheduling system or our Kubernetes container deployment cluster to run their code. The platform also provides easy access to the Wiki Replica databases and other Data Service products.

The Quarry and PAWS projects are trying to make using Cloud Services even easier than the Toolforge PaaS. Quarry lets anyone who has a Wikimedia user account run SQL queries against the Wiki Replica databases from the comfort of their web browser. PAWS is also available to all Wikimedia users and provides a platform for creating and running Jupyter notebooks or using Pywikibot from a command shell accessed from a web browser.
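As a concrete illustration of the kind of query Quarry runs against the Wiki Replicas, here is a self-contained sketch. The real replicas are MariaDB databases reached through Quarry or Toolforge; sqlite3 stands in here so the example runs anywhere, and the sample rows are invented. The `page` table columns mirror MediaWiki’s schema.

```python
import sqlite3

# In-memory stand-in for a Wiki Replica database; the real ones are MariaDB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_title TEXT)"
)
conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
    (1, 0, "Gede_Ruins"),       # namespace 0 = article
    (2, 1, "Talk:Gede_Ruins"),  # namespace 1 = talk page
    (3, 0, "Kalambo_Falls"),
])

# The kind of SQL one might paste into Quarry: article-namespace pages only.
rows = conn.execute(
    "SELECT page_title FROM page WHERE page_namespace = 0 ORDER BY page_title"
).fetchall()
print([r[0] for r in rows])  # ['Gede_Ruins', 'Kalambo_Falls']
```

Because the replicas carry only public metadata with private information redacted, queries like this are safe to share and re-run.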

Wikimedia Cloud VPS is an infrastructure as a service (IaaS) product which uses OpenStack to provide virtual machines (VMs) for over 200 volunteer, affiliate, and Wikimedia Foundation staff managed projects. These projects include Toolforge, Beta Cluster, VMs for Wikimedia’s continuous integration system, and many others.

One of the unique features of Cloud Services is the Wiki Replica databases. These real-time replicas of the public metadata from the Wikimedia production wiki databases allow our users to perform a wide variety of data analysis.

The community has built hundreds of tools using these services.

Why are you changing the names to Cloud VPS and Toolforge?

As we outlined during the community consultation process, the effort to rebrand the Cloud Services Team was designed to reduce confusion and ambiguity around the many projects with “Labs” in their names and to clear up confusion around what the word “Lab” actually meant. The OpenStack cloud and tools hosting environments maintained by WMCS have become more and more important to the Foundation and the on-wiki communities. We wanted the rebranding effort to both raise awareness of the existence of the OpenStack cluster and the shared hosting/platform as a service project, and to make clear that these projects are not experimental and are in wide use across the movement.

What is being rebranded and where can I see these new names?

A number of communication channels, such as email listservs, IRC channels, and Phabricator boards, are being renamed to ensure consistency. In addition, updates will be made on wiki, as well as at the domain and infrastructure levels. You can see all of the planned changes in the list here:

  • “Tool Labs” has been renamed to “Toolforge”
  • The name for our OpenStack cluster is changing from “Wikimedia Labs” to “Cloud VPS”
  • The preferred term for projects such as Toolforge and Beta Cluster running on Cloud VPS is “VPS projects”
  • “Data Services” is a new collective name for the databases, dumps, and other curated data sets managed by the Cloud Services team
  • “Wiki replicas” is the new name for the private-information-redacted copies of Wikimedia’s production wiki databases
  • No domain name changes are scheduled at this time, but we control wikimediacloud.org, wmcloud.org, and toolforge.org
  • Toolforge and Cloud VPS have distinct logos to represent them on wikitech and in other web contexts

How can I learn more?

Watch the hour-long Introduction to Wikimedia Cloud Services (or YouTube), read about Cloud Services on wikitech, and look at the Annual Plan workboard on Phabricator.

Bryan Davis, Engineering Manager, Wikimedia Cloud Services
Wikimedia Foundation

Thank you to Melody Kramer from the Communications Team for helping us with this post.

by Bryan Davis at September 11, 2017 02:54 PM

Tech News

Tech News issue #37, 2017 (September 11, 2017)


September 11, 2017 12:00 AM

September 09, 2017

Gerard Meijssen

The Manuel Echeverría "revenge"

When there are mistakes in a Wikipedia, it follows that once information is copied from that Wikipedia these mistakes find their way into Wikidata. So Manuel Echeverria did not receive the Xavier Villaurrutia Award; Manuel Echeverría did.

So the edit that made Mr Echeverria a recipient of the award was reverted. I fixed things by using the Spanish Wikipedia as a resource instead. The dates were added when people received the award and a few missing people in Wikidata are now known as well.

I cannot be bothered to fix the English Wikipedia. There is no structural solution at this time and, as far as I am concerned, there is no interest in the ones that have been proposed.

There is one additional reason why a solution would be advantageous; reverting edits is a hostile act when edits are made with the best intentions. By actively linking red links and black links to Wikidata, such reversions will become unnecessary.

The problem is that Wikipedians need to understand a problem that, as far as they are concerned, is elsewhere, even though it is caused only by the lack of quality in their own project. It is with grim satisfaction that I note it serves them well.

by Gerard Meijssen (noreply@blogger.com) at September 09, 2017 07:22 AM

September 08, 2017

Wikimedia Foundation

Community digest: Wikimedia Israel celebrates tenth anniversary; first veloexpedition in Macedonia; news in brief

Award ceremony marks Wikimedia Israel’s tenth anniversary

Photo by Wikimedia Israel/ Udi Goren, CC BY-SA 3.0.

On 6 September, Wikimedia Israel (WMIL), the independent Wikimedia chapter that supports the Wikimedia movement in Israel, held an event to celebrate its tenth anniversary. Wikipedians, volunteers, partners, and donors gathered to celebrate the success they had all participated in making.

During the event, the “Wikimedia Awards for the promotion of open knowledge in Israel” were handed out by Christophe Henner, the Wikimedia Foundation’s Board Chair. Four winners received the award for their significant contributions to promoting Wikimedia’s vision. These are the first such awards given out by a Wikimedia affiliate.

The winners of the awards are:

  • Israel Internet Association (ISOC-IL): the association consistently supported our projects—starting from the first Wikipedia Academy in 2009, to hosting the 7th Wikimania in 2011, the Pikiwiki project (a database of free images of Israel), and many other initiatives.
  • Haifa University: the university is a leading academic institution with regard to Wikipedia writing assignments. The program it supported has paved the way for many other academic institutions to join. The collaboration began in 2011, and since then hundreds of Wikipedia articles have been written in dozens of courses. The initiators of the project are Dr. Ory Amitay and Hana Yariv, but so far 22 lecturers have participated in it.
  • Former Minister of Education Rabbi Shai Piron: during his term at the Ministry of Education, Rabbi Piron opened the door to WMIL and our extensive activity in the education system. The announcement of an official collaboration between a ministry of education and Wikimedia was the first of its kind anywhere in the world. Thanks to Piron’s recognition of the importance of integrating Wikipedia into the education system, 1,200 K-12 teachers have received training in Wikipedia editing basics. Furthermore, with Piron’s support, the high school program in which students write Wikipedia articles has expanded. The program is still active and forms a central part of our activity.
  • Oren Helman, former director of the Government Press Office: as director of the GPO, Helman assisted in promoting an amendment to copyright law, by which state photographs were released to free, non-commercial use, thereby exempting the public from paying for usage of GPO photos. Helman’s work to release state materials is unique in Israel. Together with former ministers Michael Eitan and Meir Sheetrit, Mr. Helman heralded a change in the state of Israel’s attitude toward open content.

In ten years, with the support of those and many others who believe in Wikimedia’s vision, we were able to accomplish a lot together.

Through the Wikipedia Education Program in the past few years, students in high schools and academic institutions have written more than 1,500 Wikipedia articles. Several thousand students have learned the basics of editing Wikipedia and nearly 1,200 teachers have received training on using Wikipedia as a teaching tool. By the end of 2016, the success of the Wikipedia Education Program in Israel had expanded to Arabic-speaking schools in Israel.

Following a three-year campaign in which Wikimedia Israel called for an amendment to copyright laws, the government of Israel made a precedent-setting decision to release all the photographs on government websites under Creative Commons licenses. The amendment was nicknamed the ‘Wikipedia law’.

In 2011, we hosted the seventh Wikimania in Haifa, attended by more than 700 movement enthusiasts from 56 countries. Last year, the international Wikimedia Hackathon was held in Jerusalem, with 130 developers from 17 countries participating. Hosting two international events like Wikimania and the Wikimedia Hackathon was a big success for the movement in Israel, and we are glad that we helped make it happen.

At Wikimedia Israel, we look forward to further success in promoting open knowledge in Israel.

Photo by Wikimedia Israel/ Udi Goren, CC BY-SA 3.0.

Photo by Wikimedia Israel/ Udi Goren, CC BY-SA 3.0.

Michal Lester and Keren Shatzman, Wikimedia Israel


Wikipedian cyclists in Macedonia document natural heritage with ‘veloexpeditions’

Photo by Ehrlich91, CC BY-SA 4.0.

During the second weekend of August, several Macedonian Wikimedians headed to Ǵavat-Kol, a small remote area of villages around Bitola, the second most populous town in Macedonia. The area has several villages but had no photographs on Wikimedia Commons. That is why we organized the first ‘veloexpedition’—we took photos of these small villages, old churches, and the area’s natural beauty.

On the morning of 12 August, we headed by car to Bitola. Following a three-hour break and breakfast in the hotel, we started our veloexpedition. After one hour we arrived at the first village, Srpci. Our main goal was to take panoramic photos of the villages, important social and religious buildings, village architecture, and all other natural rarities.

The highest point of our veloexpedition was in Gopeš, a village where we found St. Transfiguration Church, a protected cultural monument. We didn’t have the chance to photograph the interior of this church, as it was closed, but we were able to explore two other churches near the village.

Then we headed to the displaced villages of Svinjište and Streževo. In Svinjište, we discovered the preserved St. George Church, a remnant of the former village. This church, along with the two mentioned before, is not mentioned in the official list of religious buildings of the Commission for Religions in Macedonia.

Though we had other plans for the route, we couldn’t ride back to Gopeš after we reached the artificial Lake Streževo, and so we had to ride along its shore. In the end, we had to change the entire route of the first day of our veloexpedition.

On the second day, we wanted to visit Great Pelister Lake, but due to cold, foggy weather—6° Celsius (42° Fahrenheit)—we were forced back to our hotel. Still, we were able to take some images of Molika trees on the ride back.

The veloexpedition resulted in 82 distinct photographs of several different villages along with several cultural and natural heritage sites.

Toni Ristovski, Board Member and Treasurer of Shared Knowledge Wikimedians group

In brief

Competition in Srpska to feature its culture on Wikipedia: The Wikimedia community in Republika Srpska is holding a photo competition to document the cultural, historical, and natural heritage of Srpska. The competition runs through the month of September, and the Wikipedia communities in Ukraine, Macedonia, and Morocco are participating in the organization. More about the competition can be found on the Serbian Wikipedia.

The first International Wikipedia Scientific Conference will be held in Brazil on 8–10 November 2017. The conference aims to connect the Brazilian academic Wikipedia community with the wider global community and to promote Wikipedia as a reliable source of knowledge within academic communities. The submission deadline for the conference is Monday 11 September. More information can be found on the conference website.

New features on Wikimedia projects from the community wish list: The Foundation’s Community Tech team has announced the implementation of new features from this year’s community wish list. LoginNotify, number 7 on the list, is a security feature that notifies users on Wikipedia when there is an unsuccessful login to their account from a new device or an unfamiliar IP address. Five unsuccessful login attempts from a familiar device or IP address will also trigger a notification. Syntax highlighting, number 6 on the list, is now released as a beta feature; it helps editors parse the wikitext in the edit window by using color, bolding, italics, and size to make it easier to see which parts are article text, and which are links, templates, tags, and headings.
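The LoginNotify thresholding described above can be sketched as follows. This is only an illustration of the behavior as reported, not the extension’s actual implementation; the function name and threshold handling are invented for the example.

```python
# Failed attempts tolerated from a familiar device/IP before notifying
# (per the description above); unfamiliar sources notify immediately.
KNOWN_SOURCE_THRESHOLD = 5

def should_notify(known_source: bool, failed_attempts: int) -> bool:
    """Return True if a login-failure notification should be sent."""
    threshold = KNOWN_SOURCE_THRESHOLD if known_source else 1
    return failed_attempts >= threshold

print(should_notify(known_source=False, failed_attempts=1))  # True
print(should_notify(known_source=True, failed_attempts=4))   # False
print(should_notify(known_source=True, failed_attempts=5))   # True
```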

WikidataCon 2017 program announced: The Wikidata community conference, which will be held in Berlin, Germany, on 28 and 29 October, has announced its program. More information about the conference and how to register can be found on Meta.

De-recognition of Wikimedia Macau: The Wikimedia Affiliations Committee (AffCom) announced in August that Wikimedia Macau, the former independent Wikimedia chapter in the Chinese region of Macau, is no longer a recognized Wikimedia affiliate. More details about the decision are on Wikimedia-l.

Compiled and edited by Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation

by Michal Lester, Keren Shatzman, Toni Ristovski and Samir Elsharbaty at September 08, 2017 10:10 PM

Wiki Education Foundation

Communicating American cultures with dynamic student projects

Since we started working with UC Berkeley students and instructors in 2010, they’ve added more than 1 million words to Wikipedia. From updating the article about carbon capture and storage in Mexico to creating a new article about lesbian bars, Berkeley students have been at the helm. As a graduate of UC Berkeley myself, I know it is a public institution committed to growing and communicating academic research to the world. That’s why, on Wednesday, I was so excited to meet with new instructors and librarians on campus to continue expanding that effort and to highlight how students in their upcoming courses could update and improve Wikipedia’s coverage of academic content.

We were invited to campus by the American Cultures Center, which is committed to fostering academic excellence and civic engagement around issues critical to America’s dynamic ethnic, racial, and sociocultural landscape — a mission Wiki Education wholeheartedly supports. Unfortunately, though, the depth and breadth of information on Wikipedia in those areas is lacking. Wikipedia’s list of featured articles, for example, includes hundreds covering military history, music, media, sports, and video games, but far fewer on politically relevant topics.

Lucky for us, the American Cultures Center has a set of engaged scholarship courses that take the study of important issues beyond the classroom, aiming to provide opportunities for students to participate in collaborative social justice projects alongside community organizations like Wiki Education. At our workshop, I met with instructors teaching in global studies, cultural anthropology, environmental design, and bioengineering, among others, with courses primed and ready to participate in this initiative.

One instructor asked about the review process for contributing content on Wikipedia — “who approves that work?” — while others asked how to start conversations with their students in regard to evaluating the sourcing on Wikipedia. And that’s the beauty of Wiki Education’s suite of educational tools and resources: We provide the scaffolding for instructors and students to accomplish this amazing work. Technically, no one has to approve any contributions on Wikipedia. It truly is the encyclopedia that anyone can edit. However, Wiki Education provides assignment templates, online trainings, and support resources so that students are able to evaluate Wikipedia articles, select appropriate peer reviewed sources, and draft high quality contributions, all with the goal of moving that work live into the world’s largest free information resource.

We are excited to continue growing our support of American Cultures courses and hope you’ll join us in our efforts. To learn more about our work, please contact us at contact@wikiedu.org.

by Samantha Weald at September 08, 2017 04:39 PM

Weekly OSM

weeklyOSM 372


screen shot OpenStreetMap Inspector, objects without important tags

The OpenStreetMap Inspector now shows nodes and ways which do not have “important” tags. 1 | data OpenStreetMap contributors, ODbL, image CC-BY-SA OpenStreetMap and Geofabrik

About us

  • We are always looking for people to help us improve our newsletter so it can get out faster, have more depth and coverage, and generally improve it for our readers, like you. Please join our team by contacting us now; it’s fun! 😉


  • [1] In the tagging view of the OpenStreetMap Inspector there is a new layer with tagging errors. The routing view once again contains an error layer which shows ‘sources’ and ‘sinks’ of oneway roads which are either unreachable or have no return route.
  • The Flemish community in Belgium wishes to explicitly tag maxspeed on all highways. André Picard says this goes against existing usage in the country where a special relation of type default is used to provide information on a range of default values. He objects to the original wiki proposal page being marked as “abandoned”. Clifford Snow remarks that the documentation on the wiki is inconsistent and difficult to understand.
  • A discussion began on the Tagging mailing list about whether to use shop=fashion or shop=boutique, given that the latter has more uses.
  • The MapGive initiative of the U.S. Department of State is providing the OSM community with satellite imagery of the Ashgabat region (Turkmenistan), recorded in August 2017 at 30 cm resolution.
  • Frederik Ramm started (de) (automatic translation) a discussion on the German OSM forum about whether we really need all boundary relations or whether some of them should be deleted. As an example, he points to a segment of the River Rhine which belongs to 29 relations.
  • Sam Guilford from YouthMappers has published a small article about POSM and Mapeo.
  • Voting on the proposal Language Information will be open until 13th September. The proposal would introduce the tag language:<key> = <language code>, which specifies which language the value of a tag <key> = * is written in.
  • iD has a new feature: “Review requested”. Pascal Neis writes in his blog post how to find these changesets and help newcomers with their first steps in OSM, such as with links to wiki pages, tags, and map features, good practices, the OSM Forum or the page OSM help. Pascal forgot to mention the mailing lists. 😉


  • Dr. Steffano de Sabbato (Leicester University) asks for guidance to help interpret the mapping of London for his research. He also provides links to some previous preliminary studies.
  • Geospa_gal believes that more women need to respond to the survey about gender bias in OSM, and invites people to participate in order to gather more accurate data about the gender situation in the community. Some people on the Talk-US mailing list criticize the survey and its aims.
  • Gardster shares pictures (and videos) from the very well-attended 13th OSM Birthday party held by the Belarus community in Minsk on the 30th of August.
  • Joost Schouppe writes a diary entry highlighting the need to improve interaction between the OSM and academic communities.


OpenStreetMap Foundation

  • The minutes of the last OSMF board meeting on August 24 have been completed. The next public meeting will take place on September 21 at 21:00 London time.
  • The minutes of the Local Chapters Congress at SotM have been published.
  • There will be elections to the board of the OpenStreetMap Foundation at the end of 2017. Frederik Ramm explains what potential candidates should expect and dispels some illusions about what being a board member entails.
  • A meeting of the Engineering Working Group took place on August 29 and the minutes are online. The topic was the end of this year’s Google Summer of Code.


  • Edoardo Neerhut writes in the Mapillary Blog about the OSM ecosystem, OSMCha from Mapbox and about the AI assisted road tracing of Facebook in Thailand.
  • The call for papers for the SOTM Latam 2017 is still open! The conference will take place in Lima, Peru, from November 29th to December 2nd.

Humanitarian OSM


  • Stephan Bösch-Plepelits, author of OpenStreetBrowser, reported in his blog that the pop-ups on the map have been significantly improved. A permalink for the map will be available in the near future.
  • Mapbox announces a modified pricing in its blog. They rolled out pay-as-you-go pricing for their core products like maps, search, and navigation.


  • Frédéric Rodrigo published his findings on using a Redis database to query OSM objects and discusses performance relative to SQL queries on PostgreSQL. The experiments aim to explore ways of improving the heavy computations of the Osmose QA-tool.
  • Maripo Goda works on a JOSM plugin which warns mappers if they move a node over a large distance.


The new release of Mapbox Navigation SDK for iOS v0.7.0 comes with automatic day/night styles, other UI improvements, better compatibility with Amazon Polly, support for multiple legs, more aggressive location snapping, new localizations and much more.

Did you know …

Other “geo” things

  • Accurate maps of crops in both Germany and England have been produced using remote sensing data from ESA’s Sentinel-2 satellite. The English dataset is published under an open licence.
  • Thames Valley Environmental Records Centre (TVERC) are using non-open data on building age to create a model of potential bat roosting sites.
  • A nice article by Kartin Humai shows how to go on a “time travel” with Mapillary. 😉
  • Stefano Cudini has a demo version of his open source ‘geo-social’ platform KeplerJs online. KeplerJs combines the features of social networks with geographical facts and uses OpenStreetMap among others.

Upcoming Events

Where What When Country
Tokyo 東京!街歩き!マッピングパーティ:第11回 清澄庭園 2017-09-09 japan
Moscow Staircase Sunday 2017-09-10 russia
Passau Mappertreffen 2017-09-11 germany
Rennes Réunion mensuelle 2017-09-11 france
Lyon Rencontre mensuelle ouverte 2017-09-12 france
Berlin 111. Berlin-Brandenburg Stammtisch 2017-09-14 germany
Munich Stammtisch 2017-09-14 germany
Zaragoza Mapping Party #Zaccesibilidad Arrabal, Mapeado Colaborativo 2017-09-16 spain
Moscow Big Schemotechnika 11 2017-09-16 russia
Nishinomiya 【西国街道#09・最終回】西宮郷・酒蔵マッピングパーティ 2017-09-16 japan
Nara 防災トレジャーハンター~自分を守り、地域を守る!~(UDC2017) 2017-09-16 japan
Buenos Aires 2do Mapatón OSM Baires @ Instituto Geográfico Nacional 2017-09-16 argentina
Rennes Cartographie collaborative du Musée de Bretagne, pour les Journées européennes du patrimoine 2017-09-16-2017-09-17 france
Nantes Participation aux Journées européennes du patrimoine à l’École de Longchamp 2017-09-16-2017-09-17 france
Bonn Bonner Stammtisch 2017-09-19 germany
Lüneburg Mappertreffen Lüneburg 2017-09-19 germany
Nottingham Nottingham Pub Meetup 2017-09-19 united kingdom
Scotland Pub meeting, Edinburgh 2017-09-19 united kingdom
Cologne Stammtisch 2017-09-20 germany
Osaka もくもくマッピング! #09 2017-09-20 japan
Paris Mapathon Missing Maps @LLL, Liberté Living Lab 2017-09-21 france
Prescott Josm Workshop at 2017 AGIC Conference 2017-09-21 Arizona
Karlsruhe Stammtisch 2017-09-21 germany
Liguria Wikigita a Santo Stefano Magra, Santo Stefano di Magra, La Spezia 2017-09-23 italy
Tokyo 東京!街歩かない!マッピングパーティ3 2017-09-23 japan
Patan State of the Map Asia 2017 2017-09-23-2017-09-24 nepal
Taipei OpenStreetMap Taipei Meetup, MozSpace 2017-09-25 taiwan
Bremen Bremer Mappertreffen 2017-09-25 germany
Graz Stammtisch Graz 2017-09-25 austria
Berlin 111.1 Berlin-Brandenburg Sonderstammtisch Intergeo 2017-09-25 germany
Colorado, Boulder State of the Map U.S. 2017 2017-10-19-2017-10-22 united states
Buenos Aires FOSS4G+State of the Map Argentina 2017 2017-10-23-2017-10-28 argentina
Brussels FOSS4G Belgium 2017 2017-10-26 belgium
Lima State of the Map LatAm 2017 2017-11-29-2017-12-02 perú
Bonn FOSSGIS 2018 2018-03-21-2018-03-24 germany

Note: If you would like to see your event here, please add it to the calendar. Only events entered there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Nakaner, Polyglot, Rogehm, SK53, SeleneYang, Spec80, YoViajo, derFred, jinalfoflia.

by weeklyteam at September 08, 2017 08:38 AM

Wikipedia Weekly

Wikipedia Weekly #125 – Wiki Loves Monuments

This Wikipedia Weekly podcast episode covers Wiki Loves Monuments, the largest photo contest in the world, held in September of each year since 2010. We discuss the origins of the project, how it is executed today, which countries are contributing the most, and the future of the project.

Participants: Andrew Lih (User:Fuzheado), Rob Fernandez (User:Gamaliel), Lodewijk Gelauff (User:Effeietsanders / @effeietsanders), User:Nikikana, User:LilyOfTheWest, Kevin Payravi (User:SuperHamster / @KevinPayravi)


Opening music: “At The Count” by Broke For Free, is licensed under CC-BY-3.0; Closing music: “Things Will Settle Down Eventually” by 86 Sandals, is licensed under CC-BY-SA-3.0

All original content of this podcast is licensed under CC-BY-SA-4.0.

by admin at September 08, 2017 07:18 AM

September 07, 2017

Wiki Education Foundation

Engaging students in interdisciplinary science communication

George Waldbusser is Associate Professor of Ocean Ecology and Biogeochemistry at Oregon State University. He’s integrated Wikipedia editing into his Biogeochemical Earth class several times.

When the email first appeared in my inbox with the title ‘Teaching with Wikipedia!’, I vacillated between, “that sounds really interesting” to “what kind of email spam is this?”

Fortunately, I trusted my instinct and went with the former.

I teach a graduate level required introductory course called Biogeochemical Earth for students in the Ocean, Earth, and Atmospheric Science and Marine Resource Management programs at Oregon State University. These students come to OSU’s College of Earth, Ocean, and Atmospheric Science to obtain an interdisciplinary education that prepares them to tackle some of the most interesting and pressing earth systems questions facing society today. Student engagement is always a challenge, and providing assignments that encourage them to learn from one another has always been a priority of mine in the course. In this ever-shrinking world of increasingly interdisciplinary research questions, it is crucial to train students to work well across disciplinary boundaries, which include cultural and language/semantics barriers. For the past two years now, I have successfully used Wikipedia projects to bridge that gap and find it to be far more engaging for the students than my previous approach of group pre-proposal projects.

The research trajectories of the students in my course include social science and marine policy, physics of the earth’s mantle, life in extreme environments, paleoclimatology, and physics, chemistry, and biology of the oceans, just to name a few. My co-instructor and I have 10 weeks to cover the fundamentals of how the biology and chemistry (with a splash of physics) of primarily the oceans work, with some specific examples of case studies ranging from the early pre-oxygen earth to changes in anchovy and sardine fisheries in the Pacific Ocean on decadal timescales. There are many topics I cannot cover, and many more we must skim over, but the Wikipedia project provides an opportunity for the students to dig deeper into a topic of their choice over the term.

Having their project as a public contribution provides two really important outcomes: First, it seems to promote a greater engagement and care in their work, and second it contributes to the largest crowdsourced information clearinghouse in the world. In an age where disinformation and information overload are increasingly important issues, it is more critical than ever to help contribute to Wikipedia, to provide greater access to the scientific works, and to ensure it is correct.

It may come as a surprise to some college professors that Wikipedia could be an integral part of college and graduate level courses, given a predisposition to dismiss information from Wikipedia as unverifiable, especially compared to peer-reviewed journals. That has certainly changed. I always approached Wikipedia with a strong skepticism, and continue to require my students to cite peer-reviewed literature for their assignments, but as participation increases, it seems Wikipedia has gotten better at self-correcting more quickly. Even more so, Wikipedia can provide an important entry point for both students and the public at large, who may not have access to scientific journals. I personally at times start at Wikipedia on topics I am unfamiliar with, if I don’t have another access point, but then again I used to read the Encyclopedia Britannica as a kid out of sheer curiosity. I was fortunate enough to have the hard bound books in my room. Today, anyone with an internet connection can feed their curiosity with the world’s largest digital encyclopedia.

So how does one utilize Wikipedia in the classroom or as part of a course? There are many approaches to this. For me, the most important aspect is to have my students engage with one another around a topic (presumably something that they have some interest in), to explore that topic in greater detail than I can cover in class, and to teach them how to write clearly and convey information impartially to a target audience different from themselves. Wiki Education makes it incredibly easy to set up the project for your course: with a quick survey, they generate a webpage with weekly tasks, deadlines, and discussion points to cover in class. This also includes training for student editors, and a tech support person who can help with how-to questions.

While the students in the current term are still finalizing their projects, examples of their work from the previous year can be found in the blue carbon and Boring Billion Wikipedia articles.

Relative to the quality of work submitted as part of the previous assignment that the Wikipedia project replaced, I was very impressed.

One of the students who developed the Boring Billion page was Brian Ahlers, a Marine Resource Management graduate student whose research is actually focused on marine fisheries and traceability.

“I learned so much about biogeochemical processes during Earth’s ancient geological history, and how to work well in groups,” Brian says. “After deploying our new Wikipedia page to production on Wikipedia for the public, we had the privilege of connecting with one of the world’s leading experts on the Boring Billion based in Tasmania. He was very excited about our new article, and had very positive feedback.”

Another graduate student in the Marine Resource Management program, Larissa Clarke, who is working on seagrass habitat use by fish and crabs, noted of her experience working on the Blue Carbon page in terms of science communication: “From the Wikipedia project, I think I most gained an appreciation/better understanding for what it means to write for a specific audience. Writing for a middle school level reader is much different than the assignments we turn in for class, obviously, but I think that was a great exercise because we are so often writing for our peers/professor and may not have the general public in mind.”

I look forward to continuing to utilize Wikipedia in the classroom as a tool to get students engaged more deeply in topics, to interact with each other, and to contribute to an important crowdsourced information depot. If you are a teacher, professor, or instructor, I hope you take a few moments to consider how you may use Wikipedia to enhance your own course and provide some unique learning opportunities to your students.

Image: OSU by air, by saml123, CC-BY-SA 2.0, via Wikimedia Commons.

by Guest Contributor at September 07, 2017 04:22 PM

Wikimedia Foundation

Pranayraj Vangari has written a new Wikipedia article every day for the last year

Pranayraj Vangari. Photo by Pranayraj Vangari, CC BY-SA 4.0.

Telangana is a state in southern India with a history and culture that extends back five millennia. The region is known for its diversity of cultures, languages, and beliefs, in addition to being an open art space with its authentic architecture, plastic art, and music.

The state’s culture is the subject of an article that Pranayraj Vangari has created today on the Telugu Wikipedia, concluding a year-long challenge where Vangari created a new Wikipedia article every day for 365 consecutive days. This morning, members of the Telugu Wikipedia community gathered to celebrate Vangari’s work.

Vangari, a native of Telangana’s capital Hyderabad, heard about the #100wikidays challenge on Wikipedia from fellow Telugu-language Wikipedians who have taken and successfully completed it. This encouraged him to follow their lead and take the 100-day challenge. The 100-day commitment is not easy at all, but Vangari later changed his plan to make it even harder.

“At first, I thought that 100wikidays [were] enough,” Vangari recalls, “but … on the 95th day of this challenge, I decided to take up the Wikiyear challenge.”

But over the course of a full year, there were bound to be occasions when he was busier than usual and could not dedicate even half an hour of his time to editing. He might need to travel for work, for example, or even get married. But none of that was a barrier for Vangari.

“I am a theater artist,” Vangari told us, “so, I need to travel often for theater performances. When I go away, I carry my laptop with me and [a list of] previously-selected articles to work on. In September 2016, I was an assistant director for a Telugu film. I needed to be in the location at 7 am, and be back home at 11 pm. I woke up at five every morning, selected the article content, created the article at 5:30 with some content, then completed it when I came back home in the evening.”

In February, Vangari got married, but this was not a good enough excuse for him to wriggle out of his commitment for even a single day.

“I wrote an article on my wedding day,” he says, “and my wife took this opportunity to learn about the project and sign up. Our fellow Wikipedian Pavan Santhosh coined a term for that: Wiki Kalyanam (Wiki Marriage).”

Telugu ranks as the fifth-largest Indian language on Wikipedia with nearly 67,000 articles. It has been growing over the years since 2003, when the Telugu Wikipedia started.

Vangari has been editing the site since March 2013. He has made over 85,000 edits on the Telugu Wikipedia and created over 700 new articles. His dedication to Wikipedia encouraged the community to nominate him as an administrator. When not online, he can be found at local editing workshops, encouraging new users and helping them with the editing basics. Aside from Wikipedia, Vangari is an assistant theater director and is pursuing an M.Phil degree in theater arts.

“Wikipedia is the first door I knock on for knowledge,” Vangari says. “The idea of giving your part and contributing to Wikipedia sounds like a fantastic idea to me. Making my knowledge available to the rest of the world for free is what inspires me to keep editing … Wikipedia is important because it helps share the knowledge irrespective of who you are.”

Members of the Telugu Wikipedia community, this morning, celebrating Vangari’s work. Photo by Pranayraj Vangari, CC BY-SA 4.0.

Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation

by Samir Elsharbaty at September 07, 2017 01:52 PM

Jeroen De Dauw

The Fallacy of DRY

DRY, standing for Don’t Repeat Yourself, is a well-known design principle in the software development world.

It is not uncommon for removal of duplication to take center stage via mantras such as “Repetition is the root of all evil”. Yet while duplication is often bad, the well intended pursuit of DRY often leads people astray. To see why, let’s take a step back and look at what we want to achieve by removing duplication.

The Goal of Software

First and foremost, software exists to fulfill a purpose. Your client, which can be your employer, is paying money because they want the software to provide value. As a developer it is your job to provide this value as effectively as possible. This involves more than writing code to whatever your client specifies, and it might best be done by not writing any code at all. The creation of code is expensive. Maintenance of code and extension of legacy code is even more so.

Since creation and maintenance of software is expensive, the quality of a developer’s work (when just looking at the code) can be measured in how quickly functionality is delivered in a satisfactory manner, and how easy the system is to maintain and extend afterwards. Many design discussions arise about trade-offs between those two measures. The DRY principle mainly situates itself in the latter category: reducing maintenance costs. Unfortunately, applying DRY blindly often leads to increased maintenance costs.

The Good Side of DRY

So how does DRY help us reduce maintenance costs? If code is duplicated, and it needs to be changed, you will need to find all places where it is duplicated and apply the change. This is (obviously) more difficult than modifying one place, and more error prone. You can forget about one place where the change needs to be applied, you can accidentally apply it differently in one location, or you can modify code that happens to be the same at present but should nevertheless not be changed due to conceptual differences (more on this later). This is also known as Shotgun Surgery. Duplicated code tends to also obscure the structure and intent of your code, making it harder to understand and modify. And finally, it conveys a sense of carelessness and lack of responsibility, which begets more carelessness.

Everyone who has been in the industry for a little while has come across horrid procedural code, or perhaps pretend-OO code, where copy-paste was apparently the favorite hammer of its creators. Such programmers indeed should heed DRY, because what they are producing suffers from the issues we just went over. So where is The Fallacy of DRY?

The Fallacy of DRY

Since removal of duplication is a means towards more maintainable code, we should only remove duplication if that removal makes the code more maintainable.

If you are reading this, presumably you are not a copy-and-paste programmer. Almost no one I ever worked with is. Once you know how to create well designed OO applications (i.e. by knowing the SOLID principles), are writing tests, etc, the code you create will be very different from the work of a copy-paste programmer. Even when adhering to the SOLID principles (to the extent that it makes sense) there might still be duplication that should be removed. The catch here is that this duplication will be mixed together with duplication that should stay, since removing it makes the code less maintainable. Hence trying to remove all duplication is likely to be counterproductive.

Costs of Unification

How can removing duplication make code less maintainable? If the costs of unification outweigh the costs of duplication, then we should stick with duplication. We’ve already gone over some of the costs of duplication, such as the need for Shotgun Surgery. So let’s now have a look at the costs of unification.

The first cost is added complexity. If you have two classes with a little bit of common code, you can extract this common code into a service, or if you are a masochist extract it into a base class. In both cases you get rid of the duplication by introducing a new class. While doing this you might reduce the total complexity by not having the duplication, and such an extraction might make sense in the first place, for instance to avoid a Single Responsibility Principle violation. Still, if the only reason for the extraction is reducing duplication, ask yourself if you are reducing the overall complexity or adding to it.

Another cost is coupling. If you have two classes with some common code, they can be fully independent. If you extract the common code into a service, both classes will now depend upon this service. This means that if you make a change to the service, you will need to pay attention to both classes using the service, and make sure they do not break. This is especially a problem if the service ends up being extended to do more things, though that is more of a SOLID issue. I’ll skip going over the results of code reuse via inheritance to avoid suicidal (or homicidal) thoughts in myself and my readers.
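Both costs can be sketched in a few lines (hypothetical names; Python used here for illustration, this example is not from the original post). Two report classes share a little formatting code; unifying them removes the duplication, but adds a class and couples every report to it:

```python
import datetime


# Before unification: two independent classes with a few duplicated lines.
class InvoiceReport:
    def header(self, date):
        return "Invoice - " + date.strftime("%Y-%m-%d")


class ShippingReport:
    def header(self, date):
        return "Shipping - " + date.strftime("%Y-%m-%d")


# After unification: the duplication is gone, but there is now an extra
# class, and any change to DateFormatter must be checked against every
# report that depends on it.
class DateFormatter:
    def format(self, date):
        return date.strftime("%Y-%m-%d")


class UnifiedInvoiceReport:
    def __init__(self, formatter):
        self._formatter = formatter

    def header(self, date):
        return "Invoice - " + self._formatter.format(date)
```

Whether the second version is a net win depends on how likely the two headers are to change together; the point is that this trade-off, not the duplication itself, should drive the decision.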

DRY = Coupling

— A slide at DDDEU 2017

The coupling increases the need for communication. This is especially true in the large, when talking about unifying code between components or applications, and when different teams end up depending on the same shared code. In such a situation it becomes very important that it is clear to everyone what exactly is expected from a piece of code, and making changes is often slow and costly due to the communication needed to make sure they work for everyone.

Another result of unification is that code can no longer evolve separately. If we have our two classes with some common code, and the first needs a small behavior change in this code, the change is easy to make. If you are dealing with a common service, you might do something such as adding a flag. That might even be the best thing to do, though it is likely to be harmful design wise. Either way, you start down the path of corrupting your service, which has now turned into a frog in a pot of water that is slowly being heated. If you unified your code, this is another point at which to ask yourself if that is still the best trade-off, or if some duplication might be easier to maintain.
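The flag scenario might look like this (hypothetical names, not from the original post): one caller needs slightly different behavior, and instead of the callers evolving separately, the shared service grows a parameter.

```python
# A shared formatter that two classes were unified onto. When one caller
# needed a different style, a flag was bolted on; each further divergence
# between the callers tends to arrive as yet another flag.
class AmountFormatter:
    def format(self, amount, for_invoice=False):
        if for_invoice:
            # Only invoices want two decimals and a currency suffix.
            return "%.2f EUR" % amount
        return "%.0f" % amount
```

After two or three such flags, the "shared" code is effectively several implementations braided together, which can be harder to maintain than the duplication it replaced.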

You might be able to represent two different concepts with the same bit of code. This is problematic not only because different concepts need to be able to evolve individually; it is also misleading to have only a single representation in the code, which effectively hides that you are dealing with two different concepts. This is another point that gains importance the bigger the scope of reuse. Domain Driven Design has a strategic pattern called Bounded Contexts, which is about the separation of code that represents different (sub)domains. Generally speaking it is good to avoid sharing code between Bounded Contexts. You can find a concrete example of using the same code for two different concepts in my blog post on Implementing the Clean Architecture, in the section “Lesson learned: bounded contexts”.
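A minimal sketch of that situation (hypothetical domain, illustrating the idea rather than reproducing the blog post’s own example): two classes in different bounded contexts that happen to have the same shape today.

```python
from dataclasses import dataclass


# Shipping and Billing both have a "Customer", and today the two classes
# are structurally identical. Deduplicating them into one class would
# hide the fact that they are different concepts and would stop them
# evolving independently: Shipping's customer will grow delivery
# preferences, while Billing's will grow payment terms.
@dataclass
class ShippingCustomer:
    customer_id: int
    name: str


@dataclass
class BillingCustomer:
    customer_id: int
    name: str
```

The duplication here is superficial; the concepts behind the two classes are distinct, so keeping them separate is the cheaper choice in the long run.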

DRY is for one Bounded Context

— Eric Evans


Duplication itself does not matter. We care about code being easy (cheap) to modify without introducing regressions. Therefore we want simple code that is easy to understand. Pursuing removal of duplication as an end-goal rather than looking at the costs and benefits tends to result in a more complex codebase, with higher coupling, higher communication needs, inferior design and misleading code.

by Jeroen at September 07, 2017 12:40 AM

September 06, 2017

Wikimedia Tech Blog

What ten students made for Wikimedia while participating in Google Summer of Code and Outreachy round 14

Mentor Sage Ross with Outreachy intern Sejal Khatri at Wikimania 2017. Photo by Sage Ross, CC BY-SA 4.0.

Bug fixes, extension improvements, and a tool to help import images from GLAM institutions to Commons were among the many projects that student developers worked on while participating in Google Summer of Code 2017 (GSoC) and Outreachy round 14 through the Wikimedia Foundation.

Both programs, which are sponsored by Google and the Software Freedom Conservancy (respectively), are designed to introduce university students to free and open-source software projects from around the world. Wikimedia, which has participated in GSoC programs for 12 years and the Outreachy program for 5, mentors student developers from around the world throughout the summer. In return, students contribute thousands of lines of code to Wikimedia projects, obtain a deeper understanding of our movement, write about their experiences on a weekly basis, and connect with MediaWiki developers all over the world.

Sejal Khatri, who participated in Google Summer of Code, detailed her experiences in weekly Medium posts. She also attended Wikimania, where she met one of her project mentors and participated in the hackathon.

“Over the course of my internship I got many experiences,” she says. “[This included] getting in touch with [Wiki Education] Dashboard users from all across the world (Israel, Canada, USA, Egypt, Serbia, Czech Republic), interacting with them, organizing user testing sessions on my own and helping them with their frustrations and understanding their needs.”

What did the students make and accomplish?

Below are descriptions of what the students and their mentors made during their time in the programs:

Implement Thanks support in Pywikibot, by Alexander Jones (GSoC intern, Texas, United States), mentored by John Mark Vandenberg

This project provided a boost to Pywikibot by enabling it to thank normal wiki revisions and Flow posts, and also generate statistics (e.g. who sent thanks, using log entries).

Provide enhanced usability for the Wikimedia Programs & Events Dashboard, managed by Wiki Education Foundation, by Sejal Khatri (GSoC intern, Pune, India), mentored by Sage Ross, Jonathan Morgan

In this project, Sejal worked on improving the overall usability of the dashboard, especially on mobile devices, by solving high priority issues, conducting user testing sessions, documenting the feedback, and taking actions on them.

Improvements to ProofreadPage Extension and Wikisource, by Amrit Sreekumar (GSoC intern, Kerala, India), mentored by Tpt and Yann Forget

As part of this project, improvements included an auto-validation privilege that lets specific user groups override the two-step validation of proofread pages, as well as migration of the Index: page editing form and the Proofread zoom feature to OOjs UI.

Add a “hierarchy” type to the Cargo extension, by Feroz Ahmad (GSoC intern, New Delhi, India), mentored by Yaron Koren, Nischay Nahata, Tobias Oetterer

This project added support for “Hierarchy” fields in the Cargo MediaWiki extension, including both efficient storage and querying of such information. This project also added similar support within the Page Forms extension.

Adding custom features while upgrading and updating Quiz extension, by Harjot Singh Bhatia (GSoC intern, India), mentored by Marielle Volz, Sam Reed

The Quiz extension has received quite a lot of updates such as bug fixes, removal of the legacy code and upgrades to MediaWiki standards, addition of tests, and new features.

Automatic editing suggestions and feedbacks for articles in Wiki Ed Dashboard, by Keerthana S (GSoC intern, India), mentored by Sage Ross, Jonathan Morgan

This project lays the foundations for the Dashboards to provide specific useful editing suggestions to newcomers about how they can improve existing Wikipedia articles or article drafts they are working on — including both automated editing suggestions (based on the Objective Revision Evaluation Service) and user-submitted suggestions.

Glam2Commons, by Siddhartha Sarkar (GSoC intern, India), mentored by Basvb, Zhuyiefei, Tom

Glam2Commons is a tool that allows any Wikimedia Commons user to import images to Commons from the online repositories of a number of GLAMs (galleries, libraries, archives, and museums) easily.

“Remind me of this article in X days” MediaWiki notification, by Ela Opper (Outreachy, Tel Aviv, Israel), mentored by Moriel Schottlender and Matt Flaschen

Ela developed a new feature requested by the Wikimedia user base—adding reminders for articles.  It allows you to set a future time, at which point the notification system will remind you of a particular article.  You can optionally provide a custom message to include.

Allow Programs & Events Dashboard to make automatic edits on connected wikis, by Medha Bansal (Outreachy, New Delhi, India), mentored by Jonathan Morgan, Sage Ross

There is now a documented process in place for enabling the Programs & Events Dashboard to make automated edits on new wikis. It paves the way for Wikipedia Education Programs across the world to start using the tools that were originally limited to English Wikipedia and only US/Canada universities.

Document process for creating new Zotero translator, by Sonali Gupta (Outreachy, Rajasthan, India) mentored by Marielle Volz and Czar

Citoid, a service that makes it easy to add references on Wikipedia, relies heavily on Zotero translators. The outcome of this project is a well-documented resource on how to develop Zotero web translators, both server-side and in Scaffold, and get them live in production.

A few new things we tried this year

  • Hosted an online information session for prospective candidates and addressed queries that we collected beforehand
  • Promoted weekly reports of accepted students in a monthly highlights newsletter format.
  • Provided Zulip as a mentoring tool – thanks to the organization's members for letting us use the beta version. Zulip's threaded conversations allowed Wikimedia organization administrators to communicate smoothly with students on different topics and to reach them quickly when an urgent sync-up was needed.
  • Collected suggestions for improvements from mentors throughout the program, and as a result produced revised documentation for the outreach programs

Stay tuned for more

  • If you are interested in learning in-depth about these projects, attend the showcase on September 21st, at 10:00 AM PDT / 17:00 UTC via YouTube live (link to view the broadcast)
  • The application period for the next round of Outreachy opens on September 7th. Check ideas for projects and apply!
  • We will be circulating a program feedback survey with the students and mentors, the lessons of which we will publish soon.
  • A few of us will be attending the GSoC Mentors Summit at Google Headquarters in October. We are looking forward to meeting with mentors from other organizations and learning from their style of mentoring and practices for community building.

Thanks to mentors for their valuable time and guidance, to Google and Outreachy program coordinators for their generous support, and to Sumit Asthana and Anna Liao for helping coordinate these rounds.

Srishti Sethi, Developer Advocate, Technical Collaboration team
Wikimedia Foundation

by Srishti Sethi at September 06, 2017 06:33 PM

Wikimedia Scoring Platform Team

Laughing ORES to death with regular expressions and fake threads

At 1100 UTC on June 23rd, ORES started to struggle. Within half an hour, it had fully choked and could no longer respond to any requests. It took us 10 hours to diagnose the problem, fix it, and confirm the fix. We learned some valuable lessons while studying and addressing this issue.

You can't prevent bad things from happening. Something will always go wrong. So you do the best you can to handle bad things gracefully. In a distributed processing environment like ORES' cluster, the worst thing that can happen is a process that blocks forever. Preparing for bad things therefore means using timeouts for just about everything. So far, this has been a great strategy: at worst, only a few requests out of many fail when something goes wrong. Regrettably, during this downtime event one of the worst bad things happened, and at the same time we discovered that our timeouts were not capable of stopping deep processes that go rogue in a specific way. In this blog post, I'll explain what happened.

Recursive backtracking in a regular expression

Many of the models deployed in ORES use regular expressions to extract signal about the quality of an edit. For example, we use them to match "badwords" (curse words, racial slurs, and other words that are commonly used to cause offense) and "informals" (linguistic colloquialisms like "haha" or "lol" or "wtf"). One such regular expression that we used to match informal laughing in Spanish looked like this: /j+[eaiou]+(j+[aeiou]*)*/. It is intended to match strings like "jajajaja" or "jijiji".

In this edit to the Spanish Wikipedia, an IP editor added a very long string of repeated "JAJAJJAJAJAJAJAJAJ" to the article for "Terrain". This is exactly what the regular expression was designed to match. But there was a problem: the regular expression was poorly designed, causing a catastrophic backtracking pattern. Every time it matched the entire sequence of "JAJAJJAJAJAJAJAJAJ" and then failed when it encountered "...JAJAJlentos...", it would re-attempt the entire match, dropping just one "JA" from the middle. This doesn't really matter for short sequences. But for one very long one (and this one was 4155 chars long == 230 repetitions of "JAJAJJAJAJAJAJAJAJ"), it would have taken days to finish. The plot below demonstrates how badly things break down at only 14 repetitions.
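The blow-up is easy to reproduce. In the sketch below, the first pattern is the one from the post; the "safe" variant is only an illustrative rewrite that removes the ambiguity (requiring a vowel in every iteration, with an optional trailing run of "j"s), not necessarily the fix that was actually deployed:

```python
import re
import time

# The pathological pattern from the post: because "[aeiou]*" may match
# empty, a failing input can be split across the group's iterations in
# exponentially many ways before the overall match finally gives up.
BAD = re.compile(r"j+[eaiou]+(j+[aeiou]*)*", re.IGNORECASE)

# Illustrative rewrite (an assumption, not the deployed patch): require at
# least one vowel per iteration, allowing a trailing run of "j"s.
SAFE = re.compile(r"j+[aeiou]+(j+[aeiou]+)*j*", re.IGNORECASE)

def time_fullmatch(pattern, text):
    """Time one match attempt; fullmatch forces the failure at the end."""
    start = time.perf_counter()
    pattern.fullmatch(text)
    return time.perf_counter() - start

UNIT = "JAJAJJAJAJAJAJAJAJ"   # the repeated unit from the vandal edit
for n in (2, 4, 6, 8):        # keep n small: BAD's cost grows exponentially
    text = UNIT * n + "lentos"  # trailing text makes the match fail
    print(n, time_fullmatch(BAD, text), time_fullmatch(SAFE, text))
```

Running this shows the bad pattern's failure time climbing steeply with each additional repetition while the unambiguous pattern stays flat; pushing n toward 14 reproduces the breakdown in the plot.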

Where were the timeouts?

Things like this happen. When operating in a distributed processing environment, you should always have timeouts on everything so that if something goes haywire, it doesn't take everything down. Regrettably, matching a regular expression turned out to be not just a special opportunity for pathological backtracking, but also an opportunity to learn hard lessons about safe timeouts.

We have timeouts in ORES in a few strategic places. E.g. if a single scoring job takes longer than 15 seconds (extracting informal "JAJAJA" is part of a scoring job), then it is supposed to time out. But for some reason, we weren't timing out during regular expression matching. I started digging into the library we use to implement execution timeouts, and what I learned was horrifying.

Most timeouts in Python are implemented with "threads". I put "threads" in quotes because threads in Python are a convenient abstraction, not true concurrency. Python's Global Interpreter Lock (GIL) is an internal mutex that prevents truly concurrent threading; to get around this, Python programs use separate processes to implement concurrency. I'm not going to get into the details of the GIL or process-based concurrency, but suffice it to say: if you use an external C library to execute a regular expression match on a string, any thread that is trying to implement a timeout is going to get locked up and totally fail to do what it is supposed to do!
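A minimal illustration of why thread-based timeouts are only cooperative (this is a toy sketch, not ORES code):

```python
import threading

# A timer thread cannot preempt the worker; all it can do is set a flag
# that Python-level code must come back and check.
deadline_hit = threading.Event()
timer = threading.Timer(0.2, deadline_hit.set)
timer.start()

iterations = 0
while not deadline_hit.is_set():  # cooperative check: this loop CAN time out
    iterations += 1
print("stopped after", iterations, "iterations")

# By contrast, a single long-running call into a C library (one pathological
# regex match, say) never returns to the interpreter to check any flag, so
# the timer fires to no effect -- which is how ORES' timeouts were defeated.
timer.cancel()  # no-op once the timer has fired; good hygiene otherwise
```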

Because our threading-based timeouts were completely disabled by this long regular expression match, our "precaching" system (which makes sure we score every edit and put the score in the cache ASAP) was slowly taking us down. Every time the problematic diff was requested, it would render yet another worker unreachable. Because ORES would simply fail to respond, our precaching system registered a connection timeout and retried the request. Capacity decayed as our ~200 workers locked up at 100% CPU, one by one.

Luckily, there's an easy solution to this problem: Unix signals. By having the operating system help us manage our timeouts, we no longer have to rely on Python threads behaving sanely in order to recover from future rogue processes.
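A minimal sketch of a signal-based timeout on Unix (the exception and handler names here are illustrative, not ORES' actual code):

```python
import signal
import time

class ExecutionTimeout(Exception):
    """Raised when the alarm fires."""

def _on_alarm(signum, frame):
    raise ExecutionTimeout()

# Ask the OS to deliver SIGALRM after 1 second, no timer thread involved.
signal.signal(signal.SIGALRM, _on_alarm)
signal.alarm(1)
try:
    time.sleep(10)          # stand-in for a scoring job that runs too long
    result = "finished"
except ExecutionTimeout:
    result = "timed out"
finally:
    signal.alarm(0)         # always clear any pending alarm
print(result)
```

Because the operating system delivers the signal, the timeout no longer depends on another Python thread getting scheduled; in the worst case, a worker that still refuses to die can be killed outright by its process supervisor.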

So, you fixed it right?

First, I should thank @ssastry for his quick work identifying the pathological backtracking problem and submitting a fix. We also completed an emergency deployment of ORES that implemented the use of Unix signals and we've been humming along, scoring all of the things, ever since.

by Halfak (Aaron Halfaker, EpochFail, halfak) at September 06, 2017 11:50 AM

September 05, 2017

Ian Gilfillan (greenman)

September 2017 African language Wikipedia update

African language map

It’s time to look at the state of the African language Wikipedias again, as always based on the imperfect metric of number of articles.

African Language Wikipedias

Language          11/2/2011  9/5/2013  26/6/2015  24/11/2016  5/9/2017
Malagasy              3,806    45,361     79,329      82,799    84,634
Afrikaans            17,002    26,752     35,856      42,732    46,824
Swahili              21,244    25,265     29,127      34,613    37,443
Yoruba               12,174    30,585     31,068      31,483    31,577
Egyptian Arabic           –    10,379     14,192      15,959    17,138
Amharic               6,738    12,360     12,950      13,279    13,789
Northern Sotho          557       685      1,000       7,605     7,823
Somali                1,639     2,757      3,446       4,322     4,727
Lingala               1,394     2,025      2,062       2,777     2,915
Kabyle                    –     1,503      2,296       2,847     2,887
Shona                     –     1,421      2,321       2,638     2,851
Kinyarwanda               –     1,817      1,780       1,799     1,810
Hausa                     –         –      1,345       1,400     1,525
Igbo                      –         –      1,019       1,284     1,384
Kikuyu                    –         –          –           –     1,349
Kongo                     –         –          –       1,173     1,176
Wolof                 1,116     1,161      1,023       1,058     1,157
Luganda                   –         –          –       1,082     1,153

This is the second update in a row that gets to welcome a new language to the thousand-article mark – congratulations to Kikuyu, which has now joined the list and is already hot on the tail of Igbo.

I know some of the Afrikaans Wikipedia editors have been a bit disappointed by the slowing pace of growth as they move towards 50,000 articles. But, to put it in perspective, the 2013 Global Britannica had about 40,000 articles, so there are fewer and fewer obvious gaps in content. Afrikaans is also one of the highest-quality Wikipedias for its size – many editors focus on the quality of articles rather than just the numbers. And they shouldn't be too disappointed by the pace: Afrikaans is still the fastest-growing African-language Wikipedia, catching up to Malagasy, which has the most articles.

It’s interesting that Afrikaans is getting more media attention, but it still has to deal with concerns such as “but anyone can edit it, how can we trust it?” – the kind of thing the English Wikipedia has long moved on from. A definite focus area for us as the Wikimedia South Africa chapter.

Swahili continues to grow steadily, and Egyptian Arabic as well, and the other languages continue to grow slowly.

South African Language Wikipedias

Language             19/11/2011  9/5/2013  26/6/2015  24/11/2016  5/9/2017
Afrikaans                20,042    26,754     35,856      42,732    46,824
Northern Sotho              557       685      1,000       7,605     7,823
Zulu                        256       579        683         777       942
Xhosa                       125       148        356         576       708
Tswana                      240       495        503         615       639
Tsonga                      192       240        266         390       526
Sotho                       132       188        223         341       523
Swati                       359       364        410         419       432
Venda                       193       204        151         238       256
Ndebele (incubator)           –         –          –          12        12

Looking at the South African languages in particular, besides Afrikaans, Northern Sotho has returned to a more natural growth compared to the spurt of the previous period. User:Aliwal2012 continues to be the standout contributor there, having now created 3,228 pages.

Growth in the Zulu Wikipedia has picked up slightly, with a few relatively new editors contributing the majority of recent additions.

Two other languages have also seen an uptick. Tsonga has leapfrogged Swati, mainly thanks to User:Thuvack, who’s on track to make 2017 his record year for Tsonga contributions.

Sotho has also passed Swati, with User:Aliwal2012 active there as well.

So what are you waiting for? If you haven’t edited before, don’t be afraid that you’ll find the syntax difficult – be bold, and there’ll always be someone to ask for help. All it takes is clicking that “Edit” link and getting started. With just a few edits a week, you could make a noticeable difference to one of the African language Wikipedias!

Picture from Wikimedia Commons.


by Ian Gilfillan at September 05, 2017 08:52 PM

Wikimedia Foundation

Where to find all the slides, links, videos, and tutorials from Wikimania 2017 (and what we learned from our remote-first strategy)

Group photo of Wikimania 2017 attendees. Photo by Victor Grigas/Wikimedia Foundation, CC BY-SA 4.0.

Perhaps the best phrase to describe Wikimania 2017 is “brain-expanding and a little overwhelming.” Over the course of the three days that make up the main program of the Wikimedia community’s annual conference, there were five keynote sessions and over a hundred community-submitted talks, meetups, and lightning talks. The conference was preceded by two days of a hackathon, the Wikimedia Conference of North America, and other workshops.

Over 900 people attended Wikimania 2017 in person, but we also wanted to ensure that the conversations and ideas that surfaced at Wikimania would be accessible well beyond the conference. Below, we detail how we approached remote participation so that people could follow along with the conference either in real-time or find archival material at some point in the future. We also detail what we learned and might change or improve for future events. Please feel free to borrow any of this material for conferences you are attending or leading in the future.

Note-taking
A number of Wiki-related conferences have had robust note-taking programs, and we looked at both the 2017 Wikimedia Developer Summit and Wikimania 2016 in Esino Lario for guidance on how to construct a note-taking program that would be easy for both presenters and attendees to use. We liked how the Developer Summit asked notetakers to take notes using an Etherpad template, and to follow notetaking instructions outlined on the Session Guidelines page; we also appreciated that the Developer Summit collected all session notes on one page to make it easy for remote viewers to scroll and find notes and slides.

We borrowed this approach for Wikimania 2017. Participants and presenters were asked to copy an Etherpad template for their sessions, and  to contribute their notes, slides, and any documentation they had on a single wiki page to make it easy for remote participants and people attending simultaneous sessions to both follow along in real-time and access material in the future. We also added categories and themes to this page so that readers could easily sort the sessions by topic. Instructions on how to take good notes were added to the speaker’s page.

Prior to the conference, we emailed the Wikimedia-l and Wikimania-l mailing lists with instructions on how to follow sessions in real time. Each day of the conference, participants were reminded to take notes in their sessions for archival purposes and to help remote attendees. We also emailed session leaders after the conference, asking them to share their material on the All Session Notes page.

Possible improvements and notes

  • A number of attendees suggested that the note-taking templates for each session should be created ahead of time, making it easier for attendees to jump in and get started taking notes.
  • Not every session designated a notetaker.
  • We received a number of requests to have certain sessions transcribed in real time, both to help remote participants and for accessibility and translation purposes. Though expensive, real-time transcription does help remote participants more fully experience a conference.

Live streaming

We initially scoped out livestreaming a number of simultaneous sessions throughout the conference, but realized quickly that we didn’t have the equipment or staffing to do so. Instead, we livestreamed four keynotes on both YouTube and Facebook Live and recorded and uploaded all talks taking place in Ballroom West to both YouTube and Wikimedia Commons. It was not possible to move the audio and video equipment between rooms.

A Wikimedia Cuteness Association meetup at Wikimania. Photo by Avery Jensen, CC BY-SA 4.0.


This was our first time livestreaming to the Wikipedia Facebook account. On average, these Facebook Live streams doubled the in-person attendance figures at Wikimania, broadening the reach of the conference and ensuring that remote participants could access the keynote sessions. About 3 percent of Facebook users who saw the streams provided negative feedback by hiding the post or unliking the page.

One important note: two of the keynotes were presented, in parts, in French. The first keynote—“Community and information in a partisan world”—started in French, and was served to the entire Facebook Wikipedia audience. Traffic to the video dropped almost immediately—most non-French-speaking viewers left the livestream after 15 seconds—and viewers complained about the lack of translation available. The second keynote “Libraries, Archives and sharing knowledge” took place entirely in French, and was served only to Facebook users who identified themselves as French speakers. The dropoff of viewers was much less severe.  Top locations viewing this video were in Tunisia, California, and France. (Quebec came in fourth place.) This indicated that talks in multiple languages or in a language other than the dominant language on a Facebook page should be served to a targeted audience, or translated in real-time.

Over 48,000 Facebook users watched the Wikimedia 2030 video announced during Katherine and Christophe’s keynote. (This is the only video we ran during Wikimania that wasn’t a keynote.)

Possible improvements and notes

  • The initial livestreamed opening talk was in both French and English. We didn’t specify language on the video streams, and a number of non-French-speaking viewers dropped off. It is possible to specify “language” and “country” on video streams, and that might have improved audience satisfaction. Viewers get frustrated and leave when they can’t understand what speakers are saying. Descriptions for these videos should match the language in which they’re spoken.
  • We livestreamed and/or recorded all talks taking place in a single room without determining whether these were the most important talks to record. For future years, if video and audio equipment is limited, we recommend determining which talks should be recorded and then placing them in a room with recording equipment. (In other words, put all recordable sessions in the same room or rooms.) We recommend that the Program Committee make these decisions.
  • We didn’t have enough staff to moderate YouTube comments and ended up turning them off. They were off-topic, and didn’t add to the conversation. We’d like to experiment with ways that remote participants can ask questions and have a dialogue.
  • A good, constructive conversation took place on Wikimedia-l about the centralnotice banners that were used to let people know about the video streams. This was a relatively last minute decision, and we agree that the dialogue about when and how to use centralnotice should begin earlier. We recommend that these conversations take place publicly and earlier for future conferences.
  • Having the keynote speakers’ slides and speeches in advance would have allowed us to simultaneously provide context or additional information to viewers with limited bandwidth who couldn’t stream video or to people with hearing difficulties.
  • Facebook Live videos should start as close to the speaking time as possible. We recommend scheduling a live stream and sending a reminder notification to page followers rather than starting a video shortly before an event starts.
  • For streaming videos, until the livestream starts, overlaid images should show the speaker’s name, the start time, and the talk title, so viewers know exactly what’s about to take place.
  • Should we stream on Wikipedia’s Facebook page or would this be more appropriate for a Wikimania Facebook page?

Social Media

We followed the hashtags #wikimania, #wikimania17, and #wikimania2017 and retweeted pertinent tweets into the @wikimania, @wikimedia, @wikipedia, and @mediawiki Twitter accounts. We also alerted various Facebook groups and the main Wikipedia Facebook page in advance about sessions that would be livestreamed. Lastly, we put a January 2017 profile of Felix Nartey on social media simultaneously with Jimmy Wales’ announcement that he was named the Wikimedian of the Year. During the conference, @Wikimania earned 73,800 impressions in total, with 1.3% of those (about 959) engaging with tweets through clicks, likes, retweets, and replies.

Possible improvements and notes

  • A number of people asked what the canonical hashtag was to follow the event. We recommend that event organizers pick one hashtag and make it clear to attendees—so that audiences don’t have to follow multiple hashtags.
  • Social media coordination should begin at least 2-3 months before the event takes place, especially when translation is required.
  • Anything that can be planned ahead of time should be. If we have quotes from previous interviews or speeches from keynotes in advance, social material can be designed around that.

Melody Kramer, Senior Audience Development Manager, Communications
Wikimedia Foundation

Thank you to the Foundation’s Aubrie Johnson for compiling many of the metrics listed above for livestreaming and social media.

by Melody Kramer at September 05, 2017 04:01 PM

Wiki Education Foundation

Roundup: Budding Linguaphiles

Latin may be the language of love, but Ethnologue says that there are more than 7,000 living languages in the world today. This left students in Andrew Nevins’s Introduction to Linguistics class at Harvard Summer School with much to choose from for their Wikipedia coursework. Together his class edited 38 articles, with some students becoming so taken with Wikipedia that they decided to branch out into different topics, such as beatboxing.

Two of his students expanded the Italian language article, paying particular attention to the language’s growth in popularity over the years and noting that technology such as the printing press made it easier for the language to spread over large areas. One hard-working student added information about grammar structure to the Hindi language article, which has received almost a quarter of a million views since they began editing! Two more of his students expanded the articles for the German, French, and Sino-Tibetan languages, the last of which now includes information on changes in the tone and word structure of Sino-Tibetan languages.

Also intriguing is the work of one student who expanded the article on the Turkish language to give more detail on the language’s syntax, as well as information on a whistled version of Turkish. This whistled language, which mirrors the lexical and syntactic structure of Turkish, is in danger of dying out; however, there is hope that efforts to teach it in regional schools will save it. What makes the addition of this material so much more meaningful is that the student in question was editing from Turkey, which has blocked access to Wikipedia. Adding this content not only gave readers the ability to learn more about the Turkish language, it also gave the student the freedom to share their wealth of knowledge and resources despite their country’s censorship of Wikipedia.

You too can have your class work with Wikipedia as one of your class assignments. If you’re interested, please contact Wiki Education at contact@wikiedu.org to find out how you can gain access to tools, online trainings, and printed materials.

Image: Harvard University Widener Library, by Joseph Williams, CC BY-SA 2.0, via Wikimedia Commons.

by Shalor Toncray at September 05, 2017 03:43 PM

Wiki Loves Monuments

Wiki Loves Monuments banner contest

Wikivoyage, a free travel guide everyone can edit, is one of the Wikimedia sister projects. High-quality illustrations are central to a travel guide. Of particular importance are page banners, placed at the top of every page in most language versions of Wikivoyage, including the English Wikivoyage, which contains over 25,000 travel guides covering every country in the world.

Banners, with a fixed 7:1 aspect ratio and high resolution, are meant to give a quick impression and represent the most essential features of a destination. For example, look at this banner for Yosemite National Park,

Wikivoyage banner of Yosemite National Park, by Diliff, CC BY-SA 3.0

and you immediately see that the park is about gorgeous high mountains and forests. Before reading the article, you already deem this place a beautiful natural landmark and a potential holiday destination.

How to create a page banner? Just take a photo and crop it. What is needed? A good photo, of course. That is why Wikivoyagers joined forces with Wiki Loves Monuments and launched a separate award for the best page banner uploaded during Wiki Loves Monuments 2017. All standard Wiki Loves Monuments rules apply. Your banner should be freely licensed and contain an identifiable heritage monument. Additionally, your banner should be usable for Wikivoyage, that is, it should look appropriate in a travel guide.
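Concretely, "crop it" means picking the largest 7:1 window that fits your photo. A small sketch of the arithmetic (a real workflow would then hand this box to an image-editing tool or library):

```python
def banner_crop_box(width, height, ratio=7):
    """Return the largest centered ratio:1 box, as (left, upper, right,
    lower) pixel coordinates, that fits inside a width x height image."""
    if width >= ratio * height:
        # Image is already wider than ratio:1 -> keep full height.
        crop_w, crop_h = ratio * height, height
    else:
        # Usual case -> keep full width, take a horizontal strip.
        crop_w, crop_h = width, width // ratio
    left = (width - crop_w) // 2
    upper = (height - crop_h) // 2
    return (left, upper, left + crop_w, upper + crop_h)

# A 4000x3000 photo keeps its full width and a 571-pixel-tall centered strip.
print(banner_crop_box(4000, 3000))  # → (0, 1214, 4000, 1785)
```

Where you center the strip is of course an artistic choice; sliding `upper` up or down lets you keep the mountains rather than the sky.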

Cathedrals and Old Town in Lviv, Ukraine, by Johnny, CC BY 2.5

Unlike the regular Wiki Loves Monuments, where prizes are given for individual photos, the banner award goes to the best overall contribution. Participants will score points based on the aesthetic quality of individual banners and on their usage in Wikivoyage. The best score secures the prize. Thus, a good strategy is to check existing Wikivoyage destinations for missing banners and for banners lacking in quality or character – such articles can become your best targets. The Wikivoyage community will carefully consider all banners uploaded during the contest, and assign the banners to travel guides. The scores will be distributed accordingly.

Want to know more? Check the contest page. You may also be interested in page banners uploaded during Wiki Loves Earth 2017, many of them now used in Wikivoyage.

After reading this far, you may have the impression that banners must be panoramic photos. Not quite. Just think of something characteristic of a destination, be it a wide panorama or a tiny but recognizable detail. Here is one example,

External walls of the Saint George Cathedral in Yuryev-Polsky, one of the oldest stone carvings in Russia, by M000142, CC BY-SA 4.0


and more can be found on the contest page. Be creative and try non-standard solutions. This will be much appreciated!

Everybody is cordially welcome to participate in the banner contest. We look forward to your contributions. Let’s improve Wikivoyage together!


(This blog post was contributed by Yaroslav Blanter, an administrator of a number of Wikimedia projects including the Russian Wikivoyage, and Alexander Tsirlin, also an administrator of the Russian Wikivoyage and an organizer of Wiki Loves Monuments Russia.)

by lilyofthewest at September 05, 2017 04:39 AM

September 04, 2017

Tech News

Tech News issue #36, 2017 (September 4, 2017)


September 04, 2017 12:00 AM

September 02, 2017

Wikimedia Scoring Platform Team

Wikilabels incident: Reversed diffs!

Today, we discovered a major regression in Wikilabels. We've patched the issue and made an emergency deployment. We also deleted some labels that were saved while the system was compromised. In this post, we'll describe what happened.

In order to generate visual representations of edit diffs for labeling, we use the API behind Wikipedia. The old way of asking the API to generate a diff was to use ?action=query&prop=revisions&revids=...&rvdiffto=.... The revision for the rvdiffto parameter would appear on the left and the revision for the revids parameter would appear on the right. Recently, this method of gathering diffs from the API has been deprecated in favor of using ?action=compare&torev=...&fromrev=... The revision for the fromrev parameter would appear on the left and the revision for the torev parameter would appear on the right. But we got that backwards! So the UI would show that an edit removing vandalism was performing it! Or worse, an edit vandalizing Wikipedia would look like it was cleaning up vandalism. So needless to say, we can't trust the labels saved while Wikilabels was in a compromised state.
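As a sketch, the fix amounts to keeping the parameter roles straight when building the request (the revision IDs below are placeholders, and this helper is illustrative, not Wikilabels' actual code):

```python
from urllib.parse import urlencode

def compare_diff_url(api_url, old_rev, new_rev):
    """Build a MediaWiki compare-API request URL.

    The revision passed as fromrev renders on the LEFT of the diff and the
    one passed as torev on the RIGHT; swapping them reverses every diff,
    which was exactly the regression described above.
    """
    return api_url + "?" + urlencode({
        "action": "compare",
        "fromrev": old_rev,   # the older revision -> left column
        "torev": new_rev,     # the newer revision -> right column
        "format": "json",
    })

print(compare_diff_url("https://es.wikipedia.org/w/api.php", 111111, 222222))
```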

We've deleted the 36 labels that were submitted to editquality campaigns during this period. All labels should be clean from this point forward. Labelers will notice that worksets they thought were completed are now incomplete. We're very sorry for this inconvenience.

Special thanks to Papuass for making us aware of this issue on the ORES flow board. Sorry to Papuass, Ivi104, Lsanabria, and 4shadoww for deleting their hard work. I hope it won't slow them down much.

These are the deleted labels for reference (CC0):

 task_id | user_id  |         timestamp          |                           data                           
  445742 | 38109688 | 2017-08-30 15:31:52.354537 | {"goodfaith": false, "damaging": true, "unsure": false}
  445765 | 38109688 | 2017-08-30 15:32:30.420068 | {"goodfaith": false, "damaging": true, "unsure": true}
  445812 | 38109688 | 2017-08-30 15:32:35.83588  | {"goodfaith": true, "damaging": false, "unsure": false}
  445877 | 38109688 | 2017-08-30 15:32:38.619456 | {"goodfaith": true, "damaging": false, "unsure": false}
  446388 | 38109688 | 2017-08-30 15:33:48.12102  | {"goodfaith": true, "damaging": false, "unsure": false}
  446433 | 38109688 | 2017-08-30 15:34:12.903998 | {"goodfaith": true, "damaging": false, "unsure": false}
  446562 | 38109688 | 2017-08-30 15:34:16.460096 | {"goodfaith": true, "damaging": false, "unsure": false}
  446752 | 38109688 | 2017-08-30 15:34:21.357112 | {"goodfaith": true, "damaging": false, "unsure": false}
  447128 | 38109688 | 2017-08-30 15:34:24.683543 | {"goodfaith": true, "damaging": false, "unsure": false}
  447219 | 38109688 | 2017-08-30 15:34:52.929963 | {"goodfaith": true, "damaging": false, "unsure": false}
  447228 | 38109688 | 2017-08-30 15:35:41.194955 | {"goodfaith": true, "damaging": false, "unsure": false}
  370394 | 35332911 | 2017-08-30 22:25:45.732859 | {"goodfaith": true, "damaging": false, "unsure": false}
  370724 | 35332911 | 2017-08-30 22:25:52.676656 | {"goodfaith": true, "damaging": true, "unsure": false}
  370788 | 35332911 | 2017-08-30 22:26:36.25715  | {"goodfaith": true, "damaging": false, "unsure": false}
  371101 | 35332911 | 2017-08-30 22:27:38.908425 | {"goodfaith": true, "damaging": false, "unsure": false}
  370368 | 35332911 | 2017-08-31 05:41:25.739392 | {"goodfaith": true, "damaging": false, "unsure": false}
  433975 |     4075 | 2017-08-31 09:14:10.80414  | {"goodfaith": true, "damaging": false, "unsure": false}
  434380 |     4075 | 2017-08-31 09:15:04.198489 | {"goodfaith": true, "damaging": false, "unsure": false}
  434389 |     4075 | 2017-08-31 09:15:16.009233 | {"goodfaith": false, "damaging": true, "unsure": false}
  451268 | 14073293 | 2017-08-31 11:46:47.204268 | {"goodfaith": true, "damaging": false, "unsure": false}
  451324 | 14073293 | 2017-08-31 11:47:11.079806 | {"goodfaith": true, "damaging": true, "unsure": false}
  451688 | 14073293 | 2017-08-31 11:48:25.680257 | {"goodfaith": false, "damaging": true, "unsure": true}
  452159 | 14073293 | 2017-08-31 11:48:50.324313 | {"goodfaith": true, "damaging": true, "unsure": true}
  452203 | 14073293 | 2017-08-31 11:49:09.644101 | {"goodfaith": false, "damaging": true, "unsure": false}
  452210 | 14073293 | 2017-08-31 11:49:34.381716 | {"goodfaith": false, "damaging": false, "unsure": false}
  452315 | 14073293 | 2017-08-31 11:49:45.622214 | {"goodfaith": true, "damaging": false, "unsure": false}
  452376 | 14073293 | 2017-08-31 11:51:33.128251 | {"goodfaith": true, "damaging": false, "unsure": false}
  452431 | 14073293 | 2017-08-31 11:52:11.38513  | {"goodfaith": true, "damaging": true, "unsure": true}
  452581 | 14073293 | 2017-08-31 11:52:42.420328 | {"goodfaith": false, "damaging": true, "unsure": false}
  452734 | 14073293 | 2017-08-31 11:53:02.748251 | {"goodfaith": true, "damaging": false, "unsure": false}
  452967 | 14073293 | 2017-08-31 11:54:23.412685 | {"goodfaith": true, "damaging": false, "unsure": false}
  453027 | 14073293 | 2017-08-31 11:55:02.218287 | {"goodfaith": true, "damaging": true, "unsure": false}
  453162 | 14073293 | 2017-08-31 11:56:35.027953 | {"goodfaith": true, "damaging": false, "unsure": false}
  453192 | 14073293 | 2017-08-31 11:56:46.138465 | {"goodfaith": false, "damaging": true, "unsure": false}
  453237 | 14073293 | 2017-08-31 11:58:35.359926 | {"goodfaith": false, "damaging": true, "unsure": false}
  453463 | 14073293 | 2017-08-31 11:58:46.129857 | {"goodfaith": true, "damaging": false, "unsure": false}
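The rows above pair four pipe-separated columns: what appears to be an observation id, a revision id, a timestamp, and a JSON label object (the column meanings are inferred from the values shown, not documented here). A minimal sketch of parsing one such row:

```python
import json

# Parse one pipe-separated label row into its four columns.
# Column meanings (obs id, revision id, timestamp, labels) are an assumption.
row = ('447219 | 38109688 | 2017-08-30 15:34:52.929963 | '
       '{"goodfaith": true, "damaging": false, "unsure": false}')

# Split at the first three pipes only, so the JSON column stays intact.
obs_id, rev_id, timestamp, raw_labels = (f.strip() for f in row.split("|", 3))
labels = json.loads(raw_labels)  # JSON true/false become Python True/False
```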

by Halfak (Aaron Halfaker, EpochFail, halfak) at September 02, 2017 01:44 PM

Gerard Meijssen

#Wikimedia - Where I make a stand / where I stand for

I was told by a pivotal person at the WMF that my priorities are not the shared priorities of our movement. I consider this a personal affront, so I will spell out what I stand for and where I make a stand. If you want to personally verify the veracity of my commitment, read my blog and check out my involvement. I have blogged for over ten years and the basics/citations are all there to find. I consider my position very much in line with what our movement is there for.

==Share in the sum of all knowledge==
This is the overarching aim of our movement. At this time we are congratulating ourselves on what we have achieved so far. There is a lot to celebrate, particularly for the English-reading world.

===Everything but English===
Given that only 40% of the world population can read English, our successes need to be measured by what we do for all the people in the world. I do not care for good intentions; I care for what can be observed. Financially, there is no breakdown available of the amount spent on English versus the amount spent on all the rest. This is, imho, a diversity issue as potent as the gender gap. All the arguments for "English first" are structurally no different from any other "my group first" arguments. Just compare the amounts given to US American chapters versus the Indian chapter. In addition, you may or may not consider the cost of the software that is developed with English Wikipedia in mind.

===Internationalisation and localisation===
I searched briefly for "internationalisation" in the 2030 strategy papers and could not find it. It is, however, the bedrock of Wikipedia. It is vital for any and all of the individual features of MediaWiki.

When you consider Wikimedia partners like the Internet Archive and their Open Library, we do not even consider how much we stand to achieve when, together, we reach out to the other 60% as well. Our internationalisation platform is open to our open source partners, and translatewiki.net is in my opinion a strategic resource.

The successes of our GLAM partnerships prove that collaboration serves mutual interests. There are plans to improve Commons; a key part is the Wikidatification that will open up Commons, not only in English but in any and all other languages. Where we could make more of a difference is helping where our partners indicate what is relevant to them. We can show them the effect of the cooperation in any language. At this time what we show is limited to images. This is something we should expand on.

====Internet Archive====
The Internet Archive provides a vital service to our Wikipedias. Its Wayback Machine allows us to prove that references that used to be on the Internet existed. Effectively, it is an important tool when the aim is to prevent misinformation. Its Open Library has two parts. The part I am interested in is making free e-books available to readers. We would do better if we collaborated just a bit more and helped them with their internationalisation and localisation.

The libraries of this world collaborate in the OCLC and share their links in one system: the Virtual International Authority File. In its WorldCat system, the idea is that people can find books in a library near them. Thanks to the references to local libraries, it is always possible to know whether a book or an author is known in any given country. It is important for us to improve this cooperation and its visibility for our readers and editors.

===Bringing things together===
I have helped bring data from Wikidata, OCLC and Open Library together. I am seeking the disambiguation of Open Library content using existing links from the Library of Congress to the VIAF and consequently to Wikidata. I am adding award winners because they provide arguments for what articles to write or improve. Currently I am adding Dutch literature awards to show the Dutch National Library that this information exists and can be used. Recently I added botanical awards to show a group of botanists how small tasks like this add relevance.

===Outspoken stuff===
  • I am not a Wikipedian and consequently arguments specific to any Wikipedia are problematic, mostly irresponsible.
  • I care about diversity; issues around the gender gap do get extra attention from me but it is a secondary consideration.
  • I care about usability and use Reasonator and tools like Petscan and Awarder. The necessity of using Reasonator for so many years is perfect proof that usability does not have much of a priority. Having seen previous attempts at usability, I will consider it once it is available.
  • I expect that there will be more use for our data. Quality is key and collaboration on a meta scale is what will make this possible.
  • Wikidata is particularly useful in English. Theoretically other languages may profit from its multilingual nature. Institutional (WMF) interest is needed to improve this use of Wikidata. 
  • While I respect many efforts of the WMF, I find that its concentration on English Wikipedia has a very negative effect on a micro scale. It is not all bad but it is this division of labour and money that prevents us from having the most bang for our buck.

PS I resent that I felt the need to write this blogpost.

by Gerard Meijssen (noreply@blogger.com) at September 02, 2017 12:31 PM

September 01, 2017

Wikimedia Foundation

Natacha Rault: Taking a feminist approach to Wikipedia’s gender gap

Video by the Wikimedia Foundation, CC BY-SA 4.0. You can also view it on Vimeo or Youtube.

An engaged feminist, Natacha Rault has used her understanding of psychology to attract more female participants to Wikipedia.

A French-British Wikipedian raised in Geneva, Switzerland, Rault browsed Wikipedia frequently during her maternity leave, but never thought about contributing to it. “I was getting bored at home,” Rault recalls. “I found the encyclopedia on the internet and discovered a wealth of different subjects to explore. … I never clicked the ‘edit’ button until much later.”

That only happened when she learned about the website’s need for contributors like her.

Various surveys have found that between eight and nine out of every ten editors are men. This gender gap affects the quality of Wikipedia, and many people, including Rault, have wanted to take practical steps to change this.

“It is important to have more women participating in Wikipedia because the male perspective is often skewed a certain way to only cover certain subjects,” says Rault. “When you have a majority of men contributing to Wikipedia, you have more football articles and more articles on Pokemon, but you won’t have a lot about design, for example, a subject that would be considered ‘feminine.’ And then you have nearly nothing concerning feminism.”

In response to what Rault read about the gender gap, she created her account on Wikipedia in 2012. Rault has since used the account to edit Wikipedia nearly 10,000 times, most of them to women’s biographies.

Like Rault, the Fondation Emilie Gourd, an active feminist group in Switzerland, wanted to respond to Wikipedia’s gender gap. In September 2015, they asked Rault to help coordinate a conference to raise awareness about women’s participation in Wikipedia, but she didn’t think one-way communication was the best idea.

“We’re going to have 200 people coming, learning about the subject, applauding, and then going home,” Rault explains. “We haven’t advanced … towards a solution.”

In addition to the conference, Rault suggested using workshops to teach women how to contribute to Wikipedia. She focused the workshops on creating Wikipedia articles about notable women, and the majority of attendees were women themselves. Rault was able to offer customized support for the workshop attendees based on her understanding of their specific needs. “We can look at the way women and men react differently to the ‘edit’ button. Men, for example, tend to be less afraid of making mistakes. So I encouraged women to write and not to be afraid of making mistakes.”

Rault’s efforts over the last five years have helped many people understand how the female perspective matters to content quality on Wikipedia, and have made positive moves toward addressing the gap.

“It’s really nice to see women, especially those who hesitated at first, smile when they have published their first article.”

Photo by Ruby Mizrahi / Wikimedia Foundation, CC BY-SA 3.0.

Interview by Ruby Mizrahi, Interviewer
Profile by Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation

by Ruby Mizrahi and Samir Elsharbaty at September 01, 2017 05:46 PM

Wiki Education Foundation

So long and thanks for all the poutine!

This year’s Wikimania conference in Montréal was — for me — the most intense yet.

I discovered many years ago — after showing up to a few New York City Wikipedia meetups — that Wikimedians are my tribe. We’re nerds; we’re weird; some of us just can’t resist arguing about copyright for hours. Our community culture is far from perfect, but we’re part of it because we believe that knowledge is important, and that everyone should have access to it. The people who are really dedicated to that idea tend to stick around a long while. So each time I come to Wikimania, the density of people I know and love and have missed since my last Wikimania is even greater. I mostly remember it this year as an endless series of hugs.

Camelia and Sejal
Camelia and Sejal present their Dashboard work at the Wikimania Hackathon Showcase.

It started with the Wikimania Hackathon, where I finally got a chance to meet face-to-face with Sejal Khatri, whom I’d been working with remotely for nearly a year through the course of two internships. This was wonderful! Sejal worked on improving mobile support for the Dashboard, and together we helped Italian Wikipedian Camelia Boban make her first contribution to the Dashboard codebase. (Sejal blogged more about it here.)

For the main conference, my initial plan was to stick to the hallway track, but my curiosity got the better of me for more excellent sessions than I expected. From research to anti-harassment to fundraising to library outreach to activist campaigns like Art+Feminism, there are more exciting projects going on right now than ever — especially in the realm of nurturing first-time Wikipedia contributors. That’s been central to my work at Wiki Education (and at Wikimedia Foundation before that), and it was gratifying to find so much enthusiasm for the Dashboard system we’ve been building. We realized early on that many of the problems Wiki Education has been tackling are not unique to our programs, but adapting the technology for more general and global uses has been a long process. Programs & Events Dashboard is a project that kicked off two years ago at the Wikimania 2015 hackathon, but only in 2017 has it come far enough along that it’s really hitting its stride. I’m excited — and with all the Wikimaniacs I talked to who plan to use it, more than a little nervous — to see where it goes by Wikimania 2018.

Image: Sejal Khatri and Sage Ross, by VGrigas (WMF) (Own work) CC BY-SA 3.0, via Wikimedia Commons

by Sage Ross at September 01, 2017 03:53 PM

Weekly OSM

weeklyOSM 371



OpenStreetMap 13th Anniversary mapathon in Latin America, here: Mexico [1] | © Mariam Gonzalez – OpenStreetMap México – Geochicas 😉

About us

  • We are always looking for people to help us improve our newsletter so it can get out faster, have more depth and coverage, and generally improve it for our readers, like you. Please join our team by contacting us now, it’s fun! 😉


  • Pierre Béland noted on Twitter that winter roads on frozen rivers, lakes and seas (ice_road=yes) are shown as normal roads in all renderers. At least 7,600 of these roads are mapped on OpenStreetMap, for example in Tuktoyaktuk.
  • Andreas Bürki shared his grievances on the Talk-ch mailing list about a mapper who deleted separately mapped sidewalks in Bern. This sparked a discussion about the mapping of sidewalks.
  • Brian May calls on the Talk-US mailing list for “all TIGER fix up junkies” to focus on the coastal areas of Texas. If you want to join, please read the tips on Reddit and the OSM wiki.
  • The satellite image layer “Esri World Imagery” can now also be used as a data source for mapping and is available in the OSM editors. See also entry in the ArcGIS blog. In Germany it contains imagery of Rhineland-Palatinate, Thuringia and North Rhine-Westphalia, for which permission for use had not previously been granted (translation).
  • [1] User mapeadora writes a detailed report of the mapathon for the 13th anniversary of OSM Latin America. Further information can be found on the event’s wiki page (in Spanish).
  • Satoshi (user nyampire) published an idea for a review workflow using OSMCha and asks for feedback. Note that the current version of the iD editor does indeed support “review requested”.
  • Jochen Topf looks back in his blog at his efforts to remap and fix “old style” multipolygons.


  • Some users of the German OSM Forum expressed surprise (de) (automatic translation) about some of the recipients of the OSM Awards. There was also a suggestion that people who do work in OSM without getting paid (volunteers) should get awards, rather than people for whom OSM is a “day job”. Everyone’s favourite source of OSM snark, Anonymaps, says something similar in a tweet.
  • Steve Bennett wrote an article about how easy it is (or isn’t) to combine the various vector tile options when creating a new vector map for OpenStreetMap.

OpenStreetMap Foundation

  • The user “chdr” added and modified about 75,000 street names in various countries. Artefacts (in some cases actual source tags) show that some information came from Google Maps, and the contributor refused to confirm the source of the data. Frederik Ramm from the Data Working Group announced the redaction of these changes on the Talk and Talk-US mailing lists. A large discussion ensued about where it would be possible to replace these names with data from acceptable sources, how that should take place, and whether that is even possible with a “redaction”.
  • On 24 August a public meeting of the OSMF board took place. They talked about the application of FOSSGIS e.V. to become a local chapter in Germany, among other topics. The minutes have not yet been published in full.


  • Geofabrik extends an invitation to an OpenStreetMap Hack Weekend on 21 and 22 October in Karlsruhe. You can sign up on the OSM wiki.
  • On the Swiss mailing list (de) Stefan Keller suggests keeping a date in your diary for the Open Tourism Data Hackdays on 27/28th October 2017 in Arosa. It is organised by the OpenData club and the SOSM, the Swiss Association of OpenStreetMap. There will also be a mapping party.
  • OSM and FOSSGIS e.V. will be present at the Intergeo 2017 from September 26th to 28th at the “Berlin Exhibition Grounds” (Hall 5, Booth B5.044).
  • Shunnosuke Shimizu created a short video of SotM 2017 in Aizu-Wakamatsu (Japan), which ends with an invitation for all to join SotM 2018 in Milan (Italy).

Humanitarian OSM

  • If you are a HOT contributor or otherwise involved with HOT, Laura Salzmann asks that you complete a short survey to help with her dissertation “Investigating motivations of humanitarian OpenStreetMap contributors and NGO use of contributed data”. The ultimate goal is to compare the expectations from within the project to those of NGOs that go on to use the maps that we produce.
  • Under the headline “Save people from your chair”, the Dutch Red Cross calls for help with OSM.


  • Christoph Hormann, an OSM-Carto maintainer, looks back on the last year of its development and ventures a look into the future.


  • OSM user “apm-wa” (who in his “day job” is the US Ambassador to Turkmenistan) is happy that two taxi companies in Ashgabat are now using OpenStreetMap. The official app of the Asian Indoor and Martial Arts Games, which take place in September in Ashgabat, will also use OSM maps.
  • opendigitalradio.org uses a Leaflet map to show the locations of their local DAB+ (Digital Audio Broadcasting) station equipment that is already in use.

Open Data

  • In Sweden, public authorities are switching to the CC0 license model. They are starting with TK50 satellite photos (including height data), the street register, and others. (automatic translation)


  • The board of the OSM Foundation has accepted the Geocoding Guidelines. The guidelines explain how the OSMF interprets the Open Database License 1.0 in relation to geocoding.


  • Vincent Privat invites Osmose developers on Dev-fr to review JOSM tickets and discuss: 1) exporting the validation messages to an xml/json file, and 2) developing a standalone JOSM validator (jar) for the command line, without a GUI.
  • road.cc introduces “Bike Citizens” as their cycling app of the week. The free OSM-based bicycle router can be upgraded with city maps or route suggestions. User comments speak of some odd routing behaviour though.


  • User -karlos- does not want to add new features to his 3D renderer OSM go any more.
  • Wille Marcel, author of OSMCha, suggests on GitHub to add an API for user blocks to the OSM API.
  • User fannymonori writes the final report for her Google Summer of Code project, which consisted of implementing an OpenGL-based renderer for the libosmscout library.
  • User SomeoneElse wrote an OSM diary entry explaining how to use a Microsoft Azure cloud server to render OpenStreetMap map tiles.


Since version 2.4.0, the web-based iD editor can also access the ESRI World Imagery layer. Some speed-ups and new presets were also part of the update.

Other “geo” things

  • Astun Technology announced via Twitter that all those who missed Steven Feldman’s very humorous presentation on #FakeMaps at FOSS4G can now catch up on it on the Internet.
  • Andy Mabbett draws attention to a proposal for a new HTML element for easier embedding of maps into web sites.
  • aharvey shows his bicycle setup for 360-degree Mapillary photos.
  • Still demand for it? Apparently, there is – Garmin issues the eighth version of the “Garmin Topo Germany PRO” at a price of 129.00 €. Hint: Map white spots to get them coloured 😉

Upcoming Events

Where What When Country
Freital Elbe-Labe-Meeting 2017 2017-09-01-2017-09-03 germany
Dortmund Mappertreffen Dortmund 2017-09-03 germany
Essen Mappertreffen Essen 2017-09-05 germany
Rostock Rostocker Treffen 2017-09-05 germany
Stuttgart Stuttgarter Stammtisch 2017-09-06 germany
Praha/Brno/Ostrava Kvartální pivo 2017-09-06 czech republic
Tokyo 東京!街歩き!マッピングパーティ:第11回 清澄庭園 2017-09-09 japan
Passau Mappertreffen 2017-09-11 germany
Rennes Réunion mensuelle 2017-09-11 france
Lyon Rencontre mensuelle ouverte 2017-09-12 france
Berlin 111. Berlin-Brandenburg Stammtisch 2017-09-14 germany
Zaragoza Mapping Party #Zaccesibilidad Arrabal, Mapeado Colaborativo 2017-09-16 spain
Moscow Big Schemotechnika 11 2017-09-16 russia
Nishinomiya 【西国街道#09・最終回】西宮郷・酒蔵マッピングパーティ 2017-09-16 japan
Nara 防災トレジャーハンター~自分を守り、地域を守る!~(UDC2017) 2017-09-16 japan
Rennes Cartographie collaborative du Musée de Bretagne, pour les Journées européennes du patrimoine 2017-09-16-2017-09-17 france
Nantes Participation aux Journées européennes du patrimoine à l’École de Longchamp 2017-09-16-2017-09-17 france
Bonn Bonner Stammtisch 2017-09-19 germany
Lüneburg Mappertreffen Lüneburg 2017-09-19 germany
Nottingham Nottingham Pub Meetup 2017-09-19 united kingdom
Scotland Pub meeting, Edinburgh 2017-09-19 united kingdom
Patan State of the Map Asia 2017 2017-09-23-2017-09-24 nepal
Boulder State of the Map U.S. 2017 2017-10-19-2017-10-22 united states
Buenos Aires FOSS4G+State of the Map Argentina 2017 2017-10-23-2017-10-28 argentina
Brussels FOSS4G Belgium 2017 2017-10-26 belgium
Lima State of the Map LatAm 2017 2017-11-29-2017-12-02 perú
Bonn FOSSGIS 2018 2018-03-21-2018-03-24 germany

Note: If you would like to see your event here, please add it to the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Anne Ghisla, Nakaner, PierZen, Polyglot, Spec80, TheFive, YoViajo, derFred, jcoupey, jinalfoflia, k_zoar, kreuzschnabel, muramototomoya.

by weeklyteam at September 01, 2017 06:10 AM

Wiki Loves Monuments

Starting Wiki Loves Monuments 2017

On September 1, Wikimedia volunteers around the world will start the 2017 edition of Wiki Loves Monuments. This is the eighth year of the world’s largest photo competition, with over 45 countries participating this year.

You can join the competition by submitting a photograph of a nationally registered monument on Wikimedia Commons before September 30, following the instructions for your country. Your photograph will help illustrate the more than 1.4 million monuments on Wikipedia, and help more people around the world learn about the history and national heritage of your country. The winning entries in Wiki Loves Monuments can receive national and international prizes, as well as significant exposure.

Wiki Loves Monuments is an annual photo competition celebrating built cultural heritage. It is organized by volunteers around the world, and the top ten photographs from each country are selected for an international finale. The winners from 2016 included a district court in Berlin, a concert hall in London, as well as other stunning monuments from the United Kingdom, Italy, Thailand, Pakistan, Brazil, and elsewhere. These may be regular sights for some people, but thanks to Wiki Loves Monuments photographers, they will be better documented on Wikipedia and more accessible to everyone around the world, free of cost, forever.

“The photo was taken in an area where one side of my family is from,” explains Albrecht Landgraf, one of the winning photographers from Wiki Loves Monuments 2016, who took a beautiful photo of a so-called “Devil’s Bridge” in Gablenz, Germany. “While all relatives in this area passed away years ago, the remaining family spread out all over Germany decided to come together and take a little road trip and explore the area. That’s when we found this little gem (among others) in a small village almost next to the road.”

In September 2017, Wiki Loves Monuments is going even further with a few significant opportunities:

  • Flickr is joining us to organize a worldwide photo walk throughout the month of September, with a focus on built heritage for Wiki Loves Monuments.
  • Thanks to monument details on Wikidata, it’s easier than ever to find a new place to photograph. You can search a map of nearby monuments and easily join the competition at maps.wikilovesmonuments.org. The map is a beta release this year — a few countries are fully supported, and we plan to make more data available next year with the help of national organizers.
  • This year, the grand awards for the international competition will include a prize selection of over €9,000 in-kind, including a Canon 5D Mark IV, graciously donated by an anonymous donor.

Wiki Loves Monuments is built on three simple criteria. First, all the photos are freely licensed, like other contributions to Wikipedia and Wikimedia Commons. Giving the public permission to share these photos ensures that the results can remain widely available forever. Second, all of the photos must contain an identified monument, like a building or artwork of historic significance – we want to know what heritage is in the photo, so that we can actually use it. Each country maintains a list of registered historic sites that are eligible for the competition. Third, the photo must be uploaded in the month of September (except in Iran and Israel, where the competition ends at a later date). You are always welcome to contribute your photography to Wikimedia Commons, but photos uploaded before or after the month of September may not be considered for the competition. If you would like more details on Wiki Loves Monuments in your country, you can visit wikilovesmonuments.org/participate.

Whether you are going on a trip to visit someplace new, sharing a wonderful photo from a holiday trip many years ago, or taking a quick picture of a landmark where you live, we’re excited to see your photographs.

Good luck!


by Stephen LaPorte at September 01, 2017 01:29 AM

August 31, 2017

Wiki Education Foundation

Wikipedia and the quest for legitimacy

This week, Wiki Education is at the American Political Science Association’s (APSA) annual meeting in San Francisco. We’re in the exhibit hall speaking with attendees and APSA members about how they can enhance Wikipedia’s coverage of political science by replacing a traditional term paper with a Wikipedia assignment. In this way, students channel their research and writing into a project that informs the public about public policy and political theory.

This year’s conference theme feels particularly suitable for Wikipedia: The Quest for Legitimacy. Just this morning, a political science professor’s teenage daughter approached us to ask about our work. “Teachers always tell us not to use Wikipedia, but it’s the best website I’ve ever seen, so I basically ignore them,” she joked. “I plan to use Wikipedia until the day I die.” I hear this seemingly contradictory ethos on nearly a monthly basis: Wikipedia is not legitimate; I use Wikipedia every day.

This raises the question: what is legitimacy?

For a website that aims to inform the masses while citing reputable, reliable sources that let readers verify its information, Wikipedia seems to be doing pretty well for itself. The site gets more than 500 million unique readers per month. These people come to Wikipedia in search of an overview of topics they’re trying to learn more about, and they often find it. No, they do not find original research from scholars proposing solutions to the world’s problems, but Wikipedia has never claimed to be an academic journal. Nor is Wikipedia a newspaper reporting world events. Rather, it’s an encyclopedia serving as a tertiary summary of the robust literature in those academic journals and newspapers. Wikipedia is only as good as its sources.

And that’s where the real problem with legitimacy comes in, but it’s rarely the one scholars approach me to dispute: Wikipedia is not always an accurate, updated reflection of the academy. The available content is usually good, but what of the content that Wikipedia’s volunteer contributors never write (or haven’t yet)? Why don’t we include the most recent peer-reviewed literature so the world’s context can progress at the same rate as the academy’s?

I think the answer is largely a technical one. Wikipedians cannot synthesize the most cutting-edge research if it lives behind paywalls. Most academic disciplines have incorporated intersectional lenses and frameworks in the last few decades because the profession has become somewhat more diverse in regard to race, gender, sexuality, class, etc. But that information has not yet made its way into the canon, so we cannot be surprised when marginalized voices are missing from the encyclopedia while outdated historiographies persist.

Until the entire industry joins the open access community and publishes research free of restrictions, I see two options for Wikipedia to improve its representation of scientists’ current understanding of the world. Option A: give existing Wikipedians access to closed-access journals; Option B: teach people who already have access to journals the skills and Wikipedia know-how to summarize that research and share it with the world. Wiki Education has taken both approaches, and we want to expand our services to even more people. We work with universities that sponsor library access in our Visiting Scholars program, and we support instructors who assign students to write Wikipedia articles in our Classroom Program, teaching them how to be successful along the way. At the APSA conference, we hope to meet people who can help us further both objectives.

By joining Wiki Education’s programs, academics can help bridge the knowledge gap between the people who read academic journals and the people who read Wikipedia. If the academy’s purpose is to inform the world and help humans make scientifically sound decisions, yet that knowledge is limited to a privileged few within a closed community, then I ask you: is the academy still legitimate?

by Jami Mathewson at August 31, 2017 11:59 PM

Wikimedia Scoring Platform Team

More/better model information and "threshold optimizations"

Today, I'm writing to announce a breaking change in ORES that will come out about a month from now. It will only change how information about prediction models is stored and reported. This information is used by some tools to set thresholds at specified levels of confidence (e.g. "give me the threshold that gives 90% recall"). In this blog post, I'll explain how this is currently done and how it will be done once we deploy the change.

While you read through these examples, you can experiment with https://ores.wikimedia.org (current behavior) and https://ores-misc.wmflabs.org (new behavior). These systems will stay in this state until we deploy the newer version to production (probably around Sept. 20th).

Why you need model_info

So, let's say you are going to use ORES to supply your counter-vandalism tool with "damaging" edit predictions. A prediction looks like this:

"damaging": {
  "score": {
    "prediction": true,
    "probability": {
      "false": 0.04445904933523648,
      "true": 0.9555409506647635
    }
  }
}
That "probability" looks interesting. You'd be tempted to assume that it corresponds to some operational metric of model fitness. E.g. "There's a 95.5% chance that this edit is damaging!" but regretfully, you'd be wrong. This "probability" is a useful measure of the model's confidence but not a useful measure of how the model will work against a stream of new edits from the recent changes feed. It turns out that operational metrics for classifiers like this one are all drawn around thresholds. In truth, you get ~95% precision when you set a threshold at 93% "probability".

This gets even more complicated when you want to set thresholds based on other statistics, e.g. "recall", which is the measure of how much of a target class you match. In vandal patrolling work, we want to make sure that we catch most (if not all) of the vandalism. There are steep tradeoffs in classifiers if we ask for perfection, so let's just set a high bar at 90% recall -- catching 90% of the most egregious vandalism. Where should you set your "probability" threshold in order to do that? It turns out that you should set it at 0.09. Using this, you'll have to review less than 1/5th of the incoming edits and you'll be guaranteed to catch 90% of the damaging edits.
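To make the recall/threshold tradeoff concrete, here is a minimal sketch of how such a threshold could be derived from scored, labeled edits. The helper and the data are synthetic illustrations, not ORES code (ORES precomputes these statistics from test data at model train time):

```python
# Sketch: find the lowest "probability" threshold that still catches at
# least min_recall of the damaging edits, and report the filter rate
# (the share of incoming edits you can skip reviewing).
def threshold_for_recall(scored_edits, min_recall=0.9):
    """scored_edits: (damaging_probability, actually_damaging) pairs."""
    total_damaging = sum(1 for _, damaging in scored_edits if damaging)
    # Walk thresholds from high to low; stop once enough damage is caught.
    for threshold in sorted({p for p, _ in scored_edits}, reverse=True):
        caught = sum(1 for p, d in scored_edits if d and p >= threshold)
        if caught / total_damaging >= min_recall:
            flagged = sum(1 for p, _ in scored_edits if p >= threshold)
            return threshold, 1 - flagged / len(scored_edits)
    return 0.0, 0.0

# Synthetic scores: (probability of damaging, true label).
edits = [(0.95, True), (0.70, True), (0.40, True), (0.20, True), (0.09, True),
         (0.30, False), (0.10, False), (0.08, False), (0.03, False), (0.01, False)]
threshold, filter_rate = threshold_for_recall(edits, min_recall=0.9)
# With this toy data: threshold 0.09, filter rate ~0.3
```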

The act of finding the confidence threshold at a specified fitness level is something that we call a threshold optimization and it's something that all of our users want to be able to do. We've been providing this information in a limited and inflexible way for a long time. But this change will make gathering information about a model in a machine-readable way much much easier.

Current model_info behavior

Currently, model_info is static. You can request it by adding ?model_info to your URLs. E.g. https://ores.wikimedia.org/v2/scores/enwiki/damaging?model_info This model information is generated at the time that the model is trained and includes a static set of statistics and threshold optimizations. Here's an example of a threshold optimization for the English Wikipedia damaging model:

"filter_rate_at_recall(min_recall=0.9)": {
  "false": {
    "filter_rate": 0.121,
    "recall": 0.9,
    "threshold": 0.547
  },
  "true": {
    "filter_rate": 0.743,
    "recall": 0.908,
    "threshold": 0.148
  }
}

This block of data says that you can select all edits that score above 0.148 "probability" and expect to catch 91% of the damaging edits.

In order to provide useful thresholds to ORES users, we'd specify them at model train/test time. At first, we had three threshold optimizations specified: filter_rate_at_recall(min_recall=0.9), filter_rate_at_recall(min_recall=0.75), and recall_at_precision(min_precision=0.9). These corresponded roughly to "needs review", "likely damaging", and "almost certainly damaging", respectively.

After working with the Collaboration Team on the new RC Filters system for patrolling Special:RecentChanges, the list of threshold optimizations ballooned to include: recall_at_fpr(max_fpr=0.1), recall_at_precision(min_precision=0.15), recall_at_precision(min_precision=0.45), recall_at_precision(min_precision=0.6), recall_at_precision(min_precision=0.75), recall_at_precision(min_precision=0.98), recall_at_precision(min_precision=0.99), and recall_at_precision(min_precision=0.995). This was getting out of control.

So I started work on a new task T162217: Implement "thresholds", deprecate "pile of tests_stats". See the description for a discussion I had with @Catrope to make sure I understood what he and his team needed.

New model_info behavior

So, I hadn't planned on this work, but I thought dealing with it was a really good idea. After all, it would make our users' lives easier and my life easier, because I wouldn't need to re-train the models every time a new threshold optimization was needed. I could also take this opportunity to implement some important revscoring work I'd been putting off, e.g. T160223: Store the detailed system information inside of model files, T172566: Include label-specific schemas with model_info, and T163711: Use our own scoring models in the `tune` utility. A couple of weekends, a holiday, and a hackathon later, I had something that worked. Fun story: I actually fully implemented the system several times, deciding each time to refactor and re-engineer the model_info system entirely. This allowed me to iteratively reduce complexity and coupling.

The new system can currently be tested at https://ores-misc.wmflabs.org. When we ask for ?model_info, we see something that's a little different. I'll make time in other blog posts to talk about 'environment' and 'score_schema'. For now, I just want to talk about 'statistics', which replaces 'test_stats'.

Digging into "statistics"

The first thing that is different is that we now generate aggregate statistics across output labels.

old (query):

"f1": {
  "OK": 0.99,
  "attack": 0.136,
  "spam": 0.586,
  "vandalism": 0.341
}

new (query):

"f1": {
  "labels": {
    "OK": 0.974,
    "attack": 0.136,
    "spam": 0.586,
    "vandalism": 0.341
  },
  "macro": 0.509,
  "micro": 0.962
}

A macro-average of the label statistics is just a simple average across the reported statistic for each label: (0.974 + 0.136 + 0.586 + 0.341) / 4 = 0.509. The micro-average is weighted by the number of observations. Since the "OK" class is far more common than any other and gets a relatively high f1 score, the micro-average is much higher than the macro-average.
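To make the arithmetic concrete, here's a small sketch reproducing the macro-average from the numbers above. The observation counts behind the micro-average aren't shown in the post, so the counts below are hypothetical and only illustrate the intuition:

```python
# f1 scores per label, copied from the "new" response above.
f1_by_label = {"OK": 0.974, "attack": 0.136, "spam": 0.586, "vandalism": 0.341}

# Macro-average: a simple unweighted mean across labels.
macro = sum(f1_by_label.values()) / len(f1_by_label)
print(round(macro, 3))  # -> 0.509

# Hypothetical counts (not from the post): "OK" dominates, so a
# count-weighted average lands near its high f1, which is why the
# micro-average (0.962) sits far above the macro-average.
counts = {"OK": 9500, "attack": 100, "spam": 250, "vandalism": 150}
weighted = sum(f1_by_label[l] * counts[l] for l in counts) / sum(counts.values())
print(round(weighted, 3))
```

Note that a true micro-averaged f1 is computed from pooled per-label counts rather than by weighting the per-label f1 scores, so the weighted average here is only an approximation of the intuition.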

All types of statistics now have these aggregates by default.

Digging into "thresholds"

OK, so what about the thresholds thing that is the whole premise of this blog post? Well, I think you're going to like this. I've built a lightweight querying system into the abstract concept of "thresholds" that will allow you to get whatever threshold you like -- so long as your strategy for getting it involves maximizing one statistic ("maximum filter_rate") while holding another constant ("@ recall >= 0.9").

?model_info=statistics.thresholds.true."maximum filter_rate @ recall >= 0.9":

"thresholds": {
  "true": [
    {
      "!f1": 0.883,
      "!precision": 0.996,
      "!recall": 0.794,
      "accuracy": 0.797,
      "f1": 0.233,
      "filter_rate": 0.77,
      "fpr": 0.206,
      "match_rate": 0.23,
      "precision": 0.134,
      "recall": 0.901,
      "threshold": 0.09295862121864444
    }
  ]
}

Here, you can see that we get the same information back, but we're allowed to choose arbitrary optimizations and have the system report back to us where we should place our thresholds.
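Since these optimization expressions contain spaces, quotes, and ">=", they need percent-encoding when placed in a URL. Here's a hypothetical helper for building such a query string; the exact escaping ORES expects is an assumption on my part, so treat this as a sketch:

```python
from urllib.parse import quote

def thresholds_query(label, optimization):
    """Build a model_info thresholds query like the one shown above.
    Hypothetical helper, not part of the ORES client libraries."""
    expr = 'statistics.thresholds.{}."{}"'.format(label, optimization)
    return "?model_info=" + quote(expr, safe=".")

print(thresholds_query("true", "maximum filter_rate @ recall >= 0.9"))
```

Multiple optimizations can be joined with "|" (itself percent-encoded) in the same way.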

I asked @Catrope to put together a task for me to demo how I'd use this system to get the optimizations he needs. See T173019. This will require me to request multiple optimizations at the same time. Here's the full URL:

?model_info=statistics.thresholds.true."maximum filter_rate @ recall >= 0.9"|statistics.thresholds.true."maximum recall @ precision >= 0.15"

Which gives us:

"thresholds": {
  "true": [
    {
      "!f1": 0.883,
      "!precision": 0.996,
      "!recall": 0.794,
      "accuracy": 0.797,
      "f1": 0.233,
      "filter_rate": 0.77,
      "fpr": 0.206,
      "match_rate": 0.23,
      "precision": 0.134,
      "recall": 0.901,
      "threshold": 0.09295862121864444
    },
    {
      "!f1": 0.906,
      "!precision": 0.993,
      "!recall": 0.833,
      "accuracy": 0.834,
      "f1": 0.256,
      "filter_rate": 0.81,
      "fpr": 0.167,
      "match_rate": 0.19,
      "precision": 0.151,
      "recall": 0.838,
      "threshold": 0.14750910213671917
    }
  ]
}

So there you have it! There's lots more you can do with this model_info system, but we'll need to save that for another blog post. For now, let us know if you have concerns with the new threshold optimization scheme.

The deployment plan

This announcement blog post is the first step of our deployment plan. Over the next week, we'll be reaching out to @Catrope, @Petrb, @Ragesoss, and other developers who use ORES to make sure that they know this change is coming. A week from now (Sept. 5th), we'll deploy the new model_info system to https://ores.wmflabs.org and https://ores-beta.wmflabs.org. Then we'll wait at least two weeks and confirm that adaptations have been made to the tools that we know about before finally deploying to https://ores.wikimedia.org (~Sept. 20th).

by Halfak (Aaron Halfaker, EpochFail, halfak) at August 31, 2017 09:22 PM

Semantic MediaWiki

Semantic MediaWiki 2.5.4 released/en

August 7, 2017

Semantic MediaWiki 2.5.4 (SMW 2.5.4) has been released today as a new version of Semantic MediaWiki.

This new version brings a security fix for the "SemanticMediaWiki" special page. It also provides an improvement for software testing, other bugfixes, and further increases to platform stability. Since this release provides a security fix, it is strongly advised to upgrade immediately! Please refer to the help page on installing Semantic MediaWiki for detailed instructions on how to install or upgrade.

by TranslateBot at August 31, 2017 09:15 PM

Semantic MediaWiki 2.5.4 released

August 7, 2017

Semantic MediaWiki 2.5.4 (SMW 2.5.4) has been released today as a new version of Semantic MediaWiki.

This new version brings a security fix for the "SemanticMediaWiki" special page. It also provides an improvement for software testing, other bugfixes, and further increases to platform stability. Since this release provides a security fix, it is strongly advised to upgrade immediately! Please refer to the help page on installing Semantic MediaWiki for detailed instructions on how to install or upgrade.

by Kghbln at August 31, 2017 09:12 PM

Wikimedia Scoring Platform Team

Status update (April 14th, 2017)

In this update, I'm going to change some things up to try and make this update easier for you to consume. The biggest change you'll notice is that I've broken up the [#] references in each section. I hope that saves you some scrolling and confusion. You'll also notice that I have changed the subject line from "Revision scoring" to "Scoring Platform" because it's now clear that, come July, I'll be leading a new team with that name at the Wikimedia Foundation. There'll be an announcement about that coming once our budget is finalized. I'll try to keep this subject consistent for the foreseeable future so that your email clients will continue to group the updates into one big thread.

Deployments & maintenance:

In this cycle, we've gotten better at tracking our deployments and noting what changes go out with each deployment. You can click on the phab task for a deployment and observe the sub-tasks to find out what was deployed. We've had three deployments of ORES since mid-March[1,2,3] and two deployments of Wikilabels[4,5], and we've added a maintenance notice for a short period of downtime that's coming up on April 21st[6,7].

  1. https://phabricator.wikimedia.org/T160279 -- Deploy ores in prod (Mid-March)
  2. https://phabricator.wikimedia.org/T160638 -- Deploy ORES late march
  3. https://phabricator.wikimedia.org/T161748 -- Deploy ORES early April
  4. https://phabricator.wikimedia.org/T161002 -- Late march wikilabels deployment
  5. https://phabricator.wikimedia.org/T163016 -- Deploy Wikilabels mid-April
  6. https://phabricator.wikimedia.org/T162888 -- Add header to Wikilabels that warns of upcoming maintenance.
  7. https://phabricator.wikimedia.org/T162265 -- Manage wikilabels for labsdb1004 maintenance

Making ORES better:

We've been working to make ORES easier to extend and more useful. ORES now reports its relevant versions at https://ores.wikimedia.org/versions[8]. We've also reduced the complexity of our "precaching" system that scores edits before you ask for them[9,10]. We're taking advantage of logstash to store and query our logs[11]. We've also implemented some nice abstractions for requests and responses in ORES[12] that allowed us to improve our metrics tracking substantially[13].

  1. https://phabricator.wikimedia.org/T155814 -- Expose version of the service and its dependencies
  2. https://phabricator.wikimedia.org/T148714 -- Create generalized "precache" endpoint for ORES
  3. https://phabricator.wikimedia.org/T162627 -- Switch /precache to be a POST end point
  4. https://phabricator.wikimedia.org/T149010 -- Send ORES logs to logstash
  5. https://phabricator.wikimedia.org/T159502 -- Exclude precaching requests from cache_miss/cache_hit metrics
  6. https://phabricator.wikimedia.org/T161526 -- Implement ScoreRequest/ScoreResponse pattern in ORES

New functionality:

In the last month and a half, we've added basic support for Korean Wikipedia[14,15]. Props to Revi for helping us work through a bunch of issues with our Korean language support[16,17,18].

We've also gotten the ORES Review tool deployed to Hebrew Wikipedia[19,20,21,22] and Estonian Wikipedia[23,24,25]. We're working with the Collaboration team to implement the threshold test statistics that they need to tune their new Edit Review interface[26], and we're working toward making this kind of work self-serve so that the product team and other tool developers won't have to wait on us to implement these threshold stats in the future[27].

  1. https://phabricator.wikimedia.org/T161617 -- Deploy reverted model for kowiki
  2. https://phabricator.wikimedia.org/T161616 -- Train/test reverted model for kowiki
  3. https://phabricator.wikimedia.org/T160752 -- Korean generated word lists are in chinese
  4. https://phabricator.wikimedia.org/T160757 -- Add language support for Korean
  5. https://phabricator.wikimedia.org/T160755 -- Fix tokenization for Korean
  6. https://phabricator.wikimedia.org/T161621 -- Deploy ORES Review Tool for hewiki
  7. https://phabricator.wikimedia.org/T130284 -- Deploy edit quality models for hewiki
  8. https://phabricator.wikimedia.org/T160930 -- Train damaging and goodfaith models for hewiki
  9. https://phabricator.wikimedia.org/T130263 -- Complete hewiki edit quality campaign
  10. https://phabricator.wikimedia.org/T159609 -- Deploy ORES review tool to etwiki
  11. https://phabricator.wikimedia.org/T130280 -- Deploy edit quality models for etwiki
  12. https://phabricator.wikimedia.org/T129702 -- Complete etwiki edit quality campaign
  13. https://phabricator.wikimedia.org/T162377 -- Implement additional test_stats in editquality
  14. https://phabricator.wikimedia.org/T162217 -- Implement "thresholds", deprecate "pile of tests_stats"

ORES training / labeling campaigns:

Thanks to a lot of networking at the Wikimedia Conference and some help from Ijon (Asaf Bartov), we've found a bunch of new collaborators to help us deploy ORES to new wikis. As is critical in this process, we need to deploy labeling campaigns so that Wikipedians can help us train ORES.

We've got new editquality labeling campaigns deployed to Albanian[28], Finnish[29], Latvian[30], Korean[31], and Turkish[32] Wikipedias.

We've also been working on a new type of model: "Item quality" in Wikidata. We've deployed, labeled, and analyzed a pilot[33], fixed some critical bugs that came up[34,35], and we've finally launched a 5k item campaign which is already 17% done[36]! See https://www.wikidata.org/wiki/Wikidata:Item_quality_campaign if you'd like to help us out.

  1. https://phabricator.wikimedia.org/T161981 -- Edit quality campaign for Albanian Wikipedia
  2. https://phabricator.wikimedia.org/T161905 -- Edit quality campaign for Finnish Wikipedia
  3. https://phabricator.wikimedia.org/T162032 -- Edit quality campaign for Latvian Wikipedia
  4. https://phabricator.wikimedia.org/T161622 -- Deploy editquality campaign in Korean Wikipedia
  5. https://phabricator.wikimedia.org/T161977 -- Start v2 editquality campaign for trwiki
  6. https://phabricator.wikimedia.org/T159570 -- Deploy the pilot of Wikidata item quality campaign
  7. https://phabricator.wikimedia.org/T160256 -- Wikidata items render badly in Wikilabels
  8. https://phabricator.wikimedia.org/T162530 -- Implement "unwanted pages" filtering strategy for Wikidata
  9. https://phabricator.wikimedia.org/T157493 -- Deploy Wikidata item quality campaign

Bug fixing:

As usual, a few weird bugs got in our way. We needed to move to a bigger virtual machine in "Beta Labs" because our models take up a bunch of hard drive space[37]. We found that Wikilabels wasn't removing expired tasks correctly, which was making it difficult to finish labeling campaigns[38]. We also had a lot of right-to-left issues when we did an upgrade of OOjs UI[39]. And we fixed an old bug with https://translatewiki.net in one of our message keys[40].

  1. https://phabricator.wikimedia.org/T160762 -- deployment-ores-redis /srv/ redis is too small (500MBytes)
  2. https://phabricator.wikimedia.org/T161521 -- Wikilabels is not cleaning up expired tasks for Wikidata item quality campaign
  3. https://phabricator.wikimedia.org/T161533 -- Fix RTL issues in Wikilabels after OOjs UI upgrade
  4. https://phabricator.wikimedia.org/T132197 -- qqq for a wiki-ai message cannot be loaded

Principal Research Scientist
Head of the Scoring Platform Team

(This post was copied from https://lists.wikimedia.org/pipermail/ai/2017-April/000154.html)

by Halfak (Aaron Halfaker, EpochFail, halfak) at August 31, 2017 08:25 PM

Wiki Education Foundation

Learning collaboration with style and research skills

Carie S. Tucker King is a Clinical Professor of Communication at The University of Texas at Dallas. In this post, she shares how she used Wikipedia in her course, “Advanced Writing and Research.”

Carie King

“Advanced Writing and Research” is a senior-level prescribed elective that is open to students pursuing any major. So this year, when I checked my enrollment and found a diverse list of majors, I knew I needed to find a project that all students would want to complete. I also knew that my students were using Wikipedia; however, I doubted that they understood the rigor of the research and writing that Wikipedia articles required.

I had already noted that Wikipedia was missing an article on a topic I was researching. I had obtained the archives of the Danforth Foundation, a nonprofit organization started by the founder of the Ralston Purina pet food company, and I was particularly interested in the investment that Mr. William Danforth had made in all-faith chapels on university and college campuses.

I proposed the project to the students in my class, explaining that we would supplement our study of advanced writing with the tutorials available through Wiki Education. The students were eager to experiment with Wikipedia. The students—a mix of historical studies, literary studies, public policy and economics, and interdisciplinary and international studies majors—were juniors, seniors, and graduate students. The students were also from a variety of ethnic backgrounds and nations: from Japan, Jordan, India, and the U.S.

We determined to address all 24 all-faith chapels in the Danforth Chapel Program, so students chose which chapels were most interesting to them, based on the buildings’ histories, universities, locations, and student populations. For example, a student whose family came from Japan was interested to learn that a Japanese university houses an all-faith chapel funded by the Danforth Foundation, and he wanted to investigate that university.

The students also learned to write collaboratively, as we sat in the classroom and researched the Danforth Foundation and shared what we learned with each other. We proofread each other’s work and discussed the citation process and how it differed from the other citation styles that the students were using for their individual projects.

One of the students wrote me a thank-you card, stating that she never anticipated that she would write a full research paper and write a Wikipedia article. Another student expressed to me that she did not realize how exact Wikipedians must be and that she was delighted to have written an article on Wikipedia that she could list on her resume.

The students did not need to struggle to find a page to create or adapt, but they did need to learn more than one citation style—in addition to Wikipedia’s citation style, my students use APA, MLA, and Chicago in their writing—and they also needed to learn how to discuss their own writing and others’ writing as professionals. The students also struggled to find reliable resources and so learned to question if a source is primary or secondary, to question a source’s credibility, and to identify high-quality sources. In this way, they deepened their research experience past a study of methods to consider background and preliminary research and to ask what they could use and what they could trust.

For me, perhaps the best part of the assignment was hearing students talk about sharing their project with friends in other classes. The students were proud of their work published on Wikipedia. I also appreciated the students’ excitement once they began the project and overcame their initial fears. I enjoy seeing students struggle and then break free to run toward their goal, unabashedly eager to finish and finish well.

Image: CSTKing.jpg, by Carie King, CC BY-SA 4.0, via Wikimedia Commons.

by Guest Contributor at August 31, 2017 04:45 PM

Wikimedia Foundation

A ‘couple’ of Telugu-language Wikimedians: T Sujatha and Sri Ramamurthy

Photo by T Sujatha, CC BY-SA 4.0.

T Sujatha found her way to Wikipedia through a simple internet search—one that brought up a Wikipedia article written in her native Telugu language, the third-largest language in India, spoken by about 74 million people. After working to understand the site and what it was trying to achieve, Sujatha started editing in August 2006.

Some years later, Sujatha’s husband Sri Ramamurthy joined her in attending a Telugu Wikipedia conference. On the suggestion of one of the attendees, he took part in a “Wiki Academy” there—a crash course in how to edit the encyclopedia. He picked it up quickly, and soon started contributing to the Telugu Wikisource, a digital library for freely licensed and out-of-copyright works, as well.

Both are still highly active editors even today, something helped by the friendly environment in which they can work. The Telugu Wikipedia has only about 160 active editors, so “friendly companionship with fellow Wikipedians is essential,” both Sujatha and Ramamurthy say, for reasons not unlike the dynamics of living in a small town.

Sujatha has helped organize several meetups for Telugu Wikipedians, including anniversary celebrations for the site’s tenth and eleventh birthdays, and has participated in others—like the Lilavati’s Daughters edit-a-thon, based on a book published by the Indian Academy of Sciences that focused on 50 women in science. When not writing about notable women from history, she focuses her writing on tourist destinations, Mughal emperors, astrology, and religion. Sujatha’s best two articles, in her opinion, are those about Vancouver and Guatemala.

Ramamurthy works extensively with Rajasekhar1961, a bureaucrat on the Telugu Wikipedia and an administrator on two other Telugu-language Wikimedia sites, to select books for digitization on the Telugu Wikisource—a chronology of Nepalese history, for one. On Wikipedia, he’s concentrated his efforts on the smaller towns and villages for which there are no entries on the English Wikipedia.

Interviews by Muzammiluddin Syed, Wikimedia community member
Profile by Ed Erhart, Editorial Associate, Wikimedia Foundation

Thanks to Pavan Santhosh for facilitating these interviews.

by Syed Muzammiluddin and Ed Erhart at August 31, 2017 03:52 PM