en.planet.wikimedia

January 18, 2017

Greg Sabino Mullane

MediaWiki extension.json change in 1.25

I recently released a new version of the MediaWiki "Request Tracker" extension, which provides a nice interface to your RequestTracker instance, allowing you to view the tickets right inside of your wiki. There are two major changes I want to point out. First, the name has changed from "RT" to "RequestTracker". Second, it is using the brand-new way of writing MediaWiki extensions, featuring the extension.json file.

The name change rationale is easy to understand: I wanted it to be more intuitive and easier to find. A search for "RT" on mediawiki.org ends up finding references to the WikiMedia RequestTracker system, while a search for "RequestTracker" finds the new extension right away. Also, the name was too short and failed to indicate to people what it was. The "rt" tag used by the extension stays the same. However, to produce a table showing all open tickets for user 'alois', you still write:

<rt u='alois'></rt>

The other major change was to modernize it. As of version 1.25 of MediaWiki, extensions are encouraged to use a new system to register themselves with MediaWiki. Previously, an extension would have a PHP file named after the extension that was responsible for doing the registration and setup—usually by mucking with global variables! There was no way for MediaWiki to figure out what the extension was going to do without parsing the entire file, and thereby activating the extension. The new method relies on a standard JSON file called extension.json. Thus, in the RequestTracker extension, the file RequestTracker.php has been replaced with the much smaller and simpler extension.json file.

Before going further, it should be pointed out that this is a big change for extensions, and was not without controversy. However, as of MediaWiki 1.25 it is the new standard for extensions, and I think the project is better for it. The old way will continue to be supported, but extension authors should be using extension.json for new extensions, and converting existing ones over. As an aside, this is another indication that JSON has won the data format war. Sorry, XML, you were too big and bloated. Nice try YAML, but you were a little *too* free-form. JSON isn't perfect, but it is the best solution of its kind. For further evidence, see Postgres, which now has outstanding support for JSON and JSONB. I added support for YAML output to EXPLAIN in Postgres some years back, but nobody (including me!) was excited enough about YAML to do more than that with it. :)

The extension.json file asks you to fill in some standard metadata fields about the extension, which are then used by MediaWiki to register and set up the extension. Another advantage of doing it this way is that you no longer need to add a bunch of ugly include_once() function calls to your LocalSettings.php file. Now, you simply call the name of the extension as an argument to the wfLoadExtension() function. You can even load multiple extensions at once with wfLoadExtensions():

## Old way:
require_once("$IP/extensions/RequestTracker/RequestTracker.php");
$wgRequestTrackerURL = 'https://rt.endpoint.com/Ticket/Display.html?id';

## New way:
wfLoadExtension( 'RequestTracker' );
$wgRequestTrackerURL = 'https://rt.endpoint.com/Ticket/Display.html?id';

## Or even load three extensions at once:
wfLoadExtensions( array( 'RequestTracker', 'Balloons', 'WikiEditor' ) );
$wgRequestTrackerURL = 'https://rt.endpoint.com/Ticket/Display.html?id';

Note that configuration changes specific to the extension still must be defined in the LocalSettings.php file.

So what should go into the extension.json file? The extension development documentation has some suggested fields, and you can also view the canonical extension.json schema. Let's take a quick look at the RequestTracker/extension.json file. Don't worry, it's not too long.

{
    "manifest_version": 1,
    "name": "RequestTracker",
    "type": "parserhook",
    "author": [
        "Greg Sabino Mullane"
    ],
    "version": "2.0",
    "url": "https://www.mediawiki.org/wiki/Extension:RequestTracker",
    "descriptionmsg": "rt-desc",
    "license-name": "PostgreSQL",
    "requires" : {
        "MediaWiki": ">= 1.25.0"
    },
    "AutoloadClasses": {
        "RequestTracker": "RequestTracker_body.php"
    },
    "Hooks": {
        "ParserFirstCallInit" : [
            "RequestTracker::wfRequestTrackerParserInit"
        ]
    },
    "MessagesDirs": {
        "RequestTracker": [
            "i18n"
        ]
    },
    "config": {
        "RequestTracker_URL": "http://rt.example.com/Ticket/Display.html?id",
        "RequestTracker_DBconn": "user=rt dbname=rt",
        "RequestTracker_Formats": [],
        "RequestTracker_Cachepage": 0,
        "RequestTracker_Useballoons": 1,
        "RequestTracker_Active": 1,
        "RequestTracker_Sortable": 1,
        "RequestTracker_TIMEFORMAT_LASTUPDATED": "FMHH:MI AM FMMonth DD, YYYY",
        "RequestTracker_TIMEFORMAT_LASTUPDATED2": "FMMonth DD, YYYY",
        "RequestTracker_TIMEFORMAT_CREATED": "FMHH:MI AM FMMonth DD, YYYY",
        "RequestTracker_TIMEFORMAT_CREATED2": "FMMonth DD, YYYY",
        "RequestTracker_TIMEFORMAT_RESOLVED": "FMHH:MI AM FMMonth DD, YYYY",
        "RequestTracker_TIMEFORMAT_RESOLVED2": "FMMonth DD, YYYY",
        "RequestTracker_TIMEFORMAT_NOW": "FMHH:MI AM FMMonth DD, YYYY"
    }
}

The first field in the file is manifest_version, and simply indicates the extension.json schema version. Right now it is marked as required, and I figure it does no harm to throw it in there. The name field should be self-explanatory, and should match your CamelCase extension name, which will also be the subdirectory where your extension will live under the extensions/ directory. The type field simply tells what kind of extension this is, and is mostly used to determine which section of the Special:Version page an extension will appear under. The author is also self-explanatory, but note that this is a JSON array, allowing for multiple items if needed. The version and url are highly recommended. For the license, I chose the dirt-simple PostgreSQL license, whose only fault is its name. The descriptionmsg is what will appear as the description of the extension on the Special:Version page. As it is a user-facing text, it is subject to internationalization, and thus rt-desc is converted to your current language by looking up the language file inside of the extension's i18n directory.

The requires field only supports a "MediaWiki" subkey at the moment. In this case, I have it set to require at least version 1.25 of MediaWiki - as anything lower will not even be able to read this file! The AutoloadClasses key is the new way of loading code needed by the extension. As before, this should be stored in a php file with the name of the extension, an underscore, and the word "body" (e.g. RequestTracker_body.php). This file contains all of the functions that perform the actual work of the extension.

The Hooks field is one of the big advantages of the new extension.json format. Rather than worrying about modifying global variables, you can simply let MediaWiki know what functions are associated with which hooks. In the case of RequestTracker, we need to do some magic whenever a <rt> tag is encountered. To that end, we need to instruct the parser that we will be handling any <rt> tags it encounters, and also tell it what to do when it finds them. Those details are inside the wfRequestTrackerParserInit function:

function wfRequestTrackerParserInit( Parser $parser ) {

    $parser->setHook( 'rt', 'RequestTracker::wfRequestTrackerRender' );

    return true;
}
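
The matching render callback is not shown in this post, but tag hooks registered via setHook() all receive the same standard arguments from the parser. Here is a minimal sketch of what such a callback could look like; the body is illustrative only and not the extension's actual code:

function wfRequestTrackerRender( $input, array $args, Parser $parser, PPFrame $frame ) {
    // $input is the text between <rt> and </rt> (a ticket number or queue name),
    // and $args holds the tag attributes, e.g. u='alois' or q='dyson'.
    // The real extension queries the RequestTracker database here and returns HTML;
    // this stub just echoes its input safely.
    return htmlspecialchars( "RequestTracker lookup for: $input" );
}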

The config field provides a list of all user-configurable variables used by the extension, along with their default values.
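
At runtime these settings surface as ordinary MediaWiki globals, with the default "wg" prefix prepended to the key names above. As a hedged sketch (the variable name below assumes that default prefix), an early bail-out on the 'Active' flag might look like:

// Sketch only: reading an extension.json "config" value inside the extension.
// With the default prefix, the key RequestTracker_Active above becomes
// the global $wgRequestTracker_Active.
global $wgRequestTracker_Active;
if ( !$wgRequestTracker_Active ) {
    // Bail out early with the localized "not active" message.
    return wfMessage( 'rt-inactive' )->text();
}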

The MessagesDirs field tells MediaWiki where to find your localization files. This should always be in the standard place, the i18n directory. Inside that directory are localization files, one for each language, as well as a special file named qqq.json, which gives information about each message string as a guide to translators. The language files are of the format "xxx.json", where "xxx" is the language code. For example, RequestTracker/i18n/en.json contains English versions of all the messages used by the extension. The i18n files look like this:

$ cat en.json
{
  "rt-desc"       : "Fancy interface to RequestTracker using <code>&lt;rt&gt;</code> tag",
  "rt-inactive"   : "The RequestTracker extension is not active",
  "rt-badcontent" : "Invalid content args: must be a simple word. You tried: <b>$1</b>",
  "rt-badquery"   : "The RequestTracker extension encountered an error when talking to the RequestTracker database",
  "rt-badlimit"   : "Invalid LIMIT (l) arg: must be a number. You tried: <b>$1</b>",
  "rt-badorderby" : "Invalid ORDER BY (ob) arg: must be a standard field (see documentation). You tried: <b>$1</b>",
  "rt-badstatus"  : "Invalid status (s) arg: must be a standard field (see documentation). You tried: <b>$1</b>",
  "rt-badcfield"  : "Invalid custom field arg: must be a simple word. You tried: <b>$1</b>",
  "rt-badqueue"   : "Invalid queue (q) arg: must be a simple word. You tried: <b>$1</b>",
  "rt-badowner"   : "Invalid owner (o) arg: must be a valud username. You tried: <b>$1</b>",
  "rt-nomatches"  : "No matching RequestTracker tickets were found"
}

$ cat fr.json
{
  "@metadata": {
     "authors": [
         "Josh Tolley"
      ]
  },
  "rt-desc"       : "Interface sophistiquée de RequestTracker avec l'élement <code>&lt;rt&gt;</code>.",
  "rt-inactive"   : "Le module RequestTracker n'est pas actif.",
  "rt-badcontent" : "Paramètre de contenu « $1 » est invalide: cela doit être un mot simple.",
  "rt-badquery"   : "Le module RequestTracker ne peut pas contacter sa base de données.",
  "rt-badlimit"   : "Paramètre à LIMIT (l) « $1 » est invalide: cela doit être un nombre entier.",
  "rt-badorderby  : "Paramètre à ORDER BY (ob) « $1 » est invalide: cela doit être un champs standard. Voir le manuel utilisateur.",
  "rt-badstatus"  : "Paramètre de status (s) « $1 » est invalide: cela doit être un champs standard. Voir le manuel utilisateur.",
  "rt-badcfield"  : "Paramètre de champs personalisé « $1 » est invalide: cela doit être un mot simple.",
  "rt-badqueue"   : "Paramètre de queue (q) « $1 » est invalide: cela doit être un mot simple.",
  "rt-badowner"   : "Paramètre de propriétaire (o) « $1 » est invalide: cela doit être un mot simple.",
  "rt-nomatches"  : "Aucun ticket trouvé"
}
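
In the PHP code these keys are looked up with wfMessage(), which picks the appropriate language file and substitutes the $1 placeholders from its extra arguments. A hypothetical example of producing the rt-badlimit error (the surrounding markup is illustrative):

// Hypothetical example: the second argument to wfMessage() replaces $1
// in the message text, in whatever language the wiki is configured for.
$error = wfMessage( 'rt-badlimit', $args['l'] )->parse();
return '<div class="errorbox">' . $error . '</div>';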

One other small change I made to the extension was to allow both ticket numbers and queue names to be used inside of the tag. To view a specific ticket, one was always able to do this:

<rt>6567</rt>

This would produce the text "RT #6567", with information on the ticket available on mouseover, and hyperlinked to the ticket inside of RT. However, I often found myself using this extension to view all the open tickets in a certain queue like this:

<rt q="dyson"></rt>

It seems easier to simply put the queue name inside the tags, so in this new version one can do this:

<rt>dyson</rt>
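
The post does not show how the extension tells the two cases apart, but the obvious approach (and this is only a guess at the implementation) is to inspect the tag content:

// Hedged sketch, not the extension's actual code: a bare number is treated
// as a ticket id, anything else as a queue name.
$content = trim( $input );
if ( ctype_digit( $content ) ) {
    $ticketId = (int)$content;   // e.g. <rt>6567</rt> links to ticket #6567
} else {
    $queue = $content;           // e.g. <rt>dyson</rt> lists open tickets in "dyson"
}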

If you are running MediaWiki 1.25 or better, try out the new RequestTracker extension! If you are stuck on an older version, use the RT extension and upgrade as soon as you can. :)

by Greg Sabino Mullane (noreply@blogger.com) at January 18, 2017 03:41 AM

Broken wikis due to PHP and MediaWiki "namespace" conflicts

I was recently tasked with resurrecting an ancient wiki. In this case, a wiki last updated in 2005, running MediaWiki version 1.5.2, and that needed to get transformed to something more modern (in this case, version 1.25.3). The old settings and extensions were not important, but we did want to preserve any content that was made.

The items available to me were a tarball of the mediawiki directory (including the LocalSettings.php file), and a MySQL dump of the wiki database. To import the items to the new wiki (which already had been created and was gathering content), an XML dump needed to be generated. MediaWiki has two simple command-line scripts to export and import your wiki, named dumpBackup.php and importDump.php. So it was simply a matter of getting the wiki up and running enough to run dumpBackup.php.

My first thought was to simply bring the wiki up as it was - all the files were in place, after all, and specifically designed to read the old version of the schema. (Because the database schema changes over time, newer MediaWikis cannot run against older database dumps.) So I unpacked the MediaWiki directory, and prepared to resurrect the database.

Rather than MySQL, the distro I was using defaulted to using the freer and arguably better MariaDB, which installed painlessly.

## Create a quick dummy database:
$ echo 'create database footest' | sudo mysql

## Install the 1.5.2 MediaWiki database into it:
$ cat mysql-acme-wiki.sql | sudo mysql footest

## Sanity test as the output of the above commands is very minimal:
$ echo 'select count(*) from revision' | sudo mysql footest
count(*)
727977

Success! The MariaDB instance was easily able to parse and load the old MySQL file. The next step was to unpack the old 1.5.2 mediawiki directory into Apache's docroot, adjust the LocalSettings.php file to point to the newly created database, and try and access the wiki. Once all that was done, however, both the browser and the command-line scripts spat out the same error:

Parse error: syntax error, unexpected 'Namespace' (T_NAMESPACE), 
  expecting identifier (T_STRING) in 
  /var/www/html/wiki/includes/Namespace.php on line 52

What is this about? Turns out that some years ago, someone added a class to MediaWiki with the terrible name of "Namespace". Years later, PHP finally caved to user demands and added some non-optimal support for namespaces, which means that (surprise), "namespace" is now a reserved word. In short, older versions of MediaWiki cannot run with modern (5.3.0 or greater) versions of PHP. Amusingly, a web search for this error on DuckDuckGo revealed not only many people asking about this error and/or offering solutions, but many results were actual wikis that are currently not working! Thus, their wiki was working fine one moment, and then PHP was (probably automatically) upgraded, and now the wiki is dead. But DuckDuckGo is happy to show you the wiki and its now-single page of output, the error above. :)
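
The failure is easy to reproduce outside of MediaWiki: on PHP 5.3 or later, a file containing nothing more than a class by that name dies with the same kind of parse error.

<?php
// On PHP >= 5.3, "namespace" is a reserved word, so this fails to parse with:
// syntax error, unexpected 'Namespace' (T_NAMESPACE), expecting identifier (T_STRING)
class Namespace {
}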

There are three groups to blame for this sad situation, as well as three obvious solutions to the problem. The first group to share the blame, and the most culpable, is the MediaWiki developers who chose the word "Namespace" as a class name. As PHP has historically had poor to non-existent support for packages, namespaces, and scoping, it is vital that all your PHP variables, class names, etc. are as unique as possible. To that end, the name of the class was changed at some point to "MWNamespace" - but the damage has been done. The second group to share the blame is the PHP developers, both for not having namespace support for so long, and for making it into a reserved word knowing full well that one of the poster children for "mature" PHP apps, MediaWiki, was using "Namespace". Still, we cannot blame them too much for picking what is a pretty obvious word choice. The third group to blame is the owners of all those wikis out there that are suffering that syntax error. They ought to be repairing their wikis. The fixes are pretty simple, which leads us to the three solutions to the problem.


MediaWiki's cool install image

The quickest (and arguably worst) solution is to downgrade PHP to something older than 5.3. At that point, the wiki will probably work again. Unless it's a museum (static) wiki, and you do not intend to upgrade anything on the server ever again, this solution will not work long term. The second solution is to upgrade your MediaWiki! The upgrade process is actually very robust and works well even for very old versions of MediaWiki (as we shall see below). The third solution is to make some quick edits to the code to replace all uses of "Namespace" with "MWNamespace". Not a good solution, but ideal when you just need to get the wiki up and running. Thus, it's the solution I tried for the original problem.

However, once I solved the Namespace problem by renaming to MWNamespace, some other problems popped up. I will not run through them here - although they were small and quickly solved, it began to feel like a neverending whack-a-mole game, and I decided to cut the Gordian knot with a completely different approach.

As mentioned, MediaWiki has an upgrade process, which means that you can install the software and it will, in theory, transform your database schema and data to the new version. However, version 1.5 of MediaWiki was released in October 2005, almost exactly ten years before the current release (1.25.3 as of this writing). Ten years is a really, really long time on the Internet. Could MediaWiki really convert something that old? (Spoiler: yes!) Only one way to find out. First, I prepared the old database for the upgrade. Note that all of this was done on a private local machine where security was not an issue.

## As before, install mariadb and import into the 'footest' database
$ echo 'create database footest' | sudo mysql test
$ cat mysql-acme-wiki.sql | sudo mysql footest
$ echo "set password for 'root'@'localhost' = password('foobar')" | sudo mysql test

Next, I grabbed the latest version of MediaWiki, verified it, put it in place, and started up the webserver:

$ wget http://releases.wikimedia.org/mediawiki/1.25/mediawiki-1.25.3.tar.gz
$ wget http://releases.wikimedia.org/mediawiki/1.25/mediawiki-1.25.3.tar.gz.sig

$ gpg --verify mediawiki-1.25.3.tar.gz.sig 
gpg: assuming signed data in `mediawiki-1.25.3.tar.gz'
gpg: Signature made Fri 16 Oct 2015 01:09:35 PM EDT using RSA key ID 23107F8A
gpg: Good signature from "Chad Horohoe "
gpg:                 aka "keybase.io/demon "
gpg:                 aka "Chad Horohoe (Personal e-mail) "
gpg:                 aka "Chad Horohoe (Alias for existing email) "
## Chad's cool. Ignore the below.
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 41B2 ABE8 17AD D3E5 2BDA  946F 72BC 1C5D 2310 7F8A

$ tar xvfz mediawiki-1.25.3.tar.gz
$ mv mediawiki-1.25.3 /var/www/html/
$ cd /var/www/html/mediawiki-1.25.3
## Because "composer" is a really terrible idea:
$ git clone https://gerrit.wikimedia.org/r/p/mediawiki/vendor.git 
$ sudo service httpd start

Now, we can call up the web page to install MediaWiki.

  • Visit http://localhost/mediawiki-1.25.3, see the familiar yellow flower
  • Click "set up the wiki"
  • Click next until you find "Database name", and set to "footest"
  • Set the "Database password:" to "foobar"
  • Aha! Look what shows up: "Upgrade existing installation" and "There are MediaWiki tables in this database. To upgrade them to MediaWiki 1.25.3, click Continue"

It worked! Next messages are: "Upgrade complete. You can now start using your wiki. If you want to regenerate your LocalSettings.php file, click the button below. This is not recommended unless you are having problems with your wiki." That message is a little misleading. You almost certainly *do* want to generate a new LocalSettings.php file when doing an upgrade like this. So say yes, leave the database choices as they are, and name your wiki something easily greppable like "ABCD". Create an admin account, save the generated LocalSettings.php file, and move it to your mediawiki directory.

At this point, we can do what we came here for: generate an XML dump of the wiki content in the database, so we can import it somewhere else. We only wanted the actual content, and did not want to worry about the history of the pages, so the command was:

$ php maintenance/dumpBackup.php --current > acme.wiki.2005.xml

It ran without a hitch. However, close examination showed that it had an amazing amount of unwanted stuff from the "MediaWiki:" namespace. While there are probably some clever solutions that could be devised to cut them out of the XML file (either on export, import, or in between), sometimes quick beats clever, and I simply opened the file in an editor and removed all the "page" sections with a title beginning with "MediaWiki:". Finally, the file was shipped to the production wiki running 1.25.3, and the old content was added in a snap:

$ php maintenance/importDump.php acme.wiki.2005.xml

The script will recommend rebuilding the "Recent changes" page by running rebuildrecentchanges.php (can we get consistentCaps please MW devs?). However, this data is at least 10 years old, and Recent changes only goes back 90 days by default in version 1.25.3 (and even shorter in previous versions). So, one final step:

## 20 years should be sufficient
$ echo '$wgRCMaxAge = 20 * 365 * 24 * 3600;' >> LocalSettings.php
$ php maintenance/rebuildrecentchanges.php

Voila! All of the data from this ancient wiki is now in place on a modern wiki!

by Greg Sabino Mullane (noreply@blogger.com) at January 18, 2017 03:23 AM

January 17, 2017

Erik Zachte

Browse winning Wiki Loves Monuments images offline

Wiki Loves Monuments 2016 winning image from India: reflection of the Taj Mahal

Click to show full size (1136×640), e.g. for iPhone 5

 

The pages on Wikimedia Commons which list the winners of the yearly contests [1] contain a feature ‘Watch as Slideshow!’. Works great.

However, wouldn’t it be nice if you could also show these images offline (outside a browser), annotated and resized for minimal footprint?

Most end-of-year vacations I do a hobby project for Wikipedia. This time I worked on a script [2] [3] to make the above happen. The script does the following:

  • Download all images from Wiki Loves Monuments winners pages [1]
  • Collect image, author and license info for each image on those winners pages
  • or, if not available there, collect this metadata from the upload pages on Commons
  • Resize the images so they are exactly the required size
  • Annotate the image unobtrusively in a matching font size:
    contest year, country, title, author, license
wlm-annotations

Font size used for 2560×1600 image

 

  • Prefix the downloaded image for super easy filtering on year and/or country


I pre-rendered several sets with common image sizes, ready for download. You can request an extra set for other common screen sizes [4] [5]:

wlm_download_folder


For instance the 1920×1080 set is ideal for HDTV (e.g. for Apple TV screensaver) or large iPhones. On TV the texts are readable by themselves; on phone some manual zooming is needed (but unobtrusiveness is key).

[1] 2010 2011 2012 2013 2014 2015 2016
[2] The script has been tested on Windows 10.
Prerequisites: curl and ImageMagick's convert (in the same folder).
[3] I am actually already rewriting the script, separating it into two scripts, to make it more modular and more generally applicable. First script will extract information from WLM/WLE (WLA?) winners pages and image upload pages, and generate a csv file. Second script will read this csv, download images, resize and annotate them. I will announce the git url here when done.
[4] 4K is a bit too large for easy upload. I may do that later when the script can also run on WMF servers.
[5] Current sets are optimal for e.g. HDTV and new iPhones (again, others may follow):
1920×1080 HDTV and iPhone 6+/7+
1334×750 iPhone 6/6s/7
1136×640 iPhone 5/5s 

by Erik at January 17, 2017 12:46 PM

Gerard Meijssen

#Wikimedia - What is our mission

Many Wikipedians have a problem with Wikidata. It is very much cultural. One argument is that Wikidata does not comply with their policies and therefore cannot be used. A case in point is "notability": Wikidata knows about much more, and how can that all be good?

To be honest, Wikidata is immature and it needs to be a lot better. When a Wikipedia community does not want to incorporate data from Wikidata at this point, fine. Let us find what it takes to do so in the future. Let us work on approaches that are possible now and add value to everyone.

Many of the arguments that are used show a lack of awareness of Wikipedia's own history. There are no reminders of the times when it was good to be "bold". It is forgotten that content should be allowed to improve over time, and this is still true for all of the Wikimedia content.

The problem is that Wikidata provides a service to every Wikimedia project, and as a consequence there are parts of any project where Wikidata will never comply with that project's policies. Arguably, all the policies of all the projects, including Wikidata, serve what the Wikimedia Foundation is about: ensuring that "every single person on the planet is given free access to the sum of all human knowledge". When the argument is framed in this way, the question becomes a different one; it becomes how we can benefit from each other and how we can strengthen the quality of each other's offerings.

Wikidata got a flying start when it replaced all the interwiki links. When all the wiki links and red links are associated with Wikidata links, it will allow for new ways to improve the consistency of Wikipedia. The problem with culture is that it is resistant to change. So when the entrenched practice is that they do not want Wikidata, let's give them the benefits of Wikidata. In a "phabricator" thingie I tried to describe it.

The proposal is for both red links and wiki links to be associated with Wikidata items. It will make it easier to use the data tools associated with Wikidata to verify, curate and improve the Wikipedia content. Obviously every link could have an associated statement. When more and more Wikipedia links are associated with statements Wikidata improves but as part of the process, these links are verified and errors will be removed.

The nice thing is that the proposal allows for it to be "opt in". The old school Wikipedians do not have to notice. It will only be for those who understand the premise of using Wikidata to improve content. In the end it will allow Wikidata and even Wikipedia to mature. It will bring another way to look at quality and it will ensure that all the content of the Wikimedia Foundation will get better integrated and be of a higher quality.
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at January 17, 2017 09:25 AM

Wikimedia Foundation

Wikipedia is built on the public domain

Image by the US Department of Agriculture, public domain/CC0.

Wikipedia is built to be shared and remixed. This is possible, in part, thanks to the incredible amount of material that is available in the public domain. The public domain refers to a wide range of creations that are not restricted by copyright, and can be used freely by anyone. These works can be copied, translated, or remixed, so the public domain provides a crucial foundation for new creative works. On Wikipedia, some articles are based on text from older public domain encyclopedias or include images no longer protected by copyright. People regularly use public domain material to bring educational content to life and create interesting new ways to share it further.

There are three basic ways that material commonly enters the public domain.

First, when you think of the public domain, you may think of very old creations whose copyright has expired. In the United States and many other countries, copyright lasts for the life of the author plus seventy years. Works published before 1923 are in the public domain, but the rest are governed by complex copyright rules. Peter B. Hirtle of Cornell University created a helpful chart to determine when the copyright terms for various types of works will expire in the U.S. Due to the copyright term extension in the 1976 Copyright Act and later amendments, published works from the United States will not start entering the public domain until 2019. In places outside of the U.S., where copyright terms are often shorter, works enter the public domain on January 1, celebrated annually as Public Domain Day.

Second, a valuable contributor to the public domain is the U.S. federal government. Works created by the U.S. government are in the public domain as a matter of law. This means that government websites may provide a rich source of freely usable photographs and other material. A primary purpose of copyright is to promote creation by rewarding people with exclusive rights, but the government does not need this sort of incentive. Government works are already funded directly by taxpayers, and should belong to the public. Putting the government’s creations in the public domain allows everyone to benefit from the government’s work.

Third, some authors choose to dedicate their creations to the public domain. Tools like Creative Commons Zero (CC0) allow people to mark works that the public can freely use without restrictions or conditions. CC0 is used for some highly creative works, like the photographs on Unsplash. Other creators may wish to release their works freely, but still maintain some copyright with minimal conditions attached. These users may adopt a license like Creative Commons Attribution Share-Alike (CC BY-SA) to require other users to provide credit and re-license their works. Most of the photographs on Wikimedia Commons and all the articles on Wikipedia are freely available under CC BY-SA. While these works still have copyright and are not completely in the public domain, they can still be shared and remixed freely alongside public domain material.

In the coming years, legislators in many countries will consider writing new copyright rules to adapt to changes in technology and the economy. One important consideration is how these proposals will protect the public domain and provide room for new creations. The European Parliament has already begun considering a proposed change to the Copyright Directive, including concerning new rights that would make the public domain less accessible to the public. Because copyright terms have been extended over the past few decades, works from the 1960s that would otherwise be free of copyright remain expensive and inaccessible. As we consider changing copyright rules, we should remember that everyone, including countless creators, will benefit from a rich and vibrant public domain.

Stephen LaPorte, Senior Legal Counsel
Wikimedia Foundation

Interested in getting more involved? Learn more about the Wikimedia Foundation’s position on copyright, and join the public policy mailing list to discuss how Wikimedia can continue to protect the public domain.

by Stephen LaPorte at January 17, 2017 12:25 AM

January 16, 2017

Wiki Education Foundation

The Roundup: Serious Business

It can be tricky to find publicly accessible, objective information about business-related subjects. It’s more common for there to be monetary incentives to advocate, promote, omit, or underplay particular aspects, points of view, or examples. The concepts can also be complex, weaving together theory, history, law, and a variety of opinions. Effectively writing about business on Wikipedia thus requires neutrality, but also great care in selecting sources and the ability to summarize the best information about a topic. It’s for these reasons that students can make particularly valuable contributions to business topics on Wikipedia. They arrive at the subject without the burden of a conflict of interest that a professional may have, they have access to high-quality sources, and have an expert to guide them on their way.

Students in Amy Carleton’s Advanced Writing in the Business Administration Professions course at Northeastern University made several such contributions.

One student contributed to the article on corporate social responsibility, adding information from academic research on the effects of the business model on things like employee turnover and customer relations.

Another student created the article about the investigation of Apple’s transfer pricing arrangements with Ireland, a three-year investigation into the tax benefits Apple, Inc. received. The result was the “biggest tax claim ever”, though the decision is being appealed.

Overtime is something that affects millions of workers, and which has been a common topic of labor disputes. Wikipedia has an article about overtime in general, but it’s largely an overview of relevant laws. What had not been covered, until a student created the article, were the effects of overtime. Similarly, while Wikipedia covers a wide range of immigration topics, it did not yet cover the international entrepreneur rule, a proposed immigration regulation that would admit more foreign entrepreneurs into the United States. As with areas where there are common monetary conflicts of interest, controversial subjects like immigration policy are simultaneously challenging and absolutely crucial to write about.

Some of the other topics covered in the class include philanthropreneurs, the globalization of the football transfer market, peer-to-peer transactions, and risk arbitrage.

Contributing well-written, neutral information about challenging but important topics is a valuable public service. If you’re an instructor who may want to participate, Wiki Ed is here to help. We’re a non-profit organization that can provide you with free tools and staff support for you and your students as you have them contribute to public knowledge on Wikipedia for a class assignment. To learn more, head to teach.wikipedia.org or email contact@wikiedu.org.

Photo: Dodge Hall Northeastern University.jpg, User:Piotrus (original) / User:Rhododendrites (derivative), CC BY-SA 3.0, via Wikimedia Commons.

by Ryan McGrady at January 16, 2017 05:07 PM

Semantic MediaWiki

Semantic MediaWiki 2.4.5 released/en

January 16, 2017

Semantic MediaWiki 2.4.5 (SMW 2.4.5) has been released today as a new version of Semantic MediaWiki.

This new version is a minor release and provides bugfixes for the current 2.4 branch of Semantic MediaWiki. Please refer to the help page on installing Semantic MediaWiki to get detailed instructions on how to install or upgrade.

by TranslateBot at January 16, 2017 01:27 PM

Semantic MediaWiki 2.4.5 released

January 16, 2017

Semantic MediaWiki 2.4.5 (SMW 2.4.5) has been released today as a new version of Semantic MediaWiki.

This new version is a minor release and provides bugfixes for the current 2.4 branch of Semantic MediaWiki. Please refer to the help page on installing Semantic MediaWiki to get detailed instructions on how to install or upgrade.

by Kghbln at January 16, 2017 01:24 PM

User:Legoktm

MediaWiki - powered by Debian

Barring any bugs, the last set of changes to the MediaWiki Debian package for the stretch release landed earlier this month. There are some documentation changes, and updates for changes to other, related packages. One of the other changes is the addition of a "powered by Debian" footer icon (drawn by the amazing Isarra), right next to the default "powered by MediaWiki" one.

Powered by Debian

This will only be added by default to new installs of the MediaWiki package. But existing users can just copy the following code snippet into their LocalSettings.php file (adjust paths as necessary):

# Add a "powered by Debian" footer icon
$wgFooterIcons['poweredby']['debian'] = [
    "src" => "/mediawiki/resources/assets/debian/poweredby_debian_1x.png",
    "url" => "https://www.debian.org/",
    "alt" => "Powered by Debian",
    "srcset" =>
        "/mediawiki/resources/assets/debian/poweredby_debian_1_5x.png 1.5x, " .
        "/mediawiki/resources/assets/debian/poweredby_debian_2x.png 2x",
];

The image files are included in the package itself, or you can grab them from the Git repository. The source SVG is available from Wikimedia Commons.

by legoktm at January 16, 2017 09:18 AM

January 15, 2017

Wikimedia Foundation

Librarians offer the gift of a footnote to celebrate Wikipedia’s birthday: Join #1lib1ref 2017

Photo by Diliff, CC BY-SA 4.0.

Wikipedia has just turned 16, at a time when the need for accurate, reliable information is greater than ever. In a world where social media channels are awash with fake news, and unreliable assertions come from every corner, the Wikimedia communities and Wikipedia in particular have offered a space for that free, accessible and reliable information to be aggregated and shared with the broader world.

Making sure that the public, our patrons, reach the best sources of information is at the heart of the Wikipedia community’s ideals. The concept of all the information on Wikipedia being “verifiable”, connected to an editorially controlled source, like a reputable newspaper or academic journal, has helped focus the massive collaborative effort that Wikipedia represents.

This connection of Wikipedia’s information to sourcing, however, is an ideal; Wikipedia grows through the contributions of thousands of people every month, and we cannot practically expect every new editor to understand how Wikipedia relies on footnotes, how to find the right kinds of research material, or how to add those references to Wikipedia. All of these steps require not only a broader understanding of research, but how those skills apply to our context.

Unlike an average Wikipedia reader, librarians understand these skills intimately: not only do librarians have training and practical experience finding and integrating reference materials into written works, but they teach patrons these vital 21st-century information literacy skills every day. In the face of a flood of bad information, the health of Wikipedia relies not only on contributors, but community educators who can help our readers understand how our content is created. Ultimately, the skills and goals of the library community are aligned with the Wikipedia community.

That is why we are asking librarians to “Imagine a world where every librarian added one more reference to Wikipedia” as part of our second annual “1 Librarian, 1 Reference” (#1lib1ref) campaign. There are plenty of opportunities to get involved: there are over 313,000 “citation needed” statements on Wikipedia and 213,000 articles without any citations at all.

Last year, #1lib1ref spread around the world, helping over 500 librarians contribute thousands of citations, and sparking a conversation among library communities about what role Wikipedia has in the information ecosystem. Still, Wikipedia has over 40 million articles in hundreds of languages; though the hundreds of librarians made many contributions to the English, Catalan and a few other language Wikipedias, we need more to significantly change the experience of Wikipedia’s hundreds of millions of readers.

This year, we are calling on librarians the world over to make #1lib1ref a bigger, better contribution to a real-information based future. We are:

  • Supporting more languages for the campaign
  • Providing a kit to help organize gatherings of librarians to contribute and talk about Wikipedia’s role in librarianship.
  • Extending the campaign for another couple weeks, from January 15 until February 3.

Share the campaign in your networks and go to your library to ask your librarian to join in the campaign in the coming weeks, to contribute a precious Wikipedia birthday gift to the world: one more citation on Wikipedia!

Alex Stinson, GLAM-Wiki Strategist
Wikimedia Foundation

You can learn more about 1lib1ref at its campaign page.

by Alex Stinson at January 15, 2017 08:31 PM

Gerard Meijssen

#Wikipedia - Who is Fiona Hile?

When you look for Fiona Hile on the English Wikipedia, you will find this. It is a puzzle and there are probably two people by that name that do not have an article (yet).

One of them is an Australian poet. When you google for her you find among other things a picture. When you seek her information on VIAF you find two identifiers and in the near future she will have a third: Wikidata.

From a Wikidata point of view it is relevant to have an item for her because she won two awards. It completes these lists and it connects the two awards to the same person.

When you ask yourself whether Mrs Hile is really "notable", you find that the answer depends on your point of view. Wikipedia already mentions her twice, and surely a discussion on the relative merits of notability is not everyone's cup of tea.

Why is Mrs Hile notable enough to blog about? It is a great example that Wikipedia and Wikidata together can produce more and better information.
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at January 15, 2017 07:40 PM

The Peter Porter Poetry Prize

For me the Peter Porter Poetry Prize is an award like so many others. There is one article, it lists the names of some of the people who are known to have won the prize. Some are linked and some are not. For one winner I linked to a German article and for a few others I created an item.

This list is complete, it has a link to a source so the information can be verified and I am satisfied with the result up to a point.

What I could do is add more awards and people who have won awards. The article for Tracy Ryan, the 2009 winner, has a category for another award that she won. This award does not have a webpage listing all the past winners, so the question is: is Wikipedia good enough as a source? I added the winners to the award, made a mistake, corrected it, and now Wikidata knows about a Nathan Hobby.

Jay Martin is the 2016 winner of the T.A.G. Hungerford Award. It has a source, but it is extremely likely that this source will disappear in 2017. The problem I have is that I want to see this information shared, but all the work done to improve the data on Wikidata is not seen on Wikipedia. When we share our resources and are better in tune with each other's needs as editors, we will be better able to "share in the sum of our available knowledge".
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at January 15, 2017 12:20 PM

Is #Wikipedia the new #Britannica?

At the time, the Britannica was best of breed. It was the encyclopaedia to turn to. Then Wikipedia happened, and obviously it was not good enough; people were not convinced. When you read the discussions about why Wikipedia was not good enough, there was however no actual discussion. The points of view were clear, they had consequences, and it was only when research was done that Wikipedia became respectable. Its quality was equally good, and it was more informative and covered more subjects. The arguments did not go away; the point of view became irrelevant. People, and particularly students, use Wikipedia.

Today Wikipedia is said to be best of breed. It is where you find encyclopaedic information and as Google rates Wikipedia content highly it is seen and used a lot by many people.

The need for information is changing. We have recently experienced a lot of misinformation, and the need to know what is factually correct has never been more important. What has become clear is that arguments and information alone are not what sways people. So the question is: where does that leave Wikipedia?

The question we have to ask is: what does it take to convince people to be open minded? What to do when people expect a neutral point of view but the facts are unambiguous in one direction? What if the language used is not understood? What are the issues of Wikipedia, what are its weaknesses, and what are its strengths?

So far, quality is considered to be found in sources, in the reputation of the writers. When this is not what convinces, how do we show our quality, or better, how do we get people to reconsider and see the other point of view?
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at January 15, 2017 08:04 AM

January 13, 2017

Weekly OSM

weeklyOSM 338

01/03/2017-01/09/2017


New routing possibilities for wheelchairs

Mapping

  • Regio OSM, a completeness checker for addresses now checks 1702 communities and many cities in Germany, one of the 11 countries where the tool can be used.
  • An interesting combination of OpenData and OSM to improve the OSM data of schools in the UK. One drawback is that a direct link exists only to iD. If iD is open, however, you can open JOSM from there. 😉
  • Pascal Neis describes in a blog post his tools for QA in OSM
  • Arun Ganesh shows the significance of the wikidata=* tag by an example of the North Indian city of Manali. In his contribution, he also points to possibilities for improving OSM with further information via Wikidata, Wikimedia Commons, WikiVoyage and also points out information about using Wikidata with Mapbox tools.
  • The OSM Operations team announced a new feature on the main map page: Public GPS-Tracks.
  • Tom Pfeifer asks how the coworking space, a quite modern form of cooperation in which workspaces and equipment are shared, should be tagged.
  • Chris uses AutoHotKey (Windows) and JOSM to optimize his mapping experience. He demonstrates this in a video, while tracing building outlines.
  • User rorym shows why it is useful not to make mechanical edits but “look at the area and look for other mistakes!”

Community

OpenStreetMap Foundation

Events

  • Klaus Torsky reports (de) (automatic translation) on the last FOSS4G in Germany. He links to an interview (en) with Till Adams the brain behind the organisation of FOSS4G in Bonn.
  • Frederik Ramm invites people for the February hack weekend happening in Karlsruhe.
  • A mapping party took place in Timbuktu from the 7th to the 9th of January.

Humanitarian OSM

  • Kizito Makoye reports on the initiative of the Dar es Salaam City Administration in Tanzania to map, using drones, the floodplains of poor areas such as Tandale. The Ramani-Huria project supports this by bringing the acquired data into OSM-based maps. This and other measures will improve the living conditions and the infrastructure in the slum areas.

Maps

switch2OSM

  • Uber uses OpenStreetMap. Grant Slater expects Uber to contribute to OSM data.

Software

  • The Wikimedia help explains how to use the Wikidata ID to display the outline of OSM Objects in Wikimedia maps.
  • User Daniel writes a diary on how the latest release of the Open Source Routing Machine (version 5.5) has made it easier to set up your own routing machine and shares some documentation related to it.

Releases

The Open Source Routing Machine (OSRM) released version 5.5, which comes with some huge enhancements in guidance, tags, API and infrastructure.

Software Version Release date Comment
OSRM 5.5.0 2016-12-16 Navigation, tag interpretation and the API infrastructure have been improved.
JOSM 11427 2016-12-31 No info.
Mapillary Android * 3.14 2017-01-04 Much faster GPX fix.
Mapbox GL JS v0.30.0 2017-01-05 No info.
Naviki Android * 3.52.4 2017-01-05 Accuracy improved.
Mapillary iOS * 4.5.11 2017-01-06 Improved onboarding.
SQLite 1.16.2 2017-01-06 Four fixes.

Provided by the OSM Software Watchlist.

(*) unfree software. See: freesoftware.

Did you know …

  • … the daily updated extracts by Netzwolf?
  • … your next holiday destination? If yes, then the map with georeferenced images in Wikimedia Commons is ideal to inform oneself in advance.
  • … the GPS navigator uNav for Ubuntu smartphones? This OSM-based Navi-App is now available in version 0.64 for the Ubuntu Mobile Operating System (OTA-14).

OSM in the media

  • Tracy Staedter (Seeker) explained the maps of Geoff Boeing. He calls his visualization tool OSMnx (OSM + NetworkX). The tool can render the physical characteristics of the streets of each city in a black & white grid, showing impressive historical city developments. Boeing says, “The maps help change opinions by demonstrating to people that the density of a city is not necessarily bad.”

Other “geo” things

  • The Open Traffic Partnership (OTP) is an initiative in Manila, Philippines which aims to make use of anonymized GPS data to analyze traffic congestion. The partnership has led to an open source platform – OSM is represented by Mapzen – that enables developing countries to record and analyze traffic patterns. Alyssa Wright, President of the US OpenStreetMap Foundation, said: “The partnership seeks to improve the efficiency and effectiveness of global transport use and supply through open data and capacity expansion.”
  • This is how the Mercator Projection distorts the poles.
  • Treepedia, developed by MIT’s Senseable City Lab and World Economic Forum, provides a visualization of tree cover in 12 major cities including New York, Los Angeles and Paris.

Upcoming Events

Where What When Country
Lyon Mapathon Missing Maps pour Ouahigouya 01/16/2017 france
Brussels Brussels Meetup 01/16/2017 belgium
Essen Stammtisch 01/16/2017 germany
Grenoble Rencontre groupe local 01/16/2017 france
Manila 【MapAm❤re】OSM Workshop Series 7/8, San Juan 01/16/2017 philippines
Augsburg Augsburger Stammtisch 01/17/2017 germany
Cologne/Bonn Bonner Stammtisch 01/17/2017 germany
Scotland Edinburgh 01/17/2017 uk
Lüneburg Mappertreffen Lüneburg 01/17/2017 germany
Viersen OSM Stammtisch Viersen 01/17/2017 germany
Osnabrück Stammtisch / OSM Treffen 01/18/2017 germany
Karlsruhe Stammtisch 01/18/2017 germany
Osaka もくもくマッピング! #02 01/18/2017 japan
Leoben Stammtisch Obersteiermark 01/19/2017 austria
Urspring Stammtisch Ulmer Alb 01/19/2017 germany
Tokyo 東京!街歩き!マッピングパーティ:第4回 根津神社 01/21/2017 japan
Manila 【MapAm❤re】OSM Workshop Series 8/8, San Juan 01/23/2017 philippines
Bremen Bremer Mappertreffen 01/23/2017 germany
Graz Stammtisch Graz 01/23/2017 austria
Brussels FOSDEM 2017 02/04/2017-02/05/2017 belgium
Genoa OSMit2017 02/08/2017-02/11/2017 italy
Passau FOSSGIS 2017 03/22/2017-03/25/2017 germany
Avignon State of the Map France 2017 06/02/2017-06/04/2017 france
Aizu-wakamatsu Shi State of the Map 2017 08/18/2017-08/20/2017 japan
Buenos Aires FOSS4G+SOTM Argentina 2017 10/23/2017-10/28/2017 argentina

Note: If you like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Peda, Polyglot, Rogehm, SeleneYang, SomeoneElse, SrrReal, TheFive, YoViajo, derFred, jinalfoflia, keithonearth, wambacher.

by weeklyteam at January 13, 2017 07:00 PM

Wikimedia Tech Blog

Importing JSON into Hadoop via Kafka

Photo by Eric Kilby, CC BY-SA 2.0.

JSON is…not binary

JSON is awesome.  It is both machine and human readable.  It is concise (at least compared to XML), and is even more concise when represented as YAML. It is well supported in many programming languages.  JSON is text, and works with standard CLI tools.

JSON sucks.  It is verbose.  Every value has a key in every single record.  It is schema-less and fragile. If a JSON producer changes a field name, all downstream consumer code has to be ready.  It is slow.  Languages have to convert JSON strings to binary representations and back too often.

JSON is ubiquitous.  Because it is so easy for developers to work with, it is one of the most common data serialization formats used on the web [citation needed!].  Almost any web based organization out there likely has to work with JSON in some capacity.

Kafka was originally developed by LinkedIn, and is now an open source Apache project with strong support from Confluent.   Both of these organizations prefer to work with strongly typed and schema-ed data.  Their serialization format of choice is Avro.  Organizations like this have tight control over their data formats, as it rarely escapes outside of their internal networks.  There are very good reasons Confluent is pushing Avro instead of JSON, but for many, like Wikimedia, it is impractical to transport data in a binary format that is unparseable without extra information (schemas) or special tools.

The Wikimedia Foundation lives openly on the web and has a commitment to work with volunteer open source contributors.  Mediawiki is used by people of varying technical skill levels in different operating environments.  Forcing volunteers and Wikimedia engineering teams to work with serialization formats other than JSON is just mean!  Wikimedia wants our software and data to be easy.

For better or worse, we are stuck with JSON.  This makes many things easy, but big data processing in Hadoop is not one of them.  Hadoop runs in the JVM, and it works more smoothly if its data is schema-ed and strongly typed.  Hive tables are schema-ed and strongly typed.  They can be mapped onto JSON HDFS files using a JSON SerDe, but if the underlying data changes because someone renames a field, certain queries on that Hive table will break.  Wikimedia imports the latest JSON data from Kafka into HDFS every 10 minutes, and then does a batch transform and load process on each fully imported hour.

Camus, Gobblin, Connect

LinkedIn created Camus to import Avro data from Kafka into HDFS.   JSON support was added by Wikimedia.  Camus’ shining feature is the ability to write data into HDFS directory hierarchies based on configurable time bucketing.  You specify the granularity of the bucket and which field in your data should be used as the event timestamp.

However, both LinkedIn and Confluent have dropped support for Camus.  It is an end-of-life piece of software.  Posited as replacements, LinkedIn has developed Gobblin, and Kafka ships with Kafka Connect.

Gobblin is a generic HDFS import tool.  It should be used if you want to import data from a variety of sources into HDFS.  It does not support timestamp bucketed JSON data out of the box.  You’ll have to provide your own implementation to do this.

Kafka Connect is a generic Kafka import and export tool, and has an HDFS Connector that helps get data into HDFS.  It has limited JSON support, and requires that your JSON data conform to a Kafka Connect specific envelope.  If you don’t want to reformat your JSON data to fit this envelope, you’ll have difficulty using Kafka Connect.

That leaves us with Camus.  For years, Wikimedia has successfully been using Camus to import JSON data from Kafka into HDFS.  Unlike the newer solutions, Camus does not do streaming imports, so it must be scheduled in batches. We’d like to catch up with more current solutions and use something like Kafka Connect, but until JSON is better supported we will continue to use Camus.

So, how is it done?  This question appears often enough on Kafka related mailing lists, that we decided to write this blog post.

Camus with JSON

Camus needs to be told how to read messages from Kafka, and in what format they should be written to HDFS.  JSON should be serialized and produced to Kafka as UTF-8 byte strings, one JSON object per Kafka message.  We want this data to be written as is with no transformation directly to HDFS.  We’d also like to compress this data in HDFS, and still have it be useable by MapReduce.  Hadoop’s SequenceFile format will do nicely.  (If we didn’t care about compression, we could use the StringRecordWriterProvider to write the JSON records \n delimited directly to HDFS text files.)
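
For the curious, here is roughly what a single message looks like from the producer's side. This is only an illustration in plain PHP (not Wikimedia's actual producer code): one UTF-8 JSON object per Kafka message, carrying an ISO-8601 event timestamp in a 'dt' field, which is exactly the shape the timestamp settings below will look for.

// Illustration only: building the value of one Kafka message that Camus can consume.
$event = array(
    'dt'     => gmdate( 'Y-m-d\TH:i:s' ),  // e.g. 2017-01-01T15:40:17
    'wiki'   => 'enwiki',
    'action' => 'edit',
);
$payload = json_encode( $event );  // this UTF-8 string is the Kafka message value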

We’ll now create a camus.properties file that does what we need.

First, we need to tell Camus where to write our data, and where to keep execution metadata about this Camus job.  Camus uses HDFS to store Kafka offsets so that it can keep track of topic partition offsets from which to start during each run:

# Final top-level HDFS data output directory. A sub-directory
# will be dynamically created for each consumed topic.
etl.destination.path=hdfs:///path/to/output/directory

# HDFS location where you want to keep execution files,
# i.e. offsets, error logs, and count files.
etl.execution.base.path=hdfs:///path/to/camus/metadata

# Where completed Camus job output directories are kept,
# usually a sub-dir in the etl.execution.base.path
etl.execution.history.path=hdfs:///path/to/camus/metadata/history

Next, we’ll specify how Camus should read in messages from Kafka, and how it should look for event timestamps in each message.  We’ll use the JsonStringMessageDecoder, which expects each message to be a UTF-8 encoded JSON string.  It will deserialize each message using the Gson JSON parser, and look for a configured timestamp field.

# Use the JsonStringMessageDecoder to deserialize JSON messages from Kafka.
camus.message.decoder.class=com.linkedin.camus.etl.kafka.coders.JsonStringMessageDecoder


camus.message.timestamp.field specifies which field in the JSON object should be used as the event timestamp, and camus.message.timestamp.format specifies the timestamp format of that field.  Timestamp interpolation is handled by Java’s SimpleDateFormat, so you should set camus.message.timestamp.format to something that SimpleDateFormat understands, unless your timestamp is already an integer UNIX epoch timestamp.  If it is, you should use ‘unix_seconds’ or ‘unix_milliseconds’, depending on the granularity of your UNIX epoch timestamp.

Wikimedia maintains a small fork of JsonStringMessageDecoder that makes camus.message.timestamp.field more flexible.  In our fork, you can specify sub-objects using dotted notation, e.g. camus.message.timestamp.field=sub.object.timestamp. If you don’t need this feature, then don’t bother with our fork.

Here are a couple of examples:

Timestamp field is ‘dt’, format is an ISO-8601 string:

# Specify which field in the JSON object will contain our event timestamp.
camus.message.timestamp.field=dt

# Timestamp values look like 2017-01-01T15:40:17
camus.message.timestamp.format=yyyy-MM-dd'T'HH:mm:ss


Timestamp field is ‘meta.sub.object.ts’, format is a UNIX epoch timestamp integer in milliseconds:

# Specify which field in the JSON object will contain our event timestamp.
# E.g. { “meta”: { “sub”: { “object”: { “ts”: 1482871710123 } } } }
# Note that this will only work with Wikimedia’s fork of Camus.
camus.message.timestamp.field=meta.sub.object.ts

# Timestamp values are in milliseconds since UNIX epoch.
camus.message.timestamp.format=unix_milliseconds

If the timestamp cannot be read out of the JSON object, JsonStringMessageDecoder will log a warning and fall back to using System.currentTimeMillis().

Now that we’ve told Camus how to read from Kafka, we need to tell it how to write to HDFS. etl.output.file.time.partition.mins is important. It tells Camus the time bucketing granularity to use.  Setting this to 60 minutes will cause Camus to write files into hourly bucket directories, e.g. 2017/01/01/15. Setting it to 1440 minutes will write daily buckets, etc.

# Store output into hourly buckets.
etl.output.file.time.partition.mins=60

# Use UTC as the default timezone.
etl.default.timezone=UTC

# Delimit records by newline.  This is important for MapReduce to be able to split JSON records.
etl.output.record.delimiter=\n


Use SequenceFileRecordWriterProvider if you want to compress data.  To do so, set mapreduce.output.fileoutputformat.compress.codec=SnappyCodec (as below, or another splittable compression codec) either in your mapred-site.xml or in this camus.properties file.

# SequenceFileRecordWriterProvider writes the records as Hadoop Sequence files
# so that they can be split even if they are compressed.
etl.record.writer.provider.class=com.linkedin.camus.etl.kafka.common.SequenceFileRecordWriterProvider

# Use Snappy to compress output records.
mapreduce.output.fileoutputformat.compress.codec=SnappyCodec


Finally, some basic Camus configs are needed:

# Replace this with your list of Kafka brokers from which to bootstrap.
kafka.brokers=kafka1001:9092,kafka1002:9092,kafka1003:9092

# These are the kafka topics camus brings to HDFS.
# Replace this with the topics you want to pull,
# or alternatively use kafka.blacklist.topics.
kafka.whitelist.topics=topicA,topicB,topicC

# If the whitelist has values, only whitelisted topics are pulled.
kafka.blacklist.topics=

There are various other camus properties you can tweak as well.  You can see some of the ones Wikimedia uses here.

Once this camus.properties file is configured, we can launch a Camus Hadoop job to import from Kafka.

hadoop jar camus.jar com.linkedin.camus.etl.kafka.CamusJob  -P /path/to/camus.properties -Dcamus.job.name="my-camus-job"


The first time this job runs, it will import as much data from Kafka as it can and write its finishing topic-partition offsets to HDFS.  The next time you launch a Camus job with the same camus.properties file, it will read offsets from the configured etl.execution.base.path HDFS directory and start consuming from Kafka at those offsets.  Wikimedia schedules regular Camus jobs using boring ol’ cron, but you could use whatever newfangled job scheduler you like.
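
For example, a crontab entry along these lines (paths and the log location are placeholders) would launch a Camus run every 10 minutes:

*/10 * * * * hadoop jar camus.jar com.linkedin.camus.etl.kafka.CamusJob -P /path/to/camus.properties -Dcamus.job.name="my-camus-job" >> /var/log/camus.log 2>&1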

After several Camus runs, you should see time-bucketed directories containing Snappy-compressed SequenceFiles of JSON data in HDFS, stored in etl.destination.path, e.g. hdfs:///path/to/output/directory/topicA/2017/01/01/15/.  You could access this data with custom MapReduce or Spark jobs, or use Hive’s org.apache.hive.hcatalog.data.JsonSerDe and Hadoop’s org.apache.hadoop.mapred.SequenceFileInputFormat.  Wikimedia creates an external Hive table doing just that, and then batch processes this data into a more refined and useful schema stored as Parquet for faster querying.

Here’s the camus.properties file in full:

#
# Camus properties file for consuming Kafka topics into HDFS.
#

# Final top-level HDFS data output directory. A sub-directory
# will be dynamically created for each consumed topic.
etl.destination.path=hdfs:///path/to/output/directory

# HDFS location where you want to keep execution files,
# i.e. offsets, error logs, and count files.
etl.execution.base.path=hdfs:///path/to/camus/metadata

# Where completed Camus job output directories are kept,
# usually a sub-dir in the etl.execution.base.path
etl.execution.history.path=hdfs:///path/to/camus/metadata/history

# Use the JsonStringMessageDecoder to deserialize JSON messages from Kafka.
camus.message.decoder.class=com.linkedin.camus.etl.kafka.coders.JsonStringMessageDecoder

# Specify which field in the JSON object will contain our event timestamp.
camus.message.timestamp.field=dt

# Timestamp values look like 2017-01-01T15:40:17
camus.message.timestamp.format=yyyy-MM-dd'T'HH:mm:ss

# Store output into hourly buckets.
etl.output.file.time.partition.mins=60

# Use UTC as the default timezone.
etl.default.timezone=UTC

# Delimit records by newline.  This is important for MapReduce to be able to split JSON records.
etl.output.record.delimiter=\n

# Concrete implementation of the Decoder class to use
camus.message.decoder.class=com.linkedin.camus.etl.kafka.coders.JsonStringMessageDecoder

# SequenceFileRecordWriterProvider writes the records as Hadoop Sequence files
# so that they can be split even if they are compressed.
etl.record.writer.provider.class=com.linkedin.camus.etl.kafka.common.SequenceFileRecordWriterProvider

# Use Snappy to compress output records.
mapreduce.output.fileoutputformat.compress.codec=SnappyCodec

# Max hadoop tasks to use, each task can pull multiple topic partitions.
mapred.map.tasks=24

# Connection parameters.
# Replace this with your list of Kafka brokers from which to bootstrap.
kafka.brokers=kafka1001:9092,kafka1002:9092,kafka1003:9092

# These are the kafka topics camus brings to HDFS.
# Replace this with the topics you want to pull,
# or alternatively use kafka.blacklist.topics.
kafka.whitelist.topics=topicA,topicB,topicC

# If the whitelist has values, only whitelisted topics are pulled.
kafka.blacklist.topics=

# max historical time that will be pulled from each partition based on event timestamp
#  Note:  max.pull.hrs doesn't quite seem to be respected here.
#  This will take some more sleuthing to figure out why, but in our case
#  here it’s ok, as we hope to never be this far behind in Kafka messages to
#  consume.
kafka.max.pull.hrs=168

# events with a timestamp older than this will be discarded.
kafka.max.historical.days=7

# Max minutes for each mapper to pull messages (-1 means no limit)
# Let each mapper run for no more than 9 minutes.
# Camus creates hourly directories, and we don't want a single
# long running mapper keep other Camus jobs from being launched.
# We run Camus every 10 minutes, so limiting it to 9 should keep
# runs fresh.
kafka.max.pull.minutes.per.task=9

# Name of the client as seen by kafka
kafka.client.name=camus-00

# Fetch Request Parameters
#kafka.fetch.buffer.size=
#kafka.fetch.request.correlationid=
#kafka.fetch.request.max.wait=
#kafka.fetch.request.min.bytes=

kafka.client.buffer.size=20971520
kafka.client.so.timeout=60000

# Controls the submitting of counts to Kafka
# Default value set to true
post.tracking.counts.to.kafka=false

# Stops the mapper from getting inundated with Decoder exceptions for the same topic
# Default value is set to 10
max.decoder.exceptions.to.print=5

log4j.configuration=false

##########################
# Everything below this point can be ignored for the time being,
# will provide more documentation down the road. (LinkedIn/Camus never did! :/ )
##########################

etl.run.tracking.post=false
#kafka.monitor.tier=
kafka.monitor.time.granularity=10

etl.hourly=hourly
etl.daily=daily
etl.ignore.schema.errors=false

etl.keep.count.files=false
#etl.counts.path=
etl.execution.history.max.of.quota=.8

Nuria Ruiz, Lead Software Engineer (Manager)
Andrew Otto, Senior Operations Engineer
Wikimedia Foundation

by Nuria Ruiz and Andrew Otto at January 13, 2017 06:05 PM

January 12, 2017

Wikimedia Foundation

Inspire Campaign’s final report shows achievements in gender diversity and representation within the Wikimedia movement

Photo by Flixtey, CC BY-SA 4.0.

In March 2015, the Wikimedia Foundation launched its first Inspire Campaign with the goal of improving gender diversity within our movement. The campaign invited ideas on how we as a movement could improve the representation of women within our projects, both in their content and among their contributors.

The response and effort from volunteers have been remarkable.  Across the ideas that were funded, there was a diversity of approaches, such as:

  • An edit-a-thon series in Ghana to develop content on notable Ghanaian women
  • A tool to track how the gender gap is changing on Wikipedia projects
  • A pilot on mentorship-driven editing between high school and college students

These and other initiatives have resulted in concrete and surprising outcomes, such as:

  • Creating or improving over 12,000 articles, including 126 new biographies on women,
  • Engaging women as project leaders, volunteers, experienced editors and new editors,
  • Correcting gender-related biases within Wikipedia articles.

As this campaign draws to a close, we’d like to celebrate the work of our grant-funded projects: the leaders, volunteers, and participants who contributed (many of whom were women), and the achievements that have moved us forward in addressing this topic.

Protecting user privacy through prevention

The internet can be a hostile place, and in this age of information, we have become more cautious about what we reveal about ourselves to others online. You can imagine then that in a campaign designed to attract women, privacy became a central concern for both program leaders and participants.

Program leaders were sensitive to this challenge, and cultivated spaces where women could contribute without compromising their need for privacy. For instance, we asked program leaders to report the number of women who attended their events. Many program leaders pushed back, citing the need to protect privacy. They raised two good points: that some editors choose not to disclose their gender online as a safety measure, and that by even associating their name or username with a public event designed for women, they could inadvertently compromise their privacy. Consequently, the total number of women who participated in these programs was underreported.

In spite of this conflict, it was clear that women were majority participants across funded projects.  In projects hosting multiple events for training or improving project content, such as those hosted by AfroCROWD in New York, the Linguistics Editathon series in Canada and the U.S., and WikiNeedsGirls in Ghana, well over 50% of participants were women across their own events.  Furthermore, in the mentorship groups formed through the Women of Wikipedia (WOW!) Editing Group, all 34 participants were women.  These women showed strong commitment as a result of the program, and in a follow-up survey, many of them wanted to continue contributing together with their mentorship group beyond the program.

Who is missing on Wikipedia?

There is an impressive amount of information on Wikipedia today: over 43 million articles across 284 languages. In English Wikipedia alone, there are over 5 million articles today. A fair number of these articles are dedicated to people: biographies about notable individuals amount to over 6.5 million articles, and this number continues to increase every year.

It can be difficult to see what is missing within this sea of information, and biographies are one well-defined area where the question of “Who is missing?” is particularly pertinent. Today,  biographies about women amount to just over 1 million articles across all languages. One million biographies out of 6 million biographies, or 16% of biographies in total. One million articles out of 43 million articles, or 2% of Wikipedia content in total (whereas 12% of Wikipedia content is biographies of men). This is one way to understand how women are underrepresented on Wikipedia today, and we know even less about the extent of underrepresentation for other non-male gender identities.

One Inspire grant sought to address the visibility of this issue through the development of a tool: Wikipedia Human Gender Indicators (WHGI). WHGI uses Wikidata to track the gender of newly-created biographies by language Wikipedia (and other parameters, such as country or date of birth of the individual), and provides reports in the form of weekly snapshots.

The project has seen solid evidence of usage since its completion. In February 2016 alone the site had approximately 4,000 pageviews and over 1,000 downloads of available datasets. The team also received important feedback from users on the tool: participants in WikiProject Women in Red—a volunteer project that has created more than 44,000 biographies about women—characterized the project as valuable to their work, as it helps them identify notable women to write about.

The first step to addressing a problem is to identify it. WHGI helps us to do that in a concrete, data-driven way.

Why does “Queen-Empress” redirect to “King-Emperor”?

Addressing the gender gap goes beyond filling gaps in content. It includes igniting conversations and addressing bias in content, bias that might be more subtle or even invisible to casual readers.

Just for the record is an ongoing Gender Gap Inspire grant that focuses on these more subtle forms of content bias on English Wikipedia. One of their events analyzed the process of Wikipedia editing to investigate the possibilities and challenges of gender-neutral writing.

They specifically looked at how pages are automatically redirected to others (e.g. “Heroine” automatically redirects to “Hero”) and at the direction of those redirects: female to gender-neutral, male to gender-neutral, female to male, male to female. An analysis of almost 200 redirects on English Wikipedia showed that ~100 redirect from male/female terms to gender-neutral terms, and ~100 from female to male terms.  For example, “Ballerina” redirects to “Ballet dancer” and “Queen-Empress” redirects to “King-Emperor”.

These redirections may seem like minor technical issues, but they result in an encyclopedia that is rife with systemic bias. Raising awareness of these types of bias, starting discussions on and off wiki, and directly editing language were some of the main approaches Inspire grantees took to address bias.

Learn more!

These and other outcomes can be read in more detail in our full report. We encourage you to read on, learn more about what our grantees achieved, and join us in celebrating these project leaders and their participants! You can also learn more about Community Resources’ first experiment with proactive grantmaking and what we learned from this iteration.

Sati Houston, Grants Impact Strategist
Chris Schilling, Community Organizer
Wikimedia Foundation

by Sati Houston and Chris Schilling at January 12, 2017 06:26 PM

Gerard Meijssen

#Wikidata - Clare Hollingworth and #sources

Mrs Hollingworth was a famous journalist. She recently passed away and as I often do, I added content to the Wikidata item of the person involved.

Information like awards is something I often add, and it was easy enough to establish that Mrs Hollingworth received a Hannen Swaffer Award in 1962. I found a source for the award and I had my confirmation.

The Wikipedia article has it that "She won the James Cameron Award for Journalism (1994)." There is however no source and I can find a James Cameron lecture and award but Mrs Hollingworth is not noted as receiving this award; it is Ed Vulliamy.

People often say that Wikipedia is not a source. The problem is that for Wikidata it often is. Particularly in the early days of Wikidata massive amounts of data were lifted off the Wikipedias and it is why there is so much initial content to build upon.

When you work from sources, you sometimes find issues with the Wikipedia content. My source does not know about Mr Paul Foot either. Mrs Lyse Doucet does have a Wikipedia article, but she is not linked in the Wikipedia list.

To truly get to the bottom of issues like these takes research, and I am neither willing nor able to do this for each and every subject that I touch. It is impossible to work on all the issues that exist because of everything that I did. I have over 2.1 million edits on Wikidata. What I do is make a start, and I am happy to be condemned for the work that I did; it does have issues, but they are all there to be solved someday.
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at January 12, 2017 08:54 AM

Resident Mario

Wikimedia Foundation

Writing Ghana into Wikipedia: Felix Nartey

Video by Victor Grigas, CC BY-SA 3.0.

The chaos of the Second World War touched all corners of the world, and the Gold Coast (now Ghana) was no exception. The colony’s resources were marshaled for the war effort, and the headquarters of Britain’s West Africa Command were located in Accra. Nearly 200,000 soldiers from the area would eventually sign up to serve in various branches of the armed forces.

In support of these efforts, No. 37 General Hospital was established in Accra to provide medical care for injured Allied troops. The hospital’s name was changed to 37 Military Hospital of the Gold Coast shortly before the colony gained independence. It is now open to the public and serves as a teaching hospital for graduate and medical students.

Despite its impact on healthcare in Accra, there was no Wikipedia article about the hospital until February 2014, when Wikipedian Felix Nartey created it.

Photo by Ruby Mizrahi/Wikimedia Foundation, CC BY-SA 3.0.

“I was walking with a friend from an event when we just thought there was no picture of this place on Wikimedia Commons, the free media repository,” says Nartey, who also served as the former community manager for the Wikimedia Ghana user group. “My friend took a picture with my tablet, and so did I, then we headed home.”

Nartey wanted to use his photo in the hospital’s article on Wikipedia, but was shocked to find no article for it: the search results page told him that “no results” matched his search term, but that he could create a page about it.

He created a short “stub” article that night. Only a few weeks later, two other editors expanded it into an informative entry that pleasantly surprised him the next time he visited the article.

Nartey believes that knowledge sharing activities like editing Wikipedia have an effect on people’s lives, but at the present time there are significant content gaps on the site. There are fewer articles about topics on the African continent than there are about Europe or North America, and those that do exist tend to be shorter and less detailed.

Increasing the diversity of contributions on Wikipedia helps achieve higher-quality content and combat systemic bias, and that’s why many people—including Nartey—are trying to figure out the reasons behind these gaps and are putting forth great effort to bridge them.

In Ghana, a current paucity of contributors could “be partly blamed on the current growing unemployment situation, which is certainly an impediment to people’s willingness to do things for free,” says Nartey. But even in better times, while “the internet connection is reasonable in urban areas, it’s expensive, so people tend to go without it or place it last on their list of priorities, which, of course, affects contributions to Wikipedia.”

But when Nartey and other volunteers start editing Wikipedia, the positive energy created works as an incentive for them to maintain their contributions. He explains:

The only way you can have an impact in this world is to always leave something behind from where you came from and give back to society, whatever that means for you… That is the feeling I get whenever I edit Wikipedia. And I feel like it’s the joy of every Wikipedian to really see your impact.

 
In addition to editing, Nartey leads several initiatives in Ghana where he promotes the importance of editing Wikipedia. Some examples include GLAM activities, the Wikipedia Education Program, the Wikipedia Library, etc. In these activities, Nartey speaks with students, cultural organization officials, and Wikipedians to find the best ways to encourage people from his country to contribute.

“I mostly teach people about the essence of wanting to contribute to Wikipedia,” Nartey explains. “Information itself is useless until it’s shared with the whole world. And the only way you can do that is through a medium like Wikipedia. You need to [highlight] that essence in the minds of people and inspire them to contribute to Wikipedia. It’s easy for you to tap in and just tell someone you need to do this because Wikipedia is already creating that impact.”

Interview by Ruby Mizrahi, Interviewer
Outline by Tomasz Kozlowski, Blog Writer, Wikimedia Foundation
Profile by Samir Elsharbaty, Digital Content Intern, Wikimedia Foundation

by Ruby Mizrahi, Tomasz Kozlowski and Samir Elsharbaty at January 12, 2017 12:16 AM

January 10, 2017

Wikimedia Foundation

Wikimedia Foundation joins EFF and others encouraging the California Court of Appeal to protect online free speech

Photo by Coolcaesar, CC BY-SA 3.0.

On Tuesday, January 10, 2017, the Wikimedia Foundation joined an amicus brief filed by the Electronic Frontier Foundation, Eric Goldman, Rebecca Tushnet, the Organization for Transformative Works, Engine, GitHub, Medium, Snap, and Yelp encouraging the California Court of Appeal to review the ruling of the trial court in Cross v. Facebook. The case involves important principles of freedom of speech and intermediary liability immunity (which shields platforms like Wikimedia, Twitter, and Facebook from liability for content posted by users), both essential to the continued health of the Wikimedia projects.

The case began when users on Facebook created a Facebook page which criticized the plaintiff, a musician, based on his business practices. The plaintiff (along with the label and marketing companies that represented him) brought suit against Facebook with a number of claims including misuse of publicity rights. The trial court denied Facebook’s anti-SLAPP motion  and found that the plaintiff could assert a right of publicity claim against Facebook. Worryingly, under the trial court’s reasoning, such a claim arises for any speech on social media that is: (i) about a real person; and (ii) published on a website that includes advertisements. In other words, a platform that carries advertising can be held liable for the speech of its users merely because this speech relates to a real person. The court’s reasoning is not consistent with well-established rules for limits to online speech.

Facebook filed an appeal against this ruling before the California Court of Appeal where the case is currently pending. In our amicus brief, we encourage the Court of Appeal to review the lower court’s decision by pointing to the legal and policy consequences of the lower court’s ruling.

We and our co-signers argue that the court reached this absurd result through two major errors in its reasoning. First, the court did not follow the well-established First Amendment limits to the right of publicity. Second, the court did not correctly apply the immunity granted in CDA Section 230. Congress enacted Section 230 to encourage the development of the internet and other interactive media by shielding intermediaries not only from liability for actionable content created or posted by users, but also from the cost and uncertainty associated with litigation itself. This framework is essential to the success of the Wikimedia projects and many other major websites across the internet that host user-generated content. If allowed to stand, a social media site such as Facebook, Twitter, or Tumblr can be sued for any post about a real person made by a user, ultimately undermining congressional intent.

We hope that the California Court of Appeal will protect the First Amendment right to comment on and criticize public figures. We also urge the court to uphold the immunity granted under US law to intermediaries, which enables robust free speech and has become a fundamental pillar in the architecture of the internet.

Tarun Krishnakumar, Legal Fellow
Stephen LaPorte, Senior Legal Counsel
Wikimedia Foundation

Special thanks to the Electronic Frontier Foundation for drafting this amicus brief, and to Aeryn Palmer for leading the Wikimedia Foundation’s contribution.

by Tarun Krishnakumar and Stephen LaPorte at January 10, 2017 11:46 PM

This month in GLAM

This Month in GLAM: December 2016

by Admin at January 10, 2017 11:06 AM

Jeroen De Dauw

PHP 7.1 is awesome

PHP 7.1 has been released, bringing some features I was eagerly anticipating and some surprises that had gone under my radar.

New iterable pseudo-type

This is the feature I’m most excited about, perhaps because I had no clue it was in the works. In short, iterable allows type hinting in functions that simply loop over their parameter’s value, without restricting the type to either array or Traversable, and without having to drop the type hint altogether. This partially solves one of the points I raised in Collections, a post in my Missing in PHP 7 series.
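
Here’s a minimal sketch of what this enables (the function and values are just made up for illustration):

function printAll( iterable $things ) {
    // Works for both plain arrays and any Traversable object.
    foreach ( $things as $thing ) {
        echo $thing, "\n";
    }
}

printAll( [ 'foo', 'bar' ] );
printAll( new ArrayIterator( [ 'baz' ] ) );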

Nullable types

I had also already addressed this feature in Missing in PHP 7: Nullable return types. What somehow escaped my attention is that PHP 7.1 comes not just with nullable return types, but also with new syntax for nullable parameters.
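
A small, contrived example that combines a nullable parameter with a nullable return type:

function firstName( ?array $names ): ?string {
    // Null may now be passed in, and null may be returned when there is nothing to return.
    if ( $names === null || $names === [] ) {
        return null;
    }

    return (string)reset( $names );
}

var_dump( firstName( [ 'Ada', 'Grace' ] ) ); // string(3) "Ada"
var_dump( firstName( null ) );               // NULL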

Intent revealing

Other new features that I’m excited about are the Void Return Type and Class Constant Visibility Modifiers. Both of these help with revealing the author’s intent, reduce the need for comments, and make it easier to catch bugs.
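
A short, contrived example using both features:

class Counter {
    // New in PHP 7.1: class constants can have a visibility modifier.
    private const START_VALUE = 0;

    private $count = self::START_VALUE;

    // New in PHP 7.1: void makes explicit that nothing is returned.
    public function increment(): void {
        $this->count++;
    }

    public function getCount(): int {
        return $this->count;
    }
}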

A big thank you to the PHP contributors that made these things possible and keep pushing the language forwards.

For a full list of new features, see the PHP 7.1 release announcement.

by Jeroen at January 10, 2017 10:23 AM

January 09, 2017

Wikimedia Tech Blog

Wikimedia Foundation receives $3 million grant from Alfred P. Sloan Foundation to make freely licensed images accessible and reusable across the web

Photo by Ajepbah, CC BY-SA 3.0 DE.

The Wikimedia Foundation, with a US$3,015,000 grant from the Alfred P. Sloan Foundation, is leading an effort to enable structured data on Wikimedia Commons, the world’s largest repository of freely licensed educational media. The project will support contributors’ efforts to integrate Commons’ media more readily into the rest of the web, making it easier for people and institutions to share, access, and reuse high-quality and free educational content.

Wikimedia Commons includes more than 35 million freely licensed media files—photos, audio, and video—ranging from stunning photos of geographic landscapes to donations from institutions with substantial media collections, like the Smithsonian, NASA, and the British Library. Like Wikipedia, Wikimedia Commons has become a “go-to” source on the internet—used by everyone from casual browsers to major media outlets to educational institutions, and easily discoverable through search engines. It continues to rapidly grow every year: Volunteer contributors added roughly six million new files in 2016.

Today, the rich images and media in Wikimedia Commons are described only by casual notation, making it difficult to fully explore and use this remarkable resource. The generous contribution from the Sloan Foundation will enable the Wikimedia Foundation to connect Wikimedia Commons with Wikidata, the central storage for structured data within the  Wikimedia projects. Wikidata will empower Wikimedia volunteers to transform Wikimedia Commons into a rich, easily-searchable, and machine-readable resource for the world.

Over three years, the Wikimedia Foundation will develop infrastructure, tools, and community support to enable the work of contributors, who have long requested a way to add more precise, multilingual and reusable data to media files. This will support new uses of Commons’ media, from richer and more dynamic illustration of articles on Wikipedia, to helping new users, like museums, remix the media in their own applications. Structured data will also be compatible with and support Wikimedia Commons’ partnership communities, including “GLAM” institutions (galleries, libraries, archives, museums) that have donated thousands of images in recent years. With the introduction of structured data on Commons, GLAM institutions will be able to more easily upload media and integrate existing metadata into Wikimedia Commons and share their collections with the rest of the web.

“At Wikimedia, we believe the world should have access to the sum of all knowledge, from encyclopedia articles to archival images,” said Katherine Maher, Executive Director of the Wikimedia Foundation. She continued:

Wikimedia Commons is a vast library of freely licensed photography, video, and audio that illustrates knowledge and the internet itself. With this project, and in partnership with the Wikimedia community of volunteer contributors, we hope to expand the free and open internet by supporting new applications of the millions of media files on Wikimedia Commons. We are grateful for the generous support of the Sloan Foundation, our longtime funders, in this important work.

 
“We are delighted to continue our near-decade-long support of Wikimedia with this potentially game-changing grant to unlock millions of media files—the most common form of modern communication and popular education, growing exponentially each year—into a universal format that can be read and reused not just by Wikipedia’s hundreds of millions of readers in nearly 300 languages but by educational, cultural and scientific organizations and by anyone doing a Google search or using the web,” said Doron Weber, Vice President and Program Director at the Alfred P. Sloan Foundation.

At a time when the World Wide Web, like the rest of the world, is beset by increasing polarization, commercialization, and narrowing, Wikipedia continues to serve as a shining, global counter-example of open collaborative knowledge sharing and consensus building presented in a reliable context with a neutral point of view, free of fake news and false information, that emphasizes how we can come together to build the sum of all human knowledge. We all need Wikipedia, its sister projects, its technology, and its values, now more than ever.

 
The Wikimedia Foundation is partnering on this project with Wikimedia Germany (Deutschland), the independent nonprofit organization dedicated to supporting the Wikimedia projects in Germany. Wikimedia Germany incubated and oversaw Wikidata’s initial operations, and continues to manage Wikidata’s technical and engineering roadmap. The project will be overseen in consultation with the Wikimedia community of volunteer contributors on collaboration and community support. The USD$3,015,000 grant from the Sloan Foundation will be given over a three year period.

For more information, please visit the structured data page on Wikimedia Commons.

by Wikimedia Foundation at January 09, 2017 07:41 PM

January 08, 2017

Jeroen De Dauw

Rewriting the Wikimedia Deutschland fundraising

Last year we rewrote the Wikimedia Deutschland fundraising software. In this blog post I’ll give you an idea of what this software does, why we rewrote it and the outcome of this rewrite.

The application

Our fundraising software is a homegrown PHP application. Its primary functions are donations and membership applications. It supports multiple payment methods, needs to interact with payment providers, supports submission and listing of comments and exchanges data with another homegrown PHP application that does analysis, reporting and moderation.

[Image: fun-app]

The codebase was originally written in a procedural style, with most code residing directly in files (i.e., not even in a global function). There was very little design, and entirely separate concerns such as presentation and data access were mixed together. As you can probably imagine, this code was highly complex and very hard to understand or change. There was unused code, broken code, features that might not be needed anymore, and mysterious parts whose purpose even the guru who had maintained the codebase during the last few years could not explain. This mess, combined with the complete lack of a specification and unit tests, made development of new features extremely slow and error prone.

[Image: derp-code]

Why we rewrote

During the last year of the old application’s lifetime, we did refactor some parts and tried adding tests. In doing so, we figured that rewriting from scratch would be easier than trying to make incremental changes. We could start with a fresh design, add only the features we really need, and perhaps borrow some reusable code from the less horrible parts of the old application.

They did it by making the single worst strategic mistake that any software company can make: […] rewrite the code from scratch. —Joel Spolsky

We were aware of the risks involved in a rewrite of this nature, and that such rewrites often fail. One big reason we did not decide against rewriting is that we had a period of 9 months during which no new features needed to be developed. This meant we could freeze the old application and avoid parallel development and the feature race that would come with it. Additionally, we set some constraints: we would only rewrite this application and leave the analysis and moderation application alone, and we would do a pure rewrite, avoiding the addition of new features until the rewrite was done.

How we got started

Since we had no specification, we tried visualizing the conceptual components of the old application, and then identified the “commands” they received from the outside world.

[Image: old-fun-code-diagram]

Creating the new software

After some consideration, we decided to try out The Clean Architecture as a high level structure for the new application. For technical details on what we did and the lessons we learned, see Implementing the Clean Architecture.
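
To give a rough idea of what this looks like in practice, here is a hypothetical, heavily simplified use case in that style (the names are invented for this post and are not our actual classes). The use case receives a plain request object from the outside world and depends only on an interface, so it knows nothing about the web framework or the database:

interface DonationRepository {
    public function storeDonationOfAmount( int $amountInEuroCents );
}

class AddDonationRequest {
    private $amountInEuroCents;

    public function __construct( int $amountInEuroCents ) {
        $this->amountInEuroCents = $amountInEuroCents;
    }

    public function getAmountInEuroCents(): int {
        return $this->amountInEuroCents;
    }
}

class AddDonationUseCase {
    private $repository;

    public function __construct( DonationRepository $repository ) {
        $this->repository = $repository;
    }

    public function addDonation( AddDonationRequest $request ): bool {
        // Domain rule lives here, not in the controller or the persistence layer.
        if ( $request->getAmountInEuroCents() <= 0 ) {
            return false;
        }

        $this->repository->storeDonationOfAmount( $request->getAmountInEuroCents() );
        return true;
    }
}

The framework-facing code constructs the request object and calls the use case, while the persistence details live behind the repository interface.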

The result

With a team of 3 people, we took about 8 months to finish the rewrite successfully. Our codebase is now clean and much, much easier to understand and work with. It took us over two man years to do this clean up, and presumably an even greater amount of time was wasted in dealing with the old application in the first place. This goes to show that the cost of not working towards technical excellence is very high.

We’re very happy with the result. For us, the team that wrote it, it’s easy to understand, and the same seems to be true for other people based on feedback we got from our colleagues in other teams. We have tests for pretty much all functionality, so can refactor and add new functionality with confidence. So far we’ve encountered very few bugs, with most issues arising from us forgetting to add minor but important features to the new application, or misunderstanding what the behavior should be and then correctly implementing the wrong thing. This of course has more to do with the old codebase than with the new one. We now have a solid platform upon which we can quickly build new functionality or improve what we already have.

The new application is the first at Wikimedia (Deutschland) to be deployed on, and written in, PHP 7. Even though it was not an explicit goal of the rewrite, the new application has ended up with better performance than the old one, in part due to the use of PHP 7.

Near the end of the rewrite we got an external review performed by thePHPcc, during which Sebastian Bergmann, who you might know from PHPUnit fame, looked for code quality issues in the new codebase. The general result of that was a thumbs up, which we took the creative license to translate into this totally non-Sebastian approved image:

You can see our new application in action in production. I recommend you try it out by donating 🙂

Technical statistics

These are some statistics, just for fun. They were compiled after we did our rewrite and were not used during development at all. As with most software metrics, they should be taken with a grain of salt.

In this visualization, each dot represents a single file. The size represents the Cyclomatic complexity while the color represents the Maintainability Index. The complexity is scored relative to the highest complexity in the project, which in the old application was 266 and in the new one is 30. This means that the red on the right (the new application) is a lot less problematic than the red on the left. (This visualization was created with PhpMetrics.)

[Image: fun-complexity]

Global access in various Wikimedia codebases (lower is better). The rightmost is the old version of the fundraising application, and the one next to it is the new one. The new one has no global access whatsoever. LLOC stands for Logical Lines of Code. You can see the numbers in this public spreadsheet.

[Image: global-access-stats]

Static method calls, often a big source of global state access, were omitted, since the tools used count many false positives (e.g. alternative constructors).

The differences between the projects can be made more apparent by visualizing them in another way. Here you have the number of lines per global access, represented on a logarithmic scale.

[Image: lloc-per-global]

The following stats have been obtained using phploc, which counts namespace declarations and imports as LLOC. This means that for the new application some of the numbers are very slightly inflated.

  • Average class LLOC: 31 => 21
  • Average method LLOC: 4 => 3
  • Cyclomatic Complexity / LLOC : 0.39 => 0.10
  • Cyclomatic Complexity / Number of Methods: 2.67 => 1.32
  • Global functions: 58 => 0
  • Total LLOC: 5517 => 10187
  • Test LLOC: 979 => 5516
  • Production LLOC: 4538 => 4671
  • Classes: 105 => 366
  • Namespaces: 14 => 105

This is another visualization created with PhpMetrics that shows the dependencies between classes. Dependencies are static calls (including to the constructor), implementation, extension, and type hinting. The application’s top-level factory can be seen at the top right of the visualization.

[Image: fun-dependencies]

by Jeroen at January 08, 2017 09:02 AM

January 07, 2017

User:Bluerasberry

Year of Science 2016 – a new model for Wikipedia outreach

The Year of Science was a 2016 Wikipedia outreach campaign managed by the Wiki Education Foundation with funding support from the Simons Foundation. The campaign had several goals, including developing science articles on Wikipedia, recruiting scientists as volunteer Wikipedia editors, promoting discussions about the culture and impact of Wikipedia in the scientific community, and integrating more science themes into existing Wikipedia community programs.

It is easy to say that the Year of Science was one of the biggest and highest-impact campaigns the Wikipedia community has produced to date. Previous campaigns rarely lasted more than a month, and they rarely included multiple events in multiple cities or recruited so many participants. No previous Wikipedia campaign had brought so many discussions into professional spaces, but the Year of Science included talks and workshops at academic conferences throughout the year. The very brand and idea of a “year of science” was provocative to see in circulation around Wikipedia, and pushed the community’s imagination of what is possible.

The campaign will have its own outcome reports and counts of progress. 2016 just ended, so these are not available yet. When they come out, they will describe how many people attended workshops and registered Wikipedia articles to add citations to academic journals. With Wikipedia being digitally native, many metrics are available, so that part of the impact can be measured quantitatively. Beyond that, I am confident that the social outreach changes the cultural posture of science toward Wikipedia, a change which I think is overdue. Right now, Wikipedia is riding a 10+-year wave of being the world’s most consulted source of science information. Assuming that Wikipedia survives into the future, I think people might look back and wonder when Wikipedia’s influence as a popular publication was first recognized, and this Year of Science campaign might be cited as one of the first examples of professional Wikipedia outreach into a population of people who still had serious reservations about acknowledging Wikipedia at all. It was a risk to do the Year of Science in 2016; 2014 or earlier would have been premature considering Wikipedia’s reputation then. Although things are better now, they are changing quickly, and every year outreach like this becomes easier to conduct and more likely to have a high impact with less effort.

I am pleased with the campaign outcomes. From a Wikimedia community perspective of wanting to keep what worked and spend less time repeating the parts which were less effective, the campaign could be criticized, but I do not think the criticism should detract from celebrating everything that everyone accomplished. Most parts of the program were successful, and I expect that other stakeholders will publish to describe those parts. For the sake of anyone who might want to do similar projects, I will review the challenges.

Metrics are incomplete
The Wikipedia community values transparency. However, many people in the Wikipedia community stay in digital spaces and underestimate the difficulty of doing outreach away from the keyboard. The Year of Science tracked as much Wikipedia engagement as is routine to track in outreach programs, but from anecdotes, I know that much and perhaps most data was not captured. There are various reasons for this. One reason is that Wikipedia’s software is nonprofit and rooted in the late 1990s, whereas commercial websites have all the advantages of being state of the art and intuitive to use. Wikipedia’s clunky interface and infrastructure is a barrier to getting users to agree to the lamest parts of Wikipedia, like volunteering for metrics tracking. In platforms like Facebook, every aspect of people’s lives is tracked routinely with single clicks, but in Wikipedia, there are social options to preserve privacy and then technical limitations even for people who are sharing what they do. The idea for metrics tracking in a program like this is that if someone volunteers to report to a campaign organizer which Wikipedia article they edited, then we ought to be able to track that. For a campaign like this, we actually need to be able to track hundreds or thousands of participants. What happens in practice is that this tracking connection is difficult to make in Wikipedia, for reasons which are not present in other organizing platforms. This is simultaneously a problem, an intentional choice with its own rights-preserving benefits, and a social situation on which to reflect. Something that came out of this is the development of the Programs & Events dashboard. I think that the P&E dashboard could prove to be one of the most significant innovations to Wikipedia in its entire history, because the dashboard is the first effort to provide a system for collecting media metrics reporting the impact of Wikipedia. When stories about Wikipedia communication metrics are told, I think the Year of Science should be remembered as one of the second-wave driving forces in the development of the concept.

Some experiments failed to develop
In typical wiki fashion, the beginning of the campaign was treated as a call for all sorts of sub-projects. Should the campaign include a contest, a newsletter, collaborations with 10+ ongoing Wikipedia initiatives, and formal partnerships with respected science organizations? As it happens, Wikipedia is an improvised project which changes quickly depending on participant interest. When a few people want something, they start to create it, and some kinds of communication which worked well for offline activism – like newsletters – can seem slow in the age of the Internet. Wikipedia does have some newsletters, but just as The New York Times publishes online first and only puts yesterday’s news in the latest edition of its paper publication, things like newsletters for digital communities can have low relevance for people who are living the experience. The Year of Science campaign ambitiously listed a range of projects, but many never materialized, and things that did not seem important at the inception of the idea became important months later. Insiders of a campaign often hesitate to definitively strike an idea which is not progressing, but for this campaign, I think some of the ideas which were raised in the beginning looked quite dead to both Wikipedians and science professionals who might have checked the campaign page. Wikipedia has trouble managing timed campaigns, because it is difficult to crowdsource the management of projects which must happen on a schedule. Wiki-style editors will not be bold enough to go into a campaign space started by another and tell them that they need to abandon certain halted projects, and the leadership of a campaign might not be able to recognize when enough time has passed to declare an initiative dead. By the end of the year, the campaign page had accumulated some distracting cruft. Anyone replicating the campaign should plan in advance how to introduce new ideas to stay current and how to kill off paused concepts to prevent being overburdened. I would recommend making modest promises at the beginning, introducing supplementary projects without prior announcement as a bonus rather than in fulfillment of a commitment, and not advertising any non-essential feature or service as ongoing and dependable until and unless that feature has already been provided in several iterations over a period of time.

No centralized forum
The idea of a centralized outreach campaign in Wikipedia is a little crazy. Wikipedia was imagined from its founding as a crowdsourced project in which any individual can contribute information, and other people can spontaneously organize to review and manage it according to rules which are developed by consensus. At no point in Wikipedia’s history has there been much concept of centralized leadership or even support. With Year of Science, there were outreach events in every way possible targeting individuals who would do anything, including editing articles, providing review and suggestions, developing the Wikidata database, or joining conversations. Beyond individuals all sorts of organizations external to Wikipedia participated, including conference teams, universities, social groups, and professional societies.

Although there was a campaign landing page to orient anyone to the Year of Science concept, the Wikipedia community is not accustomed to anticipating the existence of this kind of central campaign, or to using the forums provided by a campaign as a way to connect to sponsored support services. In some ways, Wiki Ed as an organization provided staff support for the outreach by setting up some basic infrastructure to make the campaign possible. Things that any traditional off-wiki outreach campaign would imagine to be essential – like logos, basic text instructions, sign-up sheets, reporting queues, designated talk pages, and some points of contact – are not aspects of Wikipedia community culture which the wiki community expects to exist in the wiki campaigns which have been successful to date. There is a cultural mismatch between what a science professional would expect to exist in a social campaign and what the wiki community imagines should exist. The organizers of the Year of Science campaign imagined the campaign landing page to be a bridge for this, and it was, but the concept of a traditional community entry point has not developed in the wiki community to a point which permits two-way communication between the Wikipedia community and people communicating in other ways. This is not a problem unique to Wikipedia, as people not familiar with communication on YouTube, Facebook, Twitter, or any other digital community platform have trouble moving messaging into, and getting comments out of, those platforms as well. With Wikipedia, the paths to communication are less developed, and the Year of Science pushed to test what was possible.

For future campaigns, as outreach becomes broader, there could be more notice of what central services are and are not available. The Wikipedia community will tend to anticipate that there is no central service; off-wiki communities will tend to expect that there will be. Both communities will have challenges grasping the reality which is in the middle of these expectations. The centralized support which is available should be ready to promote services to those not expecting them, and preemptively match the support requested by off-wiki communities to what is available.

Takeaways
Let’s do it again! The very precedent of the Year of Science is good for me in my medical outreach, because the credibility it generated gives me more of a foundation to go further. This kind of campaign could be repeated globally in all languages for a year, or anyone could modify the concept to be local in one language and for a shorter time if that suited them. I would like to see more science-themed campaigns. I can imagine other people exploring campaigns with themes in the humanities, for trades and labor, by geographical interest, for content types like datasets, or for engagement types like translation. I expect that the next campaign organizers will be more informed going into the project, now that this has happened and the risk has been taken.

This entire experience also marks one of the first times that content sponsorship has been provided, albeit in the wiki way. It is not at all orthodox right now for anyone to fund wiki development, but not only did the Simons Foundation do this, they even let it happen in the wiki way: with invitations for any person or organization to contribute and to share the information which was important to them, as a volunteer, and without any promotional agenda.

by bluerasberry at January 07, 2017 07:58 PM

Gerard Meijssen

#Maps - Where did they live?

This map is in many ways perfect. It tells us a story. It helps visualise what happened in the past. The map is simple: it shows the contours of present-day Europe, more or less, and on it you see roughly what happened where.

Obviously the map could be improved, but when it is seen in isolation, that typically makes little difference for understanding what is shown.

When this map is part of a continuum of maps, it will show the movements over time. It will show where the Vandals were at a given time, where they settled down, and where they fought their battles. A better understanding will emerge, but it may get complicated. The Vandals were not the only ones around; it was a time of turmoil, and only when the shapes of former countries and the battles are shown does a better understanding emerge.

For many "former countries" maps are not available, and when they are, they are of similar quality to the map of the Vandals. What I would love is maps as overlays, with more maps and facts added as they become available. Many maps will only gain credibility over time, but that is an improvement over having nothing to see.
Thanks,
       GerardM

by Gerard Meijssen (noreply@blogger.com) at January 07, 2017 07:44 AM

January 06, 2017

Wikimedia Foundation

The end of ownership? Rethinking digital property to favor consumers at a Yale ISP talk

Photo by Nick Allen, CC BY-SA 3.0.

Suppose a consumer named Alice buys a David Bowie album on an LP record. Although Alice is not an expert in property law, she probably knows what privileges she enjoys by buying the record. For example, Alice can freely lend or rent it to her friend. Alice would possess similar rights if she were to buy a book. But what happens when Alice buys an e-book or a song on iTunes? Can she enjoy the same rights with the e-book as she could with the book? Can she lend it or rent it to whomever she wants without any restrictions? Probably not. In the online world, users’ rights over digital copies and content are subject to licensing and technological restrictions imposed by copyright holders.

How we approach content licensing is critical for the Wikimedia projects and our mission of spreading free knowledge around the world. To help us better understand current research on this issue, I recently attended a talk on this subject hosted by our friends and collaborators at the Yale Information Society Project. The talk, entitled The End of Ownership: Personal Property in the Digital Economy, was given on November 3, 2016 by Professor Aaron Perzanowski of Case Western Reserve University.

Intellectual property, including copyright, is governed by the principle of exhaustion, also called the first-sale doctrine. This principle, established in Bobbs-Merrill Co. v. Straus in 1908, holds that copyright holders lose their ability to control further sales over their copyrighted works once they transfer the works to new owners. Bobbs-Merrill, the plaintiff and a publisher, drafted a notice in its books forbidding sales under one dollar, warning that violations of this condition would be considered copyright infringement. The defendants resold the books for less than a dollar each. In the end, the United States Supreme Court agreed with the defendants’ position and sent a clear message that copyright holders are not able to control prices or resales after the first sale of the copyrighted work. Even today, the first-sale doctrine is an important defense for consumers. In Alice’s case, copyright holders can control any use of the physical copy of the Bowie LP record until the first sale to Alice. Once Alice owns the record, she can re-sell it, donate it, etc.

But according to Perzanowski, the notion of property has changed: in the past, copies used to be valuable because they were scarce and difficult to produce. In the internet era, the paradigm has shifted. Because everything disseminates quickly and at almost no cost, copies have lost value. This is why buying in the digital world is a different experience. If Alice wants to buy an e-book on Amazon or an album on iTunes, she is not actually buying the “copy” of such e-book or album, but is instead licensing it. According to Perzanowski, these licensing schemes are undermining consumers’ rights that once were protected by ownership. Generally, the license terms of digital products will include a prohibition against transfer and sublicensing, among other restrictions. Thus, the notion of a copy of a work is disappearing, because in these licensing schemes, rights that are obvious for a physical object, like resale, rental, or donation, are neglected. Furthermore, these restrictions are authorized by law, specifically the Digital Millennium Copyright Act (DMCA), which includes provisions on Digital Rights Management (DRM) technologies that limit the consumer’s ability to use the product. If Alice buys an e-book, DRM technologies and legal provisions may limit her ability to print and copy-paste, and may impose time-limited access to her book.

Unfortunately, consumers do not seem to understand these limitations of the digital market. Perzanowski explained that 48% of online consumers think that once they have purchased an e-book by clicking on the “buy now” button, they are able to lend it to someone else, and 86% think they actually own the book and can use it on the device of their choice. Perzanowski believes that consumers should be better informed so that they can have a better sense of autonomy over how to use the digital products they buy. For this reason, he advocates for better education on digital licensing so that consumers can recalibrate their expectations.

The use of Creative Commons (“CC”) licensing for content on the Wikimedia projects helps address Perzanowski’s concerns regarding limited consumer rights with respect to digital works and consumer education on digital licensing. First, CC licensing allows the Wikimedia communities to enjoy broader rights for digital works, such as the ability to share content with a friend or to produce derivatives, that are not available under more typical digital licensing schemes. Second, Creative Commons’ and the Foundation’s approaches to licensing provide certainty to consumers and promote transparency in how they can license content. For example, Creative Commons provides summaries of CC licenses in plain English. Similarly, the Wikimedia Foundation clearly explains to its users and contributors their rights and responsibilities in the use of CC licensed Wikimedia content. The Foundation also publicly consults with the Wikimedia communities on these licensing terms; it recently closed a consultation with the communities on a proposed change from CC BY-SA 3.0 to CC BY-SA 4.0.

In today’s world, physical copies and analog services are more the exception than the rule. This, however, shouldn’t mean that digital copies and online services have to be offered with fewer rights for users compared to rights available under traditional licensing schemes. We support licensing schemes that allow users to retain the same rights that they would otherwise have in the offline world so that users have the power to edit, share, and remix content: the more we empower, the more knowledge expands and creativity grows.

We believe strongly in a world where knowledge can be freely shared. Our visits to Yale ISP allow us to remain engaged in discussions about internet-related laws and affirm the importance of licenses like Creative Commons for the future of digital rights.

Ana Maria Acosta, Legal Fellow
Wikimedia Foundation

by Ana Maria Acosta at January 06, 2017 08:06 PM

#100womenwiki: A global Wikipedia editathon

Photo by BBC/Henry Iddon, CC BY-SA 3.0.

On 8 December 2016, Wikimedia communities around the world held a multi-lingual, multi-location editathon in partnership with the BBC to raise awareness of the gender gap on Wikipedia, improve coverage of women, and encourage women to edit.

In the UK, events took place at BBC sites in Cardiff, Glasgow, and Reading, in addition to the flagship event at Broadcasting House in London, while further events took place around the world in cities including Cairo, Islamabad, Jerusalem, Kathmandu, Miami, Rio de Janeiro, Rome, Sao Paulo and Washington DC. Virtual editathons were organised by Wikimedia Bangladesh, and by Wikimujeres, Wikimedia Argentina and Wikimedia México for the Spanish-language Wikipedia. Women in Red were a strategic partner for the whole project, facilitating international partnerships between the BBC and local Wikimedia communities, helping to identify content gaps and sources, and working incredibly hard behind the scenes to improve new articles that were created as part of the project.

The global editathon was the finale of the BBC’s 100 Women series in 2016 and attracted substantial radio, television, online, and print media coverage worldwide.

The events were attended by hundreds of participants, many of them women and first-time editors, with nearly a thousand articles about women created or improved during the day itself. Impressively, Women in Red volunteers contributed over 500 new biographies to Wikipedia, with nearly 3000 articles improved as part of the campaign. Participants edited in languages including Arabic, Dari, English, Hausa, Hindi, Pashto, Persian, Russian, Spanish, Thai, Turkish, Urdu and Vietnamese, and were encouraged to live tweet the event using the shared hashtag #100womenwiki.

The online impact of #100womenwiki was significant; of equal importance, however, was the media coverage generated by the partnership. The BBC has a global reach of more than 350 million people a week, so this was a unique opportunity to highlight the gender gap, to raise the profile of the global Wikimedia community, and to reach potential new editors and supporters. In the UK, I was interviewed by Radio 5 Live and Radio 4's prestigious Today programme, while my colleague Stuart Prior and I appeared on the BBC World Service's Science in Action programme. Dr Alice White, Wikimedian-in-Residence at the Wellcome Library, was also interviewed by 5 Live, and Jimmy Wales came to Broadcasting House to be interviewed by BBC World News, BBC Outside Source and Facebook Live. The story featured heavily in the BBC's online news coverage on 8 December, with an article by Rosie Stephenson-Goodknight that you can read here, and the project was covered by the Guardian, the Independent, and Metro in the UK, as well as by other print and online media across the world.

Jimmy Wales, founder of Wikipedia, being interviewed at BBC 100 Women. Photo by the BBC/Henry Iddon via Wikimedia UK, CC BY-SA 3.0.

The partnership with the BBC would not have been possible without the vision and energy of Fiona Crack, Editor and Founder of BBC 100 Women. After the events, I spoke to her about what had been achieved, and she reflected on how the combined reach and audience of the BBC and Wikimedia inspired and engaged people interested in women's representation online. She commented: "It was a buzzing event here in London, but the satellite events from Kathmandu to Nairobi, Istanbul to Jakarta were the magic that made 100 Women and Wikimedia's partnership so special."

Clearly a project like #100womenwiki, focused on a single day of events, could never be a panacea for the gender gap on Wikimedia. After all, this is a complex issue reflecting systemic bias and gender inequality both online and in the wider world. With more lead-in time and resources, the partnership could have been even more successful, involving more Wikimedians and engaging and supporting more new editors.

However, events and partnerships like these demonstrate that the gender gap is not an entirely intractable issue. Within the global Wikimedia community, there are a significant number of people who are motivated to create change and willing to give up their free time to achieve it: contributing to Wikipedia and the sister projects, organising events, training editors and activating other volunteers and contributors.

As the Chief Executive of Wikimedia UK, an organisation committed to building an inclusive online community and ensuring that Wikipedia reflects our diverse society and is free from bias, I find this inspiring, encouraging, and humbling.

Lucy Crompton-Reid, Chief Executive
Wikimedia UK

This post was originally published on Wikimedia UK’s blog; it was adapted and lightly edited for publication in the Wikimedia blog.

by Lucy Crompton-Reid at January 06, 2017 06:08 PM

Wikimedia UK

So You’ve Decided to Become a Wikipedia Editor…

 

Learning to edit Wikipedia – Image by Jwslubbock, CC BY-SA 4.0

The learning curve when you start editing Wikipedia and its sister projects can be steep, so to help you get started, we decided to compile some advice that will help you navigate the complexity of the Wikimedia projects.

Check out the Getting Started page for general advice and information about how Wikipedia works before you start editing. There are a lot of written and visual tutorials as well as links to policies and guidelines used on the site. A quick look at the main editorial policies of Wikipedia, known as the Five Pillars, is also worthwhile.

1) Identify a subject area you know about.

Usually people have a particular area that they know about or are interested in. Wikipedia has project pages where people with similar interests go to discuss writing. They're a great place to see what subjects you can contribute to, and they often have advice on what work needs to be done in their area: see the Directory of WikiProjects.

For example, if you’re interested in increasing the number of articles about women on WIkipedia, look at the Women in Red project page.

2) Fight the desire to create a new article straight away.

There are lots of ways to contribute to Wikipedia, and creating a new article is a big step when you’re starting out. Instead, you could try:

  1. making copyedits (correcting mistakes);
  2. improving stubs (expanding very short articles) – here's a Twitter bot that lists stubs for you;
  3. contributing to red link lists (a red link points to an article that does not exist on Wikipedia yet).

3) Start with a reference.

Wikipedia aims to reflect the best available evidence on any subject, so if you have factual books at home, find a good fact and insert it, with a reference, into a page about that topic. Be careful, however; some subjects have stricter referencing criteria, especially the medical pages, so if you're not a specialist in a complex area like medicine, start with a simpler subject area.

Finding reliable sources can be difficult, so here is a page with tips on how to identify them. You can also check out the Wikipedia list of Open Access journals and the Directory of Open Access Journals for reliable research that you can reference.

4) Upload some photos to Commons.

As well as Wikipedia, one of the most important Wikimedia projects is Wikimedia Commons. If you’re more of a visual content creator than a writer, your photos might be useful to illustrate articles on Wikipedia.

Uploading to Commons means you agree that others can use your content for free, without asking you, as long as they give you credit as the author of the work. This agreement is called an open license; Commons uses Creative Commons licenses, which go by odd names like CC BY-SA 4.0.

There are monthly photo competitions: current challenges are on drone photography, rail transport and home appliances.

You can also use the WikiShootMe tool to see what Wikipedia articles and Wikidata items are geolocated near your present location. Why not take images of some of the places listed and add the photos to their pages and data items?
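
For the technically curious, here is a minimal sketch of the same kind of lookup that WikiShootMe performs, using the public MediaWiki "geosearch" API rather than the tool itself; the coordinates (central London), radius and result limit below are arbitrary example values:

# Minimal sketch: find geotagged Wikipedia articles near a point using the
# public MediaWiki geosearch API. Coordinates and radius are example values.
import requests

API = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "list": "geosearch",
    "gscoord": "51.5074|-0.1278",  # latitude|longitude
    "gsradius": 1000,              # search radius in metres
    "gslimit": 10,
    "format": "json",
}
data = requests.get(API, params=params).json()

# Each result is a nearby geotagged article that may still need a photograph.
for place in data["query"]["geosearch"]:
    print(f"{place['title']} ({place['dist']:.0f} m away)")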

5) Try to identify content gaps.

The English Wikipedia now has around 5.3 million articles, but its content skews towards the interests of the groups of people who are more likely to edit it. There are lots of articles on Pokémon and WWE wrestling, but fewer about ethnic minorities, important women, non-European history and culture, and many other topics.

There is a tool that you can use to search for content gaps by comparing one Wikipedia to another to see which articles exist in, for example, Spanish, but not in English. You can try it out here.
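
The underlying idea can also be sketched with the public MediaWiki API: every article records its interlanguage links, so a Spanish article with no link to an English version is a candidate content gap. The snippet below is only an illustration of that idea, not the tool linked above, and the article titles are arbitrary examples:

# Minimal sketch: check whether Spanish Wikipedia articles have an English
# counterpart by asking the MediaWiki API for their interlanguage links.
import requests

API = "https://es.wikipedia.org/w/api.php"

def has_english_article(title):
    """Return True if the Spanish article links to an English-language version."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllang": "en",   # only request the English interlanguage link
        "format": "json",
    }
    pages = requests.get(API, params=params).json()["query"]["pages"]
    # Pages are keyed by page ID; no "langlinks" entry means no English article is linked.
    return any("langlinks" in page for page in pages.values())

for title in ["Madrid", "Software libre"]:  # example titles only
    status = "has an English article" if has_english_article(title) else "possible content gap"
    print(f"{title}: {status}")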

6) Talk to other people in the community for advice.

Wikipedia has a help section with advice on how to get started, including a messageboard for asking questions and a help chatroom. There are also Facebook groups and IRC channels if you’re that cool.

If you’re one of those kinds of people who enjoys interacting with actual human beings in real life as well as online, there are social meetups for the Wikimedia community every month in London, Oxford and Cambridge, and periodically in Manchester and Edinburgh. There are also lots of events you can come to about specific subjects, many of which are hosted by our Wikimedians-in-Residence

_ _ _ _ _ _ _ _ _ _

A lot of people use Wikipedia but never edit it, and consequently never think about how much effort goes into creating it. Participating in that process yourself is a really instructive way to discover how knowledge is created and structured, and the issues we face in producing accurate and impartial information.

If you speak another language, you can practice by translating articles from English into a target language, and at the same time help people to educate themselves for free in another part of the world.

The world can feel disempowering sometimes, but if you help to create a good article or upload a good photograph, it could be seen by hundreds of thousands of people, and you could make a difference to someone’s education, or government policy, or the visibility of minority cultures.

So if you’ve decided to become more involved in Wikipedia or its sister projects this year, thank you! Wikimedia UK is here to support you, so don’t hesitate to get in touch and ask for advice. Wikipedia has always been, and will continue to be a work in progress, and we think that provides exciting opportunities to help create a world in which every single human being can freely share in the sum of all knowledge.

by John Lubbock at January 06, 2017 04:39 PM

Erik Zachte

Wiki Loves Monuments 2016

In 2016, Wiki Loves Monuments (WLM) was again a top-ranking community initiative in terms of the attention it raised.

Here are some further statistics on that contest. The charts follow the layout used on this blog in earlier years, but the data have now been collected from a different WLM statistics tool, wlm-stats. For added depth, see also this Wikimedia blog post.

Participating countries in 2016 WLM contest

Map of countries participating in Wiki Loves Monuments 2016


Some charts are about image uploads.
One is about image uploaders, also known as contributors.

Countries

With 44 participating countries, 9 more than in 2015, the 2016 contest ranks second after 2013, when 53 countries participated (see the first table). Eight countries participated for the first time: Bangladesh, Georgia, Greece, Malta, Morocco, Nigeria, Peru and South Korea.

In the seven years since WLM started, seven countries have participated six times: Belgium, France, Germany, Norway, Russia, Spain and Sweden.

The contest ran in different countries during different periods (mostly because different calendars are in use, and the aim is to run the contest for a full calendar month).

List of countries that participated, per year

Participation per year in the Wiki Loves Monuments contest


Uploads

 

In 2016, a total of 277,406 images were uploaded, which is 20% more than in 2015.

Chart: Wiki Loves Monuments image uploads per year


In 2016, Germany contributed the most images: 38,809.

Chart: Wiki Loves Monuments uploads by country, 2016


Chart: Wiki Loves Monuments uploads by country, cumulative 2010–2016


Contributors

In 2016, India and the United States had the most uploaders: 1784 vs 1783. As the measured numbers fluctuate a bit over time (there is always ongoing vetting), I suggest we call this an ex aequo first place.

Chart: Wiki Loves Monuments contributors by country, 2016
Chart: Wiki Loves Monuments uploaders by country, year by year 2010–2016 (top 10)


Edit activity on Commons

One Wikistats diagram: every year the Wiki Loves Monuments contest brings peak activity on Commons. The second peak earlier in the year, seen mostly since 2014, is the result of the Wiki Loves Earth contest.

Chart: Active editors on Wikimedia Commons per month (Wikistats)

Charts also available on Wikimedia Commons

by Erik at January 06, 2017 01:56 PM

January 05, 2017

Content Translation Update

January 6 CX Update: Fixes for page loading and template editor

Hello, and welcome to the first CX Update post of 2017!

We just deployed two significant fixes:

  • Many users complained recently that they could not load a translation in progress that had been auto-saved. This was happening when translating from languages that aren't written in the Latin alphabet, such as Russian or Chinese. The data was not lost: it was correctly saved internally, but a software error prevented it from loading properly. This should now be fixed, although some more work may be needed to make it more stable. If you experienced this issue, please try loading your article now. If it still doesn't work, please report it. We apologize for this inconvenience. (bug report)
  • The template editor stayed open when moving to the next section. This was confusing, because some people didn't realize that it has to be closed to actually save the data entered in its fields. It now closes automatically when you move on to edit the next section. (bug report)

by aharoni at January 05, 2017 10:21 PM

Wikimedia Foundation

Coming soon: A global community survey to learn how to best support Wikimedians

Photo by Christopher Allison/Department of Defense, public domain/CC0.

How healthy is the Wikimedia community? Where do volunteer developers prefer to collaborate? What do Wikimedians think about partnerships? What is the most important workflow for editors?

These are only a few of the questions that the Wikimedia Foundation is asking in a new global community survey for Wikimedians. The opinions gathered from this survey will directly affect how the Foundation supports Wikimedia communities. The new survey, called Community Engagement Insights, is part of the Foundation’s annual plan. It was developed with input from 13 different teams at the Wikimedia Foundation and tested with Wikimedia volunteers.

Take the survey to influence decisions at the Foundation!

Surveys are important because they help us design solutions that keep community members at the heart of every project. In 2014, for example, we had many questions about harassment: How do our users experience harassment? What kinds of harassment are most common on which projects? How well were Foundation staff able to address harassment issues? The harassment survey was not a solution; it was the beginning of a conversation that has led to many actions.

Since the survey was completed, the movement has continued to take direct action towards addressing the issue, from working on better detection (called Detox) to facilitating workshops and creating training modules. With the information from the survey, we were able to better understand the issues our communities face and to begin taking action. The harassment survey will continue to inform future approaches to the problem.

Other surveys we have done recently include Reimagining Grants in mid-2015, which we used to help improve the support the Foundation offers through grants; an executive director survey in early 2016, which the Foundation used to help inform our search for a new executive director; and the Tool Labs survey, which we used to identify needs in the labs space and which will help in planning for the 2017-2018 year. These are only some examples of how asking the right thing at the right time can have an impact on our movement.

Wikimedia is a global movement of people that goes beyond the online projects. Communities of volunteers are formed by citizens of the world who help create free access to knowledge. As a movement, we discuss and work on advocacy, mentorship, leadership, technology, collaboration and internationalization, among many other themes. The Foundation needs to make decisions and fund programs that will support Wikimedia communities in those specific areas. The Foundation also needs to learn from and improve these programs. Are they being effective? Are they having the intended impact? Are these the right programs right now? The best way to answer these questions is to use tools that help us listen to communities. These include not only wiki pages, mailing lists, and conferences, but also consultations, interviews and surveys.

Community Engagement Insights is a new tool to help the Foundation listen to Wikimedia communities. This project seeks to improve surveys so the Foundation can try to hear from many voices equally and to make sure we are supporting and growing with the communities we directly assist. In other words, everyone who takes this survey can have a voice in deciding how the Foundation will improve their support. The data we collect will inform and guide the work of several teams at the Foundation.

Community Engagement Insights is not just a new survey; it is a new process that will support systematic surveys for the Wikimedia Foundation. The process is highly collaborative and includes designing questions and capturing data from various audiences to help support decision-making at the Foundation. This year, the survey will reach out to very active editors, active editors, affiliates, program leaders, and volunteer developers. In future years, the audiences might expand. We hope the data will also be useful for the broader movement, and we are working to make the datasets accessible to anyone while protecting the privacy of users.

The collaborative aspect of this survey was specifically designed to reduce the number of times the Foundation polls communities on any given topic.

Photo by Sebastiaan ter Burg, CC BY 2.0.

Working towards actionable data

When we asked teams to come up with questions, we made sure that we understood their goals for each question, and we asked how they intended to use the information from each prompt. If this was unclear, we worked with the team to make sure there was a clear line of sight between the question and further action.

Thirteen teams at the Foundation want to hear from Wikimedia communities. Here are some examples of the data they need: The Community Engagement Department wants to know about community health, because we need to learn more about how well we are collaborating, how well we support each other, and how engaged our volunteer community feels. The Technical Collaboration team wants to know about volunteer developer preferences in programming languages and collaboration spaces because we have a community of developers for whom we need to improve our support. The Global Reach team is working on expanding and changing their scope and they need to know what concerns communities have about different types of partnerships, including Wikipedia Zero, before making decisions about the types of partnerships they will pursue. The Editing Department needs to know more about which aspects of editing are more important for volunteers, so that the Foundation’s software developers can better prioritize their work improving the editing experience.

We are trying to get better at listening to communities, and surveys are one of the best ways we can do it for such a huge and diverse community. We also hope that having this process can help to reduce survey fatigue in the movement. As the project evolves, we hope it will begin to incorporate other surveys towards this end, and also to improve the quality of the data collected.

When you see the survey on your talk page, in your email, on a mailing list, or on social media, remember that your opinions will directly affect the Foundation's work.

Find out more information about Community Engagement Insights in our Frequently Asked Questions. You can learn how we are doing the sampling process, what kinds of questions we are asking, and which teams are participating.

Edward Galvez, Survey specialist, Learning and Evaluation team
María Cruz, Communications and outreach coordinator, Learning and Evaluation team

 

by Edward Galvez and María Cruz at January 05, 2017 08:06 PM

Trump, Prince, and Queen Elizabeth: 2016’s most-read Wikipedia articles

Photo by Michael Candelori, CC BY 2.0.

In 2016, people around the world turned to Wikipedia for facts about all kinds of things, but especially celebrities who died, television shows, and Donald Trump, who commanded unprecedented attention both in the media and on the English-language Wikipedia.

In total, there were over 76 million views on the English-language article about the US president-elect, making it 2016's most-viewed Wikipedia article. That is nearly three times as many as 2015's first-place article, which had just under 28 million.

A number of these views were concentrated around major events in the US presidential campaign: in the days after Trump’s multiple victories on “Super Tuesday,” 11 million pageviews were recorded on the article, and another 11 million came in the four days after the election.

Wikipedia editors were certainly diligently keeping up with the fast-changing news about him—the article about Trump was the second-most revised article of the year on the English Wikipedia.

Moreover, the Trump phenomenon extended far beyond the English language. His article recorded more than five million pageviews each in the Spanish, German, and Russian Wikipedias; four million in the French edition; and three million each in the Italian and Japanese editions.

A large percentage of these views came on November 9 and 10, the two days after Trump’s victory in the election. On the Russian Wikipedia, for instance, 36.7% of the entire year’s views came in those days. The number of people coming to read the article contributed to noticeable pageview jumps on the entire Spanish and Russian Wikipedias for November 9.

Back on the English Wikipedia, US politics occupied three other spots in the top ten. The main article on last year’s US election was fourth, while Trump’s wife Melania and Trump’s opponent Hillary Clinton garnered almost 19 and 18 million views (respectively).

Those numbers make them the highest-ranked women to ever appear in a Wikipedia most-viewed list, surpassing mixed martial artist Ronda Rousey‘s 12 million in 2015.1

Photo by penner, CC BY-SA 3.0.

Politics, however, was far from the only story of 2016. The main article about deaths in 2016 was the second most-viewed article with almost 36 million views, a total that would have put it in first place in any of the last four years.2

Prince and David Bowie, both musicians of rare talent and influence who died last year, were the third- and seventh-most viewed articles of the year. Nearly 13 million of Bowie's 19 million views came shortly after his death in January. Similarly, 17 million of the 22 million views on Prince's article came in the days after his unexpected death in April 2016. At one point, there were so many viewers and editors on Prince's article (at its peak, an average of 810 views per second) that some were redirected to an error page.

Boxer Muhammad Ali and actor Carrie Fisher also appear in the top 25. The vast majority of views of Fisher's article came in the last week of the year, during which she was hospitalized and passed away.

Cinema and film, the theme of 2015, still made up the bulk of the list, but last year had a notable bias towards superheroes. The headliner was Suicide Squad, whose 19.4 million pageviews were good enough for fifth place. Trailers for the superhero ensemble blockbuster drummed up much interest in the film, leading to several view spikes during the year, but it faced "overwhelmingly negative" reviews upon its release. Franchise installments Captain America: Civil War, Batman v Superman: Dawn of Justice, and Deadpool also appear in the top 12. And away from superheroes, the yearly list of Bollywood films, a perennial favorite, was viewed 19.2 million times.

Perhaps surprisingly, two 2015 films appear in 2016's top 25. The Revenant, at #19, was released in the United States only days before the end of 2015. It got many views in 2016 after winning several awards, including star Leonardo DiCaprio's first Oscar. Star Wars: The Force Awakens was powered to #21 by the tail end of its popularity. It was #3 on 2015's list, and the article was viewed over 14 million times both in 2016 and in December 2015 alone.

Photo by NASA/Bill Ingalls, public domain/CC0.

People also jumped on Wikipedia to learn more about the television shows they were watching. Game of Thrones, for example, appears twice on the list (#18 and #22). But real-life individuals portrayed on television shows had an even more serious impact on the most-viewed list.

Streaming media company Netflix, for instance, is a strong suspect for two entries. The Crown, its biographical television series about the coronation and early reign of Queen Elizabeth II, likely helped boost the monarch to #13 on the list. Coming up shortly behind her was Narcos' main character Pablo Escobar (#16), the Colombian drug lord who was once the wealthiest criminal in the world.

FX, the American television channel, may have had a similar effect with The People v. O. J. Simpson: American Crime Story, as our list concludes at #25 with former American football player O. J. Simpson. The show, which aired in February and March 2016, is based on Simpson's murder trial and eventual acquittal on all charges.

All three of these articles had some of the highest percentages of mobile views in the top 25, topping out with Simpson at 75.36%. This suggests a sort of second screen effect, where readers grabbed their mobile phones, searched Wikipedia, and educated themselves about the real-life equivalents of the characters in front of them.

“There is certainly a correlation in article popularity for a new popular television show or movie and its primary actors,” says Wikipedia editor Milowent, one of two writers who examine the weekly popularity of Wikipedia articles in the Top 25 Report. “Articles about key figures in television shows, like Elizabeth II or Pablo Escobar, have been consistently popular. The correlation only grows for movies: Rogue One is currently near the top of the weekly charts, for example, helping boost lead actress Felicity Jones into the Top 25 in previous weeks.”

The top 25 articles follow below. You can see the top 5000 over on Wikipedia;3 our grateful thanks go to researcher Andrew West for collating the data.

Chart: Most-viewed English Wikipedia articles of 2016

  1. Donald Trump (75,965,727)
  2. Deaths in 2016 (35,911,398)2
  3. Prince (musician) (22,793,889)
  4. United States presidential election, 2016 (22,063,171)
  5. Suicide Squad (film) (19,435,260)
  6. List of Bollywood films of 2016 (19,285,100)
  7. David Bowie (19,039,110)
  8. Melania Trump (18,946,792)
  9. Captain America: Civil War (18,693,046)
  10. Batman v Superman: Dawn of Justice (18,548,575)
  11. Hillary Clinton  (17,801,991)
  12. Deadpool (film) (16,917,412)
  13. Elizabeth II (16,815,631)
  14. United States (16,502,083)
  15. Muhammad Ali (16,303,934)
  16. Pablo Escobar (16,210,514)
  17. Barack Obama (15,994,091)
  18. Game of Thrones (15,726,657)
  19. The Revenant (2015 film) (15,077,213)
  20. UEFA Euro 2016 (14,488,759)
  21. Star Wars: The Force Awakens (14,168,904)
  22. Game of Thrones (season 6) (14,111,811)
  23. 2016 Summer Olympics (14,026,668)
  24. Carrie Fisher (13,923,993)
  25. O. J. Simpson (13,795,907)

Ed Erhart, Editorial Associate
Wikimedia Foundation

You can see a list of 2016’s most-edited English Wikipedia articles and previous most-viewed lists from 2015 (1, 2), 2014, and 2013. Most-viewed English Wikipedia articles of each week are available through Wikipedia’s Top 25 Report.

Footnotes

  1. Pageview counts from before 2015 are not directly comparable, as mobile readers were not counted until October 2014.
  2. Wikipedians chronicle the deaths by month, so the page now redirects to a “list of lists of deaths.” This year’s list has already been started at deaths in 2017.
  3. The top 5000 include the percentage of mobile views for screening purposes: articles with less than 5% or more than 95% mobile views can be safely discounted, as a significant amount of their pageviews will have stemmed from spam, botnets, or other errors. We have also removed Earth, which would have come in at #20 but with only 9% mobile views, on the recommendation of the top 25 team. A rough sketch of how such a mobile share can be computed from public pageview data follows below.
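
For readers who want to explore figures like these themselves, here is a rough sketch that uses the public Wikimedia Pageviews REST API to approximate an article's 2016 mobile share by summing daily views per access method. The article title and the way access methods are combined here are illustrative assumptions, not the methodology behind the report above:

# Rough sketch: estimate an article's 2016 mobile-view share from the public
# Wikimedia Pageviews REST API (per-article data is available from mid-2015 onwards).
import requests

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
HEADERS = {"User-Agent": "pageview-share-sketch/0.1 (example script)"}

def views_2016(article, access):
    """Sum daily user pageviews for one article and one access method in 2016."""
    url = f"{BASE}/en.wikipedia/{access}/user/{article}/daily/20160101/20161231"
    resp = requests.get(url, headers=HEADERS)
    resp.raise_for_status()
    return sum(item["views"] for item in resp.json()["items"])

article = "O._J._Simpson"  # example title; the API expects underscores instead of spaces
desktop = views_2016(article, "desktop")
mobile = views_2016(article, "mobile-web") + views_2016(article, "mobile-app")
print(f"{article}: {mobile / (desktop + mobile):.1%} of 2016 views came from mobile")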

by Ed Erhart at January 05, 2017 03:54 PM

Shyamal

Tracing some ornithological roots

The years 1883-1885 were tumultuous in the history of zoology in India. A group called the Simla Naturalists' Society was formed in the summer of 1885. The founding President of the Simla group was, oddly enough, Courtenay Ilbert, whom some might remember for the Ilbert Bill, which allowed Indian magistrates to pass judgement on British subjects. Another member of this Simla group was Henry Collett, who wrote a Flora of the Simla region (Flora Simlensis). This Society vanished without much of a trace. A slightly more stable organization was begun in 1883, the Bombay Natural History Society. The creation of these organizations was precipitated by a gaping hole: the vacuum left by the end of an India-wide correspondence network of naturalists that had been fostered by a one-man force, A. O. Hume. The ornithological chapter of Hume's life begins and ends in Shimla. Hume's serious ornithology began around 1870 and he gave it all up in 1883, after the loss of years of carefully prepared manuscripts for a magnum opus on Indian ornithology, damage to his specimen collections and a sudden immersion into Theosophy, which also led him to abjure the killing of animals, take to vegetarianism and subsequently take up the cause of Indian nationalism. The founders of the BNHS included Eha (E. H. Aitken, also a Hume/Stray Feathers correspondent), J. C. Anderson (a Simla naturalist) and Phipson (who came from a wine merchant family with a strong presence in Simla). One of the two Indian founding members, Dr Atmaram Pandurang, was the father-in-law of Hume's correspondent Harold Littledale, a college principal at Baroda.

Shimla then was where Hume rose in his career (as Secretary of State, before falling), allowing him to work on his hobby project of Indian ornithology by bringing together a large specimen collection and conducting the publication of Stray Feathers. Through my readings, I had constructed a fairytale picture of the surroundings that he lived in. Richard Bowdler Sharpe, a curator at the British Museum who came to Shimla in 1885, wrote (his description is well worth reading in full):
... Mr. Hume who lives in a most picturesque situation high up on Jakko, the house being about 7800 feet above the level of the sea. From my bedroom window I had a fine view of the snowy range. ... at last I stood in the celebrated museum and gazed at the dozens upon dozens of tin cases which filled the room ... quite three times as large as our meeting-room at the Zoological Society, and, of course, much more lofty. Throughout this large room went three rows of table-cases with glass tops, in which were arranged a series of the birds of India sufficient for the identification of each species, while underneath these table-cases were enormous cabinets made of tin, with trays inside, containing series of the birds represented in the table-cases above. All the specimens were carefully done up in brown-paper cases, each labelled outside with full particulars of the specimen within. Fancy the labour this represents with 60,000 specimens! The tin cabinets were all of materials of the best quality, specially ordered from England, and put together by the best Calcutta workmen. At each end of the room were racks reaching up to the ceiling, and containing immense tin cases full of birds. As one of these racks had to be taken down during the repairs of the north end of the museum, the entire space between the table-cases was taken up by the tin cases formerly housed in it, so that there was literally no space to walk between the rows. On the western side of the museum was the library, reached by a descent of three stops—a cheerful room, furnished with large tables, and containing, besides the egg-cabinets, a well-chosen set of working volumes. ... In a few minutes an immense series of specimens could be spread out on the tables, while all the books were at hand for immediate reference. ... we went below into the basement, which consisted of eight great rooms, six of them full, from floor to ceilings of cases of birds, while at the back of the house two large verandahs were piled high with cases full of large birds, such as Pelicans, Cranes, Vultures, &c.
I was certainly not hoping to find Hume's home as described but the situation turned out to be a lot worse. The first thing I did was to contact Professor Sriram Mehrotra, a senior historian who has published on the origins of the Indian National Congress. Prof. Mehrotra explained that Rothney Castle had long been altered with only the front facade retained along with the wood-framed conservatories. He said I could go and ask the caretaker for permission to see the grounds. He was sorry that he could not accompany me as it was physically demanding and he said that "the place moved him to tears." Professor Mehrotra also told me about how he had decided to live in Shimla simply because of his interest in Hume! I left him and walked to Christ Church and took the left branch going up to Jakhoo with some hopes. I met the caretaker of Rothney Castle in the garden where she was walking her dogs on a flat lawn, probably the same garden at the end of which there once had been a star-shaped flower bed, scene of the infamous brooch incident with Madame Blavatsky (see the theosophy section in Hume's biography on Wikipedia). It was a bit of a disappointment however as the caretaker informed me that I could not see the grounds unless the owner who lived in Delhi permitted it. Rothney Castle has changed hands so many times that it probably has nothing to match with what Bowdler-Sharpe saw and the grounds may very soon be entirely unrecognizable but for the name plaque at the entrance. Another patch of land in front of Rothney Castle was being prepared for what might become a multi-storeyed building. A botanist friend had shown me a 19th century painting of Shimla made by Constance Frederica Gordon-Cumming. In her painting, the only building visible on Jakko Hill behind Christ Church is Rothney Castle. The vegetation on Shimla has definitely become denser with trees blocking the views.
 
So there ended my hopes of adding good views (free-licensed images are still misunderstood in India) of Rothney Castle to the Wikipedia article on Hume. I did, however, get a couple of photographs from the roadside. In 2014, I managed to visit the South London Botanical Institute, which was the last of Hume's enterprises. This visit enabled the addition of a few pictures of his herbarium collections as well as an illustration of his bookplate, which carries his personal motto.

Clearly Shimla empowered Hume, providing a stimulating environment which included several local collaborators. Who were his local collaborators in Shimla? I have only recently discovered (and notes with references are now added to the Wikipedia entry for R. C. Tytler) that Robert (of Tytler's warbler fame, although it was named by W. E. Brooks) and Harriet Tytler (of Mt. Harriet fame) had established a kind of natural history museum at Bonnie Moon in Shimla with Lord Mayo's support. The museum closed down after Robert's death in 1872, and it is said that Harriet offered the bird specimens to the government. It would appear that at least some part of this collection went to Hume. It is said that the collection was packed away in boxes around 1873. The collection later came into the possession of Mr B. Bevan-Petman, who apparently passed it on to the Lahore Central Museum in 1917.

Hume's idea of mapping rainfall to examine patterns of avian distribution
It was under Lord Mayo that Hume rose in the government hierarchy. Hume was not averse to utilizing his power as Secretary of State to further his interests in birds. He organized the Lakshadweep survey with the assistance of the navy ostensibly to examine sites for a lighthouse. He made use of government machinery in the fisheries department (Francis Day) to help his Sind survey. He used the newly formed meteorological division of his own agricultural department to generate rainfall maps for use in Stray Feathers. He was probably the first to note the connection between rainfall and bird distributions, something that only Sharpe saw any special merit in. Perhaps placing specimens on those large tables described by Sharpe allowed Hume to see geographic trends.

Hume was also able to appreciate geology (in his youth he had studied with Mantell ), earth history and avian evolution. Hume had several geologists contributing to ornithology including Stoliczka and Ball. One wonders if he took an interest in paleontology given his proximity to the Shiwalik ranges. Hume invited Richard Lydekker to publish a major note on avian osteology for the benefit of amateur ornithologists. Hume also had enough time to speculate on matters of avian biology. A couple of years ago I came across this bit that Hume wrote in the first of his Nests and Eggs volumes (published post-ornith-humously in 1889):

Nests and Eggs of Indian birds. Vol 1. p. 199
I wrote immediately to Tim Birkhead, the expert on evolutionary aspects of bird reproduction and someone with an excellent view of ornithological history (his Ten Thousand Birds is a must-read for anyone interested in the subject), and he agreed that Hume had been an early and insightful observer in suggesting female sperm storage.

Shimla life was clearly a lot of hob-nobbing and people like Lord Mayo were spending huge amounts of time and money just hosting parties. Turns out that Lord Mayo even went to Paris to recruit a chef and brought in an Italian,  Federico Peliti. (His great-grandson has a nice website!) Unlike Hume, Peliti rose in fame after Lord Mayo's death by setting up a cafe which became the heart of Shimla's social life and gossip. Lady Lytton (Lord Lytton was the one who demoted Hume!) recorded that Simla folk "...foregathered four days a week for prayer meetings, and the rest of the time was spent in writing poisonous official notes about each other." Another observer recorded that "in Simla you could not hear your own voice for  the grinding of axes. But in 1884 the grinders were few. In the course of my service I saw much of Simla society,  and I think it would compare most favourably with any other town of English-speaking people of the same size. It was bright and gay. We all lived, so to speak, in glass houses. The little bungalows perched on the mountainside wherever there was a ledge, with their winding paths under the pine trees, leading to our only road, the Mall." (Lawrence, Sir Walter Roper (1928) The India We Served.)

A view from Peliti's (1922).
Peliti's other contribution was in photography, and it seems he worked with Felice Beato, who also influenced Harriet Tytler and her photography. I asked a couple of Shimla folks about the historic location of Peliti's cafe and they said it had become the Grand Hotel (now a government guest house). I subsequently found that Peliti did indeed start Peliti's Grand Hotel, which was destroyed in a fire in 1922, but the centre of Shimla's social life, his cafe, was actually next to the Combermere Bridge (it ran over a water storage tank and is today the location of the lift that runs between the Mall and the Cart Road). A photograph taken from "Peliti's" clearly lends support for this location, as do descriptions in Thacker's New Guide to Simla (1925). A poem celebrating Peliti's was published in Punch magazine in 1919. Rudyard Kipling was a fan of Peliti's, but Hume was no fan of Kipling (Kipling seems to have held a spiteful view of liberals: "Pagett MP" has been identified by some as being based on W. S. Caine, a friend of Hume; Hume, for his part, had a lifelong disdain for journalists). Kipling's boss, E. K. Robinson, started the British Naturalists' Association, while E.K.R.'s brother Philip probably influenced Eha.

While Hume most likely stayed well away from Peliti's, we see that a kind of naturalists' social network existed within the government. About Lord Mayo we read:
Lord Mayo and the Natural History of India - His Excellency Lord Mayo, the Viceroy of India, has been making a very valuable collection of natural historical objects, illustrative of the fauna, ornithology, &c., of the Indian Empire. Some portion of these valuable acquisitions, principally birds and some insects, have been brought to England, and are now at 49 Wigmore Street, London, whence they will shortly be removed. - Pertshire Advertiser, 29 December 1870.
Another news report states:
The Earl of Mayo's collection of Indian birds, &c.

Amids the cares of empire, the Earl of Mayo, the present ruler of India, has found time to form a valuable collection of objects illustrative of the natural history of the East, and especially of India. Some of these were brought over by the Countess when she visited England a short time since, and entrusted to the hands of Mr Edwin Ward, F.Z.S., for setting and arrangement, under the particular direction of the Countess herself. This portion, which consists chiefly of birds and insects, was to be seen yesterday at 49, Wigmore street, and, with the other objects accumulated in Mr Ward's establishment, presented a very striking picture. There are two library screens formed from the plumage of the grand argus pheasant- the head forward, the wing feathers extended in circular shape, those of the tail rising high above the rest. The peculiarities of the plumage hae been extremely well preserved. These, though surrounded by other birds of more brilliant covering, preserved in screen pattern also, are most noticeable, and have been much admired. There are likewise two drawing-room screens of smaller Indain birds (thrush size) and insects. They are contained in glass cases, with frames of imitation bamboo, gilt. These birds are of varied and bright colours, and some of them are very rare. The Countess, who returned to India last month, will no doubt,add to the collection when she next comes back to England, as both the Earl and herself appear to take a great interest in Illustrating the fauna and ornithology of India. The most noticeable object, however, in Mr. Ward's establishment is the representation of a fight between two tigers of great size. The gloss, grace, and spirit of the animals are very well preserved. The group is intended as a present to the Prince of Wales. It does not belong to the Mayo Collection. - The Northern Standard, January 7, 1871
And Hume's subsequent superior was Lord Northbrook about whom we read:
University and City Intelligence. - Lord Northbrook has presented to the University a valuable collection of skins of the game birds of India collected for him by Mr. A.O.Hume, C.B., a distinguished Indian ornithologist. Lord Northbrook, in a letter to Dr. Acland, assures him that the collection is very perfec, if not unique. A Decree was passed accepting the offer, and requesting the Vice-Chancellor to convey the thanks of the University to the donor. - Oxford Journal, 10 February 1877
Papilio mayo
Clearly, Lord Mayo and his influence on naturalists in India are not sufficiently well understood. Perhaps that would explain the beautiful butterfly named after him shortly after his murder. It appears that Hume did not have this kind of hobby association with Lord Lytton; little wonder, perhaps, that he fared so badly!

Despite Hume's sharpness on many matters there were bits that come across as odd. In one article on the flight of birds he observes the soaring of crows and vultures behind his house as he sits in the morning looking towards Mahassu. He points out that these soaring birds would appear early on warm days and late on cold days but he misses the role of thermals and mixes physics with metaphysics, going for a kind of Grand Unification Theory:

He then claims that crows, like saints, sages and yogis, are capable of "aethrobacy".
This naturally became a target of ridicule. We have already seen the comments of E. H. Hankin on this. Hankin wrote that if levitation was achieved by "living an absolutely pure life and intense religious concentration", the hill crow must be indulging in "irreligious sentiments when trying to descend to earth without the help of gravity." Hankin, despite his studies, does not give enough credit to the forces of lift produced by thermals, and his own observations were critiqued by Gilbert Walker, the brilliant mathematician who applied his mind to large-scale weather patterns apart from conducting some amazing research on the dynamics of boomerangs. His boomerang research had begun even in his undergraduate years and had earned him the nickname of Boomerang Walker. On my visit to Shimla, I went for a long walk down the quiet road winding through dense woodland and beside streams to Annandale, the only large flat ground in Shimla, where Sir Gilbert Walker conducted his weekend research on boomerangs. Walker's boomerang research mentions a collaboration with Oscar Eckenstein, and there are some strange threads connecting Eckenstein, his collaborator Aleister Crowley and Hume's daughter Maria Jane Burnley, who would later join the Hermetic Order of the Golden Dawn. But that is just speculation!
1872 Map showing Rothney Castle

The steep road just below Rothney Castle

Excavation for new constructions just below and across the road from Rothney Castle

The embankment collapsing below the guard hut

The lower entrance, concrete constructions replace the old building

The guard hut and home are probably the only heritage structures left


I got back from Annandale and then walked down to Phagli on the southern slope of Shimla to see the place where my paternal grandfather once lived. It is not a coincidence that Shimla and my name are derived from the local deity Shyamaladevi (a version of Kali).


The South London Botanical Institute

After returning to England, Hume took an interest in botany. He made herbarium collections and in 1910 he established the South London Botanical Institute and left money in his will for its upkeep. The SLBI is housed in a quiet residential area. Here are some pictures I took in 2014, most can be found on Wikipedia.


Dr Roy Vickery displaying some of Hume's herbarium specimens

Specially designed cases for storing the herbarium sheets.

The entrance to the South London Botanical Institute

A herbarium sheet from the Hume collection

 
Hume's bookplate with personal motto - Industria et Perseverentia

An ornate clock which apparently adorned Rothney Castle
A special cover released by Shimla postal circle in 2012

Further reading
 Postscript

 An antique book shop had a set of Hume's Nests and Eggs (Second edition) and it bore the signature of "R.W.D. Morgan" - it appears that there was a BNHS member of that name from Calcutta c. 1933. It is unclear if it is the same person as Rhodes Morgan, who was a Hume correspondent and forest officer in Wynaad/Malabar who helped William Ruxton Davison.
Update: Henry Noltie of RBGE pointed out to me privately that this cannot be the forester Rhodes Morgan, who died in 1919! - September, 2016.

    by Shyamal L. (noreply@blogger.com) at January 05, 2017 12:36 PM

    Weekly OSM

    weeklyOSM 337

    12/27/2016-01/02/2017

    Logo

    A motorway island discovered with JOSM 1 | © Andrey Golovin

    About us

    • We wish all our readers a peaceful and happy new year 2017. The German team produced a review of the year's events with the most important links in our category maps. Unfortunately, the team has had no time to translate it into other languages, but the translation links should be fine in most of the articles. 😉
    • Due to a shortage of manpower we have had to drop the Italian version of weeklyOSM, so we will publish in 8 languages in the future. A special thanks to the two remaining editors, sabas88 and sbiribizio, who tried together to keep the Italian edition alive. An edition – even if it is just a translation – requires a minimum of three editors. We thank them for their willingness to continue as weeklyOSM's reporters for the Italian community.

    Mapping

    • Brian Prangle has written a guide to help with the collection and mapping of fire hydrants in UK. This mapping effort was prompted by the declaration of fire hydrant locations in the West Midlands of England as “secret” (despite each one being labelled with a highly visible, often bright yellow sign)!
    • Harald Hartmann is looking for the source of the tag maxspeed=DE:rural. (automatic translation)
    • TagaSanPedroAko writes in detail about his latest mapping activity, including power line mapping and adding points of interest, in Batangas City – a place he has been regularly mapping.
    • Zverik analyses the change count of POIs by editor type.
    • Toc-rox reports that tracks often do not have a tracktype, and describes how this is handled in the "Freizeitkarte". A discussion about good default values has started. (automatic translation)

    Community

    • Tom analyses the building coverage of OpenStreetMap in Austria. (automatic translation)
    • A Hacker News user recommends OSM in response to a discussion about Google Maps Lite mode, particularly with respect to the up-to-dateness of roads and loading speed in comparison with Google Maps.
    • Stumbled upon OpenStreetMap while playing Pokemon Go? Here are some tips to get started with contributing to OSM.
    • On Talk-GB, Brian Prangle shares a list of tentative projects in consideration for this year’s first ‘Quarterly Project’.
    • Andygol shows how to check that a connected network of streets remains connected at other zoom levels.
    • User escada interviews Philippe Verdy as the Mapper of the Month.

    Imports

    • User ryebread brings to the community’s attention his efforts to import data from the Southeast Michigan Council of Governments (SEMCOG) for Detroit.

    OpenStreetMap Foundation

    • OSMF provides a recap of 2016, highlighting some of the interesting work done during the past year.

    Events

    Humanitarian OSM

    • In Tunis young people are looking for a better image of a poor district. The project uses OSM and is financially supported by Switzerland.
    • Tyler Radford announced the successful completion of fundraising for critical community mapping projects.

    Maps

    • Greg reported the results of the last quarterly project. The aim was to use UK Food Hygiene Rating System data from the UK Food Standards Agency to improve the density of POIs, addresses and postcodes in town centres. Statistics and tools can still be used.
    • Christian Quest announced the renewed French map style and moved to a new server. See the new map and what was changed.
    • Sven Geggus improved the “German-Map-Style”. Very nice feature: The map shows names in two languages. Sven is looking for help.

    switch2OSM

    • The TAHUNA app beta version is available (automatic translation) from Google Play Store, adding Teasi navigation tools to your mobile device.

    Software

    • Maps.me has integrated an important feature with its new update: traffic information.
    • The Android app OSM for the dyslexic is an OSM-based world atlas for dyslexic users. The development is part of the MyGEOSS project.
    • WordPress asks via tweet for beta testers on their upcoming OSM plugin version 3.8.

    Programming

    • Mike Fricker, the technical director for Unreal Engine 4 at Epic Games, has released a plugin for their popular game engine which can import OpenStreetMap data into their game editor.

    Releases

    Software Version Release date Comment
    Mapillary Android * 3.12 2016-12-20 App install fix, stopping background upload service when finished.
    Maps.me Android * var 2016-12-26 Travel data in 36 countries.
    Komoot iOS * 8.5.1 2016-12-27 See what month a highlight is most visited.
    Maps.me iOS * 7.0.4 2016-12-27 Travel data in 36 countries.
    Grass Gis 7.2.0 2016-12-28 More than 1,950 stability fixes and manual improvements, 50 new addons.
    JOSM 11425 2016-12-31 Many improvements, see release info.
    SQLite 3.16.0 2017-01-02 14 enhancements and two bugfixes.

    Provided by the OSM Software Watchlist.

    (*) unfree software. See: freesoftware.

    Did you know …

    • … the German alternative to Google Maps? Maps.metager.de (automatic translation), from SUMA-EV and Leibniz Universität Hannover, is currently in beta and offers only Germany-wide searches based on OSM. However, it promises to be one of the safest search engines.

    Other “geo” things

    • An article on L’Obs, the online edition of the Nouvel Observateur, looks once again at how Google adapts national boundaries to match each state's view.
    • The Verge reports on the launch of a toilet-locator service from Google and India's Ministry of Urban Development.

    Upcoming Events

    Where What When Country
    Dortmund Stammtisch 01/08/2017 Germany
    Manila 【MapAm❤re】OSM Workshop Series 6/8, San Juan 01/09/2017 Philippines
    Rennes Réunion mensuelle 01/09/2017 France
    Passau Niederbayerntreffen 01/09/2017 Germany
    Lyon Rencontre mensuelle mappeurs 01/10/2017 France
    Nantes Rencontres mensuelles 01/10/2017 France
    Berlin 103. Berlin-Brandenburg Stammtisch 01/12/2017 Germany
    Ulloa 1er encuentro comunidad OSMCo 01/13/2017-01/15/2017 Colombia
    Kyoto 【西国街道シリーズ】長岡天満宮マッピングパーティ 01/14/2017 Japan
    Rennes Atelier de découverte 01/15/2017 France
    Lyon Mapathon Missing Maps Avancé pour Ouahigouya 01/16/2017 France
    Brussels Brussels Meetup 01/16/2017 Belgium
    Essen Stammtisch 01/16/2017 Germany
    Manila 【MapAm❤re】OSM Workshop Series 7/8, San Juan 01/16/2017 Philippines
    Cologne/Bonn Bonner Stammtisch 01/17/2017 Germany
    Scotland Edinburgh 01/17/2017 UK
    Osnabrück Stammtisch / OSM Treffen 01/18/2017 Germany
    Karlsruhe Stammtisch 01/18/2017 Germany
    Osaka もくもくマッピング! #02 01/18/2017 Japan
    Leoben Stammtisch Obersteiermark 01/19/2017 Austria
    Urspring Stammtisch Ulmer Alb 01/19/2017 Germany
    Tokyo 東京!街歩き!マッピングパーティ:第4回 根津神社 01/21/2017 Japan
    Manila 【MapAm❤re】OSM Workshop Series 8/8, San Juan 01/23/2017 Philippines
    Bremen Bremer Mappertreffen 01/23/2017 Germany
    Brussels FOSDEM 2017 02/04/2017-02/05/2017 Belgium

    Note: If you would like to see your event here, please add it to the calendar. Only data that is in the calendar will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

    This weeklyOSM was produced by Peda, Polyglot, Rogehm, SomeoneElse, SrrReal, TheFive, YoViajo, derFred, jinalfoflia, keithonearth, kreuzschnabel, seumas.

    by weeklyteam at January 05, 2017 11:48 AM

    Wiki Education Foundation

    Wiki Ed encourages geophysicists to teach with Wikipedia

    Last month, Outreach Manager Samantha Weald, Classroom Program Manager Helaine Blumenthal, Director of Programs LiAnna Davis, and I attended the American Geophysical Union’s annual meeting here in San Francisco. At the conference, we spoke to dozens of scientists who believe Wikipedia is a valuable website for them, their students, and the world. We’re excited to bring more geophysics, geology, and earth science students to Wikipedia in the coming years, helping us amplify the impact of this year’s Wikipedia Year of Science.

    In January 2016, we started the Year of Science as a year-long campaign to improve Wikipedia’s science coverage. After all, Wikipedia is the main source of scientific information for the general public. So if we’re interested in a scientifically literate populace, we need to make science accessible and available to those who don’t pursue it as a career.

    The idea was simple: students have all the tools they need to contribute to the public scholarship of science. They have access to rigorous research and scientific journals through the university library and they regularly meet with an expert in the field who explains important topics and concepts. Plus, students are still studying and learning about scientific topics as they develop their own expertise, so they’re less removed from communicating these ideas to non-experts than decades-long researchers. This key attribute makes students ideal science communicators, and we want to help students actively build those science communication skills in the classroom.

    In the Classroom Program, we provide the toolkit students need to become Wikipedians, or contributors to the encyclopedia. Over the course of the semester, students identify missing components in a Wikipedia article related to class, research the topic, and learn how to add well-sourced information to Wikipedia. We’ve worked with several earth science courses over the years, which is why we created a guide for students editing environmental science content.

    During the Year of Science, more than 6,000 science students have used our training materials to learn how Wikipedia works. Together, they added 4.93 million words about science to Wikipedia. At the AGU conference, we were proud to share these results with potential program participants, as they considered the value Wikipedia assignments can bring not only to their classroom, but also to the public. By inviting these instructors to join our program, we will build on the accomplishments our students made during 2016 to make science accessible to those outside of academia.

    If you teach in the earth sciences and are interested in learning more about increasing your students’ contributions to public scholarship about the earth and climate, email us at contact@wikiedu.org.

    by Jami Mathewson at January 05, 2017 12:02 AM

    January 04, 2017

    Jeroen De Dauw

    Simple is not easy

    Simplicity is possibly the single most important thing on the technical side of software development. It is crucial to keep development costs down and external quality high. This blog post is about why simplicity is not the same thing as easiness, and common misconceptions around these terms.

    Simple is not easy

    Simple is the opposite of complex. Both are a measure of complexity, which arises from intertwining things such as concepts and responsibilities. Complexity is objective, and certain aspects of it, such as Cyclomatic Complexity, can be measured with many code quality tools.
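
    As a minimal illustration (a sketch of mine, not from the post), here are two Python functions with identical behavior but different cyclomatic complexity; a tool such as radon would report roughly 5 for the branchy version, because each if/elif adds a decision point, and 1 for the table-driven one:

        # Hypothetical example: same behavior, different cyclomatic complexity.
        def shipping_cost_branchy(country):
            if country == "DE":
                return 5
            elif country == "FR":
                return 7
            elif country == "US":
                return 12
            elif country == "JP":
                return 15
            return 20

        # Table-driven version: no branches, so the measured complexity drops to 1.
        RATES = {"DE": 5, "FR": 7, "US": 12, "JP": 15}

        def shipping_cost_table(country):
            return RATES.get(country, 20)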

    Easy is the opposite of hard. Both are a measure of effort, which unlike complexity, is subjective and highly dependent on the context. For instance, it can be quite hard to rename a method in a large codebase if you do not have a tool that allows doing so safely. Similarly, it can be quite hard to understand an OO project if you are not familiar with OO.

    Achieving simplicity is hard

    I’m sorry I wrote you such a long letter; I didn’t have time to write a short one.

    Blaise Pascal

    Finding simple solutions, or brief ways to express something clearly, is harder than finding something that works but is more complex. In other words, achieving simplicity is hard. This is unfortunate, since living with the resulting complexity is so costly.

    Since in recent decades the cost of software maintenance has become much greater than the cost of its creation, it makes sense to make maintenance as easy as we can. This means avoiding as much complexity as we can during the creation of the software, which is a hard task. The cost of the complexity does not suddenly appear once the software goes into an official maintenance phase; it is there on day 2, when you need to deal with code from day 1.

    Good design requires thought

    Some people in the field conflate simple and easy in a particularly unfortunate manner. They reason that if you need to think a lot about how to create a design, it will be hard to understand the design. Clearly, thinking a lot about a design does not guarantee that it is good and minimizes complexity. You can do a good job and create something simple or you can overengineer. The one safe conclusion you can make based on the effort spent is that for most non-trivial problems, if little effort was spent (by going for the easy approach), the solution is going to be more complex than it could have been.

    One high-profile case of such conflation can be found in the principles behind the Agile Manifesto. While I don’t fully agree with some of the other principles, this is the only one I strongly disagree with (unless you remove the middle part). Yay Software Craftsmanship manifesto.

    Simplicity–the art of maximizing the amount of work not done–is essential

    Principles behind the Agile Manifesto

    Similarly we should be careful to not confuse the ease of understanding a system with the ease of understanding how or why it was created the way it was. The latter, while still easier than the actual task of creating a simple solution, is still going to be harder than working with said simple solution, especially for those that lack the skills used in its creation.

    Again, I found a relatively high-profile example of such confusion:

    If the implementation is hard to explain, it’s a bad idea. If the implementation is easy to explain, it may be a good idea.

    The Zen of Python

    I think this is just wrong.

    You can throw all the books in a library onto a big pile and then claim it’s easy to explain where a particular book is (in the pile), though actually finding the book is more of a challenge. It’s true that more skill is required to use the library effectively than to go through a pile of books at random. You need to know the alphabet and be familiar with genres of books. It is also true that sometimes it does not make sense to invest in the skill that allows working more effectively, and that sometimes you simply cannot find people with the desired skills. This is where the real bottleneck is: learning. Most of the time these investments are worth it, as they allow you to work both faster and better from that point on.

    See also

    In my reply to the Big Ball of Mud paper I also talk about how achieving simplicity requires effort.

    The main source of inspiration that led me to this blog post is Rich Hickey’s 2012 Rails Conf keynote, where he starts by differentiating simple and easy. If you don’t know who Rich Hickey is (he created Clojure), go watch all his talks on YouTube now; they are well worth the time. (I don’t agree with everything he says, but it tends to be interesting regardless.) You can start with this keynote, which goes into more detail than this blog post and adds a bunch of extra goodies on top. <3 Rich

    Following the reasoning in this blog post, you cannot trade software quality for lower cost. You can read more about this in the Tradable Quality Hypothesis and Design Stamina Hypothesis articles.

    There is another blog post titled Simple is not easy, which as far as I can tell, differentiates the terms without regard to software development.

    by Jeroen at January 04, 2017 10:33 PM

    Gerard Meijssen

    #Wikidata - Abdas of Susa

    Abdas of Susa was a Catholic bishop who had the ear of the Sasanian king. His religion was even promoted by king Yazdegerd I, until Abdas, in a dispute, burned a temple of Zoroaster. The king told him to pay for the destruction he had caused. He refused, and the result was an about-face by the king: churches were to be burned, and Abdas became a victim of the riots that followed. Abdas is considered a martyr.

    Consider: if Abdas had not burned the temple, or had paid for the damage, the role of the Catholic church would have been much different. The religious intolerance of Abdas is forgotten; he is now a martyr. The problem with celebrating such people is that they serve as role models. There are plenty of people like Abdas in any and all religions. My question is how to find them in either Wikipedia or Wikidata.
    Thanks,
          GerardM

    by Gerard Meijssen (noreply@blogger.com) at January 04, 2017 07:43 AM

    Wikimedia Foundation

    Community digest: Spanish Wikipedia’s Women in Architecture helps address the gender gap; news in brief

    Photo by Jaluj, CC BY-SA 4.0.

    Female architects have played key roles in building designs throughout history, but their contributions have often been systemically minimized. For example, at least one historian has claimed that Elizabeth Wilbraham was the first professional female architect, but her Wikipedia article notes that her “work frequently may have been attributed to men.” More recently, some of today’s norms for public spaces, like having a play area for children, were a novelty in the first half of the twentieth century. The first architects to propose the idea were women.

    This phenomenon has lasted into the present day in the overwhelming majority of architecture history books, some of which devote just one percent of their coverage to women.

    Existing gender stereotyping in academic knowledge production has carried over to Wikipedia. According to Wikidata, last year only 5% of the biographies of architects on Wikipedia were about women. This absence distorts the history of architecture. In order to fix this, we have created a working group with the goal of increasing the presence of female architects on Wikipedia.

    In 2015, as part of the global editing campaign Women in Architecture, we organized three simultaneous editathons: at the University of Cordoba, Argentina; the Polytechnic University of Valencia in Spain; and Montevideo, Uruguay. During that event, the first category of female architects was created on the Spanish Wikipedia. That helped us realize that only 60 out of 1,200,000 biographies on the Spanish Wikipedia were about female architects.

    Upon reviewing the articles, we found that traces of female architects tended to be obliterated, with their contributions attributed to the men around them: partners, husbands, parents, or siblings. Not only were the biographies of many prominent women architects missing, but the existing articles were also shorter than those about male architects. Moreover, the women were not mentioned in the articles about their partners. Biographies of women almost always linked to articles about men, but rarely the reverse.

    Villa Mairea was described as “a summer house built in 1938 in Noormarkku, Finland by the Finnish architect Alvar Aalto.” The article featured him as the sole protagonist of the design. Some parts of the article read: “Aalto raised it as the main idea, Aalto tried to avoid artificial patterns, Aalto modified certain details of the second proposal,” and so on. Although all the work produced at the office carried the joint signature “Aino and Alvar Aalto,” Aino’s name was not mentioned in her husband’s article.

    Other examples include the House of the Bridge, where Amancio Williams was named as the only contributor—ignoring his wife, Delfina Gálvez Bunge. The Amphitheater of Cartagena was restored by Atxu Amann y Alcocer, Andrés Cánovas Alcaraz, and Nicolás Maruri, but only Cánovas was mentioned. The same happened with Pascuala Campos de Michelena and César Portela, and many others. It took a lot of work to review the articles and restore an equitable balance supported by reliable references.

    The Premio Hexágono de Oro is a prestigious award in Latin America. Cynthia Watmough, Laurinda Spear and Sandra Barclay are winners of the award who had no articles on Wikipedia, while male winners appeared in the article Colegio de Arquitectos del Perú. During the editathon, we created articles for the Premio Hexágono de Oro, Cynthia Watmough and Sandra Barclay. An article about Laurinda Spear is still missing.

    The problem was present on Wikimedia Commons, Wikimedia’s free media repository, as well. The images we could use to illustrate the articles about these women did not identify them as the authors of their own projects. We have created 212 categories for women architects on Wikimedia Commons.

    This year, the working group has expanded. In October 2016 we repeated the editathon experience in Buenos Aires, Argentina; Montevideo, Uruguay and Valencia, Spain. We have also launched the Wikiproyecto Mujeres en la arquitectura (Wikiproject Women in Architecture).

    Last year, 5% of the Spanish Wikipedia’s biographies of architects were about women. We helped increase this figure to 8% in 2016. We are very proud of the results and already excited for more work in 2017.

    Andrea Patricia Kleiman, MA, co-founder of Wikiproject Women in Architecture
    Ines Moisset, Ph.D AR, co-founder of Wikiproject Women in Architecture

    This post comes from the Wikimedia community; the views expressed are the author’s alone and not necessarily held by the Wikimedia Foundation.

    In brief

    Photo by Socius gGmbh for Wikimedia Deutschland e.V., CC BY-SA 4.0.

    Partnerships and Resource Development group meets in Berlin: The Partnerships & Resource Development group is an initiative by Wikimedia France, Wikimedia Sweden and Wikimedia Germany to foster international work on partnerships and external fundraising among Wikimedia affiliates. It was created to follow up on the productive and inspiring conversations at the Wikimedia Conference (WMCON 2015) and the WMCON Follow-Up Day at Wikimania 2015.

    Partnership enthusiasts from different Wikimedia affiliates were invited by Wikimedia Germany to attend a two-day workshop in Berlin on November 28 and 29, 2016. The workshop aimed at sustaining the partnership group’s work, setting the ground for the group’s activities at Wikimania 2017, and developing further partnership capacities in the movement.

    Participants from Austria, Germany, France, Sweden, Argentina, Poland, Estonia, the Netherlands, the UK and the Wikimedia Foundation attended the workshop, where they shared their success stories and the lessons learned from their partnership experiences. (brief note courtesy of Giselle Bordoy, Wikimedia Argentina)

    New Signpost published: The English Wikipedia’s community news journal, published last week, reports that a majority of the German Wikipedia’s Arbitration Committee has resigned, leaving it without a quorum, and that an active user page filter is preventing vandalism and harassment; it also carries an opinion piece questioning the viability of workshops.

    An editathon in South Korea: Yangjeong High School for girls in South Korea hosted an editathon on December 19 that was attended by four students and their teacher. More information and photos are available on the event page on the Korean Wikipedia.

    Photos from Kennedy’s visit to Germany are now on Commons: A digitized copy of a historic photo collection from John F. Kennedy’s only official visit to Germany has been recently released to the public. The JFK Library has announced that all the White House photos taken during the 1963 visit are now in the public domain on the internet. Some of the photos have been uploaded to Wikimedia Commons, the free media repository, and are available for use in Wikipedia articles around the globe.

    Foundation board announcement regarding expiring member terms: The Wikimedia Foundation Board of Trustees announced its decisions regarding the two Board terms expiring at the end of December 2016. Alice Wiegand sought and received reappointment for a new term that will end at Wikimania 2018, while Guy Kawasaki decided not to seek reappointment. He will complete his term at the end of December. More information about the board decisions and the history of the two members is available on Wikimedia-l.

    Wikipedians pass away: G-Michel-Hürth, a Wikimedian since 2006, died on 18 December. He primarily edited the German Wikipedia, where he was known for writing articles related to his hometown of Hürth, located near Cologne. DanVeg, an editor on the Hebrew Wikipedia who focused on gender and transgender issues, passed away on the 23rd. There is an ongoing drive to improve transgender articles in her memory. Our condolences go to both of their friends and family.

    Wiki on Rails: The Kurier has a report (in German) on the Swiss Federal Archives and six other institutions’ “Wiki on Rails” event, held last October through December after the opening of the Gotthard Base Tunnel. The organizers recruited people from outside of the established Wikipedia community in order to bring their knowledge into the site, and the Kurier notes that it was considered a “complete success.”

    Hebrew Wikipedia reaches 200,000 articles: Last Thursday, a new article about Pudu puda (a type of deer) became the Hebrew Wikipedia’s 200,000th. Congratulations to the Hebrew Wikipedia community on reaching this significant milestone!

    Compiled and edited by Samir Elsharbaty, Digital Content Intern
    Wikimedia Foundation

    by Andrea Patricia Kleiman, Ines Moisset and Samir Elsharbaty at January 04, 2017 07:04 AM

    January 03, 2017

    Wiki Education Foundation

    Wiki Ed heads to linguistics and history conferences this week

    This week, Wiki Ed hits the road to recruit new instructors into our programs. We’re thrilled to take on 2017 after supporting a record 515 courses and nearly 11,000 student editors last year.

    American Historical Association

    This week I’ll return to the American Historical Association’s annual meeting in Denver, along with Classroom Program Manager Helaine Blumenthal, who received her PhD in History from the University of California, Berkeley. During the pre-conference Digital History workshop, Helaine and I will share with attendees how students develop digital literacy skills during a Wikipedia-editing assignment. Last year, our workshop led to some great discussions about Wikipedia’s role in preserving digital history, how students can participate, and how crowdsourcing provides opportunities for documenting histories and perspectives of marginalized communities. We’ll come together with a larger group of workshop attendees to answer questions about syllabus design and building an assignment that benefits both Wikipedia and students.

    Over the years, we’ve supported students who have improved Wikipedia’s coverage of environmental history, notable women in history, theater history, history of science, art history, American history, Canadian history, and local history. Students in our Classroom Program have learned how some people haven’t made the history books, empowering them to change that for the future. Along the way, students curb Wikipedia’s systemic bias and become better researchers who can discern credible sources from low-quality ones.

    If you’re attending AHA’s annual meeting this year, please join us to learn more about how Wikipedia fits into the History classroom:

    Digital History workshops

    This session will offer a lesson plan attendees can drop into their classrooms that uses Wikipedia to train students in digital literacy and evaluation of sources. We will also discuss the caveats that accompany this kind of lesson plan and the role academic historians can play in developing knowledge on Wikipedia. After the session, we’ll join other participants in small groups to workshop Wikipedia assignments.

    • Date: Thursday, January 5, 2017
    • Workshops: 9:00am–10:15am, 10:30am–12:00pm; Table talks: 12:00–1:00pm
    • Location: Colorado Convention Center, Room 206

    Digital Alley booth in the exhibit hall

    Over the weekend, Helaine and I will be available in the AHA Digital Alley to speak to conference attendees about the benefits of teaching with Wikipedia. We’ll be available for one-on-one conversations about how Wikipedia assignments fit into an upcoming course.

    • Friday, January 6th: 9:00am–6:00pm
    • Saturday, January 7th: 9:00am–6:00pm
    • Sunday, January 8th: 9:00am–12:00pm

    Linguistic Society of America

    In Austin, Outreach Manager Samantha Weald will return to the Linguistic Society of America’s annual meeting. Shortly after we announced a partnership with the LSA to improve Wikipedia’s coverage of language and linguistics, Samantha attended the 2016 conference. Since then, we have supported 25 linguistics courses across the United States and Canada and announced the first language-related Wikipedia Visiting Scholars relationship. Astonishingly, students have added almost 300,000 words to articles about language convergence, German dialects, and subjectification. Now, the world has access to important information about linguistics they couldn’t access before.

    With so many students learning how to edit articles in this discipline, we created an editing guide specific to linguistics to get them started. We’re excited to share this guide and other tools to get new linguistics instructors involved in our programs. Stop by to see Samantha and work out details of your Wikipedia assignment later this week:

    Exhibit hall

    • Friday, January 6th: 10:00am–5:30pm
    • Saturday, January 7th: 10:00am–5:30pm
    • Sunday, January 8th: 8:30–11:00am

    by Jami Mathewson at January 03, 2017 09:50 PM

    Wikimedia Tech Blog

    Wikipedia Zero joins Mossab Banat on his trip to freely share human knowledge

    Photo by Joseph Zakarian/Wikimedia Foundation, CC BY-SA 3.0.

    In just two and a half years, Mossab Banat has made major qualitative contributions to the Arabic Wikipedia, led several initiatives online and offline, and become one of Wikipedia’s most prolific contributors, especially in the medical field. Most of his online work was done using a mobile phone connected to the internet by zero-rated data.

    The third-year medical student, who was born and raised in Russeifa, Zarqa Governorate, Jordan, had his first experience with the internet when he left high school and joined university. Banat came upon Wikipedia while searching for medical terms during his studies; Wikipedia usually showed up in the top search results.

    “At this time, I had a monthly quota of 500 megabytes,” Banat recalls. “That wasn’t enough to even update the apps on my mobile phone. When I used Wikipedia, I noticed no change in my quota! That was when I first realized that I could browse Wikipedia for free.”

    Wikipedia Zero is a project that aims to provide free access to free knowledge in places where mobile data is usually unaffordable. The initiative started in 2012 in response to a growing need to increase diversity on Wikipedia, at a time when data costs often made achieving this goal prohibitive.

    “We use mobile phones more than computers,” says Banat. “You have constant access to your mobile almost all the time anywhere, which makes it easy for you to read and edit while traveling, between classes, etc. Wikipedia Zero is useful because it helps the young make the most efficient use of their free time.”

    Banat found the Arabic Wikipedia in a poor state compared to the English Wikipedia, which motivated him to start editing in December 2013. “I searched for medically-themed articles on the English Wikipedia,” Banat explains, “where I found that the Arabic equivalents of the articles were sometimes weak, or flat-out didn’t exist. It felt like I had a responsibility to contribute to the encyclopedia that had helped me so much.”

    Banat’s contributions quickly attracted the attention of the Arabic Wikipedia community. The scope of his work ranged from writing featured articles and developing medical content to helping newbies. His rapidly growing edit count has placed him at number four on the list of Arabic Wikipedia contributors by number of edits. Banat was nominated to be an administrator and won with the highest-ever number of support votes (as of publishing time).

    His contributions were not just noticed by the Arabic Wikipedia community. In February 2016, Banat received the Cure Award from the Wiki Project Med as one of the best contributors to the project, and the most prolific contributor to it on the Arabic Wikipedia in 2015.

    When he wanted to use this experience to recruit and train new volunteers to edit Wikipedia, Banat started talking to professors at his school about the Wikipedia Education Program. Educators liked how the idea could be used to develop Wikipedia writing assignments for their students in their field of study. Thirty-two students loved the idea and rushed to join the pharmacology and advanced cellular biology courses with him.

    According to Banat, one of the reasons that encourages him and other students to edit Wikipedia is that their articles are guaranteed exposure to a large audience.

    “Wikipedia is usually in the first search results,” says Banat. “When you edit an article, you know that what you’re doing is effective. The freedom of expression within the Wikipedia community is also a source of motivation.”

    Big changes can sometimes be accomplished using resources everyone uses in their daily lives. Using only his mobile phone and the free access he had to Wikipedia, Banat has brought about impactful change.

    “I see people typing on their mobile phones all day long, usually on social media. I wonder, where will their content end up?”

    Samir Elsharbaty, Digital Content Intern
    Wikimedia Foundation

    by Samir Elsharbaty at January 03, 2017 07:55 PM

    William Beutler

    The Top Ten Wikipedia Stories of 2016

    2016 was a hell of a year. In matters of war and peace, politics and governance, arts and celebrity culture—not to mention unexpected crossovers among them—it was a year that seemed to come off the swivel. Was this true on Wikipedia as well? In this post The Wikipedian will attempt, as it has done each year since 2010, to summarize the year in the Wikimedia movement by itemizing and ranking ten of the biggest trends and events.

    The list this time may be noteworthy less for what is included than what is not: in 2016 there was no major sock puppet or COI scandal (hopefully that’s because there weren’t any, not just that they weren’t called out), no major milestone (Wikipedia turned 15 in 2016, but it felt less consequential than the 5 millionth article last year), no mention of perennial fears about a declining editor base (is it still actually declining?) and nothing about last year’s number one, the implementation of HTTPS (it’s a done deal, and China hasn’t changed its mind about unblocking Wikipedia on the mainland).

    That said, in 2016 Wikipedia still had more than its share of turmoil, more ominous signs than one ever really wants to see, plus the occasional inspiring story that makes for much more pleasant anecdotes. In this post, we’ll attempt to do justice to them all, or at least the ten that made the biggest impressions on this blogger. Ready? Let’s go:

    ♦     ♦     ♦

    10. Women Scientists Revolt

    Among Wikipedia’s more problematic systemic biases, the gender gap in participation and representation is one of the more frustrating. This year it momentarily became a bright spot, when Emily Temple-Wood, one of Wikipedia’s best known female editors, became a minor media sensation for a project with an irresistible hook: for every instance of online sexual harassment she experienced, she would create another Wikipedia article about a woman scientist. The story was picked up by the BBC, Washington Post, Guardian, New York, and Huffington Post, among many other outlets. The sudden micro-celebrity placed her in the unique category of Wikipedia editors with a Wikipedia biography earned as a result of their editing activities. Jimmy Wales also named her Wikipedian of the Year (along with Rosie Stephenson-Goodnight). And then she started med school.

    9. Wikipedia Vandalism, Spectator Sport

    If you’re the kind of person who searches Google News for “wikipedia” with any frequency, you have undoubtedly seen headlines like “Denver Broncos ‘own’ Carolina Panthers, according to Wikipedia edit”. Seriously, search “wikipedia sports owned” and you’ll find the same combination for Chase Utley and the Mets, LeBron James and the Bulls, Jürgen Klopp and Manchester City. And that’s just one gratingly common construction. Yes, sometimes it can actually be funny. Occasionally, even heartwarming. But no sport is safe, and the phenomenon is familiar enough for Fox Sports (a frequent offender) to have once created a list of “most entertaining” examples. In early 2016, former WSJ reporter and Wikimedia staffer Jeff Elder called out the trend, spotlighting the tedious extra work it creates for Wikipedia volunteers. VentureBeat followed up by making the argument that it was time for sportswriters to move on. And so that put an end to it? Yeah, right. It’s not clear what will ever kill this “story”; there is almost certainly nothing within anyone’s actual control. While individual writers or readers may tire of it, the thing about sports is that every big win is a moment without precedent, one that obliterates all reason and naturally seeks a good, mean-spirited laugh to top it off. All things considered, better to vandalize Wikipedia than light a car on fire.

    8. The Business of Wikipedia is Fundraising

    Wikipedia is alone among the top 50 global websites (give or take) for the lack of advertising to be found on its pages. As a consequence, its funding model is the focus of fascination and frustration for both the editorial community and news media alike. And as you’re certainly well aware, every year the Wikimedia Foundation (WMF) launches a fundraising drive featuring very prominent and slightly annoying banners—which look a lot like advertisements for Wikipedia itself—to raise money from its millions of readers. To be sure, Wikipedia also raises money via grants and gifts from wealthy donors, but the vast majority comes from the annual campaign.

    Beginning in mid-November, the WMF stepped up its annual efforts with a persistent email campaign fronted by Wikipedia’s founder-mascot Jimmy Wales, using sophisticated techniques—variation, highlighting, boldfacing, talky subject lines, and more. WMF fundraising has been A/B tested for a while, but this was undoubtedly the slickest incarnation yet. And what do you know, it worked: this year Wikipedia reached its annual goal faster than ever before. Such success cannot come sans scrutiny. An op-ed in The Wikipedia Signpost called for greater transparency, The Register needled Wikipedia about this as it does about pretty much everything, and philanthropic publications have second-guessed the WMF’s fundraising strategy writ large.[1]

    All of which is fair, and one should be so lucky as to have to answer for this kind of success. As The Wikipedian sees it, the question of how much money WMF raises should be secondary to how it is spent, a topic historically less-well reported.

    7. ArbCom and the Alt-Right

    Wikipedia’s Arbitration Committees (ArbComs) are elected panels of dedicated volunteer Wikipedia editors who agree to take up the often unpleasant and always time-consuming task of reviewing disputes involving the behavior of fellow editors. About a dozen of the most-active Wikipedia language editions have one, and it is by its nature the locus of controversy, year in and year out (said fundamental dysfunction last made this list in 2013). Lucky us, now we get to merge that with the rise of an international right-wing movement represented on last year’s list by Gamergate, and which in 2016 we learned to call the “alt-right”.

    This is based on two separate incidents on the two most prominent Wikipedias. Worse between them, the German ArbCom saw eight of its ten members resign in the last third of the year. The reasons are too complicated to recite here, but it concerns a single member who IRL is actively involved with the far-right Alternative für Deutschland party but had previously hidden his offline political activities from fellow editors. The decimated German ArbCom now lacks a quorum to act, and seems likely to remain inactive at least until new elections are held in May. Second was the near-election to the English ArbCom of a Canadian editor with a troubling Reddit history that included activity on the Gamergate-affiliated WikiInAction subreddit, dedicated to promoting alt-right views on Wikipedia. This candidacy was not successful, but it was a nail-biter, and close ArbCom observers are not reassured about future elections.

    Wikipedia has always had obnoxious contributors with noxious views, but their dispersal across the vast expanse of topics meant the problem areas were fairly isolated, and usually avoidable. But ArbCom is one of the few places on Wikipedia where actual power is concentrated. In a U.S. presidential election year (about which more later) in which anti-semitic tropes were promoted by the winning candidate, has there also been a concurrent rise in such views on Wikipedia? Some think so. And will ArbCom face an organized assault like the one the Hugo Awards has faced in recent years? It seems unlikely—but it’s definitely not impossible.

    6. Wikipedia Needs Better Critics

    Our 2013 installment listed the rise of Wikipediocracy, a website devoted to criticism of the Wikimedia movement. This time we’re here not to praise it, but to bury it. The site’s multi-contributor blog has published exactly once in the second half of the year, while its once-lively (and sometimes disreputable) discussion forum has slowed to a crawl. What happened? The biggest factor was the departure of its most serious contributor, Andreas Kolbe, who took his talents to The Signpost. Second was an apparent falling out between mainstays Greg Kohs and Eric Barbour. The latter went on to create an alternative site named, hysterically, Wikipedia Sucks! (And So Do Its Critics.).

    The decline of Wikipediocracy highlights the dearth of effective Wikipedia criticism. What have we got? There’s the UK IT news site The Register, which harps on a few recurring themes of narrow appeal. There’s WikiInAction, affiliated with Gamergate, focused even more narrowly. Wikipedia Sucks is a joke, itself barely registering a pulse. For what it’s worth, The Wikipedian does not consider itself to be among their ranks. This site offers Wikipedia criticism, but will admit to being pro-Wikipedia in most ways; The Wikipedian is an apologist, if also a realist. And to drop the pretense for a moment, I don’t post often enough for it to matter but a few times a year.

    There is something about Wikipedia criticism that attracts people with fringe views, who are not always the most stable personalities, and whose obsessions tend toward the arcane. Of course this is generally true of the gadfly profession, but when you consider that Wikipedia owes its very existence to freaks and geeks, it shouldn’t be any wonder that participants who hold themselves apart from mainstream Wikipedia may be stranger still.

    As of late, the best criticism happens at The Signpost, especially under former editor Kolbe, and now under Pete Forsyth. Given the competition, however, that isn’t necessarily saying much.

    5. The Brief, Less Than Wondrous Board Membership of Arnnon Geshuri

    We now arrive at the first of a few related topics which dominated the early months of the year, a series of interrelated controversies far greater than this annual list has previously contemplated. The least-related among them was the early January appointment of Arnnon Geshuri to the WMF board of trustees. Geshuri received no public vetting, as most appointed board members do not. However, unlike Geshuri, other board appointees had not played a public role in one of Silicon Valley’s biggest recent scandals.

    To wit: Apple, Google, Intel and others secretly agreed (until, of course, it was found out) not to recruit each others’ employees, thereby holding back the careers, and holding down the salaries, of thousands of employees. As a Google executive, Geshuri had taken the initiative to fire a recruiter after then-CEO Eric Schmidt received an unhappy email from Apple’s then-CEO Steve Jobs. In his note back to Schmidt, Geshuri added: “Please extend my apologies as appropriate to Steve Jobs.” The U.S. Department of Justice eventually forced the firms to pay $415 million to settle class action claims.

    Geshuri’s membership on the Wikimedia board proved to be short-lived. Facing public criticism by former board members, a debate over what to say about it on his own Wikipedia entry, a no-confidence petition signed by more than 200 editors, and probably his own realization that this just wasn’t worth all the trouble, Geshuri stepped aside only two weeks after accepting the position. In another year, this could have been a top story. But 2016 had only just begun.

    4. Wikimedia’s New Leader

    Another contender for top story in a less eventful year: the Wikimedia Foundation got a new leader. Katherine Maher was named interim executive director (ED for short) in March, and was made permanent in June. She is the third person to hold the title—the third woman, in fact—and brings experience in global governance, international institutions, and even the Arabic language.[2] Maher also brings something her predecessor lacked: a great deal of experience with Wikipedia and the Wikimedia movement.

    I am burying the lede, of course: she was previously the WMF’s chief communications officer, a position she had held since 2014. Oh yeah, and about that predecessor… as Wikimedians have already realized, I’m leaving out a lot of back story, and it’s because there is more coming further down this list. All that said, the advent of a new ED is big news in any year, and that’s true this year as well. The fact that Maher’s ascendancy falls outside the top three stories of 2016 owes as much to the public drama leading to her promotion as to the absence of drama characterizing the start of her tenure.

    3. Fake News and the U.S. Presidential Election

    The U.S. presidential election was literally the biggest story on Wikipedia this year, if we mean the topic that received the most edits across multiple entries. The biographical entry for president-elect Donald Trump, plus articles about Hillary Clinton’s endorsements, the general election, and the GOP primary occupy four of the top five slots on the list of most-edited articles.[3] But there’s a lot more to be said about Wikipedia’s relationship to the craziest and most surprising U.S. election in living memory.

    A chief attribute of Trumpism is, well, bullshit—in the Harry Frankfurt sense of the word—and anti-intellectualism as a virtue. As it became clear Trump’s victory was owed in part to falsehoods propagated on social media, the phrase “fake news” gained widespread currency among news commentators. With the mainstream[4] media casting about for a better model, what better exemplar of valuing real facts over imagined realities than Wikipedia? Even before the election, Wikipedia’s model of requiring verification of information and allowing anyone to question received wisdom had garnered positive press attention. Afterward, Wikipedia’s commitment to veracity was held up as a kind of antidote to Facebook’s hands-off attitude toward the truth or falsity of claims shared by its users.[5] The Wikimedia comms team took something of a victory lap in an early December post, declaring:

    We are not in a post-fact world. Facts matter, and we are committed to this now more than ever.

    Still, it would be a mistake to think that Wikipedia is free of falsehoods. It is only as good as its contributors and the reliability of the news sources they rely upon. Long-persisting hoaxes are not unheard of. Therein lie the biggest threats to Wikipedia: it must maintain an editorial community to uphold its own standards, and the media must keep up its end of the bargain with good reporting. Not unlike democracy, eternal vigilance is the price of an encyclopedia anyone can edit.

    2. Lila Tretikov Resigns as Wikimedia ED

    Right, so about Katherine Maher’s predecessor as executive director of the Wikimedia Foundation…

    Last year, The Wikipedian included “Exodus from New Montgomery Street” at number nine in the top-stories list—i.e., the large number of staff departures from the organization since the appointment of Lila Tretikov in 2014. In retrospect, this should have been higher, but in my defense the whispers were rather quiet until the emergence of a matter that we’ll explain better in the next entry.[6] Tretikov, whose tenure got off to a rocky start for reasons not entirely her own fault and not worth going into again here,[7] was eventually forced to resign after losing the confidence of Foundation staff. Morale fell to such depths, and management became so unresponsive that, once the dam burst, virtually the whole thing played out in public, online.

    Low-level staffers came out of the woodwork to say what managers would or could not, and community observers filled in the gaps. Most persuasively, ArbCom member Molly White created a detailed timeline of Tretikov’s WMF leadership that presented the sequence of events without commentary—selectively perhaps, but damningly for sure. This very blog took the highly unusual step of actually calling for her ouster, a position this blogger never imagined when launching this site late last decade. Nobody wanted things to arrive at this dire situation, but once they had, Tretikov could no longer effectively lead the organization, and resign is what she did.

    Anyway, we’re not quite done with this topic.

    1. The Knowledge Engine and its Discontents

    The biggest story of 2016 actually began unfolding in the waning days of 2015, when just-elected community board trustee James Heilman announced his resignation with a cryptic message on a community email list. Subsequent comments from other board members failed to resolve the ambiguity. Thus began the most tumultuous period in recent Wikimedia history, ultimately leading to Lila Tretikov’s jumped-before-she-could-be-pushed departure and the elevation of Katherine Maher to the executive director role.

    Honestly, I’m kind of dreading the idea of recapping it all here. This blog expended 7,000 words[8] on the topic earlier this year, and it’s a chore just to summarize. But let’s give it a try:

    Heilman’s departure owed to a disagreement about how to handle sensitive information related to the secretive development (and eventual abandonment) of a misbegotten “Manhattan Project” to create a search engine intended to preserve Wikipedia’s prominence if Google ever stopped sending it traffic on its historically massive level. In its most ambitious form, it was called the Knowledge Engine, and Tretikov’s WMF sought a grant for it from the Knight Foundation, with which it previously had enjoyed a good relationship, without disclosing the precise nature of the project. When scaled back, it was called Discovery and was limited to Wikipedia’s on-site search, which isn’t a bad idea by itself but wasn’t clearly a top priority for the volunteer community at large, let alone the foundation staff. The lack of public discussion was echoed in the catastrophic appointment of Geshuri to the board, establishing a pattern that could no longer be overlooked.

    The seriousness of the Knowledge Engine fiasco itself may have been overstated in terms of time and money allocated to it (and away from other projects) but it became emblematic of Tretikov’s ineffective leadership. More important probably was the botched Knight request, which contradicted good sense, and was seen to have damaged an important outside relationship. It wasn’t a crime, but it was covered up nonetheless, and Tretikov’s failure to communicate effectively—with external stakeholders, internal managers, staff throughout the organization—was what really did her in.

    If you really must have the whole story, and you have a few hours to spare, I recommend the following links:

    The regrettable history of the Knowledge Engine, the wasteful exit of Heilman from the board of trustees, the ill-advised appointment of Geshuri to same, the calamitous leadership of Lila Tretikov, the unfortunate departure of so many valuable foundation staffers, were separately and collectively the biggest story on Wikipedia this past year. Here’s hoping 2017 is just a bit less eventful.

    All images via Wikipedia, and the copyrights held by their respective contributors.

    Notes

    1. Update: This link previously went to an article on a different subject; this one is from late 2015 but illustrates the same point.
    2. Yes, I’m looking at her Wikipedia entry as I write this.
    3. Number one was Deaths in 2016, but that’s pretty much always the case.
    4. OK, fine, liberal
    5. Facebook’s Mark Zuckerberg was initially dismissive of “fake news” concerns, only to do an abrupt about-face and announce plans for a fact-checking feature.
    6. Yes, this year was largely dominated by one very big story at the beginning of the year which had enough distinct elements to be treated separately, making for a confusing narrative. Alas.
    7. if you must, you can go here
    8. a conservative estimate

    by William Beutler at January 03, 2017 06:30 PM

    Magnus Manske

    Mix’n’match post-mortem

    So this, as they say, happened.

    On 2016-12-27, I received an update on a Mix’n’match catalog that someone had uploaded. That update had improved names and descriptions for the catalog. I try to avoid such updates, because I made the import function so that I do not have to deal with every catalog myself, and also because the update process is entirely manual, and therefore somewhat painful and error-prone, as we will see. Now, as I was on vacation, I was naturally in a hurry, and (as it turned out later) there were too many tabs in the tab-delimited update file.

    Long story short, something went wrong with the update. For some reason, some of the SQL commands I generated from the update file did not specify some details about which entry to update. Like, its ID, or the catalog. So when I checked what was taking so long, just short of 100% of Mix’n’match entries had the label “Kelvinator stove fault codes”, and the description “0”.
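
    One plausible shape for that class of bug is sketched below: an UPDATE builder that only appends its WHERE clause when the identifying fields are present, so a misparsed line with empty IDs quietly produces an unqualified UPDATE. The table and column names are assumptions for illustration, not the actual Mix’n’match code or schema:

        # Hypothetical sketch of the failure mode, not the real import script.
        def build_update(entry_id, catalog, label, description):
            clauses, params = [], []
            if entry_id:
                clauses.append("id = %s")
                params.append(entry_id)
            if catalog:
                clauses.append("catalog = %s")
                params.append(catalog)
            sql = "UPDATE entry SET label = %s, description = %s"
            if clauses:
                # If both identifying fields came out empty after splitting the
                # tab-delimited line, no WHERE clause is added and the statement
                # updates every row in the table.
                sql += " WHERE " + " AND ".join(clauses)
            return sql, [label, description] + params

        # A defensive variant refuses to emit an unqualified UPDATE at all:
        def safe_update(cursor, entry_id, catalog, label, description):
            if not entry_id or not catalog:
                raise ValueError("refusing to update without id and catalog")
            cursor.execute(
                "UPDATE entry SET label = %s, description = %s "
                "WHERE id = %s AND catalog = %s",
                (label, description, entry_id, catalog),
            )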

    Backups, you say? Well, of course, but, look over there! /me runs for the hills

    Well, not all was lost. Some of the large catalogs were still around from my original import. Also, my scraping scripts for specific catalogs generate JSON files with the data to import, and those are still around as well. There was also a SQL dump from 2015. That was a start.

    Of course, I did not keep the catalogs imported through my web tool. Because they were safely stored in the database, you know? What could possibly go wrong? Thankfully, some people still had their original files around and gave them to me for updating the labels.

    I also wrote a “re-scraping” script, which uses the external URLs I store for each entry in Mix’n’match, together with the external ID. Essentially, I get the respective web page, and write a few lines of code to parse the <title> tag, which often includes the label. This works for most catalogs.
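
    A rough sketch of that re-scraping approach (my own illustration with made-up helper names, assuming the requests library and a generic page whose <title> contains the label):

        import re
        import requests

        TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

        def scrape_label(url, timeout=15):
            """Fetch an entry's external page and recover a label from its <title> tag."""
            resp = requests.get(url, timeout=timeout)
            resp.raise_for_status()
            match = TITLE_RE.search(resp.text)
            if not match:
                return None
            # Many sites append a suffix such as " - Some Database"; a
            # catalog-specific step would strip that before storing the label.
            return match.group(1).strip()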

    So, at the time of writing, over 82% of labels in Mix’n’match have been successfully restored. That’s the good news.

    The bad news is that the remaining ~17% are distributed across 133 catalogs. Some of these do not have URLs to scrape, some URLs don’t play nicely (session-based Java horrors, JS-only pages etc.), and the rest need site-specific <title> scraping code. Fixing those will take some time.

    Apart from that, I fixed up a few things:

    • Database snapshots (SQL dump) will now be taken once a week
    • The snapshot from the previous week is preserved as well, in case damage went unnoticed
    • Catalogs that are uploaded through the import tool will be preserved as individual files

    Other than the remaining entries that require fixing, Mix’n’match is open for business, and while my one-man-show is spread thin as usual, subsequent blunders should be easier to mitigate. Apologies for the inconvenience, and all that.

    by Magnus at January 03, 2017 09:41 AM

    Gerard Meijssen

    #Wikidata - Biblical truths and Wikidata practice

    When you deal with historical figures, what is known about them often comes from material of a historic nature that has been left to us. There is a wealth of material in the bible; it tells us about kings and kingdoms, and finding references outside of books like the bible helps us get a more historic picture.

    The names of kings from other historic countries are often spelled in many ways, and it takes a lot of hard work to research such issues. I do no research; I reflect in Wikidata what I find in Wikipedia. In doing so, I am restricted to what is possible in Wikidata.

    At this time I am working on the kings of the kingdom of Israel. This was a breakaway country that split off, while the country that remained with the ruling dynasty, the descendants of David and Solomon, was called the kingdom of Judah. The Wikidata practice is that we cannot use names that change over time; we have only one prevailing label. This makes David and Solomon kings of Judah. The counter-argument is that biblical categorisation has it that there were multiple kingdoms. The problem is that when Rehoboam lost part of his kingdom, he was not made king anew. He just lost part of his country, and consequently it is the same country.

    Both Wikipedia and Wikidata practice have room for improvement. It would be nice if we could have a label and associate it with dates, but we cannot. What we can do is record a king of Israel as a citizen not of modern Israel but of the kingdom of Israel. In Wikidata this is easy to remedy, because no citizen of modern Israel died before 1948, and we can query for it.
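
    As a sketch of that kind of check (my own query shape, not taken from the post), the Wikidata Query Service can list items whose country of citizenship (P27) is the modern State of Israel (Q801) but whose date of death (P570) falls before 1948; any results are candidates for pointing to the ancient kingdoms instead:

        import requests

        SPARQL = """
        SELECT ?person ?personLabel ?death WHERE {
          ?person wdt:P27 wd:Q801 ;
                  wdt:P570 ?death .
          FILTER(?death < "1948-05-14T00:00:00Z"^^xsd:dateTime)
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
        }
        LIMIT 50
        """

        resp = requests.get(
            "https://query.wikidata.org/sparql",
            params={"query": SPARQL, "format": "json"},
            headers={"User-Agent": "label-check-sketch/0.1"},  # polite identification
        )
        resp.raise_for_status()
        for row in resp.json()["results"]["bindings"]:
            print(row["personLabel"]["value"], row["death"]["value"])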

    The problem is that Biblical truth comes with Biblical expectations, and Wikidata is not able to meet them. The question is how to resolve this for the time being.
    Thanks,
          GerardM

    by Gerard Meijssen (noreply@blogger.com) at January 03, 2017 08:45 AM

    Not Confusing (Max Klein)

    Wikidata Human Gender Indicators in the News

    Data from my project the Wikidata Human Gender Indicators has started to be cited in the press (BBC, Bloomberg), which is a large dose of validation. Traffic to the data visualizations increased 500% on the day of the BBC publication to 1,000 views/day, which inspires confidence. Moreover, Wikimedia Foundation’s Grants team—who funded WHGI—praised the project in their year-end report, saying:

    Grants for research and tools (such as WHGI) – which minimally contribute to the targets of people or articles – have been extremely valuable in improving our understanding of the gender gap and how or why it manifests.

    Read the rest

    by max at January 03, 2017 12:16 AM

    January 02, 2017

    Brion Vibber

    Limitations of AVSampleBufferDisplayLayer on iOS

    In my last post I described using AVSampleBufferDisplayLayer to output manually-uncompressed YUV video frames in an iOS app, for playing WebM and Ogg files from Wikimedia Commons. After further experimentation I’ve decided to instead stick with using OpenGL ES directly, and here’s why…

    • 640×360 output regularly displays with a weird horizontal offset corruption on iPad Pro 9.7″. Bug filed as rdar://29810344
    • Can’t get any pixel format with 4:4:4 subsampling to display. Theora and VP9 both support 4:4:4 subsampling, so that made some files unplayable.
    • Core Video pixel buffers for 4:2:2 and 4:4:4 are packed formats, and it prefers 4:2:0 to be a weird biplanar semi-packed format. This requires conversion from the planar output I already have, which may be cheap with Neon instructions but isn’t free.

    Instead, I’m treating each plane as a separate one-channel grayscale image, which works for any chroma subsampling ratios. I’m using some Core Video bits (CVPixelBufferPool and CVOpenGLESTextureCache) to do texture setup instead of manually calling glTexImage2D with a raw source blob, which improves a few things:

    • Can do CPU->GPU memory copy off main thread easily, without worrying about locking my GL context.
    • No pixel format conversions, so straight memcpy for each line…
    • Buffer pools are tied to the video buffer’s format object, and get swapped out automatically when the format changes (new file, or file changes resolution).
    • Don’t have to manually account for stride != width in the texture setup!

    It could be more efficient still if I could pre-allocate CVPixelBuffers with on-GPU memory and hand them to libvpx and libtheora to decode into… but they currently lack sufficient interfaces to accept frame buffers with GPU-allocated sizes.

    A few other oddities I noticed:

    • The clean aperture rectangle setting doesn’t seem to be preserved when creating a CVPixelBuffer via CVPixelBufferPool; I have to re-set it when creating new buffers.
    • For grayscale buffers, the clean aperture doesn’t seem to be picked up by CVOpenGLESTextureGetCleanTexCoords. Not sure if this is only supposed to work with Y’CbCr buffer types or what… however I already have all these numbers in my format object and just pull from there. 🙂

    I also fell down a rabbit hole researching color space issues after noticing that some of the video formats support multiple colorspace variants that may imply different RGB conversion matrices… and maybe gamma…. and what do R, G, and B mean anyway? 🙂 Deserves another post sometime.

    by brion at January 02, 2017 04:22 AM

    January 01, 2017

    Gerard Meijssen

    #Wikidata - Spearman Medal Revisited

    I blogged in the past about the Spearman Medal. The problem I had with the article is that it contained errors. I have a page where I show lists like the winners of the Spearman Medal. This list does not contain errors, but its best quality is that it can easily be reproduced on other Wikipedias.

    Today I added the 2016 winner to Wikidata. In a day or so my example page will be updated automatically. My experience with Wikipedia meanwhile has suffered in 2016. I fixed one of the errors on Wikipedia; I removed a redirect only to be reverted.

    The question that I have is: what does it take to convince others to do things differently? When it comes to arguments, it seems obvious that Wikidata is superior to using internal MediaWiki. What does it take to convince people that Wikipedia will be so much better at sharing in the sum of all knowledge?

    Mr Michael Banissy received the 2016 Spearman Medal, and I am willing to forego Reasonator, which I prefer, and point to SQID, a later development, for showing this information. When Wikipedians compare it with what they do provide, what is it that makes this their preferred option? It is funny to abuse a psychology award for this. It is a theme of our time that we are so fixed in our opinions that we hardly consider what the other has to say. If we Wikimedians cannot appreciate what we have to offer each other, what can we achieve in the "real" world?
    Thanks,
          GerardM

    by Gerard Meijssen (noreply@blogger.com) at January 01, 2017 08:45 PM

    #Wikidata - Kings of #Israel and kings of that time.

    When you ask what kings lived around the year 0 east of Bethlehem, it seems obvious that Wikidata can only provide monarchs who lived around that time. Another, perhaps better, approach is to seek monarchs who were actually ruling at that time, because it is better known when these people held office than when they were born.

    Historically, the Jews did have their own kings; Saul, David, and Solomon are the best known. There was no continuous lineage until King Herod. There were separate kings for Israel and Judah, and from 140 BCE to 37 BCE the Hasmonean dynasty ruled.

    It is really interesting to know what rulers lived when and where, and who their contemporaries were. But for it to make sense you want a display similar to the one I reported on before. It also means that some falsifications of what I perceive as the truth are removed. These kings of Judah are known as kings of Israel. They were not. Only by separating them out do you get a perspective that is historically correct. The problem is that you may get into a fight with religious people. They care about a name: was David, for instance, king of Judah or king of Israel? (The Israelites revolted.)
    Thanks,
         GerardM

    by Gerard Meijssen (noreply@blogger.com) at January 01, 2017 08:17 AM

    December 31, 2016

    Wiki Education Foundation

    Highlighting Wikipedians’ contributions to science articles in the 2016 WikiCup

    In the early days of Wikipedia, the community’s priority was on creating content. Starting from scratch, all of the fundamental topics we take for granted had to start somewhere. Today, there are more than 5.3 million articles on the English Wikipedia alone, and while there are still plenty of notable subjects not yet represented, the community in general has shifted its focus somewhat to the development of article quality rather than quantity.

    Violet webcaps (Cortinarius violaceus).

    Among the ways Wikipedians come together to foster article improvement is through the WikiCup. It’s an annual competition that began as a contest for a handful of people to see who could make the most edits to association football articles. That was 10 years ago. Since then, it has morphed into a competition based largely on content quality. Points are awarded not for number of edits or number of articles in general, but for things like featured articles, good articles, featured lists, featured pictures, and articles which show up in the did you know or in the news sections of the Main page. Participants are thus typically very experienced, dedicated contributors, and the WikiCup can function as a showcase for their exceptional work.

    The earliest depiction of the Lynx constellation (1690).

    As part of the Year of Science, Wiki Ed sponsored a 2016 WikiCup side competition to recognize the users who develop the most science-related featured articles and good articles over the course of the 5-round event.

    Several species of teleost, painted by Castelnau in 1856.

    The winner will probably come as no surprise for fellow participants. User:Casliber won the overall WikiCup in large part because of his many featured articles, all of which were on scientific topics: plants like Port Jackson fig (Ficus robiginosa) and prickly banksia (Banksia aculeata), violet webcap mushrooms (Cortinarius violaceus), and constellations like the Lynx.

    Different body types of millipede species.

    In second place is User:Cwmhiraeth, who made significant contributions to several good and featured articles, including some on big scientific topics like Fly, Millipede, Habitat, and Teleost. This is an impressive result considering Cwmhiraeth chose not to participate in the final round.

    Congratulations to Casliber and Cwmhiraeth, and thanks for your spectacular work in 2016!


    Images: FicrubAlamoana (cropped).jpg, by Wendy Cutler, CC BY-SA 2.0, via Wikimedia Commons; Johannes Hevelius – Prodromus Astronomia – Volume III “Firmamentum Sobiescianum, sive uranographia” – Tavola Y – Lynx.jpg, by Johannes Hevelius, public domain, via Wikimedia Commons; 2008-08-22 Cortinarius violaceus (L.) Gray 18241.jpg, by Dan Molter, CC BY-SA 3.0, via Wikimedia Commons; F de Castelnau-poissons – Diversity of Fishes (Composite Image).jpg, by Chiswick Chap (originals by Francis de Laporte de Castelnau), CC BY-SA 4.0 (individual drawings are in the public domain), via Wikimedia Commons; Millipede body types 1.jpg, by Animalparty (originals by Gilles San Martin, User:Stemonitis, and the USGS Native Bee Inventory and Monitoring Laboratory), CC BY 2.5, via Wikimedia Commons.

    by Ryan McGrady at December 31, 2016 10:59 PM

    Santhosh Thottingal

    Translating HTML content using a plain text supporting machine translation engine

    At Wikimedia, I am currently working on the ContentTranslation tool, a machine-aided translation system that helps translate articles from one language to another. The tool is now deployed on several Wikipedias, and people are successfully creating new articles with it.

    The ContentTranslation tool provides machine translation as one of its translation aids, so that editors can use it as an initial version to improve upon. We use Apertium as the machine translation backend and are planning to support more machine translation services soon.

    A big difference when editing with ContentTranslation is that it does not involve wiki markup. Instead, editors edit rich text, basically contenteditable HTML elements. This also means that what you translate are the HTML sections of articles.

    The HTML contains all the markup that a typical Wikipedia article has. This means the machine translation operates on HTML content. But not all MT engines support HTML content.

    Some MT engines, such as Moses, output subsentence alignment information directly, showing which source words correspond to which target words.

    $ echo 'das ist ein kleines haus' | moses -f phrase-model/moses.ini -t
    this is |0-1| a |2-2| small |3-3| house |4-4|
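
    The -t output above can be parsed with a few lines of code. Here is a minimal Python sketch (not part of ContentTranslation; the function name is invented for this post) that turns the |i-j| markers into (target phrase, source token span) pairs:

    import re

    def parse_moses_alignment(output):
        """Parse Moses '-t' output such as
        'this is |0-1| a |2-2| small |3-3| house |4-4|'
        into (target_phrase, (src_start, src_end)) pairs."""
        pairs = []
        # Each chunk is some target words followed by a |start-end| marker
        # giving the indices of the source tokens they were translated from.
        for words, start, end in re.findall(r'(.*?)\|(\d+)-(\d+)\|', output):
            pairs.append((words.strip(), (int(start), int(end))))
        return pairs

    print(parse_moses_alignment('this is |0-1| a |2-2| small |3-3| house |4-4|'))
    # [('this is', (0, 1)), ('a', (2, 2)), ('small', (3, 3)), ('house', (4, 4))]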
    

    The Apertium MT engine does not translate formatted text faithfully. Markup such as HTML tags is treated as a form of blank space. This can lead to semantic changes (if words are reordered), or syntactic errors (if mappings are not one-to-one).

    $ echo 'legal <b>persons</b>' | apertium en-es -f html
    Personas <b>legales</b>
    $ echo 'I <b>am</b> David' | apertium en-es -f html
    Soy</b> David 
    

    Other MT engines exhibit similar problems. This makes it challenging to provide machine translations of formatted text. This blog post explains how this challenge is tackled in ContentTranslation.

    As we saw in the examples above, a machine translation engine can cause the following errors in the translated HTML. The errors are listed in descending order of severity.

    1. Corrupt markup – If the machine translation engine is unaware of the HTML structure, it can move the HTML tags around more or less randomly, producing corrupted markup in the MT result.
    2. Wrongly placed annotations – The two examples given above illustrate this. It is more severe if the content includes links and the link targets are swapped or placed randomly in the MT output.
    3. Missing annotations – Sometimes the MT engine may eat up some tags in the translation process.
    4. Split annotations – During translation, a single word can be translated into more than one word. If the source word has markup, say an <a> tag, will the MT engine apply the <a> tag around both words, or to each word separately?

    All of the above issues can cause a bad experience for translators.

    Apart from potential issues with markup transfer, there is another aspect to sending HTML content to MT engines. Compared to the plain text version of a paragraph, the HTML version is bigger in size (bytes). Most of this extra content is tags and attributes, which should be unaffected by the translation, so sending them is unnecessary bandwidth usage. If the MT engine is metered (non-free, with API access measured and limited), we are not being economical.

    An outline of the algorithm we used to transfer markup from the source content to the translated content is given below.

    1. The input HTML content is translated into a LinearDoc, with inline markup (such as bold and links) stored as attributes on a linear array of text chunks. This linearized format is convenient for important text manipulation operations, such as reordering and slicing, which are challenging to perform on an HTML string or a DOM tree.
    2. Plain text sentences (with all inline markup stripped away) are sent to the MT engine for translation.
    3. The MT engine returns a plain text translation, together with subsentence alignment information (saying which parts of the source text correspond to which parts of the translated text).
    4. The alignment information is used to reapply markup to the translated text.

    This makes sure that MT engines translate only plain text, and markup is applied as a post-MT processing step.
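
    To make the outline concrete, here is a hedged Python sketch of step 4, reusing the (target phrase, source span) pairs from the parser above. It only illustrates the idea; the actual implementation is the LinearDoc code in cxserver, and the annotation format here is invented for the example.

    def reapply_markup(alignment, source_annotations):
        """alignment: (target_phrase, (src_start, src_end)) pairs,
        e.g. from parse_moses_alignment() above.
        source_annotations: maps a source token index to a tag name,
        e.g. {3: 'b'} if the fourth source token carried a <b> tag
        (an invented example, not real ContentTranslation data)."""
        out = []
        for phrase, (start, end) in alignment:
            # Collect the tags of all annotated source tokens covered by this span.
            tags = {tag for i, tag in source_annotations.items() if start <= i <= end}
            for tag in sorted(tags):
                phrase = '<%s>%s</%s>' % (tag, phrase, tag)
            out.append(phrase)
        return ' '.join(out)

    alignment = [('this is', (0, 1)), ('a', (2, 2)), ('small', (3, 3)), ('house', (4, 4))]
    # Pretend 'kleines' (source token 3) was bold in the source HTML.
    print(reapply_markup(alignment, {3: 'b'}))
    # this is a <b>small</b> house

    In ContentTranslation the annotation spans come from the LinearDoc linearization rather than a hand-written dictionary; the sketch only illustrates the mapping step.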

    Essentially, the algorithm does a fuzzy match to find the locations in the translated text at which to apply the annotations. Here too, the content given to the MT engine is plain text only.

    The steps are given below.

    1. For the text to translate, find the text of inline annotations like bold, italics, links etc. We call these subsequences.
    2. Pass the full text and the subsequences to the plain text machine translation engine. Use some delimiter so that we can keep an array mapping between the source items (full text and subsequences) and the translated items.
    3. The translated full text will contain the translated subsequences somewhere in the text. To locate a subsequence translation within the full text translation, use an approximate search algorithm.
    4. The approximate search algorithm returns the start position and length of the match. To that range we map the annotation from the source HTML.
    5. The approximate match involves calculating the edit distance between words in the translated full text and the translated subsequence. It is not whole strings that are searched, but n-grams with n = the number of words in the subsequence. Each word in the n-gram is matched independently (a code sketch of this matching is given after the examples below).

    To understand this, let us try the algorithm in some example sentences.

    1. Translating the Spanish sentence <p>Es <s>además</s> de Valencia.</p> to Catalan: the plain text version is Es además de Valencia. and the annotated subsequence is además. We give both the full text and the subsequence to MT. The full text translation is A més de València., and the word además is translated as a més. We search for a més in the full text translation. The search succeeds and the <s> tag is applied, resulting in <p>És <s>a més</s> de València.</p>. The search performed in this example is a plain exact text search, but the following example illustrates why it cannot always be an exact search.
    2. Translating the English sentence <p>A <b>Japanese</b> <i>BBC</i> article</p> to Spanish: the full text translation is Un artículo de BBC japonés, and the subsequence Japanese gets translated as Japonés. The case of the J differs, and the search should be smart enough to identify japonés as a match for Japonés. The word order difference between the source text and the translation is already handled by the algorithm. The following example illustrates that it is not only case changes that happen.
    3. Translating <p>A <b>modern</b> Britain.</p> to Spanish: the plain text version gets translated as Una Gran Bretaña moderna., and the annotated word modern gets translated as Moderno. We need to match moderna and Moderno, and we get <p>Una Gran Bretaña <b>moderna</b>.</p>. This is a case of word inflection: a single letter at the end of the word changes.
    4. Now let us look at an example where the subsequence is more than one word, and where subsequences are nested. Translating the English sentence <p>The <b>big <i>red</i></b> dog</p> to Spanish: here the subsequence big red is in bold, and inside it, red is in italics. In this case we need to translate the full text, the subsequence big red, and red. So we have El perro rojo grande as the full translation, and Rojo grande and Rojo as the translations of the subsequences. Rojo grande needs to be located first and the bold tag applied; then we search for Rojo and apply the italics. We then get <p>El perro <b><i>rojo</i> grande</b></p>.
    5. How does it work with heavily inflected languages like Malayalam? Suppose we translate <p>I am from <a href="x">Kerala</a></p> to Malayalam. The plain text translation is ഞാന്‍ കേരളത്തില്‍ നിന്നാണു്, and the subsequence Kerala gets translated to കേരളം. So we need to match കേരളം and കേരളത്തില്‍. They differ by an edit distance of 7, and the changes are at the end of the word. This shows that we will need language-specific tailoring to get reasonable output.

    The approximate string match can use a simple Levenshtein distance, but what is an acceptable edit distance? That must be configurable per language module. And the following example illustrates that edit-distance-based matching alone won't work.

    Translating <p>Los Budistas no <b>comer</b> carne</p> to English: the plain text translation is The Buddhists not eating meat, and comer translates as eat. With an edit distance approach, eat will match meat more closely than eating. To address such cases, we mix in a second criterion: the words should start with the same letter. This also illustrates that the algorithm needs language-specific modules.

    Still, there are cases that cannot be solved by the algorithm mentioned above. Consider the following example.

    Translating <p>Bees <b>cannot</b> swim</p>: the plain text translation to Spanish is Las Abejas no pueden nadar, and the phrase cannot translates as Puede no. Here we need to match Puede no and no pueden, which of course won't match with the approach explained so far.

    To address this case, we do not treat the subsequence as a string but as an n-gram, where n = the number of words in the sequence. The fuzzy matching is done per word in the n-gram, not for the entire string: Puede is fuzzy matched against no and pueden, and no is fuzzy matched against no and pueden, left to right, until a match is found. This takes care of word order changes as well as inflections.
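
    The following Python sketch approximates this matching. It is an illustration rather than the actual cxserver code, and the edit distance threshold of 3 is an arbitrary value standing in for the per-language configuration mentioned above.

    def edit_distance(a, b):
        """Plain Levenshtein distance between two words."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
            prev = cur
        return prev[-1]

    def words_match(a, b, max_dist=3):
        """Words match if they start with the same letter (case-insensitive)
        and are within an acceptable edit distance."""
        a, b = a.lower(), b.lower()
        return a[:1] == b[:1] and edit_distance(a, b) <= max_dist

    def find_subsequence(full_translation, subsequence_translation):
        """Slide an n-gram window over the translated full text and return
        (start_word_index, n) for the first window where every word of the
        translated subsequence matches some word in the window; each word is
        matched independently, so word order changes are tolerated."""
        text = full_translation.split()
        ngram = subsequence_translation.split()
        n = len(ngram)
        for start in range(len(text) - n + 1):
            window = text[start:start + n]
            if all(any(words_match(w, s) for w in window) for s in ngram):
                return start, n
        return None

    print(find_subsequence('Una Gran Bretaña moderna .', 'Moderno'))    # (3, 1)
    print(find_subsequence('Las Abejas no pueden nadar', 'Puede no'))   # (2, 2)

    The first call corresponds to the moderna / Moderno inflection example above, and the second to the Puede no / no pueden word order example.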

    Revisiting the four types of errors that can happen in annotation transfer, we see that with the algorithm explained so far, the worst case is that we miss annotations. There is no case of corrupted markup.

    As ContentTranslation adds support for more languages, language-specific customization of the above approach will be required.

    You can see the algorithm in action by watching the video linked above. And here is a screenshot:

    Translation of a paragraph from the Palak Paneer article of the Spanish Wikipedia to Catalan. Note the links, bold text etc. applied at the correct positions in the translation on the right side.

    If anybody is interested in the code, see https://github.com/wikimedia/mediawiki-services-cxserver/tree/master/mt – it is a JavaScript module in a Node.js server which powers ContentTranslation.

    Credits: David Chan, my colleague at Wikimedia, for extensive help in providing lots of example sentences of varying complexity to fine-tune the algorithm. The LinearDoc model that makes the whole algorithm work was written by him. David also wrote an algorithm to handle HTML translation using an upper-casing approach, which you can read about here. The approximation-based algorithm explained above replaced it.

    by Santhosh Thottingal at December 31, 2016 04:59 AM

    December 30, 2016

    Wiki Education Foundation

    New brochure explains how to edit political science articles

    When you contribute to an article on Wikipedia, there are best practices to consider regardless of what subject you work on, but there can also be particularities to different topic areas. For that reason, Wiki Ed works with instructors, organizations, and the Wikipedia community to develop subject-specific editing brochures to supplement our other training materials. I’m pleased to announce the most recent publication: Editing Wikipedia articles on Political Science.

    The “Editing Wikipedia articles on…” series of brochures provides in-depth information on how to get started, how articles on related subjects are typically structured, what kinds of sources are appropriate, whether special rules for style or sourcing may apply, and other helpful tips.

    We chose political science for this guide in part because of our partnership with the Midwest Political Science Association. MPSA encourages their members who teach political science at U.S. and Canadian universities to participate in Wiki Ed’s program, thereby improving the information on Wikipedia. As the number of political science courses we support has grown, so has our need for a subject-specific brochure on the topic. Since we started our partnership, 500 political science students have added more than half a million words to Wikipedia, and their work has been viewed more than 25 million times. This new guide will help future political science students write even better content in this subject area.

    We’re grateful for the help of Melissa Heeke and Wikipedians User:Notecardforfree and User:TheVirginiaHistorian in reviewing drafts of this brochure. It’s available for everyone to use in PDF form through Wikimedia Commons, and available in print form for courses participating in our program. To participate in our program, visit teach.wikiedu.org.

    by Ryan McGrady at December 30, 2016 04:53 PM

    December 29, 2016

    Wikimedia Foundation

    Our top posts of 2016: Wikimedia community takes center stage

    One of our best stories of the year focused on Emily Temple-Wood’s effort to increase Wikipedia’s coverage of women scientists. One of her favorites is Barbara McClintock (above), a 1983 winner of the Nobel Prize. Photo from the Smithsonian, public domain.

    On July 28, we announced a redesign of the Wikipedia Android app, with the aim of helping people find needed information along with “interesting, recommended Wikipedia content to dive into when you have a bit of spare time.”

    The post was extremely popular on Reddit, the self-proclaimed “front page of the internet,” and became our most-read blog post of the year.

    But unlike last year, the majority of our most-viewed posts were not Wikimedia Foundation announcements. That honor went to community contests—the Commons photo of the year and the winners of Wiki Loves Monuments—and factual lists, with the most-edited articles in the English Wikipedia’s history and in 2015.

    Much of the rest of the list comes from highlighting unique people or unusual history from Wikipedia. The impact of Prince’s death on Wikipedia told the story of why a MediaWiki extension, nicknamed the “Michael Jackson problem,” kicked in after an extreme combination of edits and pageviews. The twentieth anniversary of Pokémon gave us the opportunity to tell a bit of history, given the game’s influence on the English Wikipedia’s notability policies. And the new alchemy highlighted Emily Temple-Wood, who would later become a Wikipedian of the year, and her efforts to match bouts of online harassment with new articles on women scientists. This post garnered much press coverage, including from the Washington Post and BBC News.

    Less seen, but no less important, were the extensive interviews and profiles of Wikipedia editors we’ve done this year. Peter Isotalo has written articles on sunken warships that have since been raised back to the surface. Habib M’henni hosts workshops and takes photos for Wikimedia Commons. Mervat Salman focuses on narrowing Wikipedia’s gender gap. Diego Delso has contributed more featured pictures to Commons than anyone else. TrueHeartSusie3 and Loeba, both pseudonymous usernames, rewrote the article on Charlie Chaplin. Albin Olsson has photographed the last three Eurovisions and released all of his images under free licenses. Sara Mörtsell works with educators in Sweden to introduce them to Wikipedia. Gary Greenbaum chronicled the late Antonin Scalia.

    We’ve also ramped up our community digest, which aims to condense recent Wikimedia news from around the globe into one readable piece. Our archives have pieces on events ranging from Wikipedia Asian Month to the Estonian presidential election, the CEE Spring contest, and Wiki Loves Africa, along with ‘brief notes’ that summarize other happenings. If you have Wikimedia news that should be included, either as a main story or as a short brief note, send us an email at blogteam@wikimedia.org.

    Last, we’ve been slowly adding pieces to our “Why I” series, which aims to let Wikimedia editors tell the world about what motivates them to contribute to Wikimedia sites. The series is well exemplified by Sonja N Bohm’s “Why I proofread poetry at Wikisource,” where she discussed what motivates her to contribute: bringing a single little-known poet’s work back to life.

    Thank you for reading the Wikimedia Blog. We’ll see you next year.

    Ed Erhart, Editorial Associate
    Wikimedia Foundation

    by Ed Erhart at December 29, 2016 06:40 PM

    Gerard Meijssen

    #Wikidata - Khagan of the Rouran


    A great sign that Wikidata is gaining traction in other languages: much of the data for Yujiulü Shelun, Khagan of the Rouran from 402 to 410, does not have labels in English. When the idea is to include all these Khagans of the Rouran, that becomes a challenge. The English article does have many names, but do they fit what is already there for other languages?

    The challenge is to do good and bring things together. It is relevant to have all the right items properly connected. One thing that is missing is the item for Khagan of the Rouran. That is easily fixed.
    Thanks,
          GerardM

    by Gerard Meijssen (noreply@blogger.com) at December 29, 2016 04:04 PM

    December 28, 2016

    Wiki Education Foundation

    Teaching students to separate fact from feeling in the age of truthiness

    On the debut episode of The Colbert Report in 2005, the satirical host introduced the concept of “truthiness,” which set the tone for the rest of the series. It’s a concept intended to critique truth claims which prioritize feelings, opinions, or intuitions over evidence-based facts. A related idea is “post-truth,” the Oxford Dictionaries 2016 Word of the Year: “relating to or denoting circumstances in which objective facts are less influential in shaping public opinion than appeals to emotion and personal belief.”

    Thanks to ubiquitous access to the fast, searchable Internet, it’s easier than ever to find factual information. But at the same time, professionals in institutions like education and journalism — those whose job it is to cultivate an informed public — are contending with the popularity of things like fake news and other media which fail to draw, or intentionally omit, distinctions between feelings and facts, analysis and intuition, speculation and data. It’s easy to take for granted the idea that feelings are not the same as evidence — a concept articulated at least as far back as the philosophers and rhetors of Ancient Greece and Rome — but as with any other critical skill, it requires practice in applying it to different contexts.

    A year after “truthiness” entered popular discourse, Colbert coined a related term: “wikiality,” a wiki-like reality in which truth is determined by consensus opinion rather than fact. To demonstrate the concept, he asked his viewers to edit the Wikipedia entry for elephants to say the population of African elephants had tripled in the previous six months, and claimed credit for helping the animals when the article was changed accordingly. But while Colbert’s dedicated fanbase was certainly able to disrupt Wikipedia’s articles about elephants (not even Babar the Elephant was spared), Wikipedia actually does a remarkably good job of distinguishing facts from opinions, and hoaxes like this are almost always removed within minutes, if not seconds.

    Consensus among Wikipedians determines not reality or truth but the extent to which particular changes conform to best practices for neutrality, sourcing, tone, balance, source quality, clarity, etc. Editors on Wikipedia are expected to adhere to some core policies and guidelines. Articles should contain only “verifiable” content and should present all significant perspectives of a subject without including any original research. That means Wikipedia can contain no “gut feelings” and can only include what reliable sources have already published about a topic, summarized neutrally, with any statement of opinion or fringe perspective presented as such and attributed to its source.

    When students write Wikipedia articles for class, they are engaging squarely in the space of facts. They must learn how to identify the best sources for a given subject and how to neutrally summarize the different points and perspectives in those sources. And importantly, when they contribute to Wikipedia, they must also be ready to articulate why the sources are reliable and why it’s a clear, neutral summary if/when other editors challenge their additions. These experiences at the intersection of critical thinking, information literacy, and communication with an international community add a particularly unique dimension to the assignment through which students must apply what they’ve learned in a public forum.

    Instructors teach with Wikipedia in large part because of how valuable these skills are — but Wikipedia’s many rules and guidelines, which keep the information factual, make it difficult for an instructor who isn’t an experienced editor to successfully lead a class. That’s where the Wiki Education Foundation comes in; we provide that support so instructors can get the benefits of Wikipedia assignments for their students without having to be long-term contributors themselves. In the fall 2016 term, Wiki Ed is supporting more than 275 separate classes working on Wikipedia — more than ever before. Instructors work with us because our mission is to act as their bridge to Wikipedia, providing material, technical, and staff support for them and their students as they adapt to the sometimes complex workings of Wikipedia. Our Dashboard tool, interactive training, brochures, and other resources guide students through the necessary steps and skills involved, and ensures instructors can remain focused on what’s most important for their course rather than learning how to be a Wikipedia pro for their students.

    If you’re an instructor interested in the idea of teaching with Wikipedia, consider teaching with us. Wiki Ed is a 501(c)(3) non-profit organization that provides these resources for free in order to support the best possible experience for students and the best possible student contributions to Wikipedia. If you agree with us that this work is important, we’d appreciate spreading the word via social media (e.g. Facebook or Twitter). Or if you’re in a position to support our work financially, consider donating online.

    by Ryan McGrady at December 28, 2016 07:02 PM

    Mahmoud Hashemi

    Montage and Wiki Loves Monuments 2016

    Hatnote welcomed the holidays this year even more than usual, as they coincided with a fine end to another successful chapter of Wikipedia-related work. We’ve got several projects to talk about, but the centerpiece this year is Montage.

    Montage is a judging tool that was used to judge well over a hundred thousand submissions to this year’s Wiki Loves Monuments photography competition.

    We wrote more about it over on the Wikimedia blog, take a look!

    In the meantime, happy holidays from Stephen, Pawel, me, and the rest of the Hatnote crew!

    Hopefully next year we’ll all be able to make it into the holiday photo!

    December 28, 2016 04:00 PM

    Weekly OSM

    weeklyOSM 336

    12/20/2016-12/26/2016

    HDYC now also shows information from OpenStreetCam profiles

    Mapping

    • The very old proposal to introduce a validation date for keys is again being discussed.
    • Rory McCann reports on Townlands.ie, which can now display the historical names. The validity date for keys described above is used for this.
    • Philip Hunt presents his work on the automatic recognition of building outlines and makes further suggestions for improvements. John Whelan takes the opportunity to ask on the talk mailing list, if there is already a directive for such contributions to the database.
    • Daniel J H writes a diary about routing circular junctions. This post gives information about how OSRM distinguishes between roundabout intersections, named, and unnamed roundabouts. It also gives information about the criteria for a junction to be a roundabout.

    Community

    • The OpenCage Data Blog interviewed Julio May about OpenStreetMap in Costa Rica.
    • Pascal Neis’ HDYC now also supports OpenStreetCam profiles.
    • Alan described how the STV (single transferable vote) voting system works and why it would be better suited than the majority voting system currently used by OSM US.
    • It has been speculated for a while that Niantic are using OSM data within Pokemon Go (to determine where certain game creatures appear). Since just before Christmas, we’ve now seen a lot of Pokemon Go players updating OpenStreetMap. In many cases their edits are to be welcomed and perfectly valid, but unfortunately a small number are adding unlikely nonexistent footpaths, parks and water features in their locality. However what Niantic are doing is purely speculation – Grant suggests that in the absence of hard evidence we should try and persuade US-based Pokemon Go players to perform TIGER fixup and map missing amenities 🙂 Using Pascal’s search tool for suspicious changesets, it’s easy to find such changes.
    • The app Transportr (that we’ve reported on previously) asks if a change from osmdroid to Mapbox would be a software freedom issue for their users.

    Events

    • In Solsona, a small town of 10,000 inhabitants near Barcelona, cartography students headed by teacher Fermí Garriga meet to collect data in OSM. Since Google has “many disadvantages” and the data of the cartographic institute of Catalonia is partly faulty, while the participants map with their local knowledge, OSM-based maps are the best choice, Elpuntavui.cat said (automatic translation).

    Humanitarian OSM

    • Douglas writes a very detailed summary of the HOT and FSD workshops in Uganda, which focused on the use of crowdsourcing and geodata for financial services.
    • There is a page at HOT for making donations to the OSM Latam event. Help the Latam event if you can!

    switch2OSM

    • Faced with high costs from commercial providers, a Pokemon map decides to “just serve their own OSM tiles” (and saves a lot of money in the process).

    Software

    • Lukas Martinelli published a video demonstrating the current status of Maputnik (that we reported on earlier).
    • You may have seen OSM diary entries by Mapbox’s data team such as this one showing potentially problematic changes to OSM. Many of the issues listed in those were revealed by OSMCha, originally written by Wille Marcel. In this diary entry Manohar from Mapbox goes into more detail about OSMCha and what they have found from using it.

    Programming

    • OpenStreetMap Carto was released in version 3.0.0. It uses Mapnik 3 for the first time and is a crucial step towards HStores on our main page.

    Releases

    Mapbox releases a new version of their Android SDK, which comes with enhancements like clustering, integrated support for Mapbox Android services, and much more.

    Software Version Release date Comment
    Cruiser for Android * 1.4.16 2016-12-20 Various improvements.
    Cruiser for Desktop * 1.2.16 2016-12-20 No info.
    Mapbox GL JS v0.29.0 2016-12-20 Many changes and bugfixes. Please read change log.
    Mapillary Android * 3.11 2016-12-20 Google login & signup is now working.
    OpenLayers 3.20.1 2016-12-20 Two bugfixes.
    GeoServer 2.10.1 2016-12-21 Four extensions and nine fixes.
    Komoot Android * var 2016-12-21 Minor enhancements.
    Locus Map Free * 3.21.0 2016-12-21 Many changes and bugfixes. Please read change log.
    Maps.me Android * var 2016-12-21 Bug fixes and new map data.
    OSRM Backend 5.5.2 2016-12-21 One bugfix.
    iD 2.0.2 2016-12-22 Three new features and six bugfixes.
    OpenStreetMap Carto Style 3.0.1 2016-12-22 Fix of version 3.0.0 released the same day.
    Magic Earth * 7.1.16.51 2016-12-23 Optimized search, new font system and more.
    GEOS 3.6.1 2016-12-24 Four bugfixes.

    Provided by the OSM Software Watchlist.

    (*) unfree software. See: freesoftware.

    Did you know …

    • … the page mapcontrib? It has been updated with new features: the ability to create heat maps, new tag and preset handling, and a theme creator that lets you translate all the theme information into English, French and Italian (for now).
    • … the beginner’s guide to use JOSM by Ramya Ragupathy?

    Other “geo” things

    • Geomatics Canada presents “The five most interesting things that happened in geomatics in 2016”.
    • Uber explains in great detail how to deal with geodata and describes the different tools they use for this.
    • National Geographic presents 15 maps of 2016 that show “cartography at its best”.
    • National Geographic released a video of NASA with some impressive shots of Earth from space, with the title “A Year on Earth as Seen From Space”.

    Upcoming Events

    Where What When Country
    Dusseldorf Stammtisch 12/30/2016 Germany
    Münster Stammtisch / OSM Treffen 01/05/2017 Germany
    Dortmund Stammtisch 01/08/2017 Germany
    Manila 【MapAm❤re】OSM Workshop Series 6/8, San Juan 01/09/2017 Philippines
    Lyon Rencontre mensuelle mappeurs 01/10/2017 France
    Nantes Rencontres mensuelles 01/10/2017 France
    Berlin 103. Berlin-Brandenburg Stammtisch 01/12/2017 Germany
    Ulloa 1er encuentro comunidad OSMCo 01/13/2017-01/15/2017 Colombia
    Kyoto 【西国街道シリーズ】長岡天満宮マッピングパーティ 01/14/2017 Japan

    Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

    This weeklyOSM was produced by Laura Barroso, Peda, Rogehm, SomeoneElse, YoViajo, derFred, jinalfoflia, keithonearth.

    by weeklyteam at December 28, 2016 09:51 AM

    December 27, 2016

    Wiki Education Foundation

    Monthly Report for November 2016

    Highlights

    • The Simons Foundation renewed its support for Wiki Ed, with a $480,000 two-year grant to extend the impact of the Year of Science, awarded through the Simons Foundation Science Sandbox initiative.
    • Wiki Ed was awarded a $25,000 gift from the Broadcom Foundation, which “empowers young people to be STEM literate, critical thinkers and college and career ready by creating multiple pathways and equitable access to achieve the 21st century skills they need to succeed as engineers, scientists and innovators of the future.”
    • In our ongoing effort to improve our support for the Classroom Program, we launched “Wiki Ed Office Hours”, an hour-long session in which instructors can ask questions via live chat. The program allows the Wiki Ed team to provide instructors with more one-on-one support while allowing the program to grow, as well as helping to build a sense of community among those teaching with Wikipedia. We’re continuing to evaluate this new program and will be offering it again in the spring term.
    • The Dashboard got a new often-requested feature, the Diff Viewer, which makes it easy to view the details of edits students have made on Wikipedia.

    Programs

    Educational Partnerships

    Women’s Studies instructors discuss the benefits and challenges of having students edit Wikipedia articles in the classroom.

    In early November, Educational Partnerships Manager Jami Mathewson attended the National Women’s Studies Association (NWSA) conference in Montreal, Quebec. At this conference, we celebrated entering the second year of our partnership with NWSA, through which we have brought more than 100 women’s studies courses to Wikipedia. Our partnership with NWSA has been our most productive to date, with the highest instructor and student participation in our program. This was a great opportunity to meet with our current program participants to gauge their experiences teaching with Wikipedia. The general consensus was that making information available to the world “is the feminist work” they teach their students to engage in. We’re looking forward to working with the new instructors who signed up to join the initiative.

    Jami Mathewson and Dr. Carwil Bjork-James represent Wiki Ed at the American Anthropological Association’s annual meeting.

    Outreach Manager Samantha Weald joined Jami at the American Anthropological Association’s (AAA) annual meeting in Minneapolis to recruit anthropologists into the Classroom Program. While there, Wiki Ed board member Dr. Carwil Bjork-James and Jami held a workshop to inform attendees about the power of Wikipedia as a highly accessed, open source of information for the world. This was our first time attending the AAA meeting, and we observed a lot of enthusiasm for our presence and interest in joining our programs.

    Classroom Program

    Status of the Classroom Program for Fall 2016 in numbers, as of November 30:

    • 276 Wiki Ed-supported courses were in progress (130, or 47%, were led by returning instructors)
    • 6,244 student editors were enrolled
    • 56% were up-to-date with the student training.
    • Students added 2.5 million words, edited 4,543 articles, and created 423 new entries.

    Though many of our students are still hard at work contributing to Wikipedia, we’re gearing up for the Spring 2017 term! Fall 2016 was the Classroom Program’s largest term to date with 276 courses and more than 6,200 students. In just two years, the program has almost tripled in size from 98 courses in Fall 2014. As the year comes to a close, thousands of students from across the U.S. and Canada now have the ability to identify reliable sources of information, and it’s our hope they take this skill with them both in and beyond the walls of academia. When students contribute to Wikipedia, they learn critical media literacy skills to navigate an increasingly complex media landscape.

    Earlier in the term, Classroom Program Manager Helaine Blumenthal ran two webinars as part of a strategy to provide instructors with more ways to engage with Wiki Ed and to broaden their understanding of the impact of Wikipedia-based assignments both in and outside of the classroom. Based on feedback, we learned that the Q&A sections of these programs were the most valuable to our instructors. As a result, we decided to launch “Wiki Ed Office Hours” later in the term. Twice in November, Helaine, along with Samantha and Wikipedia Content Experts Adam Hyland and Ian Ramjohn, hosted an hour-long live chat in which instructors had the chance to drop in and ask any questions they might have about their wiki assignments. While we are still evaluating the effectiveness of office hours, our preliminary impressions are quite favorable. The office hours structure helps Wiki Ed and instructors achieve several important outcomes. As the Classroom Program grows, it’s clear that we cannot visit every class either physically or virtually. Office hours allows the Wiki Ed team to have one-on-one interactions with professors while allowing the program to continue to scale. It also allows the Wiki Ed team to answer many questions at once in a short amount of time rather than spending hours answering individual emails from instructors. Additionally, the Wiki Ed team had the chance to inform instructors about the many resources available to them, like ask.wikiedu.org. Office hours also performs another critical function: it provides instructors an opportunity to interact with one another, and we saw from both sessions this term that instructors are eager to learn from their peers. Our instructors are scattered in universities around the U.S. and Canada, and office hours has the potential to foster a sense of community among participants in the Classroom Program. We’ll be offering office hours again in the spring 2017 term on a monthly basis beginning in February. We’ll continue to evaluate how best to use this form of support, but it’s a promising new feature of the Classroom Program.

    Dr. Amin Azzam teaches students at UCSF about the professional obligation to make Wikipedia’s medical information high quality.

    Samantha and Jami joined Dr. Amin Azzam’s students at the University of California, San Francisco medical school to share information about Wikipedia-editing in the medical community. Students had great questions about using their medical studies to inform the rest of the world, and we’re excited Dr. Azzam’s elective course has a higher enrollment than ever before, which students attributed to praise from their peers who took the course in previous terms.

    We saw some great work from several courses:

    Students from several courses added biographies of women scientists to Wikipedia.

    • Kate Grillo’s African Archaeology course at UW-LaCrosse added many new biographical subjects to Wikipedia, including ethnoarchaeologist Fiona Marshall and paleontologist Alison S. Brooks.
    • Megan Peiser’s Missouri Women on Wikipedia course does what it says on the tin: Her students used their access to university archives and libraries to create and improve articles on Wikipedia. Thanks to User:Mnsvyc, Wikipedia has an article on Suzanne Saueressig, the first female veterinarian in Missouri. User:JacquelynSkoch added an article on Adaline Weston Couzins, a civil war nurse and suffragist.

    From Black Lives Matter, to Bernie Sanders’s primary campaign, to the reaction to Donald Trump’s election, social movements have been in the news. Angela Miller and David Harris’s Social Movements and Social Media class has produced many timely contributions to Wikipedia.

    • A student in the class created Occupy and the First Amendment, which covers First Amendment challenges to efforts by authorities to regulate Occupy protests.
    • Other students created articles about 99Rise, a social movement opposed to “big money” in politics, Causa Justa :: Just Cause, a social justice movement in San Francisco and Oakland, and It’s On Us, a social movement with a focus on sexual assault on college campuses which was founded by Barack Obama and the White House Council on Women and Girls.
    • Students also created or expanded articles on MoveOn.org, progressivism in the United States, hacktivism, and Asian American activism.
    • Students also reached beyond social movements to related topics like gentrification of San Francisco and California Assembly Bill 540 (2001), which allowed undocumented students in California to pay in-state tuition in the state’s colleges and universities.
    • The article on Fundação Nacional do Índio, which is tasked with protecting the rights of Brazil’s indigenous population, was expanded from a short article into something much more substantial.
    • Electric Yerevan was a protest against electricity rate increases in Armenia during the summer of 2015. A student in the class reworked and expanded the article from a short, heavily tagged piece into a much more substantial, better sourced article. Dozens of other articles were created or substantially improved by students in this class.

    Students in Vanessa Freije’s Readings in Censorship class worked on articles about censorship and press freedom around the world.

    Students in Glenn Dolphin’s Introductory Geology class have been extremely active creating and expanding biographies of women geologists.

    • Gail Ashley is an American sedimentologist whose work played a major role in reconstructing the paleoenvironment and paleoclimate of the Olduvai Gorge, a key site in the study of human evolution.
    • Margaret B. Fuller Boos played a major role in the study of pegmatite in Colorado’s Front Range.
    • Alva C. Ellisor was one of the first female stratigraphers in North America; her studies of foraminifera played an important role in the petroleum industry.
    • Mary Louise Rhodes, also a stratigrapher, was another pioneering woman in petroleum geology.
    • Alice Standish Allen was the first female engineering geologist in North America. Her work on mine subsidence earned her an award from the U.S. Department of Interior. These are just a few of the many articles that the class has created or substantially expanded.

    Community Engagement

    William McKinley’s 1896 presidential campaign poster, which appeared on Wikipedia’s Main Page on the anniversary of McKinley’s election victory.

    Wikipedia Visiting Scholars continued to produce high quality work in November. George Mason University Visiting Scholar Gary Greenbaum’s Featured Article on William McKinley’s 1896 presidential campaign appeared on Wikipedia’s main page on November 3, on the 120th anniversary of his landslide victory. Barbara Page at the University of Pittsburgh made substantial improvements to a number of health topics, including nursing assessment, and Danielle Robichaud at McMaster University continued to develop Wikipedia’s entry for the Canadian Indian residential school system.

    Community Engagement Manager Ryan McGrady continued to work with prospective sponsors and scholars at different stages of the onboarding process, and worked with Jami and Samantha to develop additional outreach strategies. November saw the start of two new Visiting Scholars positions: Jackie Koerner began working with the Paul K. Longmore Institute on Disability at San Francisco State University, and User:Lingzhi started a relationship with the University of San Francisco’s Department of Rhetoric and Language.

    Program Support

    Communications

    In November, Ryan moved from a part-time to a full-time position, and with his additional hours, he is taking over responsibility for Wiki Ed’s blog, social media channels, and other communication activities. Director of Programs LiAnna Davis released an RFP to seek a media firm who can pitch stories of Wiki Ed’s successes from the fall term to news media outlets, and interviewed several firms.

    Blog posts:

    Press Releases:

    External media:

    Digital Infrastructure

    In November, Product Manager Sage Ross spread his efforts among dashboard performance improvements and maintenance, including the new Diff Viewer feature, and collaboration and mentorship.

    With the increased usage of the Dashboard this term and more active students than ever, some of the weak points of our infrastructure have been revealing themselves. Sage fixed a number of edge case bugs and performance bottlenecks in the Dashboard this month, resulting in data updates that happen more quickly, more accurately, and with more failure tolerance. The system can better handle complex page histories and deletions, as well as users who are renamed. Sage also made considerable progress with the ongoing effort to standardize our codebase, which is gradually being converted from CoffeeScript to modern JavaScript.

    The initial, so-far basic version of a long-requested Dashboard feature launched this month: the Diff Viewer. It lets you quickly see the details of edits students have made on Wikipedia, as well as the cumulative changes to an article from an entire course. It’s the start of our effort to improve the Dashboard’s ability to facilitate the review and grading of on-wiki work, which will continue into 2017.

    Another recent focus has been on fostering a bigger and more active development community around the Dashboard. Sage continued collaborating with the Wikimedia Foundation’s Community Tech team on building new Campaign features, and has been experimenting with other ways of getting new developers involved. Two of those ways are already paying off. We’ve made plans to work with an Outreachy intern, Sejal Khatri, from December through early 2017. Sejal is a computer science student from India, and she’ll be working on the Dashboard’s user profile features. Also, at the end of November, we had our first contribution — an important bug fix — through Google Code-In, a free and open source software competition for high school students. Overall, the Dashboard saw contributions in November from six different developers outside of Wiki Ed staff and contractors, the highest number yet.

    Research and Academic Engagement

    Academic Engagement

    Over the course of November, Research Fellow Zach McDowell completed eight focus groups and made plans to conduct five more in the first two weeks of December.

    Surveys have been finalized and participation continues to grow, with a participation rate of over 18.5% for the first survey even though most course activity is not yet finished. Zach has made multiple contacts and is working with three separate faculty members on potential co-authored papers on the research, and he has had a proposal accepted at one conference, with four others pending for spring 2017.

    Finally, Zach was invited to and gave a talk regarding teaching with Wikipedia and the student learning outcomes research at Hampshire College’s School for Cognitive Sciences on November 16. He was subsequently invited back to give a similar talk for the College of Natural Sciences on January 27, 2017.

    Finance & Administration / Fundraising

    Finance & Administration

    Wiki Ed's monthly financials for November 2016.

    For the month of November, expenses were $155,732 versus the approved budget of $185,675. The majority of the $30k variance is due to staffing vacancies ($14k) and savings in travel-related expenses ($20k).

    Wiki Ed's year to date financials for November 2016.

    Our year-to-date expenses of $742,436 were also less than our budgeted expenditures of $989,351, by $247k. Like the monthly variance, the year-to-date variance was largely driven by staffing vacancies ($87k) and savings in travel ($73k). In addition, deferrals in professional services ($47k) and marketing ($5k), as well as savings and deferrals in staffing-related expenses ($30k), added to the variance.

    Fundraising

    In November the Simons Foundation renewed its support for the Wiki Education Foundation with a $480,000 two-year grant to extend the impact of the Year of Science. This grant was awarded through the Simons Foundation Science Sandbox initiative. Previous support from the Simons Foundation in 2015 helped make the Year of Science the largest initiative to improve a topic area on Wikipedia to date.

    Director of Development Tom Porter, along with Wiki Education Foundation board member Lorraine Hariton, attended the Simons Foundation Science Sandbox initiative launch event in Brooklyn, New York, on November 21. Other attendees included fellow Simons Foundation grantees and representatives from other grantmaking organizations.

    Also in November, the Broadcom Foundation awarded the Wiki Education Foundation with a $25,000 gift. The Broadcom Foundation “empowers young people to be STEM literate, critical thinkers and college and career ready by creating multiple pathways and equitable access to achieve the 21st century skills they need to succeed as engineers, scientists and innovators of the future.”

    Working closely with vendors from WealthEngine and Direct Mail Center, the Wiki Education Foundation is also preparing a first-for-the-organization donor acquisition campaign, which is scheduled for mailing in early December 2016.

    Office of the ED

    Current priorities:

    • Securing funding
    • Working with the board on the future direction of the organization

    In November, Executive Director Frank Schulenburg conducted Wiki Ed’s first staff engagement survey. The purpose is to get a better picture of what motivates individual staff members and to see how we’re doing as an organization in different areas. The survey results will be used as a starting point for conversations between managers and their direct reports. Based on the outcomes, we will consider making adjustments in areas identified as needing improvement. The staff survey is an important feedback tool that helps our understanding of what motivates people to work for Wiki Ed.

    Also in November, the board created a Human Resources Committee. The committee’s purpose is to support the Board in fulfilling its oversight responsibilities related to human resource practices, including the evaluation of the ED’s performance and an annual compensation review. Furthermore, the committee will periodically review and make recommendations regarding Wiki Ed’s policies and practices related to hiring, diversity, compensation, performance evaluation, grievances and whistleblowing and succession planning.

    November also saw improvements in how staff embarks on new projects. From now on, a formal “project charter” serves as a starting point, providing project members with a written statement of the project’s scope, stakeholders, objectives, and measures of success. Although Wiki Ed’s staff have an excellent track record of executing projects successfully, on time and on budget, this measure will help us to further streamline the process of efficiently planning new initiatives.

    Visitors and guests

    • Sebastian Moleski, Wikimedia Deutschland
    • Megan Price, Human Rights Data Analysis Group
    • Stuart Gannes, entrepreneur & investor

    by Ryan McGrady at December 27, 2016 08:09 PM

    Wikimedia Foundation

    No, we’re not in a post-fact world. On Wikipedia, facts matter.

    Video by Victor Grigas, CC BY-SA 3.0. You can view it on YouTube, Vimeo, or in various languages on Wikimedia Commons. You can also help translate this video’s captions.

    Wikipedia and facts are inseparable.

    It’s what we do: find the facts and revise them as new information becomes available, updating Wikipedia almost 350 times a minute.

    Fact-based information is not always a given. In 2016, the line between truth and opinion seemed particularly blurry. Fake news spread across the internet. Statements were trusted without verification. And fact-checking happened too late for people to learn the real story.

    Wikipedia will never be perfect, or finished. But years of practice and collaboration have built an effective filter for hoaxes and inaccurate information. Article content is paired with references to show that it comes from a reliable source.

    Wikipedia editors haven’t rested since the first article was written almost 16 years ago. Since then, they’ve been constantly building and improving the treasured resource of information that we use almost every day—sharing breaking news in record time, providing source material so you can check for yourself.

    The Wikipedia #FactsMatter video features clips from interviews and events recorded throughout 2016. Included is footage from edit-a-thons in New York City and Mexico City, user-generated videos from the third annual Art+Feminism Wikipedia edit-a-thon, a series of interviews with Wikipedians involved with Wikipedia in Education, a hackathon in Jerusalem, a tech meetup in Ramallah, Wikimania 2016, and an interview with Jimmy Wales. Also included are a few screenshots from articles that were updated in 2016: the 2016 Summer Olympics, the largest known prime number, the 2016 World Series, Panama Papers, Juno (spacecraft), King Bhumibol Adulyadej of Thailand, Brexit, Rodrigo Duterte, and Donald Trump.

    We are not in a post-fact world. Facts matter, and we are committed to this now more than ever.

    Video by Victor Grigas, Storyteller
    Heather Walls, Interim Chief of Communications
    Wikimedia Foundation

    by Victor Grigas and Heather Walls at December 27, 2016 07:00 PM

    Gerard Meijssen

    #Wikimedia and the "official point of view"

    One of the pillars of #Wikipedia is its Neutral Point of View (NPOV). The point is that we should not take sides in an argument but should present the arguments from all sides and thereby remain neutral. The problem is what to do when arguments are manifestly wrong, when science repeatedly shows that there is no merit in a point of view.

    And what to do when it is even worse, when science is manipulated to show whatever benefits some? When the Wikimedia Foundation collaborated with Cochrane, it was onto something important: Cochrane is big on debunking bad science.

    The new government of the USA has a reputation that precedes its actions. It already states that science is bad, and it will state its own point of view. It will argue that this is good for all, but how will it substantiate this? In the meantime, much of what science has said so far will remain standing. The snake oil salesmen will try to sell you their product, and I wonder how it will find its way into Wikipedia. Will we look at the science, and will we resist the snake oil?
    Thanks,
          GerardM

    by Gerard Meijssen (noreply@blogger.com) at December 27, 2016 08:45 AM

    December 26, 2016

    Amir E. Aharoni

    Amir Aharoni’s Quasi-Pro Tips for Translating the Software That Powers Wikipedia

    As you probably already know, Wikipedia is a website. A website has content—the articles; and it has a user interface—the menus around the articles and the various screens that let editors edit the articles and communicate with each other.

    Another thing that you probably already know is that Wikipedia is massively multilingual, so both the content and the user interface must be translated.

    Translation of articles is a topic for another post. This post is about getting all of the user interface translated to your language, as quickly and efficiently as possible.

    The most important piece of software that powers Wikipedia and its sister projects is called MediaWiki. As of today, there are 3,335 messages to translate in MediaWiki. “Messages” in the MediaWiki jargon are strings that are shown in the user interface, and that can be translated. In addition to core MediaWiki, Wikipedia also has dozens of MediaWiki extensions installed, some of them very important—extensions for displaying citations and mathematical formulas, uploading files, receiving notifications, mobile browsing, different editing environments, etc. There are around 3,500 messages to translate in the main extensions, and over 10,000 messages to translate if you want to have all the extensions translated. There are also the Wikipedia mobile apps and additional tools for making automated edits (bots) and monitoring vandalism, with several hundred messages each.
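
    If you are curious what these messages look like under the hood: each MediaWiki component stores them in per-language JSON files (for an extension, typically i18n/en.json for the English originals plus one file per translated language), and translatewiki.net exports translators’ work back into those files. Here is a minimal sketch of such a pair; the extension name, message keys and texts are made up for illustration:

        i18n/en.json (English source messages):
        {
            "@metadata": {
                "authors": [ "Example Developer" ]
            },
            "myextension-desc": "Adds an example feature to the wiki",
            "myextension-greeting": "Hello, $1! You have $2 new notifications."
        }

        i18n/fr.json (same keys, translated texts; placeholders such as $1 stay as they are):
        {
            "@metadata": {
                "authors": [ "Example Translator" ]
            },
            "myextension-desc": "Ajoute une fonctionnalité d’exemple au wiki",
            "myextension-greeting": "Bonjour, $1 ! Vous avez $2 nouvelles notifications."
        }

    Real messages also use constructs such as {{PLURAL:$2|notification|notifications}} for correct grammar, which is one reason why translating them carefully matters.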

    Translating all of it probably sounds like an enormous job, and yes, it takes time, but it’s doable.

    In February 2011 or so—sorry, I don’t remember the exact date—I completed the translation into Hebrew of all of the messages that are needed for Wikipedia and projects related to it. All. The total, complete, no-excuses, premium Wikipedia experience, in Hebrew. Every single part of the MediaWiki software, extensions and additional tools was translated to Hebrew, and if you were a Hebrew speaker, you didn’t need to know a single English word to use it.

    I wasn’t the only one who did this of course. There were plenty of other people who did this before I joined the effort, and plenty of others who helped along the way: Rotem Dan, Ofra Hod, Yaron Shahrabani, Rotem Liss, Or Shapiro, Shani Evenshtein, Inkbug (whose real name I don’t know), and many others. But back then in 2011 it was I who made a conscious effort to get to 100%. It took me quite a few weeks, but I made it.

    Of course, the software that powers Wikipedia changes every single day. So the day after the translation statistics got to 100%, they went down to 99%, because new messages to translate were added. But there were just a few of them, and it took me a few minutes to translate them and get back to 100%.

    I’ve been doing this almost every day since then, keeping Hebrew at 100%. Sometimes it slips because I am traveling or I am ill. It slipped for quite a few months because in late 2014 I became a father, and a lot of new messages happened to be added at the same time, but Hebrew is back at 100% now. And I keep doing this.

    With the sincere hope that this will be useful for translating the software behind Wikipedia to your language, let me tell you how.

    Preparation

    First, let’s do some work to set you up.

    Priorities, part 1

    The translatewiki.net website hosts many projects to translate beyond those related to Wikipedia. It hosts such respectable Free Software projects as OpenStreetMap, Etherpad, MathJax, Blockly, and others. Also, not all MediaWiki extensions are used on Wikimedia projects; there are plenty of extensions, with many thousands of translatable messages, that are not used by Wikimedia but only on other sites, yet they use translatewiki.net as the platform for translating their user interface.

    It would be nice to translate all of them, but because I don’t have time for that, I have to prioritize.

    On my translatewiki.net user page I have a list of direct links to the translation interface of the projects that are the most important:

    • Core MediaWiki: the heart of it all
      • Extensions used by Wikimedia: the extensions on Wikipedia and related sites
    • MediaWiki Action API: the documentation of the API functions, mostly interesting to developers who build tools around Wikimedia projects
    • Wikipedia Android app
    • Wikipedia iOS app
    • Installer: MediaWiki’s installer, not used in Wikipedia because MediaWiki is already installed there, but useful for people who install their own instances of MediaWiki, in particular new developers
    • Intuition: a set of different tools, like edit counters, statistics collectors, etc.
    • Pywikibot: a library for writing bots—scripts that make useful automatic edits to MediaWiki sites.

    I usually don’t work on translating other projects unless all of the above projects are 100% translated to Hebrew. I occasionally make an exception for OpenStreetMap or Etherpad, but only if there’s little to translate there and the untranslated MediaWiki-related projects are not very important.

    Priorities, part 2

    So how can you know what is important among more than 15,000 messages from the Wikimedia universe?

    Start from the MediaWiki most important messages list. If your language is not at 100% in this list, it absolutely must be. This list is automatically created periodically by counting which 600 or so messages are actually shown most frequently to Wikipedia users. This list includes messages from MediaWiki core and a bunch of extensions, so when you’re done with it, you’ll see that the statistics for several groups improved by themselves.

    Now, if the translation of MediaWiki core to your language is not yet at 18%, get it there. Why 18%? Because that’s the threshold for exporting your language to the source code. This is essential for making it possible to use your language in your Wikipedia (or Incubator). It will be quite easy to find short and simple messages to translate (of course, you still have to do it carefully and correctly).

    Getting Things Done, One by One

    Once you have the most important MediaWiki messages at 100% and at least 18% of MediaWiki core translated to your language, where do you go next?

    I have surprising advice.

    You need to get everything to 100% eventually. There are several ways to get there. Your mileage may vary, but I’m going to suggest the way that worked for me: Complete the easiest piece that will get your language closer to 100%! For me this is an easy way to strike an item off my list and feel that I accomplished something.

    But still, there are so many components you could start with! So here’s my selection of components that are more user-visible and less technical, sorted not by importance, but by the number of messages to translate:

    • Cite: the extension that displays footnotes on Wikipedia
    • Babel: the extension that displays boxes on userpages with information about the languages that the user knows
    • Math: the extension that displays math formulas in articles
    • Thanks: the extension for sending “thank you” messages to other editors
    • Universal Language Selector: the extension that lets people select the language they need from a long list of languages (disclaimer: I am one of its developers)
      • jquery.uls: an internal component of Universal Language Selector that has to be translated separately for technical reasons
    • Wikibase Client: the part of Wikidata that appears on Wikipedia, mostly for handling interlanguage links
    • ProofreadPage: the extension that makes it easy to digitize PDF and DjVu files on Wikisource
    • Wikibase Lib: additional messages for Wikidata
    • Echo: the extension that shows notifications about messages and events (the red numbers at the top of Wikipedia)
    • WikiEditor: the toolbar for the classic wiki syntax editor
    • ContentTranslation: the extension that helps translate articles between languages (disclaimer: I am one of its developers)
    • Wikipedia Android mobile app
    • Wikipedia iOS mobile app
    • UploadWizard: the extension that helps people upload files to Wikimedia Commons comfortably
    • MobileFrontend: the extension that adapts MediaWiki to mobile phones
    • VisualEditor: the extension that allows Wikipedia articles to be edited in a WYSIWYG style
    • Flow: the extension that is starting to make talk pages more comfortable to use
    • Wikibase Repo: the extension that powers the Wikidata website
    • Translate: the extension that powers translatewiki.net itself (disclaimer: I am one of its developers)
    • MediaWiki core: the software itself!

    I put MediaWiki core last intentionally. It’s a very large message group, with over 3,000 messages. It’s hard to get it completed quickly, and to be honest, some of its features are not seen very frequently by users who aren’t site administrators or very advanced editors. By all means, do complete it, try to do it as early as possible, and get your friends to help you, but it’s also OK if it takes some time.

    Getting All Things Done

    OK, so if you translate all the items above, you’ll make Wikipedia in your language mostly usable for most readers and editors.

    But let’s go further.

    Let’s go further not just for the sake of seeing pure 100% in the statistics everywhere. There’s more.

    As I wrote above, the software changes every single day. So do the translatable messages. You need to get your language to 100% not just once; you need to keep doing it continuously.

    Once you make the effort of getting to 100%, it will be much easier to keep it there. This means translating some things that are used rarely (but used nevertheless; otherwise they’d be removed). This means investing a few more days or weeks into translating-translating-translating.

    But you’ll be able to congratulate yourself on the accomplishments along the way, and on the big accomplishment of getting everything to 100%.

    One strategy to accomplish this is translating extension by extension. This means, going to your translatewiki.net language statistics: here’s an example with Albanian, but choose your own. Click “expand” on MediaWiki, then again “expand” on “MediaWiki Extensions”, then on “Extensions used by Wikimedia” and finally, on “Extensions used by Wikimedia – Main”. Similarly to what I described above, find the smaller extensions first and translate them. Once you’re done with all the Main extensions, do all the extensions used by Wikimedia. (Going to all extensions, beyond Extensions used by Wikimedia, helps users of these extensions, but doesn’t help Wikipedia very much.) This strategy can work well if you have several people translating to your language, because it’s easy to divide work by topic.

    Another strategy is quiet and friendly competition with other languages. Open the statistics for Extensions Used by Wikimedia – Main. Find your language. Now translate as many messages as needed to pass the language above you in the list. Then translate as many messages as needed to pass the next language above you in the list. Repeat until you get to 100%.

    For example, here’s an excerpt from the statistics for today:

    MediaWiki translation stats example

    Let’s say that you are translating to Malay. You only need to translate eight messages to go up a notch (901 – 894 + 1). Then six messages more to go up another notch (894 – 888). And so on.

    Once you’re done, you will have translated over 3,400 messages, but it’s much easier to do it in small steps.

    Once you get to 100% in the main extensions, do the same with all the Extensions Used by Wikimedia. It’s over 10,000 messages, but the same strategies work.

    Good Stuff to Do Along the Way

    Never assume that the English message is perfect. Never. Do what you can to improve the English messages.

    Developers are people just like you. They may know their code very well, but they are not necessarily brilliant writers or designers. Though some messages are written by professional user experience designers, many are written by the developers themselves, and the English they write may not be perfect. Keep in mind that many, many MediaWiki developers are not native English speakers; a lot of them are from Russia, the Netherlands, India, Spain, Germany, Norway, China, France and many other countries, and English is foreign to them, so they may make mistakes.

    So report problems with the English messages to the translatewiki Support page. (Use the opportunity to help other translators who are asking questions there, if you can.)

    Another good thing is to do your best to actually run the software that you are translating. If a component has thousands of messages that are not translated to your language, chances are that it’s already deployed on Wikipedia and you can try it there. Actually using it will help you translate it better.

    Whenever relevant, fix the documentation displayed near the translation area. Strange as it may sound, it is possible that you understand the message better than the developer who wrote it!
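
    In MediaWiki, that documentation is itself stored as a pseudo-language called “qqq” (for an extension, a file such as i18n/qqq.json), so improving it helps translators of every language. A sketch, again with the illustrative message key from above:

        i18n/qqq.json (message documentation, shown next to the translation box):
        {
            "myextension-greeting": "Greeting shown at the top of the page. Parameters:\n* $1 - the user's name\n* $2 - the number of new notifications"
        }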

    Before translating a component, review the messages that were already translated. To do this, click the “All” tab at the top of the translation area. It’s useful for learning the current terminology, and you can also improve the existing translations and make them more consistent.

    After you gain some experience, create a localization guide in your language. There are very few of them, and there should be more. Here’s the localization guide for French, for example. Create your own with the title “Localisation guidelines/xyz” where “xyz” is your language code.

    As in Wikipedia, Be Bold.

    OK, So I Got to 100%, What Now?

    Well done and congratulations.

    Now check the statistics for your language every day. I can’t emphasize enough how important it is to do this every day.

    The way I do this is by keeping a list of links on my translatewiki.net user page. I click them every day, and if there’s anything new to translate, I immediately translate it. Usually there is just a small number of new messages to translate; I didn’t measure precisely, but usually it’s fewer than 20. Quite often you won’t have to translate from scratch, but only update the translation of a message that changed in English, which is usually even faster.

    But what if you suddenly see 200 new messages to translate? It happens occasionally. Maybe several times a year, when a major new feature is added or an existing feature is changed.

    Basically, handle it the same way you got to 100% before: step by step, part by part, day by day, week by week, notch by notch, and get back to 100%.

    But you can also try to anticipate it. Follow the discussions about new features, check out new extensions that appear before they are added to the Extensions Used by Wikimedia group, and consider translating them when you have a few spare minutes. In the worst case, they will never be used by Wikimedia, but they may be used by somebody else who speaks your language, and your translations will definitely feed the translation memory database that helps you and other people translate more efficiently and easily.

    Consider also translating other useful projects: OpenStreetMap, Etherpad, Blockly, Encyclopedia of Life, etc. Up to you. The same techniques apply everywhere.

    What Do I Get for Doing All This Work?

    The knowledge that, thanks to you, people who read in your language can use Wikipedia without having to learn English. Awesome, isn’t it? Some people call it “Good karma”.

    Oh, and enormous experience with software localization, which is a rather useful job skill these days.

    Is There Any Other Way in Which I Can Help?

    Yes!

    If you find this post useful, please translate it to other languages and publish it in your blog. No copyright restrictions, public domain (but it would be nice if you credit me and send me a link to your translation). Make any adaptations you need for your language. It took me years of experience to learn all of this, and it took me about four hours to write it. Translating it will take you much less than four hours, and it will help people be more efficient translators.


    Filed under: Free Software, localization, Wikipedia

    by aharoni at December 26, 2016 09:26 PM