en.planet.wikimedia

February 11, 2016

Wiki Education Foundation

Wiki Ed “Teaching with Wikipedia” workshop coming to Bryn Mawr

On Tuesday, February 16, I’ll join the Greenfield Digital Center for the History of Women’s Education and library at Bryn Mawr College to discuss how, and why, instructors build Wikipedia assignments into coursework.

Bryn Mawr is an historic women’s college, so my presentation will focus on one of the major initiatives of the Year of Science, which is closing the content gap on women’s contributions to science.

I met our host, Dr. Monica Mercado, at the National Women’s Studies Association’s conference in 2014. I’m thrilled to be working together to share our programs, and the NWSA Wikipedia initiative, with Bryn Mawr’s faculty. Bryn Mawr has graciously opened its doors to other instructors in the area. If you’re a higher education instructor in the Philadelphia area and are interested in joining us for this workshop, please register here.

  • Tuesday, February 16, 4:30–6:00 p.m., Thomas Hall, Quita Woodward Room, Library & Information Technology Services campus. For directions, contact Dr. Monica Mercado at mmercado[at]brynmawr.edu

Photo: Rhoads Hall, Bryn Mawr. By Smallbones – Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=10196052

by Jami Mathewson at February 11, 2016 05:00 PM

Wikimedia Foundation

What TPP missed: meaningful transparency

Photo by LeeChangmin, public domain.

Last week, we criticized the intellectual property chapter of the Trans-Pacific Partnership (TPP) and how its promotion of extended copyright terms harms the public domain. There we took issue with the results of the TPP process. Here we’ll focus on the process itself. Had the process been truly transparent, it might have produced better results.

The TPP negotiations were held in secret over the course of seven years, with no draft of the text being presented to the public until after the negotiations ended. Many have criticized how that secrecy is designed to exclude the voices of stakeholders until it is too late for their criticisms to actually improve the deal itself. For years, there have been calls for the negotiators to open up the process and release the draft text.

Simply dumping the TPP text on the public, though, would not make the process transparent. In fact, the public has had access to draft versions of TPP chapters during the negotiations—just through leaks rather than official channels. It was because of those leaks that we were able to take positions on TPP before it was finalized and officially released. But because the drafts were only available through confusing leaks, it was easier for the TPP negotiators to ignore the criticism.

True transparency requires that information be released in a way that facilitates understanding and action. It’s not enough that the public have access to all the information; we must be able to comprehend it. TPP is an extensive international trade agreement whose text is only meaningful to people with specialized knowledge and training (and the time to read all two million words!). The US Trade Representative (USTR) actually offered a good example of understandable transparency with TPP… but only after the text was finalized. In its explanation of TPP, the USTR provides summaries of each chapter, followed by the full chapter text. It is unfortunate that these materials came only at the end of the process. Transparent TPP negotiations would have involved making similar summaries and full text available periodically, and before the deal was finalized.

Without true transparency, there was not much anyone could do to participate in the TPP negotiation process unless they were invited to do so.[1] It is easy for negotiators to ignore letters and petitions that are based on leaked documents. It is easy to ignore voices and viewpoints by shutting them out of the room. No one involved in the TPP negotiations was advocating for the commons and the public domain. The entire population of the world stands to gain from an expanded public domain, but there was no voice in the room to advocate for it.

Lack of transparency plagues trade negotiations, particularly when they have copyright implications, and regularly leaves the public domain in the lurch. It happened with ACTA, it’s happened with TPP, and it seems to be happening again with TTIP. Transparency is not just an ideal, it’s a necessary tool for building a more democratic society.

You can join the Wikimedia public policy mailing list to discuss issues like this that matter for a strong public domain.

Chuck Roslof, Legal Counsel
Wikimedia Foundation

[1] In the US, the draft TPP text was made available to members of “trade advisory committees”. Providing information to members of those committees does not increase transparency. The government sharing information with a committee of government-selected advisors is not the same as sharing the information with the general public. And the committees in no way represent the public or the public interest. The Advisory Committee for Trade Policy and Negotiations almost entirely represents the interests of corporate America, with no members from organizations that advocate for the rights and interests of everyone.

by Charles M. Roslof at February 11, 2016 02:00 PM

Wikimedia Foundation removes The Diary of Anne Frank due to copyright law requirements

Anne Frank in 1940. Photo by Collectie Anne Frank Stichting Amsterdam, public domain.

Today, in an unfortunate example of the overreach of the United States’ current copyright law, the Wikimedia Foundation removed the Dutch-language text of The Diary of a Young Girl—more commonly known in English as the Diary of Anne Frank—from Wikisource.[1]

We took this action to comply with the United States’ Digital Millennium Copyright Act (DMCA), as we believe the diary is still under US copyright protection under the law as it is currently written. Nevertheless, our removal serves as an excellent example of why the law should be changed to prevent repeated extensions of copyright terms, an issue that has plagued our communities for years.

What prompted us to remove the diary?

The deletion was required because the Foundation is under the jurisdiction of US law and is therefore subject to the DMCA, specifically title 17, chapter 5, section 512 of the United States Code. As we noted in 2013, “The location of the servers, incorporation, and headquarters are just three of many factors that establish US jurisdiction … if infringing content is linked to or embedded in Wikimedia projects, then the Foundation may still be subject to liability for such use—either as a direct or contributory infringer.”

Based on emails sent to the Wikimedia Foundation at legal[at]wikimedia.org, we determined that the Wikimedia Foundation had either “actual knowledge” (i in the statute quoted below) or what is commonly called “red flag knowledge” (ii in the statute quoted below) that the Anne Frank text was hosted on Wikisource and was under copyright. The relevant section of the statute states that a service provider is only protected by the DMCA when it:

(i) does not have actual knowledge that the material or an activity using the material on the system or network is infringing;

(ii) in the absence of such actual knowledge, is not aware of facts or circumstances from which infringing activity is apparent; or

(The rest applies when we get a proper DMCA takedown notice.)

Of particular concern, the US’ 9th Circuit Court of Appeals stated in its ruling in UMG Recordings, Inc. v. Shelter Capital Partners LLC that in circumstances where a hosting provider (like the Wikimedia Foundation) is informed by a third party (like an unrelated user) about infringing copyrighted content, that would likely constitute either actual or red flag knowledge under the DMCA.

We believe, based on the detail and specificity contained in the emails that we received, that we had actual knowledge sufficient for the DMCA to require us to perform a takedown even in the absence of a demand letter.

How is the diary still copyrighted?

You may wonder why or how the Anne Frank text is copyrighted at all, as Anne Frank died in February 1945. With 70 years having passed since her death, the text may have passed into the public domain on January 1, 2016, in the Netherlands, where it was first published, although there is still some dispute about this.

However, in the United States, the Anne Frank original text will be under copyright until 2042. This is the result of several factors coming together, and the English-language Wikipedia has actually covered this issue with a multi-part test on its non-US copyrights content guideline.

In short, there are three major laws that together make the diary still copyrighted:

  1. In general, the U.S. copyright for works published before 1978 is 95 years from date of publication. This came about because copyrights in the U.S. were originally for 28 years, with the ability to then extend that for a second 28 years (making a total of 56). Starting with the 1976 Copyright Act and extending to several more acts, the renewal became automatic and was extended. Today, the total term of works published before 1978 is 95 years from date of publication.
  2. Foreign works of countries that are treaty partners to the United States are covered as if they were US works.
  3. Even if a country was not a treaty partner under copyright law at the time of a publication, the 1994 Uruguay Round Agreements Act (URAA) restored copyright to works that:
    • had been published in a foreign country
    • were still under copyright in that country in 1996
    • and would have had U.S. copyright but for the fact they were published abroad.

 

Court challenges to the URAA have all failed, with the most notable (Golan v. Holder) resulting in a Supreme Court ruling that upheld the URAA.

What that means for Anne Frank’s diary is unfortunately simple: no matter how it wound up in the United States and regardless of what formal copyright notices were used, the US grants it copyright until the year 2042, or 95 years after its original publication in 1947.

Under current copyright law, this remains true regardless of its copyright status anywhere else in the world and regardless of whether it may have been in the public domain in the United States in the past.

Jacob Rogers, Legal Counsel*
Wikimedia Foundation

*Special thanks to Anisha Mangalick, Legal Fellow, for her assistance in this matter.

[1] The diary text was originally located at https://nl.wikisource.org/wiki/Het_Achterhuis_(Anne_Frank).

This article was edited to clarify that it is not just the location of the Wikimedia Foundation’s servers that determine whether we fall in US jurisdiction.

by Jacob Rogers at February 11, 2016 04:57 AM

February 10, 2016

Weekly OSM

weekly 290

02/03/2016-02/08/2016

See where the roads are exciting (or boring) [1] | tortuOSMity

Mapping

  • Ben Spaulding summarises the goals he had set for mapping for the month of January 2016. Though he didn’t fully succeed in mapping for at least 15 minutes, he still got some mapping done and his experiences are interesting nonetheless.
  • In the city of Chefchaouen (the chief town of the province of the same name) in Morocco, a collaboration between young enthusiasts and the OSM community led to a remarkable result. The project is described here in detail (automatic translation).
  • User oini writes a diary on re-tagging quadrant routes in Pennsylvania, USA. She also publishes another diary where she provides a breakdown of the retagged routes.
  • Chris Hill (user chillly) from Kingston makes some critical remarks in his blog about the procedure used for mapping the schools at the UK 2016 Q1 Mapping Marathon.

Community

  • Belgian Mapper of the Month: ponci4520.
  • Digitalcourage suggests finding one’s way (automatic translation) with OpenStreetMap rather than Google Maps, for better privacy.
  • Hetzner donated a more powerful server to JOSM developers. The move from the old to the new server will happen soon.
  • Geraldine Sarmiento, a cartographer at Mapzen, shows in a blog entry how different line styles make different maps.
  • Matt Amos writes on the Mapzen blog about attempts to derive the relevance of railway stations from route and stop area relations.
  • User marczoutendijk publishes quite interesting statistics about the Dutch mappers.
  • The UNESCO Wikimedian in Residence is importing UNESCO data (e.g. world heritage sites) into Wikidata. He would like to create a link between Wikidata and OpenStreetMap for each of the physical place inscription programmes. He has an idea about how it can be done and is asking for feedback and helpers; more information can be found here.
  • The website of OpenStreetMap Mali has gone live recently.

Imports

  • Metrolinx, “an agency of Government of Ontario to deliver mobility solutions”, tried to import address data from StatCan. The import has been reverted due to a lack of discussion on the imports mailing list. During the discussion about the import attempt and its revert, Paul Norman points out that the license of the StatCan data is incompatible.

Events

  • Joost Schouppe informs about a mapping party on 27 February in Brussels. The mapping party focuses on infrastructure for homeless people.
  • Ben Abelhausen reports on the Talk-be mailing list about the progress of State of the Map organization (Brussels, 23–25 September). Among other things, they are currently looking for a good location for the social event and a company for recording/streaming the talks.
  • France will arrange a SOTM-FR from 20th to 22nd May.
  • A conference and workshop at the Universidad Autónoma del Estado de México, Campus Toluca.

Humanitarian OSM

  • Managua became Central America’s first capital with a complete transportation map. weeklyOSM reported earlier and says “congrats” Felix!
  • HOT publishes a bi-monthly newsletter.
  • Pratik published HOT-Task-Map, a map generator that visualizes the footprint of all HOT tasks up to January 5, 2016, with a total coverage of about 7,449,759 sq kms.
  • Missing Maps/MSF UK are looking for volunteers assisting with the development of a micro-mapping app.

Maps

  • Free South-East Asia OpenStreetMap for Garmin.
  • Mapzen enhanced the output of their route guidance, which also improves the routing on the main site.
  • [1] Martin Raifer published tortuOSMity, a project that displays the curviness of roads on a map.
  • Mapper utack had problems running available OSM-based maps on his old Garmin nüvi 205t. So he created his own and wrote a short, crisp blog post about it.
  • OpenMeatMap? The idea was born by the journalist Miguel Graziano and realized by mapache or Mapa Ché 😉. #ElMapaDelAsado shows the prices of meat collected by the crowd. Miguel Graziano said to weeklyOSM: “The mapa was born with the intention of responding to higher prices and the loss of purchasing power of wages.” … the whole statement from Miguel Graziano. Istyjek from mapache: “We are currently going through an inflation process here in Argentina, and meat price is getting higher and higher. This tool helps people find cheap and good meat price near house. It’s kind of that simple.”

Open Data

  • Willem reports about a presentation he attended, where they spoke about Open Data and OpenStreetMap.
  • Ottawa publishes approved sledding hills as Open Data on an OpenStreetMap map. (via @rps333)
  • User Geow points out that public data in South Tyrol was released under the CC0 license and is reachable via the platform Open Data Tyrol.

Licences

  • An interesting article asks “How to use the OpenStreetMap correctly?” (automatic translation). It discusses different situations where OSM data or maps can be used and how they should be correctly referenced.

Software

  • CycleStreets published a new app with a wide range of features dedicated to cyclists. The app covers North Staffordshire (including the urban areas of Stoke-on-Trent and Newcastle-under-Lyme).
  • User ruthmaben writes about a tool called POI Finder that she helped to develop. The tool uses the Overpass API to show all amenities in a given area or between two or more points. The result is a map which shows POIs all around the world. A serious drawback: it works only for nodes, not for POIs which are mapped as closed ways, like buildings (see the query sketch after this list). The source code is available for download on GitHub.
  • The map style Lyrk, which e.g. is used on the Accomodation Map, got published as free software. (via @DailyLyrk)
  • Andrew Byrd reports about their progress with the exchange format VEX, which attempts to implement various advantages over PBF.
  • Osiris is a new open source server for Indoor Maps based on OSM.
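
For readers curious what such an Overpass lookup looks like under the hood, here is a minimal sketch in Python. The endpoint, bounding box, and query below are illustrative assumptions, not necessarily how POI Finder itself is implemented.

import requests

# Minimal Overpass API query for amenity nodes in a bounding box
# (south, west, north, east). It only asks for nodes, which mirrors the
# limitation mentioned above: amenities mapped as closed ways are missed.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"

query = """
[out:json][timeout:25];
node["amenity"](50.73,7.08,50.75,7.12);
out body;
"""

response = requests.get(OVERPASS_URL, params={"data": query})
response.raise_for_status()

for element in response.json().get("elements", []):
    tags = element.get("tags", {})
    print(element["id"], tags.get("amenity"), tags.get("name", "(unnamed)"))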

Releases

Did you know …

  • … the CERN app for Android and iOS made by Adrian Alan Pol, Andrea Giardini and Maciek Muszkowski? The app is absolutely useless for the “common man”, but excellent for staff and visitors at CERN. Humorous (at least the Android app), it contains all the necessary information, including an OSM map with exemplary attribution. The source code is available on GitHub.
  • … the Five favorite maps from Stamen’s founder and creative director Eric Rodenbeck.
  • … a French site that offers data sets for OSM data (per “category”) in France.
  • … Sanjay Bhangar’s maps-gl-srt project that allows the display of a video on a map? (via @arunasank)
  • … openstreetmap.cz, a website which is more than just a map? (Czech)
  • … the site TagFinder that features a semantic search for OpenStreetMap tags and terms?

Other “geo” things

  • Christoph explains how the different satellites observe the different wavelengths of light.
  • Norviz is a new company projecting graphics on physical landscape models.
  • The guys from MangoMap have published a video tutorial as a beginners’ guide to QGIS.
  • Visualisation of the world’s refugee crisis
  • Read this to see how Google Map Maker is used as a vector for making fake sites appear legitimate; obviously OSM can suffer from exactly the same problem.

Upcoming Events

Where What When Country
Austin Mapping Party 02/10/2016 united states
Cebu Bogo OSM Workshop, Bogo City 02/10/2016-02/12/2016 philippines
Wien 54. Wiener Stammtisch 02/11/2016 austria
Paris Mapathon Missing Maps 02/11/2016 france
Budapest OpenStreetMap Budapest Meetup 02/15/2016 hungary
Riohacha Mapathon por la Guajira 02/16/2016-02/26/2016 colombia
Derby Derby 02/16/2016 england
Seattle Missing Maps Mapathon 02/20/2016 us
Taipei OpenStreetMap Taipei Meetup 02/22/2016 taiwan
Graz Stammtisch 02/22/2016 austria
Urspring Stammtisch Ulmer Alb 02/23/2016 germany
Colorado Humanitarian Mapathon @ Colorado State University, Fort Collins 02/24/2016 us
Toluca Primeras Jornadas de Mapas Libres 02/26/2016-02/27/2016 mexico
Karlsruhe Hack Weekend 02/27/2016-02/28/2016 germany

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please don’t forget to mention the city and the country in the calendar.

 

This weekly was produced by Katja Ulbert, Nakaner, Peda, Rogehm, TheFive, Ziltoidium, bogzab, derFred, elpbatista, escada, jinalfoflia, malenki, mgehling, stephan75, wambacher, widedangel.

by weeklyteam at February 10, 2016 11:15 PM

Wiki Education Foundation

Wiki Ed hosting President’s Day workshop in Philadelphia

On Monday, February 15, I’ll visit Temple University for a teaching with Wikipedia workshop, and help spread Wiki Ed’s Year of Science initiative.

When students write or expand Wikipedia articles, they practice media literacy, fact-based writing, research, collaboration, and critical thinking skills. I’ll highlight the best practices for teaching through Wikipedia, and how we support participants.

These workshops are a great opportunity for face time with instructors, and they help us advocate for bridging higher education and Wikipedia. Many of our instructors tell us that the visible impact they can have on a global audience is inspiring. University students have access to a wealth of library resources. Combined with the expertise of their instructors, they bring a lot to the table.

Institutions can play a similar role. Sponsoring a Visiting Scholar connects an experienced Wikipedian with library resources. We’ll speak with Temple University librarians about that opportunity to improve Wikipedia.

Thank you to Steven Bell of the Temple University libraries for hosting our workshop! Please join us:

  • February 15, 3:00 – 5:00 p.m., Paley Library Lecture Hall

Photo: Benjamin Franklin Bridge at night. By Jeffrey Phillips Freeman (Debeo Morium) – Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=10680495

by Jami Mathewson at February 10, 2016 06:24 PM

Greg Sabino Mullane

Bonked By Basic_auth Because Bcrypt

tl;dr - don't use a high bcrypt cost with HTTP basic auth!

Recently we had a client approach us with reports of a "slow" wiki experience. This was for a MediaWiki we recently installed for them; there were no fancy extensions, and the hardware, the OS, and the Apache web server were solid, perfectly normal choices. I was tasked to dive in and solve this issue.

The first step in any troubleshooting is to verify and duplicate the problem. While the wiki did feel a bit sluggish, it was not as bad as the reports we were getting of taking over 15 seconds to view a page. A side-by-side comparison with a similar wiki seemed a good place to start. I called up the main wiki page on both the client wiki and End Point's internal wiki. Both were running the latest version of MediaWiki, had the same type of servers (located a similar distance from me), were using the same version of Apache, and had roughly the same server load. While both wikis' pages had roughly the same amount of content, the client one loaded noticeably slower. It took less than a second for the End Point wiki, and around ten seconds for the client one!

The first culprit was MediaWiki itself. Perhaps something was misconfigured there, or some extension was slowing everything down? MediaWiki has good debugging tools. Inside both wikis' LocalSettings.php files, I turned on debugging temporarily with:

$wgDebugLogFile         = '/tmp/mediawiki.debug';
$wgDebugDBTransactions  = true;
$wgDebugDumpSql         = true;
$wgDebugTimestamps      = true;

I reloaded the page, then commented out the $wgDebugLogFile line to stop the log from growing large (the debug output can be quite verbose!). Here are some snippets from the generated log file:

0.9151   4.2M  Start request GET /wiki/Main_Page
...
[caches] main: SqlBagOStuff, message: SqlBagOStuff, parser: SqlBagOStuff
[caches] LocalisationCache: using store LCStoreDB
0.9266   9.2M  Implicit transaction open enabled.
0.9279   9.2M  Query wikidb (1) (slave): SET /* DatabasePostgres::open  */ client_encoding='UTF8'
0.9282   9.2M  Resource id #127: Transaction state changed from IDLE -> ACTIVE
0.9268   9.2M  Query wikidb (2) (slave): SET /* DatabasePostgres::open  */ datestyle = 'ISO, YMD'
...
0.9587   9.2M  Query wikidb (11) (slave): SELECT /* LCStoreDB::get  */  lc_value  FROM "l10n_cache"   WHERE lc_lang = 'en' AND lc_key = 'deps'  LIMIT 1
0.9573   9.5M  Query wikidb (12) (slave): SELECT /* LCStoreDB::get  */  lc_value  FROM "l10n_cache"   WHERE lc_lang = 'en' AND lc_key = 'list'  LIMIT 1
0.9567  10.8M  Query wikidb (13) (slave): SELECT /* LCStoreDB::get  */  lc_value  FROM "l10n_cache"   WHERE lc_lang = 'en' AND lc_key = 'preload'  LIMIT 1
0.9572  10.8M  Query wikidb (14) (slave): SELECT /* LCStoreDB::get  */  lc_value  FROM "l10n_cache"   WHERE lc_lang = 'en' AND lc_key = 'preload'  LIMIT 1
...
0.9875  21.2M  Query wikidb (195) (slave): SELECT /* LCStoreDB::get Greg */  lc_value  FROM "l10n_cache"   WHERE lc_lang = 'en' AND lc_key = 'messages:accesskey-pt-mycontris'  LIMIT 1
0.9873  21.2M  Query wikidb (196) (slave): SELECT /* LCStoreDB::get Greg */  lc_value  FROM "l10n_cache"   WHERE lc_lang = 'en' AND lc_key = 'messages:tooltip-pt-logout'  LIMIT 1
0.9868  21.2M  Query wikidb (197) (slave): SELECT /* LCStoreDB::get Greg */  lc_value  FROM "l10n_cache"   WHERE lc_lang = 'en' AND lc_key = 'messages:accesskey-pt-logout'  LIMIT 1
0.9883  21.2M  Query wikidb (198) (slave): SELECT /* LCStoreDB::get Greg */  lc_value  FROM "l10n_cache"   WHERE lc_lang = 'en' AND lc_key = 'messages:vector-more-actions'  LIMIT 1

Just to load a simple page, there were 194 SELECT statements! And 137 of those were trying to look in the l10n_cache table, one row at a time. Clearly, there is lots of room for improvement there. Someday, I may even jump in and tackle that. But for now, despite being very inefficient, it is also very fast. Because of the $wgDebugTimestamps, it was easy to compute how much time both wikis spent actually creating the page and sending it back to Apache. In this case, the difference was minimal, which meant MediaWiki was not the culprit.

I then turned my attention to Apache. Perhaps it was compiled differently? Perhaps there was some obscure SSL bug slowing things down for everyone? These were unlikely, but it was worth checking the Apache logs (which were in /var/log/httpd). There are two main logs Apache uses: access and error. The latter revealed nothing at all when I loaded the main wiki page. The access logs looked fairly normal:

85.236.207.120 - greg [19/Jan/2016:12:23:21 -0500] "GET /wiki/Main_Page HTTP/1.1" 200 23558 "-" "Mozilla/5.0 Firefox/43.0"
85.236.207.120 - greg [19/Jan/2016:12:23:22 -0500] "GET /mediawiki/extensions/balloons/js/balloon.config.js HTTP/1.1" 200 4128 "https://wiki.endpoint.com/wiki/Main_Page" "Mozilla/5.0 Firefox/43.0"
...
85.236.207.120 - greg [19/Jan/2016:12:23:22 -0500] "GET /mediawiki/load.php?debug=false&lang=en&modules=mediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.sectionAnchor%7Cmediawiki.skinning.interface%7Cskins.vector.styles&only=styles&skin=vector HTTP/1.1" 200 58697 "https://wiki.endpoint.com/wiki/Main_Page" "Mozilla/5.0 Firefox/43.0"
85.236.207.120 - greg [19/Jan/2016:12:23:22 -0500] "GET /mediawiki/resources/assets/poweredby_mediawiki_88x31.png HTTP/1.1" 200 3525 "https://wiki.endpoint.com/wiki/Main_Page" "Mozilla/5.0 Firefox/43.0"

Still nothing out of the ordinary. What to do next? When all else fails, go to the system calls. It's about as close to bare metal as you can easily get on a Linux system. In this case, I decided to run strace on the Apache daemon to see exactly where the time was being spent. As expected, there were a large handful of httpd processes already spawned and waiting for a connection. While there was no way to know which one would field my requests, some shell-fu allowed me to strace them all at once:

## The -u prevents us from picking the parent httpd process, because it is owned by root!
$ strace -o greg.httpd.trace -tt -ff `pgrep -u apache httpd | xargs -n 1 echo -p | xargs`
Process 5148 attached
Process 4848 attached
Process 5656 attached
Process 4948 attached
Process 5149 attached
Process 5148 attached
Process 4858 attached
Process 5657 attached
Process 4852 attached
Process 4853 attached
^CProcess 5148 detached
Process 4848 detached
Process 5656 detached
Process 4948 detached
Process 5149 detached
Process 5148 detached
Process 4858 detached
Process 5657 detached
Process 4852 detached
Process 4853 detached

Looking at the output of one of these revealed some important clues:

$ head greg.httpd.trace.4948
13:00:28.799807 read(14, "\27\3\3\2\221\0\0\0\0\0\0\0\1\35-\332\3123(\200\302\"\251'g\256\363b5"..., 8000) = 666
13:00:28.799995 stat("/wiki/htdocs/mediawiki/load.php", {st_mode=S_IFREG|0644, st_size=1755, ...}) = 0
13:00:28.800126 open("/wiki/htpasswd.users", O_RDONLY|O_CLOEXEC) = 15
13:00:28.800176 fstat(15, {st_mode=S_IFREG|0640, st_size=947, ...}) = 0
13:00:28.800204 read(15, "alice:{SHA}jA0EAgMCMEpo4Wa3n/9gy"..., 4096) = 2802
13:00:28.800230 close(15)               = 0
13:00:29.496369 setitimer(ITIMER_PROF, {it_interval={0, 0}, it_value={60, 0}}, NULL) = 0
13:00:29.496863 rt_sigaction(SIGPROF, {0x7fc962da7ab0, [PROF], SA_RESTORER|SA_RESTART, 0x7fc970605670}, {0x7fc962da7ab0, [PROF], SA_RESTORER|SA_RESTART, 0x7fc970605670

Aha! If you look closely at those timestamps, you will notice that the time gap between the call to close() and the subsequent setitimer() is quite large at 0.69 seconds. That's a long time for Apache to be waiting around for something. The second clue is the file it just opened: "htpasswd.users". Seeing the top of the file, with the {SHA} prefix, made me realize the problem - htpasswd files now support bcrypt as an authentication method, and bcrypt is designed to be secure - and slow. Sure enough, the htpasswd file had bcrypt entries with a high cost for the people who were having the most issues with the speed. This is what the file looked like (names and values changed):

alice:{SHA}jA0EAgMCMEpo4Wa3n/9gybBBsDPa
greg:$2y$13$+lE6+EwgtzP0m8K8VQDnYMRDRMf6rNMRZsCzko07QQpskKI9xbb/y9
mallory:$2y$15$ww8Q4HMI1Md51kul2Hiz4ctetPqJ95cmspH8T81JHfqRvmg===rVgn
carol:7RnEKJWc38uEO
bob:$apr1$uKX9Z63CqPOGX4lD1R4yVZsloJyZGf+
jon:$2y$08$SUe3Z8sgEpyDWbWhUUUU5wtVTwlpEdc7QyXOg3e5WBwM4Hu35/OSo1
eve:$apr1$I/hv09PcpU0VfXhyG7ZGaMz7Vhxi1Tm

I recognized the bcrypt format right away ($2y$13$). The people who were complaining the most (e.g. mallory in the example above) about the speed of the wiki had the highest costs, while those with low costs (e.g. jon), and those using something other than bcrypt (everyone else above), were not complaining at all! The 'cost' is the number after the second dollar sign: as you can see, some of them had a cost of 15, which is much more expensive than a cost of 13, which is what my user ("greg") was using. This was a smoking gun, but one more step was needed for proof. I adjusted the cost of my password to something low using the htpasswd program:

$ htpasswd -B -C 6 /wiki/htpasswd.users greg
New password: 
Re-type new password: 
Updating password for user greg

Voila! The page loaded in a flash. I then changed the cost to 15 and suddenly the wiki was even slower than before - taking upwards of 15 seconds to load the main page of the wiki. Mystery solved. All those high-cost bcrypt requests are also not good for the server: not only do they use a lot of CPU, but they also end up keeping the Apache daemon tied up waiting for the bcrypt to finish, rather than simply finishing up quickly and going back to the main pool.

You may be asking a few questions at this point, however. Why would htpasswd offer a footgun like this? Why such a radical difference in effect for slightly different costs? Is bcrypt a good practice for a htpasswd file? Let's attempt to answer those. Before we do, we have to learn a little bit about bcrypt and passwords in general. Some of this is purposefully oversimplified, so be gentle in the comments. :)

Passwords themselves are never stored on a server (aka the machine doing the authentication). Instead, the server stores a hash of the password. This is created by what is known as a "one-way" function, that creates a unique fingerprint of your password. If this fingerprint (aka hash) is discovered, there is no direct way to see the password that created it. When you login to a site, it creates a hash of the password you give it, then compares that hash to the one it has stored. Thus, it can verify that you have given it the correct password without actually having to store the password.
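
As a concrete illustration of that flow, here is a minimal sketch using Python's third-party bcrypt package; this is just one hashing library used for demonstration, not something the htpasswd setup above depends on.

import bcrypt

# Sketch of the "store a hash, compare at login" flow described above.
# The plaintext password itself is never stored; only the hash is kept.
password = b"correct horse battery staple"

# At account creation: hash the password and store the result.
stored_hash = bcrypt.hashpw(password, bcrypt.gensalt())

# At login: hash the submitted password and compare it against the stored hash.
print(bcrypt.checkpw(b"correct horse battery staple", stored_hash))  # True
print(bcrypt.checkpw(b"wrong guess", stored_hash))                   # False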

For a long time, very simple algorithms were used to create these hashes. However, as computers became more powerful, and as the field of cryptography advanced, it became easier to "crack" these hashes and determine the password that was used to create them. This was an important problem, and one of the solutions that people came up with was the bcrypt algorithm, which makes the computation of the hash very expensive, in terms of computer speed. Furthermore, that speed is adjustable, and determined by the "cost" given at creation time. You may have noticed the -C option I used in the htpasswd example above. That number controls how many rounds the algorithm must go through. However, the cost given leads to 2^cost rounds, which means that the cost is exponential. In other words, a cost of 13 means that bcrypt runs 2 to the 13th power rounds, or 8,192 rounds. A cost of 14 is 2 to the 14th power, or 16,384 rounds - twice as slow as a cost of 13! A cost of 15 is 32,768 rounds, etc. Thus, one can see why even a cost of 15 would be much slower than a cost of 13.
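
To see that exponential growth in action, the same Python bcrypt package can be used to time a few different costs. The exact numbers depend entirely on your hardware, but each +1 in cost should roughly double the time.

import time
import bcrypt

# Rough timing sketch: the work factor is 2^cost rounds, so hashing time
# should roughly double with every increment of the cost parameter.
password = b"hunter2"

for cost in (10, 12, 14):
    start = time.perf_counter()
    bcrypt.hashpw(password, bcrypt.gensalt(rounds=cost))
    elapsed = time.perf_counter() - start
    print(f"cost {cost}: {elapsed:.3f} seconds")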

A web page usually returns more than just the requested HTML. There are commonly images, CSS, and javascript that must also be loaded from the webserver to fully render the page. Each of these requests must go through basic auth, and thus gets slowed down by bcrypt. This is why, even though each basic authentication at a bcrypt cost of 15 only takes a couple of seconds, the entire web page can take much longer.

What encryption options are available for the htpasswd program? The bcrypt option was introduced without much fanfare in version 2.4.4 of Apache, which was released on February 25, 2013. So, it's been around a while. The output of --help shows us that bcrypt is the only secure one, but allows for other legacy ones to be used. Also note that the range of costs for bcrypt is 4 to 31:

 -m  Force MD5 encryption of the password (default).
 -B  Force bcrypt encryption of the password (very secure).
 -C  Set the computing time used for the bcrypt algorithm
     (higher is more secure but slower, default: 5, valid: 4 to 31).
 -d  Force CRYPT encryption of the password (8 chars max, insecure).
 -s  Force SHA encryption of the password (insecure).
 -p  Do not encrypt the password (plaintext, insecure).

So should you use bcrypt for your htpasswd? Absolutely yes. Even a lower-cost bcrypt is incredibly more secure than using MD5, CRYPT, or SHA. A cost of 10 is roughly the same speed as those, but a much, much better choice. You can measure the time it takes to create or update your password via the command-line htpasswd command to get a rough idea of how much impact it will have on your website. You can use the time it takes to run the htpasswd command as a rough proxy for the total page load time. Here are some numbers I generated on my local box. Numbers represent the average of a few runs:

Bcrypt cost   htpasswd creation time        Web page load time
10            0.079 seconds                 5.68 seconds
12            0.268 seconds                 6.77 seconds
14            0.979 seconds                 10.78 seconds
16            3.684 seconds                 25.72 seconds
18            14.683 seconds                88.85 seconds
20            58.680 seconds                358.80 seconds
22            236.369 seconds               1357.82 seconds
31            186,173 seconds               Ah...no
              (51 hours and 42 minutes!!)

There are times where you really do want a higher bcrypt cost. The basic auth usage in this scenario is really the exception, and not the norm. In most cases, a password will be used to log in to something, and you will either create a persistent connection (e.g. SSH), or a cookie with a temporary token will be issued (e.g. almost every website in the world). In those cases, a few seconds' delay is quite acceptable, as it is a rare event.

So why do we even care about passwords so much, especially for something like basic auth and a htpasswd file? After all, if someone can view the contents of the htpasswd file, they can also more than likely view whatever material on the web server it was designed to protect. These days, however, it's important to view strong hashes such as bcrypt as not just protecting data, but protecting the password as well. Why? Password reuse. It's very common for people to use the same (or very similar) password on all the sites they visit. The danger is thus not that an attacker can view the file contents protected by the htpasswd file, but that an attacker can use that password on the user's email accounts, or on other sites the user may have visited and used the same password.

What bcrypt cost should you use? The general answer is to use the highest possible cost you can get away with. Take something with such a high cost that it causes discomfort to the users, then dial it back a tiny bit. Measure it out and see what your server can handle. For general bcrypt use, start with 13, but don't be afraid to keep going up until it takes a wall clock second or two to run. For basic auth, use something very fast: perhaps 9 or less. Anything that takes over a second to create via htpasswd will slow a site down noticeably!

by Greg Sabino Mullane (noreply@blogger.com) at February 10, 2016 05:57 PM

Luis Villa

Reinventing FOSS user experiences: a bibliography

There is a small genre of posts around re-inventing the interfaces of popular open source software; I thought I’d collect some of them for future reference:

Recent:

Older:

The first two (Drupal, WordPress) are particularly strong examples of the genre because they directly grapple with the difficulty of change for open source projects. I’m sure that early Firefox and VE discussions also did that, but I can’t find them easily – pointers welcome.

Other suggestions welcome in comments.

by Luis Villa at February 10, 2016 04:13 PM

Wikimedia Foundation

Super Bowl searches show Wikipedia is the ‘second screen’

Peyton Manning was one of the most-searched articles on the English-language Wikipedia during and after the Super Bowl. Photo by the US Air National Guard, public domain.

On Sunday evening, the most valuable player of the Super Bowl—the US’ biggest sports event—was announced: Von Miller.

Who?

The Denver Broncos linebacker was not a household name, unlike the two quarterbacks in the game, Peyton Manning and Cam Newton. So out came viewers’ mobile phones, and up came his Wikipedia article.

In the minute after Miller was announced as MVP, his article received 41,000 clicks, or 683 a second. The position Wikipedia played in the Super Bowl was clear: second screen.

Anticipation for the game, which is the NFL’s final contest each year, was high. Before, during, and after the game, the Wikipedia article on Super Bowl 50 received tens of thousands of hits, with some hours reaching 50,000. Over one million hits were registered in the 48 hours around the game,[1] adding even more pageviews to what was already the eighth-most popular article at the end of January; the article on the Super Bowl as an institution added over 500,000 more.

We obtained this data with the help of the Wikimedia Foundation’s Analytics team, and mined it to match up viewcount spikes with events during the game.
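
If you would like to poke at similar numbers yourself, the public Wikimedia Pageviews REST API exposes per-article counts, though only at daily granularity; the minute-by-minute figures used in this post came from internal Analytics data. A small illustrative sketch in Python (the article title, dates, and User-Agent are only examples):

import requests

# Daily pageview counts for an article around the game, via the public
# Pageviews REST API. Per-minute data is not available through this API.
URL = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
       "en.wikipedia/all-access/user/{article}/daily/{start}/{end}")

resp = requests.get(
    URL.format(article="Super_Bowl_50", start="20160206", end="20160209"),
    headers={"User-Agent": "pageviews-example/0.1"},
)
resp.raise_for_status()

for item in resp.json()["items"]:
    print(item["timestamp"], item["views"])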

Players

Players’ viewcounts were erratic and tended to match the flow of the game. You can click on these images for larger views; the times correspond with UTC, where 22:00 is 2pm PST and 5pm EST. Graph by Joe Sutherland, public domain.

Of the players we examined, the largest jump in pageviews for a player’s Wikipedia article came in the minute after Von Miller was named as the game’s MVP: 40,849 hits were recorded between 7:40 and 7:41 PST (11:40–41pm EST).

Overall, Miller’s game as viewed through the lens of Wikipedia views was fascinating. Before the game, Miller rarely crested above 50 views per minute. He had one spike twelve minutes before the game, although we don’t know why—readers, please let us know if you have an explanation. His views soon settled back into a two-digit pattern …

… until he strip-sacked Newton in the game’s first quarter, knocking the ball out of his hands and into the arms of a waiting teammate for a Broncos score—that’s the spike you’ll see in the graph above between 0:00 and 00:15 UTC. At the time, the Panthers offense had gained a grand total of -6 yards.

Interest in Miller remained high after that, spiking several times—such as when he split a sack with teammate DeMarcus Ware at the end of the third quarter. Sustained interest took hold after Miller’s second strip-sack of Newton near the end of the game, jumped as noted when he won the MVP, and had a small bump long after the game when he appeared on The Late Show with Stephen Colbert live via satellite.

Miller finished the night with five solo tackles, two forced fumbles, and two and a half of his team’s seven quarterback sacks, playing a key role in putting pressure on Cam Newton on 21 out of 48 pass plays.

Denver’s quarterback Peyton Manning came close to matching Miller’s high-water mark at and after the end of the game (7:16–29pm PST), including one minute with 38,238 hits. He did, however, blow Miller out of the water in total views during the 7pm PST hour—Manning’s article was viewed 226,099 times, driven by those 13 minutes of five-digit attention.

We also looked at the matchup between the two starring quarterbacks, where Manning clearly beat out Carolina’s Cam Newton for views—Newton won the head-to-head matchup in only two hours and only by a total of 15,000 views. Manning cleaned up during the rest of it, besting Newton by 19,000 views in one hour and a whopping 169,000 in the next, which corresponded with the end of the game.

Newton had several early game view spikes, including after his first-quarter fumble and after he was sacked just before halftime.

All that said, the longer view is more nuanced.[1] Newton beat Manning in the pre-game 24 hours by about 46,000 views, but in the endgame/post-game thrill of victory, Manning won the succeeding 24 hours by over 904,000 views.

Halftime

Thousands looked up the half-time performers as they took the stage in Santa Clara. Graph by Joe Sutherland, public domain.

The halftime performers received a good deal of attention at, unsurprisingly, halftime. Traditionally, featured artists have received quite a bump in sales from the tens to hundreds of millions of people tuning in.

Coldplay and Bruno Mars spiked at 38,149 and 32,029, respectively, in single minutes when they were singing, but no one this year approached a record number of views for a Super Bowl halftime show. Data from 2013 shows that halftime performers Madonna and the Who received nearly a million and 570,000 views (respectively) in the hours they performed—and it is almost certain that these numbers are understated, as at that time mobile phone views were not counted.

Up against these numbers, Coldplay managed only 417,516 for the day, much less in a single hour, where they received a maximum of 126,898 hits. Bruno Mars topped out at 94,347 in an hour.

That said, we have reason to question Beyoncé’s figures: the much-anticipated appearance of the pop icon, whom many news outlets thought ‘stole the show,’ did not result in anything close to a similar view count. Beyoncé’s article spiked at just 13,282 views and hit five digits in only one other minute. We don’t have a good answer for this discrepancy, as the difference is not made up by Wikipedia redirects or her choice of song; were people too entranced by the performance to look up her Wikipedia article, or did they simply already know who she was? Like Miller above, we’d love to hear readers’ theories.

In miscellanea, Lady Gaga’s sterling (albeit controversial, at least in the prop betting world) rendition of the US national anthem led to a ten-minute increase in interest in her from 3:29–39pm PST, including one minute of 22,663 hits. Michael Jackson, whose image appeared during halftime as part of a callback to past performances, had a small but notable seven-minute increase in traffic of his own.

Teams

Both teams’ pages also attracted thousands of hits over the course of the night. Graph by Joe Sutherland, public domain.

Denver and Carolina also battled it out on their own Wikipedia pages. With an average of 1,283 views per minute over both the Denver Broncos and Carolina Panthers Wikipedia articles, people were clearly interested in the clubs’ histories as the game went on. Carolina had the higher average, at 661 views per minute to Denver’s 621.

With halftime came a big drop in visits for both teams’ pages, presumably as viewers focused on the eclectic show. The largest spike came for Denver as Miller’s MVP award was announced; 24,580 visitors arrived at the Broncos’ article in the two minutes that followed.

We’ve put all of the minute-by-minute Wikipedia pageview data in a Google spreadsheet; play around with it and let us know what you find.

Our thanks go to the Wikimedia Foundation’s Dan Andreescu, who compiled and tabulated this data for us despite having very little advance notice. This post would not exist without his assistance.

Ed Erhart, Editorial Associate
Joe Sutherland, Communications Intern
Wikimedia Foundation

[1] Wikipedia’s day ends at 12am UTC, or 4pm PST/7pm EST. This time change came sometime in the first quarter, so the per-game day-by-day views are split between February 7 and 8. We get into more detailed data later in the post.

This post was edited after publication to correct the source of our data.

by Ed Erhart and Joe Sutherland at February 10, 2016 05:07 AM

February 09, 2016

Wiki Education Foundation

Wiki Ed joining AAAS annual meeting

I’m pleased to announce that the Wiki Education Foundation is attending the annual meeting of the American Association for the Advancement of Science (AAAS) to promote the Wikipedia Year of Science. On February 13 and 14, we’ll join the Simons Foundation to support scientists attending a series of Wikipedia edit-a-thons.

Samantha Erickson and I will join Simons Foundation staff and Wikimedia DC volunteers to encourage attendees to amplify their impact by assigning students to edit Wikipedia. When students participate in the Year of Science, they improve public access to scientific information. Students benefit from an assignment that focuses on understanding, and communicating, science knowledge— developing skills valuable across disciplines and in their post-academic lives.

We’re grateful to local Wikipedians from Wikimedia DC for helping to support instructors at the edit-a-thons, some of whom may be working with Wikipedia for the first time. The final session will focus specifically on diversity in science, which we hope will encourage instructors to explore the role of Wikipedia assignments in battling content gaps, such as the gender gap, on Wikipedia.

AAAS conference attendees can also join us for an in-depth look at building a bridge between academia and Wikipedia during our Friday workshop from 10:00–11:30 a.m. at the Cove Co-working Space. Additionally, we’ll be available during the AAAS Communicating Science seminar on Thursday. This will be a great opportunity for one-on-one discussions about how a Wikipedia assignment helps students communicate science to the general public.

Major support for the Year of Science is generously provided by Google and the Simons Foundation.

Jami Mathewson
Educational Partnerships Manager


Photo: By Carol M. Highsmith – This image is available from the United States Library of Congress’s Prints and Photographs division under the digital ID highsm.04037. Public Domain, https://commons.wikimedia.org/w/index.php?curid=12681733

by Jami Mathewson at February 09, 2016 10:14 PM

Wiki Ed Visiting Georgetown University Linguistics Department

On Friday, February 12, Samantha Erickson and I will join linguistics faculty at Georgetown University to share the Wikipedia Year of Science, and encourage instructors to teach with Wikipedia or sponsor a Visiting Scholar. Linguistics articles are highly underrepresented on Wikipedia, and the Linguistic Society of America has launched an initiative to close content gaps.

Past linguistics student editors have improved articles ranging from the Cahuilla language of Southern California, to head-directionality parameter, to Logical Form. It’s amazing what students can achieve during a classroom assignment, and we’re looking forward to sharing those accomplishments with faculty members at Georgetown later this week. All members of the GU Linguistics community are invited to attend.

To learn more about getting involved in the Year of Science, please join us:

  •  Georgetown University, 230 Poulton Hall, Friday, February 12, 1:00 – 2:00 p.m.

Special thanks to Dr. Anastasia Nylund for hosting us and helping bring better content to Wikipedia!


Photo: By Symphoney Symphoney from New York, US – Ancient Tamil Script, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=3902376

 

by Jami Mathewson at February 09, 2016 07:39 PM

“Did You Know?” highlights Visiting Scholar article on Pittsburgh folk musician

University of Pittsburgh Visiting Scholar Casey Monaghan created the article about Pittsburgh architect and musician Robert Schmertz. In November, it was promoted to Good Article status, a designation of quality achieved by less than .5% of Wikipedia entries.

Just before Christmas, it also appeared on the “Did You Know?” section of Wikipedia’s main page:

“Did you know that in 1917, future folk musician and Carnegie Institute of Technology professor emeritus Robert Schmertz was arrested while dressed in ‘a girl’s middy blousy and a small white hat’?”

Schmertz donned the outfit to poke fun at the United States Navy’s traditional attire. He was pretending to recruit outside of a movie theater as part of a fraternity initiation ritual. He was arrested for “mocking the uniform.” The charge was later dismissed.

Schmertz is best known for his folk music; his songs have been covered by Pete Seeger, Burl Ives, and other notable performers.

The image of Schmertz is one of several that Casey has uploaded from University of Pittsburgh collections. William W. Irwin and William J. Howard are just two more of the dozens of articles about Mayors of Pittsburgh that have portraits from the library’s Historic Pittsburgh collection. See this gallery on Wikimedia Commons for more, including the historic illustration of Fort Pitt, shown above.

Interested in sponsoring or becoming a Visiting Scholar? See the Visiting Scholars page on our website for more information about the program.


Photos: “Robert Schmertz, 1919” by Unknown – Staff of “The Thistle” Year Book of Tech. The Pittsburgh Gazette Times. Second section, page 7. May 11, 1919. Licensed under Public Domain via Commons. Header: “Fort Pitt, Pennsylvania, 1759” by Pittsburgh Photo Engraving Co.; unknown artist – White, Edward, and Lucas, De Witt B. 150 years of unparalleled thrift: Pittsburgh Sesqui-centennial chronicling a development from a frontier camp to a mighty city; official history and programme. 1908. p. 2. Immediate source: Historic Pittsburgh digitization. Licensed under Public Domain via Wikimedia Commons.

by Ryan McGrady at February 09, 2016 05:00 PM

Gerard Meijssen

One world is where #Wikimedia is weak

#Wikimania, the annual conference of the Wikimedia Foundation, brings people from all over the world together. It instills some needed cohesion in a community that is from around the world. It brings together hackers, editors, language buffs.

In its infinite wisdom the WMF has decided that Wikimania is to be biennial. The notion is that "local" conferences may be held in the intervening years. To be honest, I fail to understand what this will bring. It is obvious that global cooperation will suffer. The only thing I see is that it may reduce some costs.

One third of only 82 respondents pointed to the unique value of Wikimania. Given the size of the population interviewed, I doubt the reliability of the findings. I am not one to readily dismiss what the WMF has to say, but in this the WMF fails its mission.
Thanks,
     GerardM

by Gerard Meijssen (noreply@blogger.com) at February 09, 2016 07:16 AM

#Wikipedia - Eric Kandel and the relevance of his "occupation"

Mr Kandel won the Nobel prize, among many other awards. His reputation is safe, and that is why he is a good example for talking about what he is about and, as importantly, what he is not.

Wikipedia has it that he is a "neuropsychiatrist". Neuropsychiatry is a "branch of medicine that deals with mental disorders attributable to diseases of the nervous system". Given what Mr Kandel is known for, it makes more sense to call him a "neuropsychologist", because neuropsychology "studies the structure and function of the brain as they relate to specific psychological processes and behaviors."

Mr Kandel may have studied psychoanalysis but the work he is celebrated for did not deal with people, so he did not deal with mental disorders; the Wikipedia text is quite clear about this. By considering Mr Kandel as a neuropsychiatrist, it is implied that his work has a practical application to mental health while at most his work helps explain how memory works.

The work of Mr Kandel is important but it is a fallacy to call it psychiatry when it so obviously is not. It is a fallacy because it takes away from what psychiatry is about.
Thanks,
     GerardM

by Gerard Meijssen (noreply@blogger.com) at February 09, 2016 05:37 AM

February 08, 2016

Wikimedia Foundation

Looking at Wikidata and the future: Gerard Meijssen

Meijssen, seen here in 2011. Photo by Victor Grigas, freely licensed under CC BY-SA 3.0.

Gerard Meijssen is a high-volume contributor to Wikidata and frequently comments on the state of the Wikimedia movement through his personal blog and posts to Wikimedia-l, the community’s mailing list. Here, fellow community member Syed Muzammiluddin interviews him about the Wikimedia movement and his contributions to it. As always, comments and critical commentary are invited in our comments section at the bottom.

You’ve played a significant role on Wikimedia projects, including the Language committee, OmegaWiki, ICANNWiki, and more. How do you describe your contributions in your own words?

The one thing why my contributions matter is that I believe very much that the only way to be credible is by being involved and making a difference by leading by example. To me Wiktionary proved to be horribly broken and it is why I tried the OmegaWiki approach.

Given that Wikipedia is not English only, it makes sense to have other languages and at the time it was likely that no more new languages would be possible. This is why the Language Committee started. When you want to make a difference, it is not about what is wrong, it is more about how it can be improved, how to make it right.

Going by your contributions, your preferred project appears to be Wikidata. Why is that?

Wikidata has many similarities with OmegaWiki. It has the benefit of being supported by the WMF and it makes a difference because it makes data accessible in any language. To me this makes a crucial difference. Every language is eligible as long as it is recognised in ISO-639-3. Wikidata was relevant from the start because it is such a big improvement for Wikipedia.

What especially motivates you to be a zealous contributor on this project, with over two million edits?

Making a lot of edits was relatively easy because much of the data was based on information available in the categories of Wikipedia. As I blog about the work that I do and the difference it makes, I invite people to make a difference as well. When I can make a difference, everybody can make a difference.

What special areas have you contributed to on Wikidata?

I have been particularly involved in people who recently died. For them I have been adding information that connects them with others. Things like the school they went to, the club they played for and the award they received. At this time I am particularly interested in awards because it connects “the best and brightest”.

You’ve remarked that “Wikidata is a tool, it is a mechanism that allows us to make a difference. Only storing data is so little of what we can do.” Can you elaborate on your “difference” statement?

Adding data to Wikidata is important, it is one way of improving quality. There are however issues with some of the data we hold. When we have a clue where we have issues, it makes a real difference when we point people to them. It is how their effort gains value. We should concentrate on where we have the most effect.

What according to you is the most significant aspect of Wikidata which is often not recognized?

The most important aspect of Wikidata is its quality. Many people approach it on the basis of single items and are easily disappointed. There has never been research to determine relative quality for our data. The main point to do this would be to determine where we can do better. Whatever the number, people will complain because they do not understand what Wikidata stands for and the many ways you can experience quality.

What are the most interesting areas of Wikidata according to your experience?

Wikidata is too big to really appreciate everything that is going on. I love what Jane and Maarten are doing for GLAMs and art. I am really curious about what is done about genes and proteins. My interest is mostly in people, and what I find is how much the “third world” is lacking in attention. There are some things that I can do, but it is only scratching the surface. Given the overwhelming amount of data on the “global north”, it hardly registers.

What special approaches do you suggest to create awareness and a high level of enthusiasm about Wikidata?

There are lists that can be automatically updated on a Wikipedia using the Listeria bot. When you have the same list on multiple Wikipedias, for instance about an award, all the lists will be updated when Wikidata is updated.
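
To make that concrete, here is a rough sketch of the kind of Wikidata Query Service request such a list boils down to, using P166 (“award received”); the award item ID and the user-agent string are placeholders, and this illustrates the idea rather than Listeria’s actual code.

<?php
// Sketch only: list everyone with "award received" (P166) for a given award.
// Q999999 is a placeholder; substitute the item ID of a real award.
$awardItem = 'Q999999';

$sparql = <<<SPARQL
SELECT ?person ?personLabel WHERE {
  ?person wdt:P166 wd:$awardItem .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
SPARQL;

$url = 'https://query.wikidata.org/sparql?format=json&query=' . urlencode( $sparql );
$context = stream_context_create( [ 'http' => [ 'header' => "User-Agent: listeria-style-sketch/0.1 (example)\r\n" ] ] );
$result = json_decode( file_get_contents( $url, false, $context ), true );

foreach ( $result['results']['bindings'] as $row ) {
    echo $row['personLabel']['value'], "\n"; // one list row per recipient
}

A bot can then rewrite the list wikitext on every subscribed Wikipedia from a query result like this, which is why a single Wikidata edit updates all of them.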

In one of my earlier interviews, it was suggested that training and workshops are needed for Wikidata. How important do you feel training and orientation are here?

When you start from scratch and you are not fluent in English, it makes a real difference when you first get a good introduction followed by hands-on experience together. Once multiple people “get” Wikidata they can support each other, and together they will raise the relevance of their culture and language a lot.

As a sysop on the Urdu Wikipedia, I noticed that clicking “Edit links” often opens the page for creating a new Wikidata item. Users innocently create a new secondary item, and this becomes a mess. How do we deal with such situations?

The first thing is to keep calm. Things happen. You can check whether an item already exists in another language and just add the page to the existing item. Alternatively, when you find that two items exist for one subject, it is easy enough to merge them. There is a gadget that makes this easy.
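
For editors who prefer to script that first check, here is a minimal read-only sketch against the Wikidata API; the wiki ID and article title are example values, and an actual merge of two duplicates goes through the authenticated wbmergeitems module, which is roughly what the merge gadget does for you.

<?php
// Sketch only: before creating a new item, ask Wikidata whether one already has
// a sitelink for this title on another wiki (here the English Wikipedia).
$title = 'Some article title'; // example value

$url = 'https://www.wikidata.org/w/api.php?' . http_build_query( [
    'action' => 'wbgetentities',
    'sites'  => 'enwiki',
    'titles' => $title,
    'props'  => 'sitelinks',
    'format' => 'json',
] );

$data = json_decode( file_get_contents( $url ), true );

foreach ( $data['entities'] ?? [] as $id => $entity ) {
    if ( isset( $entity['missing'] ) ) {
        echo "No existing item found for that sitelink.\n";
    } else {
        echo "Item $id already exists; add the new sitelink to it instead of creating a duplicate.\n";
    }
}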

There are bots active on the Urdu Wikipedia and several other wikis that add categories to articles automatically, provided the articles and categories of the Urdu Wikipedia are connected to the English Wikipedia on Wikidata. Can you suggest additional areas where Wikidata integration can ease editorial activity?

When you add labels to items in Wikidata, you will find them in your Wikipedia once Wikidata search is enabled, as it is on the Tamil Wikipedia. Try searching, for instance, for any article that exists on the Urdu Wikipedia, and you will find it.

In moving to Wikidata, identifiers and other information in non-English Wikipedia pages get integrated along with English Wikipedia. Is the process giving more weightage to English Wikipedia or do you believe it aims for a harmony of English and non-English Wikipedias?

VIAF, the Virtual International Authority File, is a system of the international library community. By moving it to Wikidata, every language is now represented with equal relevance. So it is a very welcome move, as it removes the existing English bias and brings all the other languages in from the cold.

Do you suggest some changes in the present structure of Wikidata?

If there are two things I would suggest, the first would be to have Reasonator implemented natively for Wikidata; it would give meaning to the data that is available. The second would be to spend a lot more effort on comparing sources and, in an iterative way, ensure that the quality of Wikidata improves.

Do you believe Wikidata can check self-promotional efforts of individuals through different versions of Wikipedia?

This is a non-issue for me. The only way any person becomes more relevant is by ensuring that he or she is well connected to other items. In the process all the others become more relevant as well, and when the data provided is wrong, it becomes easier to find fault and remove any nonsense. Knowing objective facts is worthwhile in its own right. When a subject is hardly relevant, knowing the facts about it does not hurt.

How do you leverage your association with Wikidata vis-à-vis other Wikimedia projects?

I have been involved in so many projects that I do not care too much about the distinctions between projects. There is only one “sum of all knowledge” and this is not specific to any one project. With Wikidata we are able to serve the sum of all available knowledge and it provides the glue between the many projects that make up everything that is the Wikimedia Foundation.

How do you foresee the future of Wikidata?

Wikidata increasingly connects many sources. My expectation is that people will build applications that rely on whatever data is available, no matter where it is. These applications will estimate how likely it is that the information is relevant and correct. One easy example is suggesting a book and enabling you to reserve it at your own local library; alternatively they will provide you with the book from Wikisource.

Syed Muzammiluddin
Wikimedia community volunteer

by Syed Muzammiluddin at February 08, 2016 10:29 PM

Wiki Education Foundation

Wiki Education Foundation launches Wiki Playlist

Frank Schulenburg
Frank Schulenburg

Over the past 15 years, volunteers from all parts of the world have shared their knowledge with others through Wikipedia. As a result, every person on the planet who has access to the internet can freely access information about a wide variety of topics. Starting today, Wikipedia’s readers can acknowledge that gigantic effort made by tens of thousands of volunteer editors by selecting their favorite articles and sharing that list with their friends and family.

Over the last couple of months, Wiki Education Foundation, working with our incredible tech partners WINTR in Seattle, developed the Wiki Playlist, a new tool that enables readers to select their favorite Wikipedia articles and share that selection through social media. The Wiki Playlist was created in the context of Wiki Education Foundation’s Year of Science, an initiative to increase the quality of science-related articles on Wikipedia.

By making the Wiki Playlist available to millions of Wikipedia readers, our organization actively pays tribute to the longtime effort of Wikipedia contributors — including many of our program participants — to continuously increase Wikipedia’s quality by improving articles and uploading photos to the world’s largest resource of free knowledge.

Already, the Playlist has attracted a diverse set of interesting articles.

What will yours contain? Create it at http://playlist.wiki.

I would like to thank Wiki Ed staff LiAnna Davis and Sage Ross who’ve worked tirelessly on shaping this new feature and on overseeing its development. I would also like to thank the Simons Foundation for their generous grant that made Wiki Playlist come to life as part of Wiki Education Foundation’s Year of Science. And finally, I would like to applaud our tech partners WINTR in Seattle for creating a brilliant and seamless user experience for people who would like to share the joy of Wikipedia with others.

Frank Schulenburg
Executive Director of the Wiki Education Foundation

by Frank Schulenburg at February 08, 2016 06:30 PM

William Beutler

Twitter and Wikipedia: Parallel Challenges

Twitter-Wikipedia

Twitter has had an almost unprecedented run of bad press lately. Its stock is down, its executives are out, and uncertainty reigns. In recent weeks, Twitter has announced (or had leaked) plans to change the platform’s famous 140-character limit and its reverse-chronological order of messages, and the site’s most vocal users are fearing, and saying, the worst.

The more I read of it, the more I think about the bad press Wikipedia has received over the past few years, and I see some striking parallels.

To be sure, they are very different entities. Most importantly, Twitter Inc. is a publicly traded company, while the Wikimedia Foundation is a non-profit organization. But both are important platforms in the online information ecosystem, facing significant questions about not just their future but even their present. Both have much in common in their history and structure, and in the challenges they now face:

  • Wikipedia and Twitter both started out as side projects of other projects that weren’t going anywhere: Wikipedia of traditionally-edited online encyclopedia Nupedia, and Twitter of possibly-before-its-time podcast directory Odeo.
  • Both are basically monopolies in their particular corner of the information ecosystem: Wikipedia has no competitor in collating “sum of all human knowledge” into readable text; Twitter is the only public, real-time conversation network (in perhaps this alone it has bested Facebook). Both have been described as a “utility” at one time or another.
  • Both are among the most-recognized, heavily-visited destinations on the web. Google pretty much points searchers to Wikipedia by default, and recently re-upped a deal to provide Twitter results in searches. Both are top 10 global websites: according to Alexa, Wikipedia is 7th and Twitter is 10th. In the U.S., Wikipedia is currently 6th and Twitter 8th.
  • Both are open publishing platforms, inviting their readers to be contributors. Even so, the vast majority of participants (broadly defined) choose only to consume. Wikipedia’s reader base has always vastly exceeded its editors, which isn’t a huge surprise. But Twitter has been trending this way for a number of years. (See also: the Pareto principle, the Internet’s 1% rule).
  • One possible reason why both have so few active contributors is that they are both notoriously difficult to use. This is rather obviously true for Wikipedia. It is, after all, an encyclopedia, and making beneficial contributions to it requires time, knowledge and inclination (not to mention persistence and thick skin). Twitter’s 140-character simplicity belies its true complexity, as Walt Mossberg has argued recently.
  • Both are organized as democratic, non-hierarchical platforms where everyone theoretically has an equal chance to be seen and heard. But of course invisible hierarchies emerge, as certain power users self-identify through the strength of social ties or canny dexterity with the platform. Twitter at least makes follower counts public, while Wikipedia is considerably more opaque.
  • For each, active users grew dramatically (even exponentially) until hitting a peak and then declining. This happened for Wikipedia in 2007, which happened to be the same year Twitter first started gaining traction. However, that growth ran out by 2009, making for very similar-looking user growth-and-decline charts:
    Growth and decline: Wikipedia editors at left; Twitter audience at right.

  • Both allow users anonymity—or, more accurately, pseudonymity—which arguably fosters a community culture suffering from a lack of responsibility and accountability. Relatedly, both have had significant trouble with the so-called Gamergate movement, and female users of both platforms have reported serious harassment issues.
  • Fallings out among top leadership have been the norm since the beginning. At Wikipedia, co-founder Larry Sanger became disillusioned with the project, leaving Jimmy Wales free to bask in the glory of being a “digital god” as the Evening Standard actually called him last week. As Nick Bilton described in his book, Hatching Twitter, Twitter’s most contentious co-founders, Jack Dorsey and Ev Williams, were at each other’s throats almost constantly. Multiple defenestrations later, Dorsey once again leads the company as CEO.
  • Besides the personal squabbles among its founders, both have experienced very recent and very concerning internal confusion at the company / parent organization, riven with conflicts about the future of the organization, and a revolving door of high-level executives. For Twitter, this has been in the tech press almost constantly. For Wikipedia, this has been covered most extensively by only The Wikipedia Signpost and a handful of blogs, including this one.
  • The direction of each has caused immense consternation in the community of power users who are conflicted about revisions to the platform, both rumored and launched. Impending changes to Twitter’s character limit and algorithmic order of tweets can be compared to community revolts over several recent software initiatives, especially the Visual Editor debacle, which sought to fundamentally change the nature of editors’ interaction with the site. At present, Wikipedians are anxious to know if this “Knowledge Engine” project is another.
  • For both, the silver lining is that their position is secure so long as arguments are being had there: that people care about what is being said on each website. No matter what ails each one, no competitor is likely to displace them, and their core function is likely to be relevant for the foreseeable future.

Are there lessons for one or the other? I’m not so sure. One conclusion that does occur to me as a longtime Wikipedia editor, observer and fan: how fortunate is Wikipedia to be a non-profit foundation right now! Whatever complaints one may have about Jimmy Wales, and there are many valid ones, Wikipedia’s central role on the Internet today owes significantly to his decision to forsake the chance to become “an Internet billionaire” on the back of the site, as The New York Times once put it, infelicitously. Had, for example, Wales insisted on monetizing Wikipedia with advertising (something Twitter once, long ago, promised it would never do, and only recently has begun turning off ads for power users), the rest of Wikipedia’s contributors might have walked out the door along with the 2002 “Spanish fork”.

Twitter, on the other hand, was founded by startup veterans who probably never seriously considered doing anything but become Internet billionaires. (For what it’s worth, Dorsey and Williams both achieved this goal.) I come here not to criticize the ambition, but to observe that it hasn’t worked out so well for the platform. In its attempts to generate revenue to match its brand recognition, Twitter has experimented with several different strategies and business models. Unfortunately, these often ran at cross-purposes to what Twitter was good at, as observers from Ben Thompson to Twitter investor Chris Sacca have written. That it is now publicly traded makes for a worse headache, and places on it a burden of expectations that may ultimately spell its doom as an independent company.

Fortunately for Wikipedia, it has a clearer notion of what it should be. It is an encyclopedia. Its recent struggles may owe something to the fact that the Wikimedia Foundation doesn’t always seem to recognize that. Twitter may have largely succeeded at becoming “the pulse of the planet” but, for a company whose shareholders expect continuing growth, that isn’t enough.

by William Beutler at February 08, 2016 05:44 PM

Wiki Education Foundation

Happy Birthday, Charles Darwin!

Charles Darwin was born on February 12, 1809. In lieu of a cake (we’re a bit concerned about the flames from 207 candles), we’re celebrating the upcoming birthday of this pioneer in biology by sharing great student work related to evolution.

From Dr. Urs Schmidt-Ott’s course, Evolution and Development at the University of Chicago:

  • Insects that go through all four developmental stages (egg, larva, pupa, and adult) exhibit holometabolism (also called ‘complete metamorphosis’). A student doubled the size of the article, adding two paragraphs on the evolutionary context and another five discussing theories on the evolution of this developmental sequence.
  • Another student contributed a section to the article on fish fins, exploring the evolution of paired fins. That contribution adds several paragraphs, and shares some history of “Gegenbaur hypothesis” that paired fins come from gills. The section briefly discusses how that theory has evolved and changed over time.

One of the major questions Darwin tackled was sexual dimorphism in species: Why are there differences in appearance between sexes? And why were so many of them so impractical? Darwin even once admitted that the entire idea of a peacock’s feather made him sick with frustration. Too bad he couldn’t look it up on Wikipedia.

Students in Dr. Kasey Fowler-Finn’s Evolutionary Biology course at St. Louis University have added tremendous content about dimorphism and sexual selection in species. They could have saved Darwin a lot of work.

These students also shared knowledge about sexual selection and mating patterns:

  • In an article on the Red flour beetle, a student added bits on patterns of reproductive fitness and variation.
  • In the Drosophila pseudoobscura article, students added sections on polyandry and its consequences for this species of fruit fly.
  • In the Gryllus bimaculatus article, students added sections on polyandry for this species of field cricket.

Thanks to these students for contributing great content about evolution and species to Wikipedia.

It’s just one example of the kinds of work students can do through the Wikipedia Year of Science. We’re still able to support courses in current and upcoming terms that want their students to have a similar experience. We offer an entire guidebook for students writing articles about Species, another for Ecology, and one for Genes and Proteins. If you want to find out more, check out the Year of Science page or drop us an e-mail.


Photo: “Charles Darwin 01” by J. Cameron – Unknown. Licensed under Public Domain via Wikimedia Commons.

by Eryk Salvaggio at February 08, 2016 05:00 PM

Content Translation Update

February 8 CX Update: Fixed Infinite Loops, More Machine Translation Support, and Improved Suggestions

First of all, congratulations to all Content Translation users: There are now 50,000 published articles! In the Wikimedia Blog you can read more on that, along with a round-up of Content Translation’s first year.

Because of some technical issues, scheduled deployments of new features were again delayed for a few weeks. On February 4th they were finally resumed, and here are the most important updates:

  • If a user started a translation, deleted it, and then started it again, the translation interface would go into an “infinite loop” of loading, and become unusable. This is now fixed. (bug report)
  • Featured articles are now shown as suggestions only if there are no other useful suggestions to show. (bug report)
  • The link from the dashboard to the tool that shows articles that don’t exist in your language is removed, on the premise that the integrated suggestions are more useful.
  • Machine translation using Yandex is now available for Albanian, Armenian, Bashkir, Polish and Uzbek.

by aharoni at February 08, 2016 03:34 PM

Wikimedia UK

A year as Wikipedian in Residence at the National Library of Wales

This post was written by Jason Evans, Wikimedian in Residence at the National Library of Wales and first published on their website.

Hundreds of new articles created, thousands of images shared and millions of hits on Wikipedia

It’s been a year now since I began my journey into the world of Wikipedia. My brief was simple enough – get people editing, engage the community and embed an open access ethos at the National Library of Wales.

With 18 billion page views a month, it seems that Wikipedia is most people’s one-stop shop for information of any kind, and across the world top cultural institutions have been teaming up with the giant encyclopaedia in order to share their knowledge and their growing digital collections. The National Library’s goal is to provide knowledge for all, and Wikipedia is just one avenue being used to share that knowledge.

 

Making Wikipedia better

Wikipedia has not been without its critics, and its policy of inviting anyone and everyone to contribute means that some articles have certain shortcomings. To help remedy this and to better represent Wales on Wikipedia, a number of community events, or ‘Edit-a-thons’, have been organised to train new Wikipedia editors on a number of subjects from Medieval Law to the Rugby World Cup.

Over 100 people have volunteered to have a go at editing during organised events, and Wikipedia’s introduction of the new ‘Visual Editor’ has made contributing even easier.

A volunteer improving Wikipedia articles relating to WWI at a public Edit-a-thon event

Staff and members of the Library’s enthusiastic volunteer team have also been busy working on Wikipedia related projects, and with 6.5 million printed books in the Library vaults there is no shortage of information to be added.

Through the course of the year it has also become apparent that Edit-a-thons act as a gateway for community engagement. They help engage the public with the library, its collections and with Welsh heritage in a flexible, inspiring and subtle way.

 

Sharing

The Library began digitising its collections nearly 20 years ago and has now amassed hundreds of thousands of digital items representing all aspects of Wales’s cultural heritage. More recently, a major shift in policy meant that the Library no longer lays claim to copyright in digital images if copyright in the original works has expired.

This open access policy has led the library to start sharing parts of its digital collections on Flickr and social media. During the residency the library has taken the next step towards openness by sharing nearly 8,000 images with Wikipedia’s sister project Wikimedia Commons, where they are freely available to all without any restrictions.

Already, National Library of Wales images have been added to over a thousand Wikipedia articles in more than 70 languages, and since those images were added, these articles have been viewed nearly 33 million times, highlighting the incredible exposure Wikipedia can facilitate.

Statistics highlighting the impact of sharing images via Wikimedia Commons

Impact

Improving content and sharing collections are both crucial aspects of the residency but it is equally important that the benefits of activities are clearly recorded and shared with others.

Demonstrating impact certainly made it easier for the Library to extend the residency, and one of the library’s major partners, People’s Collection Wales, has taken big steps toward open access and a sustainable relationship with Wikipedia.

One of the first things I did as a Wikipedian was to delve into the world of Twitter as a way of networking and sharing news about the residency, and this has led to great exposure both for the Library and for Wikipedia in Wales. Community events and digital content shared with Wikimedia Commons have caught the eye of news agencies, magazines and bloggers alike.

Infographic highlighting advocacy work during the first year of the residency

 

What next?

Together the Library and Wikimedia UK were able to extend the residency beyond the initial 12 months and the post is now funded until August 30th 2016.

Work on improving Wikipedia content will continue in English and in Welsh, and thousands more images will be made available via Wikimedia Commons.

Images from the National Library of Wales on Wikimedia Commons. (left to right) Powis Castle 1794, ‘Boy destroying Piano’ by Philip Jones Griffiths, The siege of Jerusalem (70AD) from the medieval ‘Vaux Passional’ manuscript.

Existing partnerships will be built upon, but I also want to reach out to other Welsh cultural institutions and encourage them to get involved in any way they can.

One of the biggest challenges between now and August will be finding ways to get Wikipedia into the education sector – to encourage young people and their teachers not to ignore the enormous globe-shaped elephant in the room, but to engage with it responsibly.

Finally, all credit to the National Library, who have embraced Wikipedia. With their open-access, knowledge-for-all ethos, my residency has been supported at every turn. Steps are now being taken to ensure that the legacy of the residency will be long and fruitful, helping ensure that Wales, its people and culture are well represented on the world’s biggest ever encyclopaedia.

by Richard Nevell at February 08, 2016 12:16 PM

February 07, 2016

Wikimedia Foundation

He wrote the article on the Carolina Panthers: tales from Wikipedia’s NFL editors

Broncos_vs_49ers_preseason_game_at_Levi's_Stadium
The Carolina Panthers and Denver Broncos will play Super Bowl 50 in Levi’s Stadium, seen here. Photo by Jim Bahn, freely licensed under CC BY 2.0.

As with many topics, the English-language Wikipedia’s coverage of the National Football League is extensive. Players from stars as big as Cam Newton or Peyton Manning to names as little-known as Shamar Stephen or Gerald Christian all have articles, and their statistics are updated after every game by legions of volunteer editors.

The encyclopedia even has a dedicated project to keep track of these reams of articles: WikiProject National Football League (NFL). Of the 22,265 articles in its scope, 80 are of “featured” quality—that is, they have gone through and passed a rigorous peer-review process, making them “the best articles Wikipedia has to offer.”

We talked with several members of the project about their work on the site, starting with the pseudonymous Toa Nidhiki05, the author of the featured article on the Carolina Panthers. Toa has been a fan of the Panthers since he was eight or nine years old, starting with their run to the Super Bowl in 2003—where they lost to the New England Patriots, who were in the middle of a historically dominant period where they won three of four Super Bowls.

Toa had started editing Wikipedia with articles on Christian rock bands like MercyMe, but as a Panthers fan, he was eventually “naturally” drawn to the Wikipedia article on his favorite team. He was the first to try to get an NFL football franchise’s article to featured status, but he was able to draw from other sports for the examples he needed.

Since he finished in 2013, the team has markedly improved, having won 12 games in that year and 15 this season. Toa told us that in step with their improved performance, the article has received more attention and “vandalism”—problematic or joke edits that are usually reverted in a few minutes.

How does it feel to have your favorite team in the Super Bowl? Toa told us that “This whole season was a shock so I’m still kind of processing it.” He’ll be watching the game with his family, possibly with tacos on the side.

We also talked to WikiProject NFL members Bagumba, Rockchalk717, Dirtlawyer1, Giants2008, Crash Underride, and Dissident93. At least five of the six are Americans, but all of them take on different roles when editing Wikipedia: Crash Underride and Dissident do important but unheralded “wikignome” work, “keeping things consistent from one article to the next.” On the other side of the coin, Giants2008—as you might expect by the username—helped to construct an entire suite of New York Giants-related articles.

Bagumba and Dirtlawyer both avoid articles related to current events; Bagumba tries to avoid the “fanboy” elements inherent in San Diego Chargers articles and tries to do their gnoming in older topics, while Dirtlawyer similarly prefers to take advantage of the higher-quality references that appear months after events—and not just for the NFL. Dirtlawyer also takes on baseball, university sports (especially the Florida Gators), and Olympic swimming. Rockchalk contributes to articles on his favorite team, the Kansas City Chiefs, in his 15–20 hours a week of editing after work.

When it comes to the Super Bowl, most of these hardcore editors avoid the article because they are popular enough to get attention from mainstream editors. Giants, however, keeps up on several related articles, like the Super Bowl Most Valuable Player Award. “While it’s the kind of article that typically needs to be updated only once per year,” Giants2008 says, “it does face a lot of vandalism that has to be cleaned up. It requires a careful eye to ensure that the information remains accurate.”

These editors’ missions are no different from those of the tens of thousands of Wikipedia editors who watch over the world’s largest encyclopedia 24 hours a day, 7 days a week. They watch over the sum of all knowledge, whether it’s American football, military history, medical content, or African breads, to ensure that it is preserved for generations to come.

Ed Erhart, Editorial Associate
Wikimedia Foundation

by Ed Erhart at February 07, 2016 05:03 AM

February 06, 2016

Gerard Meijssen

#Wikipedia: R. Kerry Rowe - #Sources may stink

For whatever reason I stumbled upon the Miroslaw Romanowski Medal. It is a Canadian medal named after a Polish scientist. When I add people to a medal, I make sure that the medal is at least an award, and typically I add the person the award was named after. While searching for Mr Romanowski, I found that a Portuguese article on the award existed as well, and it was merged.

The Wikipedia article is well maintained; this is assured by the date when the official website was last retrieved: "Retrieved 22 November 2015". That source shows that the award has not been conferred after 2012. Mr R. Kerry Rowe, however, was awarded the medal in 2015, and this can be sourced to this.

For Mr Rowe there is a German article. It indicates that he won the Legett award; its source is a dead link, but hey, there is enough information to find the society that conferred it, and finally a functional source for the award.

This award is in and of itself not that relevant. When you combine the links to articles as found in the "Concept cloud" and Reasonator for Mr Rowe, it becomes interesting: combined, they show the known relations for an article, an item. It would be cool if the two could be merged. When a link in a Wikipedia article is known as a Wikidata statement, it would become interesting to make the statement that Mr Rowe gave the Rankine Lecture.
Thanks,
     GerardM

by Gerard Meijssen (noreply@blogger.com) at February 06, 2016 11:14 PM

#Wikipedia - Shows updates from #Wikidata

Seeing updates go live on a Wikipedia when updates happen on Wikidata; wow! Mr Starobinski is from Switzerland, so it is not really surprising that German-language and French-language awards have been awarded to him.

At first a Q1730045 showed up; it is the "Karl-Jaspers-Preis". It did not have a label, so I added one in French, and now it shows in black. I noticed dates, so I added them all. <grin> If there is one thing the script could do, it is to sort them by date :) This is a wonderful experience!
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at February 06, 2016 11:10 PM

#Drapetomania is in other forms alive and well

A #scientist by the name of Samuel A. Cartwright wrote a paper defining a mental disorder called drapetomania. The suggested treatment was that "they should be punished until they fall into that submissive state which was intended for them to occupy. They have only to be kept in that state, and treated like children to prevent and cure them from running away". The mental disorder described slaves who ran away. Drapetomania is no longer believed to be a mental disorder.

In the twenty-first century, Professor Jonathan Metzl wrote a book called The Protest Psychosis; it describes how schizophrenia was used to define people who had notions about their civil rights. The book describes an era when the DSM-2 was the latest and greatest way to define mental health.

It is all too easy to welcome Mr Metzl's publication as an important work, as leading psychological institutions have done. Both Mr Cartwright and the DSM-2 have already been shown to be obsolete.

What the aftermath of the DSM-2 and Mr Cartwright proves is that papers on psychiatry cannot be relied on, because current approaches to psychiatry are equally problematic. A recent publication (in Dutch) provides many arguments. One of the more relevant arguments is that many of the current studies are not reproducible and that what they describe is based on theoretical constructs that are not universally agreed upon.

The book argues that publications about subjects that are fashionable have a better chance of being published over publications that expose methodological weaknesses (Delespaul, Milo, Schalken, Boevink & Os, 2015 p31,32).

Given that there is enough literature to support this point of view, what does it mean for a Wikipedia where sources are a holy grail to be ingested without too much salt? What sources can be relied on, and why, and how do we recognise official quackery?
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at February 06, 2016 02:10 PM

February 05, 2016

Wiki Education Foundation

Looking back at the Classroom Program in 2015

Helaine Blumenthal
Helaine Blumenthal

Wiki Ed supported more students in 2015 than ever before. With improvements in our tools and resources, we’ve been able to maintain quality work from those student editors.

It was a year of rapid growth and considerable change. In one term, the courses we supported rose from 117 (spring 2015) to 162 in fall 2015 — an almost 40% increase. The number of students enrolled in those courses rose from 2,290 in spring to 3,710 in the fall — a 60% increase. Closing the gender content gap, one of our major initiatives, has also seen great success. We supported 34 women’s studies classes in 2015. In those classes, 907 students improved 696 articles and created 89 new ones.

Those numbers tell a story of incredible growth supported by more resources available to classes than we’ve ever been able to offer. These students contributed 5.15 million words to Wikipedia, improved 10,200 articles, and created 1,019 new entries. Printed and bound as a book, that would be just over 7,000 pages — 13 days of silent reading.

But there’s a quirk in that story.

When we compared Fall 2015 to Fall 2014, we saw that there were nearly 1,000 more students in Fall 2015. That’s great news. But the weird thing was that these students didn’t seem to be contributing as many words as their cohort from Fall 2014.

We scratched our heads. Are our new resources causing a reduction in student contributions? We’ve always known that growth alone does not a success make. To keep quality on pace with quantity, we launched our Dashboard in the spring. That’s helped instructors create better Wikipedia assignments and track student work. Moving from Wikipedia to our Dashboard was a major change to the Classroom Program. It has been a big improvement for instructors, students, and the community.

And yet, students contributed less content.

We wondered if the course creation tool was making it so easy for instructors to design courses, that they were designing courses where students weren’t asked to contribute as much as we know they can.

We also changed the way students take, and instructors track, the online training. As a result, many of our courses have added the training as a graded requirement. In the spring, 52% of the students we supported completed the training, and 74% completed it in the fall. We think that’s led to higher-quality contributions and fewer issues involving student work. But has it led to fewer contributions?

It would have been a mystery, but luckily, Wiki Ed brought on Kevin Schiroo, resident data detective. His first case was to examine what happened to content this term. After all, our chief focus is on improving Wikipedia in ways that tap students’ abilities, and give students the confidence to make bold contributions of their knowledge to the public sphere. We want to make sure students in these courses are challenged to contribute the quality and quantity of work we know is possible.

What Kevin told us was kind of amazing. It wasn’t that students were asked to contribute less this term. It wasn’t that we had discouraged bold editing through our online training or classroom resources.

The issue was: Their content was being challenged less often, leading to fewer whole-scale revisions of content. We always encouraged students to contribute to Wikipedia in the spirit of a peer review. That’s one of the great learning experiences the assignment carries. In the past, students made contributions, which were then questioned by other editors. In the fall of 2014, our students were more likely to revert changes without discussing the reasoning behind the change. This resulted in stressful back-and-forth reversions, and even edit wars.

For each term, Kevin made a list of all the articles students had worked on. From that list, he pulled all the revisions that were made to those articles during the term to find and remove reverted edits. With a list that was clear of any unproductive contributions, he was able to tally all of the words that were added to the page by students, knowing that anything that remained was a productive edit. Counting in this way, the difference in words added between the two terms became significantly smaller. Kevin concluded that the reverted content had been inflating the productivity of Fall 2014 compared to Fall 2015.
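
As a sketch of how such a tally could be computed, here is one way to approximate it from the public MediaWiki API. It illustrates the general idea (identity reverts spotted by repeated revision checksums) rather than Wiki Ed’s actual script, and the function names, the single-article scope, and the bytes-per-word shortcut are assumptions made for the example.

<?php
// Sketch only: count bytes added to one article by listed student accounts during
// a term, skipping edits later undone by an identity revert (a revision whose
// SHA-1 repeats an earlier one). API continuation handling is omitted for brevity.

function fetchRevisions( string $title, string $start, string $end ): array {
    $url = 'https://en.wikipedia.org/w/api.php?' . http_build_query( [
        'action'  => 'query',
        'prop'    => 'revisions',
        'titles'  => $title,
        'rvprop'  => 'user|size|sha1',
        'rvstart' => $start,   // with rvdir=newer, rvstart is the older timestamp
        'rvend'   => $end,
        'rvdir'   => 'newer',
        'rvlimit' => 'max',
        'format'  => 'json',
    ] );
    $page = current( json_decode( file_get_contents( $url ), true )['query']['pages'] );
    return $page['revisions'] ?? [];
}

function productiveBytes( array $revisions, array $students ): int {
    // Whenever a content hash repeats, everything between the two identical
    // revisions was reverted.
    $firstSeen = [];
    $reverted  = [];
    foreach ( $revisions as $i => $rev ) {
        $sha = $rev['sha1'] ?? "missing-$i";
        if ( isset( $firstSeen[$sha] ) ) {
            for ( $j = $firstSeen[$sha] + 1; $j <= $i; $j++ ) {
                $reverted[$j] = true;
            }
        } else {
            $firstSeen[$sha] = $i;
        }
    }

    $total = 0;
    $previousSize = 0; // assumes the window starts at the article's creation;
                       // otherwise seed this with the last pre-term revision size
    foreach ( $revisions as $i => $rev ) {
        $delta = $rev['size'] - $previousSize;
        $previousSize = $rev['size'];
        if ( $delta > 0 && !isset( $reverted[$i] ) && in_array( $rev['user'] ?? '', $students, true ) ) {
            $total += $delta; // a rough word count is roughly bytes divided by six
        }
    }
    return $total;
}

Summed over every article on a course’s list, numbers produced this way are comparable across terms in the manner described above.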

We heard and responded to those concerns from 2014 throughout 2015. We created a series of red flags for onboarding classes, improved our student and instructor training, and created tools to track student contributions more efficiently. We’re serious about making sure Wikipedia assignments benefit Wikipedia, as well as student learning outcomes.

This term, we’re seeing the fruits of those efforts to improve contributions. Students are getting their contributions right, and when they aren’t, they’re more likely to discuss community edits appropriately. The result is that a whopping 40% of the drop in student content contributed to Wikipedia this term is the result of students following better Wikipedia etiquette. We’ve seen a real drop in reversions and problematic edits.

The content they’ve contributed may fill fewer books than students wrote in 2014, but the books they’re writing require less revising from long-term Wikipedia editors. And those books would hold some incredible, and diverse, content: for example, a detailed description of the surface of Venus, a history of Western Canada, lots of information about women scientists, and information on Japanese theater (some using sources that had been translated from Japanese). It would also have a lot of information about bees and mass spectrometry.

In hindsight, it’s been a great year for student contributions on Wikipedia. In fact, we were surprised to see just how much the support efforts have paid off. It makes us even more confident that we can continue to grow through 2016 while maintaining excellent student contributions. We’re constantly expanding tools and resources to make sure this trend continues.

We’re still looking for courses in all subjects. Our Year of Science is especially aimed at supporting courses in STEM or other sciences. We’d love to hear from you.

Helaine Blumenthal
Classroom Program Manager

by Helaine Blumenthal at February 05, 2016 06:30 PM

Wikimedia DC hosting Wiki Ed workshop, Feb. 12

Year_of_Science_Logo.svgOn Friday, February 12, the Wiki Education Foundation will join university instructors in Washington, D.C., to talk about the Year of Science. Together, we’ll discuss how to share comprehensive science information with millions of readers by improving Wikipedia articles in the sciences. Along the way, we’ll challenge university students across the United States and Canada to improve critical thinking, media literacy, and science communications skills.

We thank Wikimedia DC members for hosting us!

Space is limited; please RSVP here. Instructors and faculty from any institution of higher education are invited to attend.

We’re grateful to Google and the Simons Foundation for providing major support for the Year of Science in 2016. Hope to see you there!

Samantha Erickson
Outreach Manager


Photo: “Aerial view of Georgetown, Washington, D.C.” by Carol M. Highsmith. This image is available from the United States Library of Congress’s Prints and Photographs division under the digital ID highsm.14570. Public domain.

by Samantha Erickson at February 05, 2016 06:20 PM

Wiki Ed attended LSA conference in January

Samantha Erickson
Samantha Erickson

In January, I traveled to Washington, D.C. to attend the Linguistic Society of America annual conference and spread the word about the Wikipedia Year of Science. This was our first conference with LSA after announcing our partnership in November.

I met close to 100 linguistics instructors over three days for lots of conversation about language learning, digital pedagogy, and the presence of linguistics content online.

I attended an edit-a-thon run by LSA member Gretchen McCulloch, and a department chairs’ roundtable meeting. The common theme across these events was that yes, students do use Wikipedia! The Classroom Program opens up the discussion of media literacy and finally asks how students use it.

Wikipedia classroom projects benefit linguists, too, by bringing accurate information to Wikipedia. Our partnership with the LSA helps ensure that information on linguistic topics online is accurate.

If you are an instructor interested in hosting a Wikipedia project or if you know of an instructor who may be a good fit, please let me know. I’m excited to support all new instructors in our Year of Science in 2016 and beyond!

We have plenty more workshops coming up:

  • Temple University, February 15, 3 – 5 p.m., Paley Library Lecture Hall, 1210 Polett Walk, Ground Floor.
  • Bryn Mawr College, February 16, 4:30 – 6 p.m. Location TBD. Registration required.
  • California State University, East Bay, February 19, 10 a.m. – 12 p.m., LI3079, Upper Mall level of the CSU East Bay Library (map). Register here. (more).
  • University of California, Davis, March 2, 10 a.m. – 12 p.m., Life Sciences building, room 1022 (map). Registration required (here), follow Wiki Ed for more details. Supported by the UC Davis Biotech program.

Samantha Erickson
Outreach Manager

by Samantha Erickson at February 05, 2016 05:00 PM

Jeroen De Dauw

Missing in PHP7: Named parameters

This is the second post in my Missing in PHP7 series. The previous one is about function references.

Readability of code is very important, and this is most certainly not readable:

getCatPictures( 10, 0, true );

You can make some guesses, and in a lot of cases you’ll be passing in named variables.

getCatPictures( $limit, $offset, !$includeNonApproved );

It’s even possible to create named variables where you would otherwise have none.

$limit = 10;
$offset = 0;
$approvedPicturesOnly = true;
getCatPictures( $limit, $offset, $approvedPicturesOnly );

This gains you naming of the arguments, at the cost of boilerplate, and worse, the cost of introducing local state. I’d hate to see such state be introduced even if the function did nothing else, and it only gets worse as the complexity and other state of the function increases.

Another way to improve the situation a little is by making the boolean flag more readable via usage of constants.

getCatPictures( 10, 0, CatPictureRepo::APPROVED_PICTURES_ONLY );

Of course, long argument lists and boolean flags are both things you want to avoid to begin with and are rarely needed when designing your functions well. It’s however not possible to avoid all argument lists. Using the cat pictures example, the limit and offset parameters can not be removed.

getApprovedCatPictures( 10, 0 );

You can create a value object, though this just moves the problem to the constructor of said value object, and unless you create weird function-specific value objects, this is only a partial move.

getApprovedCatPictures( new LimitOffset( 10, 0 ) );

A naive solution to this problem is to have a single parameter that is an associative array.

getApprovedCatPictures( [
    'limit' => 10,
    'offset' => 0
] );

The result of this is a catastrophe. You are no longer able to see from the function signature which parameters are required and supported, or what their types are. You need to look at the implementation, where you are also forced to do a lot of checks before doing the actual job of the function. So many checks that they probably deserve their own function. Yay, recursion! Furthermore, static code analysis gets thrown out of the window, making it next to impossible for tools to assist with renaming a parameter or finding its usages.

What I’d like to be able to do is naming parameters with support from the language, as you can do in Python.

getApprovedCatPictures( limit=10, offset=0 );

To me this is not a small problem. Functions with more than 3 arguments might be rare in a well designed codebase, though even with fewer arguments readability suffers greatly. And there is an exception to the “no more than 3 parameters per function” rule: constructors. Unless you are being rather extreme and following Object Calisthenics, you’ll have plenty of constructors where the lack of named parameters gets extra annoying. This is especially true for value objects, though that is a topic for another post.
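
One partial workaround in the meantime, sketched below with invented names (CatPictureQuery and everything in it is made up for this example, not taken from any real codebase), is a small value object with a named static constructor and “wither” methods: the call site names every value and static analysis keeps working.

<?php
// Sketch of a value-object workaround for the missing named parameters.
final class CatPictureQuery {

    private $limit = 10;
    private $offset = 0;
    private $approvedOnly = true;

    private function __construct() {
    }

    public static function approvedPicturesOnly(): self {
        return new self();
    }

    public function withLimit( int $limit ): self {
        $copy = clone $this;
        $copy->limit = $limit;
        return $copy;
    }

    public function withOffset( int $offset ): self {
        $copy = clone $this;
        $copy->offset = $offset;
        return $copy;
    }

    public function getLimit(): int {
        return $this->limit;
    }

    public function getOffset(): int {
        return $this->offset;
    }

    public function approvedOnly(): bool {
        return $this->approvedOnly;
    }

}

// Call site (getCatPictures itself is hypothetical):
// getCatPictures( CatPictureQuery::approvedPicturesOnly()->withLimit( 20 )->withOffset( 40 ) );

The cost is the very boilerplate that language-level named parameters would make unnecessary.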

by Jeroen at February 05, 2016 01:29 AM

February 04, 2016

User:Geni

What is the most significant subject Wikipedia doesn’t have an article on?

In absolute terms, if you go by size, it’s probably one of the voids that make up the large-scale structure of the universe. In terms of mass it is probably a galaxy filament or wall.

On a less cosmic scale, and limiting things to Earth, it’s probably the only ocean current not to have an article: the South Australian Counter Current.

For humans it’s probably going to be a high-ranking civil servant in a major country: either someone high up in the Chinese system, or someone like Kamal Pande, the most recent Cabinet Secretary of India not to have an article.

The most notable thing in Wikipedia terms, by sheer number of citations commenting on it, is probably harder to pin down. Possibly some bacteria species popular in experiments? It’s really rather hard to say.


by geniice at February 04, 2016 10:18 PM

Wiki Education Foundation

Questions and answers from Hunter College

Earlier this month, I hosted a “Teaching with Wikipedia” workshop at Hunter College alongside Chanitra Bishop. Twenty-one instructors from across greater New York learned about our Classroom Program and Year of Science.

After showing off our Dashboard, the conversation shifted to details about running the assignment. Instructors wanted to know more about the six-week timeline, and wondered if the assignment might take up too much course time.

So, we walked them through a typical syllabus. Wikipedia assignments involve many small, weekly assignments that complement regular course work. These assignments can be as simple as creating a username on Wikipedia, for example — the task for week one. Once the assignment timeline was clear, instructors relaxed about the potential for integrating it.

If you’re an instructor, or know one who might be interested in a Wikipedia assignment, let us know. I’d love to help you get started!

We have some upcoming campus visits:

  • Temple University, February 15, 3 – 5 p.m., Paley Library Lecture Hall, 1210 Polett Walk, Ground Floor.
  • Bryn Mawr College, February 16, 4:30 – 6 p.m. Location TBD. Registration required.
  • California State University, East Bay, February 19, 10 a.m. – 12 p.m., LI3079, Upper Mall level of the CSU East Bay Library (map). Register here. (more).
  • University of California, Davis, March 2, 10 a.m. – 12 p.m., Life Sciences building, room 1022 (map). Registration required (here), follow Wiki Ed for more details. Supported by the UC Davis Biotech program.

Samantha Erickson
Outreach Manager


Photo: “Samantha Erickson, Hunter College, NYC, 12 January 2016” by Aramtak, own work. Licensed under CC BY-SA 4.0 via Wikimedia Commons.

by Samantha Erickson at February 04, 2016 05:00 PM

Wikimedia Foundation

50 weird Super Bowl facts for 50 Super Bowls

Katy_Perry_-_Super_Bowl_XLIX_Halftime_02
“Left Shark” became a social media sensation last year for its offbeat and seemingly mistaken dance moves. Photo by Huntley Paton, freely licensed under CC BY-SA 2.0.

The first Super Bowl in 1967 was simulcast by two TV networks, NBC and CBS, which had to share one microphone in the postgame show. The teams used different balls because the Green Bay Packers and Kansas City Chiefs played in separate leagues and the balls were slightly different in shape. The cost of a 30-second commercial was $42,000. Due to the then-common practice of tape wiping, Super Bowl I was not seen again until 2016, when the NFL strung footage together from over two dozen sources and overlaid it with the radio broadcast.

These days, the Super Bowl is the most-watched US television broadcast each year—in fact, the NFL can say that with one way of counting, it holds the top 23 spots on the all-time list. Americans eat more on Super Bowl Sunday than they do on any other day of the year, except for Thanksgiving. And with all the hoopla come cultural zeitgeists: from multi-million dollar ads for failing startups to Left Shark’s viral popularity, the Super Bowl is a championship of pop culture.

Wikipedia chronicles them all.

The main Super Bowl article provides an overview of the National Football League championship, which started in 1966 in response to the growing popularity of the upstart American Football League. That page lists article pages for each game that note the halftime performers, cost of commercials, statistics, and quirky events.

Based in San Francisco and not far from the site of Super Bowl 50, the Wikimedia Foundation supports Wikipedia and its sister projects such as the media repository Wikimedia Commons. The foundation, fresh off celebrating Wikipedia’s 15th birthday, is paying homage to the Super Bowl’s 50th birthday with 50 fascinating factoids you are unlikely to find anywhere else.

Unlike mainstream media, Wikipedia and its sister projects are written and edited by volunteers—around 80,000 actively maintain its articles, which last year exceeded 5 million on the English-language Wikipedia alone (there are Wikipedia editions in 291 languages). Those volunteers combine and hone a crowdsourced view off the mainstream media path; there are many odd nuggets along the way. As we head into Super Bowl 50, the first one not to go by Roman numerals, take a peek at a quirky factoid for each of the games below.

Feel free to share them, show them off at your Super Bowl party, tweet them, or write about them in a blog post or article—Wikipedians will find more. Wikipedia’s Super Bowl of facts is played every day, all around the world, by all kinds of people.

 

Super_Bowl_I_Logo.svg

The first Super Bowl featured the top teams from two separate leagues—the American and National Football Leagues. They would later merge under the latter’s name. Logo by unknown, public domain.

 

  1. It is the only Super Bowl to have been simulcast. NBC and CBS both televised the game—with both wanting to win the ratings war, tensions flared and a fence was built between their trucks.
  2. Almost 80% of the country lost the video feed of the CBS broadcast late in the second quarter.
  3. Performers representing players from the teams appeared on top of a large, multi-layered, smoke topped cake.
  4. The cost of one 30-second commercial was $78,000.
  5. The two teams had a Super Bowl record 11 combined turnovers in the game.
  6. Dolphins safety Jake Scott entered the game with a broken left hand and soon broke his right wrist as well.
  7. Dolphins employees inspected the trees around the practice field every day for spies from the Redskins.
  8. The Vikings complained that their practice facilities at a Houston high school had no lockers and most of the shower heads didn’t work.
  9. Pittsburgh played for a league championship for the first time in its 42-year team history.
  10. Scenes for the film Black Sunday, about a terrorist attack on the Super Bowl, were filmed during the game.
  11. The national anthem was not sung. Vikki Carr sang “America the Beautiful.”
  12. Halftime featured the Tyler Junior College Apache Belles drill team.
  13. Cowboys linebacker Thomas “Hollywood” Henderson said opposing quarterback Terry “Bradshaw couldn’t spell cat if you spotted him the C and the A.”
  14. The Rams barely outscored their opponents, ending the season up only 323-309 overall, and finished the regular season with a 9-7 record—the worst ever by a team who advanced to the Super Bowl.
  15. The winning Oakland Raiders were suing the NFL at the time of the game over a proposed move to Los Angeles.
  16. 49.1 percent of all US television households tuned into the game, the highest-rated Super Bowl of all time.
  17. A players’ strike reduced the 1982 regular season from a 16-game schedule to 9.
  18. The broadcast aired the famous “1984” television commercial, introducing the Apple Macintosh.
  19. Ronald Reagan appeared live via satellite from the White House and tossed the coin on the same day that he was inaugurated for a second term.
  20. The Bears’ post-Super Bowl White House visit was canceled due to the Space Shuttle Challenger disaster. Members of the team were invited back in 2011.
  21. Giants players celebrated their victory with what was then a new stunt—dumping a Gatorade cooler on head coach Bill Parcells.
  22. The halftime show featured 88 grand pianos.
  23. Prior to the game, Coca-Cola distributed 3-D glasses at retailers for viewers to use to watch the halftime festivities.
  24. The halftime show featured a float so huge that one of the goal posts had to be moved so it could be put on the field.
  25. Whitney Houston performedThe Star-Spangled Banner,” and the recording reached No. 20 on the Billboard Hot 100.
  26. Bills defensive line coach Chuck Dickerson said Redskins tackle Joe Jacoby was “a Neanderthal—he slobbers a lot, he probably kicks dogs in his neighborhood.”
  27. The opening coin toss featured OJ Simpson, who was working for NBC Sports at the time; the halftime ceremony featured Michael Jackson and 3,500 children.
  28. The main stadium lights were turned off for a halftime performance by dancers with yard-long light sticks.
  29. 30-second ads exceeded the $1,000,000 mark.
  30. Some weeks before the game, it was found that some proxy servers were blocking the web site for the event because XXX is usually associated with pornography.
  31. The last in a run of 13 straight Super Bowl victories by the NFC over the AFC.
  32. Except for two penalties and quarterback kneel-downs to end each half, the Broncos did not lose yardage on any play.
  33. On the night before the Super Bowl, Falcons safety Eugene Robinson was arrested for solicitation of prostitution after receiving the league award that morning for “high moral character.”
  34. Pets.com paid millions for an advertisement featuring a sock puppet. The company would collapse before the end of the year.
  35. This was the last Super Bowl to have individual player introductions for both teams.
  36. Janet Jackson was originally scheduled to perform at halftime, but allowed U2 to perform a tribute to September 11.
  37. Referred to as the “Pirate Bowl” due to the teams involved (the Buccaneers and Raiders).
  38. Janet Jackson‘s breast was exposed by Justin Timberlake in what was later referred to as a “wardrobe malfunction“.
  39. The Eagles signed Jeff Thomason, a former tight end who was working construction, to a one-game contract for the Super Bowl.
  40. Aretha Franklin, Aaron Neville, Dr. John and a 150-member choir performed the national anthem.
  41. The Art Institute of Chicago’s lions were decorated to show support for the Chicago Bears—see the photo at the bottom.
  42. The band Eels attempted to pull together 30 one-second ads but were told they could cause seizures.
  43. Due to the recession, 200 fewer journalists covered the game than the previous year.
  44. The U.S. Census Bureau spent $2.5 million on a 30-second commercial advertising the upcoming census.
  45. Fans who paid $200 per ticket for seats in a part of the stadium damaged by a winter storm were allowed to watch outside the stadium.
  46. Some hotel rooms in downtown Indianapolis reportedly cost more than $4,000 a night.
  47. Power went out in the Superdome, causing a 34-minute interruption in play. Luckily Norman the Scooter Dog was in New Orleans to entertain.
  48. The Broncos hosted press conferences on a cruise ship at the pier of their Jersey City, N.J., hotel.
  49. “Left Shark,” pictured at the top, became an Internet meme.

 

And for Super Bowl 50, the only Super Bowl to be identified without a Roman numeral: CBS set the base rate for a 30-second ad at $5,000,000, a record high price for a Super Bowl ad.

 

Lion_Chicago_Bears_Helmet
When the Chicago Bears last went to the Super Bowl, the city’s art institute decorated their lion statues. Photo by Señor Codo, freely licensed under CC BY-SA 2.0.

Barack_and_Michelle_Obama_looking_the_2009_Superbowl_with_3-D_glasses
Barack Obama watched part of Super Bowl 43 (2009) with 3D glasses. Photo by Pete Souza, public domain.

Super_Bowl_XLIII_-_Thunderbirds_Flyover_-_Feb_1_2009
A traditional flyover from military aircraft prior to the beginning of the game. Photo from the US Air Force, public domain. 

SB TV viewers by year
Television viewing statistics for each Super Bowl—all sourced from Wikipedia. The bars represent an average of the number of people watching, not the highest total reached during the event. Graph by Andrew Sherman, freely licensed under CC BY-SA 3.0.

Jeff Elder, Digital Communications Manager
Michael Guss, Research Analyst
Wikimedia Foundation

by Jeff Elder and Michael Guss at February 04, 2016 04:00 PM

Weekly OSM

weekly 289

1/26/2016-2/1/2016

Logo
Chefchaouen, Morocco – view in OpenTopoMap [1] | the world in OpenTopoMap

Mapping

  • A blog by Mapillary explains how crowd-sourced photography of cities can allow local 3D mapping which can be useful for local authorities and more up-to-date than sources like Google Streetview.
  • Matthijs Melissen created a proposal to deprecate tourism=gallery, which is still heavily discussed on the tagging mailing list.
  • In his e-mail concerning the UK 2016 Q1 mapping marathon, Rob Nickerson gives an overview of activities and evaluation tools.
  • David Marchal is wondering how wiki tagging votes have to be interpreted. He asks if they should be treated as MAYs, SHOULDs or MUSTs.

Community

  • The Ordnance Survey’s Cartographic Design Team has produced this for the Youth Hostel Association. It’s roughly from this area. User SomeoneElse commented: “I’m pretty familiar with this area, but I don’t remember a river at the top of the ridge near Hollins Cross!”
  • There is a discussion on the Spanish OSM mailing list about the urgent need to update nameless street labels, and about defining the naming levels that every country should have. The aim is to improve and homogenize map development. All of this can be found in these discussion emails (in Spanish). (automatic translation).
  • Setting up a UK OSM Group is proceeding with a further virtual meeting for interested persons. Discussion has centered on the high-level aims of the group and its administrative/legal form.
  • GOwin reports about a mapathon in the Philippines at the American embassy.
  • Steve Coast joins Navmii as a Board Advisor.
  • Geofabrik got a nice present from Santa Claus this Christmas, which prepares it for the steady growth of OSM.
  • Felix Delattre talks about OpenStreetMap in Nicaragua in an interview on the OpenCage Data blog.

Imports

  • Sander Deryckere started a discussion about bad imports and created a new wiki page that is intended to list and track the state of such imports.

OpenStreetMap Foundation

Events

Humanitarian OSM

  • A map of the HOT contributions
  • The German online news portal GOLEM reported on the presentation by Blake Girardot at Fosdem 2016 in Brussels. (automatic translation).
  • The Huffington Post reports about how digital humanitarians are closing the gaps in worldwide disaster response (HOT).
  • HOT is searching for a web developer to build an OSM Data Quality Analysis Tool. Their budget comes from a grant from the Knight Foundation.

Maps

  • The indoor map Indoorvator by user ubahnverleih has won Deutsche Bahn’s (German National Railway) contest “The best apps using the Elevator API” in the “indoor” category.
  • Aleszu Bajak has published a step-by-step guide on Storybench for creating weather maps with CartoDB.
  • [1] Derstefan announced a prerelease of OpenTopoMap 2.0. The innovations:
    • worldwide coverage
    • house numbers at Zoom level 16 and 17
    • Shading and contour lines with 30m resolution
    • Bridges with abutments
      The announcement evoked positive and constructive feedback.
  • transformap.co is a collection of Overpass-based maps, such as parks, eco-stores, alternative housing, community gardens, etc.
  • The British Library has catalogued, conserved and digitised over 550 military intelligence maps.
    Andy Mabbett asks about the potential.
    Richard Fairhurst points to a solution.
    Jez Nicholson starts a test.

switch2OSM

  • The French police use OSM – printed and online. (Attention: video!)
  • The German Railways uses OpenStreetMap on their info screens in the new IC2 (double-decker InterCity) and shows not only the route but the current location of the train and highlights of the station as well.
  • Have you ever seen an acoustic city map based on OSM?

Open Data

Licences

Software

  • The app Location Privacy helps protect your privacy on your Android smartphone when you click on a location link.
  • Maps.me released night map styles on Android! The new style is available in routing mode after sunset.
  • The layout of nominatim.openstreetmap.org has been changed. The detail page, especially, has been improved considerably!
  • Stephan Bösch-Plepelits, aka skunk, announces in his blog that he has stopped the OpenStreetBrowser service.

Programming

  • According to their blog the Opencage Geocoder now delivers results that contain an additional link to edit the mentioned OSM element directly, when available.
  • Pelican Mapping, an open-source 3D mapping software, has tested its new Buildings module with OpenStreetMap data of Paris.

Releases

  • Software | Version | Release Date | Comment
    Maps.me | 5.6 | 2016-01-12 | Bug fixes
    GeoServer | 2.8.2 | 2016-01-26 | 4 extensions in the Geofence UI; fixed 24 bugs
    Locus Map Free | 3.15.0 | 2016-01-27 | Charged biking maps, better downloads, estimated time of travel
    GRASS GIS | 7.0.3 | 2016-01-28 | 64-bit support and 210 stability fixes
    Atlas | 1.2.0 | 2016-01-30 | POT search and info, better map and routing engines

Did you know …

Other “geo” things

  • Nosolosig, a Spanish-language “NotonlyGis” magazine, reported about a book with 476 antique maps, issued by the Instituto de Estadística y Cartografía de Andalucía. According to Nosolosig it is one of the finest geographical books in Spanish. (automatic translation).
  • Mapbox on Amazon Web Services or … the migration from NoSQL to Amazon DynamoDB.
  • OSM Science Fiction: The DLR (the German Aerospace Center) and computer scientists at the University of Würzburg are working on a navigation system for future missions on Mars. Using radio beacons and drones, robotic vehicles can use the system to orient themselves.
  • User zvenzzon sent an email: “Hey, look at this impressive work, not openstreetmap but cool.” Zvenzzon was right!
  • Gonzalo Prieto has already published several articles about maps. On geografiainfinita he has now published an article, illustrated with beautiful old maps, entitled “The development of the road map in Spain from Roman times to the present day”. (automatic translation)

Upcoming Events

Where | What | When | Country
Charlottetown | Learn to OpenStreetMap | 02/04/2016 | Canada
Moscow | Schemotechnica | 02/06/2016 | Russia
Berlin | Hack Weekend | 02/06/2016-02/07/2016 | Germany
Passau | Mappertreffen | 02/08/2016 | Germany
Lyon | Rencontre mensuelle mappeurs | 02/09/2016 | France
Salt Lake City | Geobeers | 02/09/2016 | United States
Cebu | Bogo OSM Workshop, Bogo City | 02/10/2016-02/12/2016 | Philippines
Wien | 54. Wiener Stammtisch | 02/11/2016 | Austria
Paris | Mapathon Missing Maps | 02/11/2016 | France
Derby | Derby | 02/16/2016 | England

Note: If you would like to see your event here, please add it to the calendar. Only events entered there will appear in weeklyOSM. Please don’t forget to mention the city and the country in the calendar.

This weekly was produced by MoiArcSan, Nakaner, Peda, Rogehm, bogzab, derFred, jinalfoflia, mgehling, stephan75, wambacher, widedangel.

by weeklyteam at February 04, 2016 10:02 AM

Gerard Meijssen

#Wikipedia - Dorothy E. Smith - #links and #sources

Mrs Smith is recognised as an important scholar. One of her books is considered a classic, she developed two theories and received several awards, and there is a Wikipedia article about her in three languages.

Mrs Smith received the Jessie Bernard Award in 1993, and that is why I learned about her. The Wikipedia article mentions two brothers; one is linked, the other is not. She was born in Northallerton. There is a link to John Porter, but the only relation to Mr Porter is that Mrs Smith won the John Porter Award. The award was given for a book by a red-linked organisation, an organisation that also bestowed its 'outstanding contribution award' on her in 1990.

This outstanding contribution award was given in 2015 to a Monica Boyd. She can be found on the Internet as well; it is easy enough to expand the amount of information around a relevant person like Mrs Smith. Almost every line in her article contains facts that could be mapped in Wikidata. With some effort, sources can be added. The only problem is that adding sources for everything is painful; it is just too much work. This is a reality for Wikipedia as well. When Wikipedia and Wikidata align, that is, when their links and statements match, it must be obvious that the likelihood of facts being correct is high, if only because multiple people had a look at most of the facts.
Thanks,
      GerardM


by Gerard Meijssen (noreply@blogger.com) at February 04, 2016 08:08 AM

February 03, 2016

Wikimedia Foundation

What’s TPP? The problematic partnership

THE_BATTLE_OF_COPYRIGHT
Photo by Christopher Dombres, freely licensed under CC BY-SA 2.0.

Tomorrow, government representatives from twelve countries of the Pacific Rim will meet in New Zealand to sign a 6,000-page treaty called the Trans-Pacific Partnership (TPP). Among other things, the agreement will govern how the signatory countries protect and enforce intellectual property rights.

On Wikipedia, millions of articles are illustrated with public domain images, meaning images that are not restricted by copyright. At the Wikimedia Foundation, we believe that shorter copyright terms make it possible for more people to create and share free knowledge. We’ve previously shared some of our concerns about TPP and co-signed letters asking negotiators not to extend copyright terms and to refrain from forcing intermediaries to police their sites and block content.

Since the final text was released, various digital rights groups have condemned both the secrecy of the negotiations and the substance of the treaty. We’d like to talk about what effect TPP may have on Wikipedia, the other Wikimedia projects, and our mission to share free knowledge.

Wikipedia and its power for the creation and sharing of free knowledge are directly driven by a strong and healthy public domain. Unfortunately, TPP would extend copyright terms to a minimum of the author’s life plus 70 years, eating into the public domain. This cements a lengthy copyright term in countries where it already exists, like Australia, the US, and Chile. But it’s especially worrisome for the public domain in countries like Japan, New Zealand, and Canada that now have shorter copyright terms, because it means that a great number of works will not be free to use, remix, and share for another 20 years. In some countries, the lengthy copyright term is mitigated by strong and broad exceptions from copyright. But TPP makes this sort of balance optional. It only contains a non-binding exception for education, criticism, news reporting, and accessibility, like fair use in the US, that countries can choose not to enact in their national laws.

TPP tips the balance in favor of rigid copyright, to the detriment of the public domain we all share.

TPP isn’t all bad. It states that countries should not require the hosts of sites like Wikipedia to monitor their content for copyright infringement and provides for safe harbors from intermediary liability. Sites can rely on a notice and takedown system, where they remove infringing material once they are alerted by copyright holders. Yet TPP doesn’t get this balance right either. It lacks a process for counter notices, so that users can push back when a site receives an invalid request to remove content. It also allows rightsholders to demand identifying information about users when they allege copyright infringement. The vague standards in TPP leave this notice and takedown process open for abuse that can chill speech.

TPP is a problematic treaty because it harms the public domain and our ability to create and share free knowledge. It is time for countries to partner for the policies and projects that benefit everyone, like the public domain, clear copyright exceptions and intermediaries empowered to stay out of content creation with good safe harbor protections.

Yana Welinder, Legal Director
Stephen LaPorte, Legal Counsel
Jan Gerlach, Public Policy Manager
Wikimedia Foundation

by Yana Welinder, Stephen LaPorte and Jan Gerlach at February 03, 2016 09:16 PM

Jeroen De Dauw

Missing in PHP7: Value objects

This is the second post in my Missing in PHP7 series. The previous one is about named parameters.

A Value Object does not have an identity, which means that if you have two of them with the same data, they are considered equal (take two latitude, longitude pairs for instance). Generally they are immutable and do not have methods beyond simple getters.

Such objects are a key building block in Domain Driven Design, and one of the common types of objects even in well designed codebases that do not follow Domain Driven Design. My current project at Wikimedia Deutschland by and large follows the Clean Architecture, which means that each “use case” or “interactor” comes with two value objects: a request model and a response model. Those are certainly not the only Value Objects, and by the time this relatively small application is done, we’re likely to have over 50 of them. This makes it really unfortunate that PHP makes it such a pain to create Value Objects, even though it certainly does not prevent you from doing so.

Let’s look at an example of such a Value Object:

class ContactRequest {

	private $firstName;
	private $lastName;
	private $emailAddress;
	private $subject;
	private $messageBody;

	public function __construct( string $firstName, string $lastName, string $emailAddress, string $subject, string $messageBody ) {
		$this->firstName = $firstName;
		$this->lastName = $lastName;
		$this->emailAddress = $emailAddress;
		$this->subject = $subject;
		$this->messageBody = $messageBody;
	}

	public function getFirstName(): string {
		return $this->firstName;
	}

	public function getLastName(): string {
		return $this->lastName;
	}

	public function getEmailAddress(): string {
		return $this->emailAddress;
	}

	public function getSubject(): string {
		return $this->subject;
	}

	public function getMessageBody(): string {
		return $this->messageBody;
	}

}

As you can see, this is a very simple class. So what the smag am I complaining about? Three different things actually.

1. Initialization sucks

If you’ve read my previous post in the series, you probably saw this one coming. Indeed, I mentioned Value Objects at the end of that post. Why does it suck?

new ContactRequest( 'Nyan', 'Cat', 'maxwells-demon@entopywins.wtf', 'Kittens', 'Kittens are awesome' );

The lack of named parameters forces one to use a positional list of non-named arguments, which is bad for readability and is error prone. Of course one can create a PersonName Value Object with first- and last name fields, and some kind of partial email message Value Object. This only partially mitigates the problem though.

There are some ways around this, though none of them are nice. An obvious fix with an equally obvious downside is to have a Builder using a Fluent Interface for each Value Object. To me the added clutter sufficiently complicates the program to undo the benefits gained from removing the positional unnamed argument lists.
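To make the trade-off concrete, here is a minimal sketch of what such a builder could look like for the ContactRequest above. This is my illustration, not code from the project:

class ContactRequestBuilder {

	private $firstName = '';
	private $lastName = '';
	private $emailAddress = '';
	private $subject = '';
	private $messageBody = '';

	public function withFirstName( string $firstName ): self {
		$this->firstName = $firstName;
		return $this;
	}

	public function withLastName( string $lastName ): self {
		$this->lastName = $lastName;
		return $this;
	}

	public function withEmailAddress( string $emailAddress ): self {
		$this->emailAddress = $emailAddress;
		return $this;
	}

	public function withSubject( string $subject ): self {
		$this->subject = $subject;
		return $this;
	}

	public function withMessageBody( string $messageBody ): self {
		$this->messageBody = $messageBody;
		return $this;
	}

	public function build(): ContactRequest {
		// Only here does the positional argument list appear, in one place.
		return new ContactRequest(
			$this->firstName,
			$this->lastName,
			$this->emailAddress,
			$this->subject,
			$this->messageBody
		);
	}

}

$contactRequest = ( new ContactRequestBuilder() )
	->withFirstName( 'Nyan' )
	->withLastName( 'Cat' )
	->withEmailAddress( 'maxwells-demon@entopywins.wtf' )
	->withSubject( 'Kittens' )
	->withMessageBody( 'Kittens are awesome' )
	->build();

The call site reads well, but each field now appears in several more places, which is exactly the clutter being complained about.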

Another approach to avoid the positional list is to not use the constructor at all, and instead rely on setters. This does unfortunately introduce two new problems. Firstly, the Value Object becomes mutable during its entire lifetime. While it might be clear to some people those setters should not be used, their presence suggests that there is nothing wrong with changing the object. Having to rely on such special understanding or on people reading documentation is certainly not good. Secondly, it becomes possible to construct an incomplete object, one that misses required fields, and pass it to the rest of the system. When there is no automated checking going on, people will end up doing this by mistake, and the errors might be very non-local, and thus hard to trace the source of.

Some time ago I tried out one approach to tackle both these problems introduced by using setters. I created a wonderfully named Trait to be used by Value Objects which use setters in the format of a Fluent Interface.

class ContactRequest {
	use ValueObjectsInPhpSuckBalls;

	private $firstName;
	// ...

	public function withFirstName( string $firstName ): self {
		$this->firstName = $firstName;
		return $this;
	}
	// ...

}

The trait provides a static newInstance method, enabling construction of the Value Object that uses it as follows:

$contactRequest =
    ContactRequest::newInstance()
        ->withFirstName( 'Nyan' )
        ->withLastName( 'Cat' )
        // ...
        ->withMessageBody( 'Pink fluffy unicorns dancing on rainbows' );

The trait also provides some utility functions to check if the object was fully initialized, which by default will assume that a field with a null value was not initialized.
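The core of such a trait can be sketched in a few lines; the trait and method names below are mine, not necessarily those of the actual trait:

trait ProvidesFluentConstruction {

	public static function newInstance() {
		// Late static binding: returns an instance of the class using the trait.
		return new static();
	}

	public function isFullyInitialized(): bool {
		// A field that is still null is treated as "not initialized".
		foreach ( get_object_vars( $this ) as $fieldValue ) {
			if ( $fieldValue === null ) {
				return false;
			}
		}
		return true;
	}

}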

More recently I tried out another approach, also using a trait to be used by Value Objects: FreezableValueObject. One thing I wanted to change here compared to the previous approach is that users of the initialized Value Object should not have to do anything different from, or in addition to, what they would do for a more standard Value Object initialized via a constructor call. Freezing is a very simple concept: an object starts out as being mutable, and then, when freeze is called, modification stops being possible. This is achieved via a freeze method that, when called, sets a flag that is checked every time a setter is called. If a setter is called when the flag is set, an exception is thrown.

$contactRequest = ( new ContactRequest() )
        ->setFirstName( 'Nyan' )
        ->setLastName( 'Cat' )
        // ...
        ->setMessageBody( 'Pink fluffy unicorns dancing on rainbows' )
        ->freeze();

$contactRequest->setFirstName( 'Nyan' ); // boom
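Under the hood, a minimal version of such a freeze mechanism might look like the sketch below. This is my own approximation, not the actual FreezableValueObject code:

trait Freezable {

	private $isFrozen = false;

	public function freeze(): self {
		$this->isFrozen = true;
		return $this;
	}

	protected function assertNotFrozen() {
		if ( $this->isFrozen ) {
			throw new RuntimeException( 'Cannot modify a frozen Value Object' );
		}
	}

}

class FreezableContactRequest {
	use Freezable;

	private $firstName;

	public function setFirstName( string $firstName ): self {
		// This guard has to be remembered in every single setter.
		$this->assertNotFrozen();
		$this->firstName = $firstName;
		return $this;
	}

	// ... the remaining fields and setters follow the same pattern

}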

To also verify initialization is complete in the code that constructs the object, the trait provides an assertNoNullFields method which can be called together with freeze. (The name assertFieldsInitialized would actually be better, as the former leaks implementation details and ends up being incorrect if a class overrides it.)

A downside this second trait approach has over the first is that each Value Object needs to call the method that checks the freeze flag in every setter. This is something that is easy to forget, and thus another potential source of bugs. I have yet to investigate if the need for this can be removed via some reflection magic.

It’s quite debatable if any of these approaches pay for themselves, and it’s clear none of them are even close to being nice.

2. Duplication and clutter

For each part of a value object you need a constructor parameter (or setter), a field and a getter. This is a lot of boilerplate, and the flexibility the class language construct provides, which is not needed here, creates ample room for inconsistency. I’ve come across plenty of bugs in Value Objects caused by assignments to the wrong field in the constructor or by returning the wrong field in getters.

“Duplication is the primary enemy of a well-designed system.”

― Robert C. Martin

(I actually disagree with the (wording of the) above quote and would replace “duplication” by “Complexity of interpretation and modification”.)

3. Concept missing from language

It’s important to convey intent with your code. Unclear intent wastes time twice over: programmers spend time trying to understand the intent, and bugs arise when the intent is not understood. When Value Objects are classes, and many other things are classes, it might not be clear if a given class is intended to be a Value Object or not. This is especially a problem when there are more junior people in a project. Having a dedicated Value Object construct in the language itself would make intent unambiguous. It also forces conscious and explicit action to change the Value Object into something else, eliminating one avenue of code rot.

“Clean code never obscures the designers’ intent but rather is full of crisp abstractions and straightforward lines of control.”

― Grady Booch, author of Object-Oriented Analysis

I can haz

ValueObject ContactRequest {
    string $firstName;
    string $lastName;
    string $emailAddress;
}

// Construction of a new instance:
$contactRequest = new ContactRequest( firstName='Nyan', lastName='cat', emailAddress='something' );

// Access of the "fields":
$firstName = $contactRequest->firstName;

// Syntax error:
$contactRequest->firstName = 'hax';

See also

i-will-always-favor-value-objects

by Jeroen at February 03, 2016 07:42 PM

Missing in PHP7

I’ve decided to start a series of short blog posts on how PHP gets in the way of creating well designed applications, with a focus on missing features.

The language flamewar

PHP is one of those languages that people love to hate. Its standard library is widely inconsistent, and its gradual typing approach leaves fundamentalists in both the strongly typed and dynamically typed camps unsatisfied. The standard library of a language is important, and, amongst other things, it puts an upper bound to how nice an application written in a certain language can be. This upper bound is however not something you run into in practice. Most code out there suffers from all kinds of pathologies that have quite little to do with the language used, and are much more detrimental to understanding or changing the code than its language. I will take a well designed application in a language that is not that great (such as PHP) over a ball of mud in [insert the language you think is holy here] any day.

“That’s the thing about people who think they hate computers. What they really hate is lousy programmers.”

― Larry Niven

Well designed applications

By well designed application, I do not mean an application that uses at least 10 design patterns from the GoF book and complies with a bunch of design principles. It might well do that, however what I’m getting at is code that is easy to understand, maintain, modify, extend and verify the correctness of. In other words, code that provides high value to the customer.

“The purpose of software is to help people.”

― Max Kanat-Alexander

Missing features

These will be outlined in upcoming blog posts which I will link here.

by Jeroen at February 03, 2016 07:18 PM

Wiki Education Foundation

Wiki Ed to visit California State University, East Bay

Samantha Erickson

On Friday, February 19, Educational Partnerships Manager Jami Mathewson and I will be presenting a “Teach with Wikipedia” workshop at California State University, East Bay.

Research and reference librarian Tom Bickley will also be joining us. His subject specialties include music, philosophy, math and computer science. As an instructor in our program, Tom can speak from experience about running a Wikipedia assignment.

The workshop will take place in room LI3079, Upper Mall level of the CSU East Bay Library, from 10 a.m. to noon (map).

To RSVP, please sign up here. Attendees from all CSU, UC and other California institutions are welcome to attend. Parking is available on campus by purchasing a permit at various dispensers.

If you have any questions, e-mail me: samantha@wikiedu.org. See you there!


Photo: “Csueb view” by Jennifer Williams – originally posted to Flickr as csueb view. Licensed under CC BY-SA 2.0 via Wikimedia Commons.

 

by Samantha Erickson at February 03, 2016 06:30 PM

Teaching (and diversifying) classical music through Wikipedia

Kim Davenport, Lecturer, Interdisciplinary Arts & Sciences at the University of Washington, Tacoma, works with Wikipedia in her “Intro to Humanities” course for first-year students there. She shares her thoughts on student contributions to coverage of classical music on Wikipedia.

My course introduces the world of classical music. Through several projects, students explore the role of music in their lives and community.

With the luxury of small class sizes (capped at 25), I’ve been able to incorporate active learning, innovative projects, and collaborative work. I did have one tired old research assignment, though, which I was eager to revamp.

During a week-long diversity workshop on our campus, ‘Strengthening Education Excellence through Diversity’, a colleague shared her success incorporating a Wikipedia project into her Sociology course. I found the inspiration I needed.

My old assignment posed research questions for the sole purpose of building research skills. “Look up these facts”; “explore this database”; “compare these two encyclopedias”; etc.

There was nothing exciting, current, or personally relevant to my students. It showed. They lacked enthusiasm for the assignment, and I lacked excitement toward grading it!

I hoped that adding a Wikipedia-based assignment would achieve several goals. I wanted to place students, as editors, within a research tool they used on a daily basis. I wanted them to explore issues of bias within both classical music and Wikipedia. I also hoped to give students some room to choose which Wikipedia article they would create or expand.

I asked students to choose a classical music composer from Wikipedia’s lists of requested or stub articles. They would conduct research, write, and publish their new or expanded article. I was working with first-year students in a 100-level course, and my campus is on the 10-week quarter system, so I assigned it as a group project.

I introduced the project early in the quarter, while immersed in a ‘crash course’ in classical music history. It is impossible to study this history without noticing that most of the composers we studied were white and male.

My students do learn that classical music’s world has diversified through the centuries. While I work hard to share diverse examples in class, I wanted to do more to confront the issue.

Sharing Wikipedia’s own awareness of its systemic bias with my students helped me frame the issue. I could make it relevant both for my course content, and for students’ understanding. I made clear that my students could choose any name they wished from the lists of composers. However, I asked them to consider the impact they would have writing an article about, for example, a current American composer of color, or a female composer from the Renaissance. After this introduction to the project, the excitement among the students was tangible.

Deadlines throughout the quarter helped to keep them on task. I reserved a computer lab for two class periods: one mid-quarter and one late-quarter. The students used that time to work together under my (light) supervision.

I was pleasantly surprised with the group element of the project. I sometimes shy away from assigning group work, because of its well-documented pitfalls. In this case, I think it was the right decision.

By working together, students were excited and engaged in the topic over many weeks. It offered comfort to students who had worried that editing Wikipedia was beyond their comfort zone. Relying on another group member, or splitting up the group’s work to each student’s strengths, took some worry out of the process.

When the end of the quarter arrived, I had the opportunity to review the students’ final work. I was pleased with the outcome on several levels.

My diverse group of students had selected a diverse range of composers to research. Men and women, from various periods in history, representing Europe and the Americas but also other parts of the globe.

The vast majority of the articles withstood review by other Wikipedia editors with only minor edits. A few were edited substantially or, in one case, deleted. Even then, the students understood the weaknesses in their work. They took the learning experience in stride.

The experience of incorporating Wikipedia into my classroom has been an extremely positive one. I’m excited to repeat it.

I must also give a shout out to the tremendous staff with the Wiki Education Foundation. Going into this experience, I had never edited so much as one Wikipedia article. I was a bit concerned as to whether I had the expertise to guide my students through the experience. Given the online and paper training materials for instructors and students, and the staff’s help from start to finish, I felt confident — and it was a success!


Photo: “Still Life with Musical Instruments and Books – Google Art Project” by Bartholomeo Bettera (1639–1699). Licensed under Public Domain via Wikimedia Commons.

 

by Guest Contributor at February 03, 2016 05:00 PM

Joseph Reagle

Why learning styles are hard to give up

Some of my students refuse to believe the theory of learning styles is discredited. Referring them to Wikipedia or literature reviews isn't sufficient because they strongly identify as visual or tactile learners. It's a deeply felt intuition---that I share.

I think the intuition is misleading because we confuse style with ability thresholds. Einstein, a brilliant autodidact, can learn a difficult concept (like gyroscopic precession) from a dry and boring text. I can learn the same concept only by way of a visual demonstration, such as Derek Muller's.

I might mistakenly conclude "I'm a visual learner," but Einstein can also learn from the demonstration. Everyone benefits from a great demonstration. People do have different abilities, and we'll encounter different thresholds at which we then want a better learning method. But this is different from what learning style theory predicts: that (1) you can identify people who learn better through one style/modality and (2) they actually do better in a curriculum tailored for that style while people with different purported styles do not. There is little evidence of this.

by Joseph Reagle at February 03, 2016 05:00 AM

February 01, 2016

Wikimedia Foundation

What I Learned: Improving the Armenian Wiktionary with the help of students

Armenia,_Winter_Wiki_Camp_2016_01
Group photo of the participants in Winter WikiCamp 2016, the latest edition of this program. Photo by Beko, freely licensed under CC-BY-SA 4.0.

WikiCamp is an education program that aims to get young students editing Armenian-language content on Wikimedia sites. National identity, recreation and knowledge are at the core of this program, as participant Shahen Araboghlian states: “WikiCamp is a new experience, a chance to meet other people, self-develop and enrich Western Armenian language in the Internet”.

This exciting initiative, which aims to get young Armenian Education Program participants aged 14 to 20 to edit the Armenian Wikipedia, has been around since July 2014; since then there have been five WikiCamps. The return from the camps is surprisingly high, and they are well known in Armenia, having been covered by Aravot and News.am in January 2015, along with several other outlets.

The camp has also increased in popularity among students, with the number of participants growing from 59 in the first camp to 76 in the second edition, and still rising. Participants are not only happy to join WikiCamp and make new friends, but also to contribute to expanding open knowledge in their own language. “Honestly, I never edit Wikipedia to have high points and to come to Wikicamp. I edit to make articles available in Armenian, for people who share the same interests as me. This is what motivates me when I edit”, says participant Arthur Mirzoyan.

Since Winter WikiCamp 2015, however, we have made this activity available to even younger editors in this education project with a new, much easier wiki activity: Wiktionary editing.

Armenia,_Winter_WikiCamp_2016_04
Participants editing in one of the camp workshops. Photo by David Saroyan, freely licensed under CC-BY-SA 4.0.

Shared lesson: Look beyond Wikipedia to create a more accessible Education Program

Before Winter WikiCamp 2015, the Armenian Wikipedia Education Program focused only on Wikipedia. As the Wikipedia Education Program rapidly spread in different regions of Armenia and became popular, many secondary school students expressed their wish to join the program—but not all of these students could easily learn Wikipedia editing techniques or were able to write and improve articles that met the project’s guidelines, like writing in encyclopedic style or having reliable sources.

As their wish to edit Wikipedia and its sister projects was enormous, we decided to involve these students in editing Wiktionary. Editing Wiktionary doesn’t have the same requirements as writing an encyclopedic article; the editors just fill in the necessary fields using the provided dictionaries. Helpful guides, word lists, and dictionary links made the editing process much easier for secondary school students. As the Armenian Wiktionary did not have active users, we created Wiktionary tutorials for them. We also found different free dictionaries and digitized them, created word lists and provided these to the students, all of which gave us a foundation to begin enriching the Armenian Wiktionary with words.

The process was simple: we asked students to write the word’s definition, examples of usage, etymology, expressions, synonyms, derived forms and translations. Before we started with the Education program, there was a major gap in Armenian-language content and no complete free dictionary. This gap inspired us and was the main reason why we focused solely on Armenian content. At that time, we had only 3,000 Armenian words which needed to be improved, and during the Winter WikiCamp 2015 a portion of those words were enriched. The improvement process continued successfully after the camp ended, transforming our education initiative into a well-known Wiktionary-based program for secondary school students.

Where do we go from here?

As a result of Wikimedia Armenia’s Education Program and its participants’ efforts, this year the Armenian Wiktionary reached 100,000 entries.

After this success, we decided to include Wiktionary editing in the next WikiCamp: in Summer 2015, Wiktionary editors were actively involved in the camp and around 14,000 entries were created through joint efforts. Our main breakthrough was realizing that other Wikimedia projects could be as effective as Wikipedia as education tools, so we attempted to implement other iterations in WikiCamp. Wikisource and Wikimedia Commons are still in the development stage. You can find this and other outcomes on the WikiCamp project page on the Armenian Wikipedia.

David Saroyan, Program Manager, Wikimedia Armenia
Lilit Tarkhanyan, Wikipedia Education Program leader, Wikimedia Armenia
María Cruz, Communications and Outreach, Learning and Evaluation team, Wikimedia Foundation
Samir Elsharbaty, Fellow, Wikipedia Education Program, Wikimedia Foundation

«What I Learned» is a blog series that seeks to capture and share lessons learned in different Wikimedia communities over the world. This knowledge stems from the practice of Wikimedia programs, a series of programmatic activities that have a shared, global component, and a singular, local aspect. Every month, we will share a new story for shared learning from a different community. If you want to feature a lesson you learned, reach out!

by David Saroyan, Lilit Tarkhanyan, María Cruz and Samir El-Sharbaty at February 01, 2016 10:10 PM

Wiki Education Foundation

Announcing: Genes and Proteins

We’re pleased to announce a new handbook for students writing articles about genes and proteins for classroom assignments.

When we started creating our Ecology brochure, we wanted to create handbooks for students interested in sharing information about life on Earth. When we set out to create a biology handbook for students, we found there was just so much to cover that it wouldn’t fit in just one. With our new handbook on Genes and Proteins, we’re zooming in on the world of genetics. It complements our existing guide, Writing Wikipedia Articles on Species, which covers writing about plants, animals, and fungi.

Genes and Proteins offers advice specific to articles on those topics: suggestions for reliable journals (and how to identify poor-quality journals), how to structure an article to keep in line with other gene and protein articles, and even how to create a relevant infobox for individual proteins, protein families, enzymes, GNF proteins, nonhuman proteins, and RNA families.

When it comes to open science, Wikipedia is particularly important in the context of educational resources. By keeping reliable information on genes and proteins available in an easy-to-find place, students have a place to turn to for quick clarifications of their understanding of science. Finding an overview of a topic with a pre-vetted list of sources can make the difference in taking future research one step further. With this information on Wikipedia, more people have a starting point to explore the world of science, or begin to explore a popular science topic more deeply. They can help fill in the pieces for lay scientists to get back up to speed quickly on the topics that matter to them.

This guide was written in collaboration with our Wikipedia Content Expert in the Sciences, Ian Ramjohn, Ph.D. Ian’s dual Ph.D. (in Plant Biology as well as Ecology, Evolutionary Biology, and Behavior) provided a deep background in relevant fields.

We’re also grateful to other experienced Wikipedians who helped us. Volunteers at WikiProjects Science and Molecular and Cell Biology created the backbone of this handbook over years, establishing and documenting the best practices and techniques for writing these types of articles. Users Andrew Su, BatteryIncluded, Boghog, DrBogdan, Gtsulab, KatBartlow, and Opabinia regalis shared additional ideas and resources.

These handbooks are just one part of Wiki Ed’s Year of Science campaign. They’re available as free .pdfs for everyone on Wikimedia Commons, and in print for any science instructors teaching through our program. If you’d like to get involved, reach out to us through the Year of Science page or e-mail us: contact@wikiedu.org.

 

by Eryk Salvaggio at February 01, 2016 05:00 PM

Wikimedia UK

Wikipedia and Open Access: making research as useful as it can be

This post was written by Martin Poulter, Wikimedian in Residence at Bodleian Libraries, and was first published on Open Access Oxford.

The Budapest Open Access Declaration is one of the defining documents of the Open Access movement. It says that free, unrestricted access to peer-reviewed scholarly literature will “accelerate research, enrich education, share the learning of the rich with the poor and the poor with the rich, make this literature as useful as it can be, and lay the foundation for uniting humanity in a common intellectual conversation and quest for knowledge.”

To bring about this optimistic vision, there needs to be some way to deliver this knowledge directly to everyone on the planet. Rather than broadcasting to passive recipients, this needs to encourage repurposing and remixing of research outputs, so people can adapt them into educational materials, translate them into other languages, extract data from them, and find other uses.

Fifteen years after its creation in January 2001, Wikipedia is emerging as that creative space. Wikipedia is not a competitor to normal scholarly publication, but a bridge connecting it to the world of informal learning and discussion. Wikipedia is only meant to be a starting point: its credibility does not come from its contributors, who are often anonymous, but from checkable citations to reputable sources.

Being “the free encyclopedia” reflects not just that Wikipedia is available without charge, but that it is free for use by anyone for any purpose, subject to the requirements of the most liberal Creative Commons licences. These freedoms are a part of its success: on the article about your favourite topic, click “View history”, then “Page view statistics”: it is not uncommon to see a scientific article getting thousands of hits per day.

When a team in 2015 announced the discovery of a new hominid, Homo Naledi, the extensive diagrams, fossil photos and other supplements they produced exceeded the size limit set by their first choice of journal, Nature. So they went to the open-access journal eLIFE. As well as publishing the peer reviews along with the paper, eLIFE uses a very liberal licence, so figures from the paper made it possible to create a comprehensive Wikipedia article for Homo Naledi, and to improve related articles.

There are many more cases where a research paper is adapted into a Wikipedia article which acts as a lay summary. For example, the article on Major Urinary Proteins was written by scientists at the Wellcome Trust Sanger Institute based on, and using figures from, papers they had published in PLOS open-access journals.

Editing Wikipedia used to involve learning a form of markup called “wiki code”. Thanks to some software development, this is no longer necessary. When you register an account, each article presents two tabs “Edit” and “Edit source”. “Edit source” gives you the old wiki code interface; but “Edit” gives a much more straightforward wordprocessor-like interface. Especially handy is the “Cite” button, which can convert a DOI (Digital Object Identifier) into a full citation.

Still much about Wikipedia is poorly-designed and dependent on insider knowledge. Luckily there are insiders who are keen to share, and training is available. The Royal Society of Chemistry, Cancer Research UK and the Royal Society are amongst the scientific bodies which have employed Wikipedians In Residence. As WIR at the Bodleian Libraries, I have run events to improve articles on Women In Science and am celebrating Wikipedia’s 15th birthday working with researchers and students from the Oxford Internet Institute to improve articles about the “social internet”.

Wikimedia encompasses more than just Wikipedia: it is an ecosystem of different projects handling and repurposing text, data and digital media. There are many sites that you can use without charge to share or build materials, but Wikimedia is distinctive in being a charitable project existing purely to share knowledge, with no adverts or other commercial influences.

Wikimedia Commons is the media archive, hosting photographs, diagrams, video clips and other digital media, along with author and source credits and other metadata. It currently offers just under 30 million files, of which tens of thousands are extracted from figures or supplementary materials from Open Access papers. It’s a massively multilingual site, where each file can have descriptions in many languages, and one of the repurposing activities going on is creating alternative language versions of labelled diagrams.

Wikidata describes itself as “a free linked database that can be read and edited by both humans and machines”. It holds secondary data: not raw measurements, but key facts and figures concluded from them. Looking up Platinum, for example, gives the element’s periodic table placement, various official identifiers and physical properties. Wikidata holds knowledge about fifteen million entities, including species, molecules, astronomical bodies and diseases, although the number is still rapidly growing.

What’s exciting about Wikidata is the uses it can be put to. Making data about many millions of things freely available enables a new generation of applications for education and reference. Reasonator gives a visually pleasing overview of what Wikidata knows about an entity. Histropedia (histropedia.com) is a tool for building timelines (try “Discoverers of chemical elements”, then zoom in).

There are eleven Wikimedia projects in total, each with its own strengths and flaws. My personal favourites include Wikisource – a library of open access and out-of-copyright text, including for example Michael Faraday’s Royal Institution lectures – and Wikibooks, which aims to create textbooks for every level and topic from ABCs to genome sequencing.

As open access becomes more mainstream, technical and legal barriers around research outputs will diminish, so more research will become as “useful as it can be” through the Wikimedia projects. That benefits the research in terms of impact and public awareness, but it also benefits the end users who, in a connected world, are everybody.

by Martin Poulter at February 01, 2016 11:59 AM

Addshore

Language usage on Wikidata

the Wikidata LogoWikidata is a multilingual project, but due to the size of the project it is hard to get a view on the usage of languages.

For some time now the Wikidata dashboards have existed on the Wikimedia grafana install. These dashboards contain data about the language content of the data model by looking at terms (labels, descriptions and aliases) as well as data about the language distribution of the active community.

For reference, the dashboards used are:

All data below was retrieved on 1 February 2016

Active user language coverage

Active users here is defined as users that have made 1 edit or more in the last 30 days.

A single user can have multiple languages (in the case that they use a babel box). If the user does not have a babel box then the user interface language is used.

18,190 users are represented below, with 317 languages covered a total of 27,660 times.

The primary active user language is shown as English; this is likely due to the fact that the default user interface language is English and only 2,905 users have babel boxes.

On average a user that has a babel box has 3.3 languages defined in it.

Term language coverage

Across all Wikidata entities 410 languages are used (including variants).

This leaves a gap of roughly 93 languages between those used in terms and those viewed by active editors currently.

The distributions per term type can be seen below.

Of course all of the numbers above are constantly changing and the dashboards should be referred to for up to date data.

by addshore at February 01, 2016 08:32 AM

Gerard Meijssen

#Wikipedia - The Jessie Bernard Award


At the #Wikimedia Foundation a lot of words are used on the subject of #diversity. So much so that diversity has become short for gender diversity. Having proper attention for women and women’s causes is a good thing; my point is the exclusion of everything else.

On the subject of gender diversity, an award was named in recognition of Mrs Bernard: the Jessie Bernard award. "It is presented for significant cumulative work done throughout a professional career, and is open to women or men and is not restricted to sociologists."

The Wikipedia article on the award has a substantial number of red links for the unsung heroes of gender diversity. Obviously everyone is invited to fill these in. The quote indicates that being a housewife is a special case of being crazy. If so, what does "mentally ill" mean, and how well does Wikipedia cover that subject?
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at February 01, 2016 07:19 AM

January 31, 2016

Addshore

The break in Wikidata edits on 28 Jan 2016

On the 28th of January 2016 all Wikimedia MediaWiki APIs had 2 short outages. The outage is documented on Wikitech here.

The outage didn’t have much of an impact on most projects hosted by Wikimedia. However, because most Wikidata editing happens through the API, even when using the UI, the project basically stopped for roughly 30 minutes.

Interestingly there is an unusual increase in the edit rate 30 minutes after recovery.

I wonder if this is everything that would have happened in the gap?

by addshore at January 31, 2016 07:39 PM

User:Josve05a

DMCA’s – Copyrights best friend

In August 2013, the Wikimedia Foundation received a DMCA takedown notice for some content in the “Sport in Australia” article on the English Wikipedia. The sending party claimed that they own the copyright to the following information table, which is included below.

Screenshot 2016-01-29 at 19.54.34 - Edited.png
Used on this webpage under two provisions: one being that it is not copyrightable (any more than the wikitable markup, perhaps, which is attributed to the Wikimedia contributors under CC BY-SA 3.0), and the other being that it is fair use anyway, since it is commentary on the content. So please, do not send me a DMCA takedown notice as well.

It consists only of material and data points which are in themselves not copyrightable. Despite this, the Wikimedia Foundation was forced to comply with the takedown notice in order not to lose its status under the safe harbor provisions.

The list itself might not be that encyclopedic, and should perhaps not be in the article, but if I want to use that table in another article, or restore an old version of this article, this takedown notice must be refuted first. That is why, 2.5 years later, I sent in a formal DMCA counter notice stating that I had a good faith belief that they were in error and that this content is not copyrightable.

They, the senders of the DMCA takedown notice, now have 14 days from receiving my forwarded counter notice to respond with legal action to stop this content from being displayed. If they do not, this is a huge victory for our community.


by Jonatan Svensson Glad (Josve05a; @JonatanGlad) at January 31, 2016 01:31 PM

Addshore

Building applications around Wikidata (a beer example)

Wikidata provides free and open access to entities representing real world concepts. Of course Wikidata is not meant to contain every kind of data; for example, beer reviews or product reviews would probably never make it into Wikidata items. However, creating an app powered by Wikidata and Wikibase to hold beer reviews should be rather easy.

A base data set

I’m going to take the example of beer as mentioned above. I’m sure there are thousands if not millions of beers that Wikidata is currently missing, but at the time of writing this there are 958 contained in the database. These can be found using the simple query below:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?i WHERE {
   ?i wdt:P31/wdt:P279* wd:Q44 .
}

Any application can use data stored within Wikidata; in the case of beers this includes labels and descriptions in multiple different languages, mappings to Wikipedia articles and external databases for even more information, potential images of said beer, the type of beer and much more. Remember the Wikidata dataset is ever evolving and the IDs are persistent.
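As a quick illustration (not from the original post), an application could fetch an item’s label and description via the standard wbgetentities API module. The snippet below is a minimal sketch without error handling or a proper User-Agent header:

<?php
// Fetch the English label and description of Q44 ("beer") from the Wikidata API.
$url = 'https://www.wikidata.org/w/api.php?action=wbgetentities'
	. '&ids=Q44&props=labels|descriptions&languages=en&format=json';

$response = json_decode( file_get_contents( $url ), true );
$entity = $response['entities']['Q44'];

echo $entity['labels']['en']['value'], "\n";
echo $entity['descriptions']['en']['value'], "\n";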

Application specific data

Let’s say that you want to review the beers! You could set up another Wikibase installation and SPARQL endpoint to store and query review and rating information. Wikibase provides an amazingly flexible structure, meaning this is easily possible. Reviews and ratings could be stored as a new entity type linking to an item on Wikibase, or an item could be created that maps to a Wikidata item and contains statements with review or rating data. Right now documentation is probably lacking, but this is all possible.

Of course I am shouting about Wikibase first, as Wikidata is powered by it and thus integration should be easier; however, there is no reason that you couldn’t use any other database, mapping your application-specific information to Wikibase item IDs. MusicBrainz is already doing something like this, and I am sure there are other applications out there too!
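As a purely hypothetical sketch of that mapping idea (the class and its fields are invented for illustration), the only Wikidata-specific part of an application-side record would be the item ID:

class BeerReview {

	private $itemId;
	private $rating;
	private $comment;

	public function __construct( string $itemId, int $rating, string $comment ) {
		$this->itemId = $itemId;   // e.g. 'Q44' or a more specific beer item
		$this->rating = $rating;   // application-specific, e.g. 1 to 5
		$this->comment = $comment; // application-specific free text
	}

	public function getItemId(): string {
		return $this->itemId;
	}

}

// The review lives in the application's own storage; Wikidata supplies
// labels, images and other shared knowledge for the referenced item.
$review = new BeerReview( 'Q44', 4, 'Crisp, but could use more hops.' );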

Sharing of knowledge

Knowledge is power. Wikipedia has proven that free and open knowledge is an amazing resource in an unstructured text form. Wikidata is a step up, providing structured data. Imagine a world in which applications share basic world information, building a dataset for a common worldwide goal. In the example above, add an image of a beer in one application and have it instantly available in another application; translate a description for one user and have it benefit millions.

Let’s see what we can do in the next 10 years!

by addshore at January 31, 2016 12:13 AM

January 30, 2016

Addshore

MediaWiki CRAP – The worst of it

I don’t mean MediaWiki is crap! The Change Risk Anti-Patterns (CRAP) index is calculated from the cyclomatic complexity and code coverage of a unit of code. Complex code and untested code will have a higher CRAP index than simple, well tested code. Over the last 2 years I have been tracking the CRAP index of some of MediaWiki’s more complex classes as reported by the automatic coverage reports, and this is a simple summary of what has been happening.
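For reference, the commonly cited formula behind the index combines the two inputs roughly as in the sketch below (my own helper, not code from the MediaWiki coverage tooling):

/**
 * CRAP index of a method, following the commonly cited formula
 * CRAP(m) = comp(m)^2 * (1 - cov(m))^3 + comp(m).
 *
 * @param int   $complexity Cyclomatic complexity of the method
 * @param float $coverage   Test coverage as a fraction between 0 and 1
 */
function crapIndex( int $complexity, float $coverage ): float {
	return $complexity ** 2 * ( 1 - $coverage ) ** 3 + $complexity;
}

// Complex, untested code scores badly; full coverage tames the score.
echo crapIndex( 20, 0.0 ); // 420
echo crapIndex( 20, 1.0 ); // 20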

Just over 2 years ago I went through all of the MediaWiki unit tests and added @covers tags to improve the coverage reports for the source. This brought the line coverage to roughly 4% toward the end of 2013. Since then the coverage has steadily been growing and is now at an amazing 9%. Note that I am only counting coverage of the includes directory here; including maintenance scripts and language definitions, the 9% is actually 7%.

You can see the sharp increase in coverage at the very start of the graph below.

Over the past 2 years there has also been a push forward with librarization which has resulted in the removal of many things from the core repository and creation of many libraries now required using composer. Such libraries include:

  • mediawiki/at-ease – A safe alternative to PHP’s “@” error control operator
  • wikimedia/assert – Alternative to PHP’s assert()
  • wikimedia/base-convert – Improved base_convert for PHP
  • wikimedia/ip-set – PHP library to match IPs against CIDR specs
  • wikimedia/relpath – Compute a relative path between two paths
  • wikimedia/utfnormal – Unicode normalization functions
  • etc.

All of the above has helped to generally reduce the CRAP across the code base, even with some of the locations with the largest CRAP score.

The graph shows the CRAP index for the top 10 CRAP classes in MediaWiki core at any one time. The data is taken from 12 snapshots of the CRAP index across the 2 year period. At the very left of the graph you can see a sharp decrease in the CRAP index, as unit test coverage was taken into account from this point (as in the coverage graph). Some classes fall out of the top 10 and are replaced by more CRAP classes through the 2 year period.

Well, coverage is generally trending up and CRAP is generally trending down. That’s good, right? The overall CRAP index of the top 10 CRAP classes has actually decreased from 2.5 million to 2.2 million! Which of course means that for the top 10 classes the average CRAP score has decreased from 250,000 to 220,000!

Still a long way to go but it will be interesting to see what this looks like in another year.

by addshore at January 30, 2016 11:46 PM

Myanmar coordinates on Wikidata by Lockal & Widar

In a recent blog post I showed the amazing apparent effect that Wikimania’s location had on the coordinate location data for Mexico on Wikidata. A comment on the post by Finn Årup Nielsen pointed out a massive increase in data in Myanmar (Burma). I had previously spotted this increase but chose not to mention it in the post. But now, after a quick look at some items and edit histories, I have found who we have to thank!

The increase in geo coordinate information around the region can clearly be seen in the image above. As with the Mexico comparison this shows the difference between June and October 2015.

Finding the source of the new data

I knew that the new data was focused around Q836 (Myanmar), but starting from that item wasn’t really going to help. Instead I zoomed in on a regular map and found a small subdivision of Myanmar called Q7961116 (Wakema). Unfortunately the history of this item showed that its coordinate was added prior to the dates of the image above.

I decided to look at what linked to the item, and found that there used to be another item about the same place which now remains as a redirect Q13076630. This item was created by Sk!dbot but did not have any coordinate information before being merged, so still no luck for me.

Bots generally create items in bulk meaning it was highly likely the new items either side of Q13076630 would also be about the same topic. Loading Q13076629 (the previous item) revealed that it was also in Myanmar. Looking at the history of this item then revealed that coordinate information was added by Lockal using Widar!

Estimating how much was added

With a few quick database queries we can find out how many claims were created stating that items are in Myanmar, as well as roughly how many coordinates were added:

-- Number of claims created stating that an item is in Myanmar (Q836)
SELECT count(*) AS count
FROM revision
WHERE rev_user = 53290
  AND rev_timestamp > 2015062401201
  AND rev_comment LIKE "%wbcreateclaim-create%"
  AND rev_comment LIKE "%Q836%";

-- Number of coordinate location (P625) claims created
SELECT count(*) AS count
FROM revision
WHERE rev_user = 53290
  AND rev_timestamp > 2015062401201
  AND rev_comment LIKE "%wbcreateclaim-create%"
  AND rev_comment LIKE "%P625%";

Roughly 16,000 new country statements and 19,000 new coordinates. All imported from Burmese Wikipedia.

Many thanks Lockal!

by addshore at January 30, 2016 11:46 PM

Submitting a patch to Mediawiki on Gerrit

I remember when I first submitted a patch to MediaWiki on Gerrit. It was a +12 -4 line patch, and it probably took me at least half a day to figure everything out and get my change up! There is a tutorial on mediawiki.org, but it is far too wordy and overcomplicated. In this post I try to explain things as simply as possible. Enjoy!

Git

In order to be able to submit a patch to Gerrit you need to have Git installed!

If you’re on a Linux system you can install it using your package manager, e.g. “apt-get install git”. If you are on another system such as Windows you can just use a build from git-scm. Basically, just get Git from https://git-scm.com/downloads!

Once you have downloaded Git you need to configure it!

git config --global user.email "example@example.com"
git config --global user.name "example"

Gerrit

Next you need to create an account for Gerrit. To do this navigate to gerrit.wikimedia.org, and click on the Register link in the top right. This will then take you to wikitech.wikimedia.org where you must create your account!

Once you have created an account and logged in, you must add an SSH key. Go to your settings (again in the top right) and navigate to “SSH Public Keys”.

To generate a key do the following on your machine:

ssh-keygen -t rsa -C "example@example.com"

You should then be able to get your key from “~/.ssh/id_rsa.pub” (or the location you chose) and then add it to Gerrit.

Getting the code

Now that you have Git installed and have added your SSH key to Gerrit, you can use SSH to clone the code repository onto your local machine. Again, you can read the docs for this on the git-scm website.

git clone ssh://<USERNAME>@gerrit.wikimedia.org:29418/mediawiki/core

When logged in you can see the command at https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/core

Making a commit

Now that you have the code cloned locally, you can go ahead and change the files!

Once you have made your changes, you should be able to review them using the following command:

git diff

You can then add all changed (tracked) files to a commit by doing the following:

git commit -a

A text editor should then open, where you should enter a commit message, for example:

Fixing issue with unset new article flag

Some extra description can go here, but you
should try and keep your lines short!
A bug number can be linked at the bottom of
the commit as shown.

Bug: T12345

Once you have saved the text, you will have made your commit! Now try to push it as a change set to Gerrit (although, as it’s your first time, this will fail)!

You should get a message saying that you are missing a Change-Id in the commit message footer! This lovely message also contains the command that you need to run in order to fix the issue!

gitdir=$(git rev-parse --git-dir); scp -p -P 29418 <username>@gerrit.wikimedia.org:hooks/commit-msg ${gitdir}/hooks/

This creates a hook file in the .git directory of this repo that will automatically add a Change-Id in the future! To get a Change-Id into your current commit message, run:

git commit -a --amend --no-edit
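
After amending, your commit message will have gained a Change-Id footer. The ID itself is generated per commit, so the one below is purely illustrative:

Fixing issue with unset new article flag

Some extra description can go here.

Bug: T12345
Change-Id: I8473b95934b5732ac55d26311a706c9c2bde9940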

And now you are ready to actually push your commit for review!

git push origin HEAD:refs/publish/master

Your change should now be on Gerrit!

Your local master branch is now one commit ahead of the remote master, so to clean up and reset your local repo to the same state as the remote, just run:

git reset --hard origin/master

You can always get back to your commit by using its hash with the “git checkout” command, or you can copy the fetch and checkout command from the Gerrit UI; it looks something like this:

git fetch https://gerrit.wikimedia.org/r/mediawiki/core refs/changes/74/257074/1 && git checkout FETCH_HEAD

Amending your change

If people comment on your commit on Gerrit, you may want to change it, fixing the issues that they have pointed out.

To do this, check out your change again as described above, either using the hash locally or the fetch & checkout command you can copy from the Gerrit UI.

git checkout d50ca328033702ced91947e60939e3550ca0212a
//OR
git fetch https://gerrit.wikimedia.org/r/mediawiki/core refs/changes/74/257074/1 && git checkout FETCH_HEAD

Make your changes to the files.

Amend the commit (you can add --no-edit if you do not want to edit the commit message):

git commit -a --amend

And push the patch again!

git push origin HEAD:refs/publish/master

Notes

This post covers the bare necessities for submitting a patch to Gerrit and responding to comments. There are many things it does not cover, such as Git in any depth, rebasing, drafts, cherry-picks, merge resolution, etc.

Also, I should point out that Gerrit is going to be disappearing very soon in favour of Differential, so there may have been little point in me writing this, but someone asked!

If you do not want to use git-review to contribute to Wikimedia or Gerrit projects, then the most important thing to draw from this post is the under-advertised “git push origin HEAD:refs/publish/master” command!

by addshore at January 30, 2016 11:45 PM

Reducing packet count to Statsd using Mediawiki

Recently I have been spending a lot of time looking at the Wikimedia Graphite set-up while working on Grafana dashboards. In exchange for what some people had been doing for me, I decided to take a quick look down the list of open Graphite tickets and found T116031. Sometimes it is great when such a small fix can have such a big impact!

After digging through the code I eventually discovered that the method which sends MediaWiki metrics to statsd is SamplingStatsdClient::send. This method is an overridden version of StatsdClient::send, which is provided by liuggio/statsd-php-client. However, a bug has existed in the sampling client ever since its creation!

The fix for the bug can be found on Gerrit and is only a +10 -4 line change (only 2 of those lines were actually code).

// Before: the reduced data was discarded and the unreduced
// $messages array was sent, so metrics were never combined
// into fewer packets.
$data = $this->sampleData( $data );
$messages = array_map( 'strval', $data );
$data = $this->reduceCount( $data );
$this->send( $messages );

// After: the reduction is applied to the same array that is sent.
$data = $this->sampleData( $data );
$data = array_map( 'strval', $data );
$data = $this->reduceCount( $data );
$this->send( $data );

The result of deploying this fix on the Wikimedia cluster can be seen below.

Decrease in packets when deploying fixed Mediawiki Statsd client

You can see a reduction from roughly 85kpps to 25kpps at the point of deployment, a decrease of roughly 70%!

Decrease in bytes received after the MediaWiki statsd client fix was deployed

A decrease in bytes received can also be seen, even though the same number of metrics are being sent. This is due to the reduction in packet overhead, a drop of roughly 1MBps at deployment.

The little things really are great. Now to see if we can reduce that packet count even more!

by addshore at January 30, 2016 11:44 PM

Wikidata references from Microdata

Recently some articles appeared on the English Wikipedia Signpost about Wikidata (1, 2, 3). Reading these articles, especially the second and third, pushed me to try to make a dent in the ‘problem’ of references on Wikidata. It turns out that this is actually not that hard!

Script overview

I have written a script as part of my addwiki libraries and the ‘aww’ command line tool (still to be fully released). The main code for this specific command, in its current version, can be found here.

The script can be passed either a single Item ID or some SPARQL matchers as shown below:

aww wm:wd:ref --item Q464933

OR

aww wm:wd:ref --sparql P31:Q11424 --sparql P161:?

The script will then either act on the single item, if one was passed, or perform a SPARQL query to retrieve a list of Item IDs.

Each Item is then loaded and its type is checked (using instance of) against a list of configured values, currently Q5 (human) and Q11424 (film), which are in turn mapped to the schema.org types Person and Movie. For each type there is then a further mapping of Wikidata properties to schema.org properties, for example P19 (place of birth) to ‘birthPlace’ for humans and P57 (director) to ‘director’ for films. These mappings can be used to check microdata on web pages against the data contained in Wikidata.

Microdata is collected by loading all of the external links used on all of the Wikipedia articles for the loaded Item and parsing the HTML. When all of the checks succeed and the data on Wikidata matches the microdata, a reference is added.
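
To make the mapping idea a little more concrete, here is a minimal, self-contained sketch in PHP (plain arrays and a made-up helper function, not the actual addwiki code):

// Wikidata 'instance of' values mapped to schema.org types
$typeMap = [
    'Q5' => 'Person',     // human
    'Q11424' => 'Movie',  // film
];

// Per-type mapping of Wikidata properties to schema.org properties
$propertyMap = [
    'Person' => [ 'P19' => 'birthPlace' ],
    'Movie' => [ 'P57' => 'director' ],
];

// Hypothetical check: only add a reference when the value stated on
// Wikidata also appears in the microdata scraped from the web page.
function matchesMicrodata( $wikidataValue, array $microdata, $schemaProperty ) {
    return isset( $microdata[$schemaProperty] )
        && $microdata[$schemaProperty] === $wikidataValue;
}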

Example command line output

As you can see, the total number of references added for the three items shown in the example above was 55; the diffs are linked below.

 

Further development

  • More types: As explained above, the script currently only works for people and films, but both Wikidata and schema.org cover far more than this, so the script could likely be expanded easily in these areas.
  • More property maps: Currently there are still many properties on both schema.org and Wikidata for the enabled types that lack a mapping.
  • Better sourcing of microdata: The current approach to finding microdata is simply to load all Wikipedia external links and hope that some of them contain microdata. This is network intensive and currently the slowest part of the script. It is possible to create custom Google search engines that match a specific schema.org type, for example films, and search the web for pages containing microdata. However, there is no ‘nice’ API for search queries like this (hint hint, Google).
  • Why stop at microdata: Other standards for structured data in web pages exist, so they could also be covered.

Other thoughts

This is another step in the right direction in terms of fixing things on a large scale. This is the beauty of having machine-readable data in Wikidata and the larger web.

Being able to add references en masse has reminded me how much duplicate information the current reference system includes. For example, a single Item could have 100 statements, each of which can be referenced to the same web page. That reference data must then be included 100 times!

 

by addshore at January 30, 2016 11:43 PM

Mediawiki Developer Summit 2016

The Wikimedia Developer Summit is an event with an emphasis on the evolution of the MediaWiki architecture and the Wikimedia Engineering goals for 2016. Last year the event was called the MediaWiki Developer Summit.

As with last year the event took place in the Mission Bay Center, San Francisco, California. The event was slightly earlier this year, positioned at the beginning of January instead of the end. The event format changed slightly compared with the previous year and also included a 3rd day of general discussion and hacking in the WMF offices. Many thanks to everyone that helped to organise the event!

I have an extremely long list of things to do that spawned from discussions at the summit, but as a summary of what happened, below are some of the more notable scheduled discussions:

T119032 & T114320 – Code-review migration to Differential

Apparently this may mostly be complete in the next 6 months? Or at least the migration will be well under way. The Differential workflow is rather different to the one we have been forced into using with Gerrit. Personally I think the change will be a good one, and I also cannot wait to be rid of git-review!

T119403 – Open meeting between the MediaWiki Stakeholders Group and the Wikimedia Foundation

There was lots of discussion during this session, although many things were repeated that had previously been said at other events. Toward the end of the session it was again proposed that a MediaWiki Foundation of some description might be the right way to go, and it looks as if this might start moving forward in the coming months or year (see the notes).

Over the past years MediaWiki core development has been rather disjointed: the WMF assigned a core team, then dissolved said core team, and as a result responsibilities have been scattered and generally unknown. Having a single organization concentrating on the software, covering use cases the WMF doesn’t care about, could be a great step forward for MediaWiki.

T119022 – Content Format

The notes for this session can be found here; it covered many RFCs, such as multi-content revisions, balanced templates, and the general evolution of the content format. Lots of super interesting things were discussed here, all pushing MediaWiki in the right direction (in my opinion).

T113210 – How should Wikimedia software support non-Wikimedia deployments of its software?

Notes can be found here. Interesting points include:

  • “Does MediaWiki need a governance structure outside of Wikimedia?” which ties in with the stakeholders discussion above and a potential Mediawiki foundation.
  • “How can we make extension compatibility work between versions?”. Over the past year or so some work has gone into this, and progress is slowly being made with extension registration in MediaWiki and advances in the ExtensionDistribution extension. Still a long way to go.
  • “Should Wikimedia fork MediaWiki?”. Sounds like this could get ugly :/
  • “Do we need to stick with a LAMP stack? Could we decide that some version in the not distant future will be the last “pure PHP” implementation?”. I can see a lot of the user base being lost if this were to happen…

#Source-Metadata meetup

Lots of cool stuff is going to be happening with DOIs and Wikidata! (Well more than just DOIs, but DOIs to start). Watch this space!

by addshore at January 30, 2016 11:42 PM

Jeroen De Dauw

Missing in PHP7: function references

This is the first post in my Missing in PHP7 series.

Over time, PHP has improved its capabilities with regard to functions. As of PHP 5.3 you can create anonymous functions, and as of 5.4 you can use the callable type hint. However, referencing a function still requires using a string.

call_user_func( 'globalFunctionName' );          // global function
call_user_func( 'SomeClass::staticFunction' );   // static method
call_user_func( [ $someObject, 'someMethod' ] ); // instance method

Unlike in languages such as Python, which do provide proper function references, tools provide no support whatsoever for the PHP approach. No autocompletion or type checking in your IDE. No warnings from static code analysis tools. No “find usages” support.

Example

A common place where I run into this limitation is when I have a method that needs to return a modified version of an input array.

public function someStuff( array $input ) {
    $output = [];

    foreach ( $input as $element ) {
        $output[] = $this->somePrivateMethod( $element );
    }

    return $output;
}

In such cases array_map and similar higher order functions are much nicer than creating additional state, doing an imperative loop and a bunch of assignments.

public function someStuff( array $input ) {
    return array_map(
        [ $this, 'somePrivateMethod' ],
        $input
    );
}

I consider the benefit of tool support big enough to prefer the following code over the above:

public function someStuff( array $input ) {
    return array_map(
        function( $element ) {
            return $this->somePrivateMethod( $element );
        },
        $input
    );
}

This does make the already hugely verbose array_map even more verbose, and makes this one of those scenarios where I go “really PHP? really?” when I come across it.

Related: class references

A similar stringly-typed problem in PHP used to be creating mocks in PHPUnit. Which of course is not a PHP problem (in itself), though still something affecting many PHP projects.

$kittenRepo = $this->getMock( 'Awesome\Software\KittenRepo' );

This causes the same types of problems as the lack of function references. If you now rename or move KittenRepo, tools will not update these string references. If you try to find usages of the class, you’ll miss this one, unless you do string search.

Luckily PHP 5.5 introduced the ::class construct, which allows doing the following:

$kittenRepo = $this->getMock( KittenRepo::class );

Here KittenRepo has been imported with a use statement.
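
For completeness, this is what the import and the resolved constant look like together (KittenRepo is the fictional class from the example above; the constant resolves at compile time, even if the class is never loaded):

use Awesome\Software\KittenRepo;

echo KittenRepo::class; // Awesome\Software\KittenRepo

// which makes the mock line above refactoring-friendly:
// $kittenRepo = $this->getMock( KittenRepo::class );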

by Jeroen at January 30, 2016 10:28 PM

January 29, 2016

Liam Wyatt (Witty Lama)

Strategy and controversy, part 2

It’s been a busy time at Wikimedia Foundation HQ since my first post in this series, which summarised the several simultaneous controversies and attempted to draw a coherent connecting line between them. The most visible change is Arnnon Geshuri agreeing to vacate his appointed seat on the WMF Board of Trustees after sustained pressure, including a community petition, several former Board members speaking out, and mainstream media attention – as summarised in The Signpost. This departure is notwithstanding the entirely unconventional act of Silicon Valley native Guy Kawasaki in voting against the petition to the Board, despite the fact that he is on the Board and that this was effectively his first public action relating to Wikimedia since receiving that appointment – as I described on Meta.

Although this news about Geshuri was well received, I feel that this controversy became the flash point because it was easily definable, and had a binary decision associated with it – go or stay. Most problems aren’t so neatly resolvable. Hopefully then, the fact that it is mostly resolved (pending the now highly sensitive task of finding his replacement) should allow focus to be drawn back to more fundamental issues of leadership.

Earlier this month The Signpost published details from the internal WMF staff survey:

We understand that there was a healthy 93% response rate among some 240 staff. While numbers approached 90% for pride in working at the WMF and confidence in line managers, the responses to four propositions may raise eyebrows:

  • Senior leadership at Wikimedia have communicated a vision that motivates me: 7% agree
  • Senior leadership at Wikimedia keep people informed about what is happening: 7% agree
  • I have confidence in senior leadership at Wikimedia: 10% agree
  • Senior leadership effectively directs resources (funding, people and effort) towards the Foundation’s goals: 10% agree

The Signpost has been informed that among the “C-levels” (members of the executive), only one has confidence in senior leadership.

A week later the head of the HR department Boryana Dineva – the person with arguably the most difficult job at the WMF right now – gave a summary of that survey in the publicly recorded monthly metrics meeting – starting at 42 minutes in:

Notice the complete absence of mention of the part of the survey which was highlighted by the Signpost? You’re not the only one. In the following Q&A came a question from Frances Hocutt, later paraphrased on-wiki by Aaron Halfaker – “Why are we not speaking clearly about the most concerning results of the engagement survey?”. Starting at 56 minutes in:

It is my supposition that the extremely low confidence in senior leadership among the staff, including among the “C-levels”, is directly connected to both:

  1. a lack of clarity in the organisation’s strategic direction, following a long period since the previous strategy expired and several false starts (such as the 2-question survey), leading to sudden and unexplained departmental re-organisations and delays in the current process.
  2. the organisation’s recent apparent failures to abide by its own organisation Values. Notably in this case, the values of “independence”, “diversity”, and “transparency”.

Anne Clin – better known to Wikimedians as Risker – neatly tied these two threads together earlier this month in her keynote to the WMF annual all-staff meeting. In a speech entitled “Keep your eye on the Mission” she stated:

Wikimedia watchers have known for quite a while that the Foundation has decided that search and discovery should be a strategic priority. It’s not clear on what this decision has been based, although one could shoe-horn it into the mission under disseminating information effectively and globally. It wasn’t something that was fleshed out during the 2015 Strategy community consultation a year ago, and it wasn’t discussed in the Call to Action. The recent announcement about the Knight Foundation grant tells us it is for short-term funding to research and prototype improvements to how people “discover” information on Wikimedia projects. No doubt Search and Discovery, which already has a large number of talented staff affiliated with it, will show up near the top of proposed strategic priorities next week when they are announced to the community – and will be assigned a sizeable chunk of the 2016-17 budget. The results of the Knight Foundation funded research probably won’t be available early enough to use it for budgeting purposes.

This is the only picture I can find of that speech – Anne at the lectern discussing “the board” :-)

Arguably, she actually got that prediction wrong. Of 18 different approaches identified in the now-public strategic planning consultation process only one of them seems directly related to the search and discovery team’s work: “Explore ways to scale machine-generated, machine-verified and machine-assisted content“. It is also literally the last of the 18 topics listed (6 in each of reach, communities and knowledge) and is softened with the verb “explore” (rather than other items which have firmer targets to “increase”, “provide”, etc.). This quasi-hidden element of the strategy therefore invites the question – if this is such a small part of the documented strategy, why is “Discovery” receiving such disproportionate staffing, funding, attention? All of the projects listed on their portal and their three year plan are desirable and welcome, but the team is clearly staffed-up in preparation for significantly more ambitious efforts.

Anne again:

This mission statement was last revised in November 2012 – it is incorporated into the bylaws of the Wikimedia Foundation. And this revision of the mission statement occurred shortly after what many of us remember as the “narrowing focus” decision. Notice what isn’t included in the mission statement:

Not a word about the Wikimedia Foundation being a “tech and grantmaking organization”. While it is quite true that the bulk of the budget is directly linked to these two areas, the Board continues to recognize that the primary mission is dissemination of educational material, not technology or grants….

…Engineering – or as it is now called, “Product”, had three significant objectives set for it back in late 2012: develop Visual Editor, develop Mobile, and make a significant dent in the longstanding technical debt. The first two have come a long way – not without hiccups, but there’s been major progress. And there has been some work on the technical debt – HHVM being only one significant example. But the MediaWiki core is still full of crufty code, moribund and unloved extensions, and experiments that went nowhere. That’s not improving significantly; in fact, we’re seeing the technical debt start to build as new extensions are added that lose their support when someone changes teams or they leave the organization. Volunteer-managed extensions and tools suffer entropy when the volunteer developer moves on, and there’s no plan to effectively deprecate the software or to properly integrate and support it. There’s no obvious plan to maintain and improve the core infrastructure; instead the talk is all of new extensions, new PRODUCTS. From the outside, it looks like the Foundation is busy building detours instead of fixing the potholes in the highways.

It is my understanding that the original grant request to the Knight Foundation was MUCH larger than the $250,000 actually received. Jimmy Wales declared that concerns about the details of this grant are a “red herring” and that ousted Board member James Heilman’s concerns about transparency are “utter fucking bullshit” (causing James to announce he will soon be providing proof of his claims). Hopefully the grant agreement itself will be published soon, as Jimmy implied, so we can actually know what it is that has been promised.

It is worth noting that the “Call to action” mentioned above was part of the mid-2015 to mid-2016 Annual Plan, but that the risk assessment component of that plan was only published this week. Presumably this was written at the time but unintentionally left off the final publication. Nevertheless, it includes some rather ironic statements when read in hindsight:

Risk: Failure to create a strong, consistent values-based work culture could cause valued staff to leave.

Mitigation strategies:

  • Establish initiatives that support our commitment to diversity and creating spaces for constructive, direct and honest communications.
  • Communicate and listen effectively with staff on values and initiatives undertaken.

Significantly, the WMF’s Statement of Purpose as described in its own bylaws, states that it will perform its mission “In coordination with a network of individual volunteers and our independent movement organizations, including recognized Chapters, Thematic Organizations, User Groups, and Partners”. This corresponds to the last of the official organisation Values: “Our community is our biggest asset”. At its meeting this weekend, the Board will have to determine whether the current executive leadership can demonstrate adherence to these avowed values – particularly coordination and transparency of its vision to the community and the staff – and is fit to deliver this latest strategy process.

[The first post in this Montgomerology* series “Strategy and Controversy” was published on January 8.]

Edit: Within an hour of publishing this blog post, and one day before the board meeting, a “background on the Knowledge Engine grant” has been published on Lila’s talkpage.

*Montgomerology: The pseudo-science of interpreting the meaning of signals emanating from the WMF headquarters on New Montgomery St., San Francisco; cf. Vaticanology or Kremlinology.


by wittylama at January 29, 2016 11:07 PM

Wikimedia Foundation

Wikimedia Research Newsletter, January 2016

Wikimedia Research Newsletter

Vol: 6 • Issue: 01 • January 2016 [contribute] [archives] Syndicate the Wikimedia Research Newsletter feed

Bursty edits; how politics beat religion but then lost to sports; notability as a glass ceiling

With contributions by: Brian Keegan, Piotr Konieczny, and Tilman Bayer

Burstiness in Wikipedia editing

Reviewed by Brian Keegan

Wikipedia pages are edited with varying levels of consistency: stubs may only have a dozen or fewer revisions and controversial topics might have more than 10,000 revisions. However, this editing activity is not evenly spaced out over time either: some revisions occur in very quick succession while other revisions might persist for weeks or months before another change is made. Many social and technical systems exhibit “bursty” qualities of intensive activity separated by long periods of inactivity. In a pre-print submitted to arXiv, a team of physicists at the Belgian Université de Namur and Portuguese University of Coimbra examine this phenomenon of “burstiness” in editing activity on the English Wikipedia.[1]

The authors use a database dump containing the revision history until January 2010 of 4.6 million English Wikipedia pages. Filtering out pages and editors with fewer than 2000 revisions, bots, and edits from unregistered accounts, the paper adopts some previously-defined measures of burstiness and cyclicality in these editing patterns. The measures of editors’ revisions’ burstiness and memory fall outside of the limits found in prior work about human dynamics, suggesting different mechanisms are at work on Wikipedia editing than in mobile phone communication, for example.

Using a fast Fourier transform, the paper finds the 100 most active editors have signals occurring at a 24-hour frequency (and associated harmonics) indicating they follow a circadian pattern of revising daily as well as differences by day of week and hour of day. However, the 100 most-revised pages lack a similar peak in the power spectrum: there is no characteristic hourly, daily, weekly, etc. revision pattern. Despite these circadian patterns, editors’ revision histories still show bursty patterns with long-tailed inter-event times across different time windows.

The paper concludes by arguing, “before performing an action, we must overcome a “barrier”, acting as a cost, which depends, among many other things, on the time of day. However, once that “barrier” has been crossed, the time taken by that activity no longer depends on the time of day at which we decided to perform it. … It could be related to some sort of queuing process, but we prefer to see it as due to resource allocation (attention, time, energy), which exhibits a broad distribution: shorter activities are more likely to be executed next than the longer ones.”

Emerging trends based on Wikipedia traffic data and contextual networks

Reviewed by Brian Keegan

Google Trends is widely used in academic research to model the relationship between information seeking and other social and behavioral phenomena. However, Wikipedia pageview data can provide a superior – if underused – alternative that has attracted some attention for public health and economic modeling, but not to the same extent as Google Trends. The authors cite the relative openness of Wikipedia pageview data, the semantic disambiguation, and absolute counts of activity, in contrast to Google Trends’ closed API, semantic ambiguity of keywords, and relative query share data. However, Trends data (at a weekly level) does go back to 2004, while pageview data (at an hourly level) is only available from 2008.

In a peer-reviewed paper published by PLoS ONE, a team of physicists perform a variety of time series analyses to evaluate changes in attention around the “big data” topic of Hadoop.[2] Defining two key constructs of relevance and representation based on the interlanguage links as well as hyperlinks to/from other concepts, they examine changes in these features over time. In particular, changes in the articles’ content and attention occurred in concert with the release of new versions and the adoption of the technology by new firms.

The time series analyses (and terms used to refer to them) will be difficult for non-statisticians to follow, but the paper makes several promising contributions. First, it provides a number of good critiques of research relying exclusively on Google Trends data (outlined above). Second, it provides some methods for incorporating behavioral data from strongly related topics and examining these changes over time in a principled manner. Third, the paper examines behavior across multiple language editions rather than focusing solely on the English Wikipedia. The paper points to ways in which Wikipedia is an important information source for tracking publication and recognition of new topics.

“Hidden revolution of human priorities: An analysis of biographical data from Wikipedia”

Reviewed by Piotr Konieczny

This paper[3] data mines Wikipedia’s biographies, focusing on individuals’ longevity, profession and cause of death. The authors are not the first to observe that the majority of Wikipedia biographies are about sportspeople (half of them soccer players), followed by artists and politicians. But they do make some interesting historical observations, such as that sport rises only in the 20th century (particularly from the 1990s), and that politics surpassed religion in the 13th century, only to be surpassed by sport later, and so on. The authors divide the biographies into public (politicians, businessmen, religion) and private (artists and sportspeople) and note that it was only in the last few decades that the second group started to significantly outnumber the first; they conclude that this represents a major shift in societal values, which they refer to as a “hidden revolution in human priorities”. It is an interesting argument, though the paper unfortunately completely omits discussion of some important topics, such as the possible bias introduced by Wikipedia’s notability policies.

“Women through the glass-ceiling: gender asymmetries in Wikipedia”

Reviewed by Piotr Konieczny

This paper[4] looks into gender inequalities in Wikipedia articles, presenting a computational method for assessing gender bias in Wikipedia along several dimensions. It touches on a number of interesting questions, such as whether the same rules are used to determine whether women and men are notable, whether there is linguistic bias, and whether articles about men and women have similar structural properties (e.g., similar metadata and network properties in the hyperlink network).

They conclude that notability guidelines seem to be more strictly enforced for women than for men, that linguistic bias exists (e.g., one of the four words most strongly associated with female biographies is “husband”, whereas such family-oriented words are much less likely to be found in biographies of male subjects), and that as the majority of biographies are about men and men tend to link more to men than to women, this lowers the visibility of female biographies (for example, in search engines like Google). The authors suggest that the Wikipedia community should consider lowering notability requirements for women (controversial), and adding gender-neutral language requirements to the Manual of Style (a much more sensible proposal).

Briefly

Wikipedia influences medical decisionmaking in acute and critical care

Reviewed by Tilman Bayer

A survey[5] of 372 anesthetists and critical care providers in Austria and Australia found that “In order to get a fast overview about a medical problem, physicians would prefer Google (32%) over Wikipedia (19%), UpToDate (18%), or PubMed (17%). 39% would, at least sometimes, base their medical decisions on non peer-reviewed resources. Wikipedia is used often or sometimes by 77% of the interns, 74% of residents, and 65% of consultants to get a fast overview of a medical problem. Consulting Wikipedia or Google first in order to get more information about the pathophysiology, drug dosage, or diagnostic options in a rare medical condition was the choice of 66%, 10% or 34%, respectively.” (A 2012 literature review found that “Wikipedia is widely used as a reference tool” among clinicians.)

Other recent publications

A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.

Papers about medical content on Wikipedia and its usage

  • “How do Twitter, Wikipedia, and Harrison’s Principles of Medicine describe heart attacks?”[6] From the abstract: “For heart attacks, the chapters from Harrison’s had higher Jaccard similarity to Wikipedia than Braunwald’s or Twitter. For palpitations, no pair of sources had a higher Jaccard (token) similarity than any other pair. For no source was the Jaccard (token) similarity attributable to semantic similarity. This suggests that technical and popular sources of medical information focus on different aspects of medicine, rather than one describing a simplified version of the other.”
  • “Information-seeking behaviour for epilepsy: an infodemiological study of searches for Wikipedia articles”[7] From the abstract: “Fears and worries about epileptic seizures, their impact on driving and employment, and news about celebrities with epilepsy might be major determinants in searching Wikipedia for information.”
  • “Wikipedia and neurological disorders”[8] From the abstract: “We determined the highest search volume peaks to identify possible relation with online news headlines. No relation between incidence or prevalence of neurological disorders and the search volume for the related articles was found. Seven out of 10 neurological conditions showed relations in search volume peaks and news headlines. Six of these seven peaks were related to news about famous people suffering from neurological disorders, especially those from showbusiness. Identification of discrepancies between disease burden and health seeking behavior on Wikipedia is useful in the planning of public health campaigns. Celebrities who publicly announce their neurological diagnosis might effectively promote awareness programs, increase public knowledge and reduce stigma related to diagnoses of neurological disorders.”
  • “Medical student preferences for self-directed study resources in gross anatomy”[9] From the abstract: “To gain insight into preclinical versus clinical medical students’ preferences for SDS resources for learning gross anatomy, […] students were surveyed at two Australian medical schools, one undergraduate-entry and the other graduate-entry. Lecture/tutorial/practical notes were ranked first by 33% of 156 respondents (mean rank ± SD, 2.48 ± 1.38), textbooks by 26% (2.62 ± 1.35), atlases 20% (2.80 ± 1.44), videos 10% (4.34 ± 1.68), software 5% (4.78 ± 1.50), and websites 4% (4.24 ± 1.34). Among CAL resources, Wikipedia was ranked highest.”

Papers analyzing community processes and policies

  • “Transparency, control, and content generation on Wikipedia: editorial strategies and technical affordances”[10] From the abstract: “Even though the process of social production that undergirds Wikipedia is rife with conflict, power struggles, revert wars, content transactions, and coordination efforts, not to mention vandalism, the article pages on Wikipedia shun information gauges that highlight the social nature of the contributions. Rather, they are characterized by a “less is more” ideology of design, which aims to maximize readability and to encourage future contributions. … Closer investigation reveals that the deceivingly simple nature of the interface is in fact a method to attract new collaborators and to establish content credibility. As Wikipedia has matured, its public notoriety demands a new approach to the manner in which Wikipedia reflects the rather complex process of authorship on its content pages. This chapter discusses a number of visualizations designed to support this goal, and discusses why they have not as yet been adopted into the Wikipedia interface.”
  • “Policies for the production of content in Wikipedia, the free encyclopedia”[11] From the abstract: “It is a case study with qualitative approach that had Laurence Bardin‘s content analysis as theoretical and methodological reference.”
  • “Validity claims of information in face of authority of the argument on Wikipedia”[12] From the abstract: “proposes to approach the claims of validity made by Jürgen Habermas in the face of the authority of the better argument. It points out that Wikipedia is built as an emancipatory discourse according to Habermas’ argumentative discourse considering the process of discursive validation of information.”
  • “Wikipedia and history: a worthwhile partnership in the digital era?”[13]
  • “Is Wikipedia really neutral? A sentiment perspective study of war-related Wikipedia articles since 1945”[14] From the abstract: “The results obtained so far show that reasons such as people’s feelings of involvement and empathy can lead to sentiment expression differences across multilingual Wikipedia on war-related topics; the more people contribute to an article on a war-related topic, the more extreme sentiment the article will express; different cultures also focus on different concepts about the same war and present different sentiments towards them.”
  • “The heart work of Wikipedia: gendered, emotional labor in the world’s largest online encyclopedia”[15] (CHI 2015 Best Papers award, slides)
  • “Knowledge quality of collaborative editing in Wikipedia: an integrative perspective of social capital and team conflict”[16] From the abstract: “Despite the abundant researches on Wikipedia, to the best of our knowledge, no one has considered the integration of social capital and conflict. […] our study proposes the nonlinear relationship between task conflict and knowledge quality instead of linear relationships in prior studies. We also postulate the moderating effect of task complexity. […] This paper aims at proposing a theoretical model to examine the effect of social capital and conflict, meanwhile taking the task complexity into account.”

Papers about visualizing or mining Wikipedia content

  • “Visualizing Wikipedia article and user networks: extracting knowledge structures using NodeXL”[17]
  • “Utilising Wikipedia for text mining applications”[18] From the abstract: “Wikipedia … has proven to be one of the most valuable resources in dealing with various problems in the domain of text mining. However, previous Wikipedia-based research efforts have not taken both Wikipedia categories and Wikipedia articles together as a source of information. This thesis serves as a first step in eliminating this gap and throughout the contributions made in this thesis, we have shown the effectiveness of Wikipedia category-article structure for various text mining tasks. … First, we show the effectiveness of exploiting Wikipedia for two classification tasks i.e., 1- classifying the tweets being relevant/irrelevant to an entity or brand, 2- classifying the tweets into different topical dimensions such as tweets related with workplace, innovation, etc. To do so, we define the notion of relatedness between the text in tweet and the information embedded within the Wikipedia category-article structure.”
  • “Integrated parallel sentence and fragment extraction from comparable corpora: a case study on Chinese-Japanese Wikipedia”[19] From the abstract: “A case study on the Chinese–Japanese Wikipedia indicates that our proposed methods outperform previously proposed methods, and the parallel data extracted by our system significantly improves SMT [statistical machine translation] performance.”
  • “How structure shapes dynamics: knowledge development in Wikipedia – a network multilevel modeling approach”[20] From the abstract: “The data consists of the articles in two adjacent knowledge domains: psychology and education. We analyze the development of networks of knowledge consisting of interlinked articles at seven snapshots from 2006 to 2012 with an interval of one year between them. Longitudinal data on the topological position of each article in the networks is used to model the appearance of new knowledge over time. […] Using multilevel modeling as well as eigenvector and betweenness measures, we explain the significance of pivotal articles that are either central within one of the knowledge domains or boundary-crossing between the two domains at a given point in time for the future development of new knowledge in the knowledge base.” (cf. earlier paper coauthored by the same researchers: “Knowledge Construction in Wikipedia: A Systemic-Constructivist Analysis”)

References

  1. Gandica, Yerali; Carvalho, Joao; Aidos, Fernando Sampaio Dos; Lambiotte, Renaud; Carletti, Timoteo (2016-01-05). “On the origin of burstiness in human behavior: The wikipedia edits case”. arXiv:1601.00864 [physics]. 
  2. Kämpf, Mirko; Tessenow, Eric; Kenett, Dror Y.; Kantelhardt, Jan W. (2015-12-31). “The Detection of Emerging Trends Using Wikipedia Traffic Data and Context Networks”. PLoS ONE 10 (12): e0141892. doi:10.1371/journal.pone.0141892. 
  3. Reznik, Ilia; Shatalov, Vladimir (February 2016). “Hidden revolution of human priorities: An analysis of biographical data from Wikipedia”. Journal of Informetrics 10 (1): 124–131. doi:10.1016/j.joi.2015.12.002. ISSN 1751-1577.  Closed access
  4. Wagner, Claudia; Graells-Garrido, Eduardo; Garcia, David (2016-01-19). “Women Through the Glass-Ceiling: Gender Asymmetries in Wikipedia”. arXiv:1601.04890 [cs]. Jupyter notebooks
  5. Rössler, B.; Holldack, H.; Schebesta, K. (2015-10-01). “Influence of wikipedia and other web resources on acute and critical care decisions. a web-based survey”. Intensive Care Medicine Experimental 3 (Suppl 1): –867. doi:10.1186/2197-425X-3-S1-A867. ISSN 2197-425X.  (Poster presentation)
  6. Devraj, Nikhil; Chary, Michael (2015). “How Do Twitter, Wikipedia, and Harrison’s Principles of Medicine Describe Heart Attacks?”. Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics. BCB ’15. New York, NY, USA: ACM. pp. 610–614. doi:10.1145/2808719.2812591. ISBN 978-1-4503-3853-0. 
  7. Brigo F, Otte WM, Igwe SC, Ausserer H, Nardone R, Tezzon F, Trinka E. Information-seeking behaviour for epilepsy: an infodemiological study of searches for Wikipedia articles. Epileptic Disorders, 2015 Dec 1;17(4):460 DOI:10.1684/epd.2015.0772 Closed access
  8. Brigo, Francesco; Igwe, Stanley C.; Nardone, Raffaele; Lochner, Piergiorgio; Tezzon, Frediano; Otte, Willem M. (July 2015). “Wikipedia and neurological disorders”. Journal of Clinical Neuroscience: Official Journal of the Neurosurgical Society of Australasia 22 (7): 1170–1172. doi:10.1016/j.jocn.2015.02.006. ISSN 1532-2653. PMID 25890773. 
  9. Choi-Lundberg, Derek L.; Low, Tze Feng; Patman, Phillip; Turner, Paul; Sinha, Sankar N. (2015-05-01). “Medical student preferences for self-directed study resources in gross anatomy”. Anatomical Sciences Education: n/a. doi:10.1002/ase.1549. ISSN 1935-9780.  Closed access
  10. Matei, Sorin Adam; Foote, Jeremy (2015). “Transparency, Control, and Content Generation on Wikipedia: Editorial Strategies and Technical Affordances”. In Sorin Adam Matei, Martha G. Russell, Elisa Bertino (eds.). Transparency in Social Media. Computational Social Sciences. Springer International Publishing. pp. 239–253. ISBN 978-3-319-18551-4.  Closed access
  11. Sandrine Cristina de Figueirêdo Braz, Edivanio Duarte de Souza: Políticas para produção de conteúdos na Wikipédia, a enciclopédia livre (Policies For The Production Of Contents In The Wikipedia, The Free Encyclopedia). In: ENCONTRO NACIONAL DE PESQUISA EM CIÊNCIA DA INFORMAÇÃO, 15., 2014, Belo Horizonte. Anais … Belo Horizonte: UFMG, 2014. PDF (in Portuguese, with English abstract)
  12. Marcio Gonçalves, Clóvis Montenegro de Lima: Pretensões de validade da informação diante da autoridade do argumento na wikipédia (Validity claims of information in face of authority of the argument on wikipedia). In: ENCONTRO NACIONAL DE PESQUISA EM CIÊNCIA DA INFORMAÇÃO, 15., 2014, Belo Horizonte. Anais … Belo Horizonte: UFMG, 2014. PDF (in Portuguese, with English abstract)
  13. Phillips, Murray G. (2015-10-07). “Wikipedia and history: a worthwhile partnership in the digital era?”. Rethinking History 0 (0): 1–21. doi:10.1080/13642529.2015.1091566. ISSN 1364-2529.  Closed access
  14. Yiwei Zhou, Alexandra I. Cristea and Zachary Roberts: Is Wikipedia really neutral? A sentiment perspective study of war-related Wikipedia articles since 1945. 29th Pacific Asia Conference on Language, Information and Computation pages 160–68. Shanghai, China, October 30 – November 1, 2015 PDF
  15. Menking, Amanda; Erickson, Ingrid (2015). “The heart work of Wikipedia: gendered, emotional labor in the world’s largest online encyclopedia”. Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. CHI ’15. New York, NY, USA: ACM. pp. 207–210. doi:10.1145/2702123.2702514. ISBN 978-1-4503-3145-6.  Closed access , also as draft version on Wikimedia Commons
  16. Zhan, Liuhan; Wang, Nan; Shen, Xiao-Liang; Sun, Yongqiang (2015-01-01). “Knowledge quality of collaborative editing in Wikipedia: an integrative perspective of social capital and team conflict”. PACIS 2015 Proceedings. 
  17. Shalin Hai-Jew (Kansas State University, US): Visualizing Wikipedia article and user networks: extracting knowledge structures using NodeXL. In: Developing Successful Strategies for Global Policies and Cyber Transparency in E-Learning. DOI:10.4018/978-1-4666-8844-5.ch005 Closed access
  18. Qureshi, Muhammad Atif (2015-10-08). “Utilising Wikipedia for text mining applications”.  (PhD thesis, U Galway)
  19. Chu, Chenhui; Nakazawa, Toshiaki; Kurohashi, Sadao (December 2015). “Integrated parallel sentence and fragment extraction from comparable corpora: a case study on Chinese-Japanese Wikipedia”. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 15 (2). doi:10.1145/2833089. ISSN 2375-4699.  Closed access
  20. Halatchliyski, Iassen; Cress, Ulrike (2014-11-03). “How structure shapes dynamics: knowledge development in Wikipedia – a network multilevel modeling approach”. PLoS ONE 9 (11): e111958. doi:10.1371/journal.pone.0111958. 

Wikimedia Research Newsletter
Vol: 6 • Issue: 01 • January 2016
This newsletter is brought to you by the Wikimedia Research Committee and The Signpost
Subscribe: Syndicate the Wikimedia Research Newsletter feed Email WikiResearch on Twitter[archives] [signpost edition] [contribute] [research index]

by Tilman Bayer at January 29, 2016 09:07 PM

Wiki Education Foundation

Partnering with the Society for Marine Mammalogy

Educational Partnerships Manager, Jami Mathewson

Wikipedia recently celebrated its 15th birthday, and there’s so much content left to expand and improve. That’s why the Wiki Education Foundation started the Wikipedia Year of Science—to encourage more scientists to join our programs and contribute knowledge to Wikipedia. I’m happy to announce that Wiki Ed has partnered with the Society for Marine Mammalogy (SMM) for just that reason. Expert scientists can improve Wikipedia’s coverage of marine mammal science by assigning their students to edit Wikipedia, or sponsoring a Visiting Scholar.

Read with porpoise

Last month, Outreach Manager Samantha Erickson and I joined Shane Gero of the Society for Marine Mammalogy at their annual meeting in San Francisco. We spent four hours with attendees, discussing Wikipedia’s culture and ideology, highlighting best practices for using Wikipedia in the classroom, and showcasing Wiki Ed’s Dashboard and other tools.

We also experimented with an idea for expert engagement, asking attendees to assess Wikipedia’s existing coverage in their area of study. We used Monika Sengul-Jones’ learning patterns from her experience coordinating what we call a content gap analysis, essentially a needs assessment for Wikipedia content. What’s missing? What could be improved? The marine mammalogists dived into Wikipedia’s content in search of missing sections, missing sources, and missing articles; but also inaccurate information, an imbalance in content compared to the underlying scholarship, and opportunities to turn an article subsection into its own encyclopedic entry.

To guide these scientists, we asked them to identify a gap in marine mammal science related to their research, studies, and expertise (e.g., Are they a leading expert in porpoises? Cetaceans? Marine conservation?) and consider:

  1. Is there content in this article that doesn’t belong there?
  2. Is one research method, point of view, or side of an issue represented in an imbalanced way compared to the academic literature of the topic? Is there evidence of bias in the article?
  3. How does the topic relate to existing content on Wikipedia? What other topics ought to link to it, but don’t?
  4. What are the key sources someone would use to write this content? Be as specific as possible—you may even want to add a bibliography.
  5. Optionally, we asked if there were images on Wikimedia Commons that should be in the article. Were there any other media related to this topic that are not yet on Wikimedia Commons but might improve this article?
  6. Again, as an option, we asked if they would draft up a paragraph-long overview of the missing content, aimed to be an inspiration for the lead section of the article.
  7. Finally, we asked: what kind of university or college course studies this topic?

This final question can help Wiki Ed match notes from experts to students participating in the Classroom Program as we grow our partnership.

Cetacean needed

Several marine mammal articles are already high quality, including Featured Articles about killer whales, pinnipeds, and sea otters. SMM conference attendees have identified the following areas for improvement, which we’ll encourage Classroom Program students and Visiting Scholars to use for direction in their editing:

  • Marine mammal monitoring: observation practices and protocols, available technology, and legislative requirements for monitoring marine mammal behavior while mitigating the impact on their ecosystems
  • Aerobic dive limit: an unrepresented concept with nearly 1,000 results on Google Scholar alone
  • Marine mammal health: most of Wikipedia’s coverage of threats to marine mammal health revolves around pollution, yet there is little information about disease and harmful algal blooms
  • Human impacts on marine mammal ecosystems: noise pollution, boat speed, etc., and guidelines experts recommend to reduce negative impacts
  • Cetacean intelligence: limited information is available about the cognitive capacity of bottlenose dolphins, though the existing research confirms this is a notable topic

You otter join Wiki Ed’s programs

We’re excited about this partnership with the Society for Marine Mammalogy, and its potential to positively contribute to Wikipedia. Marine mammalogists and SMM members can join the Year of Science in the following ways:

  • Assign students to edit Wikipedia. This is a proven way to amplify impact and improve many articles within the field.
  • Sponsor a Visiting Scholar. This is one of the best ways to effect high-quality changes in articles, as experienced Wikipedia editors can help bring articles up to Good Article or Featured Article status.
  • Review articles within your expertise, and add your comments, suggested sources, and guidelines to the content gap page on Wikipedia.
  • Join the Marine Mammal WikiSprint, a virtual edit-a-thon held regularly, including this week!

Photo: “Sea otters holding hands” by Joe Robertson from Austin, Texas, USA. Licensed under CC BY 2.0 via Wikimedia Commons.

by Jami Mathewson at January 29, 2016 06:16 PM

Wikimedia Foundation

WikipediansSpeak: Odia Wikisourcer shares her journey and goals


Odia Wikisourcer Pankajmala Sarangi shares her experience and future plans to grow the community. Video by Pankajmala Sarangi (original video) and Subhashish Panigrahi (post production), freely licensed under CC BY-SA 4.0 license.

The most active editor on the Odia Wikisource is Pankajmala Sarangi, a native of Odisha who now lives in New Delhi, where she works at a non-profit. Because she is a leader in a broad community that is dominated by males—indeed, the most active contributor to the Odia-language Wikisource—we asked her to share her journey and her goals for growing the project and community as part of the “WikipediansSpeak” interview series.

What is the community like on the Odia Wikisource?

Pankajmala feels that the community is like her home. “I can’t tell how happy I am after seeing that this one year-old project has already digitized over 200 books. With more and more youth coming on the internet, the internet won’t disappoint them when they type and search in the Odia language.”

What projects would you like to start, or grow with help from the existing community?

  • Forming expert/resource groups to strengthen the thematic group structure in the community, so that each group can work collaboratively toward specific goals.
  • We can also create groups with the help of the resident welfare associations in Odisha cities where Odia WikiTungis (city-based informal groups in the Odia Wikimedia community that actively organize outreach and engage with new Wikimedians) are already working. The two can work hand in hand, helping us expand this program to new places.
  • We can partner with basic computer training institutes, where their students and new Wikimedians who do not have access to a computer or the internet could learn Odia Wikipedia editing as vocational training. These institutes sit idle during the day and get busy after 4 pm, when school and college students come to learn computer basics after class hours.
  • One idea could be involving retirees, whose expertise could help improve the quality of articles but otherwise goes unused after retirement. Post-retirement life can be lonely, and many people who feel they have little to contribute could instead enjoy the company of new friends; these senior citizens’ groups could train new Wikimedians using the institutes’ facilities.
  • Summer vacation Wikipedia outreach for school/college students:
    • It has become mandatory in all private schools and colleges for students to do voluntary work for a few hours every day for six months to complete a program. We can ask these private institutions to include editing and contributing to Odia Wikipedia and other Odia Wikimedia projects in their syllabus. They would not only get Wikipedians as facilitators at no cost, but would also become part of a global and multilingual community. We can involve students both in editing Wikipedia articles and in digitizing and correcting typos and other mistakes on Odia Wikisource. If a manual with the above details were available, it would be helpful to refer to it while working. And when we are discussing something in our community, eligible users should automatically get a message saying their suggestion or input is needed, with a link to the relevant page.

Overall statistics for the Odia Wikisource as of December 2015. Infographic by Subhashish Panigrahi, freely licensed under CC BY-SA 4.0.

According to a 2011 survey, only about nine percent of Wikipedia editors are female. The corresponding figure for Wikisource is not yet known, but I would guess it is similar. How do you think we could bridge this gap in Odia?

We certainly have fewer women. We could reorient our current work to add a few other approaches: more focused outreach at women’s colleges and schools, creating a network of women interested in contributing to Wikimedia projects, setting up Twitter lists and Facebook groups where women can find friendlier conversation and support, and inviting and involving more women participants in Wikimedia outreach. I also wonder whether we could give the top contributors small gifts as a token of appreciation. We could also organize field trips for them to a public library, museum, or art gallery so that they can see how Wikimedia projects grow by drawing on available resources.

What are your personal plans to build a community for Odia Wikisource in New Delhi?

Well, I think I would work on creating a database of all the Odia speakers living in New Delhi and the city organizations that work in propagating Odia language and culture, and plan Wikisource outreach programs for them.

Subhashish Panigrahi, Wikimedian and Programme Officer, Access to Knowledge (CIS-A2K), Centre for Internet and Society
Nasim Ali, Odia and English Wikimedian

This post is part of the WikipediansSpeak series, which aims to chronicle the voices of the Wikipedia community. You can find more of these posts on the Wikimedia Commons.

by Subhashish Panigrahi and Nasim Ali at January 29, 2016 01:07 PM

Wikimedia Tech Blog

Content Translation tool has now been used for 50,000 articles

Content Translation session at Wikimania 2015. Photo by Amire80, freely licensed under CC BY-SA 4.0.

Last year around this time, we announced the arrival of a new tool that evolved out of an experiment aimed at making the editing process easier for our users. The tool in question—Content Translation—was initially enabled for 8 languages: Catalan, Danish, Esperanto, Indonesian, Malay, Norwegian Bokmål, Spanish and Portuguese. Today, 12 months later, this article-creation tool has been used by more than 11,000 editors across 289 Wikipedias to create more than 50,000 new articles.

Content Translation introduced a simple way to create Wikipedia articles through translation. Many editors have used this method for years to enrich content in Wikipedias where creating high-quality articles has been an uphill struggle for many reasons. However, translating a Wikipedia article used to involve several cumbersome steps, such as copying content across multiple browser tabs and manually adapting links and references. Content Translation abstracts away these steps and provides a neat, easy-to-use interface that offers a much faster way of creating a new article.

Content Translation is a beta feature. As part of the beta program, it is available for all logged-in users on 289 Wikipedias to try and provide us with their feedback.

Progress during the year

Over the last year, we have regularly documented the progress of the tool and how it was being adopted. Feedback gathered from Content Translation users through many interactions helped us identify which features were helpful and which were lacking and needed more attention. We also relied heavily on trends in the statistics captured every day. For instance, in the early days we found that many users were unaware the tool existed. To address this, we surfaced several access points where the tool might be needed, including the contributions page, the list of interwiki languages on an article, and other easily accessible spots. Around the middle of 2015, we found that many users had not come back after one or two uses. In conversations, they cited several reasons, such as lack of machine translation support for their language, technical difficulties with some features, and the effort needed to find articles that required translation. As a result, we focused on two key aspects:

  1. continued engagement with our returning users, and
  2. increased reliability and stability of the tool.

While working on Content Translation, we also made simultaneous improvements to the Statistics page. This page displays the weekly and total figures for articles translated and deleted, as well as information about the active languages. The statistics page (Special:ContentTranslationStats) is available on every wiki where the Content Translation extension is deployed. Several interesting findings surface through the statistics page. For instance:

  • 64% of all articles have been translated from the English Wikipedia. Spanish is the second most popular source (12%).
  • More than 1,000 new articles have been created in each of 15 languages; of these, the Catalan and Spanish Wikipedias have each received more than 6,000.
  • The Spanish Wikipedia has the highest number of individual translators using Content Translation (more than 2,000).
  • The highest number of articles created in a single week is 1,968. Over 1,900 articles are now created with Content Translation every week, up from about 1,000 per week in August 2015, the first month it was enabled in all languages.
  • Weekly deletion rates have been between 6% and 8% of the total articles created.

Besides this regular set of data, we have occasionally observed interesting trends tied to specific events. For example, when a machine translation system was enabled on the Russian Wikipedia in early November, the weekly article translation numbers doubled and have continued to grow.

Comparison between articles created in Content Translation with and without the suggestions feature. Image by Runa Bhattacharjee, public domain/CC0.

Engagement and Stability

One of the major outcomes in recent months is the addition of the ‘Suggestions’ feature. Instead of searching for what to do, users can view a list of articles that they can translate. This is an ongoing collaboration between the Language and Research teams at the Wikimedia Foundation. Users are shown a list of articles on topics chosen on the basis of various factors, such as their past translations and popular topics in the language. Additionally, topic-based targeted campaigns with predetermined article lists have also been introduced. The first of these, proposed by the Medical Translation Project, covered translating a set of articles from English to Persian. A month after this feature was introduced, we found that suggestions had been used to start about 16% of translations.

In terms of stability, increased usage of the tool has surfaced technical challenges that need further attention. These include better handling of saving and publishing errors, reducing wikitext errors in published articles, and keeping the service uninterrupted through better monitoring. As a development team, constant interaction with Content Translation users has been a valuable source of information about the tool’s performance and its shortcomings.

Coming up next

The main focus at the moment continues to be improving the wikitext sanity of the published content, reducing publishing and saving errors, and an overall improvement in stability of the article translation workflow.

Besides this, we will continue improving a feature that is an important aspect of this project. Content Translation uses third-party machine translation systems for several languages. To help the wider machine translation development community, we recently completed the initial development of the parallel corpora API, which provides easy access to the human-modified translations. This is an open repository of translated content together with the corrections users had to make. It will be a valuable resource for improving quality and language coverage in new and existing machine translation systems.
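
As a rough sketch of how such published-translation data can be reached programmatically, the Python snippet below queries the MediaWiki API. The ‘cxpublishedtranslations’ module name, its parameters, and the response shape are assumptions based on the ContentTranslation extension’s documentation and should be checked against the live API before use.

# Minimal sketch (not official documentation): listing articles published
# with Content Translation via the MediaWiki API. The "cxpublishedtranslations"
# module name, its parameters, and the response layout are assumptions drawn
# from the ContentTranslation extension's docs; verify against the live API.
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

def list_published_translations(source="en", target="es", limit=50):
    """Fetch metadata about articles published through Content Translation."""
    params = {
        "action": "query",
        "format": "json",
        "list": "cxpublishedtranslations",  # assumed module name
        "from": source,
        "to": target,
        "limit": limit,
    }
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    data = response.json()
    # Assumed shape: {"result": {"translations": [{"sourceTitle": ..., ...}]}}
    return data.get("result", {}).get("translations", [])

if __name__ == "__main__":
    for record in list_published_translations()[:10]:
        print(record.get("sourceTitle"), "->", record.get("targetTitle"))

From metadata like this, a researcher could assemble a small sample of language pairs to study how heavily machine-translated drafts were corrected by editors.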

We would like to sincerely thank everyone for comments, feedback, encouragement and wholehearted participation that provided direction to this project. We look forward to many new things in the next 12 months.

You can share your comments and feedback about the Content Translation tool with the Wikimedia Language team at the project talk page. You can also follow us on Twitter (@whattotranslate) for updates and other news.

Runa Bhattacharjee, Language team (Editing)
Wikimedia Foundation

by Runa Bhattacharjee at January 29, 2016 06:43 AM

Pete Forsyth, Wiki Strategies

Grants and transparency: Wikimedia Foundation should follow standards it sets

Former Wikimedia ED Sue Gardner (right) championed strong views about restricted grants and transparency. Have those values survived into the era of Lila Tretikov (left)? Photo by Victor Grigas, licensed CC BY-SA

I wrote and edited a number of grant proposals and reports on behalf of the Wikimedia Foundation (WMF) from 2009 to 2011. In that role, I participated in a number of staff discussions around restricted grants, and transparency in the grant process. I was continually impressed by the dedication to transparency and alignment to mission and strategy.

As of 2015, however, the four people most strongly associated with those efforts at WMF have all left the organization; and I am concerned that the diligence and dedication I experienced may have left the organization along with them. Yesterday’s announcement of a $250,000 grant from the Knight Foundation increases my concern. That grant is apparently restricted to activities that are not explicitly established in any strategy document I’ve seen. It is also not specifically identified as a restricted grant.

In the WMF’s 2015-16 Annual Plan (which was open for public comment for five days in May), this phrase stood out:

Restricted amounts do not appear in this plan. As per the Gift Policy, restricted gifts above $100K are approved on a case-by-case basis by the WMF Board.

There does not appear to be any companion document (or blog posts, press releases, etc.) covering restricted grants.

When I worked for WMF, four people senior to me maintained strong positions about the ethics and mission-alignment relating to restricted grants:

  • Sue Gardner, Executive Director
  • Erik Möller, Deputy Director
  • Frank Schulenburg, Head of Public Outreach
  • Sara Crouse, Head of Partnerships and Foundation Relations

They strongly advocated against accepting restricted grants (primarily Gardner), and for publishing substantial portions of grant applications and reports (primarily Möller). At the time, although we worked to abide by those principles, we did not operate under any formal or even semi-formalized policy or process. [UPDATE Jan. 28: I am reminded that Gardner did in fact articulate a WMF policy on the topic in October 2011. Thanks MZMcBride.] I am proud of the work we did around restricted grants, and I benefited greatly in my understanding of how organizational needs intersect with community values. These principles influenced many activities over many years; in public meeting minutes from 2009, for instance, Gardner articulated a spending area (data centers) that would be appropriate for restricted grants.

Today, however, none of us still works for Wikimedia (though Gardner retains an unpaid position as Special Advisor to the Board Chair).

In the time since I left, there has been very little information published about restricted grants. The English Wikipedia article about the Wikimedia Foundation reflects this: it mentions a few grants, but if I’m not mistaken, the most recent restricted grants mentioned are from 2009.

Restricted grants can play a significant role in how an organization adheres to its mission. Last year, Gardner blogged about this, advocating against their use. While her observations are valuable and well worth consideration, I would not suggest her view settles the issue — restricted grants can be beneficial in many cases. But irrespective of her ultimate conclusion, her post does a good job of identifying important considerations related to restricted grants.

The principles of Open Philanthropy, an idea pioneered by Mozilla Foundation executive director Mark Surman, and long championed by Wikimedia Advisory Board member Wayne Mackintosh, align strongly with Wikimedia’s values. The Open Philanthropy doctrine emphasizes (among other things) publishing grant applications and reports and inviting scrutiny and debate.

In its grant-giving capacity, the Wikimedia Foundation appears to practice Open Philanthropy (though it doesn’t explicitly use the term). It has published principles for funds dissemination:

  • Protect the core
  • Assess impact
  • Promote transparency and stability
  • Support decentralized activity
  • Promote responsibility and accountability
  • Be collaborative and open

Those principles are not mere words, but are incorporated into the organization’s grant-giving activities. For example, the WMF’s Annual Plan program, which funds chapters and affiliates, requires applicants to submit proposals for public review for 30 days, and to make public reports on past grants. The Project and Event Grants program also requires open proposals and open reports.

But the Wikimedia Foundation appears to still lack any clear standard for transparency of the restricted grants it receives. (There is less urgency for openness in the case of unrestricted grants, which by definition do not obligate the recipient to shift its operational priorities. But conditions are sometimes attached to unrestricted or restricted grants, such as the appointment of a Trustee; these should be clearly disclosed as well.) The WMF Gift Policy merely asserts that “Restricted gifts [of $100k+] may be accepted for particular purposes or projects, as specified by the Foundation [and with Board approval].”

Addendum: I have been reminded that in November 2015, the Wikimedia Foundation’s Funds Dissemination Committee — which advises the Board of Trustees on the Annual Plan Grants mentioned above, but has no formal authority over the WMF itself — voiced strong criticism of the Wikimedia Foundation’s lack of adherence to the standards it requires of affiliates. The critique is well worth reading in full, but this sentence captures its spirit:

The FDC is appalled by the closed way that the WMF has undertaken both strategic and annual planning, and the WMF’s approach to budget transparency (or lack thereof).

In December 2015, the Wikimedia Board of Trustees removed one of its own members, Dr. James Heilman — one of the three Trustees selected by community vote. Though the full story behind this action has not emerged, Dr. Heilman has maintained that his efforts to increase the organization’s transparency were met with resistance.

What can the WMF’s current practices around restricted grants, and grants with conditions attached, tell us about its commitment to transparency? Can, and should, its transparency around grants be improved? I believe there is much room for improvement. The easiest and most sensible standard, I believe, would be for the WMF to adopt the same transparency standards for the grants it pursues as it requires of the people and organizations it funds.

by Pete Forsyth at January 29, 2016 02:31 AM

January 28, 2016

Wikimedia Foundation

MIT’s Pantheon explores historical culture with Wikipedia

Denis Law, centre, is scientifically the most famous person from my home town. Who’s yours? Photo by apasciuto, freely licensed under CC BY 2.0.

Who is, scientifically, the most famous person in your home town? A new research project might be able to tell you.

Pantheon, a project developed by the Macro Connections group at the MIT Media Lab, is collecting data from thousands of Wikipedia biographies across 25 language editions, then using that data to visualise historical significance.

With the dataset (and its data visualisation kit), it’s possible to create a treemap of the occupations of a certain region’s famous people across history. You can also use it to find out the most famous people in a certain city—in my home town of Aberdeen, Scotland, that honour belongs to Manchester United footballer, and one-third of the “United Trinity”, Denis Law.

The tool uses a metric the team calls the “Historical Popularity Index”, derived through a variety of methods laid out on the tool’s methods page. One of these is the number of Wikipedia language editions that contain the person’s biography—for example, Jesus Christ is featured in 214 Wikipedias.
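
To illustrate just that one ingredient of the index (this is not Pantheon’s own pipeline), here is a small Python sketch that counts the language editions carrying a biography via the standard MediaWiki langlinks property; the article title is only an example.

# Illustrative sketch only, not Pantheon's own pipeline: count how many
# Wikipedia language editions carry a given biography by querying the
# standard MediaWiki "langlinks" property on the English Wikipedia.
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

def count_language_editions(title):
    """Return the number of language editions with an article on `title`,
    including the English article itself."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "langlinks",
        "titles": title,
        "lllimit": "max",
    }
    total = 0
    while True:
        data = requests.get(API_URL, params=params, timeout=30).json()
        page = next(iter(data["query"]["pages"].values()))
        total += len(page.get("langlinks", []))
        if "continue" not in data:
            break
        params.update(data["continue"])  # follow pagination for long lists
    return total + 1  # count the English edition as well

if __name__ == "__main__":
    print(count_language_editions("Denis Law"))

A count like this captures only breadth of coverage; Pantheon combines it with other signals, as described on its methods page.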

César Hidalgo directs the MIT Media Lab’s Macro Connections group, where he works with Wikipedia data alongside Amy Yu, Cristian Jara, and Shahar Ronen; he says the project studies “collective memory”. During a previous project, in which he worked on mapping cities’ and countries’ industrial output, he realised he was missing a significant part of national exports: “The US exports soybeans and jet engines, but it is also the birthplace of Miles Davis and Malcolm X. That should count for something, but our records of international trade do not gather information on cultural exports.

“I decided to start a project to map globally famous biographies as a means to map [these] cultural exports,” he says. “By now our thinking has evolved, and we think of this dataset as a biographical view of human collective memory. But the origins came from the cultural exports framing.”

Right now, the dataset is limited to just over 11,000 individuals, due in part to the need for manual verification and data cleaning. The team of researchers behind the project say this incompleteness is unavoidable, but that it’s also a motivation for them to continue improving the service.

To find the most popular person on Wikipedia from your home town, head over to Pantheon’s site and find the right city.

Joe Sutherland, Communications intern
Wikimedia Foundation

by Joe Sutherland at January 28, 2016 10:41 PM