February 05, 2016

Wiki Education Foundation

Looking back at the Classroom Program in 2015

Helaine Blumenthal

Wiki Ed supported more students in 2015 than ever before. With improvements in our tools and resources, we’ve been able to maintain quality work from those student editors.

It was a year of rapid growth and considerable change. In one term, the courses we supported rose from 117 (spring 2015) to 162 in fall 2015 — an almost 40% increase. The number of students enrolled in those courses rose from 2,290 in spring to 3,710 in the fall — a 60% increase. Closing the gender content gap, one of our major initiatives, has also seen great success. We supported 34 women’s studies classes in 2015. In those classes, 907 students improved 696 articles and created 89 new ones.

Those numbers tell a story of incredible growth supported by more resources available to classes than we’ve ever been able to offer. These students contributed 5.15 million words to Wikipedia, improved 10,200 articles, and created 1,019 new entries. Printed and bound as a book, that would be just over 7,000 pages, or 13 days of silent reading.

But there’s a quirk in that story.

When we compared Fall 2015 to Fall 2014, we saw that there were nearly 1,000 more students in Fall 2015. That’s great news. But the weird thing was that these students didn’t seem to be contributing as many words as their cohort from Fall 2014.

We scratched our heads. Are our new resources causing a reduction in student contributions? We’ve always known that growth alone does not a success make. To keep quality on pace with quantity, we launched our Dashboard in the spring. That’s helped instructors create better Wikipedia assignments and track student work. Moving from Wikipedia to our Dashboard was a major change to the Classroom Program. It has been a big improvement for instructors, students, and the community.

And yet, students contributed less content.

We wondered if the course creation tool was making it so easy for instructors to design courses that they were designing courses where students weren’t asked to contribute as much as we know they can.

We also changed the way students take, and instructors track, the online training. As a result, many of our courses have added the training as a graded requirement. In the spring, 52% of the students we supported completed the training, and 74% completed it in the fall. We think that’s led to higher-quality contributions and fewer issues involving student work. But has it led to fewer contributions?

It would have been a mystery, but luckily, Wiki Ed brought on Kevin Schiroo, resident data detective. His first case was to examine what happened to content this term. After all, our chief focus is on improving Wikipedia in ways that tap students’ abilities, and give students the confidence to make bold contributions of their knowledge to the public sphere. We want to make sure students in these courses are challenged to contribute the quality and quantity of work we know is possible.

What Kevin told us was kind of amazing. It wasn’t that students were asked to contribute less this term. It wasn’t that we had discouraged bold editing through our online training or classroom resources.

The issue was this: their content was being challenged less often, leading to fewer wholesale revisions of content. We have always encouraged students to contribute to Wikipedia in the spirit of a peer review. That’s one of the great learning experiences the assignment carries. In the past, students made contributions, which were then questioned by other editors. In the fall of 2014, our students were more likely to revert those changes without discussing the reasoning behind them. This resulted in stressful back-and-forth reversions, and even edit wars.

For each term, Kevin made a list of all the articles students had worked on. From that list, he pulled all the revisions that were made to those articles during the term to find and remove reverted edits. With a list that was clear of any unproductive contributions, he was able to tally all of the words that were added to the page by students, knowing that anything that remained was a productive edit. Counting in this way, the difference in words added between the two terms became significantly smaller. Kevin concluded that the reverted content had been inflating the productivity of Fall 2014 compared to Fall 2015.
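The counting method described above can be sketched in a few lines. This is an illustrative outline only, with hypothetical field names and toy data, not the actual analysis code or Wikipedia’s real revision format:

```python
def count_productive_words(revisions):
    """Tally words added by student edits, skipping edits that were
    later reverted.

    `revisions` is a chronological list of dicts with hypothetical
    fields: 'user', 'words_added', 'reverted' (whether a later edit
    undid this one), and 'is_student'.
    """
    total = 0
    for rev in revisions:
        # Reverted edits were unproductive: the content did not
        # survive, so it should not count toward the term's output.
        if rev["reverted"]:
            continue
        if rev["is_student"] and rev["words_added"] > 0:
            total += rev["words_added"]
    return total

# Toy comparison: a term with many reverted edits looks far more
# productive under naive counting than after filtering.
fall_2014 = [
    {"user": "s1", "words_added": 500, "reverted": True,  "is_student": True},
    {"user": "s1", "words_added": 500, "reverted": False, "is_student": True},
]
naive = sum(r["words_added"] for r in fall_2014 if r["is_student"])
print(naive)                              # 1000
print(count_productive_words(fall_2014))  # 500
```

The toy data shows the effect: naive counting credits the term with twice the words that actually survived, which is exactly the kind of inflation described for Fall 2014.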

We heard and responded to those concerns from 2014 throughout 2015. We created a series of red flags for onboarding classes, improved our student and instructor training, and created tools to track student contributions more efficiently. We’re serious about making sure Wikipedia assignments benefit Wikipedia, as well as student learning outcomes.

This term, we’re seeing the fruits of those efforts to improve contributions. Students are getting their contributions right, and when they aren’t, they’re more likely to discuss community edits appropriately. A whopping 40% of the drop in student content contributed to Wikipedia this term is the result of students following better Wikipedia etiquette. We’ve seen a real drop in reversions and problematic edits.

The content they’ve contributed may fill fewer books than students wrote in 2014, but the books they’re writing require less revising from long-term Wikipedia editors. And those books would hold some incredible, and diverse, content: a detailed description of the surface of Venus, a history of Western Canada, lots of information about women scientists, and information on Japanese theater (some using sources that had been translated from Japanese). They would also have a lot of information about bees and mass spectrometry.

In hindsight, it’s been a great year for student contributions on Wikipedia. In fact, we were surprised to see just how much the support efforts have paid off. It makes us even more confident that we can continue to grow through 2016 while maintaining excellent student contributions. We’re constantly expanding tools and resources to make sure this trend continues.

We’re still looking for courses in all subjects. Our Year of Science is especially aimed at supporting courses in STEM or other sciences. We’d love to hear from you.

Helaine Blumenthal
Classroom Program Manager

by Helaine Blumenthal at February 05, 2016 06:30 PM

Wikimedia DC hosting Wiki Ed workshop, Feb. 12

On Friday, February 12, the Wiki Education Foundation will join university instructors in Washington, D.C., to talk about the Year of Science. Together, we’ll discuss how to share comprehensive science information with millions of readers by improving Wikipedia articles in the sciences. Along the way, we’ll challenge university students across the United States and Canada to improve critical thinking, media literacy, and science communications skills.

We thank Wikimedia DC members for hosting us!

Space is limited; please RSVP here. Instructors and faculty from any institution of higher education are invited to attend.

We’re grateful to Google and the Simons Foundation for providing major support for the Year of Science in 2016. Hope to see you there!

Samantha Erickson
Outreach Manager

Photo: “Aerial view of Georgetown, Washington, D.C.” by Carol M. Highsmith. This image is available from the United States Library of Congress’s Prints and Photographs division under the digital ID highsm.14570. Public domain.

by Samantha Erickson at February 05, 2016 06:20 PM

Wiki Ed attended LSA conference in January

Samantha Erickson

In January, I traveled to Washington, D.C. to attend the Linguistic Society of America annual conference and spread the word about the Wikipedia Year of Science. This was our first conference with LSA after announcing our partnership in November.

I met close to 100 linguistics instructors over three days for lots of conversation about language learning, digital pedagogy, and the presence of linguistics content online.

I attended an edit-a-thon run by LSA member Gretchen McCulloch, and a department chairs’ roundtable meeting. The common theme across these events was that yes, students do use Wikipedia! The Classroom Program opens the discussion of media literacy and finally asks how students use it.

Wikipedia classroom projects benefit linguists, too, by bringing accurate information to Wikipedia. Our partnership with the LSA helps ensure that information on linguistic topics online is accurate.

If you are an instructor interested in hosting a Wikipedia project or if you know of an instructor who may be a good fit, please let me know. I’m excited to support all new instructors in our Year of Science in 2016 and beyond!

We have plenty more workshops coming up:

  • Temple University, February 15, 3 – 5 p.m., Paley Library Lecture Hall, 1210 Polett Walk, Ground Floor.
  • Bryn Mawr College, February 16, 4:30 – 6 p.m. Location TBD. Registration required.
  • California State University, East Bay, February 19, 10 a.m. – 12 p.m., LI3079, Upper Mall level of the CSU East Bay Library (map). Register here. (more).
  • University of California, Davis, March 2, 10 a.m. – 12 p.m., Life Sciences building, room 1022 (map). Registration required (here), follow Wiki Ed for more details. Supported by the UC Davis Biotech program.

Samantha Erickson
Outreach Manager

by Samantha Erickson at February 05, 2016 05:00 PM

Jeroen De Dauw

Missing in PHP7: Named parameters

This is the second post in my Missing in PHP7 series. The previous one is about function references.

Readability of code is very important, and this is most certainly not readable:

getCatPictures( 10, 0, true );

You can make some guesses, and in a lot of cases you’ll be passing in named variables.

getCatPictures( $limit, $offset, !$includeNonApproved );

It’s even possible to create named variables where you would otherwise have none.

$limit = 10;
$offset = 0;
$approvedPicturesOnly = true;
getCatPictures( $limit, $offset, $approvedPicturesOnly );

This gains you naming of the arguments, at the cost of boilerplate, and worse, the cost of introducing local state. I’d hate to see such state be introduced even if the function did nothing else, and it only gets worse as the complexity and other state of the function increases.

Another way to improve the situation a little is by making the boolean flag more readable via usage of constants.

getCatPictures( 10, 0, CatPictureRepo::APPROVED_PICTURES_ONLY );

Of course, long argument lists and boolean flags are both things you want to avoid to begin with, and they are rarely needed when designing your functions well. However, it’s not possible to avoid all argument lists. Using the cat pictures example, the limit and offset parameters cannot be removed.

getApprovedCatPictures( 10, 0 );

You can create a value object, though this just moves the problem to the constructor of said value object, and unless you create weird function specific value objects, this is only a partial move.

getApprovedCatPictures( new LimitOffset( 10, 0 ) );

A naive solution to this problem is to have a single parameter that is an associative array.

getApprovedCatPictures( [
    'limit' => 10,
    'offset' => 0
] );

The result of this is catastrophe. You are no longer able to see which parameters are required and supported from the function signature, or what their types are. You need to look at the implementation, where you are also forced to do a lot of checks before doing the actual job of the function. So many checks that they probably deserve their own function. Yay, recursion! Furthermore, static code analysis gets thrown out of the window, making it next to impossible for tools to assist with renaming a parameter or finding its usages.

What I’d like to be able to do is naming parameters with support from the language, as you can do in Python.

getApprovedCatPictures( limit=10, offset=0 );
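For comparison, this is roughly what the Python behaviour alluded to looks like in full; the function is a toy stand-in invented for illustration, not an API from this post:

```python
def get_approved_cat_pictures(limit, offset=0):
    """Toy stand-in: return one page of approved cat picture names."""
    pictures = [f"cat_{i}.jpg" for i in range(100)]
    return pictures[offset:offset + limit]

# Call sites name their arguments directly, so no local variables
# or constants are needed just to make the call readable.
page = get_approved_cat_pictures(limit=10, offset=0)
print(page[0])  # cat_0.jpg
```

The call stays readable without introducing local state, and static tools can still verify and rename the parameter names.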

To me this is not a small problem. Functions with more than 3 arguments might be rare in a well-designed codebase, though even with fewer arguments readability suffers greatly. And there is an exception to the no-more-than-3-parameters-per-function rule: constructors. Unless you are being rather extreme and following Object Calisthenics, you’ll have plenty of constructors where the lack of named parameters gets extra annoying. This is especially true for value objects, though that is a topic for another post.

by Jeroen at February 05, 2016 01:29 AM

February 04, 2016


What is the most significant subject Wikipedia doesn’t have an article on?

In absolute terms, if you go by size, it’s probably one of the voids that make up the large-scale structure of the universe. In terms of mass, it is probably a galaxy filament or wall.

On a less cosmic scale, and limiting things to Earth, it’s probably the only ocean current not to have an article: the South Australian Counter Current.

For humans, it’s probably going to be a high-ranking civil servant in a major country. Either someone high up in the Chinese system, or someone like Kamal Pande, the most recent Cabinet Secretary of India not to have an article.

The most notable thing in Wikipedia terms (sheer number of citations commenting on it) is harder. Possibly some bacteria species popular in experiments? It’s really rather hard to say.

by geniice at February 04, 2016 10:18 PM

Wiki Education Foundation

Questions and answers from Hunter College

Earlier this month, I hosted a “Teaching with Wikipedia” workshop at Hunter College alongside Chanitra Bishop. Twenty-one instructors from across greater New York learned about our Classroom Program and Year of Science.

After showing off our Dashboard, the conversation shifted to details about running the assignment. Instructors wanted to know more about the six-week timeline, and wondered if the assignment might take up too much course time.

So, we walked them through a typical syllabus. Wikipedia assignments involve many small, weekly assignments that complement regular course work. These assignments can be as simple as creating a username on Wikipedia, for example — the task for week one. Once the assignment timeline was clear, instructors relaxed about the potential for integrating it.

If you’re an instructor, or know one who might be interested in a Wikipedia assignment, let us know. I’d love to help you get started!

We have some upcoming campus visits:

  • Temple University, February 15, 3 – 5 p.m., Paley Library Lecture Hall, 1210 Polett Walk, Ground Floor.
  • Bryn Mawr College, February 16, 4:30 – 6 p.m. Location TBD. Registration required.
  • California State University, East Bay, February 19, 10 a.m. – 12 p.m., LI3079, Upper Mall level of the CSU East Bay Library (map). Register here. (more).
  • University of California, Davis, March 2, 10 a.m. – 12 p.m., Life Sciences building, room 1022 (map). Registration required (here), follow Wiki Ed for more details. Supported by the UC Davis Biotech program.

Samantha Erickson
Outreach Manager

Photo: “Samantha Erickson, Hunter College, NYC, 12 January 2016” by Aramtak, own work. Licensed under CC BY-SA 4.0 via Wikimedia Commons.

by Samantha Erickson at February 04, 2016 05:00 PM

Wikimedia Foundation

50 weird Super Bowl facts for 50 Super Bowls

“Left Shark” became a social media sensation last year for its offbeat and seemingly mistaken dance moves. Photo by Huntley Paton, freely licensed under CC BY-SA 2.0.

The first Super Bowl in 1967 was simulcast by two TV networks, NBC and CBS, which had to share one microphone in the postgame show. The teams used different balls because the Green Bay Packers and Kansas City Chiefs played in separate leagues and the balls were slightly different in shape. The cost of a 30-second commercial was $42,000. Due to the then-common practice of tape wiping, Super Bowl I was not seen again until 2016, when the NFL strung footage together from over two dozen sources and overlaid it with the radio broadcast.

These days, the Super Bowl is the most-watched US television broadcast each year—in fact, the NFL can say that with one way of counting, it holds the top 23 spots on the all-time list. Americans eat more on Super Bowl Sunday than they do any other day of the year, except for Thanksgiving. And with all the hoopla come cultural zeitgeists: from multi-million dollar ads for failing startups to Left Shark’s viral popularity, the Super Bowl is a championship of pop culture.

Wikipedia chronicles them all.

The main Super Bowl article provides an overview of the National Football League championship, which started in 1966 in response to the growing popularity of the upstart American Football League. That page lists article pages for each game that note the halftime performers, cost of commercials, statistics, and quirky events.

Based in San Francisco and not far from the site of Super Bowl 50, the Wikimedia Foundation supports Wikipedia and its sister projects such as the media repository Wikimedia Commons. The foundation, fresh off celebrating Wikipedia’s 15th birthday, is paying homage to the Super Bowl’s 50th birthday with 50 fascinating factoids you are unlikely to find anywhere else.

Unlike mainstream media, Wikipedia and its sister projects are written and edited by volunteers—around 80,000 actively maintain its articles, which last year exceeded 5 million on the English-language Wikipedia alone (there are Wikipedia editions in 291 languages). Those volunteers combine and hone a crowdsourced view off the mainstream media path; there are many odd nuggets along the way. As we head into Super Bowl 50, the first one not to go by Roman numerals, take a peek at a quirky factoid for each of the games below.

Feel free to share them, show them off at your Super Bowl party, tweet them, or write about them in a blog post or article—Wikipedians will find more. Wikipedia’s Super Bowl of facts is played every day, all around the world, by all kinds of people.



The first Super Bowl featured the top teams from two separate leagues—the American and National Football Leagues. They would later merge under the latter’s name. Logo by unknown, public domain.


  1. It is the only Super Bowl to have been simulcast. NBC and CBS both televised the game—with both wanting to win the ratings war, tensions flared and a fence was built between their trucks.
  2. Almost 80% of the country lost the video feed of the CBS broadcast late in the second quarter.
  3. Performers representing players from the teams appeared on top of a large, multi-layered, smoke-topped cake.
  4. The cost of one 30-second commercial was $78,000.
  5. The two teams had a Super Bowl record 11 combined turnovers in the game.
  6. Dolphins safety Jake Scott entered the game with a broken left hand and soon broke his right wrist as well.
  7. Dolphins employees inspected the trees around the practice field every day for spies from the Redskins.
  8. The Vikings complained that their practice facilities at a Houston high school had no lockers and most of the shower heads didn’t work.
  9. Pittsburgh played for a league championship for the first time in its 42-year team history.
  10. Scenes for the film Black Sunday, about a terrorist attack on the Super Bowl, were filmed during the game.
  11. The national anthem was not sung. Vikki Carr sang “America the Beautiful.”
  12. Halftime featured the Tyler Junior College Apache Belles drill team.
  13. Cowboys linebacker Thomas “Hollywood” Henderson said opposing quarterback Terry “Bradshaw couldn’t spell cat if you spotted him the C and the A.”
  14. The Rams barely outscored their opponents, ending the season up only 323-309 overall, and finished the regular season with a 9-7 record—the worst ever by a team that advanced to the Super Bowl.
  15. The winning Oakland Raiders were suing the NFL at the time of the game over a proposed move to Los Angeles.
  16. 49.1 percent of all US television households tuned into the game, the highest-rated Super Bowl of all time.
  17. A players’ strike reduced the 1982 regular season from a 16-game schedule to 9.
  18. The broadcast aired the famous “1984” television commercial, introducing the Apple Macintosh.
  19. Ronald Reagan appeared live via satellite from the White House and tossed the coin on the same day that he was inaugurated for a second term.
  20. The Bears’ post-Super Bowl White House visit was canceled due to the Space Shuttle Challenger disaster. Members of the team were invited back in 2011.
  21. Giants players celebrated their victory with what was then a new stunt—dumping a Gatorade cooler on head coach Bill Parcells.
  22. The halftime show featured 88 grand pianos.
  23. Prior to the game, Coca-Cola distributed 3-D glasses at retailers for viewers to use to watch the halftime festivities.
  24. The halftime show featured a float so huge that one of the goal posts had to be moved so it could be put on the field.
  25. Whitney Houston performed “The Star-Spangled Banner,” and the recording reached No. 20 on the Billboard Hot 100.
  26. Bills defensive line coach Chuck Dickerson said Redskins tackle Joe Jacoby was “a Neanderthal—he slobbers a lot, he probably kicks dogs in his neighborhood.”
  27. The opening coin toss featured OJ Simpson, who was working for NBC Sports at the time; the halftime ceremony featured Michael Jackson and 3,500 children.
  28. The main stadium lights were turned off for a halftime performance by dancers with yard-long light sticks.
  29. 30-second ads exceeded the $1,000,000 mark.
  30. Some weeks before the game, it was found that some proxy servers were blocking the web site for the event because XXX is usually associated with pornography.
  31. The last in a run of 13 straight Super Bowl victories by the NFC over the AFC.
  32. Except for two penalties and quarterback kneel-downs to end each half, the Broncos did not lose yardage on any play.
  33. On the night before the Super Bowl, Falcons safety Eugene Robinson was arrested for solicitation of prostitution after receiving the league award that morning for “high moral character.”
  34. Pets.com paid millions for an advertisement featuring a sock puppet. The company would collapse before the end of the year.
  35. This was the last Super Bowl to have individual player introductions for both teams.
  36. Janet Jackson was originally scheduled to perform at halftime, but allowed U2 to perform a tribute to September 11.
  37. Referred to as the “Pirate Bowl” due to the teams involved (the Buccaneers and Raiders).
  38. Janet Jackson‘s breast was exposed by Justin Timberlake in what was later referred to as a “wardrobe malfunction“.
  39. The Eagles signed Jeff Thomason, a former tight end who was working construction, to a one-game contract for the Super Bowl.
  40. Aretha Franklin, Aaron Neville, Dr. John, and a 150-member choir performed the national anthem.
  41. The Art Institute of Chicago’s lions were decorated to show support for the Chicago Bears—see the photo at the bottom.
  42. The band Eels attempted to pull together 30 one-second ads but were told they could cause seizures.
  43. Due to the recession, 200 fewer journalists covered the game than the previous year.
  44. The U.S. Census Bureau spent $2.5 million on a 30-second commercial advertising the upcoming census.
  45. Fans who paid $200 per ticket for seats in a part of the stadium damaged by a winter storm were allowed to watch outside the stadium.
  46. Some hotel rooms in downtown Indianapolis reportedly cost more than $4,000 a night.
  47. Power went out in the Superdome, causing a 34-minute interruption in play. Luckily Norman the Scooter Dog was in New Orleans to entertain.
  48. The Broncos hosted press conferences on a cruise ship at the pier of their Jersey City, N.J., hotel.
  49. “Left Shark,” pictured at the top, became an Internet meme.


And for Super Bowl 50, the only Super Bowl to be identified without a Roman numeral: CBS set the base rate for a 30-second ad at $5,000,000, a record high price for a Super Bowl ad.


When the Chicago Bears last went to the Super Bowl, the city’s art institute decorated their lion statues. Photo by Señor Codo, freely licensed under CC BY-SA 2.0.

Barack Obama watched part of Super Bowl 43 (2009) with 3D glasses. Photo by Pete Souza, public domain.

A traditional flyover from military aircraft prior to the beginning of the game. Photo from the US Air Force, public domain. 

SB TV viewers by year
Television viewing statistics for each Super Bowl—all sourced from Wikipedia. The bars represent an average of the number of people watching, not the highest total reached during the event. Graph by Andrew Sherman, freely licensed under CC BY-SA 3.0.

Jeff Elder, Digital Communications Manager
Michael Guss, Research Analyst
Wikimedia Foundation

by Jeff Elder and Michael Guss at February 04, 2016 04:00 PM

Weekly OSM

weekly 289

1/26/2016–2/1/2016

Chefchaouen, Morocco – view in OpenTopoMap [1] | the world in OpenTopoMap

Mapping

A blog by Mapillary explains how crowd-sourced photography of cities can allow local 3D mapping, which can be useful for local authorities and more up-to-date than sources like Google Streetview. Matthijs Melissen created a proposal to deprecate tourism=gallery, which is still […]

by weeklyteam at February 04, 2016 10:02 AM

Gerard Meijssen

#Wikipedia - Dorothy E. Smith - #links and #sources

Mrs Smith is recognised as an important scholar. One of her books is considered a classic, she developed two theories, she received several awards, and there is a Wikipedia article about her in three languages.

Mrs Smith received the Jessie Bernard Award in 1993, and it is why I learned about her. The Wikipedia article mentions two brothers; one is linked, the other is not. She was born in Northallerton. There is a link to John Porter, but the only relation to Mr Porter is that Mrs Smith won the John Porter Award. The award was given for a book by a red-linked organisation, an organisation that also bestowed its ‘outstanding contribution award’ on her in 1990.

This outstanding contribution award was given in 2015 to a Monica Boyd. She can be found on the Internet as well; it is easy enough to expand the amount of information around a relevant person like Mrs Smith. Almost every line in her article contains facts that could be mapped in Wikidata. With some effort, sources can be added. The only problem is that adding sources for everything is painful; it is just too much work. This is a reality for Wikipedia as well. When Wikipedia and Wikidata align, when their links and statements match, it must be obvious that the likelihood of the facts being correct is high, if only because multiple people have had a look at most of the facts.

by Gerard Meijssen (noreply@blogger.com) at February 04, 2016 08:08 AM

February 03, 2016

Wikimedia Foundation

What’s TPP? The problematic partnership

Photo by Christopher Dombres, freely licensed under CC BY-SA 2.0.

Tomorrow, government representatives from twelve countries of the Pacific Rim will meet in New Zealand to sign a 6,000-page treaty called the Trans-Pacific Partnership (TPP). Among other things, the agreement will govern how the signatory countries protect and enforce intellectual property rights.

On Wikipedia, millions of articles are illustrated with public domain images, meaning images that are not restricted by copyright. At the Wikimedia Foundation, we believe that shorter copyright terms make it possible for more people to create and share free knowledge. We’ve previously shared some of our concerns about TPP and co-signed letters asking negotiators not to extend copyright terms and to refrain from forcing intermediaries to police their sites and block content.

Since the final text was released, various digital rights groups have condemned both the secrecy of the negotiations and the substance of the treaty. We’d like to talk about what effect TPP may have on Wikipedia, the other Wikimedia projects, and our mission to share free knowledge.

Wikipedia and its power for the creation and sharing of free knowledge are directly driven by a strong and healthy public domain. Unfortunately, TPP would extend copyright terms to a minimum of the author’s life plus 70 years, eating into the public domain. This cements a lengthy copyright term in countries where it already exists, like Australia, the US, and Chile. But it’s especially worrisome for the public domain in countries like Japan, New Zealand, and Canada that now have shorter copyright terms, because it means that a great number of works will not be free to use, remix, and share for another 20 years. In some countries, the lengthy copyright term is mitigated by strong and broad exceptions from copyright. But TPP makes this sort of balance optional. It only contains a non-binding exception for education, criticism, news reporting, and accessibility, like fair use in the US, that countries can choose not to enact in their national laws.

TPP tips the balance in favor of rigid copyright, at the detriment of the public domain we all share.

TPP isn’t all bad. It states that countries should not require the hosts of sites like Wikipedia to monitor their content for copyright infringement, and it provides safe harbors from intermediary liability. Sites can rely on a notice and takedown system, where they remove infringing material once they are alerted by copyright holders. Yet TPP doesn’t get this balance right either. It lacks a process for counter notices, which would let users push back when a site receives an invalid request to remove content. It also allows rightsholders to demand identifying information about users when they allege copyright infringement. The vague standards in TPP leave this notice and takedown process open to abuse that can chill speech.

TPP is a problematic treaty because it harms the public domain and our ability to create and share free knowledge. It is time for countries to partner for the policies and projects that benefit everyone, like the public domain, clear copyright exceptions and intermediaries empowered to stay out of content creation with good safe harbor protections.

Yana Welinder, Legal Director
Stephen LaPorte, Legal Counsel
Jan Gerlach, Public Policy Manager
Wikimedia Foundation

by Yana Welinder, Stephen LaPorte and Jan Gerlach at February 03, 2016 09:16 PM

Jeroen De Dauw

Missing in PHP7: Value objects

This is the third post in my Missing in PHP7 series. The previous one is about named parameters.

A Value Object does not have an identity, which means that if you have two of them with the same data, they are considered equal (take two latitude, longitude pairs for instance). Generally they are immutable and do not have methods beyond simple getters.

Such objects are a key building block in Domain Driven Design, and one of the common types of objects even in well designed codebases that do not follow Domain Driven Design. My current project at Wikimedia Deutschland by and large follows the Clean Architecture, which means that each “use case” or “interactor” comes with two value objects: a request model and a response model. Those are certainly not the only Value Objects, and by the time this relatively small application is done, we’re likely to have over 50 of them. This makes it really unfortunate that PHP makes it such a pain to create Value Objects, even though it certainly does not prevent you from doing so.
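As a point of contrast (not something this post itself shows), that definition maps almost directly onto a frozen dataclass in Python, which is one example of what language support for cheap Value Objects can look like:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatLong:
    """A Value Object: no identity, compared by data, immutable."""
    latitude: float
    longitude: float


a = LatLong(latitude=52.52, longitude=13.40)
b = LatLong(latitude=52.52, longitude=13.40)

# Two instances holding the same data are considered equal...
assert a == b

# ...and mutation after construction is rejected
# (FrozenInstanceError is a subclass of AttributeError).
try:
    a.latitude = 0.0
except AttributeError:
    pass
```

Equality by data, immutability, and named construction all come for free; the rest of the post looks at how much work the equivalent takes in PHP.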

Let’s look at an example of such a Value Object:

class ContactRequest {

	private $firstName;
	private $lastName;
	private $emailAddress;
	private $subject;
	private $messageBody;

	public function __construct( string $firstName, string $lastName, string $emailAddress, string $subject, string $messageBody ) {
		$this->firstName = $firstName;
		$this->lastName = $lastName;
		$this->emailAddress = $emailAddress;
		$this->subject = $subject;
		$this->messageBody = $messageBody;
	}

	public function getFirstName(): string {
		return $this->firstName;
	}

	public function getLastName(): string {
		return $this->lastName;
	}

	public function getEmailAddress(): string {
		return $this->emailAddress;
	}

	public function getSubject(): string {
		return $this->subject;
	}

	public function getMessageBody(): string {
		return $this->messageBody;
	}
}


As you can see, this is a very simple class. So what the smag am I complaining about? Three different things actually.

1. Initialization sucks

If you’ve read my previous post in the series, you probably saw this one coming. Indeed, I mentioned Value Objects at the end of that post. Why does it suck?

new ContactRequest( 'Nyan', 'Cat', 'maxwells-demon@entropywins.wtf', 'Kittens', 'Kittens are awesome' );

The lack of named parameters forces one to use a positional list of non-named arguments, which is bad for readability and is error prone. Of course one can create a PersonName Value Object with first- and last name fields, and some kind of partial email message Value Object. This only partially mitigates the problem though.

There are some ways around this, though none of them are nice. An obvious fix with an equally obvious downside is to have a Builder using a Fluent Interface for each Value Object. To me, the added clutter complicates the program enough to undo the benefits gained from removing the positional unnamed argument lists.
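
As a sketch of that Builder idea (the builder class and its build method are hypothetical, and an array stands in for the real Value Object to keep the example short and self-contained):

```php
// Hypothetical Builder with a Fluent Interface for ContactRequest.
class ContactRequestBuilder {

	private $firstName;
	private $lastName;
	// ... remaining fields omitted for brevity

	public function withFirstName( string $firstName ): self {
		$this->firstName = $firstName;
		return $this;
	}

	public function withLastName( string $lastName ): self {
		$this->lastName = $lastName;
		return $this;
	}

	public function build(): array {
		// A real builder would return a ContactRequest and throw when
		// required fields are missing; an array keeps the sketch short.
		return [ 'firstName' => $this->firstName, 'lastName' => $this->lastName ];
	}
}

$request = ( new ContactRequestBuilder() )
	->withFirstName( 'Nyan' )
	->withLastName( 'Cat' )
	->build();
```

The call site now names every field, at the cost of a second class per Value Object.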

Another approach to avoid the positional list is to not use the constructor at all, and instead rely on setters. This unfortunately introduces two new problems. Firstly, the Value Object becomes mutable during its entire lifetime. While it might be clear to some people that those setters should not be used, their presence suggests that there is nothing wrong with changing the object. Having to rely on such special understanding, or on people reading documentation, is certainly not good. Secondly, it becomes possible to construct an incomplete object, one that misses required fields, and pass it to the rest of the system. When there is no automated checking going on, people will end up doing this by mistake, and the resulting errors might be very non-local, and thus hard to trace back to their source.

Some time ago I tried out one approach to tackle both of these problems introduced by using setters: I created a wonderfully named trait to be used by Value Objects whose setters form a Fluent Interface.

class ContactRequest {
	use ValueObjectsInPhpSuckBalls;

	private $firstName;
	// ...

	public function withFirstName( string $firstName ): self {
		$this->firstName = $firstName;
		return $this;
	}
	// ...
}


The trait provides a static newInstance method, enabling construction of the Value Object that uses it as follows:

$contactRequest = ContactRequest::newInstance()
        ->withFirstName( 'Nyan' )
        ->withLastName( 'Cat' )
        // ...
        ->withMessageBody( 'Pink fluffy unicorns dancing on rainbows' );

The trait also provides some utility functions to check if the object was fully initialized, which by default will assume that a field with a null value was not initialized.
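
The internals of that trait are not shown here, so the following is my own minimal reconstruction of the newInstance-plus-initialization-check idea (the trait and class names are mine, not the actual code):

```php
// Hypothetical reconstruction: static construction plus a null-based
// initialization check, shared between Value Objects via a trait.
trait ValueObjectConstruction {

	public static function newInstance(): self {
		// self resolves to the class using the trait.
		return new self();
	}

	public function isInitialized(): bool {
		// By default a field holding null counts as "not initialized".
		foreach ( get_object_vars( $this ) as $value ) {
			if ( $value === null ) {
				return false;
			}
		}
		return true;
	}
}

class ContactName {
	use ValueObjectConstruction;

	private $firstName;
	private $lastName;

	public function withFirstName( string $firstName ): self {
		$this->firstName = $firstName;
		return $this;
	}

	public function withLastName( string $lastName ): self {
		$this->lastName = $lastName;
		return $this;
	}
}

$name = ContactName::newInstance()->withFirstName( 'Nyan' );
var_dump( $name->isInitialized() ); // false: lastName is still null
```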

More recently I tried out another approach, also using a trait to be used by Value Objects: FreezableValueObject. One thing I wanted to change here compared to the previous approach is that the users of the initialized Value Object should not have to do anything different from or additional to what they would do for a more standard Value Object initialized via constructor call. Freezing is a very simple concept. An object starts out as being mutable, and then when freeze is called, modification stops being possible. This is achieved via a freeze method that when called sets a flag that is checked every time a setter is called. If a setter is called when the flag is set, an exception is thrown.

$contactRequest = ( new ContactRequest() )
        ->setFirstName( 'Nyan' )
        ->setLastName( 'Cat' )
        // ...
        ->setMessageBody( 'Pink fluffy unicorns dancing on rainbows' )
        ->freeze();

$contactRequest->setFirstName( 'Nyan' ); // boom

To also verify initialization is complete in the code that constructs the object, the trait provides an assertNoNullFields method which can be called together with freeze. (The name assertFieldsInitialized would actually be better, as the former leaks implementation details and ends up being incorrect if a class overrides it.)

A downside this second trait approach has over the first is that each Value Object needs to call the method that checks the freeze flag in every setter. This is something that is easy to forget, and thus another potential source of bugs. I have yet to investigate if the need for this can be removed via some reflection magic.
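
The flag-and-check pattern itself is small. A hypothetical reconstruction (the names are mine, not the actual FreezableValueObject code) shows both the mechanism and the check that each setter must remember to make:

```php
// Hypothetical reconstruction of the freeze flag pattern.
trait Freezable {

	private $isFrozen = false;

	public function freeze(): self {
		$this->isFrozen = true;
		return $this;
	}

	protected function assertNotFrozen() {
		if ( $this->isFrozen ) {
			throw new RuntimeException( 'Cannot modify a frozen object' );
		}
	}
}

class FreezableName {
	use Freezable;

	private $firstName;

	public function setFirstName( string $firstName ): self {
		// This call is exactly what is easy to forget in each setter.
		$this->assertNotFrozen();
		$this->firstName = $firstName;
		return $this;
	}
}

$name = ( new FreezableName() )->setFirstName( 'Nyan' )->freeze();

try {
	$name->setFirstName( 'Cat' ); // boom
} catch ( RuntimeException $ex ) {
	echo 'modification after freeze was blocked';
}
```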

It’s quite debatable if any of these approaches pay for themselves, and it’s clear none of them are even close to being nice.

2. Duplication and clutter

For each part of a Value Object you need a constructor parameter (or setter), a field, and a getter. This is a lot of boilerplate, and the flexibility the class language construct provides, which is not needed here, creates ample room for inconsistency. I’ve come across plenty of bugs in Value Objects caused by assignments to the wrong field in the constructor or by getters returning the wrong field.

“Duplication is the primary enemy of a well-designed system.”

― Robert C. Martin

(I actually disagree with the wording of the above quote and would replace “duplication” with “complexity of interpretation and modification”.)

3. Concept missing from language

It’s important to convey intent with your code. Unclear intent wastes time, both for the programmers trying to understand it and in bugs caused by the intent being misunderstood. When Value Objects are classes, and many other things are also classes, it might not be clear whether a given class is intended to be a Value Object. This is especially a problem when there are more junior people on a project. Having a dedicated Value Object construct in the language itself would make the intent unambiguous. It would also force conscious and explicit action to change a Value Object into something else, eliminating one avenue of code rot.

“Clean code never obscures the designers’ intent but rather is full of crisp abstractions and straightforward lines of control.”

― Grady Booch, author of Object-Oriented Analysis

I can haz

ValueObject ContactRequest {
    string $firstName;
    string $lastName;
    string $emailAddress;
}

// Construction of a new instance:
$contactRequest = new ContactRequest( firstName='Nyan', lastName='cat', emailAddress='something' );

// Access of the "fields":
$firstName = $contactRequest->firstName;

// Syntax error:
$contactRequest->firstName = 'hax';



by Jeroen at February 03, 2016 07:42 PM

Missing in PHP7

I’ve decided to start a series of short blog posts on how PHP gets in the way of creating well designed applications, with a focus on missing features.

The language flamewar

PHP is one of those languages that people love to hate. Its standard library is wildly inconsistent, and its gradual typing approach leaves fundamentalists in both the statically typed and dynamically typed camps unsatisfied. The standard library of a language is important, and, amongst other things, it puts an upper bound on how nice an application written in that language can be. This upper bound, however, is not something you run into in practice. Most code out there suffers from all kinds of pathologies that have little to do with the language used, and those are much more detrimental to understanding or changing the code than the language itself. I will take a well designed application in a language that is not that great (such as PHP) over a ball of mud in [insert the language you think is holy here] any day.

“That’s the thing about people who think they hate computers. What they really hate is lousy programmers.”

― Larry Niven

Well designed applications

By well designed application, I do not mean an application that uses at least 10 design patterns from the GoF book and complies with a bunch of design principles. It might well do that, however what I’m getting at is code that is easy to understand, maintain, modify, extend and verify the correctness of. In other words, code that provides high value to the customer.

“The purpose of software is to help people.”

― Max Kanat-Alexander

Missing features

These will be outlined in upcoming blog posts which I will link here.

by Jeroen at February 03, 2016 07:18 PM

Wiki Education Foundation

Wiki Ed to visit California State University, East Bay

Samantha Erickson
Samantha Erickson

On Friday, February 19, Educational Partnerships Manager Jami Mathewson and I will be presenting a “Teach with Wikipedia” workshop at California State University, East Bay.

Research and reference librarian Tom Bickley will also be joining us. His subject specialties include music, philosophy, math, and computer science. As an instructor in our program, Tom can speak from experience about running a Wikipedia assignment.

The workshop will take place in room LI3079, Upper Mall level of the CSU East Bay Library, from 10 a.m. to noon (map).

To RSVP, please sign up here. Attendees from all CSU, UC and other California institutions are welcome to attend. Parking is available on campus by purchasing a permit at various dispensers.

If you have any questions, e-mail me: samantha@wikiedu.org. See you there!

Photo: “Csueb view” by Jennifer Williams – originally posted to Flickr as csueb view. Licensed under CC BY-SA 2.0 via Wikimedia Commons.


by Samantha Erickson at February 03, 2016 06:30 PM

Teaching (and diversifying) classical music through Wikipedia

Kim Davenport, Lecturer, Interdisciplinary Arts & Sciences at the University of Washington, Tacoma, works with Wikipedia in her “Intro to Humanities” course for first-year students there. She shares her thoughts on student contributions to coverage of classical music on Wikipedia.

My course introduces the world of classical music. Through several projects, students explore the role of music in their lives and community.

With the luxury of small class sizes (capped at 25), I’ve been able to incorporate active learning, innovative projects, and collaborative work. I did have one tired old research assignment, though, which I was eager to revamp.

During a week-long diversity workshop on our campus, ‘Strengthening Education Excellence through Diversity’, a colleague shared her success incorporating a Wikipedia project into her Sociology course. I found the inspiration I needed.

My old assignment posed research questions for the sole purpose of building research skills. “Look up these facts”; “explore this database”; “compare these two encyclopedias”; etc.

There was nothing exciting, current, or personally relevant to my students. It showed. They lacked enthusiasm for the assignment, and I lacked excitement toward grading it!

I hoped that adding a Wikipedia-based assignment would achieve several goals. I wanted to place students, as editors, within a research tool they used on a daily basis. I wanted them to explore issues of bias within both classical music and Wikipedia. I also hoped to give students some room to choose which Wikipedia article they would create or expand.

I asked students to choose a classical music composer from Wikipedia’s lists of requested or stub articles. They would conduct research, write, and publish their new or expanded article. I was working with first-year students in a 100-level course, and my campus is on the 10-week quarter system, so I assigned it as a group project.

I introduced the project early in the quarter, while immersed in a ‘crash course’ in classical music history. It is impossible to study this history without noticing that most of the composers we studied were white and male.

My students do learn that classical music’s world has diversified through the centuries. While I work hard to share diverse examples in class, I wanted to do more to confront the issue.

Sharing Wikipedia’s own awareness of its systemic bias with my students helped me frame the issue. I could make it relevant both for my course content, and for students’ understanding. I made clear that my students could choose any name they wished from the lists of composers. However, I asked them to consider the impact they would have writing an article about, for example, a current American composer of color, or a female composer from the Renaissance. After this introduction to the project, the excitement among the students was tangible.

Deadlines throughout the quarter helped to keep them on task. I reserved a computer lab for two class periods: one mid-quarter and one late-quarter. The students used that time to work together under my (light) supervision.

I was pleasantly surprised with the group element of the project. I sometimes shy away from assigning group work, because of its well-documented pitfalls. In this case, I think it was the right decision.

By working together, students were excited and engaged in the topic over many weeks. It offered comfort to students who had worried that editing Wikipedia was beyond their comfort zone. Relying on another group member, or splitting up the group’s work to each student’s strengths, took some worry out of the process.

When the end of the quarter arrived, I had the opportunity to review the students’ final work. I was pleased with the outcome on several levels.

My diverse group of students had selected a diverse range of composers to research. Men and women, from various periods in history, representing Europe and the Americas but also other parts of the globe.

The vast majority of the articles withstood review by other Wikipedia editors with only minor edits. A few were edited substantially or, in one case, deleted. Even then, the students understood the weaknesses in their work. They took the learning experience in stride.

The experience of incorporating Wikipedia into my classroom has been an extremely positive one. I’m excited to repeat it.

I must also give a shout out to the tremendous staff with the Wiki Education Foundation. Going into this experience, I had never edited so much as one Wikipedia article. I was a bit concerned as to whether I had the expertise to guide my students through the experience. Given the online and paper training materials for instructors and students, and the staff’s help from start to finish, I felt confident — and it was a success!

Photo: “Bartholomeo Bettera – Still Life with Musical Instruments and Books – Google Art Project” by Bartholomeo Bettera (1639–1699), via Google Cultural Institute. Licensed under Public Domain via Wikimedia Commons.


by Guest Contributor at February 03, 2016 05:00 PM

Joseph Reagle

Why learning styles are hard to give up

Some of my students refuse to believe the theory of learning styles is discredited. Referring them to Wikipedia or literature reviews isn't sufficient because they strongly identify as visual or tactile learners. It's a deeply felt intuition---that I share.

I think the intuition is misleading because we confuse style with ability thresholds. Einstein, a brilliant autodidact, can learn a difficult concept (like gyroscopic precession) from a dry and boring text. I can learn the same concept only by way of a visual demonstration, such as Derek Muller's.

I might mistakenly conclude "I'm a visual learner," but Einstein can also learn from the demonstration. Everyone benefits from a great demonstration. People do have different abilities, and we'll encounter different thresholds at which we then want a better learning method. But this is different from what learning style theory predicts: (1) that you can identify people who learn better through one style/modality, and (2) that they actually do better in a curriculum tailored to that style while people with different purported styles do not. There is little evidence of this.

by Joseph Reagle at February 03, 2016 05:00 AM

February 01, 2016

Wikimedia Foundation

What I Learned: Improving the Armenian Wiktionary with the help of students

Group photo of the participants in Winter WikiCamp 2016, the latest edition of this program. Photo by Beko, freely licensed under CC-BY-SA 4.0.

WikiCamp is an education program that aims to get young students editing Armenian-language content on Wikimedia sites. National identity, recreation and knowledge are at the core of this program, as participant Shahen Araboghlian states: “WikiCamp is a new experience, a chance to meet other people, self-develop and enrich Western Armenian language in the Internet”.

This exciting initiative, which aims to get young Armenian Education Program participants aged 14 to 20 to edit the Armenian Wikipedia, has been around since July 2014, since when there have been five WikiCamps. The return from the camps is surprisingly high, and they are well known in Armenia, having been covered by Aravot and News.am in January 2015, among other outlets.

The camp has also grown in popularity among students, with the number of participants rising from 59 in the first camp to 76 in the second, and growing since. Participants are not only happy to join WikiCamp and make new friends, but also to contribute to expanding open knowledge in their own language. “Honestly, I never edit Wikipedia to have high points and to come to Wikicamp. I edit to make articles available in Armenian, for people who share the same interests as me. This is what motivates me when I edit”, says participant Arthur Mirzoyan.

Since Winter WikiCamp 2015, however, we have made this activity available to even younger editors in this education project with a new, much easier wiki activity: Wiktionary editing.

Participants editing in one of the camp workshops. Photo by David Saroyan, freely licensed under CC-BY-SA 4.0.

Shared lesson: Look beyond Wikipedia to create a more accessible Education Program

Before Winter WikiCamp 2015, the Armenian Wikipedia Education Program focused only on Wikipedia. As the program rapidly spread to different regions of Armenia and became popular, many secondary school students expressed their wish to join—but not all of these students could easily learn Wikipedia editing techniques or were able to write and improve articles that met the project’s guidelines, like writing in encyclopedic style or citing reliable sources.

As their wish to edit Wikipedia and its sister projects was enormous, we decided to involve these students in editing Wiktionary. Editing Wiktionary doesn’t have the same requirements as writing an encyclopedic article; the editors just fill in the necessary fields using the provided dictionaries. Helpful guides, word lists, and dictionary links made the editing process much easier for secondary school students. As the Armenian Wiktionary did not have active users, we created Wiktionary tutorials for them. We also found different free dictionaries and digitized them, created word lists, and provided these to the students, all of which gave us a foundation to begin enriching the Armenian Wiktionary with words.

The process was simple: we asked students to write the word’s definition, examples of usage, etymology, expressions, synonyms, derived forms and translations. Before we started with the Education program, there was a major gap in Armenian-language content and no complete free dictionary. This gap inspired us and was the main reason why we focused solely on Armenian content. At that time, we had only 3,000 Armenian words which needed to be improved, and during the Winter WikiCamp 2015 a portion of those words were enriched. The improvement process continued successfully after the camp ended, transforming our education initiative into a well-known Wiktionary-based program for secondary school students.

Where do we go from here?

As a result of Wikimedia Armenia’s Education Program and its participants’ efforts, this year the Armenian Wiktionary reached 100,000 entries.

After this success, we decided to include Wiktionary editing in the next WikiCamp: in Summer 2015, Wiktionary editors were actively involved in the camp and around 14,000 entries were created through joint efforts. Our main breakthrough was realizing that other Wikimedia projects could be as effective as Wikipedia as education tools, so we attempted to implement other iterations in WikiCamp. Wikisource and Wikimedia Commons activities are still in the development stage. You can find this and other outcomes on the WikiCamp project page on the Armenian Wikipedia.

David Saroyan, Program Manager, Wikimedia Armenia
Lilit Tarkhanyan, Wikipedia Education Program leader, Wikimedia Armenia
María Cruz, Communications and Outreach, Learning and Evaluation team, Wikimedia Foundation
Samir Elsharbaty, Fellow, Wikipedia Education Program, Wikimedia Foundation

«What I Learned» is a blog series that seeks to capture and share lessons learned in different Wikimedia communities over the world. This knowledge stems from the practice of Wikimedia programs, a series of programmatic activities that have a shared, global component, and a singular, local aspect. Every month, we will share a new story for shared learning from a different community. If you want to feature a lesson you learned, reach out!

by David Saroyan, Lilit Tarkhanyan, María Cruz and Samir El-Sharbaty at February 01, 2016 10:10 PM

Wiki Education Foundation

Announcing: Genes and Proteins

We’re pleased to announce a new handbook for students writing articles about genes and proteins for classroom assignments.

When we started creating our Ecology brochure, we wanted to create handbooks for students interested in sharing information about life on Earth. When we set out to create a biology handbook for students, we found there was just so much to cover that it wouldn’t fit in just one. With our new handbook on Genes and Proteins, we’re zooming in on the world of genetics. It complements our existing guide, Writing Wikipedia articles on Species, which covers writing about plants, animals, and fungi.

Genes and Proteins offers advice specific to articles on those topics: suggestions for reliable journals (and how to identify poor quality journals), how to structure an article to keep in line with other genes or protein articles, and even how to create a relevant infobox for individual proteins, protein families, enzymes, GNF proteins, nonhuman proteins, and RNA families.

When it comes to open science, Wikipedia is particularly important in the context of educational resources. By keeping reliable information on genes and proteins available in an easy-to-find place, students have a place to turn to for quick clarifications of their understanding of science. Finding an overview of a topic with a pre-vetted list of sources can make the difference in taking future research one step further. With this information on Wikipedia, more people have a starting point to explore the world of science, or begin to explore a popular science topic more deeply. They can help fill in the pieces for lay scientists to get back up to speed quickly on the topics that matter to them.

This guide was written in collaboration with our Wikipedia Content Expert in the Sciences, Ian Ramjohn, Ph.D. Ian’s dual Ph.D. (in Plant Biology as well as Ecology, Evolutionary Biology, and Behavior), provided a deep background in relevant fields.

We’re also grateful to other experienced Wikipedians who helped us. Volunteers at the Science and the Molecular and Cell Biology WikiProjects created the backbone of this handbook over years, establishing and documenting the best practices and techniques for writing these types of articles. Users Andrew Su, BatteryIncluded, Boghog, DrBogdan, Gtsulab, KatBartlow, and Opabinia regalis shared additional ideas and resources.

These handbooks are just one part of Wiki Ed’s Year of Science campaign. They’re available as free PDFs for everyone on Wikimedia Commons, and in print for any science instructors teaching through our program. If you’d like to get involved, reach out to us through the Year of Science page or e-mail us: contact@wikiedu.org.


by Eryk Salvaggio at February 01, 2016 05:00 PM

Wikimedia UK

Wikipedia and Open Access: making research as useful as it can be

This post was written by Martin Poulter, Wikimedian in Residence at Bodleian Libraries, and was first published on Open Access Oxford.

The Budapest Open Access Declaration is one of the defining documents of the Open Access movement. It says that free, unrestricted access to peer-reviewed scholarly literature will “accelerate research, enrich education, share the learning of the rich with the poor and the poor with the rich, make this literature as useful as it can be, and lay the foundation for uniting humanity in a common intellectual conversation and quest for knowledge.”

To bring about this optimistic vision, there needs to be some way to deliver this knowledge directly to everyone on the planet. Rather than broadcasting to passive recipients, this needs to encourage repurposing and remixing of research outputs, so people can adapt them into educational materials, translate them into other languages, extract data from them, and find other uses.

Fifteen years after its creation in January 2001, Wikipedia is emerging as that creative space. Wikipedia is not a competitor to normal scholarly publication, but a bridge connecting it to the world of informal learning and discussion. Wikipedia is only meant to be a starting point: its credibility does not come from its contributors, who are often anonymous, but from checkable citations to reputable sources.

Being “the free encyclopedia” reflects not just that Wikipedia is available without charge, but that it is free for use by anyone for any purpose, subject to the requirements of the most liberal Creative Commons licences. These freedoms are a part of its success: on the article about your favourite topic, click “View history”, then “Page view statistics”: it is not uncommon to see a scientific article getting thousands of hits per day.

When a team in 2015 announced the discovery of a new hominid, Homo Naledi, the extensive diagrams, fossil photos and other supplements they produced exceeded the size limit set by their first choice of journal, Nature. So they went to the open-access journal eLIFE. As well as publishing the peer reviews along with the paper, eLIFE uses a very liberal licence, so figures from the paper made it possible to create a comprehensive Wikipedia article for Homo Naledi, and to improve related articles.

There are many more cases where a research paper is adapted into a Wikipedia article which acts as a lay summary. For example, the article on Major Urinary Proteins was written by scientists at the Wellcome Trust Sanger Institute based on, and using figures from, papers they had published in PLOS open-access journals.

Editing Wikipedia used to involve learning a form of markup called “wiki code”. Thanks to some software development, this is no longer necessary. When you register an account, each article presents two tabs “Edit” and “Edit source”. “Edit source” gives you the old wiki code interface; but “Edit” gives a much more straightforward wordprocessor-like interface. Especially handy is the “Cite” button, which can convert a DOI (Digital Object Identifier) into a full citation.

Still much about Wikipedia is poorly-designed and dependent on insider knowledge. Luckily there are insiders who are keen to share, and training is available. The Royal Society of Chemistry, Cancer Research UK and the Royal Society are amongst the scientific bodies which have employed Wikipedians In Residence. As WIR at the Bodleian Libraries, I have run events to improve articles on Women In Science and am celebrating Wikipedia’s 15th birthday working with researchers and students from the Oxford Internet Institute to improve articles about the “social internet”.

Wikimedia encompasses more than just Wikipedia: it is an ecosystem of different projects handling and repurposing text, data and digital media. There are many sites that you can use without charge to share or build materials, but Wikimedia is distinctive in being a charitable project existing purely to share knowledge, with no adverts or other commercial influences.

Wikimedia Commons is the media archive, hosting photographs, diagrams, video clips and other digital media, along with author and source credits and other metadata. It currently offers just under 30 million files, of which tens of thousands are extracted from figures or supplementary materials from Open Access papers. It’s a massively multilingual site, where each file can have descriptions in many languages, and one of the repurposing activities going on is creating alternative language versions of labelled diagrams.

Wikidata describes itself as “a free linked database that can be read and edited by both humans and machines”. It holds secondary data: not raw measurements, but key facts and figures concluded from them. Looking up Platinum, for example, gives the element’s periodic table placement, various official identifiers, and physical properties. Wikidata holds knowledge about fifteen million entities, including species, molecules, astronomical bodies and diseases, and the number is still rapidly growing.

What’s exciting about Wikidata is the uses it can be put to. Making data about many millions of things freely available enables a new generation of applications for education and reference. Reasonator gives a visually pleasing overview of what Wikidata knows about an entity. Histropedia (histropedia.com) is a tool for building timelines (try “Discoverers of chemical elements”, then zoom in).

There are eleven Wikimedia projects in total, each with its own strengths and flaws. My personal favourites include Wikisource – a library of open access and out-of-copyright texts, including for example Michael Faraday’s Royal Institution lectures – and Wikibooks, which aims to create textbooks for every level and topic, from ABCs to genome sequencing.

As open access becomes more mainstream, technical and legal barriers around research outputs will diminish, so more research will become as “useful as it can be” through the Wikimedia projects. That benefits the research in terms of impact and public awareness, but it also benefits the end users who, in a connected world, are everybody.

by Martin Poulter at February 01, 2016 11:59 AM


Language usage on Wikidata

Wikidata is a multilingual project, but due to the size of the project it is hard to get a view of language usage.

For some time now the Wikidata dashboards have existed on the Wikimedia grafana install. These dashboards contain data about the language content of the data model by looking at terms (labels, descriptions and aliases) as well as data about the language distribution of the active community.

For reference, the dashboards used are:

All data below was retrieved on 1 February 2016

Active user language coverage

Active users here is defined as users that have made 1 edit or more in the last 30 days.

A single user can have multiple languages (in the case that they use a babel box). If the user does not have a babel box then the user interface language is used.

18,190 users are represented below, with 317 languages covered a total of 27,660 times.

The primary active user language is shown as English; this is likely because the default user interface language is English and only 2,905 users have babel boxes.

On average a user that has a babel box has 3.3 languages defined in it.

Term language coverage

Across all Wikidata entities 410 languages are used (including variants).

This leaves a gap of roughly 93 languages between those used in terms and those viewed by active editors currently.

The distributions per term type can be seen below.

Of course all of the numbers above are constantly changing and the dashboards should be referred to for up to date data.

by addshore at February 01, 2016 08:32 AM

Gerard Meijssen

#Wikipedia - The Jessie Bernard Award

At the #Wikimedia Foundation a lot of words are spent on the subject of #diversity; so much so that "diversity" has become shorthand for gender diversity. My point is not that proper attention for women and women's causes is a bad thing; it is the exclusion of everything else.

On the subject of gender diversity, an award was named in recognition of Mrs Bernard: the Jessie Bernard award. "It is presented for significant cumulative work done throughout a professional career, and is open to women or men and is not restricted to sociologists."

The Wikipedia article on the award has a substantial number of red links for the unsung heroes of gender diversity; obviously everyone is invited to fill them in. The quote indicates that being a housewife is a special case of being crazy. If so, what does "mentally ill" mean, and how well does Wikipedia cover that subject?

by Gerard Meijssen (noreply@blogger.com) at February 01, 2016 07:19 AM

January 31, 2016


The break in Wikidata edits on 28 Jan 2016

On the 28th of January 2016 all Wikimedia MediaWiki APIs had two short outages. The outages are documented on Wikitech here.

The outages didn't have much of an impact on most projects hosted by Wikimedia. However, since most Wikidata editing happens through the API (even when using the UI), editing on the project basically stopped for roughly 30 minutes.

Interestingly there is an unusual increase in the edit rate 30 minutes after recovery.

I wonder if this is everything that would have happened in the gap?

by addshore at January 31, 2016 07:39 PM


DMCAs – Copyright's best friend

In August 2013, the Wikimedia Foundation received a DMCA takedown notice about some content in the "Sport in Australia" article on the English Wikipedia, because the sending party claimed to own the copyright to the following information table, included below.

[Screenshot of the disputed table]
Used on this web page under two provisions: first, that it is not copyrightable (beyond perhaps the wikitable markup, which is attributed to the Wikimedia contributors under CC BY-SA 3.0); second, that it is fair use anyway, since this is commentary on the content. So please do not send me a DMCA takedown notice as well.

It consists only of material and data points which are in themselves not copyrightable. Despite this, the Wikimedia Foundation was forced to comply with the takedown notice in order not to lose its safe-harbor status.

The list itself might not be that encyclopedic, and should perhaps not be in the article, but if I want to use that table in another article, or restore an old version of this article, the takedown notice must be refuted first. That is why, 2.5 years later, I sent in a formal DMCA counter notice stating my good-faith belief that they were in error and that this content is not copyrightable.

The senders of the DMCA takedown notice now have 14 days from receiving my forwarded counter notice to take legal action to stop this content from being displayed. If they do not, it will be a huge victory for our community.

by Jonatan Svensson Glad (Josve05a; @JonatanGlad) at January 31, 2016 01:31 PM


Building applications around Wikidata (a beer example)

Wikidata provides free and open access to entities representing real-world concepts. Of course, Wikidata is not meant to contain every kind of data; beer reviews or product reviews, for example, would probably never make it into Wikidata items. However, creating an app powered by Wikidata and Wikibase to hold beer reviews should be rather easy.

A base data set

I’m going to take the example of beer as mentioned above. I’m sure there are thousands if not millions of beers that Wikidata is currently missing, but at the time of writing this there are 958 contained in the database. These can be found using the simple query below:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?i WHERE {
   ?i wdt:P31/wdt:P279* wd:Q44 .
}
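An application could consume the results of this query via the query service endpoint; below is a minimal sketch of parsing the standard SPARQL 1.1 JSON results format (the sample payload and the second item ID are illustrative, not real query output):

```python
# A sample payload in the standard SPARQL 1.1 JSON results format,
# as returned by SPARQL endpoints; item IDs here are illustrative.
sample_response = {
    "head": {"vars": ["i"]},
    "results": {"bindings": [
        {"i": {"type": "uri", "value": "http://www.wikidata.org/entity/Q44"}},
        {"i": {"type": "uri", "value": "http://www.wikidata.org/entity/Q12345"}},
    ]},
}

def item_ids(response):
    """Extract bare item IDs (e.g. 'Q44') from a SPARQL JSON result."""
    return [binding["i"]["value"].rsplit("/", 1)[-1]
            for binding in response["results"]["bindings"]]

print(item_ids(sample_response))  # ['Q44', 'Q12345']
```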

Any application can use data stored within Wikidata; in the case of beers this includes labels and descriptions in multiple languages, mappings to Wikipedia articles and external databases for even more information, images of said beer, the type of beer, and much more. Remember, the Wikidata dataset is ever evolving and the IDs are persistent.

Application specific data

Let's say that you want to review the beers! You could set up another Wikibase installation and SPARQL endpoint to store and query review and rating information. Wikibase provides an amazingly flexible structure, meaning this is easily possible. Reviews and ratings could be stored as a new entity type linking to an item, or as items mapped to Wikidata items and containing statements of review or rating data. Right now the documentation is probably lacking, but this is all possible.

Of course I am shouting about Wikibase first because Wikidata is powered by it, so integration should be easier; however, there is no reason you couldn't use any other database, mapping your application-specific information to Wikidata item IDs. MusicBrainz is already doing something like this, and I am sure there are other applications out there too!
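As a sketch of that mapping idea, the application-specific side could be as simple as a store keyed by persistent Wikidata item IDs (everything here is hypothetical, not an existing API):

```python
# A hypothetical minimal review store keyed by persistent Wikidata IDs.
# Everything Wikidata already knows (labels, images, type of beer)
# stays in Wikidata; only the reviews live in the application.
reviews = {}

def add_review(item_id, rating, text):
    """Attach a review to a Wikidata item ID (e.g. 'Q44')."""
    reviews.setdefault(item_id, []).append({"rating": rating, "text": text})

def average_rating(item_id):
    """Mean rating for an item, or None if it has no reviews yet."""
    rs = reviews.get(item_id, [])
    return sum(r["rating"] for r in rs) / len(rs) if rs else None

add_review("Q44", 4, "A fine default beer item.")
add_review("Q44", 5, "Crisp.")
print(average_rating("Q44"))  # 4.5
```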

Sharing of knowledge

Knowledge is power. Wikipedia has proven that free and open knowledge is an amazing resource in unstructured text form; Wikidata is a step up, providing structured data. Imagine a world in which applications share basic world information, building a dataset for a common worldwide goal. In the example above: add an image of a beer in one application and have it instantly available in another; translate a description for one user and have it benefit millions.

Let's see what we can do in the next 10 years!

by addshore at January 31, 2016 12:13 AM

January 30, 2016


MediaWiki CRAP – The worst of it

I don't mean MediaWiki is crap! The Change Risk Anti-Patterns (CRAP) Index is calculated from the cyclomatic complexity and code coverage of a unit of code. Complex, untested code will have a higher CRAP index than simple, well-tested code. Over the last 2 years I have been tracking the CRAP index of some of MediaWiki's more complex classes as reported by the automatic coverage reports, and this is a simple summary of what has been happening.
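For reference, the standard CRAP formula (due to Savoia and Copeland, and as I understand it the one such coverage reports use) combines a method's cyclomatic complexity `comp` and coverage `cov` as `comp² × (1 − cov)³ + comp`:

```python
def crap(complexity, coverage_percent):
    """Change Risk Anti-Patterns index for a single method.

    complexity: cyclomatic complexity of the method.
    coverage_percent: 0-100 test coverage of the method's code paths.
    """
    uncovered = 1.0 - coverage_percent / 100.0
    return complexity ** 2 * uncovered ** 3 + complexity

print(crap(10, 0))    # 110.0 -- complex and untested: high CRAP
print(crap(10, 100))  # 10.0  -- complex but fully covered
print(crap(2, 0))     # 6.0   -- simple code is cheap even untested
```

Note how coverage only tames the quadratic complexity term, which is why the huge scores reported below can only be brought down so far by tests alone.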

Just over 2 years ago I went through all of the MediaWiki unit tests and added @covers tags to improve the coverage reports for the source. This brought line coverage to roughly 4% toward the end of 2013. Since then coverage has grown steadily and is now at an amazing 9%. Note that I am only counting coverage of the includes directory here; including maintenance scripts and language definitions, the 9% is actually 7%.

You can see the sharp increase in coverage at the very start of the graph below.

Over the past 2 years there has also been a push forward with librarization, which has resulted in the removal of many things from the core repository and the creation of many libraries that core now requires via Composer. Such libraries include:

  • mediawiki/at-ease – A safe alternative to PHP’s “@” error control operator
  • wikimedia/assert – Alternative to PHP’s assert()
  • wikimedia/base-convert – Improved base_convert for PHP
  • wikimedia/ip-set – PHP library to match IPs against CIDR specs
  • wikimedia/relpath – Compute a relative path between two paths
  • wikimedia/utfnormal – Unicode normalization functions
  • etc.

All of the above has helped to generally reduce the CRAP across the code base, even in some of the locations with the largest CRAP scores.

The graph shows the CRAP index for the top 10 CRAP classes in MediaWiki core at any one time. The data is taken from 12 snapshots of the CRAP index across the 2-year period. At the very left of the graph you can see a sharp decrease in the CRAP index, as unit test coverage was taken into account from this point on (as in the coverage graph). Some classes fall out of the top 10 and are replaced by more CRAP classes through the 2-year period.

Well, coverage is generally trending up and CRAP is generally trending down. That's good, right? The overall CRAP index of the top 10 CRAP classes has actually decreased from 2.5 million to 2.2 million! Which of course means the average for the top 10 classes has decreased from 250,000 to 220,000!

Still a long way to go but it will be interesting to see what this looks like in another year.

by addshore at January 30, 2016 11:46 PM

Myanmar coordinates on Wikidata by Lockal & Widar

In a recent blog post I showed the amazing apparent effect that Wikimania's location had on the coordinate location data in Mexico on Wikidata. A comment on the post by Finn Årup Nielsen pointed out a massive increase in data in Myanmar (Burma). I had previously spotted this increase but chosen not to mention it in the post. But now, after a quick look at some items and edit histories, I have found who we have to thank!

The increase in geo coordinate information around the region can clearly be seen in the image above. As with the Mexico comparison this shows the difference between June and October 2015.

Finding the source of the new data

So I knew that the new data was focused around Q836 (Myanmar) but starting from that item wasn’t really going to help. So instead I zoomed in on a regular map and found a small subdivision of Myanmar called Q7961116 (Wakema). Unfortunately the history of this item showed its coordinate was added prior to the dates of the image above.

I decided to look at what linked to the item, and found that there used to be another item about the same place which now remains as a redirect Q13076630. This item was created by Sk!dbot but did not have any coordinate information before being merged, so still no luck for me.

Bots generally create items in bulk meaning it was highly likely the new items either side of Q13076630 would also be about the same topic. Loading Q13076629 (the previous item) revealed that it was also in Myanmar. Looking at the history of this item then revealed that coordinate information was added by Lockal using Widar!

Estimating how much was added

So with a few quick DB queries we can find out how many claims were created for items stating that they were in Myanmar as well as roughly how many coordinates were added:

-- Claims stating the item is in Myanmar (Q836) created by the user since the snapshot
SELECT count(*) AS count
FROM revision
WHERE rev_user = 53290
  AND rev_timestamp > 2015062401201
  AND rev_comment LIKE "%wbcreateclaim-create%"
  AND rev_comment LIKE "%Q836%";

-- Coordinate location (P625) claims created by the user since the snapshot
SELECT count(*) AS count
FROM revision
WHERE rev_user = 53290
  AND rev_timestamp > 2015062401201
  AND rev_comment LIKE "%wbcreateclaim-create%"
  AND rev_comment LIKE "%P625%";

Roughly 16,000 new country statements and 19,000 new coordinates. All imported from Burmese Wikipedia.

Many thanks Lockal!

by addshore at January 30, 2016 11:46 PM

Submitting a patch to Mediawiki on Gerrit

I remember when I first submitted a patch to MediaWiki on Gerrit. It was a +12 -4 line patch and it probably took me at least half a day to figure everything out and get my change up! There is a tutorial on mediawiki.org but it is far too wordy and overcomplicated. In this post I try to explain things as simply as possible. Enjoy!


In order to be able to submit a patch to Gerrit you need to have Git installed!

If you're on a Linux system you can install it using your package manager, e.g. "apt-get install git". If you are on another system such as Windows you can use a build from git-scm. Basically, just get Git from https://git-scm.com/downloads!

Once you have downloaded Git you need to configure it!

git config --global user.email "example@example.com"
git config --global user.name "example"


Next you need to create an account for Gerrit. To do this navigate to gerrit.wikimedia.org, and click on the Register link in the top right. This will then take you to wikitech.wikimedia.org where you must create your account!

Once you have created an account and logged in you must add an SSH key. Go to your settings (again in the top right) and navigate to “SSH Public Keys“.

To generate a key do the following on your machine:

ssh-keygen -t rsa -C "example@example.com"

You should then be able to get your key from “~/.ssh/id_rsa.pub” (or the location you chose) and then add it to Gerrit.

Getting the code

Now that you have git and you have added your SSH key to gerrit you can use ssh to clone the code repository onto your local machine. Again you can read docs for this on the git-scm website.

git clone ssh://<USERNAME>@gerrit.wikimedia.org:29418/mediawiki/core

When logged in you can see the command at https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/core

Making a commit

Now you have the code cloned locally you can go ahead and change the files!

Once you have made your changes you should be able to review them using the following command:

git diff

You can then add ALL changed files to a commit by doing the following:

git commit -a

A text editor should then load where you should enter a commit message, for example:

Fixing issue with unset new article flag

Some extra description can go here, but you
should try and keep your lines short!
A bug number can be linked at the bottom of
the commit as shown.

Bug: T12345

Once you have saved the text you will have made your commit! Now try to push it as a change set to Gerrit (although since it's your first time, this will fail)!

You should get a message saying that you are missing a Change-Id in the commit message footer! This lovely message also contains the command that you need to run in order to fix the issue!

gitdir=$(git rev-parse --git-dir); scp -p -P 29418 <username>@gerrit.wikimedia.org:hooks/commit-msg ${gitdir}/hooks/

This creates a hook file in your .git directory for this repo that will automatically add a Change-Id in the future! To add the Change-Id to your current commit message, run:

git commit -a --amend --no-edit

And now you are ready to actually push your commit for review!

git push origin HEAD:refs/publish/master

Your change should now be on Gerrit!

Your master branch is now 1 commit ahead of where master actually is, so to clean up and reset your local repo to the same state as the remote just run:

git reset --hard origin/master

You can always get back to your commit by using the hash of the commit with the “git checkout” command. Or you can copy the remote checkout command from the Gerrit UI, it looks something like the below:

git fetch https://gerrit.wikimedia.org/r/mediawiki/core refs/changes/74/257074/1 && git checkout FETCH_HEAD

Amending your change

If people comment on your commit on Gerrit you may want to change it, fixing the issues that they have pointed out.

To do this, check out your change again as described above, either using the hash locally or using the fetch & checkout command you can copy from the Gerrit UI.

git checkout d50ca328033702ced91947e60939e3550ca0212a
git fetch https://gerrit.wikimedia.org/r/mediawiki/core refs/changes/74/257074/1 && git checkout FETCH_HEAD

Make your changes to the files.

Amend the commit (you can add --no-edit if you do not want to edit the commit message):

git commit -a --amend

And push the patch again!

git push origin HEAD:refs/publish/master


This post covers the bare necessities for submitting a patch to Gerrit and responding to comments. There are many things it does not cover such as Git, re-basing, drafts, cherry-picks, merge resolution etc.

Also, I should point out that Gerrit is going to be disappearing very soon in favour of Differential, so there may have been little point in me writing this, but someone asked!

If you do not want to use git-review to contribute to Wikimedia or Gerrit projects, then the most important thing to draw from this post is the under-advertised "git push origin HEAD:refs/publish/master" command!

by addshore at January 30, 2016 11:45 PM

Reducing packet count to Statsd using Mediawiki

Recently I have been spending lots of time looking at the Wikimedia graphite set-up due to working on Grafana dashboards. In exchange for what some people had been doing for me I decided to take a quick look down the list of open Graphite tickets and found T116031. Sometimes it is great when such a small fix can have such a big impact!

After digging through all of the code I eventually discovered that the method which sends MediaWiki metrics to Statsd is SamplingStatsdClient::send. This method overrides StatsdClient::send, which is provided by liuggio/statsd-php-client. However, a bug has existed in the sampling client ever since its creation!

The fix for the bug can be found on Gerrit and is only a +10 -4 line change (only 2 of those lines were actually code).

// Before (buggy): the stringified messages were sent before
// reduceCount had packed them into fewer packets
$data = $this->sampleData( $data );
$messages = array_map( 'strval', $data );
$data = $this->reduceCount( $data );
$this->send( $messages );

// After (fixed): the reduced data is what actually gets sent
$data = $this->sampleData( $data );
$data = array_map( 'strval', $data );
$data = $this->reduceCount( $data );
$this->send( $data );
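The effect is easier to see with a rough Python analogue of what a reduceCount-style step does (a sketch only; the real PHP client's packing logic differs in detail). Before the fix the stringified metrics were sent without ever being packed, so every metric cost a whole UDP packet:

```python
def reduce_count(messages, mtu=1500):
    """Pack individual statsd metric lines into as few UDP payloads
    as possible, joining lines with newlines while keeping each
    payload within the given size limit.
    """
    packets, current = [], ""
    for msg in messages:
        candidate = msg if not current else current + "\n" + msg
        if len(candidate) > mtu and current:
            packets.append(current)  # flush the full payload
            current = msg
        else:
            current = candidate
    if current:
        packets.append(current)
    return packets

metrics = ["mw.api.calls:1|c"] * 100
print(len(metrics), "->", len(reduce_count(metrics)))  # 100 -> 2
```

One hundred metric lines fit into two packets instead of one hundred, which is the same order of saving seen in the graphs below.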

The result of deploying this fix on the Wikimedia cluster can be seen below.

Decrease in packets when deploying fixed Mediawiki Statsd client

You can see a reduction from roughly 85kpps to 25kpps at the point of deployment – a decrease of roughly 70%!

Decrease in bytes in after Mediawiki Statsd client fix deployment

A decrease in bytes received can also be seen, even though the same number of metrics are being sent. This is due to the reduction in packet overhead, a drop of roughly 1MBps at deployment.

The little things really are great. Now to see if we can reduce that packet count even more!

by addshore at January 30, 2016 11:44 PM

Wikidata references from Microdata

Recently some articles appeared on the English Wikipedia Signpost about Wikidata (1, 2, 3). Reading these articles, especially the second and third, pushed me to try to make a dent in the ‘problem’ of references on Wikidata. It turns out that this is actually not that hard!

Script overview

I have written a script as part of my addwiki libraries and the 'aww' command line tool (still to be fully released). The main code for this specific command in its current version can be found here.

The script can be passed either a single Item ID or some SPARQL matchers as shown below:

aww wm:wd:ref --item Q464933


aww wm:wd:ref --sparql P31:Q11424 --sparql P161:?

The script will then either act on a single item if passed or perform a SPARQL query and retrieve a list of Item IDs.

Each Item is then loaded and its type is checked (using instance of) against a list of configured values, currently Q5 (human) and Q11424 (film), which are in turn mapped to the schema.org types Person and Movie. For each type there is then a further mapping of Wikidata properties to schema.org properties, for example P19 (place of birth) to 'birthPlace' for humans and P57 (director) to 'director' for films. These mappings can be used to check microdata on web pages against the data contained in Wikidata.

Microdata is collected by loading all of the external links used on all of the Wikipedia articles for the loaded Item and parsing the HTML. When all of the checks succeed and the data on Wikidata matches the microdata a reference is added.
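The mapping-and-matching step might look something like this (a hypothetical sketch; the real configuration lives in the addwiki libraries and is structured differently):

```python
# Hypothetical mapping tables in the spirit of the script's config:
# Wikidata "instance of" values -> schema.org types, and per-type
# Wikidata property -> schema.org property mappings.
SCHEMA_TYPES = {"Q5": "Person", "Q11424": "Movie"}
PROPERTY_MAP = {
    "Q5": {"P19": "birthPlace"},
    "Q11424": {"P57": "director"},
}

def matching_properties(instance_of, statements, microdata):
    """Return the Wikidata properties whose value agrees with the
    page's microdata, i.e. the claims this page could reference.

    statements: {property_id: value} from the item.
    microdata: {schema_org_property: value} parsed from the web page.
    """
    prop_map = PROPERTY_MAP.get(instance_of, {})
    return [
        pid for pid, schema_prop in prop_map.items()
        if statements.get(pid) == microdata.get(schema_prop)
    ]

matches = matching_properties(
    "Q11424",
    {"P57": "Stanley Kubrick"},
    {"director": "Stanley Kubrick", "name": "The Shining"},
)
print(matches)  # ['P57'] -- a reference could be added for this claim
```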

Example command line output

As you can see, the total number of references added for the three items shown in the example above was 55; the diffs are linked below.


Further development

  • More types: As explained above the script currently only works for people and films, but both Wikidata and schema.org cover far more data than this, so the script could likely be expanded easily in these areas.
  • More property maps: Currently there are still many properties on both schema.org and Wikidata for the enabled types that lack a mapping.
  • Better sourcing of microdata: The current approach to finding microdata is simply to load all Wikipedia external links and hope that some of them contain microdata. This is network-intensive and currently the slowest part of the script. It is possible to create custom Google search engines that match a specific schema.org type (for example, films) and search the web for pages containing microdata; however, there is no 'nice' API for search queries like this (hint hint, Google).
  • Why stop at microdata: Other standards for structured data in web pages exist, so those could also be covered.

Other thoughts

This is another step in the right direction in terms of fixing things on a large scale. This is the beauty of having machine-readable data in Wikidata and the larger web.

Being able to add references en masse has reminded me how much duplicate information the current reference system includes. For example, a single Item could have 100 statements, each referenced to the same web page; that reference data must then be included 100 times!


by addshore at January 30, 2016 11:43 PM

Mediawiki Developer Summit 2016

The Wikimedia Developer Summit is an event with an emphasis on the evolution of the MediaWiki architecture and the Wikimedia Engineering goals for 2016. Last year the event was called the MediaWiki Developer Summit.

As with last year the event took place in the Mission Bay Center, San Francisco, California. The event was slightly earlier this year, positioned at the beginning of January instead of the end. The event format changed slightly compared with the previous year and also included a 3rd day of general discussion and hacking in the WMF offices. Many thanks to everyone that helped to organise the event!

I have an extremely long list of things to do that spawned from discussions at the summit; below, as a summary, are some of the more notable scheduled discussions:

T119032 & T114320 – Code-review migration to Differential

Apparently this may mostly be complete in the next 6 months? Or at least migration will be well on the way. The Differential workflow is rather different from the one we have been forced into using with Gerrit. Personally I think the change will be a good one, and I cannot wait to be rid of git-review!

T119403 – Open meeting between the MediaWiki Stakeholders Group and the Wikimedia Foundation

There was lots of discussion during this session, although lots of things were repeated that have previously been said at other events. Toward the end of the session it was again proposed that a Mediawiki Foundation of some description might be the right way to go and it looks as if this might start moving forward in the next months / year (see the notes).

Over the past years MediaWiki core development has been rather disjointed: the WMF assigned a core team, then dissolved it, and responsibilities have since been scattered and generally unknown. Having a single organization concentrate on the software, covering use cases the WMF doesn't care about, could be a great step forward for MediaWiki.

T119022 – Content Format

The notes for this session can be found here and cover many RFCs, such as multi-content revisions, balanced templates and the general evolution of content format. Lots of super interesting things were discussed here, all pushing MediaWiki in the right direction (in my opinion).

T113210 – How should Wikimedia software support non-Wikimedia deployments of its software?

Notes can be found here. Interesting points include:

  • “Does MediaWiki need a governance structure outside of Wikimedia?” which ties in with the stakeholders discussion above and a potential Mediawiki foundation.
  • “How can we make extension compatibility work between versions?”. Over the past year or so some work has gone into this and progress is slowly being made with extension registration in Mediawiki and advances in the ExtensionDistribution extension. Still a long way to go.
  • “Should Wikimedia fork MediaWiki?”. Sounds like this could get ugly :/
  • “Do we need to stick with a LAMP stack? Could we decide that some version in the not distant future will be the last “pure PHP” implementation?”. I can see lots of the user base being lost if this were to happen..

#Source-Metadata meetup

Lots of cool stuff is going to be happening with DOIs and Wikidata! (Well more than just DOIs, but DOIs to start). Watch this space!

by addshore at January 30, 2016 11:42 PM

Jeroen De Dauw

Missing in PHP7: function references

This is the first post in my Missing in PHP7 series.

Over time, PHP has improved its capabilities with regard to functions. As of PHP 5.3 you can create anonymous functions, and as of 5.4 you can use the callable type hint. However, referencing a function still requires using a string.

call_user_func( 'globalFunctionName' );
call_user_func( 'SomeClass::staticFunction' );
call_user_func( [ $someObject, 'someMethod' ] );

Unlike in languages such as Python, which provide proper function references, tools provide no support for the PHP approach whatsoever: no autocompletion or type checking in your IDE, no warnings from static code analysis tools, no "find usages" support.
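For contrast, here is a sketch in Python (illustrative names only) of the first-class references being described:

```python
class KittenRepo:
    def fetch(self, name):
        return name.upper()

repo = KittenRepo()

# Bound-method references are first-class values: IDEs can
# autocomplete them, type-check them, and find their usages.
fetch_ref = repo.fetch
print(fetch_ref("mittens"))               # MITTENS
print(list(map(repo.fetch, ["a", "b"])))  # ['A', 'B']
```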


A common place where I run into this limitation is when I have a method that needs to return a modified version of an input array.

public function someStuff( array $input ) {
    $output = [];

    foreach ( $input as $element ) {
        $output[] = $this->somePrivateMethod( $element );
    }

    return $output;
}

In such cases array_map and similar higher order functions are much nicer than creating additional state, doing an imperative loop and a bunch of assignments.

public function someStuff( array $input ) {
    return array_map(
        [ $this, 'somePrivateMethod' ],
        $input
    );
}

I consider the benefit of tool support big enough to prefer the following code over the above:

public function someStuff( array $input ) {
    return array_map(
        function( $element ) {
            return $this->somePrivateMethod( $element );
        },
        $input
    );
}

This does make the already hugely verbose array_map even more verbose, and makes this one of those scenarios where I go "really PHP? really?" when I come across it.

Related: class references

A similar stringly-typed problem in PHP used to be creating mocks in PHPUnit, which of course is not a problem of PHP itself, though it still affects many PHP projects.

$kittenRepo = $this->getMock( 'Awesome\Software\KittenRepo' );

This causes the same types of problems as the lack of function references. If you now rename or move KittenRepo, tools will not update these string references. If you try to find usages of the class, you’ll miss this one, unless you do string search.

Luckily PHP 5.5 introduced the ::class construct, which allows doing the following:

$kittenRepo = $this->getMock( KittenRepo::class );

Here KittenRepo is imported with a use statement.

by Jeroen at January 30, 2016 10:28 PM

January 29, 2016

Liam Wyatt (Witty Lama)

Strategy and controversy, part 2

It’s been a busy time at Wikimedia Foundation HQ since my first post in this series, summarising the several simultaneous controversies and attempting to draw a coherent connecting-line between them. The most visible change is Arnnon Geshuri agreeing to vacate his appointed seat on the WMF Board of Trustees after sustained pressure; including a community-petition, several former Board members speaking out, and mainstream media attention – as summarised in The Signpost. This departure is notwithstanding the entirely unconventional act of Silicon Valley native Guy Kawasaki in voting against the petition to the Board despite the fact that he’s on the Board and that it was effectively his first public action relating to Wikimedia since receiving that appointment – as I described on Meta.

Although this news about Geshuri was well received, I feel that this controversy became the flash point because it was easily definable, and had a binary decision associated with it – go or stay. Most problems aren’t so neatly resolvable. Hopefully then, the fact that it is mostly resolved (pending the now highly sensitive task of finding his replacement) should allow focus to be drawn back to more fundamental issues of leadership.

Earlier this month The Signpost published details from the internal WMF staff survey:

We understand that there was a healthy 93% response rate among some 240 staff. While numbers approached 90% for pride in working at the WMF and confidence in line managers, the responses to four propositions may raise eyebrows:

  • Senior leadership at Wikimedia have communicated a vision that motivates me: 7% agree
  • Senior leadership at Wikimedia keep people informed about what is happening: 7% agree
  • I have confidence in senior leadership at Wikimedia: 10% agree
  • Senior leadership effectively directs resources (funding, people and effort) towards the Foundation’s goals: 10% agree

The Signpost has been informed that among the “C-levels” (members of the executive), only one has confidence in senior leadership.

A week later the head of the HR department Boryana Dineva – the person with arguably the most difficult job at the WMF right now – gave a summary of that survey in the publicly recorded monthly metrics meeting – starting at 42 minutes in:

Notice the complete absence of any mention of the part of the survey which was highlighted by the Signpost? You're not the only one. In the following Q&A came a question from Frances Hocutt, later paraphrased on-wiki by Aaron Halfaker: "Why are we not speaking clearly about the most concerning results of the engagement survey?". Starting at 56 minutes in:

It is my supposition that the extremely low confidence in senior leadership among the staff including by the “C-Levels” is directly connected to both:

  1. a lack of clarity in the organisation's strategic direction, following a long period since the previous strategy expired and several false starts (such as the 2-question survey), leading to sudden and unexplained departmental re-organisations and delays in the current process.
  2. the organisation’s recent apparent failures to abide by its own organisation Values. Notably in this case, the values of “independence”, “diversity”, and “transparency”.

Anne Clin – better known to Wikimedians as Risker – neatly tied these two threads together earlier this month in her keynote to the WMF annual all-staff meeting. In a speech entitled “Keep your eye on the Mission” she stated:

Wikimedia watchers have known for quite a while that the Foundation has decided that search and discovery should be a strategic priority. It’s not clear on what this decision has been based, although one could shoe-horn it into the mission under disseminating information effectively and globally. It wasn’t something that was fleshed out during the 2015 Strategy community consultation a year ago, and it wasn’t discussed in the Call to Action. The recent announcement about the Knight Foundation grant tells us it is for short-term funding to research and prototype improvements to how people “discover” information on Wikimedia projects. No doubt Search and Discovery, which already has a large number of talented staff affiliated with it, will show up near the top of proposed strategic priorities next week when they are announced to the community – and will be assigned a sizeable chunk of the 2016-17 budget. The results of the Knight Foundation funded research probably won’t be available early enough to use it for budgeting purposes.

This is the only picture I can find of that speech – Anne at the lectern discussing “the board” :-)

Arguably, she actually got that prediction wrong. Of 18 different approaches identified in the now-public strategic planning consultation process only one of them seems directly related to the search and discovery team’s work: “Explore ways to scale machine-generated, machine-verified and machine-assisted content“. It is also literally the last of the 18 topics listed (6 in each of reach, communities and knowledge) and is softened with the verb “explore” (rather than other items which have firmer targets to “increase”, “provide”, etc.). This quasi-hidden element of the strategy therefore invites the question – if this is such a small part of the documented strategy, why is “Discovery” receiving such disproportionate staffing, funding, attention? All of the projects listed on their portal and their three year plan are desirable and welcome, but the team is clearly staffed-up in preparation for significantly more ambitious efforts.

Anne again:

This mission statement was last revised in November 2012 – it is incorporated into the bylaws of the Wikimedia Foundation. And this revision of the mission statement occurred shortly after what many of us remember as the “narrowing focus” decision. Notice what isn’t included in the mission statement:

Not a word about the Wikimedia Foundation being a “tech and grantmaking organization”. While it is quite true that the bulk of the budget is directly linked to these two areas, the Board continues to recognize that the primary mission is dissemination of educational material, not technology or grants….

…Engineering – or as it is now called, “Product” – had three significant objectives set for it back in late 2012: develop Visual Editor, develop Mobile, and make a significant dent in the longstanding technical debt. The first two have come a long way – not without hiccups, but there’s been major progress. And there has been some work on the technical debt – HHVM being only one significant example. But the MediaWiki core is still full of crufty code, moribund and unloved extensions, and experiments that went nowhere. That’s not improving significantly; in fact, we’re seeing the technical debt start to build again as new extensions are added that lose their support when their developer changes teams or leaves the organization. Volunteer-managed extensions and tools suffer entropy when the volunteer developer moves on, and there’s no plan to effectively deprecate the software or to properly integrate and support it. There’s no obvious plan to maintain and improve the core infrastructure; instead the talk is all of new extensions, new PRODUCTS. From the outside, it looks like the Foundation is busy building detours instead of fixing the potholes in the highways.

It is my understanding that the original grant request to the Knight Foundation was MUCH larger than the $250,000 actually received. Jimmy Wales declared that concerns about the details of this grant are a “red herring” and that ousted Board member James Heilman’s concerns about transparency are “utter fucking bullshit” (causing James to announce he will soon be providing proof of his claims). Hopefully the grant agreement itself will be published soon, as Jimmy implied, so we can actually know what it is that has been promised.

It is worth noting that the “Call to action” mentioned above was part of the mid-2015 to mid-2016 Annual Plan, but that the risk assessment component of that plan was only published this week. Presumably it was written at the time but unintentionally left off the final publication. Nevertheless, it includes some rather ironic statements when read in hindsight:

Risk: Failure to create a strong, consistent values-based work culture could cause valued staff to leave.

Mitigation strategies:

  • Establish initiatives that support our commitment to diversity and creating spaces for constructive, direct and honest communications.
  • Communicate and listen effectively with staff on values and initiatives undertaken.

Significantly, the WMF’s Statement of Purpose, as described in its own bylaws, states that it will perform its mission “In coordination with a network of individual volunteers and our independent movement organizations, including recognized Chapters, Thematic Organizations, User Groups, and Partners”. This corresponds to the last of the official organisational Values: “Our community is our biggest asset”. At its meeting this weekend, the Board will have to determine whether the current executive leadership can demonstrate adherence to these avowed values – particularly coordination and transparency of its vision to the community and the staff – and is fit to deliver this latest strategy process.

[The first post in this Montgomerology* series “Strategy and Controversy” was published on January 8.]

Edit: Within the hour of publishing this blogpost, and one day before the board meeting, a “background on the Knowledge Engine grant” has now been published on Lila’s talkpage.

*Montgomerology: The pseudo-science of interpreting the meaning of signals emanating from the WMF headquarters on New Montgomery St., San Francisco. cf. Vaticanology or Kremlinology.

by wittylama at January 29, 2016 11:07 PM

Wikimedia Foundation

Wikimedia Research Newsletter, January 2016

Wikimedia Research Newsletter

Vol: 6 • Issue: 01 • January 2016 [contribute] [archives]

Bursty edits; how politics beat religion but then lost to sports; notability as a glass ceiling

With contributions by: Brian Keegan, Piotr Konieczny, and Tilman Bayer

Burstiness in Wikipedia editing

Reviewed by Brian Keegan

Wikipedia pages are edited with varying levels of consistency: stubs may only have a dozen or fewer revisions and controversial topics might have more than 10,000 revisions. However, this editing activity is not evenly spaced out over time either: some revisions occur in very quick succession while other revisions might persist for weeks or months before another change is made. Many social and technical systems exhibit “bursty” qualities of intensive activity separated by long periods of inactivity. In a pre-print submitted to arXiv, a team of physicists at the Belgian Université de Namur and Portuguese University of Coimbra examine this phenomenon of “burstiness” in editing activity on the English Wikipedia.[1]

The authors use a database dump containing the revision history of 4.6 million English Wikipedia pages up to January 2010. Filtering out bots, edits from unregistered accounts, and pages and editors with fewer than 2,000 revisions, the paper adopts previously-defined measures of burstiness and cyclicality in these editing patterns. The burstiness and memory values of editors’ revisions fall outside the limits found in prior work on human dynamics, suggesting that different mechanisms are at work in Wikipedia editing than in, for example, mobile phone communication.
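
The burstiness and memory coefficients referenced here are standard measures from the human-dynamics literature (Goh and Barabási); the following is a minimal sketch of how they could be computed over a list of revision timestamps, not the authors’ actual code:

```python
import statistics

def burstiness(timestamps):
    """Goh-Barabasi burstiness B = (sigma - mu) / (sigma + mu) of inter-event times.
    B -> -1 for perfectly regular events, ~0 for a Poisson process, -> +1 for bursty ones."""
    gaps = [t2 - t1 for t1, t2 in zip(timestamps, timestamps[1:])]
    mu = statistics.mean(gaps)
    sigma = statistics.pstdev(gaps)
    return (sigma - mu) / (sigma + mu)

def memory(timestamps):
    """Memory coefficient M: Pearson correlation between consecutive inter-event times."""
    gaps = [t2 - t1 for t1, t2 in zip(timestamps, timestamps[1:])]
    pairs = list(zip(gaps, gaps[1:]))
    m1 = statistics.mean(a for a, _ in pairs)
    m2 = statistics.mean(b for _, b in pairs)
    s1 = statistics.pstdev([a for a, _ in pairs])
    s2 = statistics.pstdev([b for _, b in pairs])
    cov = sum((a - m1) * (b - m2) for a, b in pairs) / len(pairs)
    return cov / (s1 * s2)

# Perfectly regular revisions (one every 10 time units): B = -1.0
print(burstiness(list(range(0, 100, 10))))  # -1.0
# A burst of quick edits followed by a long gap: B > 0
print(burstiness([0, 1, 2, 3, 100]))
```

Plotting these two coefficients against the ranges reported for other human activities (as the paper does) is what reveals that Wikipedia editing falls outside previously observed limits.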

Using a fast Fourier transform, the paper finds that the 100 most active editors have signals at a 24-hour frequency (and associated harmonics), indicating they follow a circadian pattern of daily revising, with differences by day of week and hour of day. However, the 100 most-revised pages lack a similar peak in the power spectrum: there is no characteristic hourly, daily, weekly, etc. revision pattern. Despite these circadian patterns, editors’ revision histories still show bursty patterns with long-tailed inter-event times across different time windows.
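
The circadian detection described above can be illustrated with a power spectrum over hourly edit counts; this is a synthetic sketch (invented data, not the paper’s pipeline):

```python
import numpy as np

# Synthetic hourly edit counts for 60 days: a daily (24 h) activity cycle plus noise.
rng = np.random.default_rng(0)
hours = np.arange(24 * 60)
counts = 5 + 3 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.5, hours.size)

# Power spectrum of the mean-removed signal; frequencies are in cycles per hour.
power = np.abs(np.fft.rfft(counts - counts.mean())) ** 2
freqs = np.fft.rfftfreq(hours.size, d=1.0)  # d = 1 hour per sample

# The dominant peak sits at 1/24 cycles per hour, i.e. a 24-hour period,
# which is the circadian signature the paper reports for active editors.
peak_period_hours = 1.0 / freqs[np.argmax(power)]
print(round(peak_period_hours))  # 24
```

For a page edited by many editors in different time zones, this peak would be washed out, matching the paper’s finding that the most-revised pages show no characteristic frequency.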

The paper concludes by arguing, “before performing an action, we must overcome a “barrier”, acting as a cost, which depends, among many other things, on the time of day. However, once that “barrier” has been crossed, the time taken by that activity no longer depends on the time of day at which we decided to perform it. … It could be related to some sort of queuing process, but we prefer to see it as due to resource allocation (attention, time, energy), which exhibits a broad distribution: shorter activities are more likely to be executed next than the longer ones.”

Emerging trends based on Wikipedia traffic data and contextual networks

Reviewed by Brian Keegan

Google Trends is widely used in academic research to model the relationship between information seeking and other social and behavioral phenomena. Wikipedia pageview data can provide a superior – if underused – alternative that has attracted some attention for public health and economic modeling, but not to the same extent as Google Trends. The authors cite the relative openness of Wikipedia pageview data, its semantic disambiguation, and its absolute counts of activity, in contrast to Google Trends’ closed API, the semantic ambiguity of keywords, and relative query-share data. However, Trends data (at a weekly level) goes back to 2004, while pageview data (at an hourly level) is only available from 2008.

In a peer-reviewed paper published by PLoS ONE, a team of physicists perform a variety of time series analyses to evaluate changes in attention around the “big data” topic of Hadoop.[2] Defining two key constructs of relevance and representation based on the interlanguage links as well as hyperlinks to/from other concepts, they examine changes in these features over time. In particular, changes in the articles’ content and attention occurred in concert with the release of new versions and the adoption of the technology by new firms.

The time series analyses (and the terms used to refer to them) will be difficult for non-statisticians to follow, but the paper makes several promising contributions. First, it provides a number of good critiques of research relying exclusively on Google Trends data (outlined above). Second, it provides methods for incorporating behavioral data from strongly related topics and examining changes over time in a principled manner. Third, the paper examines behavior across multiple language editions rather than focusing solely on the English Wikipedia. The paper points to ways in which Wikipedia is an important information source for tracking the publication and recognition of new topics.

“Hidden revolution of human priorities: An analysis of biographical data from Wikipedia”

Reviewed by Piotr Konieczny

This paper[3] data-mines Wikipedia’s biographies, focusing on individuals’ longevity, profession, and cause of death. The authors are not the first to observe that the majority of Wikipedia biographies are about sportspeople (half of them soccer players), followed by artists and politicians. But they do make some interesting historical observations, such as that sport rises only in the 20th century (particularly from the 1990s), and that politics surpassed religion in the 13th century, only to be surpassed in turn by sport. The authors divide the biographies into public (politicians, businessmen, religion) and private (artists and sportspeople) and note that it was only in the last few decades that the second group started to significantly outnumber the first; they conclude that this represents a major shift in societal values, which they refer to as a “hidden revolution in human priorities”. It is an interesting argument, though the paper unfortunately omits discussion of some important topics, such as the possible bias introduced by Wikipedia’s notability policies.

“Women through the glass-ceiling: gender asymmetries in Wikipedia”

Reviewed by Piotr Konieczny

This paper[4] looks into gender inequalities in Wikipedia articles, presenting a computational method for assessing gender bias in Wikipedia along several dimensions. It touches on a number of interesting questions, such as whether the same rules are used to determine whether women and men are notable; whether there is linguistic bias; and whether articles about men and women have similar structural properties (e.g., similar metadata and network properties in the hyperlink network).

They conclude that notability guidelines seem to be more strictly enforced for women than for men; that linguistic bias exists (e.g., one of the four words most strongly associated with female biographies is “husband”, whereas such family-oriented words are much less likely to be found in biographies of male subjects); and that, since the majority of biographies are about men and men tend to link more to men than to women, the visibility of female biographies is lowered (for example, in search engines like Google). The authors suggest that the Wikipedia community should consider lowering notability requirements for women (controversial) and adding gender-neutral language requirements to the Manual of Style (a much more sensible proposal).


Wikipedia influences medical decisionmaking in acute and critical care

Reviewed by Tilman Bayer

A survey[5] of 372 anesthetists and critical care providers in Austria and Australia found that “In order to get a fast overview about a medical problem, physicians would prefer Google (32%) over Wikipedia (19%), UpToDate (18%), or PubMed (17%). 39% would, at least sometimes, base their medical decisions on non peer-reviewed resources. Wikipedia is used often or sometimes by 77% of the interns, 74% of residents, and 65% of consultants to get a fast overview of a medical problem. Consulting Wikipedia or Google first in order to get more information about the pathophysiology, drug dosage, or diagnostic options in a rare medical condition was the choice of 66%, 10% or 34%, respectively.” (A 2012 literature review found that “Wikipedia is widely used as a reference tool” among clinicians.)

Other recent publications

A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.

Papers about medical content on Wikipedia and its usage

  • “How do Twitter, Wikipedia, and Harrison’s Principles of Medicine describe heart attacks?”[6] From the abstract: “For heart attacks, the chapters from Harrison’s had higher Jaccard similarity to Wikipedia than Braunwald’s or Twitter. For palpitations, no pair of sources had a higher Jaccard (token) similarity than any other pair. For no source was the Jaccard (token) similarity attributable to semantic similarity. This suggests that technical and popular sources of medical information focus on different aspects of medicine, rather than one describing a simplified version of the other.”
  • “Information-seeking behaviour for epilepsy: an infodemiological study of searches for Wikipedia articles”[7] From the abstract: “Fears and worries about epileptic seizures, their impact on driving and employment, and news about celebrities with epilepsy might be major determinants in searching Wikipedia for information.”
  • “Wikipedia and neurological disorders”[8] From the abstract: “We determined the highest search volume peaks to identify possible relation with online news headlines. No relation between incidence or prevalence of neurological disorders and the search volume for the related articles was found. Seven out of 10 neurological conditions showed relations in search volume peaks and news headlines. Six of these seven peaks were related to news about famous people suffering from neurological disorders, especially those from showbusiness. Identification of discrepancies between disease burden and health seeking behavior on Wikipedia is useful in the planning of public health campaigns. Celebrities who publicly announce their neurological diagnosis might effectively promote awareness programs, increase public knowledge and reduce stigma related to diagnoses of neurological disorders.”
  • “Medical student preferences for self-directed study resources in gross anatomy”[9] From the abstract: “To gain insight into preclinical versus clinical medical students’ preferences for SDS resources for learning gross anatomy, […] students were surveyed at two Australian medical schools, one undergraduate-entry and the other graduate-entry. Lecture/tutorial/practical notes were ranked first by 33% of 156 respondents (mean rank ± SD, 2.48 ± 1.38), textbooks by 26% (2.62 ± 1.35), atlases 20% (2.80 ± 1.44), videos 10% (4.34 ± 1.68), software 5% (4.78 ± 1.50), and websites 4% (4.24 ± 1.34). Among CAL resources, Wikipedia was ranked highest.”
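
The Jaccard (token) similarity used in the heart-attack comparison above is straightforward to compute; here is a minimal sketch assuming simple lowercase whitespace tokenization (the paper’s exact preprocessing is not described here):

```python
def jaccard_token_similarity(text_a, text_b):
    """Jaccard similarity |A & B| / |A | B| over lowercase word-token sets."""
    a = set(text_a.lower().split())
    b = set(text_b.lower().split())
    return len(a & b) / len(a | b)

# Two hypothetical symptom descriptions sharing the tokens "chest" and "pain":
print(jaccard_token_similarity("chest pain and shortness of breath",
                               "crushing chest pain radiating to the arm"))  # 2/11 ≈ 0.18
```

Because the measure operates on token sets, two texts can score high while differing completely in emphasis, which is why the paper separately checks whether token overlap is attributable to semantic similarity.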

Papers analyzing community processes and policies

  • “Transparency, control, and content generation on Wikipedia: editorial strategies and technical affordances”[10] From the abstract: “Even though the process of social production that undergirds Wikipedia is rife with conflict, power struggles, revert wars, content transactions, and coordination efforts, not to mention vandalism, the article pages on Wikipedia shun information gauges that highlight the social nature of the contributions. Rather, they are characterized by a “less is more” ideology of design, which aims to maximize readability and to encourage future contributions. … Closer investigation reveals that the deceivingly simple nature of the interface is in fact a method to attract new collaborators and to establish content credibility. As Wikipedia has matured, its public notoriety demands a new approach to the manner in which Wikipedia reflects the rather complex process of authorship on its content pages. This chapter discusses a number of visualizations designed to support this goal, and discusses why they have not as yet been adopted into the Wikipedia interface.”
  • “Policies for the production of content in Wikipedia, the free encyclopedia”[11] From the abstract: “It is a case study with qualitative approach that had Laurence Bardin‘s content analysis as theoretical and methodological reference.”
  • “Validity claims of information in face of authority of the argument on Wikipedia”[12] From the abstract: “proposes to approach the claims of validity made by Jürgen Habermas in the face of the authority of the better argument. It points out that Wikipedia is built as an emancipatory discourse according to Habermas’ argumentative discourse considering the process of discursive validation of information.”
  • “Wikipedia and history: a worthwhile partnership in the digital era?”[13]
  • “Is Wikipedia really neutral? A sentiment perspective study of war-related Wikipedia articles since 1945”[14] From the abstract: “The results obtained so far show that reasons such as people’s feelings of involvement and empathy can lead to sentiment expression differences across multilingual Wikipedia on war-related topics; the more people contribute to an article on a war-related topic, the more extreme sentiment the article will express; different cultures also focus on different concepts about the same war and present different sentiments towards them.”
  • “The heart work of Wikipedia: gendered, emotional labor in the world’s largest online encyclopedia”[15] (CHI 2015 Best Papers award, slides)
  • “Knowledge quality of collaborative editing in Wikipedia: an integrative perspective of social capital and team conflict”[16] From the abstract: “Despite the abundant researches on Wikipedia, to the best of our knowledge, no one has considered the integration of social capital and conflict. […] our study proposes the nonlinear relationship between task conflict and knowledge quality instead of linear relationships in prior studies. We also postulate the moderating effect of task complexity. […] This paper aims at proposing a theoretical model to examine the effect of social capital and conflict, meanwhile taking the task complexity into account.”

Papers about visualizing or mining Wikipedia content

  • “Visualizing Wikipedia article and user networks: extracting knowledge structures using NodeXL[17]
  • “Utilising Wikipedia for text mining applications”[18] From the abstract: “Wikipedia … has proven to be one of the most valuable resources in dealing with various problems in the domain of text mining. However, previous Wikipedia-based research efforts have not taken both Wikipedia categories and Wikipedia articles together as a source of information. This thesis serves as a first step in eliminating this gap and throughout the contributions made in this thesis, we have shown the effectiveness of Wikipedia category-article structure for various text mining tasks. … First, we show the effectiveness of exploiting Wikipedia for two classification tasks i.e., 1- classifying the tweets being relevant/irrelevant to an entity or brand, 2- classifying the tweets into different topical dimensions such as tweets related with workplace, innovation, etc. To do so, we define the notion of relatedness between the text in tweet and the information embedded within the Wikipedia category-article structure.”
  • “Integrated parallel sentence and fragment extraction from comparable corpora: a case study on Chinese-Japanese Wikipedia”[19] From the abstract: “A case study on the Chinese–Japanese Wikipedia indicates that our proposed methods outperform previously proposed methods, and the parallel data extracted by our system significantly improves SMT [statistical machine translation] performance.”
  • “How structure shapes dynamics: knowledge development in Wikipedia – a network multilevel modeling approach”[20] From the abstract: “The data consists of the articles in two adjacent knowledge domains: psychology and education. We analyze the development of networks of knowledge consisting of interlinked articles at seven snapshots from 2006 to 2012 with an interval of one year between them. Longitudinal data on the topological position of each article in the networks is used to model the appearance of new knowledge over time. […] Using multilevel modeling as well as eigenvector and betweenness measures, we explain the significance of pivotal articles that are either central within one of the knowledge domains or boundary-crossing between the two domains at a given point in time for the future development of new knowledge in the knowledge base.” (cf. earlier paper coauthored by the same researchers: “Knowledge Construction in Wikipedia: A Systemic-Constructivist Analysis”)


  1. Gandica, Yerali; Carvalho, Joao; Aidos, Fernando Sampaio Dos; Lambiotte, Renaud; Carletti, Timoteo (2016-01-05). “On the origin of burstiness in human behavior: The wikipedia edits case”. arXiv:1601.00864 [physics]. 
  2. Kämpf, Mirko; Tessenow, Eric; Kenett, Dror Y.; Kantelhardt, Jan W. (2015-12-31). “The Detection of Emerging Trends Using Wikipedia Traffic Data and Context Networks”. PLoS ONE 10 (12): e0141892. doi:10.1371/journal.pone.0141892. 
  3. Reznik, Ilia; Shatalov, Vladimir (February 2016). “Hidden revolution of human priorities: An analysis of biographical data from Wikipedia”. Journal of Informetrics 10 (1): 124–131. doi:10.1016/j.joi.2015.12.002. ISSN 1751-1577.  Closed access
  4. Wagner, Claudia; Graells-Garrido, Eduardo; Garcia, David (2016-01-19). “Women Through the Glass-Ceiling: Gender Asymmetries in Wikipedia”. arXiv:1601.04890 [cs]. Jupyter notebooks
  5. Rössler, B.; Holldack, H.; Schebesta, K. (2015-10-01). “Influence of wikipedia and other web resources on acute and critical care decisions. a web-based survey”. Intensive Care Medicine Experimental 3 (Suppl 1): –867. doi:10.1186/2197-425X-3-S1-A867. ISSN 2197-425X.  (Poster presentation)
  6. Devraj, Nikhil; Chary, Michael (2015). “How Do Twitter, Wikipedia, and Harrison’s Principles of Medicine Describe Heart Attacks?”. Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics. BCB ’15. New York, NY, USA: ACM. pp. 610–614. doi:10.1145/2808719.2812591. ISBN 978-1-4503-3853-0. 
  7. Brigo F, Otte WM, Igwe SC, Ausserer H, Nardone R, Tezzon F, Trinka E. Information-seeking behaviour for epilepsy: an infodemiological study of searches for Wikipedia articles. Epileptic Disorders, 2015 Dec 1;17(4):460 DOI:10.1684/epd.2015.0772 Closed access
  8. Brigo, Francesco; Igwe, Stanley C.; Nardone, Raffaele; Lochner, Piergiorgio; Tezzon, Frediano; Otte, Willem M. (July 2015). “Wikipedia and neurological disorders”. Journal of Clinical Neuroscience: Official Journal of the Neurosurgical Society of Australasia 22 (7): 1170–1172. doi:10.1016/j.jocn.2015.02.006. ISSN 1532-2653. PMID 25890773. 
  9. Choi-Lundberg, Derek L.; Low, Tze Feng; Patman, Phillip; Turner, Paul; Sinha, Sankar N. (2015-05-01). “Medical student preferences for self-directed study resources in gross anatomy”. Anatomical Sciences Education: n/a. doi:10.1002/ase.1549. ISSN 1935-9780.  Closed access
  10. Matei, Sorin Adam; Foote, Jeremy (2015). “Transparency, Control, and Content Generation on Wikipedia: Editorial Strategies and Technical Affordances”. In Sorin Adam Matei, Martha G. Russell, Elisa Bertino (eds.). Transparency in Social Media. Computational Social Sciences. Springer International Publishing. pp. 239–253. ISBN 978-3-319-18551-4.  Closed access
  11. Sandrine Cristina de Figueirêdo Braz, Edivanio Duarte de Souza: Políticas para produção de conteúdos na Wikipédia, a enciclopédia livre (Policies For The Production Of Contents In The Wikipedia, The Free Encyclopedia). In: ENCONTRO NACIONAL DE PESQUISA EM CIÊNCIA DA INFORMAÇÃO, 15., 2014, Belo Horizonte. Anais … Belo Horizonte: UFMG, 2014. PDF (in Portuguese, with English abstract)
  12. Marcio Gonçalves, Clóvis Montenegro de Lima: Pretensões de validade da informação diante da autoridade do argumento na wikipédia (Validity claims of information in face of authority of the argument on wikipedia). In: ENCONTRO NACIONAL DE PESQUISA EM CIÊNCIA DA INFORMAÇÃO, 15., 2014, Belo Horizonte. Anais … Belo Horizonte: UFMG, 2014. PDF (in Portuguese, with English abstract)
  13. Phillips, Murray G. (2015-10-07). “Wikipedia and history: a worthwhile partnership in the digital era?”. Rethinking History 0 (0): 1–21. doi:10.1080/13642529.2015.1091566. ISSN 1364-2529.  Closed access
  14. Yiwei Zhou, Alexandra I. Cristea and Zachary Roberts: Is Wikipedia really neutral? A sentiment perspective study of war-related Wikipedia articles since 1945. 29th Pacific Asia Conference on Language, Information and Computation pages 160–68. Shanghai, China, October 30 – November 1, 2015 PDF
  15. Menking, Amanda; Erickson, Ingrid (2015). “The heart work of Wikipedia: gendered, emotional labor in the world’s largest online encyclopedia”. Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. CHI ’15. New York, NY, USA: ACM. pp. 207–210. doi:10.1145/2702123.2702514. ISBN 978-1-4503-3145-6.  Closed access , also as draft version on Wikimedia Commons
  16. Zhan, Liuhan; Wang, Nan; Shen, Xiao-Liang; Sun, Yongqiang (2015-01-01). “Knowledge quality of collaborative editing in Wikipedia: an integrative perspective of social capital and team conflict”. PACIS 2015 Proceedings. 
  17. Shalin Hai-Jew (Kansas State University, US): Visualizing Wikipedia article and user networks: extracting knowledge structures using NodeXL. In: Developing Successful Strategies for Global Policies and Cyber Transparency in E-Learning. DOI:10.4018/978-1-4666-8844-5.ch005 Closed access
  18. Qureshi, Muhammad Atif (2015-10-08). “Utilising Wikipedia for text mining applications”.  (PhD thesis, U Galway)
  19. Chu, Chenhui; Nakazawa, Toshiaki; Kurohashi, Sadao (December 2015). “Integrated parallel sentence and fragment extraction from comparable corpora: a case study on Chinese-Japanese Wikipedia”. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 15 (2). doi:10.1145/2833089. ISSN 2375-4699.  Closed access
  20. Halatchliyski, Iassen; Cress, Ulrike (2014-11-03). “How structure shapes dynamics: knowledge development in Wikipedia – a network multilevel modeling approach”. PLoS ONE 9 (11): e111958. doi:10.1371/journal.pone.0111958. 

This newsletter is brought to you by the Wikimedia Research Committee and The Signpost

by Tilman Bayer at January 29, 2016 09:07 PM

Wiki Education Foundation

Partnering with the Society for Marine Mammalogy

Educational Partnerships Manager, Jami Mathewson

Wikipedia recently celebrated its 15th birthday, and there’s so much content left to expand and improve. That’s why the Wiki Education Foundation started the Wikipedia Year of Science—to encourage more scientists to join our programs and contribute knowledge to Wikipedia. I’m happy to announce that Wiki Ed has partnered with the Society for Marine Mammalogy (SMM) for just that reason. Expert scientists can improve Wikipedia’s coverage of marine mammal science by assigning their students to edit Wikipedia, or sponsoring a Visiting Scholar.

Read with porpoise

Last month, Outreach Manager Samantha Erickson and I joined Shane Gero of the Society for Marine Mammalogy at their annual meeting in San Francisco. We spent four hours with attendees, discussing Wikipedia’s culture and ideology, highlighting best practices for using Wikipedia in the classroom, and showcasing Wiki Ed’s Dashboard and other tools.

We also experimented with an idea for expert engagement, asking attendees to assess Wikipedia’s existing coverage in their area of study. We used Monika Sengul-Jones’ learning patterns from her experience coordinating what we call a content gap analysis, essentially a needs assessment for Wikipedia content. What’s missing? What could be improved? The marine mammalogists dived into Wikipedia’s content in search of missing sections, missing sources, and missing articles; but also inaccurate information, an imbalance in content compared to the underlying scholarship, and opportunities to turn an article subsection into its own encyclopedic entry.

To guide these scientists, we asked them to identify a gap in marine mammal science related to their research, studies, and expertise (e.g., Are they a leading expert in porpoises? Cetaceans? Marine conservation?) and consider:

  1. Is there content in this article that doesn’t belong there?
  2. Is one research method, point of view, or side of an issue represented in an imbalanced way compared to the academic literature of the topic? Is there evidence of bias in the article?
  3. How does the topic relate to existing content on Wikipedia? What other topics ought to link to it, but don’t?
  4. What are the key sources someone would use to write this content? Be as specific as possible—you may even want to add a bibliography.
  5. Optionally, we asked if there were images on Wikimedia Commons that should be in the article. Were there any other media related to this topic that are not yet on Wikimedia Commons but might improve this article?
  6. Again, as an option, we asked if they would draft a paragraph-long overview of the missing content, intended as inspiration for the lead section of the article.
  7. Finally, we asked: what kind of university or college course studies this topic?

This final question can help Wiki Ed match notes from experts to students participating in the Classroom Program as we grow our partnership.

Cetacean needed

Several marine mammal articles are already high quality, including Featured Articles about killer whales, pinnipeds, and sea otters. SMM conference attendees have identified the following areas for improvement, which we’ll encourage Classroom Program students and Visiting Scholars to use for direction in their editing:

  • Marine mammal monitoring: observation practices and protocols, available technology, and legislative requirements for monitoring marine mammal behavior while mitigating the impact on their ecosystems
  • Aerobic dive limit: an unrepresented concept with nearly 1,000 results on Google Scholar alone
  • Marine mammal health: most of Wikipedia’s coverage of threats to marine mammal health revolves around pollution, yet there is little information about disease and harmful algal blooms
  • Human impacts on marine mammal ecosystems: noise pollution, boat speed, etc., and guidelines experts recommend to reduce negative impacts
  • Cetacean intelligence: limited information is available about the cognitive capacity of bottlenose dolphins, though the existing research confirms this is a notable topic

You otter join Wiki Ed’s programs

We’re excited about this partnership with the Society for Marine Mammalogy, and its potential to positively contribute to Wikipedia. Marine mammalogists and SMM members can join the Year of Science in the following ways:

  • Assign students to edit Wikipedia. This is a proven way to amplify impact and improve many articles within the field.
  • Sponsor a Visiting Scholar. This is one of the best ways to effect high-quality changes in articles, as experienced Wikipedia editors can help bring articles up to Good Article or Featured Article status.
  • Review articles within your expertise, and add your comments, suggested sources, and guidelines to the content gap page on Wikipedia.
  • Join the Marine Mammal WikiSprint, a virtual edit-a-thon held regularly, including this week!

Photo: “Sea otters holding hands” by Joe Robertson from Austin, Texas, USA. Licensed under CC BY 2.0 via Wikimedia Commons.

by Jami Mathewson at January 29, 2016 06:16 PM

Wikimedia Foundation

WikipediansSpeak: Odia Wikisourcer shares her journey and goals

Odia Wikisourcer Pankajmala Sarangi shares her experience and future plans to grow the community. Video by Pankajmala Sarangi (original video) and Subhashish Panigrahi (post production), freely licensed under CC BY-SA 4.0 license.

The most active editor on the Odia Wikisource is Pankajmala Sarangi, a native of Odisha who now lives in New Delhi, where she works at a non-profit. As a leading contributor in a community dominated by men, she has a distinctive perspective, so we asked her to share her journey and her goals for growing the project and community as part of the “WikipediansSpeak” interview series.

What is the community like on the Odia Wikisource?

Pankajmala feels that the community is like her home. “I can’t tell you how happy I am to see that this one-year-old project has already digitized over 200 books. With more and more young people coming online, the internet won’t disappoint them when they type and search in the Odia language.”

What are projects that you would like to start or get help from the existing community to grow?

  • Forming expert/resource groups to strengthen the thematic group structure in the community, so that each group can work collaboratively toward specific goals.
  • We can also create groups with the help of resident welfare associations in Odisha cities where Odia WikiTungis (informal, city-based groups in the Odia Wikimedia community that actively organize outreach and engage new Wikimedians) are already working. They can work hand-in-hand, which will help us expand this program to new places.
  • We can partner with basic computer training institutes so that their students, and new Wikimedians who do not have access to a computer or the internet, can learn Odia Wikipedia editing as vocational training. These institutes sit idle during the day and only get busy after 4 pm, when school and college students come to learn computer basics after their class hours.
  • Another idea is involving retirees, whose expertise could help improve article quality but otherwise goes unused after retirement. Post-retirement life can be lonely, and many who feel they have little to offer could enjoy the company of new friends. Senior citizens’ groups could train new Wikimedians using these institutional facilities.
  • Summer vacation Wikipedia outreach for school/college students:
    • It has become mandatory in all private schools and colleges for students to do voluntary work for a few hours every day for six months to finish a program. We can ask these private institutions to include editing and contributing to Odia Wikipedia and other Odia Wikimedia projects in their syllabus. They would get Wikipedians as facilitators at no cost, and their students would become part of a global, multilingual community. We can involve students both in editing Wikipedia articles and in digitizing texts and correcting typos and other mistakes on Odia Wikisource. A manual covering these details would be helpful to refer to while working. Also, when we are discussing something in our community, eligible users should automatically get a message saying their suggestions or input are required, with a link to the relevant page.

Overall statistics for the Odia Wikisource as of December 2015. Infographic by Subhashish Panigrahi, freely licensed under CC BY-SA 4.0.

According to a 2011 survey, only about nine percent of Wikipedia editors are female. The corresponding figure for Wikisource is not yet known, but I would guess it is similar. How do you think we could bridge this gap in Odia?

We surely have fewer women. We could reorient our current work to include a few other approaches: more focused outreach at women’s colleges and schools; creating a network of women interested in contributing to Wikimedia projects; making Twitter lists and Facebook groups for women to enable friendlier conversation and support; and inviting and involving more women in Wikimedia outreach. I also wonder whether we could give the top contributors small gifts as a token of appreciation. We could also organize field trips for them to a public library, museum, or art gallery, so that they can see how Wikimedia projects grow by drawing on available resources.

What are your personal plans to build a community for Odia Wikisource in New Delhi?

Well, I think I would work on creating a database of all the Odia speakers living in New Delhi and the city organizations that work in propagating Odia language and culture, and plan Wikisource outreach programs for them.

Subhashish Panigrahi, Wikimedian and Programme Officer, Access to Knowledge (CIS-A2K), Centre for Internet and Society
Nasim Ali, Odia and English Wikimedian

This post is part of the WikipediansSpeak series, which aims to chronicle the voices of the Wikipedia community. You can find more of these posts on the Wikimedia Commons.

by Subhashish Panigrahi and Nasim Ali at January 29, 2016 01:07 PM

Content Translation tool has now been used for 50,000 articles

Content Translation session at Wikimania 2015. Photo by Amire80, freely licensed under CC BY-SA 4.0.

Last year around this time, we announced the arrival of a new tool that evolved out of an experiment aimed at making the editing process easier for our users. The tool in question—Content Translation—was initially enabled for eight languages: Catalan, Danish, Esperanto, Indonesian, Malay, Norwegian Bokmål, Spanish, and Portuguese. Today, 12 months later, this article-creation tool has been used by more than 11,000 editors across 289 Wikipedias to create more than 50,000 new articles.

Content Translation introduced a simple way to create Wikipedia articles through translation. Many editors have used this method for years to enrich content in Wikipedias where creating high-quality articles has been an uphill struggle for a variety of reasons. However, translating a Wikipedia article used to involve several cumbersome steps, such as copying content across multiple browser tabs and manually adapting links and references. Content Translation abstracts away these steps and provides a neat interface that is easy to use and offers a much faster way to create a new article.

Content Translation is a beta feature. As part of the beta program, it is available for all logged-in users on 289 Wikipedias to try and provide us with their feedback.

Progress during the year

Over the last year, we have regularly documented the tool’s progress and how it was being adopted. Feedback from users of Content Translation, gathered through many interactions, helped us identify which features had been helpful and which were lacking and needed more attention. We also relied heavily on trends in the statistics captured every day. For instance, in the early days we found that many users were unaware the tool existed. To make it easier to discover, we surfaced several access points where the tool might be needed, including the contributions page, the list of interwiki languages on an article, and other easily reached spots. Around mid-2015, we found that many users had stopped using the tool after one or two tries. In conversations, users cited several reasons: lack of machine translation support for their language, technical difficulties with some features, the effort needed to find articles worth translating, and so on. As a result, we focused on two key aspects:

  1. continued engagement with our returning users, and
  2. increased reliability and stability of the tool.

While working on Content Translation, we also made simultaneous improvements to the Statistics page. This page displays the weekly and total figures for articles translated and deleted, as well as information about the active languages. The statistics page (Special:ContentTranslationStats) is available on all wikis where the Content Translation extension exists. It surfaces several interesting facts. For instance:

  • 64% of all articles have been translated from the English Wikipedia. Spanish is the second most popular choice (12%).
  • More than 1,000 new articles have been created in each of 15 languages; the Catalan and Spanish Wikipedias have each received more than 6,000.
  • The Spanish Wikipedia has the highest number of individual translators using Content Translation (more than 2,000).
  • The highest number of articles created in a single week is 1,968. Over 1,900 articles are now created with Content Translation every week—up from about 1,000 per week in August 2015, the first month it was enabled in all languages.
  • Weekly deletion rates have been between 6% and 8% of the articles created.

Beyond this regular set of data, we have occasionally observed interesting trends tied to specific events. For example, when a machine translation system was enabled on the Russian Wikipedia in early November, the weekly article translation numbers doubled and have continued to grow.

Comparison between articles created in Content Translation with and without the suggestions feature. Image by Runa Bhattacharjee, public domain/CC0.

Engagement and Stability

One of the major outcomes of recent months is the addition of the ‘Suggestions’ feature. Instead of searching for something to do, users can view a list of articles that they can translate. This is an ongoing collaboration between the Language and Research teams at the Wikimedia Foundation. Users are shown a list of articles on topics determined by factors such as their past translations and popular topics in their language. Additionally, topic-based targeted campaigns with predetermined article lists have been introduced. The first of these was proposed by the Medical Translation Project and completed for a set of articles translated from English to Persian. A month after this feature was introduced, we found that suggestions had been used to start about 16% of the translations.

In terms of stability, increased usage of the tool has surfaced technical challenges that need further attention. These include better handling of translation saving and publishing errors, reducing wikitext errors in published articles, and uninterrupted service uptime through better monitoring of services. For the development team, constant interaction with users of Content Translation has been a valuable source of information about the tool’s performance and its shortcomings.

Coming up next

The main focus at the moment continues to be improving the wikitext sanity of the published content, reducing publishing and saving errors, and an overall improvement in stability of the article translation workflow.

Besides this, we will continue improving a feature that is an important aspect of this project. Content Translation uses third-party machine translation systems for several languages. To benefit the wider machine translation development community, we recently completed the initial development of the parallel corpora API, which provides easy access to the human-modified translations. This is an open repository of examples of translated content and the corrections users made to it. It will be a valuable resource for improving quality and language coverage in new and existing machine translation systems.

We would like to sincerely thank everyone for comments, feedback, encouragement and wholehearted participation that provided direction to this project. We look forward to many new things in the next 12 months.

You can share your comments and feedback about the Content Translation tool with the Wikimedia Language team at the project talk page. You can also follow us on Twitter (@whattotranslate) for updates and other news.

Runa Bhattacharjee, Language team (Editing)
Wikimedia Foundation

by Runa Bhattacharjee at January 29, 2016 06:43 AM

Pete Forsyth, Wiki Strategies

Grants and transparency: Wikimedia Foundation should follow standards it sets

Former Wikimedia ED Sue Gardner (right) championed strong views about restricted grants and transparency. Have those values survived into the era of Lila Tretikov (left)? Photo by Victor Grigas, licensed CC BY-SA

I wrote and edited a number of grant proposals and reports on behalf of the Wikimedia Foundation (WMF) from 2009 to 2011. In that role, I took part in many staff discussions about restricted grants and about transparency in the grant process. I was continually impressed by the dedication to transparency and the alignment with mission and strategy.

As of 2015, however, the four people most strongly associated with those efforts at WMF have all left the organization; and I am concerned that the diligence and dedication I experienced may have left the organization along with them. Yesterday’s announcement of a $250,000 grant from the Knight Foundation increases my concern. That grant is apparently restricted to activities that are not explicitly established in any strategy document I’ve seen. It is also not specifically identified as a restricted grant.

In the WMF’s 2015-16 Annual Plan (which was open for public comment for five days in May), this phrase stood out:

Restricted amounts do not appear in this plan. As per the Gift Policy, restricted gifts above $100K are approved on a case-by-case basis by the WMF Board.

There does not appear to be any companion document (or blog posts, press releases, etc.) covering restricted grants.

When I worked for WMF, four people senior to me maintained strong positions about the ethics and mission-alignment relating to restricted grants:

  • Sue Gardner, Executive Director
  • Erik Möller, Deputy Director
  • Frank Schulenburg, Head of Public Outreach
  • Sara Crouse, Head of Partnerships and Foundation Relations

They strongly advocated against accepting restricted grants (primarily Gardner), and for publishing substantial portions of grant applications and reports (primarily Möller). At the time, although we worked to abide by those principles, we did not operate under any formal or even semi-formalized policy or process. [UPDATE Jan. 28: I am reminded that Gardner did in fact articulate a WMF policy on the topic in October 2011. Thanks MZMcBride.] I am proud of the work we did around restricted grants, and I benefited greatly in my understanding of how organizational needs intersect with community values. These principles influenced many activities over many years; in public meeting minutes from 2009, for instance, Gardner articulated a spending area (data centers) that would be appropriate for restricted grants.

Today, however, none of us still works for Wikimedia (though Gardner retains an unpaid position as Special Advisor to the Board Chair).

In the time since I left, there has been very little information published about restricted grants. The English Wikipedia article about the Wikimedia Foundation reflects this: it mentions a few grants, but if I’m not mistaken, the most recent restricted grants mentioned are from 2009.

Restricted grants can play a significant role in how an organization adheres to its mission. Last year, Gardner blogged about this, advocating against their use. While her observations are valuable and well worth consideration, I would not suggest her view settles the issue — restricted grants can be beneficial in many cases. But irrespective of her ultimate conclusion, her post does a good job of identifying important considerations related to restricted grants.

The principles of Open Philanthropy, an idea pioneered by Mozilla Foundation executive director Mark Surman, and long championed by Wikimedia Advisory Board member Wayne Mackintosh, align strongly with Wikimedia’s values. The Open Philanthropy doctrine emphasizes (among other things) publishing grant applications and reports and inviting scrutiny and debate.

In its grant-giving capacity, the Wikimedia Foundation appears to practice Open Philanthropy (though it doesn’t explicitly use the term). It has published principles for funds dissemination:

  • Protect the core
  • Assess impact
  • Promote transparency and stability
  • Support decentralized activity
  • Promote responsibility and accountability
  • Be collaborative and open

Those principles are not mere words, but are incorporated into the organization’s grant-giving activities. For example, the WMF’s Annual Plan program, which funds chapters and affiliates, requires applicants to submit proposals for public review for 30 days, and to make public reports on past grants. The Project and Event Grants program also requires open proposals and open reports.

But the Wikimedia Foundation appears to still lack any clear standard for transparency of the restricted grants it receives. (There is less urgency for openness in the case of unrestricted grants, which by definition do not obligate the recipient to shift its operational priorities. But conditions are sometimes attached to unrestricted or restricted grants, such as the appointment of a Trustee; these should be clearly disclosed as well.) The WMF Gift Policy merely asserts that “Restricted gifts [of $100k+] may be accepted for particular purposes or projects, as specified by the Foundation [and with Board approval].”

Addendum: I have been reminded that in November 2015, the Wikimedia Foundation’s Funds Dissemination Committee — which advises the Board of Trustees on the Annual Plan Grants mentioned above, but has no formal authority over the WMF itself — voiced strong criticism of the Wikimedia Foundation’s lack of adherence to the standards it requires of affiliates. The critique is well worth reading in full, but this sentence captures its spirit:

The FDC is appalled by the closed way that the WMF has undertaken both strategic and annual planning, and the WMF’s approach to budget transparency (or lack thereof).

In December 2015, the Wikimedia Board of Trustees removed one of its own members, Dr. James Heilman — one of the three Trustees selected by community vote. Though the full story behind this action has not emerged, Dr. Heilman has maintained that his efforts to increase the organization’s transparency were met with resistance.

What can the WMF’s current practices around restricted grants, and grants with conditions attached, tell us about its commitment to transparency? Can, and should, its transparency around grants be improved? I believe there is much room for improvement. The easiest and most sensible standard would be for the WMF to adopt, for the grants it pursues, the same transparency standards it requires of the people and organizations it funds.

by Pete Forsyth at January 29, 2016 02:31 AM

January 28, 2016

Wikimedia Foundation

MIT’s Pantheon explores historical culture with Wikipedia

Denis Law, centre, is scientifically the most famous person from my home town. Who’s yours? Photo by apasciuto, freely licensed under CC BY 2.0.

Who is, scientifically, the most famous person in your home town? A new research project might be able to tell you.

Pantheon, a project developed by the Macro Connections group at the MIT Media Lab, is collecting data from thousands of Wikipedia biographies across 25 language editions, then using that data to visualise historical significance.

With the dataset (and its data visualisation kit), it’s possible to create a treemap of the occupations of a certain region’s famous people across history. You can also use it to find out the most famous people in a certain city—in my home town of Aberdeen, Scotland, that honour belongs to Manchester United footballer, and one-third of the “United Trinity”, Denis Law.

The tool uses a metric the team call “Historical Popularity Index”, gained through a variety of methods laid out on the tool’s methods page. One of these is the number of Wikipedia language editions which contain the person’s biography—for example, Jesus Christ is featured in 214 Wikipedias.
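That language-edition count is something anyone can read straight from the MediaWiki API, whose `prop=langlinks` query returns a page's interlanguage links. The sketch below is illustrative only: rather than issuing a live request, it parses a mocked, truncated response of the same JSON shape (the `mock_response` data and the `language_edition_count` helper are our own, not Pantheon's code).

```python
# Count how many Wikipedia language editions cover a topic -- the signal
# Pantheon uses as one input to its Historical Popularity Index.
# A live query would look like:
#   https://en.wikipedia.org/w/api.php?action=query&prop=langlinks
#       &titles=Jesus&lllimit=max&format=json
# Here we parse a mocked (truncated) response of the same shape instead.

def language_edition_count(api_response: dict) -> int:
    """Langlinks listed for the page, plus the wiki that was queried."""
    pages = api_response["query"]["pages"]
    page = next(iter(pages.values()))  # single page in this query
    return len(page.get("langlinks", [])) + 1

# Illustrative response with three interlanguage links; real responses
# for well-known figures run into the hundreds.
mock_response = {
    "query": {
        "pages": {
            "1095706": {
                "title": "Jesus",
                "langlinks": [
                    {"lang": "de", "*": "Jesus Christus"},
                    {"lang": "es", "*": "Jesús de Nazaret"},
                    {"lang": "fr", "*": "Jésus-Christ"},
                ],
            }
        }
    }
}

print(language_edition_count(mock_response))  # en + 3 links = 4
```

Against the live API the same function would return the full edition count, subject to the API's paging limits (`lllimit=max` returns up to 500 links per request).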

César Hidalgo, director of the MIT Media Lab’s Macro Connections group, where he uses Wikipedia data together with Amy Yu, Cristian Jara, and Shahar Ronen, says the project studies “collective memory”. During a previous project, in which he worked on mapping cities and countries’ industrial output, he realised that he was missing a significant part of national exports: “The US exports soybeans and jet engines, but it is also the birthplace of Miles Davis and Malcolm X. That should count for something, but our records of international trade do not gather information on cultural exports.

“I decided to start a project to map globally famous biographies as a means to map [these] cultural exports,” he says. “By now our thinking has evolved, and we think of this dataset as a biographical view of human collective memory. But the origins came from the cultural exports framing.”

Right now, the dataset is limited to just over 11,000 individuals, thanks in part to manual verification and data cleaning. The team of researchers behind the project say this incompleteness is unavoidable, but that it’s also a motivation for them to continue to improve the service.

To find the most popular person on Wikipedia from your home town, head over to Pantheon’s site and find the right city.

Joe Sutherland, Communications intern
Wikimedia Foundation

by Joe Sutherland at January 28, 2016 10:41 PM

Wikimedia Foundation takes part in Google Code-in 2015

Photo by Swiss National Library/Simon Schmid/Fabian Scherler, freely licensed under CC BY-SA-4.0.

Google Code-in 2015 is over. As a co-admin and mentor for the Wikimedia Foundation—one of the 14 organizations who took part and provided mentors and tasks—I can say it’s been crazy as usual.

To list some of the students’ achievements:

  • More than a dozen MediaWiki extensions converted to using the extension registration mechanism
  • Confirmation dialogs in UploadWizard and TimedMediaHandler use OOjs-UI
  • Vagrant roles created for the EmbedVideo and YouTube extensions
  • Two more scraping functions in the html-metadata node.js library (used by Citoid)
  • Many MediaWiki documentation pages marked as translatable
  • lc, lcfirst, uc and ucfirst magic words implemented in jqueryMsg
  • Screenshots added to some extension homepages on mediawiki.org
  • ReCaptchaNoCaptcha of the ConfirmEdit extension uses the UI language for the captcha
  • MobileFrontend, MultimediaViewer, UploadWizard, Newsletter, Huggle, and Pywikibot received numerous improvements (too many to list)
  • Long deprecated wfMsg* calls were removed from many extensions
  • The CommonsMetadata extension parses vcards in the src field
  • The MediaWiki core API exposes “actual watchers” as in “action=info”
  • MediaWiki image thumbnails are interlaced whenever possible
  • Kiwix is installable/moveable to the SD card, automatically opens the virtual keyboard for “find in page”, (re)starts with the last open article
  • imageinfo queries in MultimediaViewer are cached
  • The Twinkle gadget‘s set of article maintenance tags was audited and its XFD module has preview functionality
  • The RandomRootPage extension got merged into MediaWiki core
  • One can remove items from Gather collections
  • A new MediaWiki maintenance script imports content from text files
  • Pywikibot has action=mergehistory support implemented
  • Huggle plays a tone when someone writes something
  • Many i18n issues fixed and strings improved
  • Namespace aliases added to MediaWiki’s export dumps
  • The Translate extension is compatible with PHP 7
  • …and many, many more.

Numerous GCI participants also blogged about their GCI experience with Wikimedia.

The Grand Prize winners and finalists will be announced on February 8.

Congratulations to our many students and 35 mentors for fixing 461 tasks, and thank you for your hard work and your contributions to free software and free knowledge.

See you around on IRC, mailing lists, Phabricator tasks, and Gerrit changesets!

Photo by AKlapper (WMF), freely licensed under CC BY-SA-4.0

Andre Klapper, Bug Wrangler
Wikimedia Foundation

This post originally appeared on Andre’s personal blog.

by Andre Klapper at January 28, 2016 09:35 PM

Millions read Bowie biography following sudden death

David Bowie, left, who released twenty-six studio albums during his lifetime, died earlier this month. Photo by CBS Television, public domain.

On January 10, just two days after the release of his twenty-sixth studio album, enigmatic British artist David Bowie died following an eighteen-month battle with liver cancer.

His death kicked off a string of deaths among aging male entertainers, including Alan Rickman, Glenn Frey, René Angélil, and Abe Vigoda—all but the last within six years of age of one another. All five followed Lemmy and Natalie Cole, both of whom died at the end of 2015.

Wikipedia’s page view statistics show that the site is the first stop for many when actors and entertainers pass away. Bowie’s article received almost seven million views in the day after he died—over 185 times the number it had the day before. It received another four million hits over the next few days. Rickman’s article peaked at almost 3.4 million after his own death.

Wikipedia’s weekly traffic report shows just how beloved these two entertainers are. Bowie was the most popular article for the time period, followed by Rickman. Bowie’s wife and son followed at numbers 3 and 5, respectively, and his first wife came in at number 10. The total number of views on Bowie’s article in the two weeks after his death—over twelve million when the second week’s stats are added in—would have been enough to put it on the list of Wikipedia’s top 25 most popular articles from the entire year of 2015.

Graph created with the Wikimedia Foundation’s Sample App for Pageview API, freely licensed under CC BY-SA 3.0. You can read more about it in our December 2015 blog post, “Making our pageview data easily accessible.”
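The daily counts behind that graph come from the Wikimedia Pageviews REST API, whose per-article endpoint takes the project, access method, agent type, article title, granularity, and a date range in its URL path. A minimal sketch of building such a query URL (the `per_article_url` helper is our own; the endpoint path follows the API's published scheme):

```python
from urllib.parse import quote

# Base of the Wikimedia Pageviews REST API per-article endpoint.
PAGEVIEWS = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

def per_article_url(project, article, start, end,
                    access="all-access", agent="all-agents",
                    granularity="daily"):
    """Build a per-article pageviews URL; start/end are YYYYMMDD strings."""
    # Titles use underscores, and special characters must be percent-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return (f"{PAGEVIEWS}/{project}/{access}/{agent}/"
            f"{title}/{granularity}/{start}/{end}")

# Daily views of David Bowie's article for the week after his death.
url = per_article_url("en.wikipedia", "David Bowie", "20160110", "20160117")
print(url)
```

Fetching that URL returns JSON with an `items` array, one entry per day, each carrying a `views` count — the raw numbers a chart like the one above is drawn from.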

There is generally a huge uptick in interest in an entertainer following their death, both from people who may not be familiar with their work and from those seeking confirmation and details of the circumstances. Having fully fleshed-out articles on both meant that Wikipedia already offered a thorough overview of their lives, more detailed than the media could assemble on a short timeframe—especially Bowie’s “featured”-class article.

Wikipedia also has the advantage that it can be updated immediately, by anyone, at any time. Indeed, the article has been edited over a thousand times this month—about ten percent of its more than 10,000 edits since 2001.

Bowie’s disease was a tightly guarded secret, to the point that his death came as a great surprise. In particular, it cast new light on his final album, Blackstar—it was only afterwards that critics noted themes of death and mortality in the lyrics. These were most readily apparent in the music video for the single “Lazarus”, in which Bowie is pictured lying on a deathbed, a bandage around his head, singing lyrics like “Look up here, I’m in heaven” and “Oh, I’ll be free; just like that bluebird”.

Bowie had been subjected to a number of death hoaxes in the months leading up to Blackstar‘s release, so when his official social media profiles announced his sudden passing, the information was treated with scepticism. For the most part, editors elected to wait for verified sources to confirm the accuracy of the information coming from social media; the edits were made permanent minutes after Bowie’s son, Duncan Jones, confirmed the news on Twitter: “Very sorry and sad to say it’s true, I’ll be offline for a while. Love to all.”

His Wikipedia article—identified as “featured“, peer-reviewed as one of the best articles on the site—provides a thorough overview of his life, his work, and his legacy. It now serves as a fitting obituary for a pioneering musician.

Articles on Wikipedia are unique in that they can turn from biographies to obituaries in only a few edits. The confirmation of Bowie’s death led to several editors making the changes necessary to the article in due course—generally small, but important, changes of tenses and article layout.

Alan Rickman died only four days after Bowie, at the same age. Photo by Marie-Lan Nguyen, freely licensed under CC BY-SA 3.0.

Four days later, we also lost Alan Rickman, an English actor and director known for a number of roles across stage and screen from the 1970s onward. He first found fame when he was nominated for a Tony Award in 1985 for his role in Les Liaisons dangereuses.

He earned more widespread and commercial success in his film roles, most notably playing Hans Gruber in Die Hard and Severus Snape in the eight-film Harry Potter series.

Both Bowie and Rickman were 69 when they died within days of each other, a curious coincidence compounded by the later deaths of Glenn Frey and René Angélil at similar ages.

Joe Sutherland, Communications intern
Wikimedia Foundation

by Joe Sutherland at January 28, 2016 06:50 PM

Andre Klapper

Wikimedia in Google Code-in 2015

(Google Code-in and the Google Code-in logo are trademarks of Google Inc.)

Google Code-in 2015 is over. As a co-admin and mentor for Wikimedia (one of the 14 organizations who took part and provided mentors and tasks) I can say it’s been crazy as usual. :)

To list some of the students’ achievements:

  • More than a dozen of MediaWiki extensions converted to using the extension registration mechanism
  • Confirmation dialogs in UploadWizard and TimedMediaHandler use OOjs-UI
  • Vagrant roles created for the EmbedVideo and YouTube extensions
  • Two more scraping functions in the html-metadata node.js library (used by Citoid)
  • Many MediaWiki documentation pages marked as translatable
  • lc, lcfirst, uc and ucfirst magic words implemented in jqueryMsg
  • Screenshots added to some extension homepages on mediawiki.org
  • ReCaptchaNoCaptcha of the ConfirmEdit extension uses the UI language for the captcha
  • MobileFrontend, MultimediaViewer, UploadWizard, Newsletter, Huggle, and Pywikibot received numerous improvements (too many to list)
  • Long deprecated wfMsg* calls were removed from many extensions
  • The CommonsMetadata extension parses vcards in the src field
  • The MediaWiki core API exposes “actual watchers” as in “action=info”
  • MediaWiki image thumbnails are interlaced whenever possible
  • Kiwix is installable/moveable to the SD card, automatically opens the virtual keyboard for “find in page”, (re)starts with the last open article
  • imageinfo queries in MultimediaViewer are cached
  • The Twinkle gadget‘s set of article maintenance tags was audited and its XFD module has preview functionality
  • The RandomRootPage extension got merged into MediaWiki core
  • One can remove items from Gather collections
  • A new MediaWiki maintenance script imports content from text files
  • Pywikibot has action=mergehistory support implemented
  • Huggle plays a tone when someone writes something
  • Many i18n issues fixed and strings improved
  • Namespace aliases added to MediaWiki’s export dumps
  • The Translate extension is compatible with PHP 7
  • …and many, many, more.

Numerous GCI participants also blogged about their GCI experience with Wikimedia.

The Grand Prize winners and finalists will be announced on February 8th.

Congratulations to our many students and 35 mentors for fixing 461 tasks, and thank you for your hard work and your contributions to free software and free knowledge.
See you around on IRC, mailing lists, Phabricator tasks, and Gerrit changesets!

Graph with weekly numbers of Wikimedia GCI tasks

by aklapper at January 28, 2016 06:00 PM

Gerard Meijssen

#Wikipedia - a 20% error rate

The one thing that makes Wikipedia strong is its wiki links. When they work, they are great; when they don't, they are not.

The article on the Spearman Medal is a case in point. This medal is conferred by the British Psychological Society on psychologists. There were 19 links and two were wrong: one link was to a soccer player and one to a football player. The award has been conferred since 1965, so there ought to be quite a number of red links.

With two sportsmen wrongly credited with winning the Spearman Medal, the error rate was 20%. With all the red links, it is easy to be more informative using Wikidata. With such statistics, it is easy to argue that replacing plain links with links through Wikidata would enhance the quality of the English Wikipedia.

This is unlikely to happen. Wikipedians seem more concerned with finding fault elsewhere than with considering the quality of their own project, particularly when "outsiders" point out the error of their ways. It is psychology in action.

by Gerard Meijssen (noreply@blogger.com) at January 28, 2016 10:46 AM

January 27, 2016

Wiki Education Foundation

Writing art history into Wikipedia

Dr. Gretchen K. McKay

Dr. Gretchen McKay is a Professor of Art History and Chair of the Department of Art and Art History at McDaniel College in Maryland. She shares her experience teaching an art history course with Wikipedia.

Nearly a decade ago, the faculty at my small, liberal arts institution, McDaniel College, overhauled our entire general education program.

One of the features of our new program was a change in college writing instruction. We decided to move the first-year writing course to junior year, and made it specific to disciplines. That meant every department had to craft a writing experience for its students. The requirement in my department is a course we call “Writing in the Discipline: Art and Art History.” This course trains studio artists, critics, and art historians in the basics of art writing. That includes visual analysis, critical analysis, gallery exhibition reviews, and research on an artwork that culminates in an online exhibition.

This past fall was the first time I was slated to teach this course. I knew that the core element would be creating an exhibition entry on a work of art of each student’s choosing. That would involve significant research. As I thought about these goals, I realized that an important element was missing. Who was the audience?

I was part of their audience; I grade them. They were writing for each other, too. I learned a lot from our Director of College Writing about peer review, and I employed that strategy during the semester. But if our goals included training students to write in the discipline of their future careers, then digital writing needed to be included in the course. While planning the course, a colleague and digital mentor, Adeline Koh, shared that the Wiki Education Foundation was looking for instructors who would be doing a Wikipedia assignment. I bit, and left my name and contact information as a comment on her post.

I’m very glad that I did.

From that moment, Wiki Ed staff was in constant contact, answering questions and offering suggestions that helped clarify my goals. That assignment became the Wikipedia Edit Day. Students created accounts with quick dispatch on the first day of class. Then, students completed an online training program, provided by Wiki Ed, to learn the basics of language and formatting. The training was straightforward, with some kinks that Wiki Ed is working to fix (but nothing insurmountable).

Most of the work was research, and like any other course with a research component, I had to structure and scaffold the assignment. After choosing an artwork, students collected peer-reviewed sources and submitted them for my approval. To be sure they had something to say, students needed at least five peer-reviewed sources the week before the Wiki Edit day. Most had more than five. Students also had to complete an outline that planned out their research. The works ranged from a medieval tapestry to a Futurist painting by Boccioni, so there was a wide range of interpretations and approaches.

When the Edit Day finally came, students had varying reactions. We met in a computer lab, and the Wiki Edit Day class period (90 minutes) began with a general discussion of Wikipedia and what you find on a page. Our Writing Center Director led this discussion.

Then it was time for them to edit. A few students expressed concern that they “didn’t know enough” to be making comments. I assured them that they did know enough, and reminded them that this was why they had been researching prior to the Edit session. I encouraged them to think about their research to frame their contribution. In the future, I will offer students more specific items to think about ahead of the actual Edit session.

The real “a-ha” moment for me as an instructor, and what made me a believer in this assignment, came at the end of the Edit session. I had one of our librarians lead a post-edit discussion. Two comments made my teacher’s heart leap:

One student said, “I realized while I was doing this that I was writing for real people. People were going to read my writing and see my research. That was kind of terrifying, but also really kind of cool.”

Another student realized she couldn’t interpret, because Wikipedia is for factual information. Most of her work to date on her painting was interpretive. Even the descriptions of her painting were interpretive. I include here a link to her entry.

Part of my student’s contribution was to describe the Madonna and Child, a painting that is featured at the top right of the page for the artist. My student’s description reads, “The Madonna boasts timeless stylized features of the Virgin. Her fingers, nose, and neck are exaggeratedly long and slender and her face itself is elongated and narrow. Her soulful eyes are large and intensely focused, lending her visage a particular elegance.” We talked about how her description was leaning towards ekphrastic writing, a prose style that attempts to recreate the image in the viewer’s eye, rather than factual description. It became a joke in the class to see how long that description remained on that page. Many members of the class checked from time to time to see if it was still there (it is!).

This assignment helped my students understand what I had hoped they would: that writing for different audiences might change your word choices.

They enjoyed learning a bit more about the “inner workings” of Wikipedia. All said that they would think about Wikipedia differently the next time they used it for research or data. One student liked the contrast of writing for the online exhibition (you can see my students’ work here) and writing for Wikipedia.

Because of the success of this assignment, I will use Wikipedia again in a future class. I encourage others to do the same, at my institution and beyond.

Image: “Berlinghiero Berlinghieri 005” by Berlinghiero Berlinghieri – Metropolitan NY. Licensed under Public Domain via Commons.

by Guest Contributor at January 27, 2016 05:00 PM

Weekly OSM

weekly 288

1/19/2016–1/25/2016. OpenStreetMap unclosed objects (landuse=farmland): these were edited two years ago and don’t display on the map. Those bad geometries were detected by OSMLint-unclosedways [1] | Image by Mapbox. Mapping: Matias Dahl extended his blog post with a link to his interactive website, which allows exploring tags related to the top 100 amenity tags. Schools: Back […]

by weeklyteam at January 27, 2016 04:27 PM

January 25, 2016

Wikimedia Foundation

Wikimedia Highlights, December 2015

Wikimedia Highlights, December 2015 lead image.png

Here are the highlights from the Wikimedia blog in December 2015.

Wikipedia celebrates 15 years of free knowledge

(2011 Education for All Global Monitoring Report) -Government primary school in Amman, Jordan - Young girls reading.jpg
As Wikipedia marks its 15th anniversary, its community celebrated with nearly 150 events on six continents. Meanwhile, the Wikimedia Foundation is announcing an endowment to sustain Wikipedia for the future. Photo by Tanya Habjouqa, freely licensed under CC BY-SA 3.0 IGO.

On January 15, we celebrated not just Wikipedia, but the birth of an idea: that anyone can contribute to the world’s knowledge. As part of this milestone, the Wikimedia Foundation is pleased to announce the Wikimedia Endowment, a permanent source of funding to ensure Wikipedia thrives for generations to come. The Foundation’s goal is to raise $100 million over the next 10 years. You can follow along with the anniversary by tagging @Wikipedia, using the hashtag #wikipedia15, and visiting 15.wikipedia.org.

Fifteen years ago, Wikipedia was a very different place: Magnus Manske

Wikidata Birthday Talk Magnus Manske.jpg
Magnus Manske, a Wikipedia contributor since 2001, spoke at Wikidata’s third Birthday Party in 2015 at Wikimedia Deutschland. Photo by Jason Krüger, freely licensed under CC BY-SA 4.0.

Manske vividly remembers the early days of Wikipedia: “Back in 2001, Wikipedia was the new kid on the block. We were the underdogs, starting from a blank slate, taking on entities like Brockhaus and Britannica, seemingly eternal giants in the encyclopedia world. I remember the Main Page saying ‘We currently have 15 not-so-bad articles. We want to make 100,000, so let’s get to work.’ ‘Not-so-bad’ referred to stubs with at least one comma.” In those days, even MediaWiki—the software that underpins Wikipedia and other wiki sites around the world—didn’t exist. However, the site’s growth posed problems for the original UseModWiki code, as it could not scale up to meet the demand. Manske coded a replacement for UseMod, which he called Phase II. It introduced a number of innovations that Wikipedia editors still use today, such as namespaces, watchlists, and user contribution lists.

Making our pageview data easily accessible

Solomon Northup by Nebro, edit.jpg
Solomon Northup was the most-visited Wikipedia article on December 12, according to HatNote’s Top 100—a new app that takes advantage of the new pageview API. Illustration from Twelve Years a Slave (1853), public domain.

Wikipedia and its sister projects receive more than 16 billion pageviews each month—more than double the earth’s population. The popularity of different Wikipedia articles can reflect trends in society if we ask simple questions: what’s more popular on Spanish Wikipedia, fideuà or paella? How many views did Punjabi Wikipedia get after the last editathon? What are the top destinations people look up on German Wikivoyage?

You can now use the Wikimedia Foundation’s new pageview API to get these answers quickly and easily. The API is built on a RESTful architecture, making it easy to retrieve data with a URI. To make it easier to use, there are both an R client and a recently released Python client.
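
As a sketch of what “retrieve data with a URI” means in practice, the per-article endpoint can be addressed by assembling path segments. The path layout below follows the public REST endpoint; the concrete project, article, and date values are illustrative only.

```python
# Sketch: assemble a per-article pageview API URI. The path layout follows
# the public REST endpoint; project/article/date values are examples.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews"

def per_article_uri(project, article, start, end,
                    access="all-access", agent="all-agents",
                    granularity="daily"):
    """Build the URI for pageview counts of one article over a date range."""
    return "/".join([BASE, "per-article", project, access, agent,
                     article, granularity, start, end])

# December 2015 daily views of the Spanish Wikipedia article on paella:
uri = per_article_uri("es.wikipedia", "Paella", "20151201", "20151231")
```

Fetching such a URI with any HTTP client returns the view counts as JSON, which is all a fideuà-versus-paella comparison needs.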

In brief

Discovery: What happens when you search Wikipedia?: The Wikimedia Foundation’s Discovery team has replaced prefix search with a “completion suggester.” They are also working on improving search for multilingual users.
Maithili Wikipedia turns one year old: Maithili Wikipedians from Rajbiraj, Nepal, organized an event to felicitate the winners of the first Maithili edit-a-thon and to celebrate the first anniversary of the Maithili Wikipedia.
Live a year in 4 minutes: Introducing #Edit2015, Wikipedia’s year-in-review video: You can experience some of the wonder, pain, and triumph of 2015 in four minutes with #Edit2015, the Wikimedia Foundation’s second year-in-review video, replaying a year through the lens of history’s largest crowd-sourced movement.
Documenting Tunis for future generations: A series of workshops are taking place in the Tunisian Association of the Preservation of the Medina – Tunis (ASM Tunis) headquarters. Volunteers will be gathered on a monthly basis and work in their spare time on four themes: madressas (schools), souks (markets), diar (palaces), and mosques/mausoleums.

Andrew Sherman, Digital Communications Intern
Wikimedia Foundation

Photo Montage credits: “Wikidata Birthday Talk Magnus Manske.jpg” by Jason Krüger, freely licensed under CC BY-SA 4.0; “Solomon_Northup_by_Nebro,_edit.jpg” from Twelve Years a Slave (1853), public domain.; “(2011 Education for All Global Monitoring Report) -Government primary school in Amman, Jordan – Young girls reading.jpg” by Tanya Habjouqa, freely licensed under CC BY-SA 3.0 IGO; Collage by Andrew Sherman.

Information For versions in other languages, please check the wiki version of this report, or add your own translation there!

by Andrew Sherman at January 25, 2016 11:16 PM

Jeroen De Dauw

Replicator: Wikidata import tool

I’m happy to announce the first release of Replicator, a CLI tool for importing entities from Wikidata.

Replicator was created for importing data from Wikidata into the QueryR REST API persistence. It has two big conceptual components: getting entities from a specified source, and then doing something with said entities.

Entity sources

Wikidata Web API

As the asciicast above shows, you can import entities via the Wikidata web API. You need to be able to connect to the API, and this is by far the slowest way to import; however, it is still much more convenient than getting a dump if you just want to import a few entities for testing purposes.

The one required argument for the API import command is the list of entities to import. This argument accepts single entity IDs such as Q1 and P42, as well as ranges such as Q1-Q100. You can have as many of these as you want, for instance P42 Q1-100 P20-30 Q1337. The -r or --include-references flag lets you specify that all referenced entities should also be imported. This is particularly useful when you need the labels of those entities when displaying the one you actually imported. Finally, there is a verbosity option that allows switching between three different levels of output.
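
A minimal sketch of how such ID arguments could be expanded into individual entity IDs (this is not Replicator's actual parser; the function name and parsing rules are assumptions based only on the argument forms described above):

```python
# Sketch: expand entity ID arguments like "Q1", "P42", or ranges such as
# "Q1-Q100" / "P20-30" into individual entity IDs. Illustrative only;
# it mirrors just the argument forms described in the post.
def expand_ids(args):
    ids = []
    for arg in args:
        if "-" in arg:
            first, last = arg.split("-")
            prefix = first[0]                 # "Q" for items, "P" for properties
            start = int(first[1:])
            end = int(last.lstrip("QP"))      # range end may repeat the prefix
            ids.extend(f"{prefix}{n}" for n in range(start, end + 1))
        else:
            ids.append(arg)
    return ids

expand_ids(["P42", "Q1-Q3"])  # → ["P42", "Q1", "Q2", "Q3"]
```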

The command internally does batching using my Batching Iterator PHP library. You can specify the batch size with the -b or --batchsize option. The command can also be safely aborted via ctrl+c. Rather than immediately dying and leaving your database (or other output) in a potentially inconsistent state, Replicator will finish importing the current entity before exiting. A summary of the import is displayed once it has completed or been aborted.
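
The abort behaviour described above follows a common pattern: the interrupt handler only records the signal, and the import loop checks the flag between entities. A sketch in Python (Replicator itself is PHP; the class and method names here are invented for illustration):

```python
# Sketch of graceful ctrl+c handling: the SIGINT handler sets a flag, and
# the loop only exits on an entity boundary, so no entity is left half
# imported. Illustrative only; not Replicator's actual implementation.
import signal

class GracefulImporter:
    def __init__(self):
        self.aborted = False

    def _on_sigint(self, signum, frame):
        self.aborted = True          # defer the exit; never die mid-entity

    def run(self, entities, handle_entity):
        previous = signal.signal(signal.SIGINT, self._on_sigint)
        done = 0
        try:
            for entity in entities:
                handle_entity(entity)   # always completes, even if ctrl+c arrives
                done += 1
                if self.aborted:
                    break               # stop cleanly between entities
        finally:
            signal.signal(signal.SIGINT, previous)
        return done                     # feeds the final import summary
```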

Wikidata dumps

It is possible to import data from both compressed and uncompressed JSON dumps. This functionality is exposed via several commands: import:json for uncompressed dumps, import:bz2 for bzip2-compressed dumps, and import:gz for gzip-compressed dumps. It is possible to specify a maximum number of entities to import, or to safely and interactively abort the import via ctrl+c. In both cases you will be presented with a continuation token that can be used to continue the import from where it stopped.

The JSON import functionality is built on my Wikidata JSON Dump Reader PHP library. You can even import XML dumps via the import:xml command, though you are likely better off sticking with the recommended JSON dumps.

Import targets

As Replicator was written for the QueryR REST API, it by default imports into the persistence used by this API. This persistence is composed of the QueryR EntityStore, the QueryR TermStore and Wikibase QueryEngine, all open source PHP libraries providing persistence for Wikibase data.

While Replicator internally uses a plugin system, there is currently no way to add additional import targets without modifying the PHP code. The needed modifications are trivial, and it would also be relatively simple to make the application as a whole truly extensible. While I’m currently working on other projects, I suspect this capability would be useful for various use cases. All it takes is implementing an interface with a method handleEntity( EntityDocument $entity ), and suddenly the tool becomes capable of importing into your MediaWiki, your Wikibase, or your custom persistence.

Let me know if you are interested in creating such a plugin, then I will add the missing parts of the plugin system. I might get to this in some time anyway, and then do another blog post covering the details.

Further points of interest

I should mention that the Replicator application is built on top of the Wikibase DataModel and Wikibase DataModel Serialization PHP libraries, without which creating such a tool would be a lot more work, both initially and maintenance-wise. It also uses the Symfony Console component, which I highly recommend to anyone creating a CLI application in PHP.

See also

If you are running a Wikibase instance on your wiki, take a look at the Wikibase Import MediaWiki extension by Aude. If you want to import things into Wikidata, then have a look at this reference Microdata import script by Addshore. If you are working with Java and want to import dumps, check out Wikidata Toolkit.

by Jeroen at January 25, 2016 06:08 PM

Wikimedia Tech Blog

Wikimedia and zero rating: clear principles for free knowledge

Photo by Victor Grigas, freely licensed under CC BY-SA 3.0.

The Wikimedia Foundation works to expand free and open access to knowledge everywhere, including areas where affordable access to the internet is a fundamental barrier.

In some regions, the Foundation has utilized “zero-rating” to make mobile traffic to Wikipedia and the Wikimedia sites entirely free. This approach removes the barrier of cost for those wishing to read, learn, and contribute to Wikipedia, in any language. We call this program Wikipedia Zero. Today, more than 600 million people in 64 countries can read, edit, and contribute to Wikipedia through Wikipedia Zero partnerships.

Recently, Facebook’s zero-rating program, Free Basics, has come under public scrutiny in India. Facebook’s Free Basics includes Wikipedia as one of its services, but we wish to be clear that neither Wikipedia nor the Wikimedia Foundation is a partner of Free Basics. Wikipedia is included in the Free Basics package through our free license. In line with our open policies, anyone can use and distribute Wikipedia content without formal permission.

We have our own approach to zero rating that we believe respects the fundamental values of the Wikimedia movement. This approach was first articulated in our Operating Principles, which are used in considering each Wikipedia Zero partnership.

Our guiding concepts are:

  • No collection of personal information. Carriers receive the IP addresses of sites that will be zero-rated so that they can identify Wikipedia Zero traffic. Wikipedia Zero does not enable carriers to collect or receive personal information about Wikimedia users.
  • No compromise of experience. Carriers zero-rate access to the regular mobile version of Wikipedia and other Wikimedia sites. To ensure users do not mistakenly incur data charges, they are prompted with a notice if they are about to leave a zero-rated page.
  • No shift of editorial control. Wikipedia articles and other Wikimedia content are community curated and will remain that way. Zero-rating agreements do not shift editorial considerations, responsibilities, or policies. In fact, partnerships are meant to extend access to local Wikimedia volunteers and chapters around the world and aid in their community work.
  • No exchange of payment. The Wikimedia Foundation does not pay carriers to zero-rate access to the Wikimedia sites and does not receive payments from carriers through Wikipedia Zero.
  • No exclusive rights. We try to partner with as many carriers as possible to maximize the number of users that can benefit from the initiative.
  • No commercial bundling. Access to the Wikimedia sites through Wikipedia Zero cannot be sold through limited service bundles.
  • Commitment to collaboration with other public interest sites. Our main goal is to promote free access to knowledge, and we want to help other similar services interested in doing the same (just contact us!).

As a Foundation dedicated to free and open access to knowledge for all, we yearn for affordable internet around the world. Until then, we will continue working to bring free access to knowledge to every person on the planet.

Adele Vrana, Head of Strategic Partnerships, Global Emerging Markets
Smriti Gupta, Regional Manager, Strategic Partnerships—Asia
Wikimedia Foundation

by Adele Vrana and Smriti Gupta at January 25, 2016 03:00 AM

January 24, 2016

Alice Wiegand


Silence is a form of nonverbal communication in which one does not speak and in which no sounds are produced. (Page „Schweigen“. In: Wikipedia, Die freie Enzyklopädie. Revision of 11 January 2016, 23:53 UTC. URL: https://de.wikipedia.org/w/index.php?title=Schweigen&oldid=150101218 (retrieved: 24 January 2016, 08:16 UTC))

No, we on the Board of Trustees of the Wikimedia Foundation are not having a good run at the moment. Our recent decisions have taken the community by surprise and have been hard to follow. Incomprehension, disappointment, helplessness, and anger spread, and most recently culminated in a vote of no confidence.

And the Board stays silent.

What many take as further proof that the Wikimedia Foundation, and the Board in particular, has completely cut itself off from the community is, in my view, the result of several kinds of inadequate communication coming together. Neither intent nor indifference lies behind it. That does not really make it any better, of course.

Take the discussion about the appointment of Arnnon Geshuri. After the announcement that, as of January 1, 2016, we were appointing Kelly Battles and Arnnon Geshuri, two more Americans whose careers include Silicon Valley companies, to the Board, the first criticism was that too little weight had been given to diversity in the selection. Very soon a community member brought up Arnnon's involvement in the agreements among large tech companies such as Google, Apple, and Intel not to poach each other's employees.

And bam: caught flat-footed.

While the community on the mailing list was quick to reach its verdicts, the Board first had to come to grips with the situation itself. What facts do we have, what does Arnnon say about it, why was the issue not considered when the candidates were selected? Would it have made a difference if the selection committee had known? Could we, during that time, at least have sent a short email saying that we were following the discussion and looking into the matter, but needed some more time before commenting? Yes, we could have, and we should have.

When, in difficult situations, I have to sort out the tangle of my thoughts and come to a decision and get stuck, I go for a run through the woods, lie in the bathtub, or listen to loud music for a good while. In complex cases all three measures are applied in turn. In a body whose members are, on top of that, separated by several time zones and quite a few language barriers, that unfortunately does not work. And suddenly you are stuck in a vacuum in which it is completely unclear whether you respond as a body at all, with what emphasis you do so, and whether it should be a short, terse statement or rather a detailed examination. Could we at least have sent a short email during this phase? Yes. But nobody did.

The silence hardens and takes on a life of its own with every passing day. It paralyzes, it unsettles, and it does harm. Not only to the Board and the organization behind it, but to the culture of discussion as such. Yet it is difficult. Following the discussions in the various venues eats up not only time but also motivation. With every email and every post you expose yourself once again to public outrage, to debate about the person rather than the issue, and to further questions in a sometimes very irritating investigative style. Shouldn't a Board member simply have to endure that? No; my idea of respectful interaction is a different one, and I consider such an expectation out of touch with reality.

There is much about the ongoing discussion that I do not like. But it shows clearly how sensitively the community reacts to our decisions, how much it fears that decisive changes will be pushed through without its involvement, and how great the loss of trust in the body as such has become. Some of this rests on mistaken ideas about the nature and scope of the Board's decisions. Which shows, once again, that we still have not managed to clearly communicate our self-understanding as a Board, our tasks, and our intentions.

And now what?

Regarding the discussion about Arnnon's appointment, we are preparing a statement, and Arnnon will also weigh in personally. Above all, though, we have to get out of this rut, away from navel-gazing and back to the real questions around free knowledge. As a Board we have to ask ourselves why we so rarely take part in fundamental discussions, where our input can and should make a difference, and how, together with the community, we can build a working relationship in which we benefit from one another and can discuss even controversial topics in a civilized way.

I take this very seriously, and it remains my goal to contribute my part to it. And I know that while I spend my free time in hangouts, on mailing lists, and on phone calls, elsewhere articles for Wikipedia are still being written and improved. Most likely, even, mostly by people who are not even interested in whether a Board of Trustees of the Wikimedia Foundation exists or not.

by Alice Wiegand at January 24, 2016 11:18 AM

January 22, 2016

Wikimedia UK

Wikimedian in Residence at the Wellcome Library

The post was written by Phoebe Harkins, Communications Co-ordinator at the Wellcome Library. It was originally on the Library’s blog.

Incurably curious? Interested in the history of medicine? Know a bit about Wikipedia?

Would you like to work with us on a fantastic new project and be our Wikimedian in Residence?

Building on our previous projects with Wikimedia UK and our commitment to share our fantastic collections as widely as possible, we’re now looking for a Wikimedian in Residence to help us make that happen.

Wikipedia is one of the world’s most popular websites, and is often the first place people look for content about subjects covered by our collections. That’s why we want to make the content on Wikipedia as rich and comprehensive as we possibly can.

Our collections cover so much more than the history of medicine – essentially life, death and everything in between, so there’s huge potential for improving the content on Wikipedia. We’ll also be looking at enriching other Wikimedia projects.

The Wikimedian will work with us on the project to help develop areas of Wikipedia covered by our amazing holdings. We’d love you to help us to make our world-renowned collections, knowledge and expertise here at the Wellcome Library even more accessible.

The Reading Room at Wellcome Collection. Wellcome Images reference: C0108488. Credit: Wellcome Trust.

Working with staff here at the Wellcome Library and our colleagues in Wellcome Collection, you’ll have access to amazing content and resources. One of the main aspects of the project will be to help us to develop and sustain relations with Wikimedia UK and the wider Wikimedia community, so as well as helping us to share our amazing collections with the world online, you’ll also be working with us to develop some edit-a-thons and outreach events in our amazing Reading Room.

This is a flexible position, and will last between 6 and 12 months depending on the projects the Wikimedian proposes and develops.

Further details about the post and the project can be found in the terms of reference. If you have any questions about the project just drop me an email.

To apply for the role, send a CV and covering letter to: p.harkins@wellcome.ac.uk

Closing date for applications: 12 February 2016



by Richard Nevell at January 22, 2016 03:11 PM

Gerard Meijssen

#Wikipedia - Nbr of statements on #Wikidata

Edo de Roo produced some stats showing the number of statements on items that have a Wikipedia article in Dutch.

I like his approach; it shows how well Wikidata might serve a Wikipedia. It may also indicate how well subjects that relate to the Netherlands are supported on other Wikipedias. There is more about subjects related to the Netherlands than what you find only on the Dutch Wikipedia, but that is a bit more complicated.

When you compare these numbers to Magnus's stats, they are not bad at all. Across all of Wikidata, the most common count is 1 statement per item (4,474,865 items; 23.22%), while for items with a Dutch Wikipedia article it is 6 statements (36,078 items).

Obviously there is room for growth.

by Gerard Meijssen (noreply@blogger.com) at January 22, 2016 07:20 AM

Wikimedia Foundation

Community digest: Urdu Wikipedia reaches 100,000 articles

The tiger shark was the Urdu Wikipedia’s 100,000th article. Photo by Albert kok, freely licensed under CC BY-SA 3.0.

On December 29, 2015, the Urdu Wikipedia community achieved a major milestone: 100,000 articles. The article in question, tiger shark (en), was created by Ameen Akbar, an administrator who has made almost 17,000 edits to the project.

The Urdu Wikipedia has come a long way since its inception in 2004. It has a very large number of contributors from Pakistan, where Urdu is the national language, followed by India (where Urdu is among the official languages). There are also contributors from the United States, Finland and Germany.

Urdu Wikipedia bureaucrat Mohammad Shuaib says that, with this feat, the project now joins the bigger league of Wikimedia communities with a much wider presence of free knowledge, and that it is imperative to make it a high-quality, referenced tool, useful to people from all walks of life, including students and teachers.

Mohammad specifically cited the recent success of the Quarterly Editing Project, and hopes that, in the near future, Urdu Wikipedians will rise to positions of stewardship, taking on administrative and maintenance work with wider impact across all Wikimedia projects.

The community’s achievement comes six months after Tahir Mahmood, a bureaucrat, was featured on the Wikimedia blog for achieving a record number of edits. His tally now exceeds 150,000. Second-placed Sajid Amjad has over 50,000 edits and has been the most senior active sysop since 2010.

A survey of ten active Urdu Wikipedians found the community now hopes to consolidate quality improvement measures such as stub expansion, grammar correction, and incorporating more elaborate themes and topics, as well as developing a scheme of action leading to an integrated approach for the further development of all south Asian Perso-Arabic-script Wikipedias.

Among those surveyed was Ameen Akbar, author of the project's 100,000th article, who emphasized the need for outreach and engaging more editors. Arif Soomro, several of whose articles have been peer-reviewed and are now in the featured category, listed his future editing priorities in the areas of Urdu, Sindhi and world literature.

Another administrator, Obaid Raza, placed stress on popularizing the Urdu keyboard and making multilingual editing user-friendly. In this direction, Usman Khan has recently embarked on the mission of creating instructional videos on how to use the Urdu Wikipedia for the first time.

A notable idea that has gained significant momentum with the recent increase in active editors is the planned coordination of the Perso-Arabic-script Wikipedias, such as Western Punjabi, Sindhi, Balochi and Pashto, with Urdu, to ensure that these projects have the widest possible reach among the masses in south Asia.

Syed Muzammiluddin
Urdu Wikipedia Sysop and Wikimedia community volunteer

In brief

Wikimedia Argentina conducted an Education Hackathon that attracted over 400 people from around the world.
First results from the community wishlist survey are in: Danny Horn of the WMF writes that they have committed to investigating and responding to the top ten requests. So far, two are being worked on: migrating dead links to the Internet Archive, which maintains an enormous database of webpages going back to the early 2000s, and coming up with a new pageview tool to replace the occasionally unreliable alternative.
Wikimedia Endowment: The WMF is forming an endowment with funds from a legacy donor. The eventual goal for the fund is “to serve as a perpetual source of support for the operation and activities of Wikipedia and its sister projects.” More information and required legal information is available in the official announcement.
WMF needs your input on its future strategy: The WMF is holding a strategy consultation with the community and needs your feedback. Executive Director Lila Tretikov writes that your time and effort will “help guide the Foundation in its work to support the movement.” An FAQ is available. The organization hopes to have a strategy in place in time for the community to review its annual budget, which community member Pete Forsyth calls “an important step toward running its finances in a more transparent and accountable way.”
#1Lib1Ref campaign wrapping up: The libraries community has an opportunity to help bridge Wikipedia’s systemic gaps. To get these people to participate, the WMF’s Wikipedia Library team is engaging them in an innovative social media campaign with the hashtag #1Lib1Ref; organizations and associations like the Internet Archive have also lent assistance in spreading the word. Urge your local librarian(s) to join this global movement! See the official announcement, updated for January 21.
Wikimedia tutorial videos: Pre-production has started for a series of motivational and educational videos that will introduce Wikipedia and some of its sister projects to new contributors, featuring VisualEditor, Citoid (a new citation tool), and Wikimedia Commons. Translations and feedback on the script outline are needed, and there will be an office hour IRC meeting in #wikimedia-office on Monday, January 25, at 11 AM PST / 2 PM EST / 7 PM UTC.

Ed Erhart, Editorial Associate
Wikimedia Foundation

by Syed Muzammiluddin and Ed Erhart at January 22, 2016 05:04 AM

January 21, 2016

Wiki Education Foundation

Monthly Report for December 2015


  • The Wiki Education Foundation launched new onboarding and online orientations for instructors through the Wiki Ed Dashboard. We simultaneously launched a suite of modular trainings for student editors. The new trainings concentrate core concepts into two introductory modules, then provide specific trainings for students based on what they are asked to do in the course, and when. This allows for deeper, more focused, and more timely training experiences for students.
  • Wiki Ed staff visited the University of California Berkeley campus for a faculty meeting with more than 30 instructors. Educational Partnerships Manager Jami Mathewson and Outreach Manager Samantha Erickson also presented Wikipedia assignments to the annual meeting of the Society for Marine Mammalogy. That presentation ended with an expert review of select Wikipedia articles on marine life, which was documented and shared with Wikipedians.
  • Google awarded the Wiki Education Foundation a major sponsorship in support of the Year of Science. Thanks in part to Google’s leadership role, the Year of Science aims to improve science content on 5,000 Wikipedia articles, enhance visual representation of science topics on Wikipedia, develop science communication skills among thousands of students, and improve the coverage of the lives and works of women scientists.


In mid-December, the Programs team participated in a half-day retreat. They engaged in team-building and discussed the evolution of the Classroom Program and Visiting Scholars as well as lessons learned. The purpose of the retreat was to continue the onboarding of Director of Programs Tanya I. Garcia, establish rapport among team members, and set the stage for work in 2016 and beyond.

Educational Partnerships

Samantha discusses Wikipedia with a participant at the Society for Marine Mammalogy workshop.

In early December, Jami Mathewson and Samantha visited faculty at the University of California, Berkeley, for a presentation about teaching with Wikipedia. They met with more than 30 instructors in departments ranging from American cultures, to plant biology, to Spanish. Instructors had many questions about Wikipedia assignments and the impact students can have on public scholarship.

Samantha and Jami hosted a half-day workshop at the Society for Marine Mammalogy’s annual meeting in San Francisco. The workshop had four themes:

  • How Wikipedia can advance the public scholarship of marine mammal science
  • An introduction to Wikipedia’s policies, community, and editing basics
  • How to use Wikipedia as a teaching tool
  • A content gap analysis

A dozen participants spent an hour reviewing Wikipedia articles in their areas of expertise. They documented their analysis to help others improve the quality of those articles.

The outreach team spent most of the month preparing for the spring 2016 term and the kick-off of the Year of Science. The team is proud to have on-boarded 39 spring courses in December, compared to 10 courses at the end of December 2014. Since then, we’ve identified peak periods for reaching out to instructors, and expanded staff time and attention to ensure those instructors have the support they need to participate.

Wiki Ed’s partnership with the National Women’s Studies Association continues to flourish. This fall, we supported 26 courses and nearly 700 students in women’s and gender studies courses. These students added 437,000 words to 553 articles, primarily about women, women’s studies, gender studies, and feminism.

We’re excited to bring students to Wikipedia to contribute to the public scholarship of these important topics.

Classroom Program

Status of the Classroom Program for Fall 2015 in numbers, as of December 31:

  • 162 Wiki Ed-supported courses had Course Pages (72, or 45%, were led by returning instructors)
  • 3,710 student editors were enrolled
  • 2,751 (or 74%) students successfully completed the online training
  • Students edited 4,670 articles and created 449 new entries.

The fall 2015 term saw some big changes to the Classroom Program. At the beginning of the term we launched dashboard.wikiedu.org, and shortly thereafter, ask.wikiedu.org, which contributed greatly to our ability to support what has become our largest number of classes and students to date. From spring 2015, we saw a roughly 40% increase in the number of courses in our program, and a 60% jump in the number of students enrolled in these courses. The dashboard allowed instructors and Wiki Ed staff to more easily follow student work, ensuring that quality and quantity went hand-in-hand.

Student work highlights

Ruth Kittner’s class at Carnegie Mellon University was embroiled in the politics and personalities of the French Revolution, working on 17 articles in the topic area. They expanded articles on politicians, plots, and broader social history, all during the month of December. Their largest expansion was to Wikipedia’s article on the Declaration of the Rights of Woman and the Female Citizen, Olympe de Gouges’ feminist tract, which spurred Mary Wollstonecraft to write A Vindication of the Rights of Woman and led to de Gouges’ arrest for treason.

West Chester University’s History on the Web, taught by Janneken Smucker, asked students to focus on Philadelphia history. User:Smithje2012 expanded the article on the Philadelphia Tribune, the oldest continually published African American newspaper in the United States. Before this term, the article had been a largely unsourced stub for almost a decade. Now the article is cited to multiple journal articles and offers a history of the paper.

Karyl Ketchum’s students at California State University, Fullerton explored a variety of topics in their course on Gender and Technoculture. Students worked throughout the term to expand articles such as women in government and women and video games. All told, the class worked on 45 different articles. Many of the students added to existing articles, but some created new articles, such as a biography of Rose Lavelle, a member of the US Women’s National Soccer Team, and an article on the protest movement Blue Lives Matter.

Insects that go through all four developmental stages (egg, larva, pupa, and adult) exhibit holometabolism (also called ‘complete metamorphosis’). A student in the University of Chicago’s Evolution and Development class doubled the size of the holometabolism article, adding two paragraphs on the evolutionary context and another five discussing theories on the evolution of this developmental sequence.

Students in North Dakota State University’s Mineralogy class created 16 new articles about minerals. These include George-ericksenite, a 3,000-word article about a yellow mineral named after USGS geologist George E. Ericksen. Several of the articles created by the students were about minerals named for famous geologists, including Bobdownsite, named after University of Arizona professor Robert Terrace Downs; and Waterhouseite, named for Australian geologist Frederick George Waterhouse. The class also created articles on Allendeite and Hexamolybdenum, minerals first discovered in the Allende meteorite, which fell to Earth in 1969.

Students in the University of Pennsylvania’s Medical Missionaries to Community Partners class created 17 new articles about medical missionaries. That included a 3,300-word article about Helene Bresslau Schweitzer, a “medical missionary, nurse, social worker, linguist, public medicine enthusiast, editor, feminist, [and] sociologist”. Schweitzer was a partner in her husband Albert Schweitzer’s pioneering medical work in Gabon, including establishing the Albert Schweitzer Hospital.

Community Engagement

Visiting Scholar Barbara Page created this chart for use in a new article she wrote on chest pain in children.

Community Engagement Manager Ryan McGrady focused on preparation for the Wikipedia Year of Science, including developing collaborations with WikiProject Women in Red, WikiProject Women Scientists, the WikiCup, and others in the Wikipedia and academic communities.
Two articles by George Mason University Visiting Scholar Gary Greenbaum were promoted to Featured Articles, the highest quality designation on Wikipedia. These articles were Wendell Willkie, about the Republican candidate in the 1940 presidential election who lost to FDR, and GMU’s namesake George Mason, father of the American Bill of Rights. Boroughitis, another of Gary’s articles (see our blog post from November), appeared on Wikipedia’s main page on December 2 as “Today’s featured article,” bringing increased visibility and recognition for the quality of that article.

On December 18, the Main Page of Wikipedia included the following “Did You Know”, based on an article by University of Pittsburgh Visiting Scholar Casey Monaghan:

Did you know “that in 1917, future folk musician and Carnegie Institute of Technology professor emeritus Robert Schmertz was arrested while dressed in ‘a girl’s middy blouse and a small white hat’?”

Barbara Page, a Visiting Scholar also at the University of Pittsburgh, created a chart (top of section) for use in a new article she wrote on chest pain in children.

Program Support


Communications Manager Eryk Salvaggio spent most of December creating and refining content for the new online training tools, incorporating feedback from user testing with first-time instructors, alongside Product Manager for Digital Services Sage Ross. Adjustments were ongoing through December, but the training is now live and ready for new instructors and new students.

Blog posts:

External press:

Digital Infrastructure

WINTR staff map out forthcoming Wiki Ed technical projects.

In December, we rolled out major improvements of a couple of key areas of dashboard.wikiedu.org:

  • a new onboarding experience to help students and first-time instructors get oriented quickly;
  • a full set of training modules for students, including a new interactive tutorial for VisualEditor; and
  • a major update of the editing interface for course timelines, which makes both creating assignment plans and rearranging existing timelines much easier.

Along with a number of smaller interface refinements and bug fixes, those three updates address the most significant user experience pain points we identified in fall 2015. We’re looking forward to in-depth user tests for the new student training modules, which we will continue to refine into January.

Alongside the Dashboard system, we started planning something new in December: a social media tool for the Year of Science to help people share the joy they find in Wikipedia. Executive Director Frank Schulenburg and Director of Program Support LiAnna Davis joined Sage in Seattle for a planning sprint in the offices of our development partner WINTR. This project — we’re calling it the ‘Wikipedia playlist’ — will launch in February.

Finance & Administration / Fundraising

Finance & Administration

Expenses December 2015

In December, we received funding from the Simons Foundation ($350k), Google ($500k), and individual contributors ($44k). Expenses were $225,184 versus the plan of $282,321. The main cause of the $57k variance was an earlier decision to hold off on expanding office space and staffing until we had a better handle on long-term funding.

Expenses YTD December 2015

Year-To-Date expenses are $1,488,042, versus the plan of $1,871,304. The $383k variance is the result of:

  • Savings on expenditures from prior months (Promotional Items, $7k; and Staff Meetings, $8k);
  • Certain expenditures held until long-term funding is clearer:
    • Additional Personnel including recruiting and equipment costs, $149k
    • Additional Office Space, $82k
    • Creative Design Services, $40k
    • Promotional Items, $4k
    • Community workshops, $30k
    • Other Outside Contractors, $15k
  • Timing of certain expenses (Staff Development, $21k; Travel and Conferences, $18k).

Our spending has averaged about 80% of plan over the last three months.


  • Google awarded the Wiki Education Foundation a major sponsorship in support of the Year of Science. Thanks in part to Google’s leadership role, the Year of Science aims to improve science content on 5,000 Wikipedia articles, enhance visual representation of science topics on Wikipedia, develop science communication skills among thousands of students, and improve the coverage of the lives and works of women scientists.
  • 100% of the Foundation’s Board of Directors participated in the 2015 year-end giving campaign.
  • The development team has begun planning for 2016 cultivation events in the Bay Area and Houston, Texas.
  • Improvements were made to the Foundation’s online donation portal, located at www.wikiedu.org/donate.

Office of the ED

Frank discusses his vision for the Playlist with WINTR staff.
  • Current priorities:
    • Supporting the fundraising team in securing funding
    • Preparing the board meeting, scheduled for the end of January
    • Overseeing the annual planning and budgeting process for fiscal year 2016–17
  • Frank has followed up with the New York Academy of Sciences on a possible collaboration during the second half of the upcoming Year of Science 2016. The New York Academy of Sciences will provide Wiki Ed with a first budget estimate, so that it can be decided whether that collaboration will be included in next fiscal year’s plan.
  • Executive Assistant to the ED, Renée LeVesque, organized a holiday staff lunch at the restaurant “Il Fornaio” for people in San Francisco. She is also getting ready for an emergency preparedness training for San Francisco staff to be held in January.
  • Frank developed the process for the annual planning and budgeting cycle that will start in January 2016. He introduced the Senior Leadership team to the process and ensured that the group has a shared understanding of roles, responsibilities, and expectations.

Visitors and guests

  • Greg Boustead, Simons Foundation
  • John Tracey, Simons Foundation

by Eryk Salvaggio at January 21, 2016 09:58 PM

Weekly OSM

weekly 287

1/12/2016-1/18/2016 OSM Schools Project by Postcode Area [1] | Map by Robert Whittaker Mapping [1] The first GB Quarterly Mapping Project for 2016 is mapping schools in Great Britain; details on the wiki page. Progress can be followed with a number of tools and queries. And a similar effort has begun in Belgium! User MKnight shares […]

by weeklyteam at January 21, 2016 02:21 PM


Cross-wiki notifications on test wikis

As was teased in this week's tech news, cross-wiki notifications are now available as a BetaFeature on testwiki and test2wiki. Simply go to your preferences, enable "Enhanced notifications", trigger a notification for yourself on test2.wikipedia.org (e.g. log out and edit your talk page), and open up your notifications flyout!

The next steps are going to be populating the backend on all wikis, and then enabling it as a BetaFeature on more wikis (T124234).

Please try it out! If you find any bugs, please file a task in the newly-renamed Notifications Phabricator project.

by legoktm at January 21, 2016 08:57 AM

Gerard Meijssen

#Wikimedia - Perfection is the enemy of the good

I was wrong to tell people that Mr Anthony needs an article. I was wrong because the item I added on Wikidata was not perfect and, indeed, it is not. I was wrong because among the awards Mr Anthony received was the "President's Award for Distinguished Federal Civilian Service". It says so on the website of Boston University.

There are several approaches to such criticism. It is true, obviously. I made the point that Mr Anthony received awards to do with psychiatry; he is very important in this field and deserves at least some of our attention. Mr Anthony did indeed receive the President's Award, which is given to only five people a year. It makes him notable in the bigger scheme of things. It could be a reason for Wikipedians to take note of him and the other people who were celebrated in this way.

I am proud that I make mistakes; it proves at least that I do something. I think that this is a worthwhile thought. When what is done is not perfect in the eyes of others, so be it. It is all too easy to find fault. When many people make an attempt to do good, it is wonderful. It is how Wikipedia became what it is. It is not perfect and by finding fault at what others do, attention is diverted from what makes a project good, even better.

The aim of the Wikimedia Foundation is "to share in the sum of all knowledge". Arguably our projects serve whatever our editors put in there. Arguably Mr Voltaire already knew that perfection is the enemy of the good. My sense is that the arguments around the Wikimedia Foundation, its software, and its communities have lost much of their validity. It is a bit like psychiatry: for many, psychiatry is business as usual, rehashing the old arguments, iterating on the modus operandi and applying the same stigmas. It takes people like Mr Anthony to bring hope, to tell people that there is room for recovery, that it is important to (re)connect to the values that are important.

There is hope for Wikimedia and it is not in rehashing time and again what went wrong. It is in what went right and what it takes to make it go right again and again. It is in reconnecting to important values like "be bold" and in recognising the work people do.

by Gerard Meijssen (noreply@blogger.com) at January 21, 2016 08:49 AM

January 20, 2016

Wiki Education Foundation

Read a Visiting Scholar’s Featured Article on Founding Father

The most exciting thing about the Visiting Scholars program is watching experienced Wikipedians, empowered by the resources of a university library, develop prominent subjects to higher levels of quality.

Reaching the pinnacle of “Featured Article” demands more than writing and editing abilities. It calls for substantial experience with Wikipedia, and a commitment to a peer review and revision process that can take months.

George Mason is a great example. Visiting Scholar Gary Greenbaum started work on the article in September. Since then, he’s made more than 200 edits, adding more than 80,000 bytes to its length. After a peer review in November, he started the “Featured Article Candidates” process. In December, after more peer review, it was promoted to a Featured article.

George Mason was a Founding Father of the United States. He is especially important to the history of civil rights. He was a delegate to the 1787 Constitutional Convention, but was one of three attendees who didn't sign, arguing that the document did not do enough to protect individual rights.

The Constitution didn’t have a bill of rights in its original form. When James Madison proposed the amendments that would become the United States Bill of Rights, he based it on a document Mason had written years before, the 1776 Virginia Declaration of Rights. The Virginia Declaration was also influential abroad. Thomas Jefferson used it when working with Lafayette on the French Declaration of the Rights of Man and of the Citizen.

This is the kind of meaningful, well-cited information that Visiting Scholars can bring to Wikipedia. To sponsor or become a Visiting Scholar, see our Visiting Scholars page.

Ryan McGrady
Community Engagement Manager

Image: “George Mason portrait” by Albert Rosenthal – http://memory.loc.gov/award/icufaw/apc0009v.jpg. Licensed under Public Domain via Commons.


by Ryan McGrady at January 20, 2016 04:00 PM

Gerard Meijssen

#Wikimedia - What I wish was strategic

The Wikimedia Foundation asks people to consider its future. This is part of a strategic reevaluation of what it does. The aim is to renew the vision so that in the future our readers will be optimally served.

  • Our objective is to "share in the sum of all knowledge". This should stand but practically we should "share in the sum of all available knowledge"
  • We should research how we can improve our reach for all our projects. 
  • Wikisource should get an additional front end that is mostly about presenting books that are finished to the public. This will likely exponentially increase the relevance of Wikisource
  • Wikidata could be central to many processes in other projects. The problem is that development work is done only in Germany. That is why opportunities to improve quality, like replacing wiki links and red links, are not considered.
  • We need to map where our projects are weak. There are many relevant subjects that are underserved. Gender diversity has a momentum of its own nowadays so let us focus more on the subjects where we are really weak.
  • The Wikimedia Foundation is more than Wikipedia. The singular focus it gets diminishes our other projects and as a result we do not realise what we aim to achieve; "share in the sum of all knowledge".

by Gerard Meijssen (noreply@blogger.com) at January 20, 2016 09:14 AM

Joseph Reagle

Repositories for sublimetext and Word

I think Writemonkey is the best prose editor out there: it works well with NaturallySpeaking, and 3.0 looks to be amazing, seemingly taking some cues from sublimetext. Unfortunately, it only runs on Windows, and it breaks virtualbox's and vmware's clipboard synchronization. Hence, I haven't been able to use it for a while.

But once I got used to its repository function (i.e., quickly scooting selected text to a repository file), I couldn't do without it. Hence, I have plugins for the two text editors I tend to use the most, sadly, neither of which is free: sublimetext (for its power and speed) and MS Word (for speech dictation). I just realized I've never really shared these.

from os.path import abspath, basename, dirname, exists, join, splitext
import sublime, sublime_plugin
import time

# https://www.sublimetext.com/docs/3/api_reference.html

class MoveToRepoCommand(sublime_plugin.TextCommand):
    def run(self, edit):
        for region in self.view.sel():  
            if not region.empty():  
                selection = self.view.substr(region)  
                date = time.strftime("%Y-%m-%d %H:%M %Z", time.localtime())
                chunk = '\n<!-- moved from main text: %s -->\n\n%s\n\n' % (
                    date, selection)
                fn = abspath(self.view.file_name())
                path = dirname(fn)
                fn_base, fn_ext = splitext(basename(fn))
                fn_repo = join(path, fn_base + '.repo_md')
                print('fn_repo = %s' % fn_repo)
                if exists(fn_repo):
                    with open(fn_repo, 'r') as repo:
                        repo_content = repo.read()
                else:
                    repo_content = ''
                with open(fn_repo, 'w') as repo: 
                    repo.write(chunk + repo_content)
                self.view.replace(edit, region, '')

Sub repo()
Attribute repo.VB_ProcData.VB_Invoke_Func = "Normal.NewMacros.repo"
' repo Macro

    Dim fn As String
    Dim FSO As Object
    Set FSO = CreateObject("Scripting.FileSystemObject")

    ChangeFileOpenDirectory ActiveDocument.path
    fn = FSO.GetBaseName(ActiveDocument.Name) & ".repo_md"
    Debug.Print fn

    Documents.Open filename:=fn, _
        ConfirmConversions:=False, ReadOnly:=False, AddToRecentFiles:=False, _
        PasswordDocument:="", PasswordTemplate:="", Revert:=False, _
        WritePasswordDocument:="", WritePasswordTemplate:="", Format:= _
        wdOpenFormatAuto, XMLTransform:="", Encoding:=1252
    Selection.TypeText ("<!-- moved from main text: ")
    Selection.InsertDateTime DateTimeFormat:="yyyy-MM-dd HH:mm"
    Selection.TypeText (" -->")
    Selection.PasteAndFormat (wdFormatPlainText)
    Application.Run MacroName:="Normal.NewMacros.FileSave"
End Sub

by Joseph Reagle at January 20, 2016 05:00 AM

January 19, 2016

Greg Sabino Mullane

MediaWiki minor upgrade with patches

One of the more mundane (but important!) tasks for those running MediaWiki is keeping it updated with the latest version of the software. This is usually a fairly easy process. While the official upgrade instructions for MediaWiki are good, they are missing some important items. I will lay out in detail what we do to upgrade MediaWiki installations.

Note that this is for "minor" upgrades to MediaWiki, where minor is defined as not moving more than a couple of actual versions, and not requiring anything other than patching some files. I will cover major upgrades in a future post. For this article, I assume you have full shell access, and not simply FTP, to the server that MediaWiki is running on.

The first step in upgrading is knowing when to upgrade - in other words, making sure you know about new releases. The best way to do this is to subscribe to the low-volume mediawiki-announce mailing list. The MediaWiki maintainers have a wonderful new policy of sending out "pre-announcement" emails stating the exact time that the new version will be released. Once we see that announcement, or when the version is actually released, we open a support ticket, which serves the dual purpose of making sure the upgrade does not get forgotten about, and of keeping an official record of the upgrade.

The official announcement should mention the location of a patch tarball, for example http://releases.wikimedia.org/mediawiki/1.23/mediawiki-1.23.5.patch.gz. If not, you can find the patches in the directory at http://releases.wikimedia.org/mediawiki/: look for your version, and the relevant patch. Download the patch, and grab the signature file as well, which will be the same file with "dot sig" appended to it. In the example above, the sig file would be http://releases.wikimedia.org/mediawiki/1.23/mediawiki-1.23.5.patch.gz.sig.

It is important to know that these patch files *only* cover patching from the previous version. If you are running version 1.23.2, for example, you would need to download and apply the patches for versions 1.23.3 and 1.23.4, before tackling version 1.23.5. You can also create your own patch file by checking out the MediaWiki git repository and using the version tags. In the previous example, you could run "git diff 1.23.2 1.23.5".
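That do-it-yourself route can be sketched as follows. This is a hedged illustration, not the real procedure: a throwaway local repository with two tags stands in for an actual MediaWiki checkout, and the file name and tag names are hypothetical. With the real repository you would run the same "git diff old_tag new_tag" between release tags.

```shell
# Demonstrate "git diff <old-tag> <new-tag>" on a throwaway repository.
# The tags 1.23.4 and 1.23.5 here are stand-ins for MediaWiki release tags.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.org"
git config user.name "demo"
printf 'wgVersion = 1.23.4\n' > DefaultSettings.php
git add DefaultSettings.php
git commit -qm "release 1.23.4"
git tag 1.23.4
printf 'wgVersion = 1.23.5\n' > DefaultSettings.php
git commit -qam "release 1.23.5"
git tag 1.23.5
# One cumulative patch spanning both releases, applyable with "patch -p1"
git diff 1.23.4 1.23.5 > cumulative.patch
```

Because the diff spans both tags in one file, you avoid applying several per-release patches in sequence, at the cost of losing the maintainers' PGP signature.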

Once the patch is downloaded, I like to give it three sanity checks before installing it. First, is the PGP signature valid? Second, does this patch look sane? Third, does the patch match what is in the official git repository for MediaWiki?

To check the PGP signature, you use the sig file, which is a small external signature that one of the MediaWiki maintainers has generated for the patch itself. Since you may not have the public PGP key already, you should both verify the file and ask gpg to download the needed public key in one step. Here's what it looks like when you do:

$ gpg --keyserver pgp.mit.edu --keyserver-options auto-key-retrieve --verify mediawiki-1.23.5.patch.gz.sig 
gpg: Signature made Wed 01 Oct 2014 06:21:47 PM EDT using RSA key ID 5DC00AA7
gpg: requesting key 5DC00AA7 from hkp server pgp.mit.edu
gpg: key 5DC00AA7: public key "Markus Glaser " imported
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:   5  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 5u
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)
gpg: Good signature from "Markus Glaser "
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 280D B784 5A1D CAC9 2BB5  A00A 946B 0256 5DC0 0AA7

The important line here is the one saying "Good signature". The usage of gpg and PGP is beyond the scope of this article, but feel free to ask questions in the comments. Once verified, the next step is to make sure the patch looks sane. In other words, read through it and see exactly what it does! It helps to read the release notes right before you do this. Then:

$ gunzip -c mediawiki-1.23.5.patch.gz | more

While reading through, make note of any files that have been locally patched - you will need to check on them later. If you are not used to reading diff outputs, this may be a little confusing, but give it a shot anyway, so you know what you are patching. Most MediaWiki version upgrades are very small patches, and only alter a few items across a few files. Once that is done, the final sanity check is to make sure this patch matches what is in the canonical MediaWiki git repository.

This is actually a fairly tricky task: it turns out the patch files are generated by a custom script, and are not simply the output of "git diff old_version new_version". Feel free to skip ahead; what follows is one method I found for making sure the patch file and the git diff match up. By "git diff", I mean the output of, for example, "git diff 1.23.4 1.23.5". The biggest problem is that the two list their files in a different order, so even after stripping everything but the actual diff portions, you cannot compare them directly.

In the commands below, "patchfile" is the downloaded and gunzipped patch file (e.g. mediawiki-1.23.5.patch), and "gitfile" is the output of git diff across the two versions (e.g. the output of "git diff 1.23.4 1.23.5"). First, confirm that both cover the same set of files. Then walk through each file in the order given by the patchfile, generate a cross-tag diff for it, and save the results to a file. Finally, compare that file against the original patchfile. The two will not be identical, but the actual diff portions should match up.

## The -f24 may change from version to version
$ diff -s <(grep diff patchfile | cut -d' ' -f24 | cut -d/ -f2- | sort) <(grep diff gitfile | cut -d' ' -f4 | cut -d/ -f2- | sort)
Files /dev/fd/63 and /dev/fd/62 are identical
$ grep diff patchfile | cut -d' ' -f24 | cut -d/ -f2- | grep -v RELEASE | xargs -L1 git diff 1.23.4 1.23.5 > gitfile2
$ diff -b patchfile gitfile2
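Those field numbers are fragile, so it helps to see what they actually grab. Here is a toy illustration (the file name is invented for the example) of how the two cut chains pull the same repo-relative path out of the two header formats:

```shell
# A sample header line as it appears in the downloaded patchfile:
patchline="diff -Nruw -x messages -x '*.png' -x '*.jpg' -x '*.xcf' -x '*.gif' -x '*.svg' -x '*.tiff' -x '*.zip' -x '*.xmp' -x '.git*' mediawiki-1.23.4/includes/OutputPage.php mediawiki-1.23.5/includes/OutputPage.php"

# The corresponding header line from "git diff 1.23.4 1.23.5":
gitline="diff --git a/includes/OutputPage.php b/includes/OutputPage.php"

# Field 24 of the patchfile header is the new-version file; dropping the
# leading "mediawiki-1.23.5/" component leaves the repo-relative path:
echo "$patchline" | cut -d' ' -f24 | cut -d/ -f2-   # includes/OutputPage.php

# Field 4 of the git header is the "b/" file; dropping "b/" gives the same path:
echo "$gitline" | cut -d' ' -f4 | cut -d/ -f2-      # includes/OutputPage.php
```

Once both chains produce the same sorted file list, the per-file comparison in the commands above is meaningful.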

Okay, we have verified that the patch looks sane. The next step is to make sure your MediaWiki has a clean git status. If you don't have your MediaWiki in git, now is the time to do so. It's as simple as:

$ cd /your/wiki/directory
$ echo -ne "images/\ncache/\n" > .gitignore
$ git init
$ git add .
$ git commit -a -q -m "Initial import of our MediaWiki directory"
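With the directory in git, a clean tree can also be checked mechanically rather than by eyeballing. Here is a toy sketch (repo contents invented) using "git status --porcelain", whose empty output means nothing is modified or uncommitted:

```shell
# Build a throwaway repo to demonstrate the check:
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name "Demo User"

echo "hello" > file.txt
git add .
git commit -q -m "Initial import"

# Empty porcelain output means the working tree is clean:
if [ -z "$(git status --porcelain)" ]; then
    echo "clean: safe to apply the patch"
fi
```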

Run "git status" and make sure you don't have any changed but uncommitted files. Once that is done, you are ready to apply the patch. Gunzip the patch file, run the patch command in dry-run mode first, and then do the final patch:

$ gunzip ~/mediawiki-1.23.5.patch.gz
$ patch -p1 --dry-run -i ~/mediawiki-1.23.5.patch
$ patch -p1 -i ~/mediawiki-1.23.5.patch
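If you want to see the dry-run guard work end to end without touching a real wiki, here is a self-contained toy (directory layout, file, and patch all invented) where the real patch only runs if the dry run succeeds:

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Fake "old" and "new" release trees, with one changed file between them:
mkdir -p old/includes new/includes
echo "version = 1.23.4" > old/includes/Version.php
echo "version = 1.23.5" > new/includes/Version.php

# Generate a patch shaped like the release patches ("diff" exits 1 on differences):
diff -Nru old new > demo.patch || true

# A fake wiki directory still at the old version:
mkdir -p wiki/includes
cp old/includes/Version.php wiki/includes/
cd wiki

# -p1 strips the leading "old/"/"new/" component; apply only if the dry run is clean:
patch -p1 --dry-run -i ../demo.patch && patch -p1 -i ../demo.patch

cat includes/Version.php   # version = 1.23.5
```

Because the two commands are joined with &&, a dry run that reports problems stops the real patch from ever running.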

You may not have the "tests" directory installed, in which case it is safe to skip any missing file errors related to that directory. Just answer "Y" when asked if it is okay to skip that file. Here is an example of an actual patch from MediaWiki 1.23.3 to version 1.23.4:

$ patch -p1 -i ~/mediawiki-1.23.4.patch
patching file includes/config/GlobalVarConfig.php
patching file includes/db/DatabaseMysqli.php
patching file includes/DefaultSettings.php
patching file includes/libs/XmlTypeCheck.php
patching file includes/Sanitizer.php
patching file includes/upload/UploadBase.php
patching file RELEASE-NOTES-1.23
can't find file to patch at input line 387
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
|diff -Nruw -x messages -x '*.png' -x '*.jpg' -x '*.xcf' -x '*.gif' -x '*.svg' -x '*.tiff' -x '*.zip' -x '*.xmp' -x '.git*' mediawiki-1.23.3/tests/phpunit/includes/upload/UploadBaseTest.php mediawiki-1.23.4/tests/phpunit/includes/upload/UploadBaseTest.php
|--- mediawiki-1.23.3/tests/phpunit/includes/upload/UploadBaseTest.php 2014-09-24 19:58:10.961599096 +0000
|+++ mediawiki-1.23.4/tests/phpunit/includes/upload/UploadBaseTest.php 2014-09-24 19:55:15.538575503 +0000
File to patch: 
Skip this patch? [y] y
Skipping patch.
2 out of 2 hunks ignored

The jump from 1.23.4 to 1.23.5 was much cleaner:

$ patch -p1 -i ~/mediawiki-1.23.5.patch
patching file includes/DefaultSettings.php
patching file includes/OutputPage.php
patching file RELEASE-NOTES-1.23

Once the patch is applied, immediately check everything into git. This keeps the patch separate from other changes in your git history, and lets you roll back the patch easily if needed. State the version in your commit message:

$ git commit -a -m "Applied mediawiki-1.23.5.patch to move from version 1.23.4 to 1.23.5"
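To see why keeping the patch in its own commit pays off, here is a toy repo (contents invented) where rolling the upgrade back is a single "git revert":

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name "Demo User"

echo "version = 1.23.4" > Version.php
git add .
git commit -q -m "Initial import of our MediaWiki directory"

# The upgrade lands as one isolated commit:
echo "version = 1.23.5" > Version.php
git commit -aq -m "Applied mediawiki-1.23.5.patch to move from version 1.23.4 to 1.23.5"

# If the new version misbehaves, undo exactly that commit:
git revert --no-edit HEAD >/dev/null
cat Version.php   # version = 1.23.4
```

Had the patch been mixed into a commit with unrelated local changes, the revert would drag those changes out with it.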

The next step is to run the update script. This almost always does nothing for minor releases, but it's a good practice to get into. Running it is simple:

$ php maintenance/update.php --quiet --quick

The "quick" option skips the usual five-second warning. The "quiet" option is supposed to turn off all non-error output, but if you are using Semantic MediaWiki, you will still get a screenful of unwanted output. I need to submit a patch to fix that someday. :)

Now that the new version is installed, make sure the wiki is still working! First, visit the Special:Version page and confirm that the new version number appears. Then make sure you can view a random page, that you can edit a page, and that you can upload an image. Finally, load your extension testing page.

You don't have an extension testing page? To make one, create a new page named "Extension_testing". On this page, include as many working examples of your extensions as possible, especially non-standard or heavily-used ones. For each extension, put the name of the extension in a header, describe what the output should be, and then have the extension do something interesting in such a way that a non-working extension will be noticed very quickly when viewing the page!
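As a rough sketch, assuming for the sake of example that the ParserFunctions and SyntaxHighlight extensions are installed (substitute whatever your wiki actually runs), such a page might start like this:

```wikitext
== ParserFunctions ==
The next line should read "even"; anything else means the extension is broken:
{{#ifeq: {{#expr: 10 mod 2}} | 0 | even | odd }}

== SyntaxHighlight ==
The code below should appear with PHP syntax coloring, not as raw tags:
<syntaxhighlight lang="php"><?php echo "Hello, wiki!"; ?></syntaxhighlight>
```

The point is that each section states its expected output, so a glance at the page after an upgrade catches a broken extension immediately.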

If you have any locally patched files (we almost always do, especially UserMailer.php!), now is the time to check that the patch did not clobber your local changes. If it did, make adjustments as needed, then be sure to git commit everything.

At this point, your wiki should be up and running the latest version of MediaWiki. Notify the users of the wiki as needed, then close out the support ticket, noting any problems you encountered. Upgrading via patch is a very straightforward procedure, but major upgrades are not! Watch for a future post on that.

by Greg Sabino Mullane (noreply@blogger.com) at January 19, 2016 03:20 PM

Wiki Education Foundation

The Wikipedia Year of Science is here!

The Year of Science is here!

In 2016, the Wiki Education Foundation will be working to get comprehensive — and comprehensible — science information into the screens and pockets of millions of readers by improving Wikipedia articles in the sciences. Along the way, we’ll challenge university students across the United States and Canada to share their knowledge of science with millions of readers.

The Year of Science is a collective effort. We’re calling on experienced Wikipedia editors, and new ones. We’re calling on experienced science professors, Ph.D. candidates, and undergraduates. We’re calling on librarians. We’re calling on instructors from every university and college in the United States and Canada, in every science-related subject.

Improving Wikipedia as a resource is our top priority. As a globally accessible free knowledge resource, there’s nothing else like it. We’re asking all of you to help us make the Year of Science a powerful opportunity to empower student learning through Wikipedia.

There are rewards beyond the public service of educating millions.

Instructors, across disciplines, will have an innovative and inspiring classroom assignment that challenges students to do their best work, knowing that it will be seen by many more than just their own instructor. We’ve already seen some spectacular results in fields such as chemistry, ecology, animal behavior, environmental history, geotectonics, archaeology, and even the geography of terrestrial planets.

Students develop research, communication, and media literacy skills. They find and evaluate sources, synthesize information, and work to communicate that learning to others beyond the walls of their classroom.

Librarians can tap Wikipedia to teach research skills, or sponsor a Visiting Scholar. Libraries see greater visibility for their public collections, a core part of their educational mission. By sharing these resources with experienced Wikipedians, they’ll improve public scholarship and provide greater access to the knowledge they have on site.

We’ll be helping students write biographical articles about women scientists who inspire them, or soon will. By helping to close Wikipedia’s gender gap, these articles will raise awareness of women’s contributions to science for the next generation of scholars.

We’re starting the year with better tools, better training resources, and a new portal for Wikipedia editors to rally around science events and ideas.

Students will see themselves as contributors to public knowledge of science, and we’ll all see better content across Wikipedia.

We’re grateful to Google and the Simons Foundation for providing major support for the Year of Science 2016.

I’m excited to see what students learn and contribute. Find out how you can join us — contact us through our Year of Science page.

Frank Schulenburg
Executive Director

Photo: “StFX Physical Sciences Lab” by StFX – StFX. Licensed under CC0 via Wikimedia Commons.

by Frank Schulenburg at January 19, 2016 02:00 PM


Wikipedia turns 15

Wikipedia turned 15 last week. Aside from having my friends wish me a happy birthday, I also went to the party in San Francisco. I had a good time meeting up with some Wikimedians that I hadn't seen in a while, and also enjoyed the talks. My favorite was by User:Dreamyshade, who talked about "Stories from the weird old days". I'd recommend watching it if you have 30 minutes of free time :-)

And as always, new laptop sticker ^.^

by legoktm at January 19, 2016 07:30 AM