January 18, 2018

Wikimedia Foundation

The public domain starts growing again next year, and it’s about time

Image by the Electronic Frontier Foundation, CC BY 3.0 US.

Have you ever wondered how it’s possible for two Jungle Book movies to be in development at the same time? Why everything seems to be based on a work by Shakespeare? Or why it always seems like someone is telling a version of The Wizard of Oz? The answer is that these works are in the public domain, meaning that copyright law no longer prevents other artists from adapting them to create new works.

One major rationale for copyright is supposedly that, by giving an exclusive set of rights to artists for their work, we incentivize creativity by making it possible for artists to benefit from releasing works to the public. But copyright protection is supposed to be limited, and once it expires, a work enters the public domain, where anyone can use it.

In the United States, the length of the copyright term has been steadily extended so that published works are effectively copyrighted for 95 years (for corporate works) or until 70 years after an author’s death (for individual works). This has resulted in a public domain that saw fewer and fewer materials added to it, limiting the ability of artists to build on works that came before them. The last time Congress changed the law, in the 1998 Copyright Term Extension Act, the extension was applied retroactively. Effectively, it meant that nothing has entered the public domain in the United States for years. January 1, 2019 will mark the end of this dry spell as works first published in 1923 will finally enter the public domain. That means works like Cecil B. DeMille’s The Ten Commandments and Universal’s silent version of The Hunchback of Notre Dame, two movies released in 1923, will be eligible to join the public domain.

Writers, filmmakers, musicians, and artists wear their influences on their sleeves, and whole branches of critique are devoted to teasing them out. It’s not new: The Aeneid was Virgil playing in the universe of Homer. Recently, and infamously, Fifty Shades of Grey was originally a piece of Twilight fanfiction. The Internet speaks in the language of pop culture: GIFs, mashups, retellings, fan fiction—all find life on the Internet.

It’s not just small artists that rely on the public domain. Disney’s built an empire on making movies based on public domain fairy tales. Just last year, Disney released a live-action version of its animated take on Beauty and the Beast, a story that has been around since the 1700s. But Disney hasn’t been the best in allowing its own works to become part of the public domain. Disney is a huge beneficiary of the extended copyright term, locking down more and more famous works and worlds for its sole use.

While new technology has made it easier to make art and find audiences, the expansion of the copyright term has made it easier for huge companies to devote resources to shutting them down. And even if a new creator is in the right, by relying on such doctrines as fair use for example, they often don’t have the resources to prove it. More works in the public domain mean more works indisputably available for new artists to build on. More public domain works mean more books available for free to read, movies to watch, music to listen to. And even if that does not inspire new works, it allows new generations to rediscover works of old.

Our language is made up of references, and our art should reflect that. Creativity is enriched when the public domain is robust and easily accessed, and we look forward to finally seeing it grow once again in 2019.

Katharine Trendacosta, Policy Analyst
Electronic Frontier Foundation

This blog post has been republished from the Electronic Frontier Foundation’s (EFF) Deeplinks blog, where it is available under a CC BY 3.0 US license. We have modified it only slightly for publication on the Wikimedia Blog. The EFF has been and will be publishing a series of blog posts around Copyright Week, which you can find on Deeplinks.

by Katharine Trendacosta at January 18, 2018 02:04 AM

January 17, 2018

Wiki Education Foundation

Kicking off 2018 by recruiting new instructors

To kick off 2018, we attended two conferences, joining university faculty and inviting them to participate in our programs to improve Wikipedia. At the Linguistic Society of America (LSA) conference, which we attended for the third year in a row, dozens of instructors were excited to sign up and join LSA’s initiative to document and preserve language on Wikipedia. It was great to see how word-of-mouth can spread within a discipline—thanks in part to LSA members like Gretchen McCullough and Lauren Collister. They coordinated a Wikipedia-editing session for attendees, and we found a buzz in the air about the dire need to make Wikipedia as comprehensive and accurate as possible.

One of our instructors, Dr. Margaret Thomas, was slated to speak at the conference with a former student, Jared Collier, who participated in the Classroom Program. After she was delayed by weather, another student from the Spring 2017 course, Sissi Liu, joined Jared to take her place and share the classroom experience. It was great to hear from two former students about how hard they worked to add high quality research to Wikipedia. Now that we’re supporting so many linguistics instructors, courses, and students each term, we’ve created a linguistics-specific guideline for students about how to edit Wikipedia, helping more students achieve this level of work.

Next, we joined astronomers at the American Astronomical Society’s annual meeting. We spoke with astronomers who love Wikipedia, more students who have participated in our Classroom Program, and university instructors who hope to learn how they can join our efforts to make science more accessible to the world. We look forward to the great work to come from the newly inspired linguists, astronomers, and their students.

by Jami Mathewson at January 17, 2018 07:01 PM

Teaching with Wikipedia in an Introductory-Level History Class

Dr. Elizabeth Manley is Associate Professor of History at Xavier University of Louisiana. This last fall, she conducted a Wikipedia assignment with her course, Human Rights and World History. She reflects on the experience here.

Dr. Elizabeth Manley 
Image: File:ProfileESManley.jpg, Profemanley, CC BY-SA 4.0, via Wikimedia Commons.

As a professor of history I have always told my students not to trust Wikipedia as a research source. I have regularly reminded classes that information found on the site cannot necessarily be trusted and should never be used in research projects, and I know that many of my colleagues repeatedly tell students the same. So it created no small bit of confusion when I directed two entire sections of students in my Human Rights and World History course to conceive and complete additions to a Wikipedia entry of their choosing for their semester-long research projects.

For several years now a colleague at my institution, Dr. Megan Osterbur, had been using Wikipedia as a tool for students in a number of her classes in political science and freshman studies. I was both skeptical of its utility as a teaching tool and baffled at how I might institute such a project in my own history courses. After significant prodding, and a week-long pedagogical development seminar run by our center for teaching advancement (CAT+FD), I embarked on a course redevelopment that required students to contribute to (or create new) Wikipedia entries on human-rights related topics.

One of the most compelling components of the Wikipedia endeavor, and what had initially changed my mind, was the way in which it allowed students to become knowledge producers rather than mere consumers. I pushed this from the start of the project, reminding students that expanding on existing articles (or, in a few cases, creating new ones) allowed them to change, if ever so slightly, the existing narrative of the past. Not surprisingly, this became the most rewarding element of the project for the students who really pushed themselves. Several commented in their final presentations that they had shown their contributions with pride to their friends or family; a number of others, while not exactly completing stellar projects, indicated a new awareness of how and why the site works as it does and how they might contribute in the future.

Adding any new element to a course often entails re-adjustment, unforeseen challenges, and frustrations; the addition of a Wikipedia project was no exception, and the hurdles were very new to me and sometimes difficult to solve. Logistically, students struggled to enroll in the course through the Wiki Education portal, remember to post their work in their sandbox, and follow the instructions provided for each new step of the project. A number were highly resistant and lagged far behind the weekly assignments. One student dropped the class, claiming the Wikipedia project was more than she had signed up for, while another plagiarized their entire contribution. Students also had difficulty choosing appropriate topics/entries to work on, finding reputable research materials, and properly inserting new information into existing entries. There were more than a few failures. While I only required that the students contribute ten new points to any existing entry, a number of them added little of value to their chosen topics, including dubious and un-sourced claims, personal opinion, or repeat information. The help of the Wiki Education staff, however, was crucial in attacking some of these problems, as was the colleague who had pushed me toward the medium. They were particularly helpful in evaluation (how to view student contributions alphabetically on the Dashboard and why never to combine two sections!), finding student work that had been deleted (where did it go?), encouraging students (subtle nudges in sandboxes), and helping remediate cases of plagiarism (yes, students might steal straight from Wikipedia for Wikipedia).

However, there were a number of positive outcomes that made the project worthy of continued engagement. Several students really embraced the challenge and chose topics that had minimal or no presence on Wikipedia, including the creation of an entry on African American Women Artists, significant amplification of the Northern California Innocence Project page, and the addition of an African American Beauty page. Others found topics that I believe they will continue to pursue during their college careers. Even in the less successful cases, students expressed their gratitude in coming to better understand how Wikipedia works, demonstrated an appreciation for learning that they could add to a site like Wikipedia, and displayed greater awareness of how certain narratives get told often to the exclusion of others. In sum, while I would encourage faculty members to proceed cautiously in developing a Wikipedia assignment for an introductory general education course, I believe the overall benefits outweigh the challenges. I imagine that as I engage with similar assignments more regularly the scale will continue to tip further toward the creation of knowledge producers!

To learn more about teaching with Wikipedia, visit teach.wikiedu.org or reach out to contact@wikiedu.org.

Image: File:Xavier University’s campus.jpg, User: (WT-shared) Sapphire at wts wikivoyage, CC BY-SA 3.0, via Wikimedia Commons.

by Guest Contributor at January 17, 2018 05:25 PM

Wikimedia Foundation

How to add your photos to Wikimedia Commons and add to the sum of all knowledge

Photo by Dustin Kelling/US Navy, public domain.

Most people in the world have never heard of Wikimedia Commons. They have no idea what it is, who is behind it, and why it exists.

And that’s surprising, as it is the photo and media site that helps power Wikipedia.

Most of the pictures you see on the world’s largest encyclopedia come from Commons, which at 43 million files is one of the world’s largest freely licensed media repositories for “educational media content,” as defined in its scope. (This means that most media files you’d like to upload are acceptable—but not all.) Commons has its own distinct community of volunteer editors who take and upload photographs, enforce that scope, search for copyright violations, and more.

Today, we’re going to run through a little text tutorial about how you (yes, you!) can upload your own content to Commons, donating it to the sum of all knowledge.

Note: This tutorial will focus on uploading photos that you’ve taken, but you can also upload images that are demonstrably freely licensed, as in copyright, or in the public domain. It’s easiest to do this from a computer—in fact, I wouldn’t recommend trying it through a phone unless you’re using the Commons Android app.


1: Your first step should be to head over to Commons’ upload wizard. It’ll ask you to log in; please create an account.[1]


2: Read the poster that appears. Most photos that you’ve taken are going to be okay, but there are quirks in national laws—like freedom of panorama. (Here’s one example.)


3: Move on to the “upload” step. Here, you’ll select the files that you’d like to donate. Again, please make sure that they are only photos that you’ve taken and own the copyright to.


4: We’re up to “release rights.” Select “This file is my own work”/“These files are my own work.” The wizard will default to the CC BY-SA 4.0 license, drawn up by our friends over at Creative Commons.

CC BY-SA 4.0 essentially means that anyone can use your photo for any reason as long as they attribute the photographer (you) and share it under the same license. All of the text on Wikipedia is available under a similar license.

While CC BY-SA is a pretty standard release for Commons photographers, you’re also free to make the license less restrictive via clicking “Use a different license.” You can just require attribution to you (“Attribution,” also known as “CC BY”), disclaim all rights entirely (“CC0”), or use a different license, which I won’t cover here for simplicity.


5: We’ve hit the “describe” phase! Here, you’ll provide a file name for what you’ve uploaded (please be succinct yet thorough!), describe what is happening in the photograph, and add categories.

Categories are incredibly important. Until structured data is finished, they’re the primary way of discovering imagery on Commons.

Generally, you’ll be able to guess what categories you’ll need. For example, a photo of the Mackinac Bridge will go in the “Mackinac Bridge” category.

A big exception is for plants and animals, as they can be listed under their Latin names. Open a new tab, if needed, to find the right ones. (Search Commons with “Category:” in front.)


6: Once all of the fields are filled in, hit “next”—and you’re done! You’ve helped expand the sum of the world’s knowledge. Thank you!


Now, here’s a bonus round. Do you think that photo you just uploaded is good enough to be added into a Wikipedia article? If so, go to your image’s new page on Commons on a computer. See that “Use this file” with the little Wikipedia icon at the top? Click there, and copy the thumbnail text.

Next, go to a relevant Wikipedia article. Click “edit” at the top. Paste that text into the article in a useful location—it should look a little like the image above. That bit at the end is a caption; you should try to add something relevant.

Once that’s done, click save, and admire your work!

Ed Erhart, Senior Editorial Associate, Communications
Wikimedia Foundation

This blog post was originally created as one of several “tweetstorms” put out by the @Wikipedia Twitter account (1, 2, 3). The image being “uploaded” is an old public domain photograph from the Library of Congress (restored by a volunteer editor); it is used here only as an example, and is most definitely not my own work.


  1. Here’s our privacy policy.

by Ed Erhart at January 17, 2018 04:16 PM

January 16, 2018

Wikimedia Foundation

Celebrate Wikipedia’s birthday by joining your local library in the #1Lib1Ref campaign

Photo by Diego Delso, CC BY-SA 4.0.

Yesterday marked the launch of the third year of the #1Lib1Ref campaign!* The campaign is simple: we invite librarians to give Wikipedia a birthday gift of a citation, helping make sure that information on the encyclopedia is verifiable and grounded in reliable sources.

You can participate in the campaign with five easy steps:

  1. Find an article that needs a citation, using Citation Hunt
  2. Find a reliable source that can support that article
  3. Add a citation using referencing tools
  4. Add the project hashtag #1Lib1Ref in the Wikipedia edit summary
  5. Share your edit on social media and invite other librarians to participate!

Video by Felix Nartey/Jessamyn West/Wikimedia Foundation, CC BY-SA 4.0.

The campaign is an opportunity to talk about why libraries and Wikimedians are allies in their mission to share knowledge.

Why Wikipedia and libraries?

Wikipedia is founded on the idea that everyone should have access to the sum of all human knowledge in their own language. To make this possible, Wikipedians have adopted approaches long used by academic and other fields: the citation, requiring facts and information to come from editorially controlled sources, highlighting the disagreement amongst different researchers, and striving for an inclusive representation of the world. These practices overlap very closely with the work that librarians do every day.

Librarians enter librarianship with a group of values: a desire to provide knowledge and access to that knowledge, alongside a desire to ensure that a diversity of kinds of knowledge can be accessed and heard. In 2017, the Wikimedia community went through a movement strategy process and described its direction in much those same terms: Knowledge Service and Knowledge Equity.

Especially in a digital environment that spreads #fakenews, and allows major languages and cultural voices to dominate, building a more diverse, more dynamic knowledge commons for the future requires the collaboration of librarians, Wikimedians and knowledge seekers around the world.

A strong 2017 for libraries and Wikipedia

2017 was an exciting year for Wikipedia collaborating with libraries. Alongside dozens of local initiatives, outreach activities and programs, a number of movement-wide opportunities helped grow the conversation between Wikipedia and libraries:

  • In January 2017, we ran the second #1lib1ref campaign: following on the year before, it doubled in both size and language reach, engaging hundreds of librarians in nearly thousands of citations added! Read the report here.
  • As part of the #1lib1ref campaign, the International Federation of Library Associations and Institutions and the Wikimedia Foundation published opportunity papers highlighting the potential opportunities for collaboration with the Wikimedia community: see the papers here.
  • The WikiCite Initiative met in May 2017 to discuss the use of Wikidata for structured bibliographic information. A significant portion of the attendees represented several dozen libraries. Read the report here.
  • In May 2017, OCLC and the Wikimedia Foundation announced that the citation generation engine on Wikipedia supports using both WorldCat IDs and ISBNs. Now Wikimedia projects can use the cataloguing records from librarians around the world, to create citations that lead back to libraries.
  • In August, Wikimania 2017, the annual international gathering of Wikimedians, included a number of librarians from throughout the world, a large libraries meetup, and preconference activities at the National Library of Quebec. Questions of how to expand and communicate the importance of Wikipedia for libraries were at the center of many conference sessions.
  • In August, the National Library of Wales created a “National Wikimedian” role for long-time Wikipedian in Residence Jason Evans. Read an interview about the change.
  • In October, the Association of Research Libraries in the United States announced an initiative to use Wikidata to support enrichment of indigenous knowledge in collections.
  • Following a tradition of teaching Wikipedia workshops to librarians around the world (for example, in Catalonia, Spain, Serbia, Argentina, Italy, and many other countries), OCLC’s WebJunction created a MOOC full of educational content focused on public librarians. The project, which included nearly 300 librarians in a 9-week course, is the biggest librarian training to date in the movement. Read an interview about the training.

These activities only highlight the largest scale collaborations between the library community and Wikipedia through the work of local Wikimedia organizations: from Spain to India, Côte d’Ivoire to Canada and Hungary.

Photo by Stanford White via the Detroit Publishing Company/Library of Congress, restored by Durova, public domain.

Where will Wikipedia and libraries collaborate in 2018?

There are some clear trends in the Wikimedia community towards supporting a greater variety of education and outreach opportunities, increased use of Wikidata to enrich collections and catalogues, and towards building a more diverse and representative collection of knowledge on Wikimedia projects.  The future of Wikipedia and Libraries starts with #1Lib1Ref citations, but ends with you: go forth and share your vision of Wikipedia and Libraries through #1lib1ref!

To learn more about the #1lib1ref campaign, check out 1lib1ref.org

Alex Stinson, Strategist, Community Programs
Wikimedia Foundation

*Editor’s note: The #1lib1ref campaign started yesterday, which was a holiday (Martin Luther King Jr. Day) in the United States. We apologize for the delay in posting.

by Alex Stinson at January 16, 2018 07:00 PM

Wiki Education Foundation

Roundup: Black Lives and Deaths

In light of Dr. Martin Luther King, Jr.’s birthday yesterday, we’re looking at notable contributions that students have made to Wikipedia that shed light on systemic issues that African-American communities have faced, and continue to face, in this country.

Ask any educator and they’ll be sure to tell you that history – not to mention present day – is full of African-Americans who have contributed to the collective culture and history of the United States of America and the world at large. Last winter students in University of Michigan instructor Fabian Neuner’s class wrote and edited Wikipedia articles on Black Lives and Deaths. Their work was thought provoking and intriguing, especially as it helps showcase people and issues that the general public may know little to nothing about.

Students in Neuner’s class looked at topics that included police related matters, eugenics, racism and prejudice, and incarceration. The criminal stereotype of African Americans is one such topic and students greatly expanded the article to include information on self-reporting statistics and more information on the stereotype’s history. According to some historians cited in the article, the idea of a black person as a dangerous criminal was heavily perpetuated during slavery times as a tool to suppress rebellions. This stereotype persisted throughout the years and, according to the executive director of the Sentencing Project Marc Mauer, it became more threatening during the 1970s and early 1980s, leading Melissa Hickman Barlow to remark that “talking about crime is talking about race”. This has led to some groups isolating themselves from their local police force.

There have been attempts to bridge this gap, as in the case of the Chicago Alternative Policing Strategy (CAPS). CAPS used five tools—problem-solving, turf orientation, community involvement, linkage to city services, and new tools for police—that were intended to help lower crime and bring unity between the police and community. Results from the strategy showed that while overall crime did decrease, there was no proof that the lowered crime rate was a result of CAPS. Some areas actually experienced more crime. A study also found that the strategy worked best in areas where the citizens were more financially and socially secure and had more in common with the police. The study also pointed out that while CAPS was more likely to succeed in these communities, these areas also had a racial divide that led to a lack of coordination.

Part of what makes student work invaluable is that it helps expand knowledge on underrepresented topics and people like Kalief Browder. This article was fairly short when the students found it, but through their work they expanded it to about three times its length. Kalief’s tragic story began in the Bronx in 2010, when he was sixteen. He and a friend were stopped by the police, who suspected them of stealing a book bag. Despite not finding the bag and the accuser changing their story multiple times during police questioning, Kalief and his friend were arrested. While his friend was allowed to return home while awaiting the trial, Kalief was held in jail due to his past criminal record and was taken to Rikers Island when his family was unable to immediately afford his bail. Kalief’s stay in jail was met with beatings, abuse, and degradation from both staff and inmates. To make matters worse, the Bronx District Attorney’s office had a backlog of cases and Kalief was forced to remain in the jail for three years. During this time he was offered several plea bargains, which Kalief refused, citing his innocence. He spent 800 of his 1,000 days in prison in solitary confinement, due to his involvement in fights that were often provoked by others – especially prison staff. Kalief rapidly lost weight while in prison due to meal portions not being big enough for a growing teenager and also due to reports of his prison guards deliberately starving him. He also experienced lapses in his mental health and tried to commit suicide on several occasions. When his case finally came to trial in 2013, it was ultimately dismissed due to his accuser having left the country, leaving the courts without any testimony. Kalief was finally free at 20 years of age; however, the inhumane treatment in prison and his unjust sentence left him with indelible scars, causing him to become withdrawn, depressed, and suicidal. He and his family attempted to file a lawsuit against the New York Police Department, the Bronx District Attorney, and the Department of Corrections, but met with no success during Kalief’s life. Kalief Browder committed suicide on June 6, 2015. His death sent shock waves throughout the community.

As stated above, the work students perform on Wikipedia is incredibly important. Sometimes it’s a major expansion like the one made to the Kalief Browder article, other times it’s copy-editing and small additions. No matter how large or small the contributions are, they still help expand the largest general reference work on the Internet – Wikipedia. Their contributions help ensure that this information is accessible to billions of people worldwide.

If you are interested in using Wikipedia with your next class, visit teach.wikiedu.org. Or reach out to contact@wikiedu.org to find out how you can gain access to tools, online trainings, and printed materials.

Image: File:Martin Luther King Jr St Paul Campus U MN.jpg, Minnesota Historical Society, CC BY-SA 2.0, via Wikimedia Commons.

by Shalor Toncray at January 16, 2018 05:23 PM

Wikimedia Tech Blog

New monthly dataset shows where people fall into Wikipedia rabbit holes

Photo by Taxiarchos228, Free Art License 1.3.

Have you ever looked up a Wikipedia article about your favorite TV show just to end up hours later reading on some obscure episode in medieval history? First, know that you’re not the only person who’s done this. Roughly one out of three Wikipedia readers look up a topic because of a mention in the media, and often get lost following whatever link their curiosity takes them to.

Aggregate data on how readers browse Wikipedia contents can provide priceless insights into the structure of free knowledge and how different topics relate to each other. It can help identify gaps in content coverage (do readers stop browsing when they can’t find what they are looking for?) and help determine if the link structure of the largest online encyclopedia is optimally designed to support a learner’s needs.

Perhaps the most obvious usage of this data is to find where Wikipedia gets its traffic from. Not only can clickstream data be used to confirm that most traffic to Wikipedia comes via search engines, it can also be analyzed to find out—at any given time—which topics were popular on social media that resulted in a large number of clicks to Wikipedia articles.

In 2015, we released a first snapshot of this data, aggregated from nearly 7 million page requests. A step-by-step introduction to this dataset, with several examples of analysis it can be used for, is in a blog post by Ellery Wulczyn, one of the authors of the original dataset.

Since this data was first made available, it has been reused in a growing body of scholarly research. Researchers have studied how Wikipedia content policies affect and bias reader navigation patterns (Lamprecht et al, 2015); how clickstream data can shed light on the topical distribution of a reading session (Rodi et al, 2017); how the links readers follow are shaped by article structure and link position (Dimitrov et al, 2016; Lamprecht et al, 2017); how to leverage this data to generate related article recommendations (Schwarzer et al, 2016); and how the overall link structure can be improved to better serve readers’ needs (Paranjape et al, 2016).

Due to growing interest in this data, the Wikimedia Analytics team has worked towards the release of a regular series of clickstream data dumps, produced at monthly intervals, for 5 of the largest Wikipedia language editions (English, Russian, German, Spanish, and Japanese). This data is available monthly, starting from November 2017.

A quick look into the November 2017 data for English Wikipedia tells us it contains nearly 26 million distinct links, between over 4.4 million nodes (articles), for a total of more than 6.7 billion clicks. The distribution of distinct links by type (see Ellery’s blog post for more details) is as follows:

    • 60% of links (15.6M) are internal and account for 1.2 billion clicks (18%).
    • 37% of links (9.6M) are from external entry-points (like a Google search results page) to an article and count for 5.5 billion clicks.
    • 3% of links (773k) have type “other”, meaning they reference internal articles but the link to the destination page was not present in the source article at the time of computation. They account for 46 million clicks.
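
A breakdown like the one above can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes each row of a clickstream file has four tab-separated fields (source, destination, link type, click count), which this post does not spell out, so check the dataset documentation before relying on it. The sample rows and the function names summarize_by_type and shares are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in the ASSUMED layout: source, destination, type, clicks.
SAMPLE_TSV = (
    "other-search\tNet_neutrality\texternal\t900\n"
    "Net_neutrality\tInternet_service_provider\tlink\t60\n"
    "Net_neutrality\tTim_Wu\tlink\t30\n"
    "Main_Page\tNet_neutrality\tother\t10\n"
)


def summarize_by_type(tsv_text):
    """Return {link_type: (distinct_links, total_clicks)}."""
    links = defaultdict(int)
    clicks = defaultdict(int)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # skip blank or malformed rows
        _source, _destination, link_type, count = row
        links[link_type] += 1          # each row is one distinct link
        clicks[link_type] += int(count)
    return {t: (links[t], clicks[t]) for t in links}


def shares(summary):
    """Turn counts into (share_of_links, share_of_clicks) per type."""
    total_links = sum(n_links for n_links, _ in summary.values())
    total_clicks = sum(n_clicks for _, n_clicks in summary.values())
    return {
        t: (n_links / total_links, n_clicks / total_clicks)
        for t, (n_links, n_clicks) in summary.items()
    }


if __name__ == "__main__":
    for link_type, (link_share, click_share) in shares(
            summarize_by_type(SAMPLE_TSV)).items():
        print(f"{link_type}: {link_share:.0%} of links, "
              f"{click_share:.0%} of clicks")
```

On a real monthly file you would read from the downloaded file instead of the sample string; the counting logic stays the same.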

If we build a graph where nodes are articles and edges are clicks between articles, it is interesting to observe that the global graph is almost fully connected (only 157 nodes are not connected to the main cluster). This means that a path exists between almost any two nodes on the graph (article or external entry point). When looking at the subgraph of internal links, the number of disconnected components grows dramatically to almost 1.9 million forests, with a main cluster of 2.5M nodes. This difference is due to external links having very few source nodes connected to many article nodes. Removing external links allows us to focus on navigation within articles.

In this context, a large number of disconnected forests lends itself to many interpretations. If we assume that Wikipedia readers come to the site to read articles about either sports or politics, and that neither group is interested in the other category, we would expect two “forests”, with few edges crossing from the “politics” forest to the “sports” one. The existence of 1.9 million forests could shed light on related areas of interest among readers – as well as articles that have lower link density – and topics that have a relatively small volume of traffic, making them appear as isolated nodes.
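
To make the effect of removing external links concrete, here is a small sketch in Python that counts connected components with a union-find structure. It treats clicks as undirected edges, which is an assumption on our part about what "connected" means here, and the rows, the "other-search" entry point, and the helper names are all invented for illustration. It is not the code used to produce the figures above.

```python
from collections import Counter

# Invented (source, destination, type) rows; "other-search" stands in
# for an external entry point such as a search results page.
ROWS = [
    ("other-search", "Football", "external"),
    ("other-search", "Senate", "external"),
    ("other-search", "Obscure_stub", "external"),
    ("Football", "Goalkeeper", "link"),
    ("Senate", "Parliament", "link"),
]


def find(parent, node):
    """Return the representative of node's component (with path halving)."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def component_sizes(nodes, edges):
    """Sizes of the connected components, largest first (undirected)."""
    parent = {node: node for node in nodes}
    for a, b in edges:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b
    sizes = Counter(find(parent, node) for node in list(parent))
    return sorted(sizes.values(), reverse=True)


def split_graph(rows, keep_external):
    """Build (nodes, edges), optionally dropping external entry points."""
    edges = [(s, d) for s, d, kind in rows
             if keep_external or kind == "link"]
    nodes = {d for _, d, _ in rows}
    nodes |= {s for s, _, kind in rows
              if keep_external or kind != "external"}
    return nodes, edges


if __name__ == "__main__":
    print(component_sizes(*split_graph(ROWS, keep_external=True)))
    print(component_sizes(*split_graph(ROWS, keep_external=False)))
```

With these toy rows the full graph is a single component of six nodes, because the one entry point touches everything. Once it is removed, the articles fall apart into two pairs and one isolated node, which mirrors on a tiny scale how a handful of external sources hold the global graph together.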

Using the igraph library together with ggraph, we can obtain a list of articles linked from net neutrality, treat that neighborhood of articles as a network, and then visualize how those are connected by the number of clicks and neighbors. Diagram by Mikhail Popov/Wikimedia Foundation, CC BY-SA 4.0.

If you’re interested in studying Wikipedia reader behavior and in using this dataset in your research, we encourage you to cite it via its DOI (doi.org/10.6084/m9.figshare.1305770) and to peruse its documentation. You may also be interested in additional datasets that Wikimedia Analytics publishes (such as article pageview data) or in navigation vectors learned from a corpus of Wikipedia readers’ browsing sessions.

Joseph Allemandou, Senior Software Engineer, Analytics
Mikhail Popov, Data Analyst, Reading Product
Dario Taraborelli, Director, Head of Research

by Joseph Allemandou, Mikhail Popov and Dario Taraborelli at January 16, 2018 04:11 PM

Benjamin Mako Hill

OpenSym 2017 Program Postmortem

The International Symposium on Open Collaboration (OpenSym, formerly WikiSym) is the premier academic venue exclusively focused on scholarly research into open collaboration. OpenSym is an ACM conference, which means that, like conferences in computer science, it’s really more like a journal that gets published once a year than it is like most social science conferences. The “journal”, in this case, is called the Proceedings of the International Symposium on Open Collaboration and it consists of final copies of papers which are typically also presented at the conference. Like journal articles, papers that are published in the proceedings are not typically published elsewhere.

Along with Claudia Müller-Birn from the Freie Universität Berlin, I served as the Program Chair for OpenSym 2017. For the social scientists reading this, the role of program chair is similar to being an editor for a journal. My job was not to organize keynotes or logistics at the conference; that is the job of the General Chair. Indeed, in the end I didn’t even attend the conference! Along with Claudia, my role as Program Chair was to recruit submissions, recruit reviewers, coordinate and manage the review process, make final decisions on papers, and ensure that everything makes it into the published proceedings in good shape.

In OpenSym 2017, we made several changes to the way the conference has been run:

  • In previous years, OpenSym had tracks on topics like free/open source software, wikis, open innovation, open education, and so on. In 2017, we used a single track model.
  • Because we eliminated tracks, we also eliminated track-level chairs. Instead, we appointed Associate Chairs or ACs.
  • We eliminated page limits and the distinction between full papers and notes.
  • We allowed authors to write rebuttals before reviews were finalized. Reviewers and ACs were allowed to modify their reviews and decisions based on rebuttals.
  • To assist in assigning papers to ACs and reviewers, we made extensive use of bidding. This means we had to recruit the pool of reviewers before papers were submitted.

Although each of these things has been tried in other conferences, or even piloted within individual tracks in OpenSym, all were new to OpenSym in general.


Papers submitted: 44
Papers accepted: 20
Acceptance rate: 45%
Posters submitted: 2
Posters presented: 9
Associate Chairs: 8
PC Members: 59
Authors: 108
Author countries: 20

The program was similar in size to the ones in the last 2-3 years in terms of the number of submissions. OpenSym is a small but mature and stable venue for research on open collaboration. This year was also similar, although slightly more competitive, in terms of the conference acceptance rate (45%—it had been slightly above 50% in previous years).

As in recent years, there were more posters presented than submitted because the PC found that some rejected work, although not ready to be published in the proceedings, was promising and advanced enough to be presented as a poster at the conference. Authors of posters submitted 4-page extended abstracts for their projects which were published in a “Companion to the Proceedings.”


Over the years, OpenSym has established a clear set of niches. Although we eliminated tracks, we asked authors to choose from a set of categories when submitting their work. These categories are similar to the tracks at OpenSym 2016. Interestingly, a number of authors selected more than one category. This would have led to difficult decisions in the old track-based system.

distribution of papers across topics with breakdown by accept/poster/reject

The figure above shows a breakdown of papers in terms of these categories as well as indicators of how many papers in each group were accepted. Papers in multiple categories are counted multiple times. Research on FLOSS and Wikimedia/Wikipedia continues to make up a sizable chunk of OpenSym’s submissions and publications. That said, these now make up a minority of total submissions. Although Wikipedia and Wikimedia research made up a smaller proportion of the submission pool, it was accepted at a higher rate. Also notable is the fact that 2017 saw an uptick in the number of papers on open innovation. I suspect this was due, at least in part, to the involvement of the General Chair, Lorraine Morgan (she specializes in that area). Somewhat surprisingly to me, we had a number of submissions about Bitcoin and blockchains. These are natural areas of growth for OpenSym but have never been a big part of work in our community in the past.

Scores and Reviews

As in previous years, review was single blind in that reviewers’ identities are hidden but authors’ identities are not. Each paper received between 3 and 4 reviews plus a metareview by the Associate Chair assigned to the paper. All papers received 3 reviews but ACs were encouraged to call in a 4th reviewer at any point in the process. In addition to the text of the reviews, we used a -3 to +3 scoring system where papers seen as borderline are scored as 0. Reviewers scored papers using full-point increments.

scores for each paper submitted to opensym 2017: average, distribution, etc

The figure above shows scores for each paper submitted. The vertical grey lines reflect the distribution of scores where the minimum and maximum scores for each paper are the ends of the lines. The colored dots show the arithmetic mean for each score (unweighted by reviewer confidence). Colors show whether the papers were accepted, rejected, or presented as a poster. It’s important to keep in mind that two papers were submitted as posters.

Although Associate Chairs made the final decisions on a case-by-case basis, every paper that had an average score of less than 0 (the horizontal orange line) was rejected or presented as a poster and most (but not all) papers with positive average scores were accepted. Although a positive average score seemed to be a requirement for publication, negative individual scores weren’t necessarily showstoppers. We accepted 6 papers with at least one negative score. We ultimately accepted 20 papers, or 45% of those submitted.


This was the first time that OpenSym used a rebuttal or author response and we are thrilled with how it went. Although they were entirely optional, almost every team of authors used it! Authors of 40 of our 46 submissions (87%!) submitted rebuttals.

Lower Unchanged Higher
6 24 10

The table above shows how average scores changed after authors submitted rebuttals. The table shows that rebuttals’ effect was typically neutral or positive. Most average scores stayed the same but nearly two times as many average scores increased as decreased in the post-rebuttal period. We hope that this made the process feel more fair for authors and I feel, having read them all, that it led to improvements in the quality of final papers.

Page Lengths

In previous years, OpenSym followed most other venues in computer science by allowing submission of two kinds of papers: full papers which could be up to 10 pages long and short papers which could be up to 4. Following some other conferences, we eliminated page limits altogether. This is the text we used in the OpenSym 2017 CFP:

There is no minimum or maximum length for submitted papers. Rather, reviewers will be instructed to weigh the contribution of a paper relative to its length. Papers should report research thoroughly but succinctly: brevity is a virtue. A typical length of a “long research paper” is 10 pages (formerly the maximum length limit and the limit on OpenSym tracks), but may be shorter if the contribution can be described and supported in fewer pages— shorter, more focused papers (called “short research papers” previously) are encouraged and will be reviewed like any other paper. While we will review papers longer than 10 pages, the contribution must warrant the extra length. Reviewers will be instructed to reject papers whose length is incommensurate with the size of their contribution.

The following graph shows the distribution of page lengths across papers in our final program.

histogram of paper lengths for final accepted papers

In the end 3 of 20 published papers (15%) were over 10 pages. More surprisingly, 11 of the accepted papers (55%) were below the old 10-page limit. Fears that some have expressed that page limits are the only thing keeping OpenSym from publishing enormous rambling manuscripts seem to be unwarranted, at least so far.


Although I won’t post any analysis or graphs, bidding worked well. With only two exceptions, every single assigned review was to someone who had bid “yes” or “maybe” for the paper in question and the vast majority went to people that had bid “yes.” However, this comes with one major proviso: people that did not bid at all were marked as “maybe” for every single paper.

Given a reviewer pool whose diversity of expertise matches that in your pool of authors, bidding works fantastically. But everybody needs to bid. The only problems with reviewers we had were with people that had failed to bid. It might be that reviewers who don’t bid are less committed to the conference, more overextended, more likely to drop things in general, etc. It might also be that reviewers who fail to bid get poor matches which cause them to become less interested, willing, or able to do their reviews well and on time.

Having used bidding twice as chair or track-chair, my sense is that bidding is a fantastic thing to incorporate into any conference review process. The major limitations are that you need to build a program committee (PC) before the conference (rather than finding the perfect reviewers for specific papers) and you have to find ways to incentivize or communicate the importance of getting your PC members to bid.


The final results were a fantastic collection of published papers. Of course, it couldn’t have been possible without the huge collection of conference chairs, associate chairs, program committee members, external reviewers, and staff supporters.

Although we tried quite a lot of new things, my sense is that nothing we changed made things worse and many changes made things smoother or better. Although I’m not directly involved in organizing OpenSym 2018, I am on the OpenSym steering committee. My sense is that most of the changes we made are going to be carried over this year.

Finally, it’s also been announced that OpenSym 2018 will be in Paris on August 22-24. The call for papers should be out soon and the OpenSym 2018 paper deadline has already been announced as March 15, 2018. You should consider submitting! I hope to see you in Paris!

This Analysis

OpenSym used the gratis version of EasyChair to manage the conference, which doesn’t allow chairs to export data. As a result, data used in this postmortem was scraped from EasyChair using two Python scripts. Numbers and graphs were created using a knitr file that combines R visualization and analysis code with markdown to create the HTML directly from the datasets. I’ve made all the code I used to produce this analysis available in this git repository. I hope someone else finds it useful. Because the data contains sensitive information on the review process, I’m not publishing the data.

This blog post was originally posted on the Community Data Science Collective blog.

by Benjamin Mako Hill at January 16, 2018 04:01 AM

January 15, 2018

Brion Vibber

Daydream View phone VR headset

I somehow ended up with a $100 credit at the Google Store, and decided to splurge on something I didn’t need but wanted — the Daydream View phone VR headset.

This is basically one step up from the old Google Cardboard viewer where you put your phone in a lens/goggles getup, but it actually straps to your head so you don’t have to hold it to your face manually. It’s also got a wireless controller instead of just a single clicky button, so you get a basic pointer within the VR world with selection & home buttons and even a little touchpad, allowing a fair range of controls within VR apps.

Using the controller, the Daydream launcher interface also provides an in-VR UI for browsing Google Play Store and purchasing new games/apps. You can even broadcast a 2d view to a ChromeCast device to share your fun with the rest of the room, though there’s no apparent way to save a recording.

The hardware is still pretty limiting compared to what the PC VR headsets like the Rift and Vive can do — there’s no positional tracking for either head tracking or the controller, so you’re limited to rotation. This means things in the VR world don’t move properly as you move your head around, and you can only use the controller to point, not to pick up VR objects in 3d space. And it’s still running on a phone (Pixel 2 in my case) so you’re not going to get the richest graphics — most apps I’ve tried so far have limited, cartoony graphics.

Still it’s pretty fun to play with; I’ll probably play through a few of these games and then put it away mostly forever, unless I figure out some kind of Wikipedia-related VR project to work on on the side. 🙂

by brion at January 15, 2018 08:34 PM

January 13, 2018

Weekly OSM

weeklyOSM 390



Heatmap of bike parking density in Nantes, France | © OpenStreetMap Contributors CC-BY-SA 2.0


  • Fernando Trebien writes to the tagging mailing list seeking information about adding members in all types of routes.
  • Mateusz Konieczny wonders about the unit of seamark:light:range for light towers and beacons. The values were always entered in nautical miles. This was not yet among the default units though. The wiki was adapted in the course of the mailing list discussion.
  • User Spanholz published short videos explaining the iD editor on Reddit. The videos are under CC0.


  • Engineers at Azavea used GeoPySpark to build a multi-centre isochrone map of the Isle of Man for walkers based on OpenStreetMap data.
  • Dzertanoj wonders whether the fragmentation of communications channels both within and between OSM communities makes it more, rather than less, difficult for people to interact with each other.
  • Antoine Riche wrote a detailed tutorial on Carto’Cité’s wiki about using Overpass queries with uMap. The use-case he picked is the rendering of a heatmap of bike parking density in Nantes, France.


  • Two mappers say that the Facebook “AI” import in Thailand has made OSM unusable in some rural areas and wonder if it should be reverted.

OpenStreetMap Foundation

  • The German-speaking association FOSSGIS e.V. is now an official OSM local chapter! FOSSGIS e.V. was founded in 2001 and was initially focused on free and open source mapping software. They have provided support to the German OpenStreetMap community since 2008.
  • OSM US have commenced the search for a full-time Executive Director; Michal Migurski has already announced his interest in joining the hiring committee.


  • Frederik Ramm invites the OSM community to a hack weekend in Karlsruhe, Germany.
  • The call for proposals for the State of the Map 2018 in Milan is open! The deadline to submit the session proposals is Sunday, 18th February 2018.


  • Hans Hack has published a map that lets you discover the points in Germany that are furthest away from a road.
  • Over the last weeks the recently introduced Climate Protection Map, which is based on the user-contributed data from OpenStreetMap, has been extended to cover the whole world. It now also features some additional layers.


  • Journocode, a German site dedicated to ‘closing the gap between journalism and data science’, publishes a guide to extracting data from OSM using osmfilter (and osmconvert).


  • The NZZ newspaper has released a demo version of its web-based storytelling tool Q, which can also be used to create maps based on Mapbox.


  • With not-for-profit software, iOS and MacOS have lagged behind Android because the higher costs associated with Apple deterred developers. Now, however, there is the possibility for non-commercial organizations (including educational institutions and U.S. government agencies) to ask not to pay Apple Developer Program fees. The waiver only applies for developers of suites of free apps. It will resolve part of the problem (though not all of it; you still have to be able to afford a Mac and/or iPhone).
  • A number of OSM Foundation services are being migrated to HTTPS and therefore will not be reachable via HTTP. Software that cannot cope with this (for example, by following redirection requests from http to https) will stop working when this happens. Work is underway to minimise such disruption.


  • The new stable release of JOSM is here, find out what’s new!

Did you know …

  • … MAPCAT.com? It renders vector tiles, so you can choose in which languages to show the labels. It’s based on OpenStreetMap data.
  • … the Android app BumpRecorder? The app recognizes potholes and can show them on an OSM map. The data should help the road construction authority to recognize potholes before they get bigger.

Other “geo” things

  • There is a free source from which historical aerial photos, thematic maps and expedition results from the Antarctic can be downloaded.
  • German magazine Spiegel Online reports (automatic translation) that fake geospatial information has always been common, and not just for military reasons.
  • A post on the “MapPorn” subreddit features a fascinating map that shows present-day country borders and their date of origin. The discussion starts with details of the complex research behind this map and continues with more, often surprising, border facts.
  • According to golem.de (automatic translation), the map data provider HERE is coming under increasing pressure and has added Bosch and Continental as shareholders.
  • The Guardian reports on why developers of autonomous vehicles think that the chaotic traffic conditions make Moscow a great test-bed.
  • Ellie Craven shows why unique identifiers are rarely truly unique.

Upcoming Events

Where What When Country
Cologne Bonn Airport Bonner Stammtisch 2018-01-16 germany
Lüneburg Lüneburger Mappertreffen 2018-01-16 germany
Cologne Köln Stammtisch 2018-01-17 germany
Toulouse Réunion mensuelle 2018-01-17 france
Leoben Stammtisch Obersteiermark 2018-01-18 austria
Turin Torino Hacknight 2018-01-18 italy
Nottingham Pub Meetup 2018-01-23 united kingdom
Viersen OSM Stammtisch Viersen 2018-01-23 germany
Urspring Stammtisch Ulmer Alb 2018-01-25 germany
Lübeck Lübecker Mappertreffen 2018-01-25 germany
Essen Mappertreffen 2018-01-28 germany
Rome FOSS4G-IT 2018 2018-02-19-2018-02-22 italy
Cologne Bonn Airport FOSSGIS 2018 2018-03-21-2018-03-24 germany
Turin MERGE-it 2018-03-23-2018-03-24 italy
Poznań State of the Map Poland 2018 2018-04-13-2018-04-14 poland
Bordeaux State of the Map France 2018 2018-06-01-2018-06-03 france
Milan State of the Map 2018 (international conference) 2018-07-28-2018-07-30 italy

Note: If you would like to see your event here, please put it into the calendar. Only data that is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Anne Ghisla, Peda, Polyglot, SK53, SomeoneElse, Spanholz, Tordanik, jcoupey, jinalfoflia.

by weeklyteam at January 13, 2018 06:00 PM

January 12, 2018

Wikimedia Performance Team

Measuring Wikipedia page load times

This post shows how we measure and interpret load times on Wikipedia. It also explains what real-user metrics are, and how percentiles work.

Navigation Timing

When a browser loads a page, the page can include program code (JavaScript). This program will run inside the browser, alongside the page. This makes it possible for a page to become dynamic (more than static text and images). When you search on Wikipedia.org, the suggestions that appear are made with JavaScript.

Browsers allow JavaScript to access some internal systems. One such system is Navigation Timing, which tracks how long each step takes. For example:

  • How long to establish a connection to the server?
  • When did the response from the server start arriving?
  • When did the browser finish loading the page?
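
Answering these questions comes down to subtracting timestamps. In the browser, the timestamps come from the Navigation Timing API; the sketch below only illustrates the arithmetic. It uses attribute names from the Navigation Timing specification, but the millisecond values are made up and the `durations` helper is hypothetical.

```python
# Hypothetical timestamps in milliseconds, keyed by Navigation Timing
# attribute names. The values are invented for illustration.
timing = {
    "navigationStart": 1_000,
    "connectStart": 1_020,
    "connectEnd": 1_110,
    "responseStart": 1_330,
    "loadEventEnd": 2_450,
}

def durations(t):
    """Derive a few durations (ms) from raw timestamps."""
    return {
        # How long did it take to establish a connection to the server?
        "connect": t["connectEnd"] - t["connectStart"],
        # When did the response start arriving, relative to navigation start?
        "responseStart": t["responseStart"] - t["navigationStart"],
        # When did the browser finish loading the page?
        "load": t["loadEventEnd"] - t["navigationStart"],
    }

print(durations(timing))
```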

Where to measure: Real-user and synthetic

There are two ways to measure performance: Real user monitoring, and synthetic testing. Both play an important role in understanding performance, and in detecting changes.

Synthetic testing can give high confidence in change detection. To detect changes, we use an automated mechanism to continually load a page and extract a result (e.g. load time). When there is a difference between results, it likely means that our website changed. This assumes that other factors in the test environment, such as network latency, operating system, and browser version, remained constant.

This is good for understanding relative change. But synthetic testing does not measure the performance as perceived by users. For that, we need to collect measurements from the user’s browser.

Our JavaScript code reads the measurements from Navigation Timing, and sends them back to Wikipedia.org. This is real-user monitoring.

How to measure: Percentiles

Imagine 9 users each send a request: 5 users get a result in 5ms, 3 users get a result in 70ms, and for one user the result took 560ms. The average is 88ms. But, the average does not match anyone’s real experience. Let’s explore percentiles!

Diagram showing 9 labels: 5ms, 5ms, 5ms, 5ms, 5ms, 70ms, 70ms, 70ms, and 560ms.

The first number after the lower half (or middle) is the median (or 50th percentile). Here, the median is 5ms. The first number after the lower 75% is 70ms (75th percentile). We can say that "for 75% of users, the service responded within 70ms". That’s more useful.
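
The same numbers can be reproduced with a few lines of Python. This is a minimal sketch using the nearest-rank definition of a percentile, which matches the description above; other definitions (for example, ones that interpolate between values) can give slightly different results.

```python
from statistics import mean

def percentile(values, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, -(-p * len(ordered) // 100))  # ceil(p * n / 100)
    return ordered[rank - 1]

# The nine response times from the example, in milliseconds.
response_times_ms = [5, 5, 5, 5, 5, 70, 70, 70, 560]

print(round(mean(response_times_ms)))     # the average
print(percentile(response_times_ms, 50))  # the median
print(percentile(response_times_ms, 75))  # the 75th percentile
```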

When working on a service used by millions, we focus on the 99th percentile and the highest value (100th percentile). Using medians, or percentiles lower than 99%, would exclude many users. A problem with 1% of requests is a serious problem. To understand why, it is important to understand that 1% of requests does not mean 1% of page views, or even 1% of users.

A typical Wikipedia pageview makes 20 requests to the server (1 document, 3 stylesheets, 4 scripts, 12 images). A typical user views 3 pages during their session (on average).

This means our problem with 1% of requests could affect up to 20% of pageviews (20 requests x 1% = 20% = ⅕), and up to 60% of users (3 pages x 20 objects x 1% = 60% = ⅗). Even worse, over a long period of time, it is most likely that every user will experience the problem at least once. This is like rolling dice in a game. With a 17% (⅙) chance of rolling a six, if everyone keeps rolling, everyone should get a six eventually.
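
The multiplication above is a simple upper bound. If we additionally assume that each request is affected independently with a probability of 1% (an assumption not stated above), the chance that at least one request is affected can be computed directly:

```python
def at_least_one(p, n):
    """Probability that at least one of n independent events occurs."""
    return 1 - (1 - p) ** n

requests_per_pageview = 20
pageviews_per_session = 3

per_pageview = at_least_one(0.01, requests_per_pageview)
per_session = at_least_one(0.01, requests_per_pageview * pageviews_per_session)

print(f"pageviews affected: {per_pageview:.0%}")
print(f"sessions affected:  {per_session:.0%}")
```

Under that assumption the figures come out at roughly 18% of pageviews and 45% of sessions: lower than the upper bound, but still far more than 1%.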

Real-user variables

The previous section focussed on performance as measured inside our servers. These measurements start when our servers receive a request, and end once we have sent a response. This is back-end performance. In this context, our servers are the back-end, and the user’s device is the front-end.

It takes time for the request to travel from the user’s device to our systems (through cellular or WiFi radio waves, and through wires.) It also takes time for our response to travel back over similar networks to the user’s device. Once there, it takes even more time for the device’s operating system and browser to process and display the information. Measuring this is part of front-end performance.

Differences in back-end performance may affect all users. But differences in front-end performance are influenced by factors we don’t control, such as network quality, device hardware capability, browser, browser version, and more.

Even when we make no changes, the front-end measurements do change. Possible causes:

  • Network. ISPs and mobile network carriers can make changes that affect network performance. Existing users may switch carriers. New users may come online with a different distribution of carriers than current users.
  • Device. Operating system and browser vendors release upgrades that may affect page load performance. Existing users may switch browsers. New users may choose browsers or devices differently than current users.
  • Content change. Especially for Wikipedia, the composition of an article may change at any moment.
  • Content choice. Trends in news or social media may cause a shift towards different (kinds of) pages.
  • Device choice. Users that own multiple devices may choose a different device to view the (same) content.

The most likely cause for a sudden change in metrics is ourselves. Given our scale, the above factors usually change only for a small number of users at once. Or the change might happen slowly.

Yet, sometimes these external factors do cause a sudden change in metrics.

Case in point: Mobile Safari 9

Shortly after Apple released iOS 9 (in 2015), our global measurements were higher than before. We found this was due to Mobile Safari 9 introducing support for Navigation Timing.

Before this event, our metrics only represented mobile users on Android. With iOS 9, our data increased its scope to include Mobile Safari.

iOS 9, or the networks of iOS 9 users, were not significantly faster or slower than Android’s. The iOS upgrade affected our metrics because we now include an extra 15% of users – those on Mobile Safari.

Desktop latency is around 330ms, whereas mobile latency is around 520ms. Having more metrics from mobile skewed the global metrics toward that category.

Line graph for responseStart metric from desktop pageviews. Values range from 250ms to 450ms. Averaging around 330ms.
Line graph for responseStart metric from mobile pageviews. Values range from 350ms to 700ms. Averaging around 520ms.

The above graphs plot the "75th percentile" of responseStart for desktop and mobile (from November 2015). We combine these metrics into one data point for each minute. The above graphs show data for one month. There is only enough space on the screen to have each point represent 3 hours. This works by taking the mean average of the per-minute values within each 3 hour block. While this provides a rough impression, this graph does not show the 75th percentile for November 2015. The next section explains why.

Average of percentiles

Opinions vary on how bad it is to take the average of percentiles over time. But one thing is clear: The average of many 1-minute percentiles is not the percentile for those minutes. Every minute is different, and the number of values also varies each minute. To get the percentile for one hour, we need all values from that hour, not the percentile summary from each minute.

Below is an example with values from three minutes of time. Each value is the response time for one request. Within each minute, the values sort from low to high.

Diagram with four sections. Section One is for the minute 08:00 to 08:01, it has nine values with the middle value of 5ms marked as the median. Section Two is for 08:01 to 08:02 and contains five values, the median is 560ms. Section Three is 08:02 to 08:03, contains five values, the median of Section Three is 70ms. The last section, Section Four, is the combined diagram from 08:00 to 08:03 showing all nineteen values. The median is 70ms.

The average of the three separate medians is roughly 212ms. This is the result of (5 + 560 + 70) / 3. The actual median of these values combined is 70ms.
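
The difference is easy to reproduce. In the sketch below, the values for the first minute are the nine from the earlier example; the values for the other two minutes are an assumption, chosen only so that they match the counts and medians described above.

```python
from statistics import mean, median

# Per-minute response times in milliseconds. The second and third minute
# are hypothetical values that fit the counts and medians in the text.
minutes = {
    "08:00": [5, 5, 5, 5, 5, 70, 70, 70, 560],
    "08:01": [5, 5, 560, 560, 560],
    "08:02": [5, 5, 70, 560, 560],
}

per_minute_medians = [median(values) for values in minutes.values()]
combined = [v for values in minutes.values() for v in values]

print(per_minute_medians)        # one median per minute
print(mean(per_minute_medians))  # average of the three medians
print(median(combined))          # median of all nineteen values
```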


To compute the percentile over a large period, we must have all original values. But, it’s not efficient to store data about every visit to Wikipedia for a long time. We could not quickly compute percentiles either.

A different way of summarising data is by using buckets. We can create one bucket for each range of values. Then, when we process a time value, we only increment the counter for that bucket. When using a bucket in this way, it is also called a histogram bin.

Let’s process the same example values as before, but this time using buckets.

There are four buckets. Bucket A is for values below 11ms. Bucket B is for 11ms to 100ms. Bucket C is for 101ms to 1000ms. And Bucket D is for values above 1000ms. For each of the 19 values, we find the associated bucket and increase its counter.

After processing all values, the counters are as follows. Bucket A holds 9, Bucket B holds 4, Bucket C holds 6, and Bucket D holds 0.

Based on the total count (19) we know that the median (10th value) must be in bucket B, because bucket B contains values 10 to 13. And that the 75th percentile (15th value) must be in bucket C because it contains values 14 to 19.

We cannot know the exact millisecond value of the median, but we know the median must be between 11ms and 100ms. (This matches our previous calculation, which produced 70ms.)
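
The bucket approach can be sketched in a few lines. The bucket boundaries follow the description above; the individual values are hypothetical and chosen only to be consistent with the counters, and the helper names are made up for this example.

```python
from bisect import bisect_left

# Inclusive upper bounds (ms) of buckets A, B and C; bucket D holds the rest.
UPPER_BOUNDS = [10, 100, 1000]
LABELS = ["A", "B", "C", "D"]

def bucket_counts(values):
    """Increment one counter per value instead of storing the value itself."""
    counts = [0] * len(LABELS)
    for value in values:
        counts[bisect_left(UPPER_BOUNDS, value)] += 1
    return counts

def bucket_of_rank(counts, rank):
    """Return the bucket holding the rank-th smallest value (1-based)."""
    seen = 0
    for label, count in zip(LABELS, counts):
        seen += count
        if rank <= seen:
            return label
    raise ValueError("rank exceeds number of values")

# Hypothetical values consistent with the counters above.
values = [5] * 9 + [70] * 4 + [560] * 6
counts = bucket_counts(values)

print(dict(zip(LABELS, counts)))
print(bucket_of_rank(counts, 10))  # bucket of the median (10th of 19)
print(bucket_of_rank(counts, 15))  # bucket of the 75th percentile (15th of 19)
```

Only the four counters need to be kept, no matter how many values are processed, which is what makes this cheaper than storing every measurement.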

When we use exact percentiles, our goal is for that percentile to be a certain number. For example, if our 75th percentile today is 560ms, this means for 75% of users a response takes 560ms or less. Our goal could be to reduce the 75th percentile to below 500ms.

When using buckets, goals are defined differently. In our example, 6 out of 19 responses (32%) are above 100ms (bucket C and D), and 13 of 19 (68%) are below 100ms (bucket A and B). Our goal could be to reduce the percentage of responses above 100ms. Or the opposite, to increase the percentage of responses within 100ms.

Rise of mobile

Traffic trends are generally moving towards mobile. In fact, April 2017 was the first month where Wikimedia mobile pageviews reached 50% of all Wikimedia pageviews. Since June 2017, mobile traffic has stayed above 50%.

Bar chart showing percentages of mobile and desktop pageviews for each month in 2017. They mostly swing equal at around 50%. Looking closely, we see mobile first reaches 51% in April. In May it was below 50% again. But for June and every month since then mobile has remained above 50%. The peak was in October 2017, where mobile accounted for 59% of pageviews. The last month in the graph, November 2017 shows 53% of mobile pageviews.

Global changes like this have a big impact on our measurements. This is the kind of change that drives us to rethink how we measure performance, and (more importantly) what we monitor.

In the next post we’ll discuss how we aggregate this data, and which metrics we monitor on our dashboards.


by Krinkle (Timo Tijhof) at January 12, 2018 07:11 PM

Wiki Education Foundation

What we’re celebrating on Wikipedia Day

January 15th of this year marks the 17th anniversary of Wikipedia’s launch and is thus named Wikipedia Day! This day is celebrated around the world as the birth of the most popular open source encyclopedic resource in the world.

We often hear from students that before completing a Wikipedia assignment, they used Wikipedia almost daily and yet weren’t aware of the mechanisms behind it. There are plenty of questions one might ask oneself when using this encyclopedic resource:

How does information get onto Wikipedia? How is it updated? What information is included? What’s missing? Who are the people engaging in this production of knowledge? Who’s not engaging?

Students in our Classroom Program are confronted with these questions as they learn how to contribute to the online encyclopedia through a Wikipedia assignment. Students must critically evaluate Wikipedia for content gaps and work to correct those gaps. These gaps reveal part of the bias that exists in Wikipedia’s content. Over 80% of the volunteers who actively contribute to and edit English Wikipedia articles are young Western men. They tend to add information aligned with their interests and expertise, unintentionally leaving many areas on Wikipedia underdeveloped. Wiki Education works to bring a diversity of content and editors to Wikipedia. For example, 68% of students in Wiki Education supported courses are women!

A valuable take-away for students after completing a Wikipedia assignment is an increased understanding of digital environments that they encounter every day. What does it mean to participate in digital spaces, to be active producers of knowledge rather than mere consumers, to be held accountable by the Wikipedia community and Wikipedia’s worldwide readership?

Understanding the mechanisms of this resource prepares students to evaluate the accuracy of all information that they encounter online. What does legitimacy look like on Wikipedia? On the internet, in general? By participating in those workings themselves, students come to understand the forces at work in knowledge production–what information is available to the public? Who writes it? Is it trustworthy?

Instructors, academics, and others with information literacy skills can evaluate Wikipedia for accuracy. But no one is born with these skills. We’ve found that if we want students to use Wikipedia the way we do, we must teach them how.

Academia is also recognizing the importance of digital writing. The faculty of York University recently awarded a writing prize to a student for his Wikipedia article, which he created in a Wiki Education supported course. In an interview, the coordinator of the competition, Jon Sufrin, spoke to the importance of writing in a digital space:

“You have to be able to sort through all the available sources, have skills at hyperlinking, and understand how to make use of the web as a dynamic medium. Digital writing isn’t just screen prose, it’s interactive prose. All of these skills are in addition to actually being able to write something.”

Ultimately, this is what it means for students to be adding to Wikipedia:

  • Students improve a resource used daily by millions around the world.
  • Students are familiarized with the politics of knowledge production. They understand the inner workings of a source they use all the time and are then able to evaluate it for accuracy and gaps. Hurray for digital literacy!
  • Student engagement works to correct the content and gender bias on Wikipedia. Students also make academic scholarship (often restricted behind paywalls) accessible to the public, increasing public knowledge of important topics.

So on Wikipedia Day this year, we’re celebrating the efforts of more than 40,000 students and more than 1,000 instructors who have participated in our programs–and all that they have done to better the resource for everyone.

If you’re in the New York area, there will be a Wikipedia Day celebration and mini-conference on Sunday, January 14, at the Ace Hotel in Manhattan. All are welcome, and it’s free to attend. Community Engagement Manager Ryan McGrady will be moderating a panel on the subject of Wikipedia in education with instructors Rachel Bogan (CUNY Graduate Center, New Jersey Institute of Technology), Jeffrey Keefer (New York University), and Shelly Eversley (Baruch College). For more information about the event see its on-wiki page here.

See how we celebrated last year! Interested in learning more? Visit teach.wikiedu.org or reach out to contact@wikiedu.org with questions.

Image: File:Wikipedia 10 Milano, Lyonora 4178.jpg, Leonora Giovanazzi, CC BY-SA 3.0, via Wikimedia Commons.

by Cassidy Villeneuve at January 12, 2018 05:00 PM

Magnus Manske

Playing cards on Twitter

So this happened.

Yesterday, Andy Mabbett asked me on Twitter for a new feature of Reasonator: Twitter cards, for small previews of Wikidata items on Twitter. After some initial hesitation (for technical reasons), I started playing with the idea in a test tweet (and several replies to myself), using Andy as the guinea pig item:

Soon, I was contacted by Erika Herzog, with whom I had worked before on Wikidata projects:

That seemed like an odd thing to request, but I try to be a nice guy, and if there are some … personal issues between Wikidata users, I have no intention of causing unnecessary upset. So, after some more testing (meanwhile, I had found a Twitter page to do the tests on), I announced the new feature to the world, using what would undoubtedly be a suitable test subject for the Wikipedia/Wikidata folk:

Boy was I wrong:

I basically woke up to this reply. Under-caffeinated, I saw someone tell me what to (not) tweet. Twice. No reason. No explanation. Not a word on why Oprah would be a better choice as a test subject in a tweet about a new feature for a Wikidata-based tool. Just increasing aggressiveness, going from “problematic” to “Ugh” and “Gads” (whatever that is).

Now, I don’t know much about Oprah. All I know is, basically, what I heard characters in U.S. sit-coms say about her, none of which was very flattering. I know she is (was?) a U.S. TV talk show host, and that she recently gave some speech in the #metoo context. I never saw one of her talk shows. She is probably pretty good at what she does. I don’t really care about her, one way or the other. So far, Oprah has been a distinctively unimportant figure in my life.

Now, I was wondering why Erika kept telling me what to (not) tweet, and why it should be Oprah, of all people. But at that time, all I had the energy to muster as a reply was “Really?”. To that, I got a reply with more Jimbo-bashing:

At which point I just had about enough of this particular jewel of conversation to make my morning:

What follows is a long, horrible conversation with Erika (mostly), with me guessing what, exactly, she wants from me. Many tweets down, it turns out that, apparently, her initial tweets were addressing a “representation issue”. At my incredulous question whether she seriously demanded a “women’s quota” for my two original tweets (honestly, I have no idea what else this could be about by now), I am finally identified as the misogynist cause of all women’s peril in the WikiVerse:

Good thing we finally found the problem! And it was right in front of us the whole time! How did we not see this earlier? I am a misogynist pig (no offence to Pigsonthewing)! What else could it be?

Well, I certainly learned my lesson. I now see the error of my ways, and will try to better myself. The next time someone tries to tell me what to (not) tweet, I’ll tell them to bugger off right away.

by Magnus at January 12, 2018 03:39 PM

Wikimedia Foundation

Wikimedia engineer contributes several fonts to Malayalam language

Photo by Victor Grigas, CC BY-SA 3.0.

When the INR 500 currency was released last year in India, it featured a Malayalam font that was engineered by Santhosh Thottingal, a Senior Software Engineer on the Wikimedia Foundation’s Global Collaboration team. Thottingal led the engineering work on the Content Translation tool, released in 2014, and designs and then develops work that helps MediaWiki support hundreds of languages.

But outside of work, Thottingal is known for creating solutions that help languages be better supported by software. For the past decade, he has concentrated on creating high quality fonts for the script that is used for writing his native language Malayalam. Of the fewer than 20 fonts that are available for Malayalam, he maintains and/or has engineered about 12 (the others are maintained by organizations including Google, Microsoft, and Indian Type Foundry).

We reached out to Thottingal to learn more about creating fonts, and his other projects related to digital access.

What initially sparked your interest in helping Malayalam be more widely supported in digital spaces?

When I was studying at engineering college (2001–2005) I was introduced to the free software philosophy and was fascinated by its impact and potential. After my education, I got my first job, started working on small free software projects, and joined the local free software community. It was at that time that I realized our computers had so many limitations with Malayalam, my mother tongue. I also found out that the Free Software Foundation of India had made some efforts in this direction, but that they had stalled because they did not get enough developers.

There was a community project named Swathanthra Malayalam Computing, which had also stalled. We decided to take that project forward. Malayalam did not have functional, bug-free fonts (not only in free software but in proprietary operating systems too). Rendering engines could not handle many complexities of the Malayalam script. Input tools were not there. This was the first challenge we had to solve. I don’t have any training in computational linguistics, but I managed to teach myself. By 2010, we had worked with various free software upstream projects to build this computing infrastructure for Malayalam. We developed input tools, fixed rendering issues, designed and developed new fonts, and defined and implemented many computational algorithms for Malayalam, such as collation and hyphenation. And we continued to work on more complex projects. The community project became a larger group of volunteers. We also participated in Google Summer of Code several times. Government programs and IT education helped to deliver these results to Malayalam users.

You write that Malayalam script is relatively new to the digital age. What factors affected the timeline, and how have you noticed Malayalam content changing online in the past decade?

The Malayalam script is one of the most complex scripts in India. It was encoded in Unicode in 2001. It took many years to have its technical infrastructure mature enough for day to day usage — such as input support, fonts, operating system support etc. Malayalam Wikipedia started in 2002. But it was around 2004 when some initial websites appeared on the Internet. Blogs presented the initial boost to the Malayalam content, and then in the past decade it became social media that drove more material.

You maintain the majority of fonts available in Malayalam. I’m wondering how you decide to create a new font, and how you think about the design.

As part of our community project mentioned above, Swathanthra Malayalam Computing, one of the initial challenges was how to fix the many rendering issues with the script. I learned OpenType technology and helped solve this issue. I started to maintain the technical aspects of the fonts, which is also known as OpenType engineering. More fonts were designed and I worked on the technical parts of those fonts. By 2012, I was maintaining a dozen Malayalam fonts – all widely used in digital spaces, including for government orders, news portals, and every place you can imagine a language would appear, including my own wedding invitation :).

Because of this, unintentionally, I was also closely watching the glyphs of these fonts, their design characteristics and some typographic elements. I started to observe wall writings, street boards, and handwriting more closely.

In 2013, I made an experimental attempt at typography by designing a handwriting-style Unicode font for Malayalam. It took two months of my free time, but once released, it immediately became a hit. People loved it so much. Every day I could see it used somewhere when I opened my social media. Handwriting font design is easy. Because irregularity is its characteristic, it takes less time to design. (Note that it was not my handwriting.)

It was during that time that I read a paper by Raph Levien called “From Spiral to Spline: Optimal Techniques in Interactive Curve Design.” It is about optimal curve design. I found that Malayalam script can have its curves defined by spiral splines. With this design theory in place, I ventured into my next font project: a regular font with three style variants (bold, italic, and regular). This was not an easy project because a regular font requires mathematical precision and the application of typographic theory. It took 1.5 years of learning and too many reworks, but the result was getting better every day. The font, named Manjari, was released in July of 2016. It surpassed all my expectations about acceptance from Malayalam users. Now it is everywhere: in newspapers, magazines, street-side banners, commercial advertisements, movie titles and a children’s science magazine.

Including my own fonts, I maintain 12 Malayalam fonts. All are very popular, and maintaining these fonts and releasing new versions consumes a lot of our team’s time. And it is no longer a ‘hobby’ :)

Could you talk a little about the relationship between fonts and digital access?

If you don’t have a functional, bug-free font for a script, you cannot read the content. Wikipedia started in Malayalam in 2002. These issues with fonts, their availability, and their bugs all got fixed to a satisfactory level, but it took more time for the fixes to really reach end users. I was active in Malayalam wiki projects and their awareness programs since 2007, and was advocating that Wikipedia should have technology to close this gap between the content and its ‘effective reading’ by readers. Similarly, Wikipedia wouldn’t be able to bring in editors who speak various languages if those editors didn’t have tools to input in their languages.

I was not alone with making these arguments. In 2011, the Wikimedia Foundation created the Language Engineering Team with developing non-Latin reading and writing tools as its first priority goal. I was hired by that team. Here’s a presentation I gave in 2012 on some of what we accomplished in that first year.

But now, in 2017, not having the right fonts and input methods is not really an issue. The situation for non-Latin scripts has improved a lot. One thing that Wikipedia has to improve, though, is its content presentation across languages, including English. We are not giving enough attention to readability by focusing on typography. Different languages have different typography best practices.

What has the reception been?

One major difference between designing and engineering a typeface for English vs. a non-Latin complex script is that the Latin typeface design field is so saturated – it has thousands of fonts in all possible designs you can imagine. It is very difficult to get attention if you design a new font there. But here, in Malayalam, there are only a dozen functional Unicode fonts. People are eagerly looking for new fonts. People want to see their favourite writing style in the form of a digital font. So the attention, appreciation (or even criticism) you get is enormous in comparison.

One of the fonts I maintain and engineered is Meera – this is the font you see in the Malayalam Wikipedia logo. This is also the same font used for Malayalam script in the INR 500 currency newly designed and released last year in India.

How does this relate and/or help with the work you’ve done at the Wikimedia Foundation leading the work on the Content Translation tool?

My experience with type designs has no intersection with projects at the Foundation – at least for now :)

My language computing related specialization and learning from my pet projects has helped in all my projects at the Wikimedia Foundation.

What would you like to do next?

I have several active ongoing projects involving natural language processing for Malayalam.

Interview by Melody Kramer, Senior Audience Development Manager, Communications
Wikimedia Foundation

by Melody Kramer at January 12, 2018 02:43 PM

January 11, 2018

Wiki Education Foundation

University of Windsor Seeks Wikipedia Visiting Scholar

Windsor, Ontario was incorporated as a small village in 1854. The University of Windsor was founded just three years later. Today, Windsor is a major Canadian city and the school is a large, public university counting more than 15,000 students in 255 undergraduate and graduate degree programs.

Leddy Library at the University of Windsor.
Image: File:Leddy Library at the University of Windsor on July 2010.jpg, Alam1s

With 160 years of history, the university’s library has accumulated a wealth of resources about the history of Canada in general and the region in particular. That’s why I’m pleased to announce an opportunity for a Wikipedia editor to gain remote access to those materials for use in improving articles about the history of southwestern Ontario. The Wikipedia Visiting Scholar will receive a login to access the library’s full suite of digital resources, including databases, ebooks, and digitized collections.

Something we love about the Visiting Scholars program is the way it empowers passionate people to fill content gaps and improve public knowledge about topics otherwise underrepresented on Wikipedia. Wikipedia has developed an incredible amount of high-quality content on many subjects, but it’s much stronger in some areas than others. As the product of volunteers, the content in many ways reflects the interests and experiences of the predominantly white, male, English-speaking people who write it.

One of the library’s strengths that the Visiting Scholar could take advantage of is its numerous materials about the region’s First Nations people. In particular, the university sits on the traditional territory of the Three Fires Confederacy of First Nations, comprised of the Ojibway, the Odawa, and the Potawatomie.

Visiting Scholars at the University of Windsor is coordinated through the Centre for Digital Scholarship, which brings together the digital services offered by the Leddy Library to support its students, faculty, and staff. It develops and curates research and archival tools and is also an active publisher of academic journals, monographs, and conference proceedings.

If you’re a Wikipedian with an interest in the history of southwestern Ontario, or if you just want to learn more about being a Visiting Scholar, visit the Visiting Scholars section of our website here.

Image: File:University of Windsor campus on August 2006.jpg, Mikerussell, GNU Free Documentation, via Wikimedia Commons.

by Ryan McGrady at January 11, 2018 05:22 PM

Jeroen De Dauw

Generic Entity handling code

In this blog post I outline my thinking on sharing code that deals with different types of Entities in your domain. We’ll cover what Entities are, code reuse strategies, pitfalls such as Shotgun Surgery and Anemic Domain Models and finally Bounded Contexts.

Why I wrote this post

I work at Wikimedia Deutschland, where, amongst other things, we are working on software called Wikibase, which is what powers the Wikidata project. We have a dedicated team for this software, called the Wikidata team, which I am not part of. As an outsider who is somewhat familiar with the Wikibase codebase, I came across a writeup of a perceived problem in this codebase and a pair of possible solutions. I happen to disagree about what the actual problem is, and as a consequence also about the solutions. Since explaining why takes a lot of general (non-Wikibase-specific) explanation, I decided to write a blog post.

DDD Entities

Let’s start with defining what an Entity is. Entities are a tactical Domain Driven Design pattern. They are things that can change over time and are compared by identity rather than by value, unlike Value Objects, which do not have an identity.

Wikibase has objects which are conceptually such Entities, though they are implemented … oddly from a DDD perspective. In the excerpt below, the word entity, confusingly, does not refer to the DDD concept. Instead, the Wikibase domain has a concept called Entity, implemented by an abstract class with the same name, from which specific types of Entities, i.e. Item and Property, derive. Those are the objects that are conceptually DDD Entities, yet diverge from what a DDD Entity looks like.

Entities normally contain domain logic (the lack of this is called an Anemic Domain Model), and don’t have setters. The lack of setters does not mean they are immutable, it’s just that actions are performed through methods in the domain language (see Ubiquitous Language). For instance “confirmBooked()” and “cancel()” instead of “setStatus()”.
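
To make the naming point concrete, here is a minimal sketch of such an Entity. The Booking class, its status values and its transition rules are hypothetical and only illustrate the "confirmBooked()" and "cancel()" naming from the paragraph above; the sketch assumes PHP 7.4 or later for typed properties.

```php
<?php
declare( strict_types=1 );

// Hypothetical Booking entity; not taken from Wikibase or any real codebase.
final class Booking {
    private const STATUS_PENDING = 'pending';
    private const STATUS_BOOKED = 'booked';
    private const STATUS_CANCELLED = 'cancelled';

    private int $id;
    private string $status = self::STATUS_PENDING;

    public function __construct( int $id ) {
        $this->id = $id;
    }

    public function getId(): int {
        return $this->id;
    }

    // State changes go through methods named in the domain language,
    // so invalid transitions can be rejected in one place.
    public function confirmBooked(): void {
        if ( $this->status !== self::STATUS_PENDING ) {
            throw new DomainException( 'Only pending bookings can be confirmed' );
        }
        $this->status = self::STATUS_BOOKED;
    }

    public function cancel(): void {
        if ( $this->status === self::STATUS_CANCELLED ) {
            throw new DomainException( 'Booking is already cancelled' );
        }
        $this->status = self::STATUS_CANCELLED;
    }

    public function isBooked(): bool {
        return $this->status === self::STATUS_BOOKED;
    }

    // Entities are compared by identity, not by value.
    public function equals( Booking $other ): bool {
        return $this->id === $other->id;
    }
}

$booking = new Booking( 42 );
$booking->confirmBooked();
```

The object is still mutable, but there is no setStatus() through which a caller could put it into a state the domain does not allow.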

The perceived problem

What follows is an excerpt from a document aimed at figuring out how to best construct entities in Wikibase:

Some entity types have required fields:

  • Properties require a data type
  • Lexemes require a language and a lexical category (both ItemIds)
  • Forms require a grammatical feature (an ItemId)

The ID field is required by all entities. This is less problematic however, since the ID can be constructed and treated the same way for all kinds of entities. Furthermore, the ID can never change, while other required fields could be modified by an edit (even a property’s data type can be changed using a maintenance script).

The fact that Properties require the data type ID to be provided to the constructor is problematic in the current code, as evidenced in EditEntity::clearEntity:

// FIXME how to avoid special case handling here?
if ( $entity instanceof Property ) {
  /** @var Property $newEntity */
  $newEntity->setDataTypeId( $entity->getDataTypeId() );
}

…as well as in EditEntity::modifyEntity():

// if we create a new property, make sure we set the datatype
if ( !$exists && $entity instanceof Property ) {
  if ( !isset( $data['datatype'] ) ) {
     $this->errorReporter->dieError( 'No datatype given', 'param-illegal' );
  } elseif ( !in_array( $data['datatype'], $this->propertyDataTypes ) ) {
     $this->errorReporter->dieError( 'Invalid datatype given', 'param-illegal' );
  } else {
     $entity->setDataTypeId( $data['datatype'] );
  }
}

Such special case handling will not be possible for entity types defined in extensions.

It is very natural for (DDD) Entities to have required fields. That is not a problem in itself. For examples you can look at our Fundraising software.

So what is the problem really?

Generic vs specific entity handling code

Normally when you have a (DDD) Entity, say a Donation, you also have dedicated code that deals with those Donation objects. If you have another entity, say MembershipApplication, you will have other code that deals with it.

If the code handling Donation and the code handling MembershipApplication is very similar, there might be an opportunity to share things via composition. One should be very careful to not do this for things that happen to be the same but are conceptually different, and might thus change differently in the future. It’s very easy to add a lot of complexity and coupling by extracting small bits of what would otherwise be two sets of simple and easy to maintain code. This is a topic worthy of its own blog post, and indeed, I might publish one titled The Fallacy of DRY in the near future.

This sharing via composition is not really visible “from outside” of the involved services, except for the code that constructs them. If you have a DonationRepository and a MembershipRepository interface, they will look the same whether or not their implementations share something. Repositories might share cross-cutting concerns such as logging. Logging is not something you want to do in your repository implementations themselves, but you can easily create simple logging decorators. A LoggingDonationRepository and LoggingMembershipRepository could both depend on the same Logger class (or, more likely, interface), and thus be sharing code via composition. In the end, the DonationRepository still just deals with Donation objects, the MembershipRepository still just deals with Membership objects, and both remain completely decoupled from each other.
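
A logging decorator of this kind could be sketched as follows. The method name storeDonation(), the Logger interface and the two test doubles are assumptions made for illustration, not the API of any particular codebase; PHP 7.4 or later is assumed.

```php
<?php
declare( strict_types=1 );

// All names below are illustrative; they only mirror the ones used in the text.
final class Donation {
    private int $id;

    public function __construct( int $id ) {
        $this->id = $id;
    }

    public function getId(): int {
        return $this->id;
    }
}

interface DonationRepository {
    public function storeDonation( Donation $donation ): void;
}

interface Logger {
    public function log( string $message ): void;
}

// Decorator: adds logging around any DonationRepository without changing it.
final class LoggingDonationRepository implements DonationRepository {
    private DonationRepository $repository;
    private Logger $logger;

    public function __construct( DonationRepository $repository, Logger $logger ) {
        $this->repository = $repository;
        $this->logger = $logger;
    }

    public function storeDonation( Donation $donation ): void {
        try {
            $this->repository->storeDonation( $donation );
        } catch ( RuntimeException $ex ) {
            // Log the failure, then let the caller deal with the exception.
            $this->logger->log(
                'Could not store donation ' . $donation->getId() . ': ' . $ex->getMessage()
            );
            throw $ex;
        }
    }
}

// Test doubles for the usage example.
final class FailingDonationRepository implements DonationRepository {
    public function storeDonation( Donation $donation ): void {
        throw new RuntimeException( 'database unavailable' );
    }
}

final class ArrayLogger implements Logger {
    /** @var string[] */
    public array $messages = [];

    public function log( string $message ): void {
        $this->messages[] = $message;
    }
}

$logger = new ArrayLogger();
$repository = new LoggingDonationRepository( new FailingDonationRepository(), $logger );

try {
    $repository->storeDonation( new Donation( 7 ) );
} catch ( RuntimeException $ex ) {
    // The decorator rethrows after logging.
}
```

A LoggingMembershipRepository would follow the same shape and take the same Logger, which is the only thing the two decorators would share.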

In the Wikibase codebase there is an attempt at code reuse by having services that can deal with all types of Entities. Phrased like this it sounds nice. From the perspective of the user of the service, things are great at first glance. Thing is, those services then are forced to actually deal with all types of Entities, which almost guarantees greater complexity than having dedicated services that focus on a single entity.

If your Donation and MembershipApplication entities both implement Foobarable and you have a FoobarExecution service that operates on Foobarable instances, that is entirely fine. Things get dodgy when your Entities don’t always share the things your service needs, and the service ends up getting instances of object, or perhaps some minimal EntityInterface type.

In those cases the service can add a bunch of “if has method doFoobar, call it with these arguments” logic. Or perhaps you’re checking against an interface instead of a method, though this is by and large the same. This approach leads to Shotgun Surgery. It is particularly bad if you have a general service. If your service is really only about the doFoobar method, then at least you won’t need to poke at it when a new Entity is added to the system that has nothing to do with the Foobar concept. If the service on the other hand needs to fully save something or send an email with a summary of the data, each new Entity type will force you to change your service.
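
The contrast between the two kinds of service can be sketched like this. Foobarable and FoobarExecution are the names used above; GeneralEntityService, its summarize() method and the return values are made up for the example (PHP 7.2 or later is assumed for the object type).

```php
<?php
declare( strict_types=1 );

interface Foobarable {
    public function doFoobar(): string;
}

final class Donation implements Foobarable {
    public function doFoobar(): string {
        return 'donation foobar';
    }
}

final class MembershipApplication implements Foobarable {
    public function doFoobar(): string {
        return 'membership foobar';
    }
}

// Fine: a narrow service that depends only on what the Entities share.
final class FoobarExecution {
    public function execute( Foobarable $subject ): string {
        return $subject->doFoobar();
    }
}

// Dodgy: a general service that accepts anything and inspects it.
final class GeneralEntityService {
    public function summarize( object $entity ): string {
        if ( $entity instanceof Donation ) {
            return 'Donation summary';
        }
        if ( $entity instanceof MembershipApplication ) {
            return 'Membership application summary';
        }
        // Every new Entity type forces another branch here (Shotgun Surgery).
        throw new InvalidArgumentException(
            'Unsupported entity type: ' . get_class( $entity )
        );
    }
}

$foobar = ( new FoobarExecution() )->execute( new Donation() );
$summary = ( new GeneralEntityService() )->summarize( new MembershipApplication() );
```

FoobarExecution never needs to change when a new Foobarable type appears, while GeneralEntityService has to be edited for every new Entity type.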

The “if doFoobar exists” approach does not work if you want plugins to your system to be able to use your generic services with their own types of Entities. To enable that, and avoid the Shotgun Surgery, your general service can delegate to specific ones. For instance, you can have an EntityRepository service with a save method that takes an EntityInterface. In its constructor it would take an array of specific repositories, i.e. a DonationRepository and a MembershipRepository. In its save method it would loop through these specific repositories and somehow determine which one to use. Perhaps they would have a canHandle method that takes an EntityInterface, or perhaps EntityInterface has a getType method that returns a string that is also used as keys in the array of specific repositories. Once the right one is found, the EntityInterface instance is handed over to its save method.

interface Repository {
    public function save( EntityInterface $entity );
    public function canHandle( EntityInterface $entity ): bool;
}

class DonationRepository implements Repository { /**/ }
class MembershipRepository implements Repository { /**/ }

class GenericEntityRepository {
    /**
     * @var Repository[] $repositories
     */
    private $repositories;

    public function __construct( array $repositories ) {
        $this->repositories = $repositories;
    }

    public function save( EntityInterface $entity ) {
        // Hand the entity to the first specific repository that can handle it.
        foreach ( $this->repositories as $repository ) {
            if ( $repository->canHandle( $entity ) ) {
                $repository->save( $entity );
                break;
            }
        }
    }
}

This delegation approach is sane enough from an OO perspective. It does however involve specific repositories, which raises the question of why you are creating a general one in the first place. If there is no compelling reason to create the general one, just stick to specific ones and save yourself all this unneeded complexity and vagueness.
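
If a general repository is needed after all, the getType variant of the delegation mentioned earlier could look roughly like the sketch below. The class TypeKeyedEntityRepository, the in-memory repository and the 'donation' type string are hypothetical names chosen for the example; PHP 7.4 or later is assumed.

```php
<?php
declare( strict_types=1 );

interface EntityInterface {
    public function getType(): string;
}

interface Repository {
    public function save( EntityInterface $entity ): void;
}

// Delegates to specific repositories, looked up by the entity's type string.
final class TypeKeyedEntityRepository {
    /** @var Repository[] keyed by entity type */
    private array $repositories;

    public function __construct( array $repositories ) {
        $this->repositories = $repositories;
    }

    public function save( EntityInterface $entity ): void {
        $type = $entity->getType();

        if ( !array_key_exists( $type, $this->repositories ) ) {
            throw new InvalidArgumentException( "No repository registered for entity type '$type'" );
        }

        $this->repositories[$type]->save( $entity );
    }
}

// Minimal implementations for the usage example.
final class Donation implements EntityInterface {
    public function getType(): string {
        return 'donation';
    }
}

final class InMemoryRepository implements Repository {
    /** @var EntityInterface[] */
    public array $saved = [];

    public function save( EntityInterface $entity ): void {
        $this->saved[] = $entity;
    }
}

$donations = new InMemoryRepository();
$generic = new TypeKeyedEntityRepository( [ 'donation' => $donations ] );
$generic->save( new Donation() );
```

Plugins can register a repository for their own type without the general service changing, but each specific repository still has to exist, so the question of what the general one adds remains.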

In Wikibase there is a generic web API endpoint for creating new entities. The users provide a pile of information via JSON or a bunch of parameters, which includes the type of Entity they are trying to create. If you have this type of functionality, you are forced to deal with this in some way, and probably want to go with the delegation approach. To me having such an API endpoint is very questionable, with dedicated endpoints being the simpler solution for everyone involved.

To wrap this up: dedicated entity handling code is much simpler than generic code, making it easier to write, use, understand and modify. Code reuse, where warranted, is possible via composition inside of implementations without changing the interfaces of services. Generic entity handling code is almost always a bad choice.

On top of what I already outlined, there is another big issue you can run into when creating generic entity handling code as is done in Wikibase.

Bounded Contexts

Bounded Contexts are a key strategic concept from Domain Driven Design. They are key in the sense that if you don’t apply them in your project, you cannot effectively apply tactical patterns such as Entities and Value Objects, and are not really doing DDD at all.

“Strategy without tactics is the slowest route to victory. Tactics without strategy are the noise before defeat.” — Sun Tzu

Bounded Contexts allow you to segregate your domain models, ideally having a Bounded Context per subdomain. A detailed explanation and motivation of this pattern is out of scope for this post, but suffice it to say that Bounded Contexts allow for simplification and thus make it easier to write and maintain code. For more information I can recommend Domain-Driven Design Distilled.

In case of Wikibase there are likely a dozen or so relevant subdomains. While I did not do the analysis to create a comprehensive picture of which subdomains there are, which types they have, and which Bounded Contexts would make sense, a few easily stand out.

There is the so-called core Wikibase software, which was created for Wikidata.org, and deals with structured data for Wikipedia. It has two types of Entities (both in the Wikibase and in the DDD sense): Item and Property. Then there is (planned) functionality for Wiktionary, which will be structured dictionary data, and for Wikimedia Commons, which will be structured media data. These are two separate subdomains, and thus each deserve their own Bounded Context. This means having no code and no conceptual dependencies on each other or the existing Big Ball of Mud type “Bounded Context” in the Wikibase core software.


When standard approaches are followed, Entities can easily have required fields and optional fields. Creating generic code that deals with different types of entities is very suspect and can easily lead to great complexity and brittle code, as seen in Wikibase. It is also a road to not separating concepts properly, which is particularly bad when crossing subdomain boundaries.

by Jeroen at January 11, 2018 02:40 PM

Wikimedia Foundation

How I make video ‘newsreels’ for social media—so you can too

This is a ‘newsreel’ designed to play on social media. It’s short and you can watch it with or without audio and still get the message.

Editor’s note: This blog post makes extensive use of WebM videos, a media file format that as of publishing time is not compatible with Microsoft Edge, Internet Explorer, or Safari. Please try Mozilla Firefox instead.

A few years ago, Facebook, Twitter and other social media channels tweaked their user interaction and enabled video that auto-plays on your social media feed with the sound off. The result influenced video production globally and revived the 100+ year-old ‘silent newsreel’. People all over the world are doing amazing, selfless work with Wikimedia projects and a well-made video newsreel can document these efforts and inspire others to be involved.

As a video producer and storyteller for the Wikimedia Foundation, I’ve produced a few of these types of videos, and I want to share some practical advice on video production with the greater Wikimedia movement, whether you want to be part of the new Inspire New Readers campaign to raise awareness about Wikipedia where you live or want to make videos for other purposes.

I’ve found:

  • You can make these types of newsreels using still images and video from Wikimedia Commons and even footage from your cellphone.
  • It needs to be short, because in all likelihood your audience’s thumb is ready to swipe up to the next thing.
  • It needs to be understood with or without the audio on.
  • If you make a version without titles or text on screen, Wikimedians can translate it into their language for their audiences.


So, this is what a 100-year-old newsreel looks like:

This is a silent newsreel from 1918 made by the great early film director Dziga Vertov.

You can see stuff happening with people in it, and then you see text that explains the stuff you saw (in this case you need to speak Russian to understand it). It was made to play in theaters. Today’s video editing software makes it relatively easy to imitate this format using digital video and photography. Here’s how:

Part 1: Write your titles

This is a ‘silent newsreel’ I made about the value and purpose of the public domain. It uses ‘intertitles,’ or titles over moving or still backgrounds. I wrote text first, and then found images that I liked to fit the message of the text.

Titles or intertitles are text that the audience sees on screen. When I set out to make a newsreel, I usually write this text first and then that functions as my script that I can use to narrow what kinds of imagery, sounds or music I may want to use. This sounds easy, but to make it good can take some time. What are you trying to communicate? What’s your topic? Draft what you want to say. It’s usually a good idea to answer the who, what, when, where, why and how of a topic or event. Keep in mind that if you plan on doing on-camera interviews, the interviewees can address these questions with their answers, so as you write, you can make a note to ask the right questions of people to give you the audio you may want. Then you can use a mix of titles and sound bites to communicate your message.

It’s also a good idea to think about who your audience is and try to be specific; don’t say ‘everybody’. If you are addressing students, you may want to talk about classrooms, the school-year, teachers and so on. If you are talking to doctors, you may want to talk about medicine, patients and so on. The more specific you can make your audience, the better focused your messaging will likely be.

For the actual text, generally I find that to be able to read the text on your phone, it has to be BIG FAT TEXT and that means that you have to write little skinny sentences. It’s kind of like writing one or two haikus. Usually I end up with maybe three or four short sentences of text to put on screen, broken up into parts of the sentence. Sometimes you have to chop the sentences in half and let the audience read the first half of the text before you show the next half. This gives you the opportunity to show half a sentence with one image and then swap to another image and show the second half of the sentence. This notion gives you a way to think about how you may want to write, and in what order you may want to show things to your audience. Usually there’s a call to action (like the link to a website) at the end of the video, or at the bottom of the whole video.

Part 2: Music

Now you should find some music. You don’t need to use music for these types of videos, but it’s nice to have. If you are a musician, you can use music you’ve composed and recorded, otherwise you need to find some. Try to find instrumental music that you would be comfortable hearing over and over (while you edit your newsreel). I have used all these sites to find media that’s public domain, CC0, CC-BY or CC by-SA so that it’s free to remix and is compatible with Wikimedia projects:

You can also use this music that was made by Andy R. Jordan that I commissioned specifically for Wikimedia newsreels. There are 30, 60 and 90 second versions designed to be looped and faded into each other if needed.

I should note that there is about ten times more music available under non-commercial or non-derivative licenses, but those are not compatible with Wikimedia projects. I say this now so you don’t fall in love with a bit of music that you can’t use.

This newsreel uses text and a few images with a few video filters to make the point that images can be remixed. Note that the text and images are paced to the beat of the music, which must be done carefully so that everything appears in sync. If a few frames are off, it can feel off.
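To get a feel for pacing cuts to the beat, it can help to work out which video frame each beat lands on. The sketch below assumes you already know the tempo of the music in beats per minute and the frame rate of your edit; the function name and the example numbers are illustrative only.

```python
from typing import List


def beat_frames(bpm: float, fps: float, beats: int) -> List[int]:
    """Return the frame index nearest to each of the first `beats` beats."""
    seconds_per_beat = 60.0 / bpm
    return [round(i * seconds_per_beat * fps) for i in range(beats)]


# At 120 beats per minute and 24 frames per second, a beat falls every 12 frames.
print(beat_frames(bpm=120, fps=24, beats=5))  # [0, 12, 24, 36, 48]
```

Placing a title or image change on one of these frames keeps it in step with the music; a cut that lands a few frames away from the beat is what makes the edit feel off.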

Part 3: Find or record your media

Look at your script-copy-titles. What visual media would illustrate what you want to communicate? Are you talking about an event? An abstract idea? Look at the copy you wrote, and that should give you an idea of what kinds of media you may want to search for or create, if any. You can make a video that uses only text:

This is a newsreel that uses only text. This type of video can be cheap and fast to produce. The music was reused from a previous production.

If you use still images, when you edit it can be useful to animate them so that the audience knows that the video is playing and not frozen:

This video for Wiki Loves Monuments is made of still images, music and short sentences of text placed over the images.

If you decide to shoot video on your own, here are my crash-course suggestions:

  • Get permission from locations and people to record them.
  • Try to keep the camera or phone steady, and hold the camera on a subject or scene five seconds longer than you think you need to at both the beginning and end of recording. This makes it easier to edit later.
  • If you record an interview, record it in a quiet room, and use a lapel microphone. Good audio is more important than good video if you intend to use audio from people speaking.
  • If you record a presentation where someone is speaking into a microphone (like at a podium), there may be an audio mixing board that you can plug into if you have audio cables for it.
  • Use headphones.
  • Avoid recording in places with copyrighted material in the background. This includes music or audio that can be heard or television or billboard advertising.
  • Make sure you record b-roll (extra footage of the event) that illustrates the setting, inside and outside. Example of event b-roll here.

20 minutes of B-roll of Wikimedia servers. You can also find this footage on Vimeo. I edited this footage into a video to solicit donations for the Wikimedia Foundation, but could have used it for just about any purpose at all.

This is an example of a ‘silent newsreel’ that uses live footage with interviews.

This Spanish-language newsreel uses only b-roll, music and no interviews.

As for cameras, I’ll note that I’m only aware of three cameras (Canon 5D Mark IV and the no-longer-manufactured Nikon D90 and Pentax K-7) that shoot video natively in an open-source format, the Motion JPEG. In theory, this video container could be enabled on Wikimedia Commons (there’s a phabricator ticket for it), and then footage could be uploaded from these cameras directly to Wikimedia Commons without any conversion (more on that below), patent licensing fees or loss of image fidelity.

If you shoot on your cellphone, I’d recommend buying a cellphone lens kit so that you have a range of lenses to better capture your environment. Today these lenses are very inexpensive and clip on to almost any phone.

To find video media to repurpose for a Wikimedia newsreel, I’d recommend looking in the following places:

  • https://commons.wikimedia.org/
    • to find a video under a particular subject, type ‘filetype:video’ then a space, then type your subject.
  • https://www.youtube.com/ – click ‘Filter’ in the top right corner and then sort by ‘creative commons’. Be careful though, there is a lot of copyright infringement on YouTube.
  • https://vimeo.com/ – You can search by CC-by, CC by-SA, and CC0. Some videos are easy to download too.
  • https://www.flickr.com/ – You can search by license under ‘commercial use and mods allowed’, ‘U.S. Government works’ and ‘no known copyright restrictions’.
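The ‘filetype:video’ search on Wikimedia Commons described in the list above can also be turned into a link that you can bookmark or share. This is a small sketch that assumes the standard MediaWiki search URL with its `search` parameter; the function name and the example subject are made up for illustration.

```python
from urllib.parse import urlencode


def commons_video_search_url(subject: str) -> str:
    """Build a Commons search link for videos about `subject`."""
    # Same query you would type by hand: 'filetype:video', a space, then the subject.
    query = f"filetype:video {subject}"
    return "https://commons.wikimedia.org/w/index.php?" + urlencode({"search": query})


print(commons_video_search_url("solar eclipse"))
# https://commons.wikimedia.org/w/index.php?search=filetype%3Avideo+solar+eclipse
```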

You can also use the puzzle globe logo (with appropriate attribution, of course). See this on Commons.

Part 4: Grants and equipment

Once you have your script roughly developed and have an idea of what music and other footage you may want to record or re-purpose, you may want to consider sharing it on IdeaLab. Fellow Wikimedians might give you feedback and help make it better. Beyond that, at any point in the year, you can apply for a rapid grant, which may be able to fund particular aspects of your video up to $2000 USD. This might be particularly useful if you are inexperienced with video production. A grant may be able to pay for transportation, equipment rental, professional help with shooting and editing video, food or other production costs. You just need to be ready to share a distribution plan for your video—who is your audience and what results would you consider success based on what you are trying to do with your video? The grants team will probably also try to gauge your skill level with video, to better evaluate what your needs might be. If you have questions, feel free to send an email to rapidgrants[at]wikimedia[dot]org.

As for equipment rental, every major city usually has at least one camera rental house in it. This is a great place to ask questions and learn because they make money if you rent gear and you’ll only rent gear that you know how to use, so they have an economic interest in advising you. They aren’t schools, and are used to dealing with professionals, so it is best to share the experience level you are at so they can better serve you. There are also many online equipment rental companies that ship equipment.

Always test the gear in the rental house. You don’t want to rent defective gear, and you can usually experiment there at the checkout counter before you rent without spending any money. If you come with a plan in place for your production, you’ll be able to share it with the rental company and they may be able to recommend particular equipment for you. You will probably need a credit card for a deposit.

Part 5: Conversion

Wikimedia Commons uses only open-source containers for video. If you want to stay open source, even though it’s not quite a professional editing system, I’d recommend using OpenShot, especially because it can edit in the open-source WebM format (which is the dominant video container on Wikimedia Commons today). Otherwise you’ll have to convert the video you make into WebM or another free format if you want to share it on Wikimedia Commons. The workflow I recommend once you’ve finished editing (more on that below) is uploading your video to YouTube.com or Vimeo.com, checking the right open license and then using the video2commons tool to migrate it to Wikimedia Commons.
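If you would rather convert a file yourself, one common alternative (not the workflow recommended above) is the command-line tool FFmpeg. The sketch below only builds and prints a command; it assumes FFmpeg is installed with VP9 and Opus support, and the file names and the quality value of 30 are placeholders rather than recommendations.

```python
import shlex
from typing import List


def webm_command(source: str, target: str, quality: int = 30) -> List[str]:
    """Build an FFmpeg command that re-encodes `source` as VP9 video and Opus audio in WebM."""
    return [
        "ffmpeg", "-i", source,
        "-c:v", "libvpx-vp9",                  # VP9 video codec
        "-crf", str(quality), "-b:v", "0",     # constant-quality mode
        "-c:a", "libopus",                     # Opus audio codec
        target,
    ]


# Print the command for review; pass the list to subprocess.run() to actually convert.
print(shlex.join(webm_command("newsreel.mp4", "newsreel.webm")))
```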

Part 6: Editing

Editing can be complicated if you are inexperienced, so I’m going to speak generally about it. You will need to learn how to use editing software. First, pick video editing software that you are familiar or comfortable with. Some are open source, but most are not. When you start your edit, you have to decide the frame rate and aspect ratio of your final video. Frame rate describes the frames per second that the viewer will see. This is what creates the feeling of motion in the video. Aspect ratio describes the shape of the rectangle of your video (I’ll note that while my examples here are horizontal, many video newsreels today are square or vertical for the social media feed). Your resolution is how sharp (how many pixels) your image will be. It’s best if most of the footage you plan to edit has the same frame rate and aspect ratio as the final video. If you have to convert your frame rates, the final result may look ‘choppy’ and not smooth, especially if there are fast-moving objects in your video (more on this below). I like to edit in 24p or 60p frame rates depending on the production, but depending on where you are and the cameras and footage you are using you may want to use other frame rates.
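A little arithmetic shows why mismatched frame rates can look choppy. The sketch below uses a deliberately simple model in which each output frame shows the most recent source frame; real editing software may blend or interpolate frames instead, so treat this as an illustration rather than a description of any particular editor. The function names are my own.

```python
from math import gcd
from typing import List


def aspect_ratio(width: int, height: int) -> str:
    """Reduce a resolution in pixels to its aspect ratio, e.g. 1920x1080 -> '16:9'."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


def source_frames_used(source_fps: int, target_fps: int, count: int) -> List[int]:
    """For each of the first `count` output frames, the source frame that gets shown."""
    return [i * source_fps // target_fps for i in range(count)]


print(aspect_ratio(1920, 1080))        # 16:9 (horizontal)
print(aspect_ratio(1080, 1080))        # 1:1 (square)
print(source_frames_used(30, 24, 8))   # [0, 1, 2, 3, 5, 6, 7, 8] -> source frame 4 is dropped
print(source_frames_used(24, 30, 8))   # [0, 0, 1, 2, 3, 4, 4, 5] -> frames 0 and 4 repeat
```

The dropped or repeated frames come at regular intervals, which is what reads as stutter when something moves quickly across the picture.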

This was made using footage sent to me by volunteers. All the footage was shot using different phones or cameras and all used different frame rates. I had to choose one frame rate and let the editing software convert (or render) that footage into the final video.

As for editing your content together as a narrative, story or thesis, you have maybe four things you can use to communicate: on-screen text, visual imagery, audio (things like music and sound effects which affect mood and setting) and dialogue (things people say). What order for each of these works best? Look at your script and let that be your guide. Generally I find that you can put everything into an editing timeline and then just start trimming it all down and mixing it around. Start at the beginning with something that feels right. When you play it back, it’s a good rule of thumb to read any text you put on screen aloud, as a way to measure how long it can take an audience to read it and understand it when they read it for the first time.

Part 7: Fonts

You have a high chance that people will be watching your video on their phone, and that the sound will be off. It helps to use FAT BOLD FONTS. I use the open source fonts Open Sans, Montserrat, or Linux Libertine. If there is text in the video already (like text on a newspaper or a computer screen that someone in the shot is looking at) that ‘competes’ with the titles you are creating, italicize the titles you use. You can also use dropshadows or other graphics to separate the text from the background. I play with the kerning and spacing of my fonts to get them to look bigger, but not so that they are illegible.

Part 8: Credits and attribution

This part is painful and important. You have to give all the appropriate credit to all the media you’ve used. Patience is key. Just go through your video, find each file that you use and write in a document:

  1. Name of the file
  2. The author
  3. The license
  4. The link (sometimes optional)
  5. A note if you modified the original

Add this to the credits at the end of your video. Don’t forget to add your name and license to the new derivative work you’ve made and to credit anyone else who may have contributed to your video in any meaningful way. Double-check everything. If you’re using public domain or CC0 works, you don’t need to give any attribution, but you might still want all the info so you have evidence that the work is actually public domain or CC0.
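If you have many files to credit, a small script can keep the record consistent. This is a sketch only: the `Credit` class, its field names and the sample entries are invented for illustration and do not refer to real files, and the exact wording each license requires is still something to check by hand.

```python
from dataclasses import dataclass
from typing import List

# Licenses that, as noted above, do not require attribution in the credits.
NO_ATTRIBUTION_REQUIRED = {"Public domain", "CC0"}


@dataclass
class Credit:
    file_name: str
    author: str
    license: str
    link: str = ""           # sometimes optional
    modified: bool = False   # note if you changed the original

    def line(self) -> str:
        parts = [self.file_name, self.author, self.license]
        if self.link:
            parts.append(self.link)
        if self.modified:
            parts.append("modified from original")
        return ", ".join(parts)


def credit_roll(credits: List[Credit]) -> List[str]:
    """Lines for the end credits; every entry stays in `credits` as your evidence log."""
    return [c.line() for c in credits if c.license not in NO_ATTRIBUTION_REQUIRED]


log = [
    Credit("Example monument.jpg", "A. Photographer", "CC BY-SA 4.0", modified=True),
    Credit("Example archive film.webm", "Unknown", "Public domain"),
]
print("\n".join(credit_roll(log)))
```

The public domain entry stays in the log but is left out of the printed credits, which matches the advice to keep the evidence even when attribution is not required.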

You can also add links to the original media files that you found on commons in the metadata under the video. This makes it easy for others to find the media should they want to use it too (more on this below). Whatever you do, don’t use other works that you are not allowed to use.

Part 9: Subtitles and captions

Keep in mind, for this type of video we have to assume that the audience may or may not have the audio turned on on their phone, so if there is information conveyed by someone speaking, there needs to be text there to communicate that.

This video uses burned-in subtitles when people talk (dialogue).

Subtitles and captions give people a way to read the dialogue being spoken in a video. Each has its own function. The difference between subtitles and captions is that captions are metadata that can be turned on and off. Subtitles are burned into the video and, once created, cannot be removed from that version of the video. It’s like once you bake a cake, you can’t take the eggs out. I like to use subtitles along with intertitles because both give me a way to describe the diegetic and non-diegetic aspects of the video—the things that are actually happening in the scene versus the text I’m putting on top of it as an interpretation of that scene. It makes a clear distinction between the two. I use bold yellow subtitles with a blurred dropshadow.

I’ll say that even though I use them, the problem with subtitles (and burned-in intertitles too) is that they can’t translate well. There’s more below about this, but I’ll say that captions can be a good way to make a video that could have dialogue or text translated by the Wikimedia community. Captions are really just a text document that says ‘show this text from this time to this time’ when playing a video. If you do make captions, I highly recommend Amara.org to make your captions after you’ve published your video on a platform like YouTube.com. Once you have captions, you can add them to your video on Wikimedia Commons by clicking the ‘cc’ button and following the links from there. If you want to translate the captions, you can copy the original text with the timecode, paste it into another language within the ‘cc’ menu and translate the text. I’ll note one neat feature of captions on Wikimedia projects is that the text that is shown when the video plays can be made into a hyperlink using normal Wikitext.
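To make the ‘show this text from this time to this time’ idea concrete, here is a sketch that writes cues in SubRip (.srt) style, one widely used plain-text caption format. The cue text and timings are invented, and the function names are my own; a caption tool like the one mentioned above produces this kind of file for you.

```python
from typing import List, Tuple


def timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp style used in SubRip files."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"


def to_srt(cues: List[Tuple[float, float, str]]) -> str:
    """Turn (start, end, text) cues into numbered SubRip blocks."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{number}\n{timestamp(start)} --> {timestamp(end)}\n{text}\n")
    return "\n".join(blocks)


print(to_srt([
    (1.5, 4.0, "Who are you?"),
    (4.5, 8.0, "Where are you from?"),
]))
```

Because the timing lines stay the same in every language, translating captions only means replacing the text lines.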

Part 10: Forking

Because intertitles and subtitles are burned into the final newsreels I make, the language of those videos is locked in (in most cases that’s English). So, I’ve learned to export a version of the videos without any text except for the credits and release that version so that others can fork (remix) a copy into their own language by watching the English version and translating the text.

Here is an example of the Wiki Loves Monuments video, slightly different from the one above:

This is a version with no titles; it is not intended to be viewed by the public.

This particular video has been forked into Georgian, Basque, Spanish, Ukrainian, and Greek.

For the purpose of illustrating a point I made before about why it’s important to use the same frame rate and aspect ratio of the original video you plan to edit, I’m going to show you the Georgian fork of this video:

This version was made with Georgian text, but the frame rate and aspect ratios were not well chosen.

As you can see, the fidelity—that is, how accurately the video has been forked—is poor. The video still gets the basic message across, but probably without the same feeling. It’s very easy to make mistakes in this area. It’s important to experiment early on with your edits to make sure you have chosen the right frame rate, aspect ratio and resolution.

After I have both a version with and without burned-in text, I make sure that both versions link to each other in the metadata under each video. That way it’s somewhat easy to track versions that I or others may create.

I’ll note that you could also add a transcript of your video to the metadata on commons to help others to translate.

Part 11: Keep your original footage and editing files

You never know if you may want to access the original footage again. Maybe you made a mistake somewhere and need to update some small part of your video or maybe you want to make a new version of your video, so I’d recommend that you hold onto all your original footage. Ideally you should make backups too.


Below are a few more examples of silent newsreels I’ve produced for the Wikimedia Foundation. I hope you can learn from watching them what you might need to make one for yourself.

This newsreel is about Wiki Indaba. There are only a few moments when people speak, and most of the information is communicated by intertitles.

The 2016 Wikimedia Hackathon in Jerusalem. The video shooter was given a short list of questions to ask everyone: Who are you? Where are you from? What is a Hackathon and what are you working on? These cover the who, what, when, where, why and how of the event. They interviewed about 6-7 people and then the footage was edited for the best parts of the interviews.

This is a newsreel of a meetup in Baghdad, Iraq to teach how to edit Wikipedia. A version was made in both Arabic and English.

Victor Grigas, Video Production Manager and Storyteller, Communications
Wikimedia Foundation

All videos in this post were produced by Victor Grigas/Wikimedia Foundation, and are freely licensed under CC BY-SA 3.0 and 4.0. Some images used within the videos may have different licenses. See their file pages on Wikimedia Commons for more.

by Victor Grigas at January 11, 2018 02:32 PM

January 10, 2018

Wikimedia Foundation

On the year where “a very fundamental human right—the right to access information” was challenged: Raju Narisetti, Wikimedia Foundation Board member

Photo by Niccolò Caranti, CC BY-SA 4.0.

Raju Narisetti, a veteran media executive and journalist, joined the Wikimedia Foundation Board of Trustees in October 2017. Narisetti is currently CEO of Univision Communications Inc’s Gizmodo Media Group, the publisher of Gizmodo, Jezebel, Lifehacker, The Root, and others, and spends a lot of time thinking about how to reach new audiences and the changing media landscape. We recently asked him about some of the challenges facing information and access in the coming years. Our conversation is below.


You recently said, “There has never been more urgency in Wikipedia’s 16-year history than now, for upholding the values of free exchange of information and knowledge.” Would you be able to further expand on your thoughts and the topics mentioned in this quote?

To me, it is increasingly obvious that we will look back at 2017 as a year where there were systematic and sustained challenges worldwide to a very fundamental human right—the right to access information.

Couple that with the epidemic of false or fake news that we are encountering; the growing challenges of hundreds of millions of people “living” inside seemingly closed information loops; and the relentless exploitation of data being collected passively and actively—and we have the makings of a perfect storm of misinformation, leading to unrest and even potential conflict.

In Wikipedia and the global community that underpins that living repository, we have a proven antidote to many of these problems. So, the onus is on all of us, particularly at the Foundation, to make sure we are in a position to be that resource today, tomorrow, and for years to come, because this issue is not going away anytime soon.


Why did you join the Wikimedia Foundation’s Board of Trustees?

Can I confess and admit that it was a combination of a big dollop of guilt coupled with a lot of excitement? The guilt was really from years of being a consumer of Wikimedia and not contributing to it.

So, when the opportunity to join Wikimedia Foundation’s Board of Trustees was presented, it felt like a way for me to start contributing to the continued success of the community. The anticipation and excitement was around the challenges we face—in reaching new, younger audiences; deepening reader and contributor ties in more countries around the world, including many non-English language-centric nations; in making sure we stay ahead of mobile and other technologies that significantly influence and impact how information is created and distributed. These are all topics where, in nearly three decades of working in journalism across three continents, I have substantial experience, even if I don’t have all the answers.

And I hope I can be of help as the Foundation looks to support the movement in multiple ways and in early stages of a longer term strategic vision and roadmap that makes Wikipedia the bedrock platform for free knowledge. I am in a learning mode and already in constant awe of the durability and mission-driven values of the global Wikipedia community, and the ever-growing scale of what we collectively create and disseminate.


What is, in your view, Wikipedia’s greatest challenge in coming years?

To have the resources and ability to stay free, and to be the place where we willingly put in information and knowledge, and take out a greater understanding, in equal measure. This will require providing the Wikipedia community with the resources and air-cover needed to continue that primary mission, without fear or favor.


In the US, as in many other parts of the world, there is an ongoing discussion about journalism’s future in an ecosystem that’s increasingly dependent on platforms for distribution. Many of the models have been upended in recent years, particularly for publications that have largely been dependent on advertising dollars. How has this changed the media landscape in the US and in the rest of the world?

In many ways, some of these challenges are not very different from some of the challenges we face at Wikipedia. In the media industry, it is around a fundamental change to how we have behaved for over a century—moving from being trusted information gatekeepers to trusted information “gate-openers.”

Thanks to digital technologies, our previously ‘captive’ audiences are now a lot more promiscuous because, with the click of a mouse, the swipe of a finger and increasingly by talking to our devices, we can often go to myriad sources anywhere in the world. As a result, news organizations can no longer simply rely on people coming to their platforms—in print or digital—and now have to go where audiences are, whether it is a social media platform or larger ecosystems such as chat apps, which are increasingly becoming a “closed” proxy for the open internet. That fundamentally alters the business model, one that was previously based on “monetizing” audiences who come to the news brand’s platform. And that has been the existential challenge for media companies in figuring out how to continue to fund the creation of relevant and useful journalism.

For us at Wikimedia, it is also a challenge: how do we take Wikipedia to corners of the digital world instead of always assuming audiences can come to us through search or directly? This will be a major work-in-progress because we do want to make sure we are exposing millions of potential Wikipedia consumers—and hopefully, contributors—to our offerings where and how they use the internet.

Interview by Melody Kramer, Senior Audience Development Manager, Communications
Wikimedia Foundation

The title of this piece has been updated.

by Melody Kramer at January 10, 2018 08:08 PM

Wiki Education Foundation

Help students do an assignment that’s out of this world

This week, we’re at the American Astronomical Society’s (AAS) winter meeting in Washington, DC. We’re encouraging astronomers to engage their students in writing Wikipedia articles for a classroom assignment. By producing Wikipedia content, students begin to learn the nuts and bolts of the online encyclopedia, which teaches them how to use it productively. The assignment engages students in an active learning environment, motivating them to document scientific knowledge that will inform the rest of the world.

Astronomy students should learn how to improve Wikipedia’s coverage of the topic because astronomy is one of the most publicly accessible sciences. Growing up, children often have telescopes and take field trips to planetariums, or they join their peers and family in watching meteor showers or eclipses. When these momentous occasions hit the news, children and adults alike want more information. Wikipedia is where they hope to find the answers they’re looking for. If Wikipedia’s articles are incomplete or written in dense, jargon-filled language, they may not find those answers.

Since attending the AAS meeting in June 2016, Wiki Education has supported several astronomy courses. Those students in our Classroom Program have expanded the coverage of Makemake, super star clusters, ram pressure, starburst regions, exozodiacal dust, and more.

This week, we hope to inspire the next group of instructors to join our efforts to improve Wikipedia’s coverage of science. If you’re attending the conference, please join us in the exhibit hall to learn how we can support you with our free suite of tools. Otherwise, send an email to contact@wikiedu.org or follow the steps here to join our program.

Image: File:Total Solar Eclipse 2017 – Corona and Earthlight (36450742744).jpg, Bernd Thaller, CC BY 2.0, via Wikimedia Commons.

by Jami Mathewson at January 10, 2018 08:00 PM


Talking about the fear of failure

Before we go any further and talk about my strategies to bring more attention to technical translation in Wikimedia projects, I need to touch on a subject that most likely affects every human being: the fear of failure. As I wrote on January 2 in my daily notes, a few days ago I was filled with apprehension and uneasiness as I began to feel the burden of my responsibility as an Outreachy intern and to question how much room for failure I have. Although anxious to put my knowledge and learning into practice, I feared making mistakes and not achieving the expected success. What happens if I do not find the perfect solution, if all my ideas are unsuccessful?

I expressed my fears and insecurities to Johan and Benoît at our last meeting and they, being the good mentors they are, reminded me of several important aspects of the nature of my project and of a professional life as a whole:

This project is not like the others

While defining expected results for projects that involve implementing a particular feature in a piece of software is an easy process, for a project that sets out to find new strategies for recruiting contributors this is somewhat more complicated. It is unfair to expect that I will solve all problems related to technical translation at the Wikimedia Foundation in three months of work, and it is unrealistic to expect that the courses of action I propose will work out to the point where we can set goals in terms of numbers of new translators and increases in the percentage of documentation translated into each language. This leads us to the second point:

This is a research-focused project

And as in any other research elsewhere, results that are considered "negative" remain good results, given the possibility of learning from both failures and successes. That is why, in addition to wanting to write a final report on my experiences, impressions and recommendations, I try to document my work as much as I can - thus, it is possible to better understand the paths that led me to those conclusions.

There is no such thing as a "perfect decision"

When our plans fail, we tend to think that the other options we had at the time of decision could have led us to success. And while it is natural to reevaluate our deliberation process, it is unhealthy to be obsessed with the fantasy of possibility. The decision we make at one point is often the best we can do. We act taking into account the information available and our best judgment at the time.

"You could, for example, get hit by a bus if you had chosen something else," Johan told me. This may seem cynical and overly absurd, but it is a very appropriate statement. There is nothing, absolutely nothing, to assure you that the other options would have been better choices.

Being flexible is necessary

Making a decision does not necessarily mean becoming devoted to it. It's a good idea to set up an evaluation routine to check if you're headed in the right direction to achieve your goals, and there's nothing wrong with reconsidering other options even if you've invested a lot of time. In software development, this is one of the central points of the so-called (and extensively used) agile methodology.

Once again, it is important to note that testing options and recognizing that they are flawed does not mean wasting time and resources. Every action is an opportunity for learning and the result will always be to become a more conscious, experienced and wise person.

The responsibility is shared

I am not alone - my mentors and I are a team. We take action only after we have dialogued with each other and reached a consensus. If I do not agree to something, I am free to express my point of view and counter-arguments. If my ideas have no foundation or are something that has already been done and failed, they will let me know.

This attitude ends up creating a support network. It is not expected of me that I carry the weight of decisions (and their consequences) on my own. "We got your back, don't worry", they both said to me.

As a person who has been watching the technology world dehumanize those who work in the area with absurd working hours and deadlines, the possibility of being honest about my feelings and conflicts with the people who are mentoring me is certainly one of the most important aspects of the experience I'm having in this internship.

The process of growing — or, even better, flourishing — involves much more than having opportunities to put your skills into practice or people who believe in what you can offer. It requires, above all, a great understanding of human nature and an immense respect for its limits. And it is heartening to experience all of this while working with the Wikimedia Foundation.

by Anna e só at January 10, 2018 01:30 PM

Talking about the fear of failure

Before we move on and talk about my strategies for bringing more attention to technical translation in Wikimedia projects, I need to touch on a subject that very likely affects every human being: the fear of failure. As I wrote on January 2 in my daily notes, a few days ago I was overcome by apprehension and unease, because I began to feel the weight of my responsibility as an intern and to question how far I could fail. Although I was eager to put my knowledge and learning into practice, I was afraid of making mistakes and not achieving the expected success. What will happen if I do not find the perfect solution, if all my ideas prove fruitless?

I expressed my fears and insecurities to Johan and Benoît in our last meeting and they, being the good mentors they are, reminded me of several important aspects of the nature of my project and of professional life as a whole:

This is not a project like the others

While defining expected results for projects that involve implementing a given feature in a piece of software is an easy process, it is rather more complicated for a project that sets out to find new strategies for recruiting contributors. It is unfair to expect me to solve every problem related to technical translation at the Wikimedia Foundation in three months of work, and it is not realistic to expect the courses of action I propose to succeed to the point where we can set goals in terms of numbers of new translators and increases in the percentage of documentation translated into each language. This leads us to the second point:

This is a research-focused project

As in any other research in any other field, results regarded as "negative" are still good results, given that we can learn as much from failures as from successes. That is why, besides intending to write a final report on my experiences, impressions and recommendations, I am committed to documenting my work as much as I can; that way, it is possible to better understand the paths that led me to those conclusions.

There is no such thing as a perfect decision

When our plans fail, we tend to think that the other options available to us at the moment of decision could have led us to success. And while it is natural to reevaluate our deliberation process, it is not at all healthy to become obsessed with the fantasy of possibility. The decision we make at a given moment is often the best we could make. We act on the information available and our best judgment at the time.

"You could, for example, get hit by a bus if you had chosen something else," Johan told me. This may seem cynical and overly absurd, but it is a very apt remark. There is nothing, absolutely nothing, to guarantee that the other options would have been better choices.

Being flexible is necessary

Making a decision does not necessarily mean becoming devoted to it. It is a good idea to set up a routine of evaluations to check whether you are headed in the right direction to reach your goals, and there is nothing wrong with reconsidering other options even after investing a lot of time. In software development, in fact, this is one of the central points of the much-discussed (and extensively used) agile methodology.

Again, it is important to stress that testing options and recognizing that they are flawed does not mean wasting time and resources. Every action is an opportunity to learn, and the result will always be becoming a more conscious, experienced and wise person.

The responsibility is shared

I am not alone: my mentors and I are a team. We act only after talking with one another and reaching a consensus. If I disagree with something, I am free to express my point of view and counter-arguments. If my ideas have no foundation at all, or are something that has already been tried and failed, they will let me know.

This attitude ends up creating a support network. I am not expected to carry the weight of decisions (and their consequences) alone. "We are with you, don't worry," they both told me.

As a person who has watched the technology world gradually dehumanize those who work in the field with demands for ever more absurd working hours and deadlines, the possibility of being sincere about my feelings and conflicts with the people who are mentoring me is certainly one of the most important aspects of the experience I am having in this internship.

The process of growing (or, better still, of flourishing) involves much more than having opportunities to put your skills into practice or people who believe in what you can offer. It demands, above all, an enormous understanding of human nature and an immense respect for its limits. And it is heartening to experience all of this while working with the Wikimedia Foundation.

by Anna e só at January 10, 2018 01:30 PM

Gerard Meijssen

#Wikipedia - fiduciary responsibilities for #Wikipedia #Medical

Retraction Watch has an article that is very relevant to one of the most important resources for medical information: Wikipedia. Its title: “A concerning – largely unrecognised – threat to patient safety:” Nursing reviews cite retracted trials. It is a follow-up interview with Richard Gray, the principal author of an article in the International Journal of Nursing Studies.

Given that Wikipedia is the most read resource by medical practitioners, the interview has many relevant pointers on ensuring safe practices. I quote them from the paper and with some modifications they apply to any and all sources used in Wikimedia content.

  1. A retraction filter (or whatever mechanism the database in question allows) must be applied to the end output of any search strategy.
  2. Journals/databases must make retractions more visible (step 1 above depends on it).
  3. Collaborations (e.g. Cochrane, Campbell, The JBI) need to incorporate into their handbooks directives around retraction. For example, a scan for retractions after data sourcing; a scan for retractions before data extraction; a scan for retractions before review submission.
  4. The reporting guidelines for systematic reviews (Preferred Reporting Items for Systematic Reviews and Meta-Analyses, PRISMA) needs to include an item stating that authors have checked if any included studies have been retracted.
  5. Journal editors should require authors, when submitting manuscripts, to confirm that they have checked that none of the included studies have been retracted. Authors should also include a statement in the paper stating they have done this.
  6. Proofreaders may also have an important role to play. For example, authors of one review included in their reference list a citation that clearly indicated the reference was for a retracted paper. Proofreaders could be trained to spot and report these anomalies.
Registering retractions in Wikidata would be a start.
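As a rough illustration of the first recommendation (a retraction filter applied to the end output of a search), here is a minimal sketch. The `Citation` type, the sample identifiers and the idea of a plain list of retracted identifiers are all assumptions made for the example; it does not describe how any particular database or Wikidata exposes retractions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    identifier: str  # hypothetical: any stable identifier, e.g. a DOI
    title: str


def split_retracted(citations, retracted_ids):
    """Split search results into (kept, flagged) using a set of retracted identifiers."""
    # Normalise case and whitespace so trivially different spellings still match.
    retracted = {i.strip().lower() for i in retracted_ids}
    kept, flagged = [], []
    for citation in citations:
        if citation.identifier.strip().lower() in retracted:
            flagged.append(citation)
        else:
            kept.append(citation)
    return kept, flagged


# Made-up search output and retraction list, for illustration only.
results = [
    Citation("10.1000/example.1", "Trial A"),
    Citation("10.1000/EXAMPLE.2", "Trial B"),
]
kept, flagged = split_retracted(results, ["10.1000/example.2"])
print("keep:", [c.title for c in kept])
print("check before citing:", [c.title for c in flagged])
```

Flagged items are returned rather than silently dropped, so that a human can decide what to do with them.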

by Gerard Meijssen (noreply@blogger.com) at January 10, 2018 12:48 PM

#Wikidata - Rachael E Jack; Spearman medal winner

On Facebook I mentioned a 2016 blog post about the Spearman Medal. I checked for missing entries; they were the two 2017 award winners, Mrs Claire Haworth and Mrs Rachael E Jack.

Adding award winners to Wikidata is something I do regularly. It always starts with a search. Mrs Jack was known as "Rachael Jack" on Wikipedia and by drilling down into the ORCID information I found confirmation that this is indeed the same person.

Mrs Haworth is known to ORCID as well, and through a link to a profile, there was a confirmation that it was the same person; the award winner of the Spearman medal.

Typically I do not spend that much time on red links. What I wanted to know is the value of the network. Given the titles of publications known at ORCID, some of the publications of Mrs Haworth could already be found in Wikidata and were linked.

Thanks to all the work done on scholarly publications, scaffolding information for Wikipedia articles becomes available. These two ladies are notable if only because of being recipients of the Spearman medal.

by Gerard Meijssen (noreply@blogger.com) at January 10, 2018 12:16 PM

January 09, 2018

Wikimedia Tech Blog

“We keep the servers going … and much more”: Recent highlights from our Technology department

Photo by Victor Grigas/Wikimedia Foundation, CC BY-SA 3.0.

The most important responsibility of the Wikimedia Foundation’s Technology department is to “keep the servers running”: to operate the computers that provide Wikipedia and other Wikimedia sites.

But running the servers is only a fraction of the work our eighty-person team does. The Technology department also provides a variety of other essential services and platforms to the rest of the Wikimedia Foundation and to the public. In this post, we’ll introduce you to all of the programs that make up the Technology department, and highlight some of our work from the past year.

The 18 members of the Technical Operations team maintain the servers and run the wikis.  Over the last year, the team delivered an uptime of 99.97% (according to independent monitoring) across all wikis. The team also practiced switching the wikis back and forth between data centers, so that the sites are resilient in case of site failure.  This year, they performed the second ever switchover from primary to secondary data center and back, and more than doubled the speed of the switchover (more information).
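To put the uptime figure in perspective, the small sketch below converts an uptime percentage into hours of downtime. It assumes a 365-day year and that the percentage covers the whole year; the function name is invented for the example.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Hours of unavailability implied by an uptime percentage over a period."""
    return period_hours * (100.0 - uptime_percent) / 100.0


hours = downtime_hours(99.97)
print(f"{hours:.2f} hours of downtime per year")
```

Under those assumptions, 99.97% uptime corresponds to a little over two and a half hours of downtime across the year.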

The ten-person Fundraising Technology team is responsible for the security, stability, and development of the Wikimedia Foundation’s online donation systems. Millions of relatively small donations (average of about $15 per transaction) make up the majority of the Wikimedia Foundation’s operating budget every year.  The team maintains integration with 6 major payment processors, and several smaller ones, enabling online fundraising campaigns in approximately 30 countries each year. The team also maintains donor databases and other tools supporting fundraising.

You may have noticed that saving edits on Wikimedia got faster last year.  For this, credit the Performance team.  Last year, they tackled technical debt, and focused on the most central piece of our code’s infrastructure, MediaWiki Core, looking for the highest-value improvements to make the biggest performance impact. The four-person team was responsible for 27% of all contributions to MediaWiki Core last year (source).  Their biggest success was reducing the time to save an edit by 15% for the median and by 25% for the 99th percentile (the 1% slowest edit saves). This is a performance improvement directly felt by all editors of our wikis.
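To make the median and 99th-percentile figures concrete, here is a small sketch of how such numbers could be computed from a list of save timings. The sample timings and function names are made up for illustration and are not the Performance team's data or tooling; the nearest-rank method is one common percentile definition and is assumed here.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(values)
    # Multiply before dividing so integer inputs stay exact.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def percent_reduction(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (before - after) / before * 100


# Hypothetical save times in milliseconds (100 samples each), NOT real measurements.
before = [10 * i for i in range(1, 101)]  # 10, 20, ..., 1000
after = [8 * i for i in range(1, 101)]    # every save 20% faster

for p in (50, 99):
    b, a = percentile(before, p), percentile(after, p)
    print(f"p{p}: {b} ms -> {a} ms ({percent_reduction(b, a):.0f}% faster)")
```

Reporting both values matters because the median describes a typical save, while the 99th percentile describes the slowest saves, which are often the ones editors notice most.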

The eight people on the Release Engineering team (RelEng) maintain the complicated clusters of code and servers needed to deploy new versions of MediaWiki and supporting services to the servers and to monitor the results.  Last year they consolidated to a single deployment tool, which we expect to permanently reduce the cost of Wikimedia website maintenance.  A creeping increase in maintenance costs is a major Achilles’ heel (“a weakness in spite of overall strength, which can lead to downfall”) of complex websites, so any improvement there is a major victory.

It’s hard to know if you are improving something if you can’t measure the improvement, and you can’t measure improvements to something you aren’t measuring in the first place.  For example, the English Wikipedia community is experimenting with a different model for creating articles (ACTRIAL), and will need reliable data to know what the result of the experiment actually is.  The seven-person Analytics Engineering team builds and supports measurement tools that support this and many other uses, while working within the Privacy Policy and the values of the movement that constrain what data can be collected. The team is working on new initiatives to process data in real time that, for example, enable the fundraising team to get same-day turnaround on questions about fundraising effectiveness.  One of the main projects this year is Wikistats 2.  Wikistats has been the canonical source of statistics for the Wikimedia movement since its inception. Wikistats 2 has been redesigned for architectural simplicity, faster data processing, and a more dynamic and interactive user experience. The alpha for the UI and new APIs was launched in December 2017. Although the tool and APIs are geared to the community, anyone can use Wikistats UI and APIs to access information about Wikipedia.

Picture of the Wikistats UI, accessible at http://stats.wikimedia.org/v2.

The Initiative for Open Citations, one of the projects led by the Research team, made citation data from 16 million papers freely available. Thanks to this grassroots initiative, availability of citation data went from 1% to over 45% of the scientific literature.  We created a coalition of 60+ partner organizations, funders, publishers, tech platforms supporting the “unconstrained availability of scholarly citation data”.  This data is actively being reused by volunteers in Wikidata.  The Research team comprises six people, working on this among many other projects.

A list of organizations that support the Initiative for Open Citations. Logos drawn from the organizations identified, with most being non-free and/or trademarked.

The six people on the Search team work to make it easier to find information on MediaWiki sites.  Recently they have been focusing on integrating machine learning to drastically reduce the time needed to tune search results.  The hypothesis is that it should go from taking 2 or 3 days of manually tweaking interdependent algorithms to taking an hour or two to set up new search ranking features in a given model. We are focusing on automating as much of this process as possible by the end of Q2.  While we are too early in the deployment phase to enjoy a significant time reduction, we do know that machine learning is already showing a 5-6% improvement in search result clickthroughs, according to our initial tests on the English Wikipedia, using the same set of features we have been manually tuning up to this point (analysis).  Ultimately, we will be able to deploy machine learning models on a per-wiki basis, whereas our manual approach is tuned against the English Wikipedia only and applied across the board based on that.

While traffic on Wikimedia sites is still served overwhelmingly as pages, we also serve high volumes of traffic via APIs. The Action API supports users such as bots, the mobile apps, and large content consumers like Google.  The architecture of MediaWiki itself is also changing so that many “internal” human-facing features, such as VisualEditor and Page Previews, are also powered under the covers by the same API service; in this model, Wikimedia provides raw data and shifts the rendering burden to the client, i.e. VisualEditor or other in-browser tools, the mobile apps, or the third-party user.  This infrastructure is the responsibility of the three people in the Services Platform team.  This year they released version 1.0 of the REST API.  This new API was engineered to support high-volume content and data access, as well as new use cases such as VisualEditor and section-level editing.  It was growing rapidly even before version 1.0 (see chart).  In the first 27 days of September 2017, it served 14.3 billion requests, compared to 14.9 billion for the Action API.  The release of version 1.0 signals commitment and maturity to the API user community commensurate with growth in traffic, and prepares WMF sites for high-volume API-driven experiences.

Graph, public domain.

This list of recent highlights captures only a fraction of the output of the Technology Department’s programs. In future blog posts, we’ll talk more about what the department is working on, what it’s planning to do, and how you can participate in the open-source production of MediaWiki software and the sustainment of the Wikipedia family of websites.

Victoria Coleman, Chief Technology Officer, Wikimedia Foundation, with assistance from many members of the Foundation’s Technology department.

This post has been updated to shorten the conclusion.

by Victoria Coleman at January 09, 2018 10:57 PM

This month in GLAM

This Month in GLAM: December 2017

by Admin at January 09, 2018 09:45 PM

Wiki Education Foundation

Visiting Scholar draws upon her archival expertise to improve Wikipedia

If you visited Wikipedia’s main page on December 26, you may have seen that the Featured Article of the day was a fascinating entry about the Canadian Indian residential school system. Danielle Robichaud, a Digital Archivist at the University of Waterloo Library, began work on that article while she was a Wikipedia Visiting Scholar at McMaster University in 2015-16, and continued to develop it afterwards until it was promoted to Featured Article in August. In this blog post, Danielle shares her experiences, insights, and perspectives as a Visiting Scholar contributing to this and several other topics.

Danielle Robichaud

I was attracted to McMaster’s Visiting Scholar (VS) position because it centered the university’s special and archival collections. I had been editing regularly for about two years and was increasingly curious about the relationship between Wikipedia and archival collections. Well versed in linking to archival holdings from the External links section of relevant pages, I was ready to explore how those same holdings might drive page creation or quality improvements to existing pages. That the Visiting Scholar program is rooted in the freedom to edit based on personal interests and expertise solidified my decision to apply.

Although I worked on a variety of pages during my VS position, the bulk of my edits focused on people and events that were represented in McMaster’s archival holdings but were poorly represented or underrepresented on the site. My approach to page improvement and creation was informed by archivists and librarians who had previously worked to integrate information about special collections into Wikipedia. Combs (2011), for example, outlined how to move beyond spam and conflict of interest challenges by establishing two guidelines: “add content whenever possible, not just as a link; and only link if our collection (as represented by our finding aid) has something unusual or significant to offer.” A similar editing guideline has been put into place at the Archives of American Art where, as much as possible, contributions “must serve the goal of making the encyclopedia better.” (Snyder 2014)

Page improvements tied to archival practice

The page for Lady Constance Malleson was the first to get my editing attention. Created in 2003, it positioned her relationships with other people as grounds for notability rather than her achievements as an actor, writer and activist who also happened to be the wife of Miles Malleson and the one-time mistress of Bertrand Russell. Using subscription newspaper databases, Internet Archive holdings and Google Book previews, I worked to expand the page to shift its focus to Malleson in her own right so that her professional accomplishments and agency in her relationships are no longer tangential. In fact, one may now be inspired to make use of Malleson’s personal papers to better understand her life and professional achievements rather than simply make note of her as someone’s romantic interest.

Another page I worked on was that of Louise Bennett-Coverley, which reflected an area where archivists can meaningfully contribute to Wikipedia: writing quality. When I began working on the page I immediately noted that, in addition to lacking sufficient inline citations, the bulk of the page was plagiarized. To improve it I drew on my training as an archivist, which includes writing biographical sketches to provide researchers with an overview of the major milestones and achievements of a person’s life. Conveniently, the process closely mirrors the development of a biographical Wiki page, confirming my belief that archivists are well positioned to create and improve biographical content given that our professional training centers the relevant skills. Further, our ability to do so is often well supported by the reference collections, including out-of-print or yet to be digitized biographical resources, that many archives maintain as part of collection development processes.

Making a difference through administrative engagement

Beyond editing achievements, a primary benefit of my involvement with the VS program was that it prompted me to engage more actively with the administrative aspects of Wikipedia. One example was my involvement with the development of the Cite archive template. I contributed feedback about the draft template; ensured that archival descriptive terminology from outside of the United States was integrated into the available fields; and figured out how to reference primary resources without overstepping No original research guidelines. I hit on the approach while working on Marian Engel’s page where I used the template as a way to complement multiple secondary references that made note of a letter of support written by Robertson Davies. By using the Cite archive template to point to a digitized version of the letter, positioning it after two secondary resources preferred in established referencing guidelines, I helped provide immediate access to the letter while pointing to archival holdings as a source of relevant information and further reading.

Another example of engagement with Wikipedia administration was learning to navigate page reviews ranging from requesting basic page quality assessments to moving through the peer review process. I drew on this experience to continue working on the Canadian Indian residential school system after the end of my VS term. I approached my editing through a framework of reconciliation[1] informed by the Truth and Reconciliation Commission of Canada (TRC) Calls to Action (PDF) and Indigenous scholars, including Métis writer and lawyer Chelsea Vowel (2015), who have repeatedly called on Canadians to read the Executive Summary of the TRC and do the work of educating themselves about the residential school system.

Ultimately my aim was to participate in the reconciliation process by holding myself accountable as a white settler with what I recognized as a superficial understanding of the residential school system. By making the decision to move beyond guilt and defensiveness to an action-oriented view of reconciliation I was able to focus on an area where I knew I could make a difference: creating a reliable page to help raise awareness about the school system and facilitating the retrieval of resources by others seeking to improve their own understanding of its impact. Further, while working to improve the page I was exposed to the scornful and undermining attitudes of other editors regarding Indigenous peoples. The prevalence of these viewpoints underscored the importance of having a well referenced and peer reviewed page approved by the Wikipedia community available for those seeking to move beyond an understanding of the school system rooted in colonial nostalgia and apathetic dismissal. Particularly when for some it may well be the only page they ever read.

Thanks to my involvement with the VS program I was able to confidently make substantive improvements to the clarity, readability, and focus of the existing page eventually securing it Featured Article status. On December 26th it appeared on the Main Page, just in time to round out the list of Canadian-focused pages that were featured during the country’s 150th anniversary. It’s an accomplishment I could not have achieved without my experience as a VS, the thoughtful and constructive contributions of other editors, including fellow librarians and archivists, or the tireless and unyielding work of residential school survivors and their families to hold Canadians accountable for their actions past and present.

[1] Editing as an act of reconciliation is an approach Krista McCracken and I spoke further about at the 2017 annual conference of the Archives Association of Ontario during a talk titled Collaborative archival practice: Rethinking outreach, access, and reconciliation using Wikipedia.


Combs, M (2011). “Wikipedia as an access point for manuscript collections.” A Different Kind of Web: New Connections Between Archives and Our Users. Kate Theimer. Society of American Archivists. 139–147.

Snyder, Sara (2014). “Wikipedia Is Made of People! Revelations from Collaborating with the World’s Most Popular Encyclopedia.” Outreach: Innovative Practices for Archives and Special Collections. Kate Theimer. Rowman & Littlefield. 91-106.

Vowel, Chelsea (2015). Read the Truth and Reconciliation Report Before You Form an Opinion.

Image: File:Quappelle-indian-school-sask.jpg, public domain, via Wikimedia Commons. 

by Guest Contributor at January 09, 2018 04:27 PM

January 08, 2018

Wikimedia Foundation

Inspire New Readers campaign: Raise awareness of Wikipedia where you live

Photo by Victor Grigas, CC BY-SA 3.0.

Did you know that only 33% of internet users in India have heard of Wikipedia? There are many other “low-awareness” regions like it around the world. For example, only 19% of internet users in Iraq and 39% in Brazil know about Wikipedia. If you lived in one of these places, what would you do to attract new readers for Wikipedia?

We want to hear your ideas! From January 8 to February 4, we will be running a new crowdsourcing campaign: Inspire New Readers. The goal of this campaign is to come up with ideas about how to increase awareness of Wikipedia where you live. Over the next month, share your ideas, discuss with others and plan a new project on the Inspire campaign page on Meta. After the campaign, grants are available to turn these ideas into collective action. For projects that do not need funding, planning and logistical support is available.

Why new readers?

The goal of this campaign is to come up with ideas about how to increase awareness of Wikipedia. It’s about bringing new readers into the movement who may have never used Wikipedia before and helping them understand the incredible work thousands of volunteers do to build the world’s largest online source of free knowledge.

This campaign is one of a series of Wikimedia Foundation projects that aim to increase awareness of Wikimedia projects. This work is vitally important: we know, based on recent research, that awareness of Wikipedia differs around the world. In the United States and Western Europe, an average of 85 percent of internet users have heard of Wikipedia, but that number drops sharply elsewhere around the globe. Research also shows that only 33 percent of internet users in India, 19 percent of internet users in Iraq, and 39 percent of internet users in Brazil have heard of Wikipedia.

Slide from presentation by Zack McCune/Wikimedia Foundation, CC BY-SA 4.0.

Why is awareness important? Awareness is the first step in building new users, support, and ultimately participation in Wikimedia projects. We know that low awareness of Wikipedia is associated with low usage, and without usage people will never become contributors or advocates for free knowledge. Access to knowledge is a universal human right. By focusing on increasing participation, we are both working towards open access to information, and, most importantly, for a more diverse source of open knowledge, where everyone has equal footing in documenting history.

Join the campaign!

Inspire Campaigns are month-long events to focus collaborative efforts on some of the most pressing challenges of the Wikimedia movement. This is a time to share and create new ideas, and there are many ways to participate: you can contribute your own ideas, give feedback on other people’s ideas, and sign up as a volunteer to help with other participants’ projects.

Resources are available to help you think through new ideas. You can find two videos explaining recent efforts that focus on awareness of Wikipedia in Nigeria and India on the campaign page.  We will also be hosting two workshops: one on how to think about awareness, and another one on how to plan a pilot. Find the details and sign up to attend the workshops here.

Join the Inspire New Readers campaign and help us to bring the joy of Wikipedia to new readers around the world.

María Cruz, Communication and Outreach Project Manager, Learning and Evaluation
Wikimedia Foundation

This Inspire campaign is being led by the Wikimedia Foundation’s Community Resources team with support from its New Readers team, which includes folks from Audiences, Communications, and Partnerships. 

Submit your proposals!

by María Cruz at January 08, 2018 05:22 PM

Wiki Education Foundation

Sustaining the impact of the Year of Science

In 2016, Wiki Education ran an initiative called the Year of Science, dedicated to improving science content on Wikipedia. We nailed our goals for the year-long campaign: nearly 5 million words of science content added to more than 5,700 science articles on Wikipedia by more than 6,300 students.

In 2017, we had surpassed the impact of the “Year of Science” by early October.

That’s right: In 2017, Wiki Education’s programs added more than 7.68 million words of science content to Wikipedia. More than 8,500 students — enrolled in 383 science courses — improved 9,310 science articles, including creating nearly 1,000 new articles. For context, 7.68 million words is 17.5% of the words in the last print edition of Encyclopædia Britannica. These dramatically larger impact numbers highlight the sustainable nature of Wiki Education’s Classroom Program, the cornerstone of 2016’s initiative.

In our Classroom Program, higher education faculty assign students to improve course-related Wikipedia articles. Throughout 2016, we set out to increase the number of science courses in the program as part of the Year of Science. Our goal was to both improve the science content on Wikipedia during the initiative, as well as to build a network to sustain the impact of the Year of Science for years to come.

“Sustaining Science,” our shorthand for the ongoing initiative to further the impact of the Year of Science, got off to a roaring start in 2017. Here’s why: As part of the work we did in 2015 and 2016 to attend science-focused academic conferences, we collected contact information for a lot of science instructors. Some were eager to incorporate Wikipedia assignments into the next course they taught; those 150 first-time science instructors participated in the formal Year-of initiative in 2016. But many weren’t teaching a Wikipedia-appropriate course in 2016, or had another project lined up for their course already. These are among the 178 science instructors who taught with Wikipedia for the first time in 2017.

It’s not just the new instructors, however: 97% of our instructors report they will teach with Wikipedia again. That means the new instructors we bring on to teach science with Wikipedia tend to return year after year, bringing more and more students and improving more and more articles on Wikipedia. We also have a healthy word-of-mouth recruiting pipeline; since our instructors see the value in teaching with Wikipedia, they often encourage colleagues to participate in our program as well.

Finally, partnerships we established with academic associations in the sciences (such as with the American Chemical Society, American Society of Plant Biologists, and Midwest Political Science Association) continue to provide a network to reach faculty in science disciplines. These partners encourage their members — many of whom teach science courses — to participate in our programs.

Our retention, ongoing recruitment, and partnerships foster a sustainable base of science courses to continue the impact we’ve had on improving science content on Wikipedia for years to come. For more information, see our Sustaining Science page.

by LiAnna Davis at January 08, 2018 05:17 PM

January 06, 2018

Weekly OSM

weeklyOSM 389



Lighthouses of Europe (OpenStreetMap data) 1 | © OpenStreetMap Contributors © map_creator

About us

  • We join with the OSMF in wishing everyone a Happy New Year for 2018! Your feedback and engagement with OSM Weekly are an important source of motivation for the members of our team. We also welcome bjoern_m and Tordanik as members of the team. Of course, as ever, we are still looking for interested people who want to participate.


  • Bryan Housel asks for feedback on whether self-intersecting lines should always generate an error in the iD editor. You can comment on the GitHub issue here.
  • Volker Schmidt wonders why emergency bays are tagged in a different way from bus bays, even though they are physically very similar.
  • The German forum fails to agree on whether addresses should be tagged on the entrance or the building outline, but in a lively discussion (automatic translation) the various possibilities are mentioned.


  • Daniel Koć asks on the OSM-talk mailing list how to find new contributors for the OSM-carto map style. There is already a list of Issues for beginners.
  • SomeoneElse responds with a blow-by-blow account of what is involved in creating a successful pull request for OSM-Carto.
  • The 6th edition of State of the Map France will take place in Bordeaux, on 1st-3rd June 2018.
  • Researchers from the University of Minnesota have investigated (pdf) the mapping behaviour of OpenStreetMap users. They claim that the geographical bias of mapping contributions is innate (“born, not made”).
  • James Derrick reflects on how mapping, software & technology have changed in the 10 years he has been contributing to OSM.
  • Joost Schouppe has created an analysis of the OSM Notes. Among other things, he discusses the effects of Maps.me.
  • On Reddit “map_creator” has shared a map of lighthouses mapped in OpenStreetMap in Europe. Some feedback follows about which lighthouses are not mapped yet and who to contact.

OpenStreetMap Foundation

  • The OSMF rejoices about a bitcoin donation of £27k, and refers to this donation path.
  • The OpenStreetMap Foundation welcomes OpenStreetMap France as an official local chapter in France.

Humanitarian OSM

  • In an OSM diary entry, Tasauf writes about the contributions and impact of BHOOT (Bangladesh Humanitarian OpenStreetMap Operations Team), which was formed as an initiative of BOIL (Bangladesh Open Innovation Lab) to support HOT (Humanitarian OpenStreetMap Team) in its activations for disasters and humanitarian emergencies around the globe, including in Bangladesh.


  • Michael Spreng announces (de) the availability of an OSRM installation on routing.openstreetmap.de, with routing profiles for cyclists, pedestrians and cars. The server is funded by FOSSGIS, the German local chapter of OSMF. Michael emphasizes that this routing service is run by the community, rather than as a commercial operation.
  • On his GitHub pages, marcusyoung provides a tutorial for OpenTripPlanner, with special reference to the use of isochrones.


  • Mapzen will cease operations at the end of January 2018. The migration guide lists all projects which will continue existing. According to a tweet and the project description, the Valhalla developers are now working for Mapbox (Mapbox already has a routing engine – OSRM).
  • Stefan Keller presented a tool on the Swiss mailing list talk-ch to make it easier for business owners to enter their data on OSM and keep it up to date. He asks for feedback.
  • With its Google Impact Challenge (GIC) program in Germany (international site), Google offers the opportunity (translation) to raise money for a coding project. The deadline for submissions is January 10th, 2018.
  • OpenStreetMap plans to apply once again as a mentoring organisation in the 2018 edition of Google Summer of Code.

Did you know …

  • … that Belgium and the Netherlands ratified a change in their borders to align them again with the river Maas (Meuse), which was straightened back in 1961? It goes without saying that OSM is already up to date. The Netherlands gained 13.28 ha. Both countries stand to gain, though, as police interventions and land maintenance are simplified.
  • … the information on the wiki about the 9th German OpenStreetMap and the 13th FOSSGIS conference in Bonn in March 2018, especially the information about OSM Saturday?

Other “geo” things

  • GEOBIA’18 will take place in Montpellier on June 18th-22nd. The platform is intended to promote scientific research and development of global Earth problems worldwide, with a particular focus on Big Data Integration and Open Source solutions, and to support selected young researchers.
  • The first open source satellites orbit in space! The system was presented (de) (Google Translate) at the 34C3 (the biggest German hacker conference) in Leipzig.
  • Watson reports about the Swiss Confederation map, which shows all current departures and delays of public transport in Switzerland. The data can be queried using this API.
  • What would London’s “Overground” railway network map look like if it was designed like the map of Berlin’s S-Bahn? Here you go. It makes a change from everything being orange.

Upcoming Events

Where | What | When | Country
Passau | Niederbayerntreffen | 2018-01-08 | Germany
Lyon | Rencontre libre mensuelle | 2018-01-09 | France
Nantes | Réunion mensuelle | 2018-01-09 | France
Rennes | Réunion mensuelle | 2018-01-10 | France
Berlin | 115. Berlin-Brandenburg Stammtisch | 2018-01-11 | Germany
Tokyo | 東京!街歩き!マッピングパーティ:第15回 品川神社 | 2018-01-13 | Japan
Kyoto | 幕末京都マッピングパーティ#01:京の町家と大獄と | 2018-01-13 | Japan
Brisbane | Brisbane Photo Mapping | 2018-01-13 | Australia
Cologne | Köln Stammtisch | 2018-01-17 | Germany
Toulouse | Réunion mensuelle | 2018-01-17 | France
Leoben | Stammtisch Obersteiermark | 2018-01-18 | Austria
Rome | FOSS4G-IT 2018 | 2018-02-19 to 2018-02-22 | Italy
Cologne Bonn Airport | FOSSGIS 2018 | 2018-03-21 to 2018-03-24 | Germany
Poznań | State of the Map Poland 2018 | 2018-04-13 to 2018-04-14 | Poland
Bordeaux | State of the Map France 2018 | 2018-06-01 to 2018-06-03 | France
Milan | State of the Map 2018 (international conference) | 2018-07-28 to 2018-07-30 | Italy

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Nakaner, Polyglot, SK53, SeleneYang, SomeoneElse, Spanholz, Tordanik, derFred, jcoupey, jinalfoflia.

by weeklyteam at January 06, 2018 11:46 PM

Wikimedia Cloud Services

Ubuntu Trusty now deprecated for new WMCS instances

Long ago, the Wikimedia Operations team made the decision to phase out use of Ubuntu servers in favor of Debian. It's a long, slow process that is still ongoing, but in production Trusty is running on an ever-shrinking minority of our servers.

As Trusty becomes more of an odd duck in production, it grows harder to support in Cloud Services as well. Right now we have no planned timeline for phasing out Trusty instances (there are 238 of them!) but in anticipation of that phase-out we've now disabled creation of new Trusty VMs.

This is an extremely minor technical change (the base image is still there, just marked as 'private' in OpenStack Glance). Existing Trusty VMs are unaffected by this change, as are present ToolForge workflows.

Even though any new Trusty images represent additional technical debt, the WMCS team anticipates that there will still be occasional, niche requirements for Trusty (for example when testing behavior of those few remaining production Trusty instances, or to support software that's not yet packaged on Debian). These requests will be handled via Phabricator requests and a bit of command-line magic.

by Andrew (Andrew Bogott) at January 06, 2018 11:31 PM

Gerard Meijssen

#Wikidata - Lindsay McLaren; #science under attack

Mrs McLaren is the first author of a paper about the (negative) effects of the cessation of adding fluoride to drinking water. Retraction Watch mentioned the aggressive attacks on Mrs McLaren by people opposed to the addition of fluoride to tap water, and it refers to an article in the National Post.

Adverts bought from Google may give the impression that something is wrong with the science. Not so. Reason enough to put some positive spin on Mrs McLaren and provide her with an item in Wikidata. You may consider this to be an invitation to write a Wikipedia article.

by Gerard Meijssen (noreply@blogger.com) at January 06, 2018 03:59 PM

Noella - My Outreachy 17

Next Move!!

The next move requires that no past mistakes be repeated.

I am getting to the 5th week of the journey, and the more time goes by, the more serious the work becomes. Over the few weeks that have gone by already, I have improved my organisational skills and have learnt how to push work fast enough to meet deadlines. For the next move I am also taking on the next task. My work is divided into six tasks as per my timeline, and I am on the second now 😟. I have to press the acceleration button if I want to meet my deadlines, but that's not a big deal when you have mentors like mine.

Since coding started I have kept on learning. I am becoming more comfortable with the code base, and I think accelerating things a bit will not hurt and will instead help. Let's see in one week's time what the output will be :).

by Noella Teke (noreply@blogger.com) at January 06, 2018 02:04 AM

January 05, 2018

Wiki Education Foundation

Join linguists as they preserve languages on Wikipedia

In November 2015, Wiki Education announced a partnership with the Linguistic Society of America (LSA) to bring linguistics instructors and students into our programs. The idea was to combine Wiki Education’s successful programs with LSA’s visibility among linguists, inviting their members to teach with Wikipedia or host a Wikipedia Visiting Scholar.

Since that time, we have worked with a staggering 57 linguistics courses with over 900 students who have added 804,000 words to Wikipedia. That’s nearly a million words that undergraduate and graduate students have contributed to the public scholarship of linguistics, amplifying the reach of the sources they cite.

This week, Wiki Education is at the LSA conference in Salt Lake City, where we’re looking to get even more courses involved. We’re speaking with university instructors about the power of adding content about linguistics and language to Wikipedia, where the general public gets information.

When these instructors join Wiki Education’s Classroom Program, they’ll assign students to research course-related topics, synthesize what they learn, and write a new Wikipedia article or expand an existing one. They’ll join a committed group of linguists and aspiring linguists aiming to preserve and document languages and peoples from around the world. Students have already significantly expanded Wikipedia’s coverage of topics like Blackfoot language, Abui people, Tagish language, Okinawan language, and Kutenai language. Students are also educating the masses about important linguistic concepts like fluency, variation, language immersion, and the feminization of language.

If you’re at LSA this week, stop by the exhibit hall to see us, and we can begin working out the details of your next Wikipedia assignment. You can also email us at contact@wikiedu.org.

by Jami Mathewson at January 05, 2018 09:49 PM

Engaging engineering students in the humanities

Dr. Kathleen Sheppard is Associate Professor in the Department of History and Political Science at Missouri University of Science and Technology. This fall, she conducted a Wikipedia assignment in her course, History of Science in Latin America. In this post, she reflects on the experience.

Image: File:Kathleen Sheppard WEF blog.jpg, Kathleen Sheppard, CC BY-SA 4.0, via Wikimedia Commons.

Every semester, I teach a survey course in the history of science.  I usually teach a European-focused survey that extends, as historians term it, from Plato to NATO, or from ancient times to the present.  However, in the Fall 2017 semester, due to the opportunity given to me through being part of two grants—one from the Department of Education to establish a minor in Latin American Studies with Technical Applications, and the other through the National Endowment for the Humanities: Humanities Connections—I got to teach the history of science in Latin America.  Since I’m in the history department at an engineering university, most of my students are engineering majors, a situation which presents unique challenges in getting them interested in course topics. Out of 26 students in this history of science class, 25 were engineers and 17 were women.  For the last 4 years I have consistently used some sort of blended format for my lecture courses.  These usually consist of applying gamification or simply using a modified discussion format—mainly because I did not have time to teach engineers how to write (and I did not want to grade) a longer term paper.

Thanks to the encouragement by colleagues at other institutions, this semester I used the resources from Wiki Education to organize a new type of project for my students.  Since the history of science in Latin America is not something that a lot of historians of science study or teach, I decided to have students complete the full twelve-week assignment for editing and adding to Wikipedia articles.  It turned out to be an engaging, fun, semester-long project for my students, and I plan to use the assignment again in the Spring for the European history of science survey.

I’m not really a theorist, but as a historian I always need to answer the “why” question.  Namely: it’s great to use new tools in class, but how do they enhance student learning?  First, I wanted students to learn to think and write critically.  Term papers can do that, but Wiki Education has online trainings that allow students to learn the process of writing through an exercise that has real-world consequences.  The second reason, related to the first, is the failure of pseudotransactionality. Pseudotransactionality is the practice of having students pretend to write a letter to an employer, a newspaper article, or even a tweet. It is a real process, but with an artificial end; they know this, so they tend not to work that hard at it. Students, and in my experience especially the non-humanities engineering majors, think that the reason for writing in many classes—for the professor to see, grade, and stuff in a file to be forever lost to the void—is a waste of time.  However, I drove home the point that writing for Wikipedia is a real transaction between the student and the real-world reader. That reality—what I called the power of public shame—is a powerful motivator for my students.  I imagine it is for many students across the US.  Finally, as James Lang has argued in Cheating Lessons: Learning from Academic Dishonesty (2013), doing grounded assessment, projects that have an immediate impact on a student’s immediate environment (e.g. the internet), not only helps to eliminate cheating but more importantly increases student learning. A Wikipedia assignment did all of this for my students.

I arranged my course so that I lectured about course topics on Mondays and Wednesdays, then Fridays we worked on the Wikipedia project.  This meant that one day per week for 12 weeks I got to talk, in detail, with my students about how to think about course topics that they were interested in.  Wiki Education’s training goes through important research steps:

  • thinking critically about topics, the discussions surrounding the topics, and sources;
  • figuring out what is missing in any given article and what students would like to add to the conversation;
  • editing, both their own work and that of their peers;
  • and, I would argue most importantly, giving students the confidence to be experts on a topic and to put their expertise out into the public sphere.  

In essence, students learn the entire writing process, but they think they’re just writing about stuff they like.

One of my favorite student experiences from this semester was when I was talking to a student about her topic.  She had found two different print sources that mentioned one source in particular, but did not give any details about the third source.  She asked if she should find that third source, and how she should go about doing that.  I told her definitely yes, and showed her the process for getting the source sent to our library.  When she got the book, she brought it to class and had so much to say about what the source would add to her topic, how she was going to use it, and why.  I explained to her, and the class, that that was what research was.  I also got to share with them stories of the trials and tribulations of finding sources for my various projects, and they listened.

When they wrote their final reflection papers, every single student said that they had been resistant to the idea of doing work on Wikipedia because they just wanted to do a paper.  But during and after the process, they all said how much they 1) enjoyed the project; 2) learned more content from adding 500 words to an article than they had from doing long papers in other classes; and 3) learned more about the writing process than in other projects they had done.  All of them had been worried the work would be too much for them, but doing most of the work in-class tended to ameliorate that concern.

Assessing this type of project is difficult, and Wiki Education’s rubrics and guides were extremely helpful in figuring out how to do this.  I was focused on assessing the various steps along the way to the finished product.  One of the main reasons I chose this project was so I could help guide students’ research and writing skills.  I used detailed rubrics, and focused on having these be low-stakes assignments.  For example, even though the entire project was worth 600 total points (60% of the total course grade), all the steps leading up to the final article edits were worth relatively few points. Students could make changes all the way through, with the final goal being getting full marks on final deliverables: the Wikipedia article (180 points), presentation (60), and reflection paper (60).

Students were passionate about their topics and excited about the process. Students may have worked harder on this than on a term paper because of the low-stakes, incremental grounded assessment and the real-world implications of their projects. As with every new classroom tool, we must ask ourselves how students will engage with it and how the tool will help them thrive. With the resources developed by Wiki Education, students gained an in-depth understanding of how Wikipedia works, learned the writing process from beginning to end, and had fun doing it.

For more information about teaching with Wikipedia, visit teach.wikiedu.org.

Image: File:Mststonehenge.jpg, Nebraska Puffer Fish, CC BY-SA 4.0, via Wikimedia Commons.

by Guest Contributor at January 05, 2018 05:30 PM

Neha Jha

Another Step Forward

This week my task involved working with databases. I am going to summarise here a few important points.

A transaction is a group of operations that are performed as a single unit of work.

While working with transactions, we should follow a few rules that help us access and work with the database safely, i.e. maintain consistency in the database.

ACID Properties

Atomicity: Either the complete transaction gets committed or none of it does, i.e. ‘all or nothing’. If there is any failure in the middle of a transaction, all the changes are rolled back.

Consistency: The database should be in a consistent state before and after a transaction.

Isolation: Multiple transactions can occur concurrently without interfering with each other. The changes made by transaction T1 become visible in transaction T2 only after T1 is committed.

Durability: All the updates made by a committed transaction persist even if there is a system failure.

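// Assumes $db is a PDO-style connection.
try {
    // Start the transaction before running any queries.
    $db->beginTransaction();
    $db->query("execute query 1");
    $db->query("execute query 2");
    // If there are no exceptions, make the changes permanent.
    $db->commit();
} catch (Exception $e) {
    // If there is any exception, the transaction
    // is rolled back.
    $db->rollBack();
}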

MySQL does not support nested transactions. Since wikimedia-slimapp already uses transactions internally, we had to implement our own transaction methods. My mentor suggested two possible approaches to this problem. One approach is to use counting semaphores. The other is to use the SAVEPOINT feature of MySQL InnoDB.
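As a rough illustration of how the two ideas could fit together, here is a minimal sketch in Python. It runs against an in-memory SQLite database purely so that it works without a server; SQLite happens to accept the same SAVEPOINT, ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT statements as MySQL InnoDB. The class name NestedTransactions, the savepoint names and the log table are all made up for the example, and the depth counter is only one reading of the counting idea. This is not the wikimedia-slimapp implementation.

```python
import sqlite3
from contextlib import contextmanager


class NestedTransactions:
    """Emulates nested transactions with a depth counter plus savepoints."""

    def __init__(self, conn):
        self.conn = conn
        self.depth = 0  # number of transaction levels currently open

    @contextmanager
    def transaction(self):
        name = f"sp_{self.depth}"  # hypothetical naming scheme
        # The outermost level starts a real transaction;
        # inner levels only set a savepoint inside it.
        self.conn.execute("BEGIN" if self.depth == 0 else f"SAVEPOINT {name}")
        self.depth += 1
        try:
            yield
        except Exception:
            self.depth -= 1
            if self.depth == 0:
                # Outermost level failed: undo everything.
                self.conn.execute("ROLLBACK")
            else:
                # Inner level failed: undo only its own work.
                self.conn.execute(f"ROLLBACK TO SAVEPOINT {name}")
                self.conn.execute(f"RELEASE SAVEPOINT {name}")
            raise
        else:
            self.depth -= 1
            self.conn.execute(
                "COMMIT" if self.depth == 0 else f"RELEASE SAVEPOINT {name}"
            )


def demo():
    # isolation_level=None lets us issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE log (msg TEXT)")
    tx = NestedTransactions(conn)
    with tx.transaction():
        conn.execute("INSERT INTO log VALUES ('outer')")
        try:
            with tx.transaction():
                conn.execute("INSERT INTO log VALUES ('inner')")
                raise RuntimeError("inner step failed")
        except RuntimeError:
            pass  # inner level was rolled back; the outer one continues
    return [row[0] for row in conn.execute("SELECT msg FROM log")]
```

Calling demo() leaves only the row written by the outer level in the table, because the failed inner level is rolled back to its savepoint while the outer transaction still commits.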

Another important thing that I have learned is the importance of clear communication. While working remotely, communication plays an important role. It matters to me a lot since I aspire to become a mentor too. So, I have decided to follow these rules:

  1. Do not assume anything about the problem

  2. Provide as much detail as possible for any solution

  3. Include references if any

In the coming weeks, I hope to increase my productivity and communicate more clearly.

by Neha Jha at January 05, 2018 12:20 PM

Megha Sharma

Outreachy Chapter 3: The nitpick flood!

The year is new and so are the hopes! Goals have been set high; resolutions have been taken and highly ambitious time tables have been framed. So firstly, a very Happy New Year to my readers!

This year, my project has occupied a big place in my list of goals. I want my tool to have a huge outreach and benefit many! This project started as an experimentation venture but I see it turning into a full-blown project. I know the journey won’t be easy, but neither have my goals been, like getting into Outreachy without any prior Open Source experience.

Now coming to my current standing, I’ve successfully wound up the Requirements phase (yay!) and have moved to the UI Design phase.

With time, work is getting tougher and more challenging.

This assertion can be easily justified by the number of revisions my design has gone through. It can serve as an apt example of ‘before’ and ‘after’ :P. The transition has been huge!

Each time, you look at the design and say, ‘It’s final. The finest I’ve ever seen.’ But then the review makes you realize that you are far away from perfection. This has been my situation not once but several times!

High waves of the nitpick flood kept me drowned for long. But finally, I survived it and came out with ‘the finally final design’.

You can find the complete Design Mockup here and let me know how you found it.

Contrary to expectations, the experience was amazing! I learnt a lot. Some major learnings that I want to share with you guys are:

  1. The biggest learning was to not splash all the colors you like in the design :P

  2. Your design should speak for itself; that is, it should be self-explanatory

  3. Keep the design simple, subtle and consistent

  4. Last but not least, it’s not easy to convert requirements into design

Things won’t be complete until I mention how patient my mentors have been. Irrespective of how much time and how many reviews it took, they kept on pouring in their suggestions till the best came out. So, a big round of applause for them!

Next I’m starting with the implementation. For the details, do come back next week :)

by Megha Sharma at January 05, 2018 05:31 AM

January 04, 2018

Wikimedia Tech Blog

Designing for offline on Android

Photo by Hamza-sia, modified by Julian Herzog, CC BY-SA 3.0.

At Wikimedia, we like to start our design process with understanding our audiences. Earlier in 2017, our New Readers initiative conducted ethnographic research in Nigeria and India. A few interesting tidbits strongly resonated with us on the Wikipedia Android team:

Mobile dominates for getting online, and Android is the platform of choice … Mobile apps have exploded in popularity, with instant messaging and social media at the top.[1]

Prompted by these findings, we’ve introduced a number of improvements to the Android app in order to better serve these app users who have restricted or low-bandwidth access to the internet.

Features for offline

During the past year, we’ve worked on a number of offline features throughout the app.

  • Reading lists: Users can easily save articles to reading lists to view later when offline.
  • Caching by default: All articles which have been opened are cached, and remain available even when the internet connection is lost.
  • Offline Library: This feature envisions a seamless experience browsing Wikipedia from online to offline in one place. Users can download collections of Wikipedia articles to their ‘offline library’, and continue to search and read those articles with or without internet. An initial prototype was tested with users in India (see research findings) in September, and will be available in early 2018.

Designing for offline

We’ve tried to apply a number of best practices when designing for offline and low-bandwidth audiences on Android. We want to share the following key considerations as a handy guide for those of you who may be designing for similar audiences.

  1. State awareness

Knowing the state of a user’s internet connection and reflecting this in the product design is essential when those connections are unreliable.

We’ve introduced clearer in-product messaging throughout the app, so users are always aware of their connectivity status and know when they are reading offline content.

Examples of different notifications when the app is offline. Left: ‘Toast’ notification when an offline version of an article is shown. Center: Message shown in the Offline Library when searching whilst offline. Right: Message to refresh when.

  2. Contextual actions

Related to giving clear indication of the connection status to users, we are also providing more contextually relevant actions that appear when the app is offline. An example of this is when a user taps on a link whilst reading an article. When that user is online, a preview of the linked article is displayed, but when they are offline, rather than showing a ‘no connection’ message, they are provided with the option to save the article for reading later once their connection is restored.

Difference in a link preview card when a user is online vs offline
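The online/offline branching described above can be sketched in a few lines. This is an illustrative Python sketch and not the app's actual code (the app itself is an Android app); the ReadingList class, its method names, the action strings and the fetch callable are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ReadingList:
    fetch: Callable[[str], str]  # hypothetical downloader: title -> article text
    saved: Dict[str, str] = field(default_factory=dict)  # readable offline
    pending: List[str] = field(default_factory=list)  # queued while offline

    def link_tap_action(self, is_online: bool) -> str:
        # Online: show the preview card.
        # Offline: offer to save for later instead of a 'no connection' error.
        return "show_preview" if is_online else "offer_save_for_later"

    def save_for_later(self, title: str, is_online: bool) -> None:
        if is_online:
            self.saved[title] = self.fetch(title)
        elif title not in self.pending:
            # No connection: remember the request and fulfil it later.
            self.pending.append(title)

    def on_reconnect(self) -> None:
        # Download everything that was queued while the device was offline.
        while self.pending:
            title = self.pending.pop(0)
            self.saved[title] = self.fetch(title)
```

The point of the design is that the offline branch still gives the user something useful to do, and the queued request is completed without further input once the connection returns.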

  3. Feedback on slower connections

For those on slower bandwidth speeds, it’s particularly important to be assured that actions taken have been recognized and something is loading. Lack of feedback may lead to unnecessary time and/or data being wasted as a process is re-triggered, or worse yet, people may abandon whatever they were trying to do if they feel the app is unresponsive on slow connections.

Bearing this in mind, we’ve added more progress indicators to provide this feedback to users when an article is being saved to their reading list, and users are also clearly shown the progress of article packs being downloaded to their Offline Library.

Left: Progress indicator that an article will be saved to a reading list once a user is back online. Right: Message showing the progress of an article collection being downloaded into an offline library.

We’re also planning to revamp the loading screen to show a ‘skeleton’ version of the app, so that users may better anticipate what content is being gathered from the moment they open the app. This provides a greater sense of progress than the current static screen showing a ‘W’ Wikipedia icon.

Left: Current static loading screen. Right: Upcoming ‘skeleton’ loading screen

  4. Smarter caching for unreliable connections

Rather than putting the onus on users with flaky connections to explicitly save every article they have open for offline, we changed the system behavior to cache articles as soon as they are opened. This way, all articles people have open (and even articles in their recent history) remain browseable even when they lose their connection midway through reading.
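A minimal sketch of this cache-on-open behaviour, again in Python for illustration only. ArticleCache and the fetch callable are hypothetical names, and the in-memory dictionary stands in for whatever on-device storage a real app would use.

```python
class ArticleCache:
    """Caches every article as soon as it is opened."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable: title -> article text
        self._cache = {}

    def open(self, title, is_online):
        """Return (text, source), where source is 'network' or 'cache'."""
        if is_online:
            text = self._fetch(title)
            # Cached on open; the user never has to save explicitly.
            self._cache[title] = text
            return text, "network"
        if title in self._cache:
            # Connection lost: fall back to the copy stored earlier.
            return self._cache[title], "cache"
        raise LookupError(f"'{title}' is not available offline")
```

Returning the source alongside the text is what would let the interface tell the user they are reading an offline copy, as described under state awareness.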

  5. Data usage controls

More settings are now available for those who wish to save on data usage, such as an option to disable images, and another option to ‘prefer offline content’ instead of always loading the latest version of an article.

Left: Show images setting. Right: Prefer offline content setting.

In future, even more controls are planned, including:

  • An option to download articles on WiFi only: T163596
  • An explicit offline mode: T164756
  • Loading lower resolution images for thumbnails prior to loading full resolution images: T159712

  6. Data usage and storage

By being more transparent about how much storage space is used by offline articles, we hope it helps people with limited and expensive data packages to have more visibility of their data usage and make decisions accordingly.

Left: Article collections for download clearly shown with filesize. Center: Total storage space used by the Offline Library. Right:  Reading lists show the number of articles available offline and the list file size.

We’re also investigating ways to reduce the size of the app itself by reviewing alternatives to data-heavy features like using Mapbox to view articles nearby in a map view.

  7. User education

Part of creating a better offline/low-bandwidth experience requires making people aware of those benefits with clear messaging, so that they know how and where to access content in preparation for being offline.

With that in mind, we’ve introduced user education onboarding guides and are also taking advantage of ‘empty states’ to provide more information about features as users interact with the app.

User education screens for the Offline Library feature

Empty state screens

  8. Sharing offline

Another key finding from the research was that “People are increasingly getting information online, then consuming or sharing it offline”.[2]

The offline library feature was designed with this in mind, with users who download article collections on one device able to copy and share the same files with others via USB, transferring to microSD card, or even via a Bluetooth connection. The app can detect multiple article collection files whether they are installed on a device or anywhere in an external SD card.

Finally, besides the ‘Offline library’ feature, the Wikipedia app itself may also be sideloaded: it is available on F-Droid (outside of the Google Play store), and the APKs for the current and previous versions are available to download and share from our Wikimedia Android releases page.

  9. Battery saving considerations

Users in low-bandwidth areas tend to have devices with lower battery capacity and have fewer opportunities to recharge during the day,[3] so we’re also starting to consider ways our app can reduce battery consumption.

A recent update made to the app with battery saving in mind is the addition of a ‘Black’ mode, since it has been shown that AMOLED devices derive substantial power savings when using a mostly black UI as compared to a colour/light mode version of an app.[4] It’s a notable example of the impact of design decisions in this space, alongside the more ‘expected’ development decisions (batching network calls, reducing app wake time, etc.).

And as a final example of all the changes to the app, it’s also humbling to note that this feature was initially brought to our attention via feedback from our users.

Examples of Black mode

Your input

As the Android team strives to keep improving our app, we always welcome input and feedback from our users, and can be contacted via IRC (#wikimedia-mobile) or by emailing mobile-android-wikipedia@wikimedia.org. And as an open source project, we also welcome contributions! For developers who wish to get involved, more information on getting started is on our App hacking page.

Further reading

The Wikimedia Foundation’s Design team as a whole is continually striving to improve our products for offline and accessibility, which is one of the reasons for our iOS app being named as an Editor’s Choice. This is in line with efforts from multiple organizations seeking to better cater for a large number of people worldwide with technical barriers to knowledge. Below are some links to our internal research as well as other interesting resources we’ve come across in this space.

Rita Ho, Senior User Experience Designer, Audiences Design
Wikimedia Foundation

Screenshots of the Wikipedia app above include Creative Commons-licensed content from the Wikipedia articles “Puffin,” “Atlantic Puffin,” “Marie Curie” (both CC BY-SA 3.0), and the Wikimedia Commons images Puffin002.jpg (CC BY-SA 3.0) and Atlantic Puffin Fratercula arctica.jpg (CC BY-SA 2.5).


  1. ‘New Readers research findings presentation Sep 2016’, 2016, p36, 51.
  2. ‘New Readers research findings presentation Sep 2016’, 2016, p58.
  3. ‘Building for billions’, Accessed Dec 5, 2017.
  4. Triggs, R., ‘Do black interfaces really save power on AMOLED displays?’, Oct 22, 2014. Accessed Dec 5, 2017.

by Rita Ho at January 04, 2018 06:27 PM

Joseph Reagle

The Geek Syndrome, Brains, and Gender

In a 1993 Wired article, writer Steve Silberman characterized autism as the “geek syndrome.” Given that autism is partially hereditary, Silberman asked if the concentrations (and eventual pairing, known as assortative mating) of geeky folk in places like Silicon Valley meant that the “genes responsible for bestowing certain special gifts on slightly autistic adults—the very abilities that have made them dreamers and architects of our technological future—are capable of bringing a plague down on the best minds of the next generation.”[1] A few years later, neuroscientist Simon Baron-Cohen and his colleagues tested whether there was a link between profession, heredity, and autism. Analyzing surveys from parents of children with autism, they found that fathers of these children were twice as likely to be in an engineering field as those of other children.[2] Baron-Cohen would go on to posit gendered types of thinking (female/empathizing vs male/systematizing) and argue that autism is actually a type of extreme male brain.[3] He came to call this “The Hyper-Systemizing, Assortative Mating Theory of Autism.”[4]

Baron-Cohen’s theory of gender and autism is controversial. On one hand, gender essentialism, attributing fixed and innate psychological attributes to men and women, is popular, as seen in the self-help classic Men are from Mars, Women are from Venus. Yet, despite small average differences and variance at the extremes, there is a lot of overlap in the personality, cognition, and behavior of men and women.[5] Cordelia Fine, a critic of gender essentialism, notes that even Baron-Cohen concedes that only half of the women in his studies have the “female” or “empathizing” sort of brain.[6] In related literature, one study found a weak association between men and systematic thinking; others found the inverse or no significant gender pattern at all.[7] Physical indicators like height, waist-to-hip ratio, and voice pitch are predictive of one another, but this is not the case for behavior: gender is multi-dimensional rather than categorical and scoring stereotypically in one behavior is not a good predictor of others, which is also true of the underlying brain structures.[8]

Additionally, the literature on autism and profession is inconclusive. With respect to parents, studies have found no association and a slight association with mothers’, rather than fathers’, professions. Among subjects themselves, a study of almost half a million viewers of UK Channel 4 found a positive relationship between being male, working in a STEM field (science, technology, engineering, and mathematics), and having a high Autism-Spectrum Quotient.[9] In my experiences with life hackers, in person and online, I encountered a lot of men, many of whom had technical jobs, and some of whom may have been somewhere on the autism spectrum. It is enough of an imbalance to be noticeable, to invite research, and to affect the larger culture, but we cannot yet declaim its causes.

All of this is complicated by the fact that our understanding of autism is biased. On the gender front, autism’s initial formulations were largely based on boys; autistic girls may have different behaviors or topics of enthusiasm, and society’s expectations about appropriate behavior likely affect girls’ socialization and diagnoses.[10] On the question of parents’ professions, Silicon Valley parents may have been more likely to push for diagnosis and treatment given their financial resources and early networking online. Additionally, the literature mentioned so far tends towards the medical view of autism, seeing it as a disorder, whereas others see neurodiversity and seek understanding rather than a cure.

The “geek syndrome” hypothesis is a contentious one, among researchers and the public. It intersects with debates about gender difference and essentialism, cognitive difference and disability, and questions of identity and culture. In 2016, Silberman returned to the topic in NeuroTribes: The Legacy Of Autism And The Future Of Neurodiversity. Silberman describes the history of autism diagnosis and treatment, of its likely association with early geek culture (especially ham radio and the MIT Railroad Club), of its normalization in popular media (by way of the film Rain Man), and the rise of parent and autist advocacy.[11] In the intervening twenty-three years, Silberman realized that the “geek syndrome” is more complex and controversial than he originally conceived.

  1. Steve Silberman, “The Geek Syndrome,” Wired, August 30, 1993, http://www.wired.com/2001/12/aspergers/.

  2. Simon Baron-Cohen et al., “Is There a Link Between Engineering and Autism?” Autism 1 (1997): 153–63, http://docs.autismresearchcentre.com/papers/1997_BCetal_Engineer.pdf.

  3. Simon Baron-Cohen, The Essential Difference: The Truth About the Male and Female Brain (New York: Basic Books, 2003).

  4. Simon Baron-Cohen, “The Hyper-Systemizing, Assortative Mating Theory of Autism,” Progress in Neuro-Psychopharmacology and Biological Psychiatry 30, no. 5 (July 2006), https://doi.org/10.1016/j.pnpbp.2006.01.010.

  5. Lise Eliot, Pink Brain, Blue Brain: How Small Differences Grow into Troublesome Gaps – and What We Can Do About It (Boston: Mariner Books, 2009); Yanna J. Weisberg, Colin G. Deyoung, and Jacob B. Hirsh, “Gender Differences in Personality Across the Ten Aspects of the Big Five,” Ncbi, August 1, 2011, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3149680/.

  6. Cordelia Fine, Delusions of Gender: How Our Minds, Society, and Neurosexism Create Difference (New York: WW Norton & Co., 2010), 16.

  7. Paul Norris and Seymour Epstein, “An Experiential Thinking Style: Its Facets and Relations with Objective and Subjective Criterion Measures,” Journal of Personality 79, no. 5 (September 26, 2011), https://doi.org/10.1111/j.1467-6494.2011.00718.x; Christopher Allinson and John Hayes, The Cognitive Style Index: Technical Manual and User Guide (United Kingdom: Pearson Education, 2012); Sarah Moore, Donncha O’Maidin, and Annette Mcelligott, “Cognitive Styles Among Computer Systems Students: Preliminary Findings,” Journal of Computing in Higher Education 14, no. 2 (2002): 45–67, https://doi.org/10.1007/BF02940938; Lilach Sagiv et al., “Not All Great Minds Think Alike: Systematic and Intuitive Cognitive Styles,” Journal of Personality 82, no. 5 (October 21, 2013), https://doi.org/10.1111/jopy.12071.

  8. Carothers and Reis, “Men and Women Are from Earth,” 12; Daphna Joel et al., “Sex Beyond the Genitalia: The Human Brain Mosaic,” PNAS, October 2015, 1, https://doi.org/10.1073/pnas.1509654112.

  9. Rosa A. Hoekstra et al., “Heritability of Autistic Traits in the General Population,” Archives of Pediatric and Adolescent Medicine 161, no. 4 (2007): 372–77, http://docs.autismresearchcentre.com/papers/2007_Hoekstra_etal_AQin_twins_APAM2007.pdf; Gayle C. Windham, Karen Fessel, and Judith K. Grether, “Autism Spectrum Disorders in Relation to Parental Occupation in Technical Fields,” Autism Research 2, no. 4 (2009): 183–91 page 186. STEM professionals (m = 21.92, SD = 8.92) scored higher than those in non-STEM careers (m = 18.92, SD = 8.48). Similarly, men (m = 21.55, SD = 8.82) scored higher than women (m = 18.95; SD = 8.52), Emily Ruzich et al., “Sex and STEM Occupation Predict Autism-Spectrum Quotient (AQ) Scores in Half a Million People,” ed. Masako Taniike, PLOS ONE 10, no. 10 (October 21, 2015): e0141229, http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0141229.

  10. Meng-Chuan Lai et al., “A Behavioral Comparison of Male and Female Adults with High Functioning Autism Spectrum Conditions,” ed. James Scott, PLoS ONE 6, no. 6 (June 13, 2011): e20835, https://doi.org/10.1371/journal.pone.0020835; Jordynn Jack, Autism and Gender: From Refrigerator Mothers to Computer Geeks (Urbana: University of Illinois, 2014), 151; Meng-Chuan Lai et al., “Sex/Gender Differences and Autism: Setting the Scene for Future Research,” Journal of the American Academy of Child & Adolescent Psychiatry 54, no. 1 (January 2015): 11–24, https://doi.org/10.1016/j.jaac.2014.10.003.

  11. Steve Silberman, NeuroTribes: The Legacy of Autism and the Future of Neurodiversity (New York: Avery, 2016).

by Joseph Reagle at January 04, 2018 05:00 AM

January 03, 2018

William Beutler

The Top 10 Wikipedia Stories of 2017

Every year since 2010, The Wikipedian has delivered a roundup of the most interesting events, trends, situations, occasions, and general goings-on that marked the foregoing year on Wikipedia and in the broader Wikimedia community. Last year’s edition remarked upon the head-spinning series of events that made 2016 the “worst year ever”—or so we thought at the time—and now, looking ahead to 2018, we have a stronger sense that the most realistic expectation is more of the same.

Where does Wikipedia fit into that? Following the U.S. presidential election, it became briefly fashionable to see Wikipedia as a bulwark against “fake news”, but in a year where the new American president suffered vanishingly few consequences for his constant issuance of falsehoods, 2017 very much felt like a year when truth was under constant attack. These ten stories depict a Wikipedia editorial community and readership not necessarily in the midst of a crisis, but of life during informational wartime. Let’s go:

10. In the Wikimedia Year 2030…

Wikimedia 2030, photo by Avery Jensen

Last year’s list was dominated by a metastasizing organizational breakdown culminating in a change of leadership at the Wikimedia Foundation (WMF). Among many complaints about the non-profit’s former executive director, two of the most important were vision and communication, which is to say their lack. Katherine Maher, WMF’s current chief, seems determined not to let the same be said of her. In August 2017, a little over a year into her tenure, she announced an initiative called “Wikimedia 2030”, starting with a high-minded re-articulation of the Wikimedia movement’s mission statement and a series of commitments to (paraphrasing from the document itself) advancing the world through knowledge. It’s obviously operating on a very long time frame, and a lot depends on its implementation, which is yet to come. But the document received overwhelming support by community members in October, which is at least a positive sign in this otherwise fractured age.

9. The Daily Mail and Governance

Daily Mail clock, photo by Alex Muller / Wikidea

Wikipedia’s quality is highly dependent on the sources it allows to verify its information. In February Wikipedia’s community decided it was fed up with the website of UK tabloid The Daily Mail for its mendacious unreliability, and so “voted” to “ban” its use. This apparent decision was widely reported, including by this blog. And yet, that’s not quite what happened. Rather than an official blacklisting, the Daily Mail was simply added to a list of potentially unreliable sources, and it’s possible to find instances of the website being used as a reference since, perhaps by contributors entirely unaware there was a controversy in the first place. This is how Wikipedia works: it has very few rules that cannot be overcome by editorial clout, determined obstinacy, continued evasion, or blithe disregard. On the whole, Wikipedia works pretty well, but breaks down at the edges: and that is still where the Daily Mail remains.

8. “Monkey Selfie” Reckoning

First, a mea culpa: as far as I can tell, The Wikipedian has never written a word about the Monkey selfie copyright dispute, as Wikipedia’s own article on the subject calls it.

Monkey selfie by David Slater

Wikipedia played only a small role in the legal case, which primarily involved nature photographer David Slater being sued by the People for the Ethical Treatment of Animals on behalf of a Celebes crested macaque who had no idea any of this was taking place. The legal matter isn’t quite settled, but as of September it seems close: Slater keeps the copyright, with concessions. Yet Wikipedia played a much larger role in the sense that there may never have been a case at all, or it would have remained quite obscure, had the WMF not refused to abide by Slater’s request and delete the photo from Wikimedia Commons. By virtue of its high profile, Wikipedia magnifies everything.

What’s more, the enthusiasm of its community also obscures: I remember the photo being everywhere at Wikimania 2014 in London and, being charmed like everyone else, I played along and used it in a slide presentation without looking into it further. I’m more regretful of this than my own non-coverage, and consider it still unresolved whether WMF is on the side of virtue in this matter. (Why am I using it here, then? For the same reason Wikipedia uses copyrighted logos: for identification.)

It seems indisputable to me that the copyright should belong with the human who went to considerable lengths at personal cost to facilitate its creation, regardless of which bipedal mammal clicked the button, and if the law is unclear on this, then the law should be clarified. If you haven’t listened to This American Life’s episode about the case from November, it’s worth your time—and Wikipedia doesn’t come across terribly well.

7. Burger King’s Way

Remember this? In April, Burger King announced a television ad for the U.S. and UK markets featuring dialogue intended to activate Google Home and read out Wikipedia’s entry for the Whopper. Almost immediately, The Verge noticed that Burger King’s ad team had surreptitiously edited the Whopper entry from Wikipedia’s typical dispassionate summary “…signature hamburger product sold by the international fast-food restaurant chain…” to unambiguous marketing-speak “…flame-grilled patty made with 100 percent beef with no preservatives or fillers…” Then, predictably, unidentified randos joined in and hijacked the entry to disparage the mass-market burger, producing head-scratching headlines like this one from BBC: “Burger King advert sabotaged on Wikipedia”.

Although Burger King was probably unaware of Wikipedia’s policy “Wikipedia is not a soapbox or means of promotion” and practically guaranteed ignorant of the guideline “Do not disrupt Wikipedia to illustrate a point”, that should hardly matter; Burger King knew what it was doing, and figured the ensuing coverage was worth the cost. They were probably right. But I can’t not play the schoolmarm, and tsk-tsk: it’s one thing for a high-school student to vandalize Wikipedia for fun, but quite another for a multinational corporation.

6. Wikipedia Vandalism is Fun for All

Last year’s version of this column decried the phenomenon of lazy sports-bloggers leaning on blink-and-you-missed-it vandalism of sports-related Wikipedia articles for amusement and clicks, and this continued unabated throughout 2017. Most of these stories came from minor sports websites and local news teams, but just as Wikipedia’s prominence owes to its high Google search ranking, so too are these time-wasters afforded visibility by Google News. But this year, we got something else: ostensibly serious news publications marveling over a pattern of self-aware edits coming from U.S. congressional computers.

Since 2014, the automated Twitter account @CongressEdits has tracked and reposted every edit made from House and Senate offices; in October, BuzzFeed and CNN both noticed that someone on the Hill was editing articles from Carly Rae Jepsen to Chuck E. Cheese, and on subjects as ubiquitous as Star Wars to obscure as indie band The Mountain Goats. In December, a college student and former congressional aide claimed credit to The Daily Beast, which led to other former interns and anonymous persons crying out for recognition as well. Whether for the lulz, or as part of “the resistance”, these edits at least proved that curiosity about Wikipedia’s willful vulnerability to nonsense appeals to journalists and readers who should probably be focused on something else.

5. Signpost of the Times

A year ago, this list bemoaned the decline of Wikipedia criticism, largely based on the departure of critical thinkers (or at least decent writers) from forums such as Wikipediocracy. This year, I find myself concerned with Wikipedia’s own community news source, The Signpost. A bi-weekly online “newspaper”, The Signpost has been around since 2005, written and edited by volunteers much as Wikipedia itself is. In early 2016 a new editor-in-chief took the reins, led with an ambitious and hopeful editor’s note, produced three issues by the end of February, and then simply stopped.

The editor, a longtime community veteran and onetime WMF staffer, in fact ceased editing Wikipedia almost entirely. I thought about investigating it at the time, but figured I already knew the basics: burnout is a natural occurrence and all but inevitable, although it’s less typical for a project leader to step away without so much as a “gone fishin'” sign. By June, a skeleton crew of former contributors had banded together to put out an edition on at least a once-per-month basis, with a new permanent editor named as of September. Here’s hoping they can return the Signpost to its former schedule and retain its high quality.

In the meantime, I’ll say again what I’ve said many times before: The Signpost is hard work and is a crucial service for the core Wikipedia community; its health is in some ways a measure of the health of the community itself. Its editorship should be a stipended position, funded by but free from oversight of the Wikimedia Foundation. Wikipedia does not depend upon volunteer developers, nor should it depend on volunteer reporters.

4. Everipedia Stalking

What’s Everipedia? Oh, it’s just the latest upstart challenging Wikipedia, this time an actual startup: a rival wiki-based online encyclopedia launched in 2014 by a couple of UCLA students, which later attracted investment from excommunicated Rap Genius co-founder Mahbod Moghadam, and in December also the involvement of expatriate Wikipedia co-founder Larry Sanger.

Everipedia is certainly audacious, calling itself the world’s biggest encyclopedia (for having exported all of Wikipedia’s entries and then adding more Wikipedia wouldn’t accept) and it projects a certain braggadocio not typically found in online knowledge repositories (at one time, its founders liked to call it “Thug Wikipedia”). It’s also not Sanger’s first attempt at a do-over, having left Wikipedia citing philosophical differences early on; his decidedly more staid Citizendium effort is itself now more than 10 years old, but with only a handful of active editors, is all but a dead project.

The most interesting thing about Everipedia, though, is its pivot to using blockchain technology and announced development of a cryptocurrency with which to pay contributors. I’m curious to be sure, but even more sure of my skepticism. No question, Wikipedia is built on a relatively ancient software framework, and there is a case to be made that blockchain’s public ledger could represent an advancement in recording all “transactions”. But this is what Harvard’s Clayton Christensen would call a “sustaining innovation”, not a “disruptive innovation”—there’s no reason Wikipedia couldn’t adopt a blockchain ledger should the idea prove meritorious; meanwhile, there’s very little chance that Everipedia can replace the day-to-day deliberations of an editorial community more than 15 years old. Culture is impossible to replicate, and extremely difficult to develop. I can’t promise an assortment of brogrammers and Wikipedia’s kooky uncle won’t pull it off, but I have my doubts.

3. Hey, Big Spenders

Wikipedia’s fundraising prowess, ever-growing expenses, and nevertheless-expanding bank account are a matter of interest year in and year out. From about $56,000 in the bank at the end of the 2004 fiscal year to more than $90 million by 2016, Wikipedia’s financial situation is still growing in a way that’s entirely divorced from the number of volunteers actively participating. In February, a 12-year veteran editor published an alarming (or alarmist) op-ed at the then-functioning Signpost with the unfortunate headline “Wikipedia Has Cancer”.

The controversial connotation (which I realize I’ve also made in #10) was very much intended: Wikipedia’s financial position has far exceeded what is necessary for the running of this non-profit, volunteer-driven project. What happens if (and presumably when) revenues slow—will the Wikimedia Foundation adjust spending downward, or start taking on debt? Pointing to recent failures in WMF software development initiatives as a reason to worry about Wikipedia’s leadership, the op-ed called for a spending freeze and greater transparency in financial matters. With some fiscal discipline, and Wikipedia’s newly-established endowment, Wikipedia could live comfortably off its prior fundraising indefinitely. Although the rhetoric was probably excessive, it struck a nerve, attracting an overwhelming number of comments in a discussion that continued for months. Soon after, an article in Quartz called the resulting frenzy “nuts”, and published a chart comparing Wikipedia favorably to similar institutions, including the New York Public Library and even the British Museum.

2. Slow Wiki Movement

Given the lack of high-impact news events surrounding Wikipedia, here is a new one: nothing really happened this year. That’s probably good news, but it doesn’t make for an exciting story. And for an avowed non-story, it’s relatively high-positioned as well. But as I contemplated the mood around Wikipedia over the past twelve months, I found it rather fitting.

Two items that just missed the cut: the WMF’s 2015 lawsuit against the NSA, dismissed by one court, was reinstated by another, and this could well be a standalone entry next year. And Wikipedia’s open database, Wikidata, continued to develop and grow, but all of this happened behind the scenes, without any single inflection point (though attendees of the first-ever Wikidatacon are free to disagree with me).

Meanwhile, Wikipedia’s edit wars and paid editing scuffles continued, but few made actual news. Trolls, especially of the GamerGate variety, continued to be a nuisance, but (for now) are not an existential threat. Wikipedia’s gender imbalance barely registered a blip, Wikipedia’s editorship numbers again ticked upward, and Wikimania Montreal went off without a hitch. Other topics this year-end report card series has discussed before were also ho-hum: no major sock puppet networks detected, no major article-creation milestones (we’re just over halfway to 6 million), the detente between Wikipedia and education continues, and the Visual Editor continues to work even as most veterans ignore it. Yes, Turkey blocked Wikipedia, but following China and Russia having done so in previous years, it hardly made a dent.

This is what maturity looks like: Wikipedia is Wikipedia, and seems likely to continue doing what it does for a long time to come. So, does it feel like we’re celebrating?

1. WikiTribune’s Rocky Start

In keeping with the somnolence of the previous item, this year’s top story isn’t even about Wikipedia: it’s about WikiTribune, the other new initiative from Wikipedia’s other co-founder, Jimmy Wales. Announced to great fanfare and no little skepticism in April, Wales’ long-dreamed wiki-based online news site finally launched at the end of October. Early reviews were not enthusiastic, and it has been little remarked-upon since. As of this writing, it has continued publishing a few stories a day, none with any apparent impact. WikiTribune offers little more than what other news operations are doing, and less of it.

In May, this blog offered advice about how it might stand out in a crowded online world: by focusing on developing news teams at the local level, and trial-run innovations that might be ported back to Wikipedia. But WikiTribune seems determined to cover international news with no discernible viewpoint or special access, and has no connection to Wikipedia besides its name and famous founder. Why would anyone visit WikiTribune for news over any other publication? I have no idea. Alas, WikiTribune looks like just another much-heralded effort to reinvent news by doing the exact same thing that other news publications were already struggling to keep doing in seemingly impossible circumstances. Whether WikiTribune survives to see the end of 2018, or makes this list a year from now, I have no idea either.

Photo credits, in order: Avery Jensen; Alex Muller / Wikidea; David Slater; Restaurant Brands International; Public domain; Kjoonlee; Larry Sanger; Sameboat; Jan Dittrich; WikiTribune.

by William Beutler at January 03, 2018 11:16 PM

Wikimedia Foundation

What’s your second screen? Film, television, and the British monarchy fill Wikipedia’s most-viewed articles of 2017

Image via Pexels, CC0.

Anglophile (noun): “a person who greatly admires or favors England and things English.” (Merriam-Webster)

The term certainly applies to 2017, as seen through the lens of the English Wikipedia’s readers. This year, millions used the free encyclopedia to learn more about Queen Elizabeth, Queen Victoria, and the engagement of Meghan Markle and Prince Harry.

Anglophilia also feeds into this year’s other theme: television and film, via The Crown and Victoria. In just the top 25 most-viewed articles, people visited Wikipedia over 200 million times to learn about the newest shows or as a second screen.

The list “speaks to the power that entertainment now holds over us,” says Stormy clouds, a volunteer Wikipedia editor and a member of Wikipedia’s Top 25 team, who helped compile a larger top 50 most-viewed list with commentary. He was struck by “the terrific power of Netflix,” which via The Crown helped power Queen Elizabeth to the third-highest view count of the year,[1] and believes that her “continued strong performance should act to dispel the myth that no one cares about the monarchy.”

To be fair, however, the monarchy’s popularity is not solely limited to the queen. The second screen effect is easily discernible from pageviews to the articles on Queen Victoria and Victoria the TV series. Both had jumps each week when people tuned in to watch the show on TV—and pointed their second screens at Wikipedia.[2]

Piling on top of all this, Markle’s article came in fifth—and although we’ve limited this list to 25 entries, Queen Elizabeth’s sister Margaret came in 37th, and Elizabeth’s husband came in 44th. Both, like Elizabeth, were featured on The Crown in its most recent season.


Outside Britain, the strength of India broke through in the realm of film: the most-viewed article about a film this year was Baahubali 2: The Conclusion. The film has seen extreme levels of popularity since being released last April—with over 105 million tickets sold, it quickly became the highest-grossing film ever in India by what can only be described as a ludicrous margin.[3] Baahubali 2 has made over ₹17.065 billion on a budget of ₹2.5 billion.

Wonder Woman flexed its muscles as well, placing it, its leading star, and the ensemble Justice League film in the top 25. This may have been apparent as early as June, when the first two articles were the most popular of the month. Classicfilms, who helped author the English Wikipedia’s article on Wonder Woman, told our own Samir Elsharbaty back then that:

[Patty Jenkins, the film’s director] decided to focus on inclusivity as a way to work through the various complications of gender that seemed to stall the film for two decades. Her vision appears to have worked, particularly with that segment of the population who either grew up reading comic books or playing games related to these comic characters, and I think most would agree, that this is a tough crowd.

The top 25 most-read articles on Wikipedia in the year 2017 follow.[4] Our grateful thanks go to researcher Andrew West for collating the data, and to the Top 25 team for their work in verifying the entries that appear here.

  1. Deaths in 2017, 37,387,010 views[5]
  2. Donald Trump, 29,644,764
  3. Elizabeth II, 19,290,956
  4. Game of Thrones (season 7), 18,792,746
  5. Meghan Markle, 16,944,130
  6. Game of Thrones, 16,833,302
  7. List of Bollywood films of 2017, 16,391,427
  8. United States, 15,763,915
  9. Bitcoin, 15,026,561
  10. 13 Reasons Why, 14,934,202
  11. Baahubali 2: The Conclusion, 14,607,282
  12. It (2017 film), 14,539,123
  13. Queen Victoria, 14,164,451
  14. List of highest-grossing Indian films, 14,091,348
  15. Gal Gadot, 14,034,958
  16. Logan (film), 14,030,384
  17. Millennials, 13,417,915
  18. Riverdale (2017 TV series), 13,360,398
  19. 2017 in film, 13,298,613
  20. Stranger Things, 13,132,129
  21. Wonder Woman (2017 film), 13,062,375
  22. Dwayne Johnson, 12,444,987
  23. Star Wars: The Last Jedi, 12,442,644
  24. Justice League (film), 12,048,341
  25. Elon Musk, 11,968,362

Quick hits

  • Two years ago, we asked if the dominance of film in that list was the harbinger of things to come. The answer appears to be yes … and no. While film is much more prominent now, TV is holding its own thanks in no small part to Netflix (#10, 20, partial credit for 3) and Game of Thrones.
  • Film articles on this list: #7, 11, 12, 14, 16, 19, 21, 22, 23, 24.
  • Film and television take between 14 and 16 spots, depending on how you count Queens Elizabeth and Victoria.
  • Millennials?
  • Deaths in <year> has been the most-read article in all but one year we’ve published this list.
  • Page views to Donald Trump, the most popular article last year, fell by more than 45 million.
  • Only two articles on the list, Deaths in 2017 and 2017 in film, had less than half of their pageviews come from mobile devices.



  1. Netflix’s 13 Reasons Why and Stranger Things also placed 10th and 20th, respectively.
  2. Victoria’s first season aired last year, but was shown in the United States early in 2017; its second season appeared later in the year.
  3. Amusingly, that list of highest-grossing films in India was itself one of the English Wikipedia’s most popular articles of the year.
  4. As with every year we’ve done this, the top articles include the percentage of mobile views for screening purposes. However, we’ve upped the percentages to remove articles with less than 10% or more than 90% mobile views, as it almost always indicates that a significant amount of the pageviews stemmed from spam, botnets, or other errors. Beyond that, we’ve agreed with the Top 25 team’s decision to remove several entries with unclear reasons for their high popularity, like AMGTV, Lali Espósito, and “XXX.” This year’s xXx: Return of Xander Cage was popular, for example, but its release does not track with a major increase in popularity for XXX. The other two are standard removals in the Top 25’s weekly lists, with Espósito in particular having a strange percentage of views from mobile and Wikipedia Zero. You can see the full list and the cleaned list over on Wikipedia.
  5. Wikipedians chronicle the deaths by month, so the page now redirects to a “list of lists of deaths.” This year’s list has already been started at deaths in 2018.


Ed Erhart, Senior Editorial Associate, Communications
Wikimedia Foundation

Like what you’ve read? You can see a list of 2017’s most-edited English Wikipedia articles and previous most-viewed lists from 2016 and 2015. Most-viewed English Wikipedia articles of each week are available through the Top 25 Report.

This post has been updated to clarify that there were 200 million pageviews to film and television articles in the top 25—a figure that does not necessarily equate to the number of people visiting those articles.


by Ed Erhart at January 03, 2018 06:44 PM

Vinitha VS

Step by step


“It is impossible to live without failing at something, unless you live so cautiously that you might as well not have lived at all – in which case, you fail by default.”

J. K. Rowling


I learned two important lessons last week.

One is that organizing your tasks can help you do things in a systematic manner. I had many small tasks to do and the details of these tasks were mostly in emails. But referring to emails every time to check details of tasks is very tedious and inefficient. However, there is a much better way – put everything in Phabricator tasks and link information as necessary. Using the Phabricator workboard to organize my tasks, everything was visible to me in one place. I could arrange tasks as per their priority and dependencies. Earlier, I used to spend a lot of time thinking about what I had to do and should not forget. This would go on repeatedly in my mind, though not consciously. Now I am freer to focus on doing the actual work.

The second crucial lesson is regarding communicating ideas clearly, especially to people who do not have any prior context. Since I have been thinking about the project most of the time, I (unknowingly) assume that everyone else has some basic idea about it. This can lead to unfortunate misunderstandings. There is no point blaming Murphy when there is something we can do to improve the situation. In communication, the sender should not assume anything about the receiver and should state the ideas to be communicated clearly. This was a great tip I received from my mentors, who explained the XY problem to me. Also, it is always good to talk about the problem as clearly and explicitly as possible. The receiver should also not assume anything and should ask questions if something is unclear.

This week, I also learned about the use of virtual machines as a safe sandbox for testing software without affecting the host system, and explored some methods for outlier detection, trying out different features and reading more about how to improve the accuracy.

I feel this is just the tip of the iceberg. There are many more things to do. And a lot more to explore, learn and implement.

by vinithavs at January 03, 2018 05:40 PM

Wiki Education Foundation

Contributing accurate medical information to Wikipedia

January happens to be Thyroid Awareness Month. If you heard this and went in search of thyroid information, you’d likely end up on Wikipedia. There, you’d find the article for thyroid disease, which was heavily expanded during Dr. Amin Azzam’s Fall 2016 course for fourth-year medical students at UCSF. That article has been viewed 26,778 times since these students added to it.

Did you know that January is also Cervical Health Awareness Month? Other students in Amin’s class expanded the article on vaginal discharge, which has now been viewed 30,458 times since. And hey — the last week of January is National Drug and Alcohol Facts Week. A student in Amin’s Spring 2017 course contributed to the article on opioid use disorder. And that article has since been viewed 82,818 times.

Our point is: students in our program are contributing highly relevant information to a public resource that millions of people are accessing around the world.

The public looks to Wikipedia for medical information

We’ve written about the impact of teaching with Wikipedia in a medical classroom before. Wikipedia is the leading source for health information on the web; it gets more traffic on medical articles than sites for the NIH, WebMD, Mayo Clinic, NHS, or WHO. So we really want to make sure that information is accurate. Wikipedia has strict guidelines for sourcing when it comes to medical information on the site to safeguard against misinformation. Still, there are always content gaps to fill. That’s where students in Wiki Education-supported courses come in.

Benefit of med students contributing to Wikipedia

Medical students who complete a Wikipedia assignment are in a unique position to disseminate important health knowledge. They have access to great sources and experts in their field. They are also more likely to remember what it was like not knowing this information and therefore have an ability to present complicated medical topics in a clear, understandable way. As Dr. Amin Azzam has said before, they’re far enough along in their medical training that they have the confidence to share their expertise, but they’re not so far along in their careers that “they can no longer speak English.”

Not only are medical students uniquely positioned to contribute to Wikipedia for the benefit of public knowledge, but they gain a lot from a Wikipedia assignment as well. Learning to contribute to, and therefore understand the inner workings of, a resource they use all the time provides valuable media literacy skills.

According to a study recently published in the JMIR Medical Education journal, medical students often look to Wikipedia for health information themselves. In the study, those who used Wikipedia to learn new concepts “had superior short-term knowledge acquisition compared with those who used a digital textbook.” Ultimately, it concludes:

“This study challenges the view that Wikipedia should be discouraged among medical students, instead suggesting a potential role in medical education.”

Engaging with a resource that both these students and the public are relying upon for health information is a beneficial experience for students.

“My students realized just how many people turn to Wikipedia for health-related information, which motivated them to work much harder than a traditional ‘throw-away’ assignment,” one instructor said in their survey response from last fall.

Check out videos we made in collaboration with UCSF faculty about why they choose to teach with Wikipedia.

What this means for public knowledge

In a previous post of ours, How Wikipedia is unlocking scientific knowledge, Classroom Program Manager Helaine Blumenthal speaks to the impact of a Wikipedia assignment in the sciences:

“When individuals have access to accurate information, they are presented with a world of possibility and choice. … With every contribution our students make, the gap between scientific expertise and public knowledge shrinks, but those students also walk away with a lifetime ability to communicate highly specialized knowledge to a general audience. Whether they choose to pursue careers in the sciences or not, they understand the importance of sharing knowledge, the ability to do so, and with Wikipedia, the forum to make it all happen.”

For information about editing medical articles on Wikipedia, see our free brochure resource. We also helped to develop a new video for people editing medical topics on Wikipedia. If you’re interested in teaching with Wikipedia, visit teach.wikiedu.org or reach out to contact@wikiedu.org.

by Cassidy Villeneuve at January 03, 2018 05:13 PM

Lorna M Campbell

2017 Highs, Lows and Losses

I ended up taking an unscheduled break from blogging and social media over the holidays as I was laid up with a nasty virus and its after effects.  Bleh.  So in an attempt to get back into the saddle, I’m taking a leaf out of Anne-Marie’s book with this “Some things that happened in 2017” post.  So in no particular order here’s a ramble through some of the things that made an impression on me, for one reason or another, over the last year.


OER is my conference;  I’ve never missed a single one since it kicked off in 2010.  They’re always thought-provoking and topical events, but OER17 The Politics of Open was particularly timely and unexpectedly emotional. I was fortunate to take part in several panels and talks, but the one that will always stay with me is Shouting from the Heart, a very short, very personal, lightning talk about what writing, openness and politics mean to me.  I’d never given such a personal talk before and, not to put too fine a point on it, I was fucking terrified. I was supposed to end with a quote from the Declaration of Arbroath but I bottled it and had to stop because I was in danger of crying in front of everyone. It was a deeply emotional experience, but the overwhelming response more than made up for my mortification.  I was also extremely grateful to meet up with many old friends and to meet many new friends too.

International Women’s Day

I was honoured to be name checked on International Women’s Day by several colleagues who I respect and admire hugely.  I’m still deeply touched.  Thank you.

Mashrou’ Leila  مشروع ليلى

Mashrou’ Leila مشروع ليلى are a Lebanese indie rock band whose lead singer Hamed Sinno is openly gay and a vocal advocate for LGBTQ issues, women’s rights and contemporary Arab identity. Mashrou’ Leila also happen to be one of my favourite bands of the last year so I was over the moon to be in London when they played an amazing open air gig at Somerset House in July.  It was a fabulous night and I don’t think I’ve ever seen such a diverse crowd at a music event.  I got quite emotional seeing the rainbow flag flying over Somerset House. Sadly, when Mashrou’ Leila played in Cairo a few months later, seven concert goers were arrested for raising that same rainbow flag and were subsequently charged with promoting sexual deviancy.

Mashrou’ Leila, Somerset House, CC BY Lorna M. Campbell

Wiki Loves Monuments

I’ve meant to take part in the Wiki Loves Monuments photography competition for years now.  I’ve taken hundreds of photographs of monuments over the years and they really should be in the public domain rather than languishing on various ancient laptops.  But it took my fabulous colleague and University of Edinburgh Wikimedian in Residence, Ewan McAndrew, to prod me into contributing.  Ewan made it his mission to get as many photographs of Scottish monuments uploaded to Wikimedia Commons as possible, and maybe try to beat the Welsh in the process.  The whole competition was hugely enjoyable and got very competitive. By the time it closed at the end of September over 2000 new images of Scottish monuments had been uploaded, and 184 of my old holiday snaps had found a new lease of life on Wikimedia Commons. Hats off to Ewan and Anne-Marie for the hundreds of amazing photographs they submitted to the competition.

A few of my pics…

Women in Red

In 2016 I was honoured to join Wikimedia UK’s Board of Trustees but it was in 2017 that I really started editing Wikipedia in earnest.  I created a number of new pages for notable women who previously didn’t have entries.  The ones I’m most proud of are:

Mary Susan MacIntosh, sociologist, feminist, lesbian, and campaigner for lesbian and gay rights.  MacIntosh was a founding member of the London Gay Liberation Front, she sat on the Criminal Law Revision Committee which lowered the age of male homosexual consent, and she played a crucial role in shaping the theory of social constructionism, a theory later developed by, and widely attributed to Michel Foucault. MacIntosh’s Wikipedia page still needs a lot more work, so please, if you can help, go ahead and edit it.

Elizabeth Slater, a British archaeologist specialising in archaeometallurgy. She was the first female professor of archaeology appointed by the University of Liverpool.  Liz was also the only female lecturer teaching archaeology at the University of Glasgow when I was a student there and her lectures made a huge impression on me. I was chuffed to be able to build a Wikipedia page for her.

Open Tumshies

Mah tumshie appeared in The Scotsman online! And you can read about it here 🙂

Open tumshies ftw!

Audierne Bay

In July my partner drove our aged VW camper van all the way to Brittany and we spent two weeks camping in Finistère with our daughter.  While we were there we visited Audierne Bay, where the Droits de L’Homme frigate engagement took place during a ferocious gale on the night of 13th January 1797.  This engagement was the starting point for the book Hornblower’s Historical Shipmates, which I wrote with my dear friend Heather Noel-Smith.  The day I visited Audierne Bay was bright and sunny and the beach was filled with families and holiday makers.  It was a sobering thought to stand there and look out at the reefs where hundreds of men lost their lives two hundred years before.

Audierne Bay, CC BY Lorna M. Campbell


Finally, after years of procrastinating, I wrote my portfolio and became a Certified Member of the Association for Learning Technology.  And I did it all in the open!

Me and inspirational ALT CEO, Maren Deepwell, CC BY, @ammienoot

UNESCO OER World Congress

In September I was honoured to attend the UNESCO OER World Congress in Ljubljana to represent the University of Edinburgh and Open Scotland, along with my colleague Joe Wilson. I’m so glad we were able to attend because, along with the fabulous Leo Havemann, we were the only people there from the UK.  It was a really interesting event and I hope the resulting OER Action Plan will help to raise the profile of OER worldwide.

UNESCO OER World Congress, CC BY Slovenian Press Agency


In November I was invited to give a talk about OER and open education at UCLouvain. It was a brief but enjoyable trip and I’d like to thank Christine Jacqmot and Yves Deville for their hospitality and for showing me around their unique city and university.

Mural, Louvain-la-Neuve, CC BY Lorna M. Campbell


I don’t get to dance much these days, due to work, commuting, childcare etc, but I did get to have one or two tango adventures this year.

A wedding and a ridiculous frock

In October my sister got married in Stornoway and I promised to buy the most ridiculous vintage frock I could find for the wedding.  I think I succeeded.

Channelling Abigail’s Party…

These guys…

Nike & Josh, CC BY Lorna M. Campbell

Also these guys…

We had a family of foxes living in the garden this year.  When I was working from home through the summer months I often had two or three foxes curled up sleeping in the sun outside my window, if not even closer!

Josh & friend, CC BY Lorna M. Campbell

Inevitably there were some real low points and losses during the year too.

I had a horrible medical emergency while travelling to Brittany and had to get blue-lighted off the boat in an ambulance and carted off to hospital in Morlaix.  Never, ever, have I been so glad that my partner is a nurse and stubborn as hell.  Without him, I don’t know what would have happened.

Public Transport

I don’t drive.  That’s a choice, not an accident.  But I travel continually so I spend a lot of my time on public transport. I take the bus and the train to work, which is a four hour commute twice a week.  When public transport isn’t available, I use a local taxi firm.  I never use Uber, because fuck that for a business model. I keep reading all this stuff about automated and driverless cars but tbh, I don’t want any more cars on the road, driverless or otherwise.  I want decent public transport, which is regular, reliable, clean, and safe for women travelling alone at any hour of the day or night. Oh, and I also want the people who work for these public transport systems to earn a decent living wage.  Is that too much to ask?

Maryam Mirzakhani

Maryam Mirzakhani was an Iranian mathematician, professor at Stanford University and the first woman to win the Fields Medal for mathematics.  In March I was invited to speak at the International Open Science Conference in Berlin and I took the title of my talk, Crossing the Field Boundaries, from an interview with Maryam.

“I like crossing the imaginary boundaries people set up between different fields—it’s very refreshing. There are lots of tools, and you don’t know which one would work. It’s about being optimistic and trying to connect things.”

A Tenacious Explorer of Abstract Surfaces, Quanta Magazine, August 2014

Four months later, I was deeply saddened to hear that Maryam had died of breast cancer at the age of 40.  The loss of such a gifted woman is unfathomable.

Bassel Khartabil

In August we heard the devastating news that the detained Syrian open knowledge advocate Bassel Khartabil had been executed by the Syrian government in 2015.  I never met Bassel, but I was deeply moved by his story and I contributed to a number of initiatives that tried to raise awareness of his plight. I will never forget that this man lost his liberty and his life for doing a job similar to the one that I, and many of my colleagues, do every day.  This is my memorial to him.

by admin at January 03, 2018 01:19 AM

January 02, 2018

Wikimedia Tech Blog

A new platform to explore statistics about Wikimedia projects

Photo by SpaceX, CC0.

Wikistats 2 builds on the success of Wikistats, the project started more than 15 years ago by Erik Zachte. Wikistats has been the canonical source of statistics about the reach and impact of the Wikimedia movement for many years. It offered a quantitative mirror to the Wikimedia communities to reflect on their growth, gaps and strategic opportunities. It also provided one of the earliest public data sources for the study of large-scale peer production communities, and as such has been cited nearly a thousand times in the literature.

As detailed in Wikistats 2’s documentation, there are several noticeable changes in the new site’s design, but the biggest changes come on the backend. In this post, we’ll detail what changes you’ll see, and explain how to access the data programmatically.

What’s new? Pretty much everything … but the data!

The data-processing pipeline for the new Wikistats has been rebuilt from scratch. It uses distributed-computing open source technology such as Hadoop, Spark, Sqoop, and Hive to ingest and enhance project data, and loads a prepared version of the whole history of every project into Druid, a fast-computing analytics server. Druid then serves sliced and diced subsets of data through the Analytics Query Service, the MediaWiki external API for analytics data.

A brand new front-end has also been designed and built on top of the new API. The dashboard brings together a lot of information, providing an easy way to get an overview of any project at a glance. More details can be found in the three sections of the dashboard, which are labeled Contributing, Reading and Content. The Contributing section is about edits and editors, the Reading section is about visited articles and unique devices, and the Content section contains article-level statistics.

You may notice that the data that exists in Wikistats 2.0 is the same data that existed in Wikistats. For this alpha release, we decided to replicate the existing metrics. In doing so we had two goals in mind: We wanted to test this new dashboard against a time-tested one, and we also wanted to provide existing Wikistats users with statistics that closely matched those they are familiar with. We succeeded relatively well at replicating the existing statistics.

How to access the data programmatically

You can access the same data that powers the new user-interface by querying a RESTful API. The full documentation is available on this page, but we’ll walk you through some examples.

Let’s get the number of edits made every day in October 2017 for Wikipedia in Spanish:


There are two parameters in the above URL telling us about editor-types and page-types. The editor-types parameter lets you filter by anonymous users (anonymous), registered users declared as robots (group-bot), registered users not declared as bots but that we suspect are nonetheless (name-bot), and registered users that we think are legitimate humans (user). The page-types parameter distinguishes content from non-content pages. Content pages are located in the main namespace, while non-content pages are talk pages and pages in other special namespaces.
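As a rough illustration of the request described above, here is a minimal Python sketch that assembles a URL for daily edit counts on Spanish Wikipedia in October 2017. The base URL, the order of the path segments, the YYYYMMDD date format, and the all-editor-types, all-page-types, content and non-content values are assumptions that do not come from this post, and build_edits_url is a hypothetical helper. Please check the API documentation before relying on any of them.

```python
# Hypothetical helper. The base URL and the path layout are assumptions,
# not taken from this post; verify them against the API documentation.
BASE_URL = "https://wikimedia.org/api/rest_v1/metrics"

# Filter values named in the post, plus assumed "no filtering" values.
EDITOR_TYPES = {"all-editor-types", "anonymous", "group-bot", "name-bot", "user"}
PAGE_TYPES = {"all-page-types", "content", "non-content"}


def build_edits_url(project, editor_type, page_type, granularity, start, end):
    """Return a URL for aggregate edit counts; dates are YYYYMMDD strings."""
    if editor_type not in EDITOR_TYPES:
        raise ValueError(f"unknown editor type: {editor_type}")
    if page_type not in PAGE_TYPES:
        raise ValueError(f"unknown page type: {page_type}")
    parts = ["edits", "aggregate", project, editor_type, page_type,
             granularity, start, end]
    return BASE_URL + "/" + "/".join(parts)


url = build_edits_url("es.wikipedia", "all-editor-types", "all-page-types",
                      "daily", "20171001", "20171101")
print(url)
```

Validating the filter values before building the URL means a typo such as "bot" instead of "group-bot" fails locally instead of producing a request the service would reject.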

A second example: We want to find the number of human editors who have made more than 100 edits over the course of a month, each month between January and July 2015 on the Commons project:


This request introduces a new parameter, named activity-level. It is defined for requests on editors and edited-pages and lets you filter for specific levels of activity (1..4-edits, 5..24-edits, 25..99-edits, 100..-edits, or all-activity-levels for no filtering).
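To make the activity-level buckets concrete, the following Python sketch maps one editor's monthly edit count to a bucket name. The boundaries are inferred from the bucket names alone, so whether exactly 100 edits belongs in the top bucket is an assumption, as is the 25..99-edits spelling. activity_level_for is a hypothetical helper and not part of the API.

```python
# Hypothetical helper: bucket boundaries are inferred from the bucket names
# and are an assumption, not a documented rule.
def activity_level_for(monthly_edits):
    """Map one editor's edit count for a month to an activity-level value."""
    if monthly_edits < 1:
        raise ValueError("an editor needs at least one edit to fall in a bucket")
    if monthly_edits <= 4:
        return "1..4-edits"
    if monthly_edits <= 24:
        return "5..24-edits"
    if monthly_edits <= 99:
        return "25..99-edits"
    return "100..-edits"


for count in (3, 24, 25, 100):
    print(count, activity_level_for(count))
```

Under these assumptions, the request for very active human editors would pass 100..-edits as the activity-level value.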

And a last one, just for fun! Let’s say we want to find the number of pages visited by regular users (not bots) between December 2016 and January 2017 on the English-language edition of Wikipedia. You can see how to add dates below:


That’s it! Please let us know what you like or dislike about the new dashboard, and particularly don’t hesitate to file bugs. This will help us graduate that alpha version to the beta stage.

Joseph Allemandou, Senior Software Engineer, Analytics
Wikimedia Foundation

by Joseph Allemandou at January 02, 2018 07:00 PM

Wiki Education Foundation

Everyone has a voice…

The human voice is an amazing thing. It’s capable of making us laugh, cry, and feel a broad range of emotions. While it’s far from the only way to impart messages or portray emotions, our lives would be lacking if every person were to fall eternally silent. This is what makes the study of voice disorders so vital, as the discovery of new breakthroughs and the dissemination of both old and new research is of great benefit to society. In Fall 2016, McGill University educator Nicole Li-Jessen and her students took on this task of ensuring that the public has access to pertinent and updated information on this topic by editing Wikipedia, one of the most frequently accessed sites on the Internet.

Vocal cord dysfunction (VCD) is something that can have a negative effect on a person’s life and well-being. People who have this disorder can have symptoms that resemble other conditions like asthma and vocal cord paralysis, which make it difficult to make a timely and accurate diagnosis. Treatments for VCD can vary and run from behavioral to medical and psychological.

Laryngitis is an inflammation of the voice box that typically lasts for a few weeks. Symptoms for this may vary in type and severity, with one of the most common symptoms being a hoarse voice. Laryngitis comes in two forms, acute and chronic. The former can be caused by viral, fungal, or bacterial infections or even trauma to the vocal cords. If you’ve ever heard the phrase “I’m going to yell myself hoarse”, this is the source of that reference. For those unfortunate enough to have chronic laryngitis, their inflammation can be caused by allergies and autoimmune disorders such as rheumatoid arthritis. Reflux is something that may also cause chronic laryngitis, as the gastro-oesophageal reflux may irritate and inflame vocal cords, making it difficult to speak. Treatment for laryngitis depends heavily on the causes and severity, as treating the inflammation can include medication or may only require that the individual make several lifestyle changes until they’re better.

To quote the Wikipedia article on Wikipedia, “Wikipedia is the largest and most popular general reference work on the Internet”, which makes contributions from students and educators that much more important. Want to help share knowledge with the world? Contact Wiki Education at contact@wikiedu.org to find out how you can gain access to tools, online trainings, and printed materials to help your class edit Wikipedia. Or visit teach.wikiedu.org.

Image: File:Arts Building, McGill University, Montréal, East view 20170410 1.jpg, DXR, CC BY-SA 4.0, via Wikimedia Commons.

by Shalor Toncray at January 02, 2018 05:25 PM

Wikimedia Cloud Services

New Wiki Replica servers ready for use


  • Change your tools and scripts to use:
    • *.web.db.svc.eqiad.wmflabs (real-time response needed)
    • *.analytics.db.svc.eqiad.wmflabs (batch jobs; long queries)
  • Replace * with either a shard name (e.g. s1) or a wikidb name (e.g. enwiki).
  • The new servers do not support user created databases/tables because replication can't be guaranteed. See T156869 and below for more information. tools.db.svc.eqiad.wmflabs (also known as tools.labsdb) will continue to support user created databases and tables.
  • Report any bugs you find with these servers in Phabricator using the Data-Services tag.

Wiki Replicas

The Wiki Replicas are one of the unique services that Wikimedia Cloud Services helps make available to our communities. Wiki Replicas are real-time replicas of the production Wikimedia MediaWiki wiki databases with privacy-sensitive data removed. These databases hold copies of all the metadata about content and interactions on the wikis. You can read more about these databases on Wikitech if you are unfamiliar with their details.

The current physical servers for the <wiki>_p Wiki Replica databases are at the end of their useful life. Work started over a year ago on a project involving the DBA team and cloud-services-team to replace these aging servers (T140788). Besides being five years old, the current servers have other issues that the DBA team took this opportunity to fix:

  • Data drift from production (T138967)
  • No way to give different levels of service for realtime applications vs analytics queries
  • No automatic failover to another server when one failed

Bigger, faster, more available

Each of the three new servers is much larger and faster than the servers they are replacing. Five years is a very long time in the world of computer hardware:

We have also upgraded the database software itself. The new servers are running MariaDB version 10.1. Among other improvements, this newer database software allows us to use a permissions system that is simpler and more secure for managing the large number of individual tools that are granted access to the databases.

The new servers use InnoDB tables rather than the previous TokuDB storage. TokuDB was used on the old servers as a space-saving measure, but it has also had bugs in the past that caused delays to replication. InnoDB is used widely in the Wikimedia production databases without these problems.

The new cluster is configured with automatic load balancing and failover using HAProxy. All three hosts have identical data. Currently, two of the hosts are actively accepting connections and processing queries. The third is a ready replacement for either of the others in case of unexpected failure or when we need to do maintenance on the servers themselves. As we learn more about usage and utilization on these new hosts we can change things to better support the workloads that are actually being generated. This may include setting up different query duration limits or pooling the third server to support some of the load. The main point is that the new system provides us with the ability to make these types of changes which were not possible previously.

Improved replication

The work of scrubbing private data is done on a set of servers that we call "sanitarium" hosts. The sanitarium servers receive data from the production primary servers. They then in turn act as the primary servers which are replicated to the Wiki Replica cluster. The two sanitarium servers for the new Wiki Replica cluster use row-based replication (RBR). @Marostegui explains the importance of this change and its relationship to T138967: Labs database replica drift:

[W]e are ensuring that whatever comes to those hosts (which are, in some cases, normal production slaves) is exactly what is being replicated to the [Cloud] servers. Preventing us from data drifts, as any data drift on row based replication would break replication on the [Cloud] servers. Which is bad, because they get replication broken, but at the same time is good, because it is a heads up that the data isn't exactly as we have it in core. Which allows us to maintain a sanitized and healthy dataset, avoiding all the issues we have had in the past.

The data replicated to the new servers has been completely rebuilt from scratch using the RBR method. This has fixed many replication drift problems that exist on the older servers (T138967). If your tool performs tasks where data accuracy is important (counting edits, checking if a page has been deleted, etc), you should switch to using the new servers as soon as possible.

New service names

Populating the new sanitarium servers with data was a long process (T153743), but now that it is done our three new Wiki Replica servers are ready for use. With the old setup, we asked people to use a unique hostname with each database they connected to (e.g. enwiki.labsdb). The new cluster extends this by using service names to separate usage by the type of queries that are being run:

  • Use *.web.db.svc.eqiad.wmflabs for webservices and other tools that need to make small queries and get responses quickly.
  • Use *.analytics.db.svc.eqiad.wmflabs for longer running queries that can be slower.

If you were using enwiki.labsdb you should switch to either enwiki.analytics.db.svc.eqiad.wmflabs or enwiki.web.db.svc.eqiad.wmflabs. The choice of "analytics" or "web" depends on what your tool is doing, but a good rule of thumb is that any query that routinely takes more than 10 seconds to run should probably use the "analytics" service.
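The rule of thumb above can be written down as a small Python sketch. The replica_host function and its typical_query_seconds argument are hypothetical names used only for illustration; the hostname pattern and the 10 second threshold come from the text above.

```python
# Hypothetical helper illustrating the "web" vs "analytics" rule of thumb.
SLOW_QUERY_THRESHOLD_SECONDS = 10


def replica_host(wikidb, typical_query_seconds):
    """Pick a Wiki Replica service name for a wikidb such as "enwiki"."""
    if typical_query_seconds > SLOW_QUERY_THRESHOLD_SECONDS:
        service = "analytics"  # batch jobs; long queries
    else:
        service = "web"  # real-time response needed
    return f"{wikidb}.{service}.db.svc.eqiad.wmflabs"


print(replica_host("enwiki", 0.5))
print(replica_host("enwiki", 45))
```

Keeping the choice in one function makes it easy to revisit later if the two service names start to behave differently.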

Right now there is no actual difference between connecting to the "web" or "analytics" service names. As these servers get more usage and we understand the limitations they have, this may change. Having a way for a user to explicitly choose between real-time responses and slower responses for more complicated queries will provide more flexibility in tuning the systems. We expect to be able to allow queries to run for a much longer time on the new analytics service names than we can on the current servers. This in turn should help people who have been struggling to gather the data needed for complex reports within the current per-request timeout limits.

A breaking change

These new servers will not allow users to create their own databases/tables co-located with the replicated content. This was a feature of the older database servers that some tools used to improve performance by making intermediate tables that could then be JOINed to other tables to produce certain results.

We looked for solutions that would allow us to replicate user created data across the three servers, but we could not come up with a solution that would guarantee success. The user created tables on the current servers are not backed up or replicated and have always carried the disclaimer that these tables may disappear at any time. With the improvements in our ability to fail over and rebalance traffic under load, it is more likely on the new cluster that these tables would randomly appear and disappear from the point of view of a given user. This kind of disruption will break tools if we allow it. It seems a safer solution for everyone to disallow the former functionality.

User created databases and tables are still supported on the tools.db.svc.eqiad.wmflabs server (also known as tools.labsdb). If you are using tables co-located on the current c1.labsdb or c3.labsdb hosts we are recommending that your tool/scripts be updated to instead keep all user managed data on tools.db.svc.eqiad.wmflabs and perform any joining of replica data and user created data in application space rather than with cross-database joins.
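As a rough illustration of joining in application space, the sketch below combines rows fetched separately from a wiki replica and from a tool's own database. The rows, column names and the `join_on` helper are all invented for this example; in a real tool the two lists would come from two separate database connections.

```python
# Rows as they might come back from two separate connections.
# All contents and column names are hypothetical.
replica_rows = [  # e.g. fetched from a *.analytics.db.svc.eqiad.wmflabs host
    {"page_id": 1, "page_title": "Alpha"},
    {"page_id": 2, "page_title": "Beta"},
    {"page_id": 3, "page_title": "Gamma"},
]
user_rows = [  # e.g. fetched from a tool's table on tools.db.svc.eqiad.wmflabs
    {"page_id": 1, "score": 0.9},
    {"page_id": 3, "score": 0.4},
]


def join_on(left, right, key):
    """Inner-join two lists of dict rows on `key` (a simple hash join)."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


result = join_on(replica_rows, user_rows, "page_id")
for row in result:
    print(row["page_title"], row["score"])
```

For large result sets it is usually better to fetch only the keys you need from one side and filter the other query with them, rather than pulling both tables in full.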

There will be further announcements before the old servers are completely taken offline, but tools maintainers are urged to make changes as soon as they can. The hardware for the older servers is very old and may fail in a non-recoverable way unexpectedly (T126942).

Curated datasets

There are some datasets produced by ORES, the Analytics team, or volunteers that really do need to be co-located with the wiki tables to be useful. We are looking for a solution for these datasets that will allow them to be replicated properly and available everywhere. See T173511: Implement technical details and process for "datasets_p" on wikireplica hosts for further discussion of providing some method for 'curated' datasets to be added to the new cluster.

Quarry will be switched over to use *.analytics.db.svc.eqiad.wmflabs soon. As noted previously, using the analytics service names should allow more complex queries to complete which will be a big benefit for Quarry's users who are doing analytics work. This change may however temporarily interrupt usage of some datasets that are blocked by T173511. Follow that task for more information if your work is affected.

You can help test the new servers

Before we make the new servers the default for everyone, we would like some early adopters to use them and help us find issues like:

  • missing permissions
  • missing views
  • missing wikidb service names

To use them, change your tool to connect to:

  • *.web.db.svc.eqiad.wmflabs (real-time response needed)
  • *.analytics.db.svc.eqiad.wmflabs (batch jobs; long queries)

Replace the * with either a shard name (e.g. s1) or a wikidb name (e.g. enwiki).

For interactive queries, use one of:

  • sql --cluster analytics <database_name>
  • mysql --defaults-file=$HOME/replica.my.cnf -h <wikidb>.analytics.db.svc.eqiad.wmflabs <database_name>
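For tools written in Python, the same connection details can be assembled in code. The sketch below is an assumption-laden example rather than an official recipe: the helper name `replica_connection_args` is made up, and it assumes that `replica.my.cnf` has a `[client]` section with `user` and `password` entries, as a file passed to `mysql --defaults-file` normally does.

```python
import configparser
import os


def replica_connection_args(wikidb, database, service="analytics", cnf_path=None):
    """Build connection settings for a wiki replica (sketch, not official)."""
    if cnf_path is None:
        cnf_path = os.path.join(os.path.expanduser("~"), "replica.my.cnf")
    # Disable interpolation so a "%" in the password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(cnf_path):
        raise FileNotFoundError(cnf_path)
    client = parser["client"]  # assumed section name
    return {
        "host": f"{wikidb}.{service}.db.svc.eqiad.wmflabs",
        "user": client["user"].strip("'\""),
        "password": client["password"].strip("'\""),
        "database": database,
    }
```

The returned dictionary can then be handed to whichever MySQL client library your tool already uses; pass `service="web"` when a real-time response is needed.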

Report any bugs you find with these servers and their data in Phabricator using the Data-Services tag.


The cloud-services-team would like to especially thank @jcrespo and @Marostegui for the work that they put into designing and implementing this new cluster. Without their technical experience and time this project would never have been successful.

Learn more

by bd808 (Bryan Davis) at January 02, 2018 03:59 PM

Harry Burt

Wikimedia Hackathon 2015

I am once again delighted to be able to attend the Wikimedia Hackathon, an event that rotates around Europe. This year’s is in the picturesque town of Lyon in south-central France — its winding boulevards and riverside façades looking rather beautiful (and very French) in the summer sun. Conveniently, Eurostar have just begun direct trains from London St Pancras, and by booking in advance tickets were competitively priced (£110pp outbound, £65pp inbound). Okay, so the trains took a while (4h45 outbound, 5h45 inbound) but I booked early enough to get a table, and on the return journey at least was pretty productive.


Five men surround three laptops on a table
Developers sitting at the i18n table

My main work was on TranslateSvg, a project I started several years ago as part of a Google Summer of Code project. Admittedly it is annoying not to have the extension live (although Brian tells me that the feature we did eventually manage to land is actually being used, so that’s something). On the other hand, I can understand why Wikimedia now demands high quality code (see below), and in particular good unit tests. I simply haven’t been able to put in the time required to deliver those (except in very short bursts), and that’s fundamentally my fault.

Anyhow, to focus on the positive, I used Lyon — and in particular the train back — to commit a load of patches. These get test coverage up to about 50% on a line-by-line basis, and, more importantly, led me to uncover a bunch of bugs I hadn’t found before. I also re-ran an analysis I first conducted almost 3 years ago and found that TranslateSvg was performing worse now than then! As ever, uncovering the bug was 90% of the challenge and the project is now back to where it was in August 2012 on that particular metric.

A more professional MediaWiki

I guess my other contribution during the Lyon Hackathon was a question to Lila Tretikov, ED of the Wikimedia Foundation. Someone else had asked why the relative balance between professional and volunteer developers had (it seemed) shifted away from the latter to the former. Other people had quite rightly pointed out that the WMF had hired many of the former volunteers, and, in particular, had hired many of the most “omnipresent” volunteers.

The point I wanted to make, however, is that MediaWiki as a platform has come a long way. It is a lot more professional, and that means standards are higher. By definition, you make it harder for part-timers (many of whose skillsets are understandably incomplete or out-dated) to contribute on a level footing. For example, a move from CSS to LESS reduces overhead for “experts” but makes it harder for those who just know CSS (i.e. most developers) and do not have the time to retrain to contribute. It was also pointed out that moving to a system of pre-commit review (as MediaWiki did in March 2012) encourages high standards: you’re not able to join the elite club of MediaWiki contributors without having your commit peer-reviewed first, whereas before you just had to fix it later (and even then you had status quo bias working with you rather than against you).

Lila’s response was to point to the ongoing work moving MediaWiki from being a monolithic pile of code to something much more modular and service oriented, so that newcomers can find a part of it to work on. I think this goes both ways: yes, it means newcomers can find a happy corner that they can work in, but it also allows our increasingly professionalised developer base to fulfil their burning desire to ditch PHP in favour of their own preferred language, with unintended consequences for the volunteer community.

by Harry at January 02, 2018 11:11 AM


Shocking tales from ornithology

Manipulative people have always made use of the dynamics of ingroups and outgroups to create diversions from bigger issues. The situation is made worse when misguided philosophies are peddled by governments that place economics ahead of ecology. The pursuit of easily gamed targets such as GDP is easy since money is a man-made and controllable entity. Nationalism, pride, other forms of chauvinism, the creation of enemies and the magnification of war threats are all effective tools in the arsenal of Machiavelli for use in misdirecting the masses. One might imagine that the educated, especially scientists, would be smart enough not to fall into these traps but cases from recent history will dampen hopes for such optimism.

There is a very interesting book in German by Eugeniusz Nowak called "Wissenschaftler in turbulenten Zeiten" (or scientists in turbulent times) that deals with the lives of ornithologists, conservationists and other naturalists during the Second World War. Preceded by a series of recollections published in various journals, the book was published in 2010 but I became aware of it only recently while translating some biographies into the English Wikipedia. I have not yet actually seen the book (it has about five pages on Salim Ali as well) and have had to go by secondary quotations in other content. Nowak was a student of Erwin Stresemann (with whom the first chapter deals) and he writes about several European (but mostly German, Polish and Russian) ornithologists and their lives during the turbulent 1930s and 40s. Although Europe is pretty far from India, there are ripples that reached afar. Incidentally, Nowak's ornithological research includes studies on the expansion in range of the collared dove (Streptopelia decaocto) which the Germans called the Tuerkentaube, literally the "Turkish dove", a name with a baggage of cultural prejudices.

Nowak's first paper of "recollections" notes that: [he] presents the facts not as accusations or indictments, but rather as a stimulus to the younger generation of scientists to consider the issues, in particular to think “What would I have done if I had lived there or at that time?” - a thought to keep as you read on.

A shocker from this period is a paper by Dr Günther Niethammer on the birds of Auschwitz (Birkenau). This paper (read it online here) was published when Niethammer was posted to the security at the main gate of the concentration camp. You might be forgiven if you thought he was just a victim of the war. Niethammer was a proud nationalist and volunteered to join the Nazi forces in 1937 leaving his position as a curator at the Museum Koenig at Bonn.
The contrast provided by Niethammer, who looked at the birds on one side while ignoring inhumanity on the other, provided novelist Arno Surminski with a title for his 2008 novel - Die Vogelwelt von Auschwitz - i.e. the birdlife of Auschwitz.

G. Niethammer
Niethammer studied birds around Auschwitz and also shot ducks in numbers for himself and to supply the commandant of the camp Rudolf Höss (if the name does not mean anything please do go to the linked article / or search for the name online).  Upon the death of Niethammer, an obituary (open access PDF here) was published in the Ibis of 1975 - a tribute with little mention of the war years or the fact that he rose to the rank of Obersturmführer. The Bonn museum journal had a special tribute issue noting the works and influence of Niethammer. Among the many tributes is one by Hans Kumerloeve (starts here online). A subspecies of the common jay was named as Garrulus glandarius hansguentheri by Hungarian ornithologist Andreas Keve in 1967 after the first names of Kumerloeve and Niethammer. Fortunately for the poor jay, this name is a junior synonym of  G. g. anatoliae described by Seebohm in 1883.

Meanwhile inside Auschwitz, the Polish artist Wladyslaw Siwek was making sketches of everyday life  in the camp. After the war he became a zoological artist of repute. Unfortunately there is very little that is readily accessible to English readers on the internet.
Siwek, artist who documented life at Auschwitz before working as a wildlife artist.
Hans Kumerloeve
Now for Niethammer's friend Dr Kumerloeve who also worked in the Museum Koenig at Bonn. His name was originally spelt Kummerlöwe and he was, like Niethammer, a doctoral student of Johannes Meisenheimer. Kummerlöwe and Niethammer made journeys on a small motorcycle to study the birds of Turkey. Kummerlöwe's political activities started earlier than Niethammer's: he joined the NSDAP (German: Nationalsozialistische Deutsche Arbeiterpartei = The National Socialist German Workers' Party) in 1925 and started the first student union of the party in 1933. Kummerlöwe soon became part of the Ahnenerbe, a think tank meant to give "scientific" support to the party's ideas on race and history. In 1939 he wrote an anthropological study on "Polish prisoners of war". At the museum in Dresden which he headed, he thought up ideas to promote politics and he published them in 1939 and 1940. After the war, it is thought that he went to all the European libraries that held copies of this journal and purged them of his article. (Anyone interested in hunting it down should look for copies of Abhandlungen und Berichte aus den Staatlichen Museen für Tierkunde und Völkerkunde in Dresden 20:1-15.) According to Nowak, he even managed to get his hands (and scissors) on copies held in Moscow and Leningrad!

The Dresden museum was also home to the German ornithologist Adolf Bernhard Meyer (1840–1911). In 1858, he translated the works of Charles Darwin and Alfred Russel Wallace into German and introduced evolutionary theory to a whole generation of German scientists. Among Meyer's amazing works is a series of avian osteological works which use photography and depict birds in nearly life-like positions - a less artistic precursor to Katrina van Grouw's 2012 book The Unfeathered Bird. Meyer's skeleton images can be found here. In 1904 Meyer was eased out of the Dresden museum because of rising anti-semitism. Meyer does not find a place in Nowak's book.

Nowak's book includes entries on the following scientists: (I keep this here partly for my reference as I intend to improve Wikipedia entries on several of them as and when time and resources permit. Would be amazing if others could pitch in!).
In the first of his "recollection papers" (his 1998 article) he writes about the reason for writing them - the obituary for Prof. Ernst Schäfer was a whitewash that carefully avoided any mention of his wartime activities. And this brings us to India. In a recent article in Indian Birds, Sylke Frahnert and others have written about the bird collections from Sikkim in the Berlin natural history museum. In their article there is a brief statement that "The collection in Berlin has remained almost unknown due to the political circumstances of the expedition". This might be a bit cryptic for many but the best read on the topic is Himmler's Crusade: The true story of the 1939 Nazi expedition into Tibet (2009) by Christopher Hale. Hale writes about Himmler:
He revered the ancient cultures of India and the East, or at least his own weird vision of them.
These were not private enthusiasms, and they were certainly not harmless. Cranky pseudoscience nourished Himmler’s own murderous convictions about race and inspired ways of convincing others...
Himmler regarded himself not as the fantasist he was but as a patron of science. He believed that most conventional wisdom was bogus and that his power gave him a unique opportunity to promulgate new thinking. He founded the Ahnenerbe specifically to advance the study of the Aryan (or Nordic or Indo-German) race and its origins
From there Hale goes on to examine the motivations of Schäfer and his team. He looks at how much of the science was politically driven. Swastika signs dominate some of the photos from the expedition - as if it provided for a natural tie with Buddhism in Tibet. It seems that Himmler gave Schäfer the opportunity to rise within the political hierarchy. The team that went to Sikkim included Bruno Beger. Beger was a physical anthropologist but with less than innocent motivations although that would be much harder to ascribe to the team's other pursuits like botany and ornithology. One of the results from the expedition was a film made by the entomologist of the group, Ernst Krause - Geheimnis Tibet - or secret Tibet - a copy of this 1 hour and 40 minute film is on YouTube. At around 26 minutes, you can see Bruno Beger creating face casts - first as a negative in Plaster of Paris from which a positive copy was made using resin. Hale talks about how one of the Tibetans put into a cast with just straws to breathe from went into an epileptic seizure from the claustrophobia and fear induced. The real horror however is revealed when Hale quotes a May 1943 letter from an SS officer to Beger - ‘What exactly is happening with the Jewish heads? They are lying around and taking up valuable space . . . In my opinion, the most reasonable course of action is to send them to Strasbourg . . .’ Apparently Beger had to select some prisoners from Auschwitz who appeared to have Asiatic features. Hale shows that Beger knew the fate of his selection - they were gassed for research conducted by Beger and August Hirt.
SS-Sturmbannführer Schäfer at the head of the table in Lhasa

In all, Hale makes a clear case that the Schäfer mission had quite a bit of political activity underneath. We find that Sven Hedin (Schäfer was a big fan of him in his youth. Hedin was a Nazi sympathizer who funded and supported the mission) was in contact with fellow Nazi supporter Erica Schneider-Filchner and her father Wilhelm Filchner in India, both of whom were interned later at Satara, while Bruno Beger made contact with Subhash Chandra Bose more than once. [Two of the pictures from the Bundesarchiv show a certain Bhattacharya - who appears to be a chemist working on snake venom at the Calcutta snake park - one wonders if he is Abhinash Bhattacharya.]

My review of Nowak's book must be uniquely flawed as I have never managed to access it beyond some online snippets and English reviews. The war had impacts on the entire region and Nowak's coverage is limited; there were many other interesting characters, including the Russian ornithologist Malchevsky, who survived German bullets thanks to a fat bird observation notebook in his pocket! In the 1950s Trofim Lysenko, the crank scientist who controlled science in the USSR, sought Malchevsky's help in proving his own pet theories - one of which was the idea that cuckoos were the result of feeding hairy caterpillars to young warblers!

Issues arising from race and perceptions are of course not restricted to this period or region. One of the less glorious stories of the Smithsonian Institution concerns the honorary curator Robert Wilson Shufeldt (1850 – 1934) who in the infamous Audubon affair made his personal troubles with his second wife, a grand-daughter of Audubon, into one of race. He also wrote such books as America's Greatest Problem: The Negro (1915) in which we learn of the ideas of other scientists of the period like Edward Drinker Cope! Like many other obituaries, Shufeldt's is a classic whitewash.

Even as recently as 2015, the University of Salzburg withdrew an honorary doctorate that they had given to the Nobel prize winning Konrad Lorenz for his support of the political setup and racial beliefs. It should not be that hard for scientists to figure out whether they are on the wrong side of history even if they are funded by the state. Perhaps salaried scientists in India would do well to look at the legal contracts they sign with their employers, the state, more carefully.

PS: Mixing natural history with war sometimes led to tragedy for the participants as well. In the case of Dr Manfred Oberdörffer who used his cover as an expert on leprosy to visit the borders of Afghanistan with entomologist Fred Hermann Brandt (1908–1994), an exchange of gunfire with British forces killed him although Brandt lived on to tell the tale.

by Shyamal L. (noreply@blogger.com) at January 02, 2018 07:45 AM

January 01, 2018

Gerard Meijssen

#Wikidata - #CocaCola, what science in paid "science"?

Greenpeace has a reputation for the science on which it bases its actions. Its objective is one that should not be controversial, but it is, because it affects business as usual for industries like the plastic bottled drinks of a Coca Cola or the production of oil by a Shell.

Industry has a long tradition of performing research and of keeping it confidential when this is considered in its interest. Another, new strategy is to commission research to find the numbers to shore up its market position.

When the numbers do not add up because reality is different, the last bastion to defend is the integrity of the science and its scientists. Even when a case goes to court, the findings of a judge are disputed when other scientists do not consider the legal findings. In a post at the Dutch Greenpeace website, 5 reasons to dispose of rebate and 12 reasons to reinforce rebate, multiple examples of doctored science are mentioned. Mentioned in a way where research is invalidated by research. When a bad faith actor like the plastic bottle industry buys research, it follows that the research is easily suspect and with the same brush, the organisations, the people involved.

When science is pseudo science, when both Wikipedia and Wikidata use sources to establish points of view it follows that this pseudo science is used to establish a neutral point of view. That is exactly why a Coca Cola invests in these programs; just to shore up its business. Obviously the court cases, the papers trouncing pseudo science should be prominently included. This pseudo science has no place in the Wikimedia projects except when it is obvious for what it is.

by Gerard Meijssen (noreply@blogger.com) at January 01, 2018 10:17 AM

December 30, 2017

Weekly OSM

weeklyOSM 388


speed Graphhopper

Display all different maximum allowed speed values along the path 1 | © OpenStreetMap Contributors © Graphhopper


  • DevSeed announces their Data Team for humanitarian response, professional geospatial data management and machine learning. To better support their partners’ missions, they are closely partnering with Mapbox to expand their mapping capacity. Effective immediately, the Mapbox data team Peru will operate as part of Development Seed and form the core of the DevSeed Data Team.
  • The voting for the tagging of Magnetic levitation monorail has been running since October 6th.


  • Björn wants to know whether you can create a PDF file directly from an OpenStreetMap URL.
  • User Zverik reports in his blog that Maps.Me performs routing via subway and light_rail relations.
  • TheSilphRoad is a site that investigates the game of Pokémon Go. A published article helps Trainers understand what OSM is and how to contribute in useful ways.
  • Wille writes a diary announcing a mailing list for OSMCha. This mailing list will be vital to ask general questions, make announcements regarding OSMCha, collaborate on plans and new features, and inform the group of changes.


  • Sometimes it helps to get things out of your system – someone mapping in Calistoga has added a self-described rant about the problems that come with mapping in an area dominated by bad imports. The forum replies explain what happened and say that, thankfully, even in this area such problems are largely a thing of the past.


  • Oliver Rudzick promotes the OSGeo Code Sprint 2018 at the BaseCamp in Bonn and recommends booking early because the overnight accommodation there is limited but interesting. The sprint will take place in parallel with the FOSSGIS conference. As Michael Reichert emphasizes, the OSM Saturday will take place on March 24th.
  • FOSS4G will take place in the new year 2018 in Dar es Salaam.
  • For April 13-14th 2018 the Polish community has organized a State of the map conference in Poznan.


  • Bernd Weigelt discovers another way to install maps with hillshading for the Germany-Austria-Switzerland area on Garmin devices.
  • Oleksiy Muzalyev presents his new multilingual web application, Travel Info Pack. It lets the user select a category of POI to display prominently on a map, and additionally offers a search for nearby Wikimedia images.


  • OsmAnd is available in version 2.0 for Apple’s iOS operating system. The most notable new feature is the routing function. The app can be downloaded from the iTunes Store.


  • The latest version of GraphHopper’s Routing API offers new capabilities: path details and an “avoid” feature. Other GraphHopper APIs also receive updates.
  • Jason Remillard wants to train a deep learning algorithm to recognise baseball pitches using OSM data as the training set. His code is on github.


  • Vespucci, the editor for use on mobile devices, has a major release and leaps from v0.9.9 to version 10.
    • internal validator re-factored
    • C-Mode presets
    • Support for custom tasks
  • The new version of Mapbox Maps SDK for Android v5.3.0 comes with a new name, improved CJK performance and much more.
  • Mapbox Navigation SDK for Android v0.8.0 release comes with event listeners when using the prebuilt UI, fixed audible SSML tags in voice instructions, and custom notifications.

Other “geo” things

  • WeltN24 reports (de) (Google translate) about the recent inauguration of the Zugspitze cable car, which breaks three world records. The cable car is of course already mapped in OpenStreetMap.
  • Who will create the maps in the future? Man, machines or both together? A blog article from Mapillary wants to find answers.
  • Justin O’Beirne published a detailed blog post on the status of Google Maps compared to Apple Maps, which also includes Google’s intensive use of machine learning. O’Beirne feels that Google Maps’ wealth of information seems overwhelming. On Twitter @vtcraghead shares his thoughts on how the OpenStreetMap community should deal with this.
  • The interactive map of koeln.de shows the fireworks-free zone and the barriers for New Year’s Eve in Cologne, Germany.
  • Members from OSGEO (Open Source Geospatial Foundation) developed OpenDroneMap.org, an open source project for processing aerial drone imagery, over the past 5 years. There is no direct relation with OpenStreetMap. You can find a more detailed explanation on how it works here
  • Ian Dees writes on the Mapzen blog about a project that collects addresses for branch locations from operators’ websites. The data is then converted to a more useful format (GeoJSON) and made available at alltheplaces.xyz under a Creative Commons’ CC-0 waiver license.

Upcoming Events

Where What When Country
Denver Online High School Mitchell PoliMappers’ Adventures: One mapping quest each day 2017-12-01-Invalid date everywhere
Montreal Les Mercredis cartographie 2018-01-03 canada
Albuquerque MAPABQ (join us!) 2018-01-03 united states
Stuttgart Stuttgarter Stammtisch 2018-01-03 germany
Passau Niederbayerntreffen 2018-01-08 germany
Lyon Rencontre libre mensuelle 2018-01-09 france
Nantes Réunion mensuelle 2018-01-09 france
Berlin 115. Berlin-Brandenburg Stammtisch 2018-01-11 germany
Tokyo 東京!街歩き!マッピングパーティ:第15回 品川神社 2018-01-13 japan
Kyoto 幕末京都マッピングパーティ#01:京の町家と大獄と 2018-01-13 japan
Brisbane Brisbane Photo Mapping 2018-01-13 australia
Rome FOSS4G-IT 2018 2018-02-19-2018-02-22 italy
Cologne Bonn Airport FOSSGIS 2018 2018-03-21-2018-03-24 germany
Poznań State of the Map Poland 2018 2018-04-13-2018-04-14 poland
Bordeaux State of the Map France 2018 2018-06-01-2018-06-03 france
Milan State of the Map 2018 (international conference) 2018-07-28-2018-07-30 italy

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it, where appropriate.

This weeklyOSM was produced by Nakaner, Polyglot, SK53, Spanholz, Tordanik, derFred, jinalfoflia, sev_osm.

by weeklyteam at December 30, 2017 11:36 AM

December 29, 2017

Paolo Massa (phauly@WP) Gnuband.org

Gallery of XKCD and other Python matplotlib styles

I’m reading the wonderful “Python Data Science Handbook” by Jake VanderPlas, a book written entirely as Jupyter notebooks! I got excited about matplotlib styles, but the XKCD “style” was missing, so I slightly modified the code for rendering the different styles to include it. Below is a small part of the gallery (XKCD style is the first line), which is generated by the Jupyter notebook available as a gist on GitHub and embedded below.


View the code on Gist.

I love XKCD graphs, for example the following one, and you can create them with Python!!!

XKCD graph
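For readers who just want to try it, here is a minimal sketch of the idea. It is not the code from the gist: the `demo_plot` helper and the choice of "ggplot" as the second style are assumptions made for this example. It relies on `plt.xkcd()` and `plt.style.context()`, both of which can be used as context managers.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np


def demo_plot(ax, title):
    """Draw the same simple curve so styles can be compared side by side."""
    x = np.linspace(0, 10, 100)
    ax.plot(x, np.sin(x))
    ax.set_title(title)


fig = plt.figure(figsize=(8, 3))

# XKCD is not one of the named styles, so it is switched on with plt.xkcd().
with plt.xkcd():
    demo_plot(fig.add_subplot(1, 2, 1), "xkcd")

# Named styles are applied with plt.style.context().
with plt.style.context("ggplot"):
    demo_plot(fig.add_subplot(1, 2, 2), "ggplot")

buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # or fig.savefig("styles.png")
```

The axes are created inside each `with` block because style settings are picked up when the axes and artists are created. If the hand-drawn fonts are not installed, matplotlib prints warnings and falls back to a default font.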

by paolo at December 29, 2017 02:05 PM

December 28, 2017

Noella - My Outreachy 17

Am enjoying the journey!

This time around there are all sorts of things I see, learn and notice from Outreachy.

To begin, the project is progressing and I have submitted a few patches for it😎. So far the most difficult part is not coding but understanding and knowing what I am required to do. It took me a good while to work out exactly what I had to do. Also, I can say my PHP skills are greatly improving as I keep working.
My first task was to put the MassMessage extension in a namespace, MediaWiki\MassMessage. I bet you, when I sat at my PC the day I started the task, I didn't know one bit what that meant😂😂. I had all sorts of ideas in my mind from the little research I had done. I was so confused that I had to contact my mentors. Funnily enough, it was still not really clear to me, but it became clearer as I worked, up to the first patch I submitted.

What actually are those things that shocked me in the journey?

  • The speed at which I learn. It is very high.
  • I heard of a new program, GCI (Google Code-in), currently going on, with Wikimedia selected, where I see little children contributing code to open source and making some funny noise on IRC.😂😂😂
  • The fact that Wikimedians code even on Christmas, which shows passion in what they do 😘😘. I completely forgot we all have different time zones.
I don't just learn how to code and complete a project in my Outreachy journey. I learn how to contribute to open source, I learn how to behave in a community, and I learn and see how developers from different backgrounds see things and interpret them. I am really enjoying it!!

P.S: Finally done with the first task. Time to take the next. I hope to have some new taste and flavour from it.😀

by Noella Teke (noreply@blogger.com) at December 28, 2017 07:44 PM

Wiki Education Foundation

Video helps medical students edit Wikipedia

Student editors in Wiki Education’s program who contribute to medical topics have a new resource available to them in their online training modules this fall: A how-to video specifically related to contributing medical content to Wikipedia.

The video was a joint project between Osmosis, University of California San Francisco Professor Amin Azzam, and Wiki Education. Osmosis is a learning platform used by many medical students, with an Open Osmosis initiative to create open educational resources in medical topics. In collaboration with Wiki Project Med Foundation, Osmosis has released hundreds of videos under a free license on Wikimedia Commons so reliable information on medical topics in video form can be embedded in Wikipedia articles on those topics. Dr. Azzam has been a longtime instructor in Wiki Education’s program, and passionately advocates for other medical school faculty to teach with Wikipedia through Wiki Education’s programs.

The five-minute video covers how to find medical topics to improve, how to cite according to Wikipedia’s guidelines for reliable sources for medical articles, how to use sandboxes, and how to avoid plagiarism. It was crafted by Osmosis’s Mihir Joshi with input from Osmosis’s Rishi Desai, Wikipedia’s James Heilman, UCSF’s Dr. Azzam and Evans Whitaker, and Wiki Education’s LiAnna Davis. You can watch the video below, or find it in our Editing Medical Topics online training.

by LiAnna Davis at December 28, 2017 05:59 PM

Gerard Meijssen

#Wikidata - Cyrus J. Colter and G. B. Lancaster - #diversity

Cyrus J. Colter was inducted into the Black Literary Hall Of Fame in 1999. He has a Wikipedia article in Japanese and his Wikidata item has been expanded with links to Open Library and VIAF.

According to a tweet, G. B. Lancaster was once one of New Zealand's most popular writers. When you google her, you find that G.B. Lancaster is a pseudonym for Edith Joan Lyttleton. For Mrs Lyttleton there is now a link to Open Library as well.

From a diversity point of view, both Mr Colter and Mrs Lyttleton represent minorities. Giving attention to either increases the diversity of Wikidata. Linking both authors to the Open Library has the most inclusive effect. There is now a bigger public for the books they have written.

by Gerard Meijssen (noreply@blogger.com) at December 28, 2017 10:19 AM

December 27, 2017

Wikimedia Foundation

Wikimedia Research Newsletter, September 2017

Medical articles on French Wikipedia have “high rate of veracity”

Reviewed by Nicolas Jullien

A doctoral thesis[1] at Aix-Marseille University examined the accuracy of medical articles on the French Wikipedia. From the English abstract: “we selected a sample of 5 items (stroke, colon cancer, diabetes mellitus, vaccination and interruption of pregnancy) which we compare, assertion by assertion, with reference sources to confirm or refute each assertion. Results: Of the 5 articles, we analyzed 868 assertions. Of this total, 82.49% were verified by the referentials, 15.55% not verifiable due to lack of information and 1.96% contradicted by the referentials. Of the contradicted results, 10 corresponded to obsolete notions and 7 to errors, but mainly dealing with epidemiological or statistical data, thus not leading to a major risk when used, not recommended, on health. Conclusion: … This study of five medical articles finds a high rate of veracity with less than 2% incorrect information and more than 82% of information confirmed by scientific references. These results strongly argue that Wikipedia could be a reliable source of medical information, provided that it does not remain the only source used by people for that purpose.”
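As a quick sanity check on the figures in the abstract, the percentages can be turned back into counts. Only the total (868) and the contradicted assertions (10 + 7 = 17) are stated directly; the counts 716 and 135 below are reconstructed from the percentages and are therefore an assumption.

```python
total = 868

# Only the 17 contradicted assertions (10 obsolete notions + 7 errors) are
# given as counts in the abstract; 716 and 135 are inferred from the
# reported percentages.
counts = {
    "verified": 716,
    "not verifiable": 135,
    "contradicted": 10 + 7,
}

assert sum(counts.values()) == total

for label, count in counts.items():
    print(f"{label}: {count} ({100 * count / total:.2f}%)")
# verified: 716 (82.49%)
# not verifiable: 135 (15.55%)
# contradicted: 17 (1.96%)
```

The three reconstructed counts add up to 868 and reproduce the reported percentages to two decimal places, so the abstract's numbers are internally consistent.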

This medical PhD thesis is a very well documented analysis of the questions raised by the publication of medical information on Wikipedia. Although the findings, summarized in the abstract, will not be new to those who know Wikipedia well, it presents a good review of the literature on the topic of medical accuracy, and also of the purpose of Wikipedia (not a professional encyclopedia, but a form of popular science, an introduction, and some links to go further). This document is in French.
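The percentages quoted in the abstract can be converted back into assertion counts. The following is a minimal sketch, assuming that each figure is a share of the 868 analyzed assertions rounded to two decimals; the function and variable names are illustrative and do not come from the thesis.

```python
# Figures quoted in the abstract; treating them as shares of the 868
# analyzed assertions is an assumption of this sketch.
TOTAL_ASSERTIONS = 868

shares_percent = {
    "verified": 82.49,
    "not_verifiable": 15.55,
    "contradicted": 1.96,
}


def counts_from_shares(total, shares):
    """Convert percentage shares into whole-number counts by rounding."""
    return {label: round(total * pct / 100) for label, pct in shares.items()}


counts = counts_from_shares(TOTAL_ASSERTIONS, shares_percent)

for label, count in counts.items():
    print(f"{label}: {count}")
```

Under that assumption, the contradicted share corresponds to 17 assertions, which matches the 10 obsolete notions plus 7 errors mentioned in the abstract, and the three counts add up to 868.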

Assessing article quality and popularity across 44 Wikipedia language versions

Reviewed by Nicolas Jullien

From the paper: Distribution of quality scores in 12 topic areas on English, German and French Wikipedia

Image by Włodzimierz Lewoniewski, Krzysztof Węcel and Witold Abramowicz, CC BY 4.0

Overlaps of the English, German and French Wikipedia’s coverage of universities. The authors provide an interactive online tool to generate such Venn diagrams for other topic areas and language combinations.
Image by Włodzimierz Lewoniewski, Krzysztof Węcel and Witold Abramowicz, CC BY 4.0

This is the topic of a paper in the journal Informatics[2]. From the English abstract: “Our research has showed that in language sensitive topics, the quality of information can be relatively better in the relevant language versions. However, in most cases, it is difficult for the Wikipedia readers to determine the language affiliation of the described subject. Additionally, each language edition of Wikipedia can have own rules in the manual assessing of the content’s quality. There are also differences in grading schemes between language versions: some use a 6–8 grade system to assess articles, and some are limited to 2–3. This makes automatic quality comparison of articles between various languages a challenging task, particularly if we take into account a large number of unassessed articles; some of the Wikipedia language editions have over 99% of articles without a quality grade. The paper presents the results of a relative quality and popularity assessment of over 28 million articles in 44 selected language versions. Comparative analysis of the quality and the popularity of articles in popular topics was also conducted. Additionally, the correlation between quality and popularity of Wikipedia articles of selected topics in various languages was investigated. The proposed method allows us to find articles with information of better quality that can be used to automatically enrich other language editions of Wikipedia.”

Regarding the quality metrics, I salute the coverage in terms of languages, which makes it possible to go beyond the “official” automated evaluation provided by the Wikimedia Foundation (ORES), which is only available on some big language projects. As the authors explain, this part is mostly based on previously published work, here considerably extended. It also proposes some solutions for quality comparisons between different languages, and takes into account the variations of perspective between different cultures.

It also opens a discussion about the popularity of articles, and how popularity can help decide which language version should serve as the master version when an article exists in several languages. Although this part is only at its beginning, the discussion sets out the next step for the authors' work.

From the paper: Distribution of various article metrics by quality class on English Wikipedia
Image by Włodzimierz Lewoniewski, Krzysztof Węcel and Witold Abramowicz, CC BY 4.0

“Wikipedia: An opportunity to rethink the links between sources’ credibility, trust, and authority”

Reviewed by FULBERT

This theoretical paper[3] explored ambiguous relationships between credibility, trust, and authority in library and information sciences and how they are related to perceived accuracy in information sources. Credibility is linked to trust, necessary when we seek to learn from or convey information between people. This is complicated when the authority of a source is considered, as personal or institutional levels of expertise increase the ability to speak with greater credibility.

The literature about how this works with knowledge and information on the Web is inconsistent, and as a result this work sought to develop a unified approach through a new model. As credibility, trust, and authority are distinct concepts that are frequently used together inconsistently, they were explored through how Wikipedia is used and perceived. While Wikipedia is considered highly accurate, trust in it is average while its credibility is at times suspect.

Sahut and Tricot developed the authority, trust and credibility (ATC) model, where “knowledge institutions confer authority to a source, this authority ensures trust, which ensures the credibility of the information.” As a result, “the credibility of the information builds trust, which builds the authority of the source.” This model can be useful when applied to the citation of sources in Wikipedia, as it helps explain how the practice of providing citations in Wikipedia increases credibility and thus encourages trust, “linking content to existing knowledge sources and institutions.”

The ATC model is a helpful framework for explaining how Wikipedia, with its enormous readership, continues to struggle to be perceived as an authority because of inconsistencies in its article citations and references. The model suggests that filling these gaps will increase the authority, and thus the reputation, of Wikipedia itself.

Figure 2 from the paper, on Wikipedia authority, trust and credibility. (“The educational institution can spread a bad reputation on Wikipedia, which decreases its authority, has a negative influence on its trust, which negatively influences the credibility of the information. Conversely, a positive experience of credibility of Wikipedia information increases readers’ trust.”)
by Gilles Sahut and André Tricot, public domain

Conferences and events

Academia and Wikipedia: Critical Perspectives in Education and Research

A call for papers has been published for a conference titled “Academia and Wikipedia: Critical Perspectives in Education and Research”, to be held on June 18, 2018, at Maynooth University in the Republic of Ireland. The organizers describe it as “a one-day conference that aims to investigate how researchers and educators use and interrogate Wikipedia. The conference is an opportunity to present research into and from Wikipedia; research about Wikipedia, or research that uses Wikipedia as a data object”.

Wiki Workshop 2018

The fifth edition of Wiki Workshop will take place in Lyon, France on April 24, 2018, as part of The Web Conference 2018. Wiki Workshop brings together researchers exploring all aspects of Wikimedia websites, such as Wikipedia, Wikidata, and Wikimedia Commons. The call for papers is now available. The submission deadline for papers to appear in the proceedings of the conference is January 28; for all other papers, it is March 11.

See the research events page on Meta-wiki for other upcoming conferences and events, including submission deadlines.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.

Compiled by Tilman Bayer

OpenSym 2017

  • “What do Wikidata and Wikipedia have in common?: An analysis of their use of external references”[4] From the abstract: “Our findings show that while only a small number of sources is directly reused across Wikidata and Wikipedia, references often point to the same domain. Furthermore, Wikidata appears to use less Anglo-American-centred sources.”
  • “A glimpse into Babel: An analysis of multilinguality in Wikidata”[5] From the abstract: “we explore the state of languages in Wikidata as of now, especially in regard to its ontology, and the relationship to Wikipedia. Furthermore, we set the multilinguality of Wikidata in the context of the real world by comparing it to the distribution of native speakers. We find an existing language maldistribution, which is less urgent in the ontology, and promising results for future improvements.”
  • “Before the sense of ‘we’: Identity work as a bridge from mass collaboration to group emergence”[6] From the paper: “… From these interviews, we identified that a Featured Article (FA) collaboration that had occurred in 2007 in the “Whooper Swan” Wikipedia article, was very important for the actions of later group work. The focus of this paper is around this foundational article.”

Illustration from “Interpolating quality dynamics in Wikipedia and demonstrating the Keilana effect”
Image by EpochFail (Aaron Halfaker), CC BY-SA 4.0

  • “Interpolating quality dynamics in Wikipedia and demonstrating the Keilana effect”[7] From the abstract: “I describe a method for measuring article quality in Wikipedia historically and at a finer granularity than was previously possible. I use this method to demonstrate an important coverage dynamic in Wikipedia (specifically, articles about women scientists) and offer this method, dataset, and open API to the research community studying Wikipedia quality dynamics.” (see also research project page on Meta-wiki)

See also our earlier coverage of another OpenSym 2017 paper: “Improved article quality predictions with deep learning”

OpenSym 2016

  • “Mining team characteristics to predict Wikipedia article quality”[8] From the abstract: “The experiment involved obtaining the Spanish Wikipedia database dump and applying different data mining techniques suitable for large data sets to label the whole set of articles according to their quality (comparing them with the Featured/Good Articles, or FA/GA). Then we created the attributes that describe the characteristics of the team who produced the articles and using decision tree methods, we obtained the most relevant characteristics of the teams that produced FA/GA. The team’s maximum efficiency and the total length of contribution are the most important predictors.”
  • “Predicting the quality of user contributions via LSTMs”[9] From the discussion section: “We have presented a machine-learning approach for predicting the quality of Wikipedia revisions that can leverage the complete contribution history of users when making predictions about the quality of their latest contribution. Rather than using ad-hoc summary features computed on the basis of user’s contribution history, our approach can take as input directly the information on all the edits performed by the user [e.g. features such as “Time interval to previous revision on page”, the number of characters added or removed, “Spread of change within the page”, “upper case/ lower case ratio”, and “day of week”]. Our approach leverages the power of LSTMs (long-short term memory neural nets) for processing the variable-length contribution history of users.”

Plot describing the change, from October 2014 to January 2016, in the absolute number of female biography articles (horizontal axis) and their share among all biographies (vertical axis), for various Wikipedia languages (appearing in similar form in the “Monitoring the Gender Gap …” paper)
Image by Maximilianklein, CC BY-SA 4.0

  • “Monitoring the gender gap with Wikidata human gender indicators”[10] From the abstract: “The gender gap in Wikipedia’s content, specifically in the representation of women in biographies, is well-known but has been difficult to measure. Furthermore the impacts of efforts to address this gender gap have received little attention. To investigate we use Wikidata, the database that feeds Wikipedia, and introduce the “Wikidata Human Gender Indicators” (WHGI), a free and open-source, longitudinal, biographical dataset monitoring gender disparities across time, space, culture, occupation and language. Through these lenses we show how the representation of women is changing along 11 dimensions. Validations of WHGI are presented against three exogenous datasets: the world’s historical population, “traditional” gender-disparity indices (GDI, GEI, GGGI and SIGI), and occupational gender according to the US Bureau of Labor Statistics.” (see also Wikimedia Foundation grant page)
  • “An empirical evaluation of property recommender systems for Wikidata and collaborative knowledge bases”[11] From the abstract: “Users who actively enter, review and revise data on Wikidata are assisted by a property suggesting system which provides users with properties that might also be applicable to a given item. … We compare the [recommendation] approach currently facilitated on Wikidata with two state-of-the-art recommendation approaches stemming from the field of RDF recommender systems and collaborative information systems. Further, we also evaluate hybrid recommender systems combining these approaches. Our evaluations show that the current recommendation algorithm works well in regards to recall and precision, reaching a recall of 79.71% and a precision of 27.97%.”
  • “Medical science in Wikipedia: The construction of scientific knowledge in open science projects”[12] From the abstract: “The goal of my research is to build a theoretical framework to explain the dynamic of knowledge building in crowd-sourcing based environments like Wikipedia and judge the trustworthiness of the medical articles based on the dynamic network data. By applying actor–network theory and social network analysis, the contribution of my research is theoretical and practical as to build a theory on the dynamics of knowledge building in Wikipedia across times and to offer insights for developing citizen science crowd-sourcing platforms by better understanding how editors interact to build health science content.”
  • “Comparing OSM area-boundary data to DBpedia”[13] From the abstract: “OpenStreetMap (OSM) is a well known and widely used data source for geographic data. This kind of data can also be found in Wikipedia in the form of geographic locations, such as cities or countries. Next to the geographic coordinates, also statistical data about the area of these elements can be present. … in this paper OSM data of different countries are used to calculate the area of valid boundary (multi-) polygons and are then compared to the respective DBpedia (a large-scale knowledge base extract from Wikipedia) entries.”

See also our earlier coverage of another OpenSym 2016 paper: “Making it easier to navigate within article networks via better wikilinks”

Diverse other papers, relating to structured data

Figure from “Scholia and scientometrics with Wikidata” (screenshot of https://tools.wmflabs.org/scholia/author/Q20980928 )
by Finn Årup Nielsen, CC0 1.0

  • “Scholia and scientometrics with Wikidata”[14] From the abstract: “Scholia is a tool to handle scientific bibliographic information in Wikidata. The Scholia Web service creates on-the-fly scholarly profiles for researchers, organizations, journals, publishers, individual scholarly works, and for research topics. To collect the data, it queries the SPARQL-based Wikidata Query Service.”
  • “Linking Wikidata to the rest of the Semantic Web”[15]
  • “Chaudron: Extending DBpedia with measurement”[16] From the abstract: “We propose an alternative extraction to the traditional mapping creation from Wikipedia dump, by also using the rendered HTML to avoid the template transclusion issue. This dataset extends DBpedia with more than 3.9 million triples and 949.000 measurements on every domain covered by DBpedia. […] An extensive evaluation against DBpedia and Wikidata shows that our approach largely surpasses its competitors for measurement extraction on Wikipedia Infoboxes. Chaudron exhibits a F1-score of .89 while DBpedia and Wikidata respectively reach 0.38 and 0.10 on this extraction task.”
  • “Assessing and Improving Domain Knowledge Representation in DBpedia”[17] From the abstract: “… we assess the quality of DBpedia for domain knowledge representation. Our results show that DBpedia has still much room for improvement in this regard, especially for the description of concepts and their linkage with the DBpedia ontology. Based on this analysis, we leverage open relation extraction and the information already available on DBpedia to partly correct the issue, by providing novel relations extracted from Wikipedia abstracts and discovering entity types using the dbo:type predicate …”
  • “A Case Study of Summarizing and Normalizing the Properties of DBpedia Building Instances”[18] From the abstract: “The DBpedia ontology [holds] information for thousands of important buildings and monuments, thus making DBpedia an international digital repository of the architectural heritage. This knowledge for these architectural structures, in order to be fully exploited for academic research and other purposes, must be homogenized, as its richest source – Wikipedia infobox template system – is a heterogeneous and non-standardized environment. The work presented below summarizes the most widely used properties for buildings, categorizes and highlights structural and semantic heterogeneities allowing DBpedia’s users a full exploitation of the available information.”
  • “Experience: Type alignment on DBpedia and Freebase”[19] From the abstract: “… instances of many different types (e.g. Person) can be found in published [linked open data] datasets. Type alignment is the problem of automatically matching types (in a possibly many-many fashion) between two such datasets. Type alignment is an important preprocessing step in instance matching. Instance matching concerns identifying pairs of instances referring to the same underlying entity. By performing type alignment a priori, only instances conforming to aligned types are processed together, leading to significant savings. This article describes a type alignment experience with two large-scale cross-domain RDF knowledge graphs, DBpedia and Freebase, that contain hundreds, or even thousands, of unique types. Specifically, we present a MapReduce-based type alignment algorithm …”
  • “High-Throughput and Language-Agnostic Entity Disambiguation and Linking on User Generated Data”[20] From the preprint (which contains no mention of Wikidata): “Our KB [ knowledge base] consists of about 1 million Freebase machine ids for entities. These were chosen from a subset of all Freebase entities that map to Wikipedia entities. We prefer to use Freebase rather than Wikipedia as our KB since in Freebase, the same id represents a unique entity across multiple languages… …we generated a ground truth data set for our EDL system, the Densely Annotated Wikipedia Text (DAWT), using densely Wikified or annotated Wikipedia articles. Wikification is entity linking with Wikipedia as the KB. We started with Wikipedia data dumps, which were further enriched by introducing more hyperlinks in the existing document structure. […] As a last step, the hyperlinks to Wikipedia articles in a specific language were replaced with links to their Freebase ids to adapt to our KB. … We also plan to migrate to Wikipedia as our KB.”
  • “Managing and Consuming Completeness Information for Wikidata Using COOL-WD”[21] From the abstract: “… we discuss how to manage and consume meta-information about completeness for Wikidata. […] We demonstrate the applicability of our approach via COOL-WD (http://cool-wd.inf.unibz.it/), a completeness tool for Wikidata, which at the moment collects around 10,000 real completeness statements.” (see also related paper)
  • “Querying Wikidata: Comparing SPARQL, Relational and Graph Databases”[22] From the abstract: “… we experimentally compare the efficiency of various database engines for the purposes of querying the Wikidata knowledge-base…”
  • “Reifying RDF: What Works Well With Wikidata?”[23] From the abstract: “… we compare various options for reifying RDF triples. We are motivated by the goal of representing Wikidata as RDF, which would allow legacy Semantic Web languages, techniques and tools – for example, SPARQL engines – to be used for Wikidata. However, Wikidata annotates statements with qualifiers and references, which require some notion of reification to model in RDF. We thus investigate four such options: …” (A SPARQL-based search engine for Wikidata has since become available.)
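The Scholia entry in the list above notes that the tool queries the SPARQL-based Wikidata Query Service. As a rough illustration of what such a query can look like, here is a minimal sketch that only builds a request URL and sends nothing over the network. It assumes the public endpoint at https://query.wikidata.org/sparql and the Wikidata property P50 (author); Q20980928 is the item from the screenshot URL above. The query text and the helper names are hypothetical and are not taken from Scholia's own code.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed public endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def works_by_author_query(author_qid, limit=10):
    """Build an illustrative SPARQL query listing works with the given author (P50)."""
    return (
        "SELECT ?work ?workLabel WHERE {\n"
        f"  ?work wdt:P50 wd:{author_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query):
    """URL-encode the query as a GET request asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = works_by_author_query("Q20980928")
url = build_request_url(query)

# Decode the URL again to confirm the query survives the encoding unchanged.
decoded_query = parse_qs(urlparse(url).query)["query"][0]
print(url)
```

Fetching the resulting URL with any HTTP client would return the matching works as JSON. The sketch stops at URL construction so that it can be run offline.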


  1. Antonini, Sébastien (2017-06-22). “Étude de la véracité des articles médicaux sur Wikipédia”. Aix Marseille Université. 
  2. Lewoniewski, Włodzimierz; Węcel, Krzysztof; Abramowicz, Witold (2017-06-22). “Relative Quality and Popularity Evaluation of Multilingual Wikipedia”. Informatics 2017, 4(4), 43.
  3. Sahut, Gilles; Tricot, André (2017-10-31). “Wikipedia: An opportunity to rethink the links between sources’ credibility, trust, and authority”. First Monday 22 (11). ISSN 1396-0466. doi:10.5210/fm.v22i11.7108. Retrieved 2017-12-17. 
  4. Piscopo, Alessandro; Vougiouklis, Pavlos; Kaffee, Lucie-Aimée; Phethean, Christopher; Hare, Jonathon; Simperl, Elena (2017). What do Wikidata and Wikipedia have in common?: An analysis of their use of external references (PDF). OpenSym ’17. New York, NY, USA: ACM. pp. 1:1–1:10. ISBN 9781450351874. doi:10.1145/3125433.3125445.
  5. Kaffee, Lucie-Aimée; Piscopo, Alessandro; Vougiouklis, Pavlos; Simperl, Elena; Carr, Leslie; Pintscher, Lydia (2017). A glimpse into Babel: An analysis of multilinguality in Wikidata (PDF). OpenSym ’17. New York, NY, USA: ACM. pp. 14:1–14:5. ISBN 9781450351874. doi:10.1145/3125433.3125465.
  6. Lanamäki, Arto; Lindman, Juho (2017). Before the sense of ‘we’: Identity work as a bridge from mass collaboration to group emergence (PDF). OpenSym ’17. New York, NY, USA: ACM. pp. 5:1–5:9. ISBN 9781450351874. doi:10.1145/3125433.3125451.
  7. Halfaker, Aaron (2017). Interpolating quality dynamics in Wikipedia and demonstrating the Keilana effect (PDF). OpenSym ’17. New York, NY, USA: ACM. pp. 19:1–19:9. ISBN 9781450351874. doi:10.1145/3125433.3125475.
  8. Betancourt, Grace Gimon; Segnine, Armando; Trabuco, Carlos; Rezgui, Amira; Jullien, Nicolas (2016). Mining team characteristics to predict Wikipedia article quality. OpenSym ’16. New York, NY, USA: ACM. pp. 15:1–15:9. ISBN 9781450344517. doi:10.1145/2957792.2971802.
  9. Agrawal, Rakshit; deAlfaro, Luca (2016). Predicting the quality of user contributions via LSTMs (PDF). OpenSym ’16. New York, NY, USA: ACM. pp. 19:1–19:10. ISBN 9781450344517. doi:10.1145/2957792.2957811.
  10. Klein, Maximilian; Konieczny, Piotr; Zhu, Haiyi; Rai, Vivek; Gupta, Harsh (2016). Monitoring the gender gap with Wikidata human gender indicators (PDF). OpenSym 2016. Berlin, Germany. p. 9. 
  11. Zangerle, Eva; Gassler, Wolfgang; Pichl, Martin; Steinhauser, Stefan; Specht, Günther (2016). An empirical evaluation of property recommender systems for Wikidata and collaborative knowledge bases (PDF). OpenSym ’16. New York, NY, USA: ACM. pp. 18:1–18:8. ISBN 9781450344517. doi:10.1145/2957792.2957804.
  12. Tamime, Reham Al; Hall, Wendy; Giordano, Richard (2016). Medical science in Wikipedia: The construction of scientific knowledge in open science projects (PDF). OpenSym ’16. New York, NY, USA: ACM. pp. 4:1–4:4. ISBN 9781450344814. doi:10.1145/2962132.2962141. (extended abstract)
  13. Silbernagl, Doris; Krismer, Nikolaus; Specht, Günther (2016). Comparing OSM area-boundary data to DBpedia (PDF). OpenSym ’16. New York, NY, USA: ACM. pp. 11:1–11:4. ISBN 9781450344517. doi:10.1145/2957792.2957806.
  14. Nielsen, Finn Årup; Mietchen, Daniel; Willighagen, Egon (2017-05-28). Scholia, Scientometrics and Wikidata. European Semantic Web Conference. Lecture Notes in Computer Science. Springer, Cham. pp. 237–259. ISBN 9783319704067. doi:10.1007/978-3-319-70407-4_36. 
  15. Andra Waagmeester, Egon Willighagen, Núria Queralt Rosinach, Elvira Mitraka, Sebastian Burgstaller-Muehlbacher, Tim E. Putman, Julia Turner, Lynn M Schriml, Paul Pavlidis, Andrew I Su, and Benjamin M Good: Linking Wikidata to the rest of the Semantic Web. Proceedings of the 9th International Conference Semantic Web Applications and Tools for Life Sciences. Amsterdam, The Netherlands, December 5-8, 2016. (conference poster)
  16. Subercaze, Julien (May 2017). Chaudron: Extending DBpedia with measurement. Portoroz, Slovenia: Eva Blomqvist, Diana Maynard, Aldo Gangemi. 
  17. Ludovic Font, Amal Zouaq, Michel Gagnon: Assessing and Improving Domain Knowledge Representation in DBpedia
  18. Agathos, Michail; Kalogeros, Eleftherios; Kapidakis, Sarantos (2016-09-05). “A Case Study of Summarizing and Normalizing the Properties of DBpedia Building Instances”. In Norbert Fuhr, László Kovács, Thomas Risse, Wolfgang Nejdl (eds.). Research and Advanced Technology for Digital Libraries. Lecture Notes in Computer Science. Springer International Publishing. pp. 398–404. ISBN 9783319439969. Retrieved 2016-08-27.  Closed access
  19. Kejriwal, Mayank; Miranker, Daniel P. (2016). “Experience: Type alignment on DBpedia and Freebase” (PDF). ACM: 10.
  20. Bhargava, Preeti; Spasojevic, Nemanja; Hu, Guoning (2017-03-13). “High-Throughput and Language-Agnostic Entity Disambiguation and Linking on User Generated Data”. arXiv:1703.04498 [cs]. 
  21. Prasojo, Radityo Eko; Darari, Fariz; Razniewski, Simon; Nutt, Werner. Managing and Consuming Completeness Information for Wikidata Using COOL-WD (PDF). KRDB, Free University of Bozen-Bolzano, 39100, Italy. 
  22. Hernández, Daniel; Hogan, Aidan; Riveros, Cristian; Rojas, Carlos; Zerega, Enzo (2016-10-17). Querying Wikidata: Comparing SPARQL, Relational and Graph Databases. International Semantic Web Conference. Lecture Notes in Computer Science. Springer, Cham. pp. 88–103. ISBN 9783319465463. doi:10.1007/978-3-319-46547-0_10.  Closed access author’s preprint
  23. Hernández, Daniel; Hogan, Aidan; Krötzsch, Markus (2015). “Reifying RDF: What Works Well With Wikidata?”. Proceedings of the 11th International Workshop on Scalable Semantic Web Knowledge Base Systems. CEUR Workshop Proceedings 1457. pp. 32–47.

Wikimedia Research Newsletter
Vol: 7 • Issue: 9 • September 2017
This newsletter is brought to you by the Wikimedia Research Committee and The Signpost

by Tilman Bayer at December 27, 2017 04:12 PM

Gerard Meijssen

#Wikidata - #Reebok Human Rights award

Once upon a time, there was a company called Reebok that presented an award to human rights activists under the age of 30. Every year four or five people received $50,000. Every year attention was given to human rights. That matters, because an award like this gives additional relevance and resonance to an extremely good cause. It may even provide some extra protection by making people more visible.

The award is no longer presented. Some people who were recognised refused the award because, in their opinion, Reebok itself should take care of its own human rights record. Some people took action and they were successful; the last award ceremony was in 2007. That is all; a lot less attention for human rights, defeat in victory.

The best information about the Reebok Human Rights Award is at the Internet Archive's Wayback Machine. There is nothing wrong with the stated credentials of the people who were awarded. When you compare this with the people linked at Wikipedia, you will miss the Chilean soccer player and the Nigerian business magnate, and find what are Wikipedia red links.

by Gerard Meijssen (noreply@blogger.com) at December 27, 2017 01:07 PM

Noella - My Outreachy 17

Coding has begun :))

Hmmm!! I have long waited for this time, when I will get my hands dirty 😎😎

Coding has finally started, and the first thing is to set up a MediaWiki instance on Cloud VPS, which I shall use with my mentors to test changes before deployment. Believe me, it was a sweet but difficult experience. Right at the beginning I had multiple challenges😓😓 but luckily, I have very responsive mentors (@Legoktm, @D3r1ck) who are always available to help. This is a narration of what happened.

I had never worked with cloud services before, so I was relying greatly on documentation and following the steps. I created a new instance, massmessage-test-wiki, in the massmessage project on Wikimedia Cloud VPS. That was pretty simple. The next thing was to connect to the instance and set up MediaWiki on it. I struggled for hours without success and did research with no good result. In fact, I could not even connect to the instance😢😢. After a long struggle I thought the problem might be my operating system, as I had no other idea. I formatted my PC and installed Ubuntu 16.04 hoping all would be well, configured ssh on the machine and tried to log in again, but got the same error. I almost got mad at that point😠😠. I contacted my mentors and explained to them what had been going on. After a chat with both mentors we figured out what the problem was, and within a minute I was able to connect to the instance. MediaWiki is now up and running on the massmessage-test-wiki instance😎😎.

Actually, the solution was out of my reach. I needed shell privileges, which I could not grant myself. CHAI!!!! I wish I had contacted my mentors before formatting😥😥. It would have saved me the stress of backing up and setting up my whole PC.

At least it was a great challenge and I learnt a lot. Looking forward to greater challenges (hopefully without formatting) and to greater things to learn😋😋😋.

by Noella Teke (noreply@blogger.com) at December 27, 2017 06:55 AM

December 26, 2017

Wiki Education Foundation

Roundup: Eastern European Literature

If you had searched for the Hungarian novel The Door on Wikipedia before this year, you would have been disappointed with the lack of information. If you look for it now, you’ll find the work of a student in Sibelan Forrester’s Spring 2017 course at Swarthmore College, Eastern European Prose. The student expanded the article to include the story’s plot, main characters, and critical reception. They also touched upon the autobiographical elements found in this story by Magda Szabó. The novel follows Magda, a writer, and her unique relationship with her housecleaner, an older woman named Emerence. Emerence is a mysterious and strong-willed character. She sets her own wages, hours, and chores and holds many secrets behind the closed doors of her house. Magda is fascinated by the woman and develops a close relationship with her over many years. The novel opens with Magda’s recurring nightmare, from which she awakes to say “I killed Emerence.” The rest of the novel serves as the narrator’s explanation of this sentence.

Another student expanded the article on Garden, Ashes, a Yugoslavian novel written in 1965 by Danilo Kiš. The novel, which follows a young boy whose father is sent to Auschwitz, is based on Kiš’s own childhood. Andi Scham, the young boy protagonist, insists that his father is not gone, but has only disappeared. Andi often imagines that his father is following him in disguise and invents stories about him. The boy’s imagination keeps the image of his father alive, but gives him both nightmares and haunting illusions. The student who expanded this article added more information about the autobiographical elements of the story, common themes and symbols, the origins of its title, and critical analysis.

When students contribute to Wikipedia through a Wikipedia assignment, they become creators of knowledge, not simply consumers of it. Their work has a worldwide audience and a measurable impact. In total, students in Professor Forrester’s spring course added 17,000 words to Wikipedia and their work has been viewed 30,900 times.

Students also gain critical skills that they take with them into their academic and professional lives through a Wikipedia assignment. If you’re interested in teaching with Wikipedia or in learning more, visit teach.wikiedu.org or reach out to contact@wikiedu.org.

Image: File:Old Brass Door Knocker.jpg, Scrypted, public domain, via Wikimedia Commons.

by Cassidy Villeneuve at December 26, 2017 05:20 PM

December 25, 2017

Tech News

Tech News issue #52, 2017 (December 25, 2017)


December 25, 2017 12:00 AM

December 24, 2017

Gerard Meijssen

#Wikimedia - #diversity and #inclusion requires #trust

Each Wikimedia project has its own culture, and there is an overall stated ambition that diversity, and the inclusion needed to make it work, are alive and well.

We have our diversity conferences, and the best result is how the "gender gap" is approached. It gets a lot of attention and the positive effects are noticeable. There is, however, more to diversity, and some of the beliefs we hold so dear prevent the inclusion of those who are on the outside looking in.

One of the Wikimedia traditions is that we do not trust; trust is in the citations, the sources. We do not trust each other, why should we? When, for diversity's sake, people who receive the "Harriët Freezerring" are added, it is accepted because there is a Wikipedia article that mentions them. But when people are added because they are "artists from the African diaspora", there is a problem. There are no articles yet, and the point of adding the artists first is that Wikidata enables managing projects in multiple languages.

There are many people who are targeted for attention in Wikipedia editathons. There have been editathons in the past, so there is an established track record for the Black Lunch Table. That did not bring trust: the trust needed to accept that the BLT will manage the people on the list, and the trust that bare-bones items will eventually get sufficient statements.

The problem with trust is that when it is not given, it cannot be assumed for other, similar situations either. Take the trust that retractions of scientific papers will be included, so that we know which Wikipedia articles are inherently wrong. Retractions are absent at this time, and while I trust the people involved in the inclusion of citations, why trust at all when equally worthy causes are not trusted? Why include all these scientific papers without similar quality control?

by Gerard Meijssen (noreply@blogger.com) at December 24, 2017 03:45 PM