This month a Recurser I know, Pepijn de Vos, observed a concentration of high-quality open source software in the developer tools category, to the exclusion of other categories. With a few exceptions.

I understand where he's coming from, though my assessment differs. I started reflecting on those exceptions. Do they "prove the rule" in the colloquial sense that "every rule has exceptions," or do they "prove the rule" in the older sense, in that they give us an opportunity to test the rule? A few years ago I learned about a technique called "appreciative inquiry" which says: look at the unusual examples of things that are working well, and try to figure out how they've gotten where they are, so we can try to replicate it. So I think it's worth thinking a bit more about those exceptional FLOSS projects that aren't developer tools and that are pretty high-quality, in user experience design and robust functionality. And it's worth discussing problems and approaches in product management and user experience design in open source, and pointing to people already working on it.

FLOSS with good design and robust functionality: My list would include Firefox, Chromium, NetHack, Android, Audacity, Inkscape, VLC, the Archive Of Our Own, Written? Kitten!, Signal, Zulip, Thunderbird, and many of the built-in applications on the Linux desktop. I don't have much experience with Blender or Krita, but I believe they belong here too. (Another category worth thinking about: FLOSS software that has no commercial competitor, or whose commercial competitors are much worse, because for-profit companies would be far warier of liability or other legal issues surrounding the project. Examples: youtube-dl, Firefox Send, VLC again, and probably some security/privacy stuff I don't know much about.)

And as I start thinking about what helped these projects get where they are, I reach for the archetypes at play. I'll ask James and Karl to check my homework, but as I understand it:

Mass Market: NetHack, VLC, Firefox, Audacity, Inkscape, Thunderbird, youtube-dl
Controlled Ecosystem: Zulip, Archive Of Our Own
Business-to-business open source: Android, Chromium
Rocket Ship To Mars: Signal
Bathwater? Wide Open? Trusted Vendor? not sure: Written? Kitten!

The only "Wide Open" example that easily comes to mind for me is robotfindskitten, a game which -- like Written? Kitten! -- does one reasonably simple thing and does it well. Leonard reflected on reasons for its success at Roguelike Celebration 2017 (video). But I'd be open to correction, especially by people who are familiar with NetHack, VLC, Audacity, Inkscape, or youtube-dl development processes.

Design: Part of de Vos's point is about cost and quality in general. But I believe part of what he's getting at is design. Which FLOSS outside of developer tooling has good design?

In my own history as an open source contributor and leader, I've worked some on developer tools like PyPI and a linter for OpenNews, but quite a lot more on tools for other audiences, like MediaWiki, HTTPS Everywhere, Mailman, Zulip, bits of GNOME, AltLaw, and the WisCon app. The first open source project I ever contributed to, twelve years ago, was Miro, a video player and podcatcher. And these projects had all sorts of governance/funding structures: completely volunteer-run with and without any formal home, nonprofit with and without grants, academic, for-profit within consultancies and product companies.

So I know some of the dynamics that affect user experience in FLOSS for general audiences (often negatively), and discussed some of them in my code4lib keynote "User Experience is a Social Justice Issue" a few years ago. I'm certainly not alone; Simply Secure, Open Source Design, Cris Beasley, The Land, Clar, and Risker are just a few of the thinkers and practitioners who have shared useful thoughts on these problems.

In 2014, I wrote a few things about this issue, mostly in public, like the code4lib keynote and this April Fool's joke:

It turns out you can go into your init.cfg file and change the usability flag from 0 to 1, and that improves user experience tremendously. I wonder why distributions ship it turned off by default?
Wikimedia and pushback: But I also wrote a private email that year that I'll reproduce below. I wrote it about design change friction in Wikimedia communities, so it shorthands some references to, for instance, a proposed opt-in Wikimedia feature to help users hide some controversial images. But I hope it still provides some use even if you don't know that history.
I wanted to quickly summarize some thoughts and expand on the conversation you and I had several days ago, on reasons Wikimedia community members have a tough time with even opt-in or opt-out design changes like the image filter or VisualEditor or Media Viewer.
  • ideology of a free market of ideas -- the cure for bad speech is more speech, if you can't take the heat then you should not be here, aversion to American prudishness etc., etc. (more relevant for image filter)
    • relatedly "if you can't deal with the way things are then you are too stupid to be here" (more applicable to design simplifications like Media Viewer and VisualEditor)
  • people are bad at seeing that the situation that has incrementally changed around them is now a bad one (frog in pot of boiling water); see checkbox proliferation and baroque wikitext/template metastasis
  • most non-designers are bad at design thinking (at assessing a design, imagining it as a changeable prototype, thinking beyond their initial personal and aesthetic reaction, sussing out workflows and needs and assessing whether a proposed design would suit them, thinking from other people's points of view, thinking from the POV of a newcomer, etc.)
    • relatedly, we do not share a design vocabulary of concepts, nor principles that we aim to uphold or judge our work against (in contrast see our vocabulary of concepts and principles for Wikipedia content, e.g. NPOV, deletionism/inclusionism)
      • so people can only speak from their own personal aesthetics and initial reactions, which are often negative because in general people are averse to surprise novelty in environments they consider home, and the discourse can't rise beyond "I don't like it, therefore it sucks"
  • past history of difficult conversations, sometimes badly managed (e.g. image filter) and too-early rollout of buggy feature as a default (e.g. VisualEditor), causes once-burned-twice-shy wariness about new WMF features
    • Wikimedians' core ethos: "It's a wiki" (if you see a problem, e.g. an error in a Wikipedia article, try to fix it); everyone is responsible for maintaining and improving the project, preventing harm
      • ergo people who feel responsible for the quality of the project are like William F. Buckley's "National Review" in terms of their conservatism, standing athwart history yelling "stop"

I haven't answered some questions: what are the common patterns in our success stories (governance, funding, community size, maintainership history, etc.)? How do we address or prevent problems like the ones I mentioned seeing within Wikimedia? But it's great to see progress on those questions from organizations like Wikimedia and Simply Secure and Open Tech Strategies (disclosure: I often do work with the latter), and I do see hope for plausible ways forward.

What might the future of the Wikimedia movement look like? At the Wikimedia Summit, held in Berlin, Germany, from 29–31 March, around 210 participants from across the globe gathered to find answers to this question.

Over three energetic days, representatives from Wikimedia affiliates, the Wikimedia Foundation, and three Wikimedia committees came together with members of nine movement strategy working groups to discuss how to build the future of the Wikimedia movement and ensure access to more knowledge for more people. Via interactive sessions, open space forums, and conversation circles, participants analyzed research the working groups had produced and key questions they had formulated. Each program element was designed to help find answers to these questions, engage the affiliates in this work, and provide a platform for creating our future together.

“We’re here to work collaboratively on our future”

With a fresh name for 2019, the Wikimedia Summit—formerly the “Wikimedia Conference”—is the annual meeting of Wikimedia affiliates, the Wikimedia Foundation, and three Wikimedia committees, and is hosted and organized by Wikimedia Germany (Deutschland; WMDE). The change of name went hand in hand with a refining of the event’s objectives: to discuss strategy and governance of the Wikimedia movement. This year’s event put the spotlight on the movement strategy, which brings together members from all parts of our community to determine where we as a movement want to go and how we get there.

The Wikimedia Summit got underway with opening speeches that brought the bigger picture into view right from the outset. Michelle Müntefering, the German Minister of State at the Federal Foreign Office, acknowledged the work of Wikimedians in proliferating knowledge and spoke of the importance of reflecting different, diverse perspectives in this “new, democratic global library.” Katherine Maher, the Wikimedia Foundation’s Executive Director, got straight to the point, stating “we’re here to work collaboratively on our future” before Kaarel Vaidla, the movement strategy process architect, and Nicole Ebber, the movement strategy program manager, took the floor to outline how.

Finding the best way to move forward together

As a key gathering for Wikimedia affiliates, the Wikimedia Summit provided working groups a dedicated space to engage with some of their main stakeholders. Its strategy-focused objective saw discussions center on how to advance in our strategic direction and become the essential support system for the whole free knowledge movement. On a concrete level, this was the first opportunity for the working groups to speak directly to affiliates and start developing answers together to sets of guiding questions that the groups have formulated, each relating to a specific thematic area within the movement. Among the key insights that came out of the Summit are that there is a need to turn existing frustration into positive energy; ensure that people are at the core of what we do; and create partnerships and share resources effectively and meaningfully. Some working groups divided into sub-groups to enable them to better interact with the affiliates, and with other working groups.

Day one allowed working groups time to work together while the affiliate representatives were guided through how to use these questions to kickstart conversations about our future as well as ask their questions about the role they can play. The day was capped off by thematic meetups and dinner at WMDE’s offices.

Day two’s program focused on developing content and enriching it with the perspectives and input from Wikimedia affiliates. In a relaxed, open format, affiliates were able to visit each group at designated areas in the venue, discuss the scoping documents and questions, and enhance conversations by providing useful local or thematic insight into the group’s work. The Wikimedia Board of Trustees rounded off the day’s program with their reflections on the path toward our future before the evening festivities got underway at nearby hotspot Villa Neukölln.

After two days of discussions, idea generation, and exchange, the third and final day of the event focused on how to build on these conversations and develop next steps. Working groups took the input they received on day two and used this to start crafting a plan for action for developing recommendations for structural change within the Wikimedia movement. The event concluded with speeches and insights from Emna Mizouni (Wikimedian from Tunisia), Sunil Abraham (The Centre for Internet and Society), and Ryan Merkley (Creative Commons), who acknowledged the hard work it has taken to get to this point and offered words of encouragement to take us forward. Emna called upon participants to take the insights from the Wikimedia Summit and apply them within their context: “We’re the privileged people [who get] to go back to our countries and local communities to deliver what we have been through during these three days.” Ryan spoke to the ambitious nature of collaboratively designing our movement’s future, saying “It’s a rare opportunity that a group this large gets to do strategy… The reason we do it is because it’s better. It takes longer, it’s frustrating, but it’s also more inclusive and we get better ideas.” You can add your voice and your ideas for change. Conversations are currently happening online where you can share your insights and solutions that will make the Wikimedia movement future-ready. Join in on the Meta-Wiki page, on several language wikis, and also via an upcoming survey.

More interaction, more strategy—and more sweets!

The program was designed to be adaptive, and this flexibility enabled all nine groups to connect with the Wikimedia affiliates in numerous ways. It also allowed facilitators to tailor their approach as engagement levels peaked and dropped and frustrations were surfaced. With each shift, facilitators adjusted the design to make sure the program responded to participant needs in real time. Crucially, affiliates had the opportunity to speak to working groups in person about the strategy for our future and develop a picture of how the organized part can contribute. Many commented that this is a necessary process and that being at the Summit had strengthened their own ideas about the future of the movement and motivated people to take these back to their own community.

As the central event to discuss strategy, the Wikimedia Summit provided the perfect forum to collaborate on our future, and the refined focus led to targeted, effective conversations. Fostering interaction lay at the heart of the Summit (aside from the opening and closing, there were no lectures, a first for this event), and this helped create meaningful exchanges between participants. Feedback from the attendees was positive, with people emphasizing the need for the movement to meet and network in person. Beyond the program and format, elements such as the friendly space policy were particularly well received, as was the international sweets table (which is exactly as good as it sounds). Thank you to all who attended! We’re already looking forward to the 2020 Wikimedia Summit.

Anna Rees, Project Assistant, International Relations
Wikimedia Germany (Deutschland)

You can find more info about the Wikimedia Summit on Meta and also read more about this year’s event in the Summit report.

Wikimedia UK welcomes three new trustees

10:42, Thursday, 18 April 2019 UTC

We are very pleased to announce the recent appointment of three new trustees to Wikimedia UK’s board. Sangeet Bhullar, Jane Carlin and Marnie Woodward bring a range of skills and experience to the board including financial management, strategic partnerships and digital literacy.

Sangeet Bhullar joined the Wikimedia UK board in January 2019. She is the Founder and Director of WISE KIDS, which focuses on New Media Literacy Education, Digital Citizenship, Online Safety and Digital Well-being. In the last 17 years, Sangeet has worked with thousands of young people, parents and professionals in the UK, Singapore and Malaysia, addressing these themes. She is passionate about amplifying young people’s voices and about rights and agency in addressing risk, harm, opportunity and well-being online. Sangeet is based in Wales and is a member of a number of Welsh government and non-government committees. She will sit on Wikimedia UK’s Partnerships Advisory Board, helping to shape the chapter’s education programme.

Jane Carlin has been a trustee of Wikimedia UK since September 2018. She has served in a wide range of senior finance roles within the publishing and wider media sector, including educational publishing, and brings strong finance and compliance skills to the board. Jane is a chartered management accountant and is Chair of Wikimedia UK’s Audit and Risk Committee.

Marnie Woodward also joined the board in September 2018. She is a chartered management accountant, and has been involved in the charity sector as a finance director for several decades, having worked with the Dulwich Picture Gallery, the Mental Health Foundation, the Musicians Benevolent Fund and RPS Rainer among others. Previously she was a trustee and the chair of the Finance and Administration Committee of the Church Urban Fund. She brings knowledge and experience of financial and organisational issues across a range of charitable enterprises. Marnie sits on the Audit and Risk Committee and is Treasurer for Wikimedia UK.

We are delighted to have attracted trustees of this calibre to the Wikimedia UK board, with such an impressive range of skills, knowledge and experience, and are looking forward to the new insights they can offer the charity over the next few years. There will be a number of additional board vacancies at Wikimedia UK this year, and we encourage members of our community to consider whether they could play a role within the governance of the organisation.

Screenshot of the AWB software – image by Magioladitis Reedy Rjwilmsi, GNU General Public License

Wikimedia UK has started running events to encourage long-time Wikipedia editors and those interested in becoming technically proficient at more complex tasks to gain skills that will allow them to improve Wikimedia projects.

In November we ran our first event on how to write a Featured Article. On May 7th, we will be running our second SkillShare event, focusing on how to use AutoWikiBrowser. AWB is “a browser that follows a user-generated list of pages to modify, presenting changes to implement within each of those pages, then progressing to the next page in the list once the changes are confirmed or skipped by the user.” It is intended to help editors make tedious and repetitive edits quickly and easily.

To use the software, you have to apply for permission on wiki, at this page. You need at least 250 non-automated edits in mainspace to get permission. If you want to come to our SkillShare event, I definitely recommend that you do this in advance. You can then download the software here. Then follow the instructions on getting started.

There are lots of other sources available online for understanding how to use AWB, like the video below.

So what can you do? You can auto tag templates, fix common typos, find and replace particular words, or import custom fixes. Once you’ve specified what you want to change, AWB will browse a set of selected pages, or a set of randomly generated ones, and then suggest changes based on your parameters. You can then review the suggested changes and decide to implement them or not.
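If it helps to picture that suggest-then-review loop, here is a minimal Python sketch of the general idea. It is not how AWB itself is implemented: the function names `propose_edit` and `review_pages`, the rule format (a regular expression paired with its replacement), and the sample pages are all invented for illustration.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical rule format: (regular expression, replacement text).
Rule = Tuple[str, str]


def propose_edit(text: str, rules: List[Rule]) -> str:
    """Apply every find-and-replace rule and return the suggested text."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def review_pages(
    pages: Dict[str, str],
    page_list: List[str],
    rules: List[Rule],
    confirm: Callable[[str, str, str], bool],
) -> Dict[str, str]:
    """Walk the page list in order; keep a suggestion only if it is confirmed."""
    result = dict(pages)  # copy, so the caller's pages are left untouched
    for title in page_list:
        old = result[title]
        new = propose_edit(old, rules)
        # Pages with no suggested change are passed over without asking.
        if new != old and confirm(title, old, new):
            result[title] = new
    return result


if __name__ == "__main__":
    # Made-up sample data standing in for real wiki pages.
    pages = {
        "Castle": "The castle was teh largest in teh region.",
        "Abbey": "The abbey recieved a charter.",
    }
    rules = [(r"\bteh\b", "the"), (r"\brecieved\b", "received")]
    # This stand-in reviewer accepts every suggestion except the one for "Abbey".
    updated = review_pages(
        pages, ["Castle", "Abbey"], rules,
        confirm=lambda title, old, new: title != "Abbey",
    )
    print(updated)
```

The point of passing in `confirm` as a function is that the human stays in charge: nothing is saved unless the reviewer says yes, which mirrors the confirm-or-skip step described above.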

You might want to check out the User Manual for AWB to get a better understanding about how to use it, but if you are a Wikipedia editor with a reasonable amount of experience who wants to better understand tools like AWB, you should consider coming to our next SkillShare, on Tuesday May 7. It takes place at the Wikimedia UK office in London, near London Bridge, Southwark and Blackfriars stations. Please sign up on the Eventbrite page to let us know you’re coming!

Five years ago, Manavpreet Kaur administered several tests for entry-level forensics diploma students at Punjabi University, located in Patiala, Punjab, India.

From her vantage point, Kaur—who had recently completed a Ph.D. in forensic science—quickly realized that not all of the students were comfortable with the course being taught exclusively in English. She searched for Punjabi-language documentation to help them but was frustrated to find that very little was available.

That’s when Satdeep Gill, a fellow student in another class, introduced her to the Punjabi-language Wikipedia.

Although the encyclopedia contained only a few thousand articles, Kaur recognized that the site had much potential for making knowledge about forensic science available, being much faster and cheaper to distribute than a traditional book.

Over the next year, Kaur wrote more than one hundred articles about the subject, including a span where she created one per day for a hundred days—known as completing “100wikidays.” (Her favorite is one you might expect: the entry about forensic science.) She also did a second 100wikidays to create biographies of notable women, thereby writing them into Punjabi’s popular history.

In 2016, however, Kaur decided that she wanted her work to have a greater impact, one that reached beyond her field. “By working on one specific subject, I was catering to the needs of very few people,” she tells me. “Some individuals are better editors and some are better facilitators. I am the latter. … I realized that I could contribute more effectively and productively in planning and organizing initiatives like education programs, trainings, and Wikipedia awareness campaigns.”

Kaur started with her own classroom of master’s students, a role she had taken on after beginning post-doctoral studies in facial expression analysis. Five of them started editing Wikipedia with help from Kaur, and two of them later completed their own 100wikidays. (In fact, one did two 100wikidays, and another is working towards their second as of this moment.)

This quickly snowballed, to put it mildly. Since then, Kaur has helped organize nine different Wikimedia-focused events and contributed towards another seven. Given the Wikimedia movement’s persistent gender gap and the sheer number of languages in India—the country has 23 official languages, and hundreds more are spoken—Kaur has kept a special eye on putting women into leadership roles and ensuring that Indic events cut across language barriers.

Some of these events have emphasized the needs of Punjabi Wikipedia readers, which differ from those reading other, larger, Wikipedias. “We have found that the people who read the Punjabi Wikipedia are either interested in the literature [in the language] or seeking information … because of language constraints,” she tells me.

That knowledge factored into the creation of Wiki Women for Women Wellbeing, or WWWW2018, which brought together women leaders from across India to create and promote Wikipedia content about women’s health. This was a big project—the scale “forced us to build a support network,” she says—but what they took away from it was that it needed to be even bigger.

And that’s why March of this year was a very, very busy month for Kaur. First, she facilitated a joint UNESCO-Wikipedia gender bridging workshop where over the course of five hours, Indian women wrote historical women into Wikipedia. 41 articles were created and another 62 were edited, including a number of scientists, media personalities, and artists.

Second, she was the primary organizer of a women-focused “Train the Trainer” event held in New Delhi last month, where over a dozen women learned about communication tactics, partnerships, grant reporting, and more.

“So many amazing and enlightened people are working together selflessly to build Wikipedia,” she says. “It is an amazing place to be. I’ve met so many people from incredibly diverse backgrounds. The acknowledgments, trust, and faith given to and invested in me have helped shape who I am today and are the major driving force that keeps me going.”

Interview by Ed Erhart, Senior Editorial Associate, Communications
Wikimedia Foundation

To learn more about Manavpreet, visit her Wikimedia user page.

What happened on Wikipedia when Notre-Dame burned?

11:15, Wednesday, 17 April 2019 UTC
Notre Dame de Paris on fire on April 15, 2019 – image by Milliped CC BY-SA 4.0

By Richard Nevell, Wikimedia UK Project Coordinator

Shortly before 7pm on April 15 in Paris a devastating fire broke out in one of France’s most iconic buildings. The fire was extinguished after more than 12 hours, and while the stone walls still stood and movable artwork had been removed, the spire and roof had collapsed, causing extensive damage.

As a medievalist and someone who studies destruction I watched with grim interest, as did much of the world. The cathedral is a masterpiece of medieval art and architecture and welcomed 13 million visitors in 2018.

Within minutes of the first news reports, Wikipedia was being updated. At 7:19pm a short note was added to the article on Notre-Dame de Paris to say that it was on fire, and thirteen minutes later a separate page had been created to document the incident. Soon that page was available in more than 40 languages. At 8:21pm, Wikipedia’s Twitter account asked people to help document the event by taking photographs as it happened.

Lots of social media users commented that Wikipedia was quick to note that the building was on fire.

On 15 and 16 April, 1.2 million people read the page about the fire, and 16 million read the pages about the cathedral across all language versions. They came to Wikipedia to find out what was going on, and what had been lost.

Elite medieval architecture was designed to convey power and inspire awe; in cathedrals it was also intended to demonstrate piety and create something eternal. Cathedrals contain effigies, burials, and monuments to people who have long since passed but who wanted their memory to live on. Over Notre-Dame’s 850-year history it survived wars and revolution and became a symbol of French identity.

The fire has also caused some reflection about the loss of heritage sites elsewhere. For some people, it has brought back memories of fires at York Minster in 1984 and Windsor Castle in 1992. Further afield, the number of people reading the English Wikipedia’s list of destroyed heritage increased 27,000%, and there were more changes to that page in one day than in the past six months put together. This wasn’t just the work of one person either; more than 30 people (most of whom hadn’t edited the article before) wanted to help. The page was of course updated to include Notre-Dame, but it was far more wide ranging. Historic events were documented from a dozen countries including Argentina, Ireland, and Turkey. If there was any doubt that this interest in the loss of cultural heritage was triggered by the events unfolding at Notre-Dame de Paris, the common theme of destruction by fire in the updated coverage on the page dispels it. The entries are a mix of accidental and deliberate loss, and each tells a story which impacts a community. With the events in Paris, that community is international in scale.

The buildings around us are impermanent, and even thick stone walls which have stood for centuries can come under threat without prior notice. In these situations, Wikipedia acts as a form of public documentation. While the news records events as they unfold and the human impact, Wikipedia is there to provide the long view, with a detailed history of the cathedral and more than 7,000 media files with which to explore its fabric.

Wikipedia itself is impermanent, though its contributors try to create something which will endure the test of time. In other parts of the world projects such as New Palmyra aim to catalogue endangered heritage, creating digital models. There is no way to recover what has been lost, and physical recreations must be handled sensitively, but it does at least help document what has been lost.

If you have images of Notre-Dame, or any other heritage sites, please do consider uploading them to Wikimedia Commons. From there they can potentially reach an audience of millions and the whole world can benefit.

When members of the public read of a new astronomical discovery, learn about a unique endangered animal, fact-check claims about climate change, or educate themselves about an illness, Wikipedia is often their first stop. The encyclopedia that anyone can edit provides a wealth of information on a wide range of topic areas, but also presents many opportunities for improvement. Wikipedia articles are best in subject areas that align with the interests of its volunteer user base. When a lot of people are not just interested in something, but have relevant knowledge and access to sources, the result is a high-quality article. Some scientific and technical topics, however, have few people editing them and can be difficult for someone without specialized training. Among Featured Articles, the highest quality assessment a Wikipedia article can receive, there are clear clusters of coordinated interest around topics related to history and popular culture. There are far fewer Featured Articles on medicine or astronomy than there are on military history or sports, for example.

For years, when Wiki Education staff have attended science-related conferences, we’ve heard from scientists that they’ve noticed errors, omissions, misconceptions, and overly technical writing on this or that article in their field, but didn’t know how to fix it themselves. It’s easy to make an edit on Wikipedia, but it can be challenging to learn how to contribute content meaningfully. That’s why we run virtual professional development courses to train subject matter experts to be “Wiki Scientists”. In this 12-week course, Wiki Education staff help scientists learn how Wikipedia works and how to improve public knowledge in their fields.

I’m excited to announce the participants of our newest Communicating Science course, with 8 scientists coming from a wide range of backgrounds and disciplines.

  • Alexandra M. Courtis is a PhD candidate at the University of California, Berkeley, working in materials science, physical chemistry, and nanoscience. Through this course, she will focus on improving Wikipedia’s coverage of women in STEM as well as topics concerning nanoscience and optics.
  • Meg Eastwood is an Assistant Professor and Science and Engineering Reference Librarian at the University of Denver Libraries. She has a BA in Biology from Grinnell College and an MS in Information Studies from The University of Texas at Austin. In the years between those degrees, Meg worked as a field assistant for ecological studies and as a lab prep/RA for the Shoals Marine Lab on Appledore Island, ME. In this Wiki Education course, Meg hopes to learn more about the inner workings of Wikipedia so that she can contribute articles and host edit-a-thons that highlight women in STEM, citizen science projects, and more.
  • Eric Grunwald is a lecturer and group coordinator in the English Language Studies group at MIT, where he teaches writing, speaking, and listening to second-language undergraduates and graduate students. As an undergraduate at Stanford University, Grunwald, intending to be an engineer or a physicist, took two years of STEM subjects and worked on a project for the NASA Space Shuttle before switching to the humanities (history). His teaching interests include the writing process, creative writing (he holds an MFA in fiction), composition, and the Web 2.0, and he has been using Wikipedia in his graduate STEM writing classes for three years, allowing students to practice their academic writing with an authentic audience and fill in content gaps in the online encyclopedia. An active writer himself, Grunwald has published fiction, book reviews, and translations in journals and newspapers nationwide and is currently focusing on poetry and a play.
  • Wasiu Lawal holds a PhD in Earth and Environmental Sciences from the University of Texas at Arlington. He is an advocate for the advancement of underrepresented minorities in the sciences. He is an active volunteer with the American Chemical Society where he serves as a member of its Committee on Minority Affairs. Wasiu’s overall mission is to improve the public’s perception of science through effective science communication.
  • Maame Ekua Manful is an entrepreneur and a Food Science Graduate Student at ISA Lille, France with specialization in Food Quality Management Systems. She is passionate about using storytelling to make science simplified and attractive to the younger generations, as well as bridging the science community with the global community through knowledge sharing in the food science space. Maame seeks to achieve this as she engages Wiki Education’s course by sharing and improving articles related to food science.
  • Tyler Newton is a PhD Candidate in Earth Sciences at University of Oregon. His research focuses on understanding the mechanics and properties of Earth’s crust using computational and observational seismology. Tyler is passionate about communicating earthquake science to the public and increasing the accuracy of Wikipedia.
  • Roopesh Ojha is an astronomer working for the Fermi Gamma-ray Space Telescope. He is active in outreach because he thinks it is important for everyone to know about both basic and cutting-edge science so they can make informed decisions as the citizens of a democracy. He looks forward to learning the mechanics and ethos of Wikipedia so he can contribute his mite to this invaluable public resource.
  • Sarah Peers is the current Deputy President of the International Network of Women Engineers & Scientists and a very longstanding champion of diversity in STEM. She is based near Hadrian’s Wall in the United Kingdom, and holds several degrees in mathematics and an engineering PhD. Her day job involves advising on STEM education that is fit for industry and supporting innovation in technology and engineering sectors. As an avid Wikipedia user, Sarah hopes through this Wikipedia course to ensure better visibility of the issues of gender and wider diversity in STEM and in industry.

Our Communicating Science on Wikipedia course will run through May. Stay tuned for updates about the great work these experts contribute to Wikipedia.


For more information about our course offerings and to sign up for updates about our next Wiki Scientists course, visit learn.wikiedu.org.

A buggy history

10:17, Tuesday, 16 2019 April UTC
—I suppose you are an entomologist?—I said with a note of interrogation.
—Not quite so ambitious as that, sir. I should like to put my eyes on the individual entitled to that name! A society may call itself an Entomological Society, but the man who arrogates such a broad title as that to himself, in the present state of science, is a pretender, sir, a dilettante, an impostor! No man can be truly called an entomologist, sir; the subject is too vast for any single human intelligence to grasp.
The Poet at the Breakfast Table (1872) by Oliver Wendell Holmes, Sr. 
 
A collection of biographies
with surprising gaps (ex. A.D. Imms)
The history of interest in Indian insects has been approached by many writers and there are several bits and pieces available in journals and there are various insights distributed across books. There are numerous ways of looking at how people historically viewed insects. One attempt is a collection of biographies, some of which are uncited verbatim (and not even within quotation marks) accounts from obituaries, by B.R. Subba Rao who also provides something of a historical thread connecting the biographies. Keeping Indian expectations in view, Subba Rao and M.A. Husain play to the crowd. Husain was writing in pre-Independence times when there was a genuine conflict between Indian intellectuals and their colonial masters. They begin with interpretations of mentions of insects in old Indian writings. As can be expected there are mentions of honey, shellac, bees, ants, and a few nuisance insects in old texts. Husain takes the fact that the term Satpada षट्पद or six-legs existed in the 1st century Amarakosa to suggest that Indians were far ahead of time because Latreille's Hexapoda, the supposed analogy, was proposed only in 1825. Such histories gloss over the structures on which science rests, and one can only assume that they failed to find the development of such structures in the ancient texts that they examined. The identifications of species mentioned in old texts are often based on ambiguous translations, which should leave one wondering what the value of claiming Indian priority in identifying a few insects is. For instance K.N. Dave translates a verse from the Atharva-veda and suggests an early date for knowledge of shellac. This interpretation looks dubious and sure enough, Dave has been critiqued by Mahdihassan. The indragopa (Indra's cowherd) is supposedly something that appears after the rains. Sanskrit scholars have identified it variously as the cochineal insect (the species Dactylopius coccus is South American!), the lac insect, a firefly(!) and as Trombidium (red velvet mite) - the last matches the blood red colour mentioned in a text attributed to Susrutha. To be fair, ambiguities resulting from translation are not limited to those that deal with Indian writing. Dikairon (Δικαιρον), supposedly a highly-valued and potent poison from India, was mentioned in the work Indika by Ctesias (398 - 397 BC). One writer said it was the droppings of a bird. Valentine Ball thought it was derived from a scarab beetle. Jeffrey Lockwood claimed that it came from the rove beetles Paederus sp. And finally a Spanish scholar states that all this was a misunderstanding and that Dikairon was not a poison but, believe it or not, a masticated mix of betel leaves, arecanut, and lime! One gets a far more reliable idea of ancient knowledge and traditions from practitioners, forest dwellers, the traditional honey harvesting tribes, and similar people that have been gathering materials such as shellac and beeswax. Unfortunately, many of these traditions and their practitioners are threatened by modern laws, economics, and culture. These practitioners are being driven out of the forests where they live, and their knowledge was hardly ever captured in writing. The writers of the ancient Sanskrit texts were probably associated with temple-towns and other semi-urban clusters and it seems like the knowledge of forest dwellers was not considered merit-worthy.

A more meaningful overview of entomology may be gained by reading and synthesizing a large number of historical bits, of which there are a growing number. The 1973 book published by Annual Reviews Inc. should be of some interest. I have appended a selection of sources that I have found useful in adding bits and pieces to form a historic view of entomology in India. It helps however to have a broader skeleton on which to attach these bits and minutiae. Here, there are also truly verbose and terminology-filled systems developed by historians of science (for example, see ANT). I prefer an approach that is free of a jargon overload and like to look at entomology and its growth along three lines of action: cataloguing, with the main products being collections of artefacts and the assignment of names; communication and vocabulary-building, which are social actions involving groups of interested people who work together, with the products being scholarly societies and journals; and pattern-finding, where hypotheses are made and predictions tested. I like to think that anyone learning entomology also goes through these activities, often in this sequence. With professionalization there appears to be a need for people to step faster and faster into the pattern-finding way, which also means that less time is spent on the other two streams of activity. The fast stepping is often achieved by having comprehensive texts, keys, identification guides and manuals. The skills involved in the production of those works - ways to prepare specimens, observe, illustrate, or describe - are often not captured by the books themselves.

Cataloguing

The cataloguing phase of knowledge gathering, especially of the (larger and more conspicuous) insect species of India, grew rapidly thanks to the craze for natural history cabinets among the wealthy in Britain and Europe (made socially meritorious by the idea that appreciating the works of the Creator was as good as attending church) and their ability to tap into networks of collectors working within the colonial enterprise. The cataloguing phase can be divided into the non-scientific cabinet-of-curiosity style especially followed before Darwin and the more scientific forms. The idea that insects could be preserved by drying and kept for reference by pinning [see Barnard 2018], the system of binomial names, the idea of designating type specimens that could be inspected by anyone describing new species, and the system of priority in assigning names were some of the innovations and cultural rules created to aid cataloguing. These rules were enforced by scholarly societies, their members (which would later lead to such things as codes of nomenclature suggested by rule makers like Strickland, now dealt with by committees that oversee the ICZN Code) and their journals. It would be wrong to assume that the cataloguing phase is purely historic and no longer needed. It is a phase that is constantly involved in the creation of new knowledge. Labels, catalogues, and referencing, whether in science or librarianship, are essential for all subsequent work to be discovered and are essential to science based on building on the work of others, climbing the shoulders of giants to see further. Cataloguing was probably what the physicists derided as "stamp-collecting".

Communication and vocabulary building

The other phase involves social activities, the creation of specialist language, groups, and "culture". The methods and tools adopted by specialists also help in producing associations and the identification of boundaries that could spawn new associations. The formation of groups of people based on interests is something that ethnographers and sociologists have examined in the context of science. Textbooks, taxonomic monographs, and major syntheses also help in building community - they make it possible for new entrants to rapidly move on to joining the earlier formed groups of experts. Whereas some of the early learned societies were spawned by people with wealth and leisure, some of the later societies have had other economic forces in their support.

Like species, interest groups too specialize and split to cover more specific niches, such as those that deal with applied areas such as agriculture, medicine, veterinary science and forensics. There can also be interest in behaviour and evolution which, though having applications, often do not find economic support.

Pattern finding
Eleanor Ormerod, an unexpected influence
in the rise of economic entomology in India

The pattern finding phase, when reached, allows a field to become professional - with paid services offered by practitioners. It is the phase in which science flexes its muscle, and specialists gain social status and are able to make livelihoods out of their interest. Lefroy (1904) cites economic entomology as starting with E.C. Cotes [Cotes' career in entomology was short; after marrying the famous Canadian journalist Sara Duncan in 1889 he too moved to writing] in the Indian Museum in 1888. But he surprisingly does not mention any earlier attempts, and one finds that Edward Balfour, that encyclopaedic surgeon of Madras, collated a list of insect pests in 1887 and drew inspiration from Eleanor Ormerod, who hints at the idea of getting government support, noting that it would cost very little given that she herself worked with no remuneration to provide a service for agriculture in England. Her letters were also forwarded to the Secretary of State for India and it is quite possible that Cotes' appointment was a result.

As can be imagined, economics, society, and the way science is supported - royal patronage, family, state, "free markets", crowd-sourcing, or mixes of these - impact the way an individual or a field progresses. Entomology was among the first fields of zoology that managed to gain economic value with the possibility of paid employment. David Lack, who later became an influential ornithologist, was wisely guided by his father to pursue entomology as it was the only field of zoology where jobs existed. Lack however found his apprenticeship (in Germany, 1929!) involving pinning specimens "extremely boring".

Indian reflections on the history of entomology

Kunhikannan died at the rather young age of 47
A rather interesting analysis of Indian science is made by the first native Indian entomologist to work with the official title of "entomologist" in the state of Mysore - K. Kunhikannan. Kunhikannan was deputed to pursue a Ph.D. at Stanford (for some unknown reason many of the pre-Independence Indian entomologists trained at Stanford rather than in England - see postscript) through his superior Leslie Coleman. At Stanford, Kunhikannan gave a talk on Science in India. He noted in his 1923 talk:

In the field of natural sciences the Hindus did not make any progress. The classifications of animals and plants are very crude. It seems to me possible that this singular lack of interest in this branch of knowledge was due to the love of animal life. It is difficult for Westerners to realise how deep it is among Indians. The observant traveller will come across people trailing sugar as they walk along streets so that ants may have a supply, and there are priests in certain sects who veil that face while reading sacred books that they may avoid drawing in with their breath and killing any small unwary insects. [Note: Salim Ali expressed a similar view ]
He then examines science sponsored by state institutions, by universities and then by individuals. About the last he writes:
Though I deal with it last it is the first in importance. Under it has to be included all the work done by individuals who are not in Government employment or who being government servants devote their leisure hours to science. A number of missionaries come under this category. They have done considerable work mainly in the natural sciences. There are also medical men who devote their leisure hours to science. The discovery of the transmission of malaria was made not during the course of Government work. These men have not received much encouragement for research or reward for research, but they deserve the highest praise. European officials in other walks of life have made signal contributions to science. The fascinating volumes of E. H. Aitken and Douglas Dewar are the result of observations made in the field of natural history in the course of official duties. Men like these have formed themselves into an association, and a journal is published by the Bombay Natural History Association[sic], in which valuable observations are recorded from time to time. That publication has been running for over a quarter of a century, and its volumes are a mine of interesting information with regard to the natural history of India.
This then is a brief survey of the work done in India. As you will see it is very little, regard being had to the extent of the country and the size of her population. I have tried to explain why Indians' contribution is as yet so little, how education has been defective and how opportunities have been few. Men do not go after scientific research when reward is so little and facilities so few. But there are those who will say that science must be pursued for its own sake. That view is narrow and does not take into account the origin and course of scientific research. Men began to pursue science for the sake of material progress. The Arab alchemists started chemistry in the hope of discovering a method of making gold. So it has been all along and even now in the 20th century the cry is often heard that scientific research is pursued with too little regard for its immediate usefulness to man. The passion for science for its own sake has developed largely as a result of the enormous growth of each of the sciences beyond the grasp of individual minds so that a division between pure and applied science has become necessary. The charge therefore that Indians have failed to pursue science for its own sake is not justified. Science flourishes where the application of its results makes possible the advancement of the individual and the community as a whole. It requires a leisured class free from anxieties of obtaining livelihood or capable of appreciating the value of scientific work. Such a class does not exist in India. The leisured classes in India are not yet educated sufficiently to honour scientific men.
It is interesting that leisure is noted as important for scientific advance. Edward Balfour, mentioned earlier, also made a similar comment that Indians were too close to subsistence to reflect accurately on their environment! (This was apparently in The Vydian and the Hakim, what do they know of medicine? (1875), which unfortunately is not available online.)

Kunhikannan may be among the few Indian scientists who dabbled in cultural history and political theorizing. He wrote two rather interesting books, The West (1927) and A Civilization at Bay (1931, posthumously published), which defended Indian cultural norms while also suggesting areas for reform. While reading these works one has to remind oneself that he was working under and with Europeans and would not have been able to have many conversations on these topics with Indians. An anonymous writer who penned the memoir of his life in his posthumous work notes that he was reserved and had only a small number of people to talk to outside of his professional work.
Entomologists meeting at Pusa in 1919
Third row: C.C. Ghosh, Ram Saran, Gupta, P.V. Isaac, Y. Ramachandra Rao, Afzal Husain, Ojha, A. Haq
Second row: M. Zaharuddin, C.S. Misra, D. Naoroji, Harchand Singh, G.R. Dutt, E.S. David, K. Kunhi Kannan, Ramrao S. Kasergode, J.L.Khare, Jhaveri, V.G.Deshpande, R. Madhavan Pillai, Patel, A. Mujtaba, P.C. Sen
First row: Capt. Froilano de Mello, Robertson-Brown, S. Higginbotham, C.M. Inglis, C.F.C. Beeson, Gough, Bainbrigge Fletcher, Bentley, Senior-White, T.V. Rama Krishna Ayyar, C.M. Hutchinson, Andrews, H.L.Dutt


Entomologists meeting at Pusa in 1923
Fifth row (standing): Mukerjee, G.D. Ojha, Bashir, Torabaz Khan, D.P. Singh
Fourth row (standing): M.O.T. Iyengar, R.N. Singh, S. Sultan Ahmad, G.D. Misra, Sharma, Ahmad Mujtaba, Mohammad Shaffi
Third row (standing): Rao Sahib Y Rama Chandra Rao, D Naoroji, G.R. Dutt, Rai Bahadur C.S. Misra, SCJ Bennett (bacteriologist, Muktesar), P.V. Isaac, T.M. Timoney, Harchand Singh, S.K. Sen
Second row (seated): Mr M. Afzal Husain, Major RWG Hingston, Dr C F C Beeson, T. Bainbrigge Fletcher, P.B. Richards, J.T. Edwards, Major J.A. Sinton
First row (seated): Rai Sahib PN Das, B B Bose, Ram Saran, R.V. Pillai, M.B. Menon, V.R. Phadke (veterinary college, Bombay)

Note: As usual, these notes are spin-offs from researching and writing Wikipedia entries, in this case on several pioneering Indian entomologists. It is remarkable that even some people in high offices, such as P.V. Isaac, the last Imperial Entomologist and grandfather of noted writer Arundhati Roy, are largely unknown (except as the near-fictional Pappachi in Roy's God of Small Things).


References
An index to entomologists who worked in India or described a significant number of species from India - with links to Wikipedia (where possible - the gaps are huge)
(woefully incomplete - feel free to let me know of additional candidates)

Carl Linnaeus - Johan Christian Fabricius - Edward Donovan - John Gerard Koenig - John Obadiah Westwood - Frederick William Hope - George Alexander James Rothney - Thomas de Grey Walsingham - Henry John Elwes - Victor Motschulsky - Charles Swinhoe - John William Yerbury - Edward Yerbury Watson - Peter Cameron - Charles George Nurse - H.C. Tytler - Arthur Henry Eyre Mosse - W.H. Evans - Frederic Moore - John Henry Leech - Charles Augustus de Niceville - Thomas Nelson Annandale - R.C. Wroughton - T.R.D. Bell - Francis Buchanan-Hamilton - James Wood-Mason - Frederic Charles Fraser - R.W. Hingston - Auguste Forel - James Davidson - E.H. Aitken - O.C. Ollenbach - Frank Hannyngton - Martin Ephraim Mosley - Hamilton J. Druce - Thomas Vincent Campbell - Gilbert Edward James Nixon - Malcolm Cameron - G.F. Hampson - Martin Jacoby - W.F. Kirby - W.L. Distant - C.T. Bingham - G.J. Arrow - Claude Morley - Malcolm Burr - Samarendra Maulik - Guy Marshall
 
Edward Percy Stebbing - T.B. Fletcher - Edward Ernest Green - E.C. Cotes - Harold Maxwell Lefroy - Frank Milburn Howlett - S.R. Christophers - Leslie C. Coleman - T.V. Ramakrishna Ayyar - Yelsetti Ramachandra Rao - Magadi Puttarudriah - Hem Singh Pruthi - Shyam Sunder Lal Pradhan - James Molesworth Gardner - Vakittur Prabhakar Rao - D.N. Raychoudhary - C.F.W. Muesebeck  - Mithan Lal Roonwal - Ennapada S. Narayanan - M.S. Mani - T.N. Ananthakrishnan - K. Kunhikannan - Muhammad Afzal Husain

Not included by Rao - F.H. Gravely - P.V. Isaac - M. Afzal Husain - A.D. Imms - C.F.C. Beeson - C. Brooke Worth - Kumar Krishna


PS: Thanks to Prof C.A. Viraktamath, I became aware of a new book - Gunathilagaraj, K.; Chitra, N.; Kuttalam, S.; Ramaraju, K. (2018). Dr. T.V. Ramakrishna Ayyar: The Entomologist. Coimbatore: Tamil Nadu Agricultural University. This suggests that TVRA went to Stanford on the suggestion of Kunhikannan.

    Tech News issue #16, 2019 (April 15, 2019)

    00:00, Monday, 15 2019 April UTC

    weeklyOSM 455

    15:33, Sunday, 14 2019 April UTC

    02/04/2019-08/04/2019

    View of the world’s hidden infrastructure mapped in the OpenStreetMap database 1 | © OpenInfraMap | © MapTiler | Map data © OpenStreetMap contributors

    Mapping

    • In a series of tweets, Tyler Busby updates progress on using machine learning to identify rooftop solar panels in Austin, Texas. The data are used to maintain a MapRoulette challenge for adding the data to OSM.
    • Nuno Caldeira announced with a tweet the work done by OSM’s Portuguese Telegram working group. More than 4,000 buildings on the islands of Flores and Corvo, in the western group of the Azores archipelago, were mapped in 35 days.
    • Brian Prangle suggests improving the POI data quality in the United Kingdom as the project for the second quarter. First priority should be businesses that no longer exist or have changed their name. The project follows a successful first quarter where nearly 70,000 addresses were improved or confirmed with use of public Food Hygiene Rating Standard data.
    • Dan S proposes a UK-specific project to map solar electricity panels. He points to wiki pages for rooftop solar panels and large solar farms that have data that should help in completing the task.
    • JM82 is surprised (de) (automatic translation) that the coverage of mapped buildings in Austria seems to be decreasing.
    • Kleper from Colombia invites all active mappers to help complete the following task: 121. It is suggested to start with the area of Chocó, which is affected by constant migration of people from different regions and recently had to deal with a further increase of migration due to the crisis in Venezuela.
    • The extensive mapping of garage access roads in residential areas by employees of newspaper publishers is being intensively discussed (de) (automatic translation) in the German forum. User boldtrn has found, in a sample, an error rate of 30 to 50 percent in the posts in his street.
    • LorenzoStucchi has drafted three proposals for the tag landcover. He suggests landcover=cultivated for areas which are cultivated by humans such as farmland or meadows, landcover=artificial for areas that are altered by construction, mining or other human activities and landcover=barren for areas without vegetation and with bare soil. The request for comments period started on 6 April 2019.
    • CapitaineMoustache has drafted a proposal for industrial=grain_storage_centre as a specification of landuse=industrial. The request for comments period is planned to start on 5 April 2019.
    • The British national mapping agency, the Ordnance Survey, tweeted a photo of surveying work starting on Tottenham Hotspur’s new stadium which opened this month. Steve Chilton points out that OSM has had it mapped in detail for a while now.

    Community

    • Colin Blackburn is intrigued by a “Clothing Optional” labelled beach in OSM on a route that he has vaguely planned to run. We hope he is not helping Mapillary during his runs.
    • In his diary, hvalentim wrote (automatic translation) some reflections on the mapping of national registered Portuguese-built heritage sites in OSM: some gaps were detected and many other suggestions made for the improvement and standardisation of tagging.

    Imports

    • Jason Owen drafted a plan to import water well positions in Harare, Zimbabwe and announced it on the import mailing list.

    Events

    • The Call for Proposals for the HOT Summit 2019, that will take place on 19-20 September 2019 directly before the State of the Map 2019 in Heidelberg, is now open.
    • Registration for SotM France, which will take place in Montpellier from 14 to 16 June 2019, is open! (fr) (automatic translation)

    Humanitarian OSM

    • HOT brings transparency to staff salaries. While it does not publish individual salaries, it has made the formula/method open. The article, on HOT’s website, explains why they wanted to be more open about the salaries, the different forms of transparency and the difficulties they faced during the process to find a new model to increase equality. The article also details how the new model eliminates inequalities and the effect it had on the payroll after the implementation.
    • This year’s nomination phase for new members at HOT US Inc. has started and lasts until 16 April 2019. As Natalie Sidibe wrote on the mailing list, HOT is allowing self-nominations this year in order to increase the diversity of membership.
    • Gaurav Thapa, from Kathmandu Living Labs, describes the collection process for gathering geospatial data on the ground in Kathmandu Valley, Nepal as part of a project to increase the understanding of exposure to natural hazards and to help minimise risks resulting from these threats.
    • Jorieke Vyncke gave a huge Thank You from Doctors Without Borders to HOT for the work and support HOT has provided after Cyclone Idai.

    Education

    • A collection of 200+ videos in English (and another 200 in German) about OpenStreetMap, FOSS4G, OSGeo, GIS, etc. is available on TIB AV-Portal.
    • Mapbox’s article “Your first steps with JOSM” helps you contribute to OpenStreetMap.

    Maps

    • Russ Garrett made some enhancements to OpenInfraMap: different aspects of infrastructure are now shown as layers, and a heat map layer of solar photovoltaic generators has been added.

    Programming

    • Heidelberg University’s GIScience Research Group released a preview of a new Openrouteservice client for the web and mobile devices.

    Did you know …

    • … the tool comparemaps.drona.ro, which allows you to compare OSM with Google Maps, Bing Maps and Nokia Maps?
    • … the app KartaGPS?
    • … NotesReview, which you can use to search for OSM notes using search terms?
    • … the webpage Pic4Review? It allows mapping using Mapillary images, and participating in various missions, such as adding accessibility for wheelchairs, adding ecopoints, or integrating missions about public toilets. You can also add your own quests!
    • … the concept of an “advanced stopping line” for cyclists? These allow bicycles to wait at traffic lights in front of other traffic. In OSM these can be tagged as cycleway=asl.

    Other “geo” things

    • Even vultures can recognise the international border between Spain and Portugal. (Or rather, the difference in legislation between both countries regarding abandonment of cattle carcasses.)
    • Many economists claim that larger cities are more productive than smaller cities, a phenomenon usually known as the agglomeration effect. Tom Forth, of ODI Leeds, writes in CityMetric that this does not apply to Birmingham because of poor public transport.
    • In a tweet, Simon Küstenmacher refers to a publication by the World Economic Forum. The publication explains five “maps that reveal the world’s remaining wildernesses.”
    • Mike Blackmore posted an animation on Twitter showing how almost all rivers in Europe have been “developed” or straightened over the past centuries. William du Plooy posted a similarly shocking comparison: the deforestation in Mozambique.

    Upcoming Events

    Where | What | When | Country
    Leoben | Stammtisch Obersteiermark | 2019-04-11 | Austria
    Zurich | OSM Stammtisch Zurich | 2019-04-11 | Switzerland
    Berlin | 130. Berlin-Brandenburg Stammtisch | 2019-04-12 | Germany
    Salt Lake City | University of Utah Campus Mapping Party | 2019-04-13 | United States
    Biella | Incontro mensile | 2019-04-13 | Italy
    Cologne Bonn Airport | Bonner Stammtisch | 2019-04-16 | Germany
    Lüneburg | Lüneburger Mappertreffen | 2019-04-16 | Germany
    Reutti | Stammtisch Ulmer Alb | 2019-04-16 | Germany
    Toulouse | Rencontre mensuelle | 2019-04-17 | France
    Karlsruhe | Stammtisch | 2019-04-17 | Germany
    Tokyo | 史跡を訪ねてマッピングパーティ(蒲田、六郷) | 2019-04-20 | Japan
    Bremen | Bremer Mappertreffen | 2019-04-22 | Germany
    Salt Lake City | SLC Map Night | 2019-04-23 | United States
    Nottingham | Nottingham pub meetup | 2019-04-23 | England
    Joué-lès-Tours | Rencontre Mensuelle | 2019-04-23 | France
    Barcelona | #geomobBCN | 2019-04-24 | Spain
    Montpellier | Réunion mensuelle | 2019-04-24 | France
    Düsseldorf | Stammtisch | 2019-04-24 | Germany
    Phone/Video Conferencing | Mappy Hour US | 2019-04-24 | United States
    Mumble Creek | OpenStreetMap Foundation public board meeting | 2019-04-24 | everywhere
    Lübeck | Lübecker Mappertreffen | 2019-04-25 | Germany
    Greater Vancouver area | Metrotown mappy Hour | 2019-04-26 | Canada
    Graz | Grazer Linuxtage 2019 | 2019-04-26 to 2019-04-27 | Austria
    Resistencia | Taller de edición en FLISoL2019 | 2019-04-27 | Argentina
    Rennes | Recensement des parcs et jardins | 2019-04-28 | France
    Montpellier | State of the Map France 2019 | 2019-06-14 to 2019-06-16 | France
    Angra do Heroísmo | Erasmus+ EuYoutH_OSM Meeting | 2019-06-24 to 2019-06-29 | Portugal
    Minneapolis | State of the Map US 2019 | 2019-09-06 to 2019-09-08 | United States
    Edinburgh | FOSS4GUK 2019 | 2019-09-18 to 2019-09-21 | United Kingdom
    Heidelberg | Erasmus+ EuYoutH_OSM Meeting | 2019-09-18 to 2019-09-23 | Germany
    Heidelberg | HOT Summit 2019 | 2019-09-19 to 2019-09-20 | Germany
    Heidelberg | State of the Map 2019 (international conference) | 2019-09-21 to 2019-09-23 | Germany
    Grand-Bassam | State of the Map Africa 2019 | 2019-11-22 to 2019-11-24 | Ivory Coast

    Note: If you would like to see your event here, please put it into the calendar. Only data that is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

    This weeklyOSM was produced by Nakaner, NunoMASAzevedo, Polyglot, Rogehm, SK53, SunCobalt, TheSwavu, YoViajo, derFred, jinalfoflia.

    The Bandwidth of Katie Bouman

    08:28, Sunday, 14 2019 April UTC
    First things first: yes, many people were involved in everything it took to make the picture of a black hole. However, it is justified that Katie Bouman is the face of this scientific novelty because she developed the algorithms needed to distill the image from the data. To give you a clue about the magnitude of the problem she solved: the data was physically shipped on hard drives from multiple observatories. For big science, the Internet often cannot cope.

    There are eternal arguments about why people are notable in Wikipedia. For a lot of that knowledge a static environment like Wikipedia is not appropriate, and this environment is causing a lot of those arguments. To come back to Katie, or rather to every scientist: their work is collaborative and much of it is condensed into "scientific papers". One of the black hole papers is "First M87 Event Horizon Telescope Results. I. The Shadow of the Supermassive Black Hole". This paper has many authors, not only "Katherine L. Bouman". When a major event like a first picture of a black hole is added, it is understandable that a paper like this is at first attributed to a single author.

    Wikimedia projects have to deal with the ramifications of science for many reasons. The most obvious one is that papers are used for citations. To do this properly, it is the science that should define what is written, not papers selected to support an opinion. The public is invited to read these papers, and the current Wikipedia narrative rests on single papers and single points of view. This makes some sense because the presentation is static. In Wikidata the papers on any given topic are continuously expanded; the same needs to be true for papers by any given author. Technically a Wikipedia could use Wikidata as the source for publications on a subject or by an author. The author could be Katie Bouman, and proper presentations make it obvious that the pictures of a black hole were a group effort with Katie responsible for the algorithms.
    Thanks,
           GerardM

    Evaluating Element Timing for Images

    23:19, Saturday, 13 2019 April UTC

    In the search for a better user experience metric, we have tried out the upcoming Element Timing for Images API in Chrome.

    Background

    One of the tasks we in the performance team have been struggling with is finding better metrics that can tell us more about the user experience than the technical metrics we usually get out of browsers.

    We started out in 2015 trying to find a way to know when images are displayed to the user. We tried out the latest patterns at that moment in T115600. We used our WebPageTest instance to record a video of the browser loading the Obama page, and followed the state-of-the-art technique at that moment, using a User Timing mark fired when the image was displayed.
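
    As a rough illustration of the User Timing approach described above, the sketch below records a mark when an image's load event fires. The exact code tried in T115600 is not shown in this post, so the helper names (markWhenLoaded, getMarkTime) and the mark name are hypothetical, and this is only one simple variant of the pattern.

```javascript
// Hypothetical helper: record a User Timing mark once an image's load event fires.
// `img` is any EventTarget (an <img> element in a browser); `perf` defaults to
// the global Performance object. The mark name is chosen by the caller.
function markWhenLoaded(img, markName, perf = performance) {
    img.addEventListener('load', () => {
        perf.mark(markName);
    }, { once: true });
}

// Hypothetical helper: read back the mark's timestamp (ms relative to the
// time origin), or undefined if the mark has not been recorded yet.
function getMarkTime(markName, perf = performance) {
    const marks = perf.getEntriesByName(markName, 'mark');
    return marks.length > 0 ? marks[0].startTime : undefined;
}
```

    In a page, this could be called as markWhenLoaded(document.querySelector('img'), 'image-displayed'). Note that the load event signals that the image resource has finished loading, which is not necessarily the moment its pixels reach the screen.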

    The results were very disappointing. The mark was at 2.0 seconds, but as you can see in the screenshot, the image was displayed at 4.8 seconds. It was off by 2.8 seconds :( We did multiple tests and got the same result each time. We tried the state-of-the-art technique people were talking about and it was clearly completely wrong. This taught us an important lesson: the reliability of any new RUM metric we decide to collect needs to be verified in synthetic testing using a video recording of the browser.

    The next attempt to measure when images appear was when WebPageTest added support for visual element metrics (meaning analyzing a video and getting metrics for specific elements), but that only helps us with synthetic testing. We also want better metrics collected directly from our users.

    Element timings

    @Gilles has been working on enabling origin trials for Chrome for us to verify the effectiveness and usefulness of upcoming performance APIs. Recently we enabled the Trial for Element Timing for Images on Russian Wikipedia. The goal of this API is to report exactly what we had been looking for: when an image is actually displayed to the user.

    Let's verify the accuracy of this new metric and see if it works better than old approximations marked with user timings.

    Evaluating element timings

    Using Browsertime we record a video of the screen and run some extra JavaScript to collect the new metric. Then we compare the metric we get from JavaScript with the one we get from the video.

    The first large image in an article is named thumbnail-high, so we know which one to use. The following JavaScript snippet is what allows us to get the Element Timing metric just for that element:

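    (function() {
        // Element Timing entries are exposed through the Performance Timeline.
        const elements = performance.getEntriesByType('element');
        for (const element of elements) {
            // Keep only the entry for the first large image in the article.
            if (element.name === 'thumbnail-high') {
                // Timestamp in ms, relative to the time origin, for that image.
                return element.startTime;
            }
        }
        // No matching entry: the function returns undefined.
    })();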

    This is passed to Browsertime, which runs it after the page has loaded. We also enable Visual Elements, which analyses the video and gives us a timing corresponding to when the largest image within the viewport is displayed (which, for most articles, is the thumbnail-high image).

    $ docker run --rm -v "$(pwd)":/browsertime sitespeedio/browsertime:4.6.0 --script thumbnail-high.js https://ru.wikipedia.org/wiki/Древесные_стрижи -n 11 --visualElements

    This was run on two different connectivity types and 11 times in a row. Then we keep the median for both metrics and we get the following:

    URL                                             Connectivity  Largest Image from video (ms)  Element Timing (ms)
    https://ru.wikipedia.org/wiki/Древесные_стрижи  cable         1100                           1097
    https://ru.wikipedia.org/wiki/Древесные_стрижи  3g            1567                           1536
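
    The median step mentioned above can be sketched as follows. This is a minimal illustration, not the code Browsertime itself uses; the function name and the sample values are made up for the example.

```javascript
// Median of a list of numeric samples (e.g. one metric across repeated runs).
function median(samples) {
    if (samples.length === 0) {
        throw new Error('median of an empty sample list is undefined');
    }
    // Copy before sorting so the caller's array is left untouched.
    const sorted = [...samples].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    // Odd count: the middle value. Even count: mean of the two middle values.
    return sorted.length % 2 === 1
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Illustrative values only, not measurements from the runs described here.
const exampleRuns = [1120, 1090, 1101, 1097, 1150, 1088, 1093, 1110, 1095, 1099, 1105];
console.log(median(exampleRuns));
```

    With an odd number of runs such as 11, the median is always one of the observed values, and a single unusually slow run does not shift it.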

    The video recording performed by Browsertime is done at 30 frames per second, which means each frame lasts 1000/30 ≈ 33.3 ms. The differences seen between Element Timing and the video analysis (3 ms on cable, 31 ms on 3g) are therefore within one frame. Element Timing might very well be the more accurate one, since it's not constrained by the video recording's 30fps cadence.
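
    The frame arithmetic can be checked with a few lines of JavaScript. The function names below are hypothetical; the numbers are the medians reported in the table.

```javascript
// Duration of one video frame in milliseconds at a given frame rate.
function frameDurationMs(fps) {
    return 1000 / fps;
}

// True when two timings differ by less than one video frame.
function withinOneFrame(videoMs, elementTimingMs, fps) {
    return Math.abs(videoMs - elementTimingMs) < frameDurationMs(fps);
}

// Medians from the table: [connectivity, video metric (ms), Element Timing (ms)].
const results = [
    ['cable', 1100, 1097],
    ['3g', 1567, 1536],
];
for (const [connectivity, videoMs, elementMs] of results) {
    // Prints the connectivity, the absolute difference, and the one-frame check.
    console.log(connectivity, Math.abs(videoMs - elementMs), withinOneFrame(videoMs, elementMs, 30));
}
```

    By the same check, the 2015 User Timing result (2.0 seconds against 4.8 seconds) is off by dozens of frames.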

    That looks really promising and very accurate, particularly compared to old workarounds. We tested a couple more URLs that you can see in T219231 and they showed the same result.

    For our content, it looks like the Element Timing API finally provides a way for us to know accurately when images are really displayed to users!

    Off the bookshelf and into the world

    18:42, Friday, 12 2019 April UTC
    Dr. Anthony Denzer.
    Image: File:Denzer headshot.jpg, Tdenzer, CC BY-SA 4.0, via Wikimedia Commons.

    Dr. Anthony Denzer is Department Head and Associate Professor of Architectural Engineering at the University of Wyoming. He taught a Wikipedia writing assignment for the first time last fall in his architectural history course. Here, he shares why he’ll do it again.

    Maybe you know that Mecca Flats, built in Chicago in 1892, is a significant lost site for African-American history, memorialized by Gwendolyn Brooks in her poem “In the Mecca.” But did you know that it was originally built as a hotel for white visitors to the World’s Columbian Exposition? Or that it was demolished by the Illinois Institute of Technology (IIT) in 1952 to build Crown Hall, itself a significant work of modern architecture? I did not know these facts until University of Wyoming student Alexandra Krosley, a Junior majoring in Art, added them to Wikipedia in November 2018.

    Or did you know that the Church of Saint-Jean-de-Montmartre (Paris, 1904) was ordered to be demolished while under construction, due to its daring design? UW student Ione Chandler, Sophomore in Architectural Engineering, added that to Wikipedia. (And, as she made clear, it was not demolished.) Or how about the fact that “Earthquake Baroque” is a style that describes 18th-century churches in the Philippines? I learned that from Cassidy Post’s contribution to Wikipedia.

    When Alexandra, Ione, and Cassidy enrolled in my History of Architecture course last fall, I imagine they had little idea about what was to come. As part of the course I charged them to contribute to Wikipedia. These were my selling points:

    • Your efforts will have a real audience. Lots of people will benefit by reading your work, not just me.  
    • Wikipedia is an amazing, good thing, but it needs to be improved, especially for architectural history.
    • You’ll develop or sharpen a number of important skills: Research, Critical thinking, Writing, Information Literacy, Collaboration.

    I’ve been contributing to Wikipedia myself since 2008. I don’t remember why or how I started, but I can see that I began by contributing to the page about the Baltimore Basilica (known to architects as the Baltimore Cathedral), because it had little about the building’s architectural significance at that time. (It could still use more!) I’ve contributed, off and on, ever since. So I began my course last fall with a good sense of the effort required in editing Wikipedia and the rewards.

    Wikipedia is a great medium for entry-level student research because of the requirement to write with a neutral point of view. I did not want these students to create their own interpretations or arguments; those are graduate-level learning objectives in architectural history in my view. I wanted them to learn to find and cite credible sources, and to exercise judgment about what’s important to architectural history that should be represented on Wikipedia.

    Was it a success? Absolutely. Of the 70 students in my class, 60 made significant contributions to existing Wikipedia articles, and four created new articles. The examples I mentioned above are just a narrow slice of the overall contribution. When I stand back and consider the collective impact, I’m really proud. The best part is that the students’ efforts have a worldwide impact, rather than a place on my bookshelf.

    So thanks Alexandra, Ione, and Cassidy. You’re the best. And Wikipedia—the world’s best free encyclopedia—now represents architectural history a bit better. We’ll do it again next fall.


    Interested in adapting a Wikipedia writing assignment for your course? Visit teach.wikiedu.org to get started.


    Header image: File:Paoay Church, Paoay.jpg, Jose Angelo Santos, CC BY-SA 3.0, via Wikimedia Commons.

    Alternatives in education

    17:23, Thursday, 11 2019 April UTC

    With the Wikipedia Assignment, Wiki Education provides instructors with an alternative to traditional assignments. Instead of having students create one-time-use term papers, they create work that can reach thousands. Essentially, we love challenging educational paradigms in search of models that better fit the needs of students and instructors. It was fitting, therefore, when a student in a Wiki Education-supported class wrote a new Wikipedia article that covers an alternative to the traditional educational tool of note-taking.

    Pedagogical studies abound on the efficacy of note-taking, though practitioners have varied styles. While some debate the pros and cons of writing words vs. typing them, in reality, students have more than two options. A student in Nathan TeBlunthuis’s Online Communities course wrote about a different way to take notes called “sketchnoting”.

    The definition of sketchnoting, created and uploaded to Commons by the student.
    Image: File:Sketchnoting Definition.jpg, Amytangg, CC BY-SA 4.0, via Wikimedia Commons.

    With sketchnoting, students do more than copy down words. Students graphically render their thoughts and ideas. Connections between concepts become more than simply stated—they’re clearly drawn. The new article boasts sections on sketchnoting’s history, process, and elements, as well as misconceptions. It includes references to eight reliable sources, demonstrating that the student understood and met Wikipedia’s verifiability requirements. The student also went above and beyond by creating her own graphics for the article, uploading them to Wikimedia Commons for use by anyone.

    Although sketchnoting is a relatively new concept, it has already made a name for itself. The website “Cult of Pedagogy” included sketchnoting in their research roundup, citing a scientific article on its benefits for memory and retention. Now this educational tool is covered in detail on Wikipedia, showing that there is more than one way for students to meet their learning goals.

    Elements of sketchnoting.
    Image: File:Sketchnoting elements.jpg, Amytangg, CC BY-SA 4.0, via Wikimedia Commons.

    If the student had submitted this paper to their instructor, only one set of eyes would have seen it, and few people besides the student would benefit from the work. With our Wikipedia assignment, however, the Wikipedia article on sketchnoting has already received over one hundred views! This student’s experience definitively demonstrates that alternatives are out there—be bold and try them.


    Interested in adapting a Wikipedia assignment to fit your course? Visit teach.wikiedu.org for all you need to know to get started.


    Three months ago, a German court ruled that part of a Wikipedia article—found to be defamatory in a previous court decision—had to be removed from both the article and its associated revision tracker, known as a “history” page.

    (History pages allow anyone to see how a Wikipedia article has developed since it was created, in some cases going back all the way to 2001. We have a desktop-based tutorial on using these history pages in both English and German.)

    The ruling stems from a previous lawsuit against the Foundation, originally filed in mid-2018. It asserted that a Wikipedia article’s claim about an academic professor was untrue and defamatory, even though it was backed by a citation to a reliable source.[1]

    A German court ruled in September of last year that the content was in fact defamatory, largely because the source in question had been taken offline—what we call “link rot.” German volunteers quickly removed the text in question from the article but the article’s corresponding history page retained the statements. This is a common practice on Wikimedia projects.

    Of course, Wikipedia is a massive, living project. Its articles are constantly evolving as volunteer contributors add, improve, and update content on the encyclopedia. Article histories are a useful tool for contributors, readers, and staff (even courts, on occasion) to see how an article has evolved over time. They paint a picture of how a Wikipedia article has developed, give context to what has been added or removed (and why this was done), and promote accountability for edits.

    In a case like this, deleting the article history may in fact inspire more curiosity, leading visitors to search for more information about the controversy. More importantly, without an article history, newcomers will have no idea what information does not belong in the article, and run the risk of repeating the defamation if new sources can be found to support it. Article histories do not claim to be truth on a topic—and, particularly when content has been deleted, can often suggest the opposite.

    Despite these compelling reasons for retaining the history of an article, the German court ruled that the article history must be deleted or the Wikimedia Foundation would be liable for the defamatory content contained therein. Because of the very short deadline from the legal proceeding—we were given less than one day to take action—the Foundation took the unusual step of oversighting[2] the history and then bringing the matter to the attention of the German oversighters. Under more normal timelines, we would have brought this sort of case to the volunteer editor community for their consideration as a new source for what should be in the article.

    How to view a Wikipedia article's "history" page on desktop. Text from the English Wikipedia's article "cat," CC BY-SA 3.0; Image montage by Alvesgaspar, CC BY-SA 3.0. Individual image credits are listed at the image source.

    This result is disappointing for us, but it is not a notable change in the law, and while it may harm the integrity of some articles by deleting important information about their history, it does not impose any new editorial standards on individual Wikipedia contributors. One of the necessary aspects of a never-finished encyclopedia is that mistakes will be made and information will need to be updated. Often, this is done quietly, quickly, and efficiently by the volunteers who tend to the projects every day. Rarely, it is not.

    It is also important to note that this decision only affects the Wikimedia Foundation’s role as a hosting provider, and does not change the level of care that individual Wikipedia contributors must engage in when adding information from sources to the encyclopedia. As always, when a source is no longer available for a particular fact or the fact can be considered insulting or slanderous, they should consider whether that portion of the text should be revised or deleted. This sort of re-evaluation of claims in an article is a typical aspect of editing Wikipedia and is reflected in Wikipedia’s rules on sources and biographies of living people.[2]

    It is our firm belief that most complaints are better resolved when the subject of an article works with the volunteer editor community to improve it rather than bringing legal action. As such, we will continue to encourage the people who make requests to us to seek community solutions wherever possible.

    We believe that the best encyclopedia is one where all the volunteers can collaborate together, understand what work has been done on an article already, and improve the quality of each article over time.

    Jacob Rogers, Senior Legal Counsel, Legal
    Allison Davenport, Technology Law and Policy Fellow, Legal
    Wikimedia Foundation

    Footnotes

    1. Note that the issue in this case is distinct from the right to be forgotten. Here, the claim was that something in the article was false in the first place, that a source had made a mistake, and that the information harmed the individual’s reputation.
    2. These links are to the policies on the English-language Wikipedia only for the convenience of readers of this blog post. While the English Wikipedia’s policies cover much the same ground as those in the German, the latter’s equivalent policies may differ in some respects. You can find those named in this post at the following links: oversight, verifiability (belege), and biographies of living persons (artikel über lebende personen).

     

    This blog post has been updated to remove a footnote that had survived the editing process despite no longer being applicable to the text it was attached to.

    Semantic MediaWiki 3.0.2 released

    09:06, Thursday, 11 2019 April UTC

    April 11, 2019

    Semantic MediaWiki 3.0.2 (SMW 3.0.2) has been released today as a new version of Semantic MediaWiki.

    It is a release providing several bug fixes. Please refer to the help pages on installing or upgrading Semantic MediaWiki to get detailed instructions on how to do this.

    Who gets to be an expert on Wikipedia?

    17:35, Wednesday, 10 2019 April UTC

    Dr. Erin Siodmak is an Adjunct Assistant Professor at the City University of New York in Women and Gender Studies and Sociology. Last fall, Dr. Siodmak learned how to help close the gender gap on Wikipedia through our online course. Here, Dr. Siodmak talks about what it means to claim the title of “expert” on Wikipedia and in the classroom.

    Dr. Erin Siodmak.
    Image: File:Erin Siodmak.jpg, Ecs222, CC BY-SA 4.0, via Wikimedia Commons.

    Some time in the spring of 2018, I read a short New York Times profile¹ about a teenager from Queens who had contributed tens of thousands of edits to Wikipedia. Most of Ryan Ng’s edits are about the New York City subway, a topic he has dedicated himself to covering. The first thought I had as I read about Mr. Ng was how impressive it was for someone so young to be such an authority – an expert – on a subject that can make a worthwhile contribution to a public knowledge repository. My next thoughts were something like, ‘I wish I had that kind of expertise,’ and ‘I can’t imagine ever being able to know enough to edit Wikipedia.’

    When I was sixteen, my parents let me go to boarding school. They had to refinance our house, but I went. Six years later, I graduated magna cum laude from New York University’s individualized studies school. And even though neither of my parents has a college degree, I earned a Ph.D. in Sociology and Gender Studies in 2017. I spent more years in school than a 19-year-old has spent on the planet, and have amassed years in higher education that equal or surpass the years of compulsory education in the United States. I have been teaching college courses for ten years. Rarely have I been intimidated by difficulty; and when I do feel intimidated by something, I don’t let it stop me from doing it. On the contrary, I seek out the difficult, the things that scare me. But editing Wikipedia? That’s something other people, with valuable knowledge, do; people who are experts in their fields, who know something about coding or HTML or web development.

    That disconnect I experienced concerns me. I am perhaps exactly the kind of person (or, rather, one kind of person) who should be editing Wikipedia – a queer, feminist, critical academic and teacher – and someone who should be confident in doing so, someone with an advanced degree and years of research and educational experience. But it’s not my lack of confidence or self-perception as a Wikipedian that is my biggest concern: if I’m not editing Wikipedia, then who is? I started asking my students this past year if they have ever edited Wikipedia or considered doing it. A handful said that they had, but ninety-nine percent (in an entirely unscientific, haphazard poll) never had and had never considered doing it.

    My background in feminist theory and critical and alternative methodology informs all of my teaching. I tell students that their experiences, ideas and questions are valid, and that they should know that all of the possible means for communicating those ideas are valid. Systematic and rigorous study is necessary (and something we have to understand to thrive in classrooms and workplaces), but so are feelings and critique of existing, accepted knowledges. I tell students that they are experts in things I know nothing about, and that they have knowledge to contribute to any discussion – I’ve just been studying the texts and theories longer. How can I teach and feel this to be true and not have realized the problem with my own access gap regarding Wikipedia’s information producers?

    Fast forward to the fall of 2018 and my introduction to working as a Wikipedia editor. I am grateful to have been a part of a Wiki Scholars course, and maybe even more indebted to the instructor training offered by Wiki Education. I’ve had many advantages in my life, but not even all of that education and privilege has always let me see myself as having authority; I owe a part of my new feeling of authority to this course. It’s exciting that I now know how to add useful content to Wikipedia in meaningful ways – copyedits, clarity revision, new information or images, correcting errors, or by adding brand new pages. But I’m most excited to share my knowledge about Wikipedia, the editing process, and how to add new content with students this fall. It’s great to have ‘experts’ like academics and scholars work to improve Wikipedia, but given that Wikipedia is a tool for spreading knowledge (and given a feminist perspective on epistemology), wouldn’t it be great for everyone to see themselves as experts? And, even better, for everyone to have the tools to share their knowledges?


    Want to learn how to leverage Wikipedia in this unique skills-development and networking online course? Visit our informational page to learn about our upcoming course beginning this June.

    The Wikimedia Foundation and the Project Grants Committee are excited to announce the newest successful grantees from the Project Grants program.

    Project Grants provide community members with funds to pursue their ideas for improving Wikimedia projects.  These grants support individuals, groups and organizations in implementing both new experiments and proven ideas.  Projects vary widely in their focus, but generally fall into four categories: online projects, offline projects, research, and software development. In addition, major themes explored in funded projects included promotion of knowledge equity, initiating and deepening open knowledge partnerships, and supporting efforts to improve the quality and breadth of structured data on Wikidata.

    What trends are we seeing?

    Knowledge equity

    Many proposals we received this round continue to support the concept of knowledge equity, a central tenet in the Wikimedia movement’s strategic direction focused on supporting communities that have been ignored or poorly resourced by structures of power and privilege. Grantees will be pursuing this goal using different skills and approaches to improve knowledge equity.  New tools will help us understand and address these disparities in our movement, such as a monitoring tool that will assess cultural gaps in Wikipedia projects in the Culture Gap Monthly Monitoring project. Another tool, Scribe, will provide guidance on structure when starting a new article and will be tailored to the needs of contributors in underserved communities.

    Some grantees will be inviting in knowledge from languages that lack visibility in our movement. The user group Wikimedians in Colombia (Wikimedistas de Colombia) plans to engage with the Wayuu and Nasa indigenous communities to better represent them in article content and work alongside them as fellow volunteers in our movement. Wiki Kouman 2019 in Côte d’Ivoire, led by contributor Modjou, will develop rich Wiktionary content on local languages, including audio recordings of words in Baoulé, Bété and Dioula. The Heritage GLAM digitization project will involve collaboration with many cultural institutions in the Punjab region of India. This includes the Municipal Library Patiala, whose holdings and works by Punjabi authors are in poor condition and are at risk of being lost forever. Finally, CEE Spring 2019 includes a strategic focus on developing minority-language content in the Central and Eastern European countries it will engage through its competition.

    Grantees are also focused on disparities regarding how different genders are represented in Wikimedia projects. A Wikimedian-in-Residence program at the Smithsonian will specifically focus on building partnerships and programs to better represent women from the United States online using the extensive resources the institution has to offer. The #VisibleWikiWomen campaign organized by Whose Knowledge? will work to ensure that images depicting women on Wikimedia projects reflect a wider range of social origins and racial backgrounds in order to reduce prejudices about the roles women can have in society.

    Open knowledge partnerships

    Grantees are deepening or initiating partnerships with organizations and institutions aligned with open knowledge efforts, and who are committed to supporting Wikimedia projects and their communities. For instance, OCLC will support the development of a Wikipedia+Libraries training program in Mexico that meets the needs of public libraries in outreach to several organizations in country. The Wikimedian-in-Residence program with UNESCO will commit to many activities that will build a clear pathway for UNESCO and other UN agencies to share their contributions. Finally, a partnership through a Wikimedian-in-Residence project with the Lionel-Groulx Foundation will serve to run a series of events to improve historical content involving French-speakers in North America.

    Wikidata development

    Many funded projects involve tool building or other efforts to fill in gaps in structured data coverage on Wikidata. While the Wikidata & ETL project seeks to provide a tool to improve automated imports of data into the project, GlobalFactSyncRE will produce a tool for contributors that detects and displays differences across information in infoboxes to help make information on Wikidata more consistent. Continued development of the Commons Android app will allow mobile users to more directly update geolocated Wikidata items with images as well. Finally, programming at the Smithsonian through its Wikimedian-in-Residence program will include micro-crowdsourcing tasks (such as the current Wikidata games) for newer volunteers to introduce them to Wikidata and how to improve structured data generally.

    Funded projects

    Twenty applications, totalling a bit less than $700,000 USD, were funded in this round.  We received 42 proposals for review, the largest number of proposals we have received in a single round. Projects vary widely in their focus, but are usually focused on a particular kind of activity: online programs, offline programs, research, or software development.

    Here is what we funded this round:

    Software: five projects

    • Commons Android app v3: This third version of the Commons mobile uploader app aims to increase app stability, improve a recommendations feature for nearby places, maintain a limited connectivity mode, and provide better outreach to underrepresented communities.
    • Scribe: Scribe is an editing tool to support underserved Wikipedia editors, helping them to plan the structure of their new articles and to find references in their language, specifically when an existing article in another language is not available for reference or translation. Contributors Frimelle and Hadyelsahar will be developing this tool to address significant content disparities in Wikipedia projects that reflect a large number of speakers and have a relatively smaller number of active contributors.
    • Culture Gap Monthly Monitoring: The Wikipedia Cultural Diversity Observatory (WCDO) will be engaging in several activities in research, tools building, and community engagement to regularly assist communities and individual editors to increase the cultural diversity in their language editions’ content.
    • GlobalFactSyncRE: DBPedia will be developing GlobalFactSyncRE, a tool that will extract all infobox facts and their references to produce a tool for Wikipedia editors that detects and displays differences across infobox facts in an intelligent way to help sync infoboxes between languages and Wikidata. The extracted references will also be used to improve Wikidata items.
    • Wikidata & ETL: This project aims at improving management and increasing automation of processes loading data into Wikidata, and proposes a tool as a platform for creation of repeatable processes for bulk loading data into Wikidata and other Wikibase instances from various data sources.

     
    Online programs: four projects

    • Wiki Loves Monuments 2019 coordination: The international coordination team for Wiki Loves Monuments (WLM) proposes to strengthen the foundation for healthy and sustainable WLM competitions across the world. In this grant, the team will focus considerable effort on increasing the sustainability of these events by addressing known issues around developing best practices, resourcing local teams to run successful events, and adapting to mobile engagement.
    • CEE Spring 2019: This annual international article writing contest generates content from every country and region in Central and Eastern Europe on 30+ Wikipedias. CEE Spring’s remarkable community spirit plays a central role in fostering a thriving, collaborative volunteer base in the region. The grant will continue incentivizing content creation focused on themes similar to last year’s, including closing the gender gap, expanding minority language Wikipedias, and showcasing the cultural heritage of Central and Eastern Europe.
    • WM HU/Editor retention program: This grant will fund the editor retention program in the Hungarian Wikipedia. The project helps the Hungarian Wikipedia community in decreasing the negative experiences and strengthening the positive experiences of the contributors; improving the community atmosphere and strengthening the community cohesion, the Wikipedia identity, the sense of mission and pride in Wikipedia.
    • VisibleWikiWomen2019: Whose Knowledge?, in partnership with Wikimedians and women’s and feminist organizations around the world, is organizing a campaign to add more diverse and quality images of women to Commons and Wikipedia throughout March 2019 to celebrate International Women’s Month. This year, the organization plans to take what they have learned from 2018 #VisibleWikiWomen and grow the campaign, creating more materials and connections that will be useful for this year’s campaign and many more years to come.

     
    Offline programs: ten projects

    • Smithsonian Wikimedian-in-Residence for Gender Representation: This project will establish a Wikimedian-in-Residence for the Smithsonian Women’s History Initiative, and increase the representation of women on Wikimedia projects, and seek ongoing support for a permanent Wikimedian-in-Residence at the institution.
    • Action Plan for Wikipedia + Libraries Training in Mexico: OCLC will investigate the viability of and approach to a Wikipedia+Libraries training program for library staff in Mexico, to leverage the libraries in support of the Wikimedia Foundation’s New Readers initiative. This project will identify a Mexico-based organization that would lead the training, develop an advisory group, and produce an action plan for how to design and deliver the training.
    • Offline Wikipedia in Senegal Schools: This project plans to address the absence of an easy, rapid and reliable way to access the wealth of information contained in Wikipedia. Contributors GastelEtzwane and Mouha.ibs will develop a train-the-trainers project to address this problem: trainers will attend a seminar teaching them about offline Wikipedia using Kiwix. Trainers will then go out to remote schools in Senegal and train teachers on the use of Kiwix in the classroom as a part of a certification process.
    • Heritage GLAM:  This project will focus on growing successful GLAM partnerships with government institutes, capacity building and documenting rare archives, and making visible books and artwork of historical and cultural importance to North India that lack an online presence.
    • Wikimedian in Residence at UNESCO 2019–2020: Past engagement with UNESCO has demonstrated that working with Wikipedia to share knowledge allows UNESCO to reach a far wider public with detailed information than traditional report publication. The applicants have prepared a roadmap to mass adoption of open licensing and sharing of content on Wikimedia projects across the UN. By the end of this grant in early 2020, the project will result in policies, documentation and processes in place to share knowledge from across the UN on Wikimedia projects.
    • Wiki Loves Africa 2019: Funding for this project will support prize distribution efforts associated with local contests supported by Wiki Loves Africa 2019, a successful annual public contest where communities across Africa can contribute media (photographs, video and audio) about their environment to Wikimedia Commons for use on Wikipedia and other Wikimedia projects.
    • Wiki Kouman 2019 in Côte d’Ivoire (Local language in Côte d’Ivoire): Contributors Modjou and Mognissan will increase the visibility and content of Côte d’Ivoire’s local languages through Wiktionary through language research, audio recording, and community-supported content development. WikiKouman derives its etymology from Dioula, one of the local languages, in which “Kouman” means “to speak.”
    • History of Quebec and French-speaking North America: This Wikimedian-in-Residence project led by contributor MathieuGP with the Lionel-Groulx Foundation aims to build and support the community of individuals and organizations interested in developing and improving Wikimedia content pertaining to the history of Quebec and French-speaking North America.
    • Wikipedia Women and Ancestral Knowledge in the Colombian Context: In collaboration with the Center for Internet and Society at Del Rosario University (Centro de Internet y Sociedad Universidad, or ISUR), this project is aimed at supporting the Wayuu and Nasa indigenous peoples in Colombia to participate on Wikipedia and enhance the participation of women through promotion of digital skills and article development on multiple Wikipedia projects to represent their knowledge.
    • Editathons in Pistoia District: The project centers on an edit-a-thon series in the Pistoia district of Italy with the support of local and historical experts. This project will allow community members to achieve complete, in-depth coverage and to develop a model for localized work in other regions of Italy.

     
    Research: one project

    • Machine Learning to Predict Wikimedia User Blocks: This project will involve investigating user misconduct on English Wikipedia using machine learning techniques to better understand what circumstances lead to user blocks, and how blocking can be done more effectively.

     

    Who reviewed these proposals?

    Seventeen Wikimedians volunteered their time on the Project Grants Committee to collectively review the proposals received in a given round.  The committee members come from at least 13 different wikis and collectively speak at least 13 different languages, and have backgrounds in even more. As Wikimedians, their backgrounds vary widely:  they are editors, reviewers, content translators, leaders of local chapters, software writers, off-wiki event organizers, workshop facilitators, sysops and bureaucrats, copyright and licensing permission experts, policy advisors, and more. Some members also serve as advisors to new grantees, helping to answer questions, connect them to relevant resources, and comment on monthly and midpoint reports. These volunteers play a critical role throughout the process, including making decisions about how to best use the movement’s money to achieve impact.

    Chris Schilling, Program Officer, Community Resources, Community Engagement
    Wikimedia Foundation

    The first Wikimedia + Education conference

    13:27, Wednesday, 10 2019 April UTC
    Participants in the Wikimedia Education conference – image by Jon Urbe-Foku CC BY-SA 4.0

    By Jason Evans, National Wikimedian for Wales.

    In April 2019 the Basque Wikimedians User Group hosted the first Wikimedia + Education conference in Donostia. Using Wikipedia and other Wikimedia projects in education is nothing new. There is a vibrant and well-established community already engaged in a diverse range of projects, from Wiki Clubs in primary schools to accredited Wikipedia-based modules in universities. It’s hard to believe, then, that this is the first official global gathering of Wikimedians and educators involved in this work.

    In its early days Wikipedia was shunned by educators – pooh-poohed as inconsistent and unreliable. But Wikipedia has long established itself as the go-to source of information for millions of people in hundreds of languages, especially our young digital natives. In Wales, for example, each time we ask school children, between 80 and 90% say they often use Wikipedia to help with their school work.

    Gradually, then, educators are beginning to realise that rather than ignore the gigantic free encyclopaedia in the room, they, with their students, can actually benefit from contributing to Wikipedia. This allows them to teach a whole raft of skills, such as research, digital literacy, collaboration and critical thinking. Contributing to Wikipedia also allows young people to feel like they are contributing something to society. Wikipedia gives their school work real-world value. Rather than writing an essay, which is simply marked and filed away in a drawer, students can make a lasting contribution to collective knowledge in their language, which is accessible by anyone, anywhere in the world.

    One thing that struck me at this conference was the diversity of participants. Education in Wikipedia is definitely not an English or even Western-centric concept, and often education project facilitators, be they Wikimedia chapters and groups, universities, cultural institutions or even local governments, are motivated by the desire to increase content in a given language, and to increase the use of that language in the classroom. Activities in the Basque Country and Catalonia, areas keen to protect and promote their own unique language and cultural identity, are good examples.

    There was a wide range of activities presented at the conference, with Robin Owain flying the flag for Wales with a presentation on progress at home. He also presented a fantastic video produced by Aaron Morris, Wikipedian in Residence with Menter Mon, highlighting his recent work with Welsh primary schools.

    University-level education activities were well represented at the conference, with Wikipedia-based assignments proving increasingly popular in universities around the world. In the UK, Edinburgh University has led the way with this work, and in Ireland Maynooth University has found that Wikipedia contribution is hugely popular amongst students. The practice is also well established in North America and many European countries.

    Some universities drive participation with the help of a Wikipedian in Residence, or through training librarians. In Serbia, universities have appointed Wiki ambassadors, and the Catalans have a group of dedicated volunteers who coordinate projects within universities.

    In Wales, higher education has been slow to embrace Wikipedia as a teaching tool. Individual lecturers at Swansea and Aberystwyth Universities have begun to explore the possibilities, but there is definitely great potential for more engagement. Where we have had increasing success in Wales is in secondary schools, thanks to the work of Aaron Morris. A number of schools are now teaching students digital competencies and Welsh language skills through Wikipedia editing as part of the Welsh Baccalaureate. Schools have even started forming Wiki Clubs, and primary schools are also teaching their children about Wikipedia.

    Presentations from educators in Argentina, Armenia, France, Catalonia, the Basque Country and others show that Wales is not alone in engaging younger children with Wikipedia editing. Katherine Maher, in her keynote, pointed out that many of the movement’s most valuable contributors today began editing when they were thirteen or even younger. I tweeted her quote and had replies from editors who were as young as 8 years old when they made their first Wikipedia edit.

    For many teachers, Wikipedia is merely a vehicle for effectively teaching a range of skills, which they would need to teach regardless. But for the Wikimedia movement and those in local governments with a mandate for supporting the growth of a language or culture, teaching Wikipedia can be seen as a long term investment in young people – instilling the notion that they can play an active role in the future of their language by contributing information rather than simply consuming it. This also helps build up the digital presence of a language which is essential for further investment in online infrastructure, by the likes of Microsoft and Google.

    Armenian Wiki-clubs have been hugely successful, with more than 30 clubs active around the country. Each club has its own trained coordinators, and children contribute content they consider interesting, such as cartoons, films and music. Clubs also allow children to contribute through simple tasks to Wiktionary, Commons and other Wikimedia projects – which is a great way of lowering the barriers to entry. Club coordinators are responsible for checking the quality of all contributions of students. With only a small community of editors on the Welsh Wikipedia, the ability to manage and correct large amounts of new content from younger people would definitely need consideration here. Training new trainers, be they teachers or community leaders, and producing more documentation and guidelines would be essential in replicating any such project at scale.

    Participants at the Wikimedia Education Conference 2019 – image by Maialen Andres-Foku CC BY-SA 4.0

    LiAnna Davis of the Wiki Education Foundation raised another issue which deserves consideration. Increasingly our young people are consuming knowledge through video rather than through reading. This might not be a bad thing, but it’s not something Wikipedia is very good at, especially in smaller languages. Should we be considering the creation of open video content as part of educational projects? At the very least this would complement and add value to a Wikipedia article, so I think it’s definitely something to consider.

    From a Welsh perspective the approach of the Basque community is probably the most inspiring and the most relevant to our ambitions for the Welsh language Wikipedia. There are actually at least 10 active education programmes in the Basque Country. Some are small but valuable programmes, such as the work by Mondragon University to rewrite the lead to Wikipedia articles related to citizenship based on perceptions of school children – an exercise they call ‘Politics through Participation’. There is Txikipedia, a children’s Wikipedia project – the only children’s Wikipedia in the world to sit within a language’s main Wikipedia – as well as an interesting community project aimed at encouraging locals to write about their local area. However, it’s the work of creating content in universities and secondary schools which aligns best with our ambition to raise the standard of the Welsh Wikipedia for all, including the high percentage of young people who use it to find information for school work.

    University students in the Basque country editing Wikipedia – image by Xabier Cañas CC BY-SA 4.0

    Basque Wikimedians took the school syllabus for primary and secondary schools and, with the help of subject specialists, used it to build a list of over 1800 articles which were vital for children’s education. They then partnered with Basque language universities to develop a program for students to create content relevant to secondary school pupils, and they partnered with secondary schools to write content relevant to primary schools.

    In Wales the government has a long term strategy to grow the number of Welsh speakers to 1 million by 2050, and Welsh Wikipedia (along with Wikidata) is now officially recognised as an important part of that strategy. Implementing a strategy similar to the Basque Country would help the government achieve targets around digital competencies and the Welsh language in schools, whilst at the same time educating children about the topics being added to Wikipedia, and the output of this work becomes part of the open knowledge ecosystem in the Welsh language, where all Welsh speakers stand to benefit.

    As we look to build on recent success in Wales, this conference has provided valuable insight into the incredible work already happening all around the world, from the perspective of educators and Wikipedians.

    See more images from the Wikimedia Education conference 2019 on Wikimedia Commons.

    @Wikidata is no relational #database

    10:18, Tuesday, 09 2019 April UTC
    When you consider the functionality of Wikidata, it is important to appreciate that it is not a relational database. As a consequence, there is no implicit way to enforce restrictions. Emulating relational restrictions fails because it is not possible to check in real time what it is that is to be restricted.

    An example: in a process, new items are created when there is no item available with a given external identifier. A query indicates that no such item exists, and a new item is created. A few moments later, the existence of an item with the same external identifier is checked again using a query. Because of the time lag, what the query knows to be in the database and what actually is in the database differ; the query indicates there is no item, and a new but duplicate item is created.

    Implications are important.

    Wikidata is a wiki. The implications are quite different. In a wiki things need not be perfect, and the restrictions of a relational model are in essence recommendations only. In such a model duplicate items as described above are not a real problem; batch jobs may merge these items when they occur often enough. Processes may keep arrays of the items they created earlier, thereby minimising the issue.
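
    The duplicate problem and the "remember what you created" workaround can be sketched in a few lines of Python. This is a toy illustration only: LaggingStore, import_ids and the identifier "X1" are hypothetical names invented here, not part of Wikidata or any of its tools, and the lag is modelled simply as a query index that never catches up during the run.

```python
class LaggingStore:
    """Toy stand-in for a store whose query index trails behind its writes."""

    def __init__(self):
        self.items = []    # every item actually written
        self.indexed = 0   # how many of those items a query can already see

    def query(self, external_id):
        # Only items that have been indexed are visible to queries.
        return [i for i in self.items[:self.indexed] if i == external_id]

    def create(self, external_id):
        self.items.append(external_id)


def import_ids(store, external_ids, remember_created):
    """Create an item for each identifier that the query cannot find."""
    created_here = set()  # identifiers this process has already created
    for ext_id in external_ids:
        if remember_created and ext_id in created_here:
            continue  # we made it moments ago; do not trust the stale query
        if not store.query(ext_id):
            store.create(ext_id)
            created_here.add(ext_id)


naive = LaggingStore()
import_ids(naive, ["X1", "X1"], remember_created=False)

careful = LaggingStore()
import_ids(careful, ["X1", "X1"], remember_created=True)

print(len(naive.items), len(careful.items))
```

    The naive run ends up with two items for the same identifier, while the run that remembers its own work creates one. A local record only protects a single process; duplicates made by other processes would still need a later merge.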

    What is important is that we do not blame people for what Wikidata is not, and that we accept its limitations. Functionality like SourceMD enables what Wikidata may become: a link to all knowledge. Never mind if it is knowledge in Wikipedia articles, scholarly articles or in sources used to prove whatever point.
    Thanks,
          GerardM

    This Month in GLAM: March 2019

    20:54, Monday, 08 2019 April UTC
    • Albania report: WikiFilmat SQ – new articles about the Albanian movie industry!
    • Armenia report: Art+Feminism+GLAM, Collaboration with Hovhannes Toumanian museum
    • Australia report: Art+Feminism 2019 in Australia
    • Brazil report: The GLAM at USP Museum of Veterinary Anatomy: a history of learnings and improvements
    • Colombia report: Moving GLAM institutions inside and outside Colombia
    • Czech Republic report: Edit-a-thon Prachatice
    • France report: Wiki day at the Institut national d’histoire de l’art; Age of wiki at the Musée Saint-Raymond
    • India report: Gujarat Vishw Kosh Trust content donation to Wikimedia
    • Italy report: Italian librarians in Milan
    • Macedonia report: WikiLeague: Edit-a-thon on German Literature
    • Netherlands report: WikiconNL, International Womens Day and working together with Amnesty, Field study Dutch Libraries and Wikimedia
    • Serbia report: Spring residences and a wiki competition
    • Sweden report: UNESCO; Working life museums; Swedish Performing Arts Agency shares historic music; Upload of glass plates photographs
    • UK report: Wiki-people and Wiki-museum-data
    • USA report: Women’s History Month and The Met has two Wikimedians in the house
    • Wikidata report: Go Siobhan!
    • WMF GLAM report: Structured Data on Wikimedia Commons; Bengali Wikisource case study
    • Calendar: April’s GLAM events

    The Historian’s Craft

    18:44, Monday, 08 2019 April UTC

    One of the first things that a budding historian learns is the value of good research skills. The second is how to take those results and pull them together into a comprehensive and readable work that can be shared with others. It is no coincidence that these are two skills that Wikipedia volunteers also quickly discover are of invaluable worth when creating or improving a Wikipedia article. As such, it should be of no surprise that Oregon State University instructor Dr. Stacey Smith chose to have her students in her course practice their research and writing skills by contributing content to Wikipedia during the fall of 2018, where their work on African American abolitionists can be read by the entire world. Their work resulted in the creation of multiple new articles on people who lacked articles and the improvement of several that already existed on Wikipedia.

    One of the new articles is about William Lambert, a prominent African-American citizen and abolitionist in Detroit, Michigan during the mid to late 19th century. He was born free and was educated by a Quaker schoolmaster, who not only gave him an excellent education but also introduced Lambert to the abolitionist movement. In his twenties Lambert was living in Detroit and working in a tailor shop. It was here that he met George DeBaptiste, with whom he would collaborate on abolitionist matters and on the Underground Railroad. Lambert is perhaps most well known for assisting the fugitive slave Robert Cromwell, who escaped his owner John Dun and fled to Canada, where he could live in freedom. Lambert was responsible for exerting his influence and placing Dun in jail, giving Cromwell the ability to successfully reach Canada. This wasn’t without repercussions, as these actions influenced politicians to pass the Fugitive Slave Act in 1850, which greatly reduced the ability of slaves to escape the cruelty of slavery.

    Louisa Matilda Jacobs, public domain via Wikimedia Commons. Photo uploaded by a student at Oregon State University.

    Another Wikipedia article that students created was about Louisa Matilda Jacobs, an African American abolitionist and civil rights activist and the daughter of famed fugitive slave and author, Harriet Ann Jacobs. Her mother was a mistress to congressman and newspaper editor Samuel Tredwell Sawyer, Louisa’s father. Harriet was the slave of Dr. James Norcom, who tried to force her into a sexual relationship by threatening her children. She fled, expecting that Norcom would sell her children. This expectation was correct as Sawyer purchased the children and helped them make their way to safety and freedom. Jacobs was eventually reunited with her mother and the two fled to Boston, where she was educated at home until her father paid for her to attend a seminary school in New York. She returned to Boston, where she received training to become a teacher. With her mother, Jacobs founded Jacobs Free School, a Freedmen’s School in Alexandria, Virginia. In 1866 she opened a second one in Georgia called the Lincoln School. She was also an activist and spoke about women’s suffrage on an American Equal Rights Association lecture tour alongside Susan B. Anthony and Charles Lenox Remond. She also worked as a matron of the National Home for the Relief of Destitute Colored Women and Children and at Howard University.


    Interested in adapting a Wikipedia writing assignment to fit your course? Visit teach.wikiedu.org for all you need to know to get started.

    Tech News issue #15, 2019 (April 8, 2019)

    00:00, Monday, 08 2019 April UTC

    The European Union (EU) Commission’s proposal for a Regulation on preventing the dissemination of terrorist content online runs the risk of repeating many of the mistakes written into the copyright directive, envisioning technological solutions to a complex problem that could bring significant damage to user rights. The proposal includes a number of prescriptive rules that will create frameworks for censorship and potentially harm important documentation about terrorism online. It would further enshrine the rule and power of private entities over people’s right to discuss their ideas.

    However, there are still ways to shape this proposal to further its objectives and promote accountability. The report on the proposal will be up for a vote in the Civil Liberties and Justice Committee (LIBE) in the European Parliament on 8 April, and Wikimedia urges the committee to consider the following advice:

    1. Stop treating the internet like one giant, private social media platform

    According to the draft, any platform that hosts third party content—from social media to Wikimedia projects like Wikipedia and potentially to services hosting private files—needs to describe how it deals with content that may be related to terrorism in its own terms of service. While people can talk about terrorism in different forms and for different purposes, such as for research, awareness raising, and news reporting, the regulation would force platforms to decide what is and what is not an acceptable way to have these conversations.

    Yet, the law should not mirror websites’ approach to curbing illegal content by applying their terms of service because this would remove incentives to do better. The proposed regulation would oblige all platforms to act in a similar manner, regardless of the content they host or their operational model. That includes Wikipedia, where a rigorous set of top-down policies may interfere with its robust and effective system of transparent community dispute resolution over content.

    Instead, legislators should clearly define illegal terrorist content and leave hosting service providers little room for interpretation.

    2. Let courts decide, not machines

    Similarly to the new copyright directive, the regulation envisions the use of automated tools to proactively detect, identify and disable access to terrorist content. Deciding what is and what is not expression that condones terrorism is a complicated matter and context is crucial in deciding whether content is illegal under anti-terrorist laws. Such decisions need to be made by courts, not by algorithms which may or may not be subject to human oversight.

    Where law enforcement relies on code, the code becomes the law. That goes against how our free knowledge projects operate, with vibrant and open deliberation on what should have its place on Wikipedia, and what shouldn’t. Platforms’ content moderation should build on a proper framework that involves well-prepared people, not only machines.

    3. Do not overturn the principles of free expression

    Freedom of expression is a right that can only be exercised by the practice of expressing one’s thoughts, ideas, or opinion. Boundaries are only applied when that expression is deemed unacceptable. Content filtering works on exactly the opposite premises—prematurely stifling expression before it has a chance to be heard and assessed.

    Any reference to measures that may lead to proactive content filtering should be removed from the proposal. Upload filters overturn jurisprudence and legal practices in all jurisdictions that recognize freedom of expression as a human right. They operate in secrecy and their decisions are shrouded in trade secrets of companies running them. Relying on these technologies may stop some of the communication we don’t want, but it is not worth the price of undermining the foundation of free expression.

    4. Do not force websites to remove legal content

    The proposal envisions that, in addition to content removal orders, the competent authority can issue a referral to request a company check whether content violates their terms of service. Platforms will face penalties if they do not speedily address these referrals, which creates a strong incentive to act non-transparently and remove content that may in fact be legal.

    The measure should be removed from the proposal. Instead, authorities tasked with tackling terrorist content should be required to focus on cases where the terrorist context is evident and issue an order to remove the piece of content in question. Lawmakers need to leave room for the less evident cases to be discussed as acceptable freedom of expression.

    The LIBE Committee will vote on a few good changes that have been proposed, most notably the removal of proactive measures and referrals. Providing an exclusion for content disseminated for educational, artistic, journalistic or research purposes is a good idea. However, if the dissemination of terrorist content does not need to be intentional to be removed (as in calling for aiding and abetting terrorist activities), a lot of important information may still get caught up in a surge of removals. We hope that the committee responsible for ensuring respect for civil liberties in EU legislation will rise to the occasion. We will continue to monitor the legislative process for this regulation and remain committed to defending and promoting free knowledge.

    Anna Mazgal, EU Policy Adviser, Wikimedia Germany (Deutschland)
    Jan Gerlach, Senior Public Policy Manager, Wikimedia Foundation

    One of the key mechanisms that allows Wikipedia to maintain its high quality is the use of inline citations. Through citations, readers and editors make sure that information in an article accurately reflects its source. As Wikipedia’s verifiability policy mandates, “material challenged or likely to be challenged, and all quotations, must be attributed to a reliable, published source”, and unsourced material should be removed or challenged with a citation needed flag.

    However, deciding which sentences need citations may not be a trivial task. On the one hand, editors are urged to avoid adding citations for information that is obvious or common knowledge—like the fact that the sky is blue. On the other hand, sometimes the sky doesn’t actually appear blue—so perhaps we need a citation for that after all?

    Scale up this problem to the size of an entire encyclopedia, and it may become intractable. Wikipedia editors’ time is limited and their expertise is valuable—which kinds of facts, articles, and topics should they focus their citation efforts on?  Also, recent estimates show that a substantial proportion of articles have only a few references, and that one out of four articles in English Wikipedia does not have any references at all. This suggests that while around 350,000 articles contain one or more citation needed flags, we are probably missing many more.

    We recently designed a framework to help editors identify and prioritize which sentences need citations in Wikipedia. Through a large study that we conducted with editors from English, Italian and French Wikipedia, we first identified a set of common reasons why individual sentences in Wikipedia articles require citations. We then used the results of this study to train a machine learning classifier that can predict whether or not any given sentence needs a citation, and why, on the English Wikipedia. It will be deployed in the next 3 months to other language editions.

    By improving the identification of where Wikipedia gets its information from, we can support the development of systems to help volunteer-driven verification and fact-checking, potentially increasing Wikipedia’s long-term reliability and making it more robust against biases, information quality gaps and coordinated disinformation campaigns.

    Why do we cite?

    To teach machines how to recognize unverified statements, we first needed to systematically classify the reasons why sentences need citations.

    We started by examining policies and guidelines related to verifiability in the English, French, and Italian Wikipedias and attempted to characterize the criteria for adding (or not adding) a citation described in those policies. To verify and enrich this set of best practices, we asked 36 Wikipedia editors from all three language communities to participate in a pilot experiment. Using WikiLabels, we collected editors’ feedback on sentences from Wikipedia articles: editors were asked to decide whether a sentence needed a citation and to specify a reason for their choices in a free-text form.

    Our methods and our final set of reasons for adding or not adding a citation can be found on our project page.

    Reasons for adding a citation.
    Reasons for not adding a citation.

    Teaching a machine to discover citation gaps.

    Next, we trained a machine learning model to discover sentences needing citations, and characterize them with a matching reason.

    We first trained a model to learn from the wisdom of the whole editor community how to identify sentences that need to be cited. We created a dataset of English Wikipedia’s “featured” articles, the encyclopedia’s designation for articles of the highest quality, which are also the best sourced. Sentences from featured articles that contain an inline citation are treated as positives, and sentences without an inline citation are treated as negatives. With this data, we trained a Recurrent Neural Network that can predict whether a sentence is positive (should have a citation) or negative (should not have a citation) based on the sequence of words in the sentence. The resulting model can correctly classify sentences in need of citation with an accuracy of up to 90%.
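
    The labelling rule described above (cited sentence = positive, uncited sentence = negative) can be sketched as follows. This is a minimal illustration, not our actual pipeline: it assumes citations appear as bracketed numbers such as [4], whereas real wikitext uses ref tags, and the function name label_sentences and the sample sentences are made up for the example. Stripping the marker from the text is also an assumption, made so that a model could not simply learn to spot the marker itself.

```python
import re

# Assumed citation marker: a bracketed number such as [12].
CITATION = re.compile(r"\[\d+\]")


def label_sentences(sentences):
    """Return (text, label) pairs: 1 if the sentence carried a citation, else 0."""
    examples = []
    for sentence in sentences:
        has_citation = bool(CITATION.search(sentence))
        # Remove the marker so only the words of the sentence remain.
        text = CITATION.sub("", sentence).strip()
        examples.append((text, 1 if has_citation else 0))
    return examples


sample = [
    "The bridge was completed in 1932.[4]",
    "It crosses the harbour.",
]
labelled = label_sentences(sample)
for text, label in labelled:
    print(label, text)
```

    Pairs like these are the kind of input a sequence model such as a Recurrent Neural Network would be trained on; the network itself is out of scope for this sketch.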

    Explaining algorithmic predictions

    But why is the model up to 90% accurate? What is the algorithm looking at when deciding whether a sentence needs a citation?

    To help interpret these results, we took a sample of sentences needing citations for different reasons, and highlighted words the model considered the most when it classified the sentences. In the case of “opinion” statements, for example, the model assigned the highest weight to the word “claimed”. In the “statistics” citation reason, the most important words to the model are verbs that are often used in reporting numbers. In the case of scientific citation reasons, the model pays more attention to domain-specific words like “quantum”.

    Examples of sentences that need citations according to our model, with key words highlighted.

    Predicting why a sentence needs a citation

    Similar to the “reason” field of the [citation needed] tag, we want our model to also provide full explanations of citation reasons. Therefore we created a model that can classify statements needing citations with a reason. We first designed a crowdsourcing experiment using Amazon Mechanical Turk to collect labels about citation reasons. We randomly sampled 4,000 sentences that contain citations from Featured articles, and asked crowdworkers to label them with one of the eight citation reason categories we identified in our previous study. We found that sentences are more likely to need citations when they are related to scientific or historical facts, or when they reflect direct/indirect quotations.

    We modified the neural network designed in the previous study, so that it can classify an unsourced sentence into one of the 8 citation reason categories. We retrained this network using the crowdsourced labeled data, and found that it provides reasonable accuracy (precision at 0.62) in predicting citation reasons, especially for classes with a substantial amount of training data.

    Next steps: predicting “citation need” across languages and topics

    The next phase of this project will involve modifying our models so that they can be trained for any language available in Wikipedia. We will use these multilingual models to quantify the proportion of unverified content across Wikipedia editions, and map citation coverage across different article topics, in order to help editors identify areas where adding high quality citations is particularly important.

    We plan to make the source code of these new models available soon. In the meantime, you can check out the research paper, recently accepted at The Web Conference 2019, its supplementary material with detailed analysis of the citation policies, and all the data we used to train the models.

    We would love to hear your feedback and comments, so please reach out to us on our project page to help us improve it.

    Miriam Redi, Research Scientist, Wikimedia Foundation
    Jonathan Morgan, Senior Design Researcher, Wikimedia Foundation
    Dario Taraborelli, former Director of Research, Wikimedia Foundation
    Besnik Fetahu, Post-doctoral Scientist, L3S Lab Hannover

    The authors would like to thank the community members of the English, French, and Italian Wikipedias, along with workers from Amazon Mechanical Turk, for helping with data labeling and for their precious suggestions.

    It’s been quite a long time (four and a half years in fact) since I looked at the state of the African language Wiktionaries. For those new to Wiktionary, the idea is that it will describe all words of all languages using definitions and descriptions in the particular language edition. An ambitious task!

    So, how are the projects progressing?

    A portion of the Octateuch in Ethiopian

    African Language Wiktionaries

    Language 30/5/2010 15/5/2011 29/10/2014 22/3/2019 % +
    Malagasy 4,253 3,599,084 5,482,632 52.33%
    Afrikaans 14,669 14,731 15,794 20,831 31.89%
    Swahili 13,000 13,027 13,903 14,029 0.91%
    Wolof 2,689 2,693 2,310 2,312 0.09%
    Somali 1,635
    Sotho 1,389 1,398 1,343 1,343 0.00%
    Lingala 673
    Zulu 131 510 586 599 2.22%
    Igbo (incubator) 375
    Kinyarwanda 306 306 367 366
    Tsonga 359 363 92 359 290.22%
    Oromo 218 264 322 335 4.04%
    Swati 371 377 290 292 0.69%
    Amharic 319 377 206 217 5.34%
    Egyptian Arabic (incubator) 195

    In short, although it’s been so long since the last update, there’s not much to show. The only project to more than double its articles in four and a half years is Tsonga, off a minute base. Malagasy has always had a huge amount of bot activity, and is still growing from a large base, and Afrikaans shows some signs of life. But overall, the state of the African language Wiktionaries can be described as dormant.
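
    For readers who want to check the arithmetic: the "% +" column appears to be the growth between the last two snapshots in each row. A minimal Python sketch, using three rows copied from the table above (the function name is just an illustration):

```python
def growth_percent(previous, current):
    """Percentage growth from the previous snapshot to the current one."""
    return round((current - previous) / previous * 100, 2)


# (29/10/2014 count, 22/3/2019 count), copied from the Wiktionary table.
snapshots = {
    "Afrikaans": (15794, 20831),
    "Zulu": (586, 599),
    "Tsonga": (92, 359),
}

for language, (old, new) in snapshots.items():
    note = " (more than doubled)" if new > 2 * old else ""
    print(f"{language}: {growth_percent(old, new)}%{note}")
```

    Anything above 100% means the project more than doubled, which among these three rows is true only of Tsonga.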

    Perhaps the African language Wikipedias will fare better?

    African Language Wikipedias > 1000 articles

    Language 26/6/2015 5/9/2017 30/6/2018 2/4/2019 % +
    Malagasy 79,329 84,634 84,996 91,528 7.68%
    Afrikaans 35,856 46,824 50,275 76,965 53.11%
    Swahili 29,127 37,443 42,773 49,555 15.86%
    Yoruba 31,068 31,577 31,672 31,867 0.62%
    Egyptian Arabic 14,192 17,138 18,605 20,405 9.67%
    Amharic 12,950 13,789 14,286 14,558 1.90%
    Northern Sotho 1,000 7,823 8,050 8,018 -0.40%
    Somali 3,446 4,727 4,898 5,456 11.39%
    Shona 2,321 2,851 3,630 4,278 17.85%
    Hausa 1,345 1,525 1,856 3,494 88.25%
    Lingala 2,062 2,915 3,023 3,113 2.98%
    Kabyle 2,296 2,887 2,844 2,986 4.99%
    Kinyarwanda 1,780 1,810 1,823 1,821 -0.11%
    Kikuyu 1,349 1,357 1,358 0.07%
    Igbo 1,019 1,384 1,320 1,392 5.45%
    Kongo 1,176 1,179 1,193 1.19%
    Wolof 1,023 1,157 1,166 1,184 1.54%
    Luganda 1,153 1,162 1,169 0.60%
    Zulu 683 942 959 1,067 11.26%

    The Zulu Wikipedia is the latest addition to the 1000 club, having reached this milestone just before Wikimania last year, and progress has been steady since then.

    At first glance, Hausa looks like it’s in great shape, with an 88% increase in the number of articles. But this is misleading, as many of these are one line articles on football players, the entirety of which translates as, for example, “Kenny Allen (footballer) is an English football player.” No disrespect to Kenny Allen, but I’m not sure he and the 100s of other footballers listed there are critical components of Hausa knowledge. There’s a move to delete these articles (you can see the impressive list here while it’s up), but even if they survive, it’s not a sign of a healthy project.

    Leaving aside Hausa, it’s once again Afrikaans, growing at an impressive 53% over the period, that provides an example for the rest. At current rates, it’s on track to pass Malagasy and reclaim its position on top in about a year or so.

    Besides Afrikaans, only Shona, Swahili, Somali and Zulu show a growth rate above 10%, while quite a few sit idle.

    Moving on to the South African language editions specifically:

    South African Language Wikipedias

    Language 26/6/2015 5/9/2017 30/6/2018 2/4/2019 % +
    Afrikaans 35,856 46,824 50,275 76,965 53.11%
    Northern Sotho 1,000 7,823 8,050 8,018 -0.40%
    Zulu 683 942 959 1,067 11.26%
    Xhosa 356 708 738 789 6.91%
    Tswana 503 639 641 641 0.00%
    Tsonga 266 526 562 585 4.09%
    Sotho 223 523 539 546 1.30%
    Swati 410 432 439 467 6.38%
    Venda 151 256 256 265 3.52%
    Ndebele (incubator) 12 12 11 -8.33%

    Afrikaans remains the only project that could be described as a usable Wikipedia – the other languages are still very much in the formative stages. Zulu is also showing signs of life. Besides these two, only Xhosa and Swati see growth rates above 5%. It’s sad to see the stalling of Northern Sotho, while Ndebele shows no signs of getting out of the incubator anytime soon.

    2019 has been proclaimed the Year of Indigenous Languages by the UN, but so far there’s not much sign of a change in the status of the African language projects. Later today sees the South African Centre for Digital Language Resources, in collaboration with the Academy of African Languages and Science from the University of South Africa, present an interactive day workshop on contributing to Wikipedia in South African languages. It’s great to see this initiative, which arose with no help that I’m aware of from Wikimedia South Africa. I’m always hopeful with events like these. Generally very few people stay around to edit Wikipedia, but as projects like Northern Sotho and Swahili show, one person can make a huge difference in the early stages, and it just needs a committed editor to stick around. It’s a lonely job editing in the early stages, wondering if it’s worthwhile, with no community and no idea if your work is being read. Hopefully someone will take on the challenge!

    If you are looking to contribute, but don’t know where to start, please reach out to Wikimedia South Africa and we’d be happy to assist.


    Image from Wikimedia Commons

    How your students can counteract misinformation

    14:30, Tuesday, 02 2019 April UTC

    This April 2, on #AprilFactsDay, we’re reminded of the importance of trustworthy information. How can we equip the next generations of information consumers and producers with the skills they need to participate in our rapidly changing digital landscape?

    Wikipedia is one of the most trusted sites among the cacophony online. That’s because it’s built on the principle of verifiability; its community-made policies take a strict stance against promotion and advertising; and the volunteers that curate its content value neutrally presented and well-referenced facts. Information that doesn’t adhere to these standards is deleted as soon as one of Wikipedia’s thousands of devoted volunteers encounters it.

    But there are still gaps in information on Wikipedia, which can be harder to spot than false information. That’s where instructors and students in our Student Program are making a difference. Higher education instructors use our tools and assignment templates to teach students how to identify gaps on Wikipedia and use what they’re learning in class to correct those gaps.

    That’s what Dr. Ada Palmer did with her 32 students at the University of Chicago last fall term. Students added 90,000 words of well-researched content to Wikipedia about “how new information technologies trigger innovations in censorship and information control.” *

    Did you know, for example, that food producers can more easily sue their critics in certain states in the US because of food libel laws? The laws are often criticized as a restriction of First Amendment rights.

    And newspaper theft, a form of censorship, occurs when an individual, organization, or government removes a large portion of a publication without the consent of the publisher in order to prevent others from reading it. The Wikipedia article now highlights some notable cases, as well as the strategies that various states and cities in the US employ to counteract it.

    Margaret Sullivan of the Washington Post calls April 2 (also known as Fact-Checking Day) “a global counterpunch on behalf of truth.” She also writes that it’s an opportunity to get the public more involved in the processes of informational evaluation that journalists undertake daily.

    In a Wikipedia writing assignment, students participate in fact-checking Wikipedia, looking for informational gaps, and correcting those gaps for the benefit of millions of readers. Wikipedia writing is fact-checking in action, with an added praxis of making the digital informational landscape better.

    In the case of Dr. Palmer’s students, the assignment is also an opportunity to educate others about their rights in the face of false information and censorship.

    In general, a Wikipedia writing assignment provides students with an opportunity to learn to critically evaluate information and participate in modes of knowledge creation that they typically accept passively. When the Stanford Graduate School of Education found in 2016 that most students can’t tell the difference between a credible news website and a fake news site, a lot of instructors sprang into action to understand how they could help reinforce these skills in their students. Critical media literacy is an essential part of education and a skill that every instructor in higher education has the power to teach.

    The ability to access trustworthy and free information equips citizens to know and protect their rights. Access, however, is just the first step. Access plus judgement – the ability to discern reliable from unreliable information – is what truly makes a digital citizen.


    Read more about how our work at Wiki Education combats fake news and how you can help. And for more information about Dr. Palmer’s course, visit the University of Chicago’s course page here or their YouTube channel.


    Interested in adapting a Wikipedia writing assignment to fit your course? Visit teach.wikiedu.org for all you need to know.

    Help my CI job fails with exit status -11

    08:41, Tuesday, 02 2019 April UTC

    For a few weeks, a CI job had PHPUnit tests abruptly ending with:

    returned non-zero exit status -11

    The connoisseur [ 1 ] would have recognized that the negative exit status indicates the process exited due to a signal. On Linux, 11 is the value for the SIGSEGV signal, which is usually sent by the kernel to the process as a result of an improper machine instruction. The default behavior is to terminate the process (man 7 signal) and to generate a core dump file (I will come to that later).
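
    The negative exit status convention is easy to observe from Python, whose subprocess module reports a child killed by signal N as return code -N. The sketch below is only an illustration and has nothing to do with the actual CI setup: the child is a throwaway Python process that sends SIGSEGV to itself, and it disables core dumps first so the demo leaves no files behind.

```python
import signal
import subprocess
import sys

# Throwaway child: turn off core dumps for itself, then die from SIGSEGV.
CHILD_CODE = (
    "import os, resource, signal\n"
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0))\n"
    "os.kill(os.getpid(), signal.SIGSEGV)\n"
)


def run_crashing_child():
    """Run the child and return its exit status as seen by subprocess."""
    return subprocess.run([sys.executable, "-c", CHILD_CODE]).returncode


returncode = run_crashing_child()
print(returncode)  # -11 on Linux, where SIGSEGV is signal 11
if returncode < 0:
    # A negative value is the signal number, negated.
    print(signal.Signals(-returncode).name)
```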

    But why? Some PHP code ended up triggering a code path in HHVM that would eventually try to read outside of its memory range, or some similar low level fault. The kernel knows that the process completely misbehaved and thus, well, terminates it. Problem solved, you never want your program to misbehave when the kernel is in charge.

    The job had recently been switched to a new container in order to benefit from more recent libraries and to match the OS distribution used by the Wikimedia production system. My immediate recommendation was to roll back to the previous known state, but eventually I let the task drag on and got absorbed by other work (such as updating MediaWiki on the infrastructure).

    Last week, the job suddenly began to fail constantly. We prevent code from being merged when a test fails, and thus the code stays in a quarantine zone (Gerrit) and cannot be shipped. A whole team (the Language-Team) could not ship code for one of their flagship projects (ContentTranslation). That in turn prevented end users from benefiting from new features they are eager for. The issue had to be acted on and became an unbreak now! kind of task. And so I set off on my journey.

    returned non-zero exit status -11, that is a good enough error message. A process in a Docker container is really just an isolated process and is still managed by the host kernel. First thing I did was to look at the kernel syslog facility on our instances, which yields:

    kernel: [7943146.540511] php[14610]:
      segfault at 7f1b16ffad13 ip 00007f1b64787c5e sp 00007f1b53d19d30
         error 4 in libpthread-2.24.so[7f1b64780000+18000]

    php there is just HHVM invoked via a php symbolic link. The message hints at libpthread which is where the fault is. But we need a stacktrace to better determine the problem, and ideally a reproduction case.
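
    The kernel line carries more than the library name. A small Python sketch, assuming the usual x86 layout of such messages: the instruction pointer minus the mapping base gives the offset inside the library, and the low bits of the error code say what kind of access faulted. The regular expression and function name are illustrative, not part of any tool used here.

```python
import re

# Matches the x86 "segfault at ... ip ... sp ... error N in lib[base+size]" message.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+)\s+"
    r"error (?P<error>[0-9a-f]+) in (?P<object>[^\[\s]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(log_text):
    """Extract the faulting object, the offset inside it, and the access type."""
    match = SEGFAULT_RE.search(log_text)
    if match is None:
        return None
    error = int(match["error"], 16)
    return {
        "object": match["object"],
        # Offset of the faulting instruction within the mapped library.
        "offset": int(match["ip"], 16) - int(match["base"], 16),
        # x86 page fault error code bits: present, write, user mode.
        "page_present": bool(error & 1),
        "write_access": bool(error & 2),
        "user_mode": bool(error & 4),
    }


LOG = """kernel: [7943146.540511] php[14610]:
  segfault at 7f1b16ffad13 ip 00007f1b64787c5e sp 00007f1b53d19d30
     error 4 in libpthread-2.24.so[7f1b64780000+18000]"""

info = decode_segfault(LOG)
print(info["object"], hex(info["offset"]), info)
```

    Here error 4 decodes to a user-mode read of a page that is not mapped, which fits the "read outside of its memory range" explanation.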

    Thus, what I am really looking for is the core dump file I alluded to earlier. The file is generated by the kernel and contains an image of the process memory at the time of the failure. Given the full copy of the program instructions, the instructions it was running at that time, and all the memory segments, a debugger can reconstruct a human readable state of the failure. That is a backtrace, and is what we rely on to find faulty code and fix bugs.

    No core file was generated; otherwise the error message would have stated that the process had core dumped, i.e. that the kernel generated the core dump file. Our default configuration is to not generate any core file, but usually one can adjust it from the shell with ulimit -c XXX where XXX is the maximum size a core file can occupy (in kilobytes, in order to prevent filling the disk). Docker being just a fancy way to start a process, it has a setting to adjust the limit. The docker run inline help states:

    --ulimit ulimit Ulimit options (default [])

    That is about as far from useful as possible. In the end, the option to set is --ulimit core=2147483648, which allows core files of up to 2 gigabytes. I updated the CI jobs and instructed them to capture a file named core, the default file name. After a few runs, although I could confirm failures, no files got captured. Why not?
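
    The same limit can be inspected from inside a process. A minimal Python sketch using the standard resource module, where RLIMIT_CORE is expressed in bytes; the helper name is made up, and the 2 GiB figure simply mirrors the value passed to Docker:

```python
import resource


def raise_core_limit(wanted_bytes):
    """Raise the soft core-file limit as far as the hard limit allows."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if hard == resource.RLIM_INFINITY:
        new_soft = wanted_bytes
    else:
        # An unprivileged process may not exceed the hard limit.
        new_soft = min(wanted_bytes, hard)
    resource.setrlimit(resource.RLIMIT_CORE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


soft, hard = raise_core_limit(2 * 1024**3)
print("core file limit (soft, hard):", soft, hard)
```

    A soft limit of 0 means the kernel writes no core file at all, which is the default situation described above.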

    Our machines do not use core as the default filename. It can be found in the kernel configuration:

    name=/proc/sys/kernel/core_pattern
    /var/tmp/core/core.%h.%e.%p.%t
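
    To make the pattern concrete, here is a small Python sketch that expands the specifiers used above: %h is the hostname, %e the executable name, %p the PID and %t the time of the dump in seconds since the Epoch. The function is purely illustrative; the sample values are taken from the core file name that appears in the gdb session further down.

```python
import re


def expand_core_pattern(pattern, hostname, executable, pid, timestamp):
    """Expand the %h, %e, %p, %t and %% specifiers of a core_pattern string."""
    values = {
        "h": hostname,
        "e": executable,
        "p": str(pid),
        "t": str(timestamp),
        "%": "%",
    }
    # Specifiers this sketch does not know about are left untouched.
    return re.sub(r"%(.)", lambda m: values.get(m.group(1), m.group(0)), pattern)


print(expand_core_pattern(
    "/var/tmp/core/core.%h.%e.%p.%t",
    hostname="606eb29eab46",
    executable="php",
    pid=2353,
    timestamp=1552570410,
))
```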

    I thus went on the hosts looking for such files. There were none.

    Or maybe I mean None or NaN.

    Nada, rien.

    The void.

    The next step is obvious: try to reproduce it! I ran a Docker container doing a basic while loop and, from the host, sent the SIGSEGV signal to the process. The host still had no core file. But, surprise, it was in the container. Although the kernel handles the crash from the host, it is not namespace-aware when the time comes to resolve the path. My quest was nearly over: I simply mounted a host directory into the containers at the expected place:

    mkdir /tmp/coredumps
    docker run --volume /tmp/coredumps:/var/tmp/core ....

    After a few builds, I had harvested enough core files. The investigation is then very straightforward:

    $ gdb /usr/bin/hhvm /coredump/core.606eb29eab46.php.2353.1552570410
    Core was generated by `php tests/phpunit/phpunit.php --debug-tests --testsuite extensions --exclude-gr'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  0x00007f557214ac5e in __pthread_create_2_1 (newthread=newthread@entry=0x7f55614b9e18, attr=attr@entry=0x7f5552aa62f8, 
        start_routine=start_routine@entry=0x7f556f461c20 <timer_sigev_thread>, arg=<optimized out>) at pthread_create.c:813
    813    pthread_create.c: No such file or directory.
    [Current thread is 1 (Thread 0x7f55614be3c0 (LWP 2354))]
    
    (gdb) bt
    #0  0x00007f557214ac5e in __pthread_create_2_1 (newthread=newthread@entry=0x7f55614b9e18, attr=attr@entry=0x7f5552aa62f8, 
        start_routine=start_routine@entry=0x7f556f461c20 <timer_sigev_thread>, arg=<optimized out>) at pthread_create.c:813
    #1  0x00007f556f461bb2 in timer_helper_thread (arg=<optimized out>) at ../sysdeps/unix/sysv/linux/timer_routines.c:120
    #2  0x00007f557214a494 in start_thread (arg=0x7f55614be3c0) at pthread_create.c:456
    #3  0x00007f556aeebacf in __libc_ifunc_impl_list (name=<optimized out>, array=0x7f55614be3c0, max=<optimized out>)
        at ../sysdeps/x86_64/multiarch/ifunc-impl-list.c:387
    #4  0x0000000000000000 in ?? ()

    @Anomie kindly pointed out that this is an issue already solved in libc6. Once the container was rebuilt to apply the package update, the fault disappeared.

    One can now expect new changes to appear to ContentTranslation.


    [ 1 ] “connoisseur”, from obsolete French, means "to know" https://en.wiktionary.org/wiki/connoisseur . I guess the English language forgot to apply the update in due time and cannot make any such change for fear of breaking backward compatibility or locution habits.

    The task has all the technical details and logs leading to solving the issue: T216689: Merge blocker: quibble-vendor-mysql-hhvm-docker in gate fails for most merges (exit status -11)

    (Some light copyedits to above -- Brennen Bearnes)

    Autonomous Systems performance report

    06:49, Tuesday, 02 2019 April UTC

    Today we're publishing our first report of the performance experienced by visitors of Wikimedia websites, focused on the Autonomous Systems visitors are connecting from.

    This report will be updated monthly, with historical data made available. The goal is to watch the evolution of these metrics over time, allowing us to identify improvements and potential pain points.

    In order to make a fair assessment of the autonomous systems' performance, real user metrics collected from web browsers are normalised, in order to avoid differences such as average device power for a given network's users potentially skewing the results. For example, an ISP with more expensive data plans might have users with more expensive, better performing devices on average. This is why we compare data points only for similar effective device CPU power between providers. We also separate the mobile and desktop experiences, because they serve different content, with a notable difference in the median page weight, which directly impacts performance metrics. We wouldn't want the mobile/desktop mix of a given provider to influence the results.
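
    The idea of comparing like with like can be sketched as a simple stratification. This is not the code behind the report: the field names (platform, cpu_bucket, asn, load_ms), the AS numbers and every value below are made up for illustration. Samples are grouped by platform and CPU bucket first, so that two networks are only ever compared within the same stratum.

```python
from collections import defaultdict
from statistics import median


def median_by_stratum(samples):
    """Median page load time per (platform, cpu_bucket, asn) group."""
    groups = defaultdict(list)
    for sample in samples:
        key = (sample["platform"], sample["cpu_bucket"], sample["asn"])
        groups[key].append(sample["load_ms"])
    return {key: median(values) for key, values in groups.items()}


# Hypothetical samples: two networks, same platform, same CPU bucket.
samples = [
    {"platform": "mobile", "cpu_bucket": "slow", "asn": "AS64500", "load_ms": 2100},
    {"platform": "mobile", "cpu_bucket": "slow", "asn": "AS64500", "load_ms": 1900},
    {"platform": "mobile", "cpu_bucket": "slow", "asn": "AS64501", "load_ms": 2600},
    {"platform": "mobile", "cpu_bucket": "slow", "asn": "AS64501", "load_ms": 2400},
    {"platform": "desktop", "cpu_bucket": "fast", "asn": "AS64500", "load_ms": 800},
]

for key, value in sorted(median_by_stratum(samples).items()):
    print(key, value)
```

    With this grouping, a network whose users happen to own faster devices cannot look better merely because of its device mix.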

    If you look at the report, you might wonder why some autonomous systems' underlying mobile networks show up under "desktop" and some wired internet providers appear under "mobile". There are two explanations. Some internet providers sell home internet devices that are effectively mobile network modems, so people use their desktop computers (and, as a result, the desktop websites) over a mobile network. Other providers have mobile device users automatically connect to the provider's WiFi routers whenever one is in reach.

    One caveat about this report is that in countries that are physically large, like the United States, the country-wide aggregation in no way reflects important regional differences there might be for a given network. The main reason why we can't look at smaller regions is that we simply have no way of knowing where mobile users are connecting from, short of collecting geolocation data. Since we care deeply about our users' privacy and their experience, it doesn't feel appropriate at this time to ask users for their precise location in order to generate this type of finer-grained data. Such a scheme would also suffer from self-selection bias. There's already a lot of work to be done with the data aggregated at the national level!

    We hope that this public report will help network operators understand their customers' real performance characteristics when it comes to browsing one of the web's largest websites. We welcome peering requests from networks that seek to improve their connectivity to our datacenters.

    Dr. Irene Chen gave her chemistry students a unique opportunity to practice science communication. She incorporated a Wikipedia writing assignment into her course at UC Santa Barbara this last fall. The course discussed major breakthroughs in nucleic acids research – information that students then channeled into relevant Wikipedia articles where details were missing. Eight students added a total of 13,600 words to Wikipedia this way, a process requiring that they synthesize research in the most concise, essentialized way possible.

    3,600 of those words were channeled into Wikipedia’s article about systematic evolution of ligands by exponential enrichment, a chemical process also known as in vitro selection. Before the student began improving the article, it was almost entirely made up of an introduction and no other informational sections. That was even noted on the article’s Talk page, where volunteers discuss their desired changes. Dr. Chen’s student responded directly to the issue by adding sections about the details of the procedure. Those new sections include information about how chemists generate a single stranded oligonucleotide library and then how that library is incubated to allow binding with the oligonucleotide-target. The student also added information about tracking the progress of the resulting reaction.

    Another student expanded Wikipedia’s article about minimum inhibitory concentration (MIC), which before was also just an introduction and a few references. 1,200 words later, the article boasts an additional 10 sources, as well as background information about the history and clinical usage of the MIC concept. As the article states, MIC “is the lowest concentration of a chemical, usually a drug, which prevents visible growth of bacterium,” a definition supported by a review paper published in 2005 about the treatment of bacterial infectious diseases. The article has been viewed almost 13,000 times since the student made changes, far more views than a typical term paper would get!

    Students tend to invest more in their work when they realize it can be accessed by millions. They understand the responsibility to represent information accurately and take their new-found role of ‘knowledge creator’ seriously. They’re a great group to do this work well, especially in the sciences. Students can translate complex course topics for a general audience who might be learning about the topic for the first time because they remember what that was like. They do a great service to the world by sharing expertise they have access to (both through their professor and their library resources) with Wikipedia’s worldwide readership. That matters, and they understand that.


    Interested in incorporating a Wikipedia writing assignment into your course? Visit teach.wikiedu.org for all you need to know to get started. Or hear from instructors who have done this here.
