Improving open education information on Wikipedia

16:33, Friday, 24 2021 September UTC
Helen DeWaard

Helen DeWaard is a teacher educator with Lakehead University and a learning designer with the faculty of education at the University of British Columbia. As a teacher educator, her courses focus on critical media and digital literacy in teaching and curriculum design. As a learning designer, DeWaard seeks out ways to support teacher educators with their efforts to integrate meaningful learning tasks into course designs. DeWaard recently participated in the Open Educational Resources (OER) Wiki Scholars course, in which Wiki Education partnered with the Global OER Graduate Network to train OER researchers and scholars how to edit Wikipedia. During the 6-week course, Wiki Education facilitated as participants worked to improve Wikipedia’s articles about OER-related topics.

“The enticement was the chance to work more closely with other graduate level scholars in my network, while also learning how to edit and create Wikipedia articles relating to the topics around open education,” DeWaard says. “My underlying goal was to become more comfortable with the process of working within wiki-spaces in order to possibly add this into my teaching practice.”

DeWaard focused on the articles Open Educational Practices, Open Educational Resources in Canada, and Open Thesis because they were either missing chunks of information or had not yet been published. Soon after her cohort published their work on Wikipedia, they could already see its impact, as it had reached 400,000 readers. By contributing to Wikipedia, DeWaard was able to see how much dedicated attention it takes to ensure Wikipedia articles are accurate and representative.

“What has changed as a result of this Wikipedia work is my understanding of how hard it is to ensure that the information posted in an article is valid, credible, containing citations to source information, and written without bias,” DeWaard says. “While my author’s voice is evident in my scholarly writing, when writing for a Wikipedia article, I needed to take a critical stance and apply a critical lens to focus on ‘just the facts’ from a balanced perspective. I had to keep asking myself, ‘what does someone who is reading this article really need to know and learn about this topic, and how should this topic be structured.’”

Through the Wiki Scholars course, DeWaard was able to strengthen her connections with other GO-GN members who participated in the course. In addition, she was able to grow her research and writing skills in a Wikipedia setting, thanks to instructor Will Kent’s guidance.

“This course was a way to become informed, gain some experience, build new connections in my personal learning network, and develop confidence to contribute to global public knowledge. I hope that my contributions to Wikipedia will help inform or clarify and provide current knowledge in key areas of information,” DeWaard says.

DeWaard notices that in teacher education, although educators discourage students from using Wikipedia, it is often the first stop students make when researching a specific topic. By implementing Wikipedia-editing assignments in teacher education programs, teacher educators can gain a clear understanding of how information on Wikipedia is published and moderated.

“Shifting mindsets about the usability and credibility of Wikipedia as a starting place for research can occur when teacher education programs engage candidates and teacher educators in a critical examination of key topics relevant to education. By incorporating a Wiki-editing task within a course of study in teacher education, the impact and potential shift in understanding can extend beyond the faculty and into K-12 classrooms,” DeWaard says.

DeWaard is currently designing a Wiki Education workshop for her faculty so they can get an overview of how Wikipedia editing can be implemented in classrooms—an assignment they can do with Wiki Education’s support through the Wikipedia Student Program.

“I think that many would benefit from learning how to contribute to Wikipedia so this can become a valuable means of engaging higher education instructors into building wiki-work within course designs,” DeWaard says. “In this way students and instructors will come to understand the value of current, credible, and searchable information for many fields of study. In addition, students and instructors can become co-learners and collaborators on creating Wikipedia articles for topics of study within their fields of study, thus contributing to the global knowledge fund.”

In relation to the open educational resources movement, DeWaard believes that there will be a heightened awareness of the movement because of the outstanding coverage on Wikipedia thanks to participants in the Wiki Scholars course. To her, a movement does not gain momentum without people, so it is also important to give recognition and credit where it is due.

“With so many advancements and changes in the field, it will be necessary and important to not only contribute to article production, but to also encourage and celebrate the efforts of those in open education who are doing this work,” DeWaard says. “I’m hoping that one day, members of the GO-GN network can be recognized for contributions with their unique penguin logo on a barnstar and a penguin added to the wiki-love collection.”

To take a course similar to Helen’s, please visit

Image credits: GoToVan from Vancouver, Canada, CC BY 2.0, via Wikimedia Commons; Richelieu5851, CC BY-SA 4.0, via Wikimedia Commons.

By Volker Eckl, Lead UX Engineer, Product Design

The Wikimedia Foundation has decided to adopt Vue.js as its future JavaScript framework for use in MediaWiki and across the Wikimedia movement. The Design Systems Team provides an overview of recent decisions and recommendations and an outlook of the next steps towards a shared user-interface component library and the larger design system.


In 2019, the Front-end Architecture Working Group (FAWG) was formed to explore how to address challenges in front-end development at Wikimedia. The FAWG recommended that the Wikimedia Foundation evaluate Vue.js as the potential JavaScript framework for use in MediaWiki and across the Wikimedia Movement. This has been part of the ongoing Platform Evolution program to evolve our platform and empower developers. After a pilot project, the Wikimedia Foundation decided to officially adopt Vue.js and to invest in migrating its systems and code to Vue. 

In response to this, the Design Systems Team was created and is charged with building out infrastructure and tools for using Vue, creating and maintaining a Vue-based unified design system and component library, building processes for working with this library across Wikimedia, and migrating front-end products to Vue. By the time the team was created in January 2021, multiple teams at the Wikimedia Foundation (WMF) were developing code in Vue, using three different sets of reusable components. In addition, Wikimedia Deutschland (WMDE) had been developing in Vue for years and had built its own design system called WiKit. Initially, the Design Systems Team focused on unifying the shared components on the WMF side into one library, but we knew we wanted to find a way to build a consolidated library for WMF, WMDE, volunteer developers, and anyone else who might want to use it.

At the same time, the Design Systems Team ran into several other issues that required making decisions across teams and across the organizational boundary between WMF and WMDE. For example, we needed to decide whether, when, and how to migrate from Vue 2 to Vue 3, which is a decision that affects everyone using Vue across the Wikimedia movement. We identified several other major decisions to be made and decided to convene a summit with everyone working on Vue at WMF and WMDE so we could resolve these questions together and learn from each other. The teams that had experimented with Vue had all produced great work in an environment with a lot of uncertainty and had given the Design Systems Team excellent feedback and suggestions. We wanted to work collaboratively with them to build and maintain this new design system, starting with these important decisions.

Converge and unify

The future design system user-interface library—providing the building blocks for interfaces of highly complex projects like Abstract Wikipedia or with exposure to hundreds of millions of users like the Desktop Improvements project—will follow a design-first approach. Therefore, we started with a designer workshop preceding the engineering-centered summit to identify design blockers and hurdles.

The Designer Workshop

Over 20 designers and user-experience-centered engineers from different product and platform teams at the Wikimedia Foundation collaborated in a two-day workshop. The goal of the workshop was to explore how we might create a streamlined design process for common user-interface components – from ideation to quality assurance. Additionally, the workshop was a chance to deepen the collaboration between various stakeholders in this wide-reaching project.

Topics covered included:

  • Governance Model: Workshop participants reviewed a decision workflow, known as the Governance Model. The workflow outlines the decisions, steps, and processes involved in the creation, re-usage, modification, and deprecation of system components. It also sets clear collaboration guidelines that will enable teams to actively contribute to the system.
  • Component Design Process: Leveraging the design system principles and quality assurance checklists already in use, participants discussed opportunities for improving the design process for component additions. A key element was clarifying the role of design tokens as systematic design decisions in predefined, centralized, limited, and traceable values in this process.
  • Resource Identification & Ecosystem: Participants shared and specified resources to support a production design workflow, including the Design Style Guide.
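
The idea of design tokens as predefined, centralized, limited, and traceable values can be illustrated with a minimal TypeScript sketch. The token names and values below are hypothetical and chosen only for illustration; they are not taken from the Wikimedia design system, and the helper functions are assumptions rather than part of any existing library.

```typescript
// Hypothetical token names and values, chosen only for illustration.
// Keeping them in one object makes the set centralized and limited.
const tokens = {
  "color-primary": "#3366cc",
  "spacing-base": "8px",
  "border-radius-base": "2px",
} as const;

// Only names defined in the central token set are valid token names.
type TokenName = keyof typeof tokens;

// Look up a token by name; the TokenName type rejects undefined names at
// compile time, which keeps every use traceable to the central set.
function token(name: TokenName): string {
  return tokens[name];
}

// Render a token set as CSS custom properties on :root, so stylesheets can
// reference var(--color-primary) instead of repeating a raw value.
function toCssCustomProperties(set: Record<string, string>): string {
  const lines = Object.keys(set).map(function (name) {
    return "  --" + name + ": " + set[name] + ";";
  });
  return [":root {"].concat(lines, ["}"]).join("\n");
}

console.log(toCssCustomProperties(tokens));
```

A component would then call token("spacing-base") rather than hard-coding 8px, so changing the value in one place updates every component that uses it.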

While the Governance Model and recommendations for CSS architecture and icon handling were carried over to the developer summit, other topics have been settled on or placed in Phabricator tasks for interested parties and volunteers to provide additional feedback.

The Developer Summit

The developer summit took place over three days with more than 60 people attending. The first day was about gathering feedback and experiences. The Design Systems Team introduced itself and its plans, developers from several teams presented short demos of projects they had built in Vue, and there was a retrospective where developers discussed their experiences and concerns with Vue development in the context of the MediaWiki ecosystem. The second and third days were focused on making decisions. In each session, the Design Systems Team presented a problem and possible solutions, which the group then discussed until consensus was reached on how to move forward.

The first decision we had to make was when to upgrade to Vue 3. All of the existing Vue code at Wikimedia uses Vue 2, but we would like to take advantage of the added features and better performance of the new major version. The longer we wait to upgrade, the more code will be written for Vue 2 in the meantime that will have to be migrated later. However, Vue 3 does not support Internet Explorer 11 and certain other older browsers, so once we upgrade we will no longer be able to support these browsers in our Vue-based projects. Users of those browsers will get a more limited experience without JavaScript. 

The group decided that this trade-off was worth it: we’re going to upgrade to Vue 3 as soon as we can, even though that means dropping IE11 support for Vue-based projects. We also decided to review our policies for what no-JavaScript experiences should look like in our products, to ensure that we provide an adequate experience for users of IE11 and other older browsers.
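
The trade-off can be made concrete with a small sketch. Vue 3's reactivity system relies on the native JavaScript Proxy object, which IE11 does not provide, so a page could check for Proxy before deciding whether to load a Vue-based interface. The function names and the "vue"/"baseline" labels below are hypothetical; this is an illustration under those assumptions, not how MediaWiki actually performs browser detection.

```typescript
// Minimal shape of the global object that the check needs. Passing it in as
// a parameter keeps the function testable outside a browser.
interface GlobalLike {
  Proxy?: unknown;
}

// Returns true when the environment provides a native Proxy constructor,
// which Vue 3's reactivity system depends on and IE11 lacks.
function supportsVue3(env: GlobalLike): boolean {
  return typeof env.Proxy === "function";
}

// Hypothetical decision helper: choose between the enhanced Vue-based
// interface and a baseline experience that works without JavaScript.
function chooseExperience(env: GlobalLike): "vue" | "baseline" {
  return supportsVue3(env) ? "vue" : "baseline";
}

// An IE11-like environment has no Proxy; a modern one does.
const legacyChoice = chooseExperience({});
const modernChoice = chooseExperience({ Proxy: function () {} });
```

In a browser, the same helper could be called with the window object. The important design point is that the baseline experience must already be usable before any script runs, since older browsers never reach the Vue code path.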

The other major decisions centered around the shared component library. The group had various opinions on which existing library should become the unified one and on whether we should even choose an existing library or start a new one. We ended up deciding to start a new library, to build it up quickly by transferring the good parts of the existing libraries, and to build it out collaboratively from the start so that it belongs to all of us.

In line with our decision to upgrade to Vue 3 quickly, we will build this library in Vue 3 from the start. A task force with representatives from the Design Systems Team, other WMF teams, and WMDE will be formed to spearhead the creation of this new library and to handle the details of configuring tools, settling on library guiding principles, and designing and documenting a contribution process. Although these groups will be doing the work to initialize the library, we will do so while remaining open to feedback from anyone in Phabricator and Gerrit, and will welcome contributions to the library from anyone who has an interest.

We also decided that the unified library will use Vite as its build system (as opposed to Webpack, which most of the existing libraries use), and will be written in TypeScript. We weren’t able to come to a decision on whether the new library should use LESS, Sass, or something else as its CSS processor; that decision will be left to the task force. 

For a detailed overview of all the decisions we made at the summit, see this summary.


The post-summit task force will soon meet to address the remaining decisions, to finalize our processes around contribution, and to start up the new component library. Keep an eye on the wikitech-l and design mailing lists and the Design Systems Team’s workboard for updates about the new library.

The Design Systems Team is grateful to everyone who participated in the Designer Workshop and Developer Summit for sharing their experiences, suggestions, opinions, and offers to help with these efforts. We are hopeful that kicking off this work with dedicated time for collaborative brainstorming and problem-solving will lead to a powerful and excellent design system in the end.

About this post

Featured image credit: File:Lissajous-Figur — 2020 — 7768.jpg, Dietmar Rabich, CC BY-SA 4.0

Jessica E. Brodsky is a doctoral student in Educational Psychology at The Graduate Center, CUNY. Her dissertation research evaluates the impact of lateral reading instruction on undergraduate fact-checking skills. Patricia J. Brooks is Professor of Psychology at the College of Staten Island and The Graduate Center, CUNY.  She is involved in WikiProject Women in Psychology and has engaged her students in editing Wikipedia since 2012.

Jessica Brodsky (left) and Patricia Brooks.
Image courtesy Jessica Brodsky, all rights reserved.

The past two years have been difficult for students and faculty alike, as we have had to grapple simultaneously with the consequences of the COVID-19 pandemic and an infodemic, defined by the World Health Organization as “an overabundance of information—some accurate and some not—that makes it hard for people to find trustworthy sources and reliable guidance when they need it.” As social media platforms and fact-checking organizations struggle to keep up with vetting COVID-19 related information, it has become urgent for all of us to learn effective strategies for deciding whether online information can be trusted.

Wikipedia can play an important role in helping students quickly determine if sources of information are trustworthy. For example, features of the National Vaccine Information Center (NVIC) website suggest that it may be a trustworthy source of information about vaccines: the website has a professional appearance, a “.org” domain name, and uses references in its articles. However, a quick search in Wikipedia reveals that the NVIC is an anti-vaccination advocacy group. In fact, professional fact-checkers often turn to Wikipedia as a starting point to investigate the potential biases or agenda of a source; see here for more information.

Over the past several years, we have partnered with colleagues teaching civics to first-year college students at the College of Staten Island, CUNY. As one of 11 institutions participating in the Digital Polarization Initiative (DPI) sponsored by the American Association of State Colleges and Universities, we have aimed to teach college students how to use lateral reading strategies to vet online information. Lateral reading involves leaving a website to investigate the people and organizations promoting the online content, finding out what other sources have to say about it, and tracing the content back to its original source. Lateral reading contrasts with vertical reading strategies often associated with checklists like the CRAAP test, where students are taught to scrutinize online content for specific features. Given the complexity of the internet today, vertical reading strategies have proven insufficient for accurate fact-checking. Initial findings from the DPI suggest that deliberate instruction in lateral reading improves college students’ fact-checking skills; for more details see here.

In Fall 2020, we shifted instruction to focus on information related to the COVID-19 pandemic and economic recovery efforts. As instruction was fully online, we built a series of asynchronous assignments to teach lateral reading skills. In our pretest assessments, we learned that using Wikipedia to investigate information sources went against what students had been taught before college or in other college courses. That is, 77.8% of students (N = 221) indicated that their teachers had discouraged them from using Wikipedia as an information source and 67.9% said they were unlikely or very unlikely to recommend Wikipedia as a source of information to one of their classmates.

Over the course of the semester, students completed three assignments where they practiced using Wikipedia to investigate the agenda of various information sources (people and/or organizations). All the assignments were tied to course themes and current events. For example, students used Wikipedia to learn more about the authors of tweets opposing mask mandates and to research an organization whose tweets expressed concerns about the United States’ economic recovery under President Biden.

At the end of the semester, not only were students more likely to read laterally when determining how much they trusted online information, they were also more likely to report using Wikipedia for academic and non-academic research and to recommend Wikipedia to others. Students were also more accurate in their knowledge about Wikipedia. When asked to select their first-choice strategy for evaluating the trustworthiness of a website, 42.9% indicated that they would use Wikipedia to learn more about it, a marked increase from 5.9% at pretest!

Adults today are clearly aware of the challenges of figuring out if they can trust COVID-19 information they see online. According to an April 2020 Pew Research Center survey, only 28% of adults felt very confident in their ability to determine the accuracy of COVID-19-related news. Our students felt similarly unsure at the start of the semester, but made gains in their confidence in vetting COVID-19 related information. On the pretest, only 27.6% of students indicated confidence in their ability to fact-check online news about the COVID-19 pandemic; this proportion doubled (55.2%) at posttest. For more details about this project, and examples of assessments that you can tailor for your own courses, see here.

Image credit: Alex Irklievski, CC BY-SA 4.0, via Wikimedia Commons

22 September 2021, San Francisco, CA, USA — The Wikimedia Foundation, the nonprofit that operates Wikipedia and other Wikimedia projects, today announced that the Wikimedia Endowment has reached its initial $100 million fundraising goal ahead of schedule. This early achievement is a testament to the generosity of Endowment donors around the world and provides enduring support for free knowledge. Launched in 2016 to support the future of Wikimedia projects, the Endowment is a permanent fund that helps protect Wikipedia and Wikimedia projects in times of uncertainty and enables long-term investments to support their growth. 

“When Wikipedia first started 20 years ago, no one could predict the vital role it would play in our world,” said Jimmy Wales, founder of Wikipedia and Wikimedia Endowment Board member. “Wikipedia is our collective legacy, the sum of all human knowledge available to everyone. As we reach this historic milestone, the success of the Endowment ensures that Wikipedia will continue to be a gift from all of us to future generations of knowledge seekers.”

As Wikipedia celebrates its 20th birthday this year, it has cemented its role as one of the most reliable and trustworthy sources of information on the internet. However, the information on Wikipedia and the Wikimedia projects still has significant gaps. Women, communities of color, and non-Western languages, among other critical topics, continue to be underrepresented on the site. With this significant achievement, the Wikimedia movement will be better positioned to take on these and other challenges over the long term.

Investment income from the Endowment can now be used to support Wikipedia and other Wikimedia projects. The Endowment Board will be sharing more about its strategy to support the projects in the coming year.  

“Wikipedia is founded on the belief that knowledge is a public good that belongs to all people,” said Lisa Gruwell, Chief Advancement Officer of the Wikimedia Foundation. “Through the Endowment, and the generosity of our donors, we have been able not just to protect this public good for years to come, but also to focus on growth, opening doors to fuel innovation and equitable representation so that Wikipedia and other free knowledge projects reflect the world.”  

The Wikimedia Endowment is governed independently from the Wikimedia Foundation as a long-term, permanent fund to support the future of the Wikimedia projects in perpetuity.  Reader donations and institutional grants are used to support the Wikimedia Foundation’s  annual operations, separate from the Endowment. 

New members of the Wikimedia Endowment Board 

As part of this milestone, three new members of the Wikimedia Endowment Board were also announced; they add critical expertise in Wikimedia’s volunteer communities, the Wikimedia mission and values, and nonprofit management. New members include Phoebe Ayers, who currently serves as Librarian for Electrical Engineering & Computer Science, IDSS, and Mathematics at the Massachusetts Institute of Technology and is a former member of the Wikimedia Foundation Board of Trustees, and Patricio Lorente, General Secretary of Administration of the National University of La Plata and former Chair of the Wikimedia Foundation Board of Trustees. The Board also welcomed Doron Weber, an executive with more than two decades of experience in nonprofit management, currently serving as Vice President of Programs at the Alfred P. Sloan Foundation.

Together, the eight members of the Endowment Board bring a diverse range of skills and experiences to steward the future of the Wikimedia Endowment.  

Creating an independent nonprofit 

The Wikimedia Foundation and Wikimedia Endowment Board have also announced plans to move the Wikimedia Endowment, currently managed by the Tides Foundation as a Collective Action Fund, into its own separate 501(c)(3) organization governed by the Wikimedia Endowment Board. The move will help solidify the independence of the Wikimedia Endowment, allowing its management and investments to be aligned directly to the needs of the Wikimedia projects. The application to move the Endowment to a separate 501(c)(3) is currently before the Internal Revenue Service (IRS).

The Wikimedia Endowment’s $100 million milestone is a direct result of the generosity of its donors. The Endowment was first established thanks to a legacy gift by Jim Pacha, who generously donated much of his estate to seed its initial funding. Today, more than 1,400 members of the Wikipedia Legacy Society have committed to leave a portion of their estate to the Wikimedia Endowment. Other Endowment donors who choose to be publicly recognized are listed on our Benefactors page.

About the Wikimedia Foundation 

The Wikimedia Foundation is the nonprofit organization that operates Wikipedia and the other Wikimedia free knowledge projects. Our vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge, and that everyone should be able to access that knowledge freely. We host Wikipedia and the Wikimedia projects, build software experiences for reading, contributing, and sharing Wikimedia content, support the volunteer communities and partners who make Wikimedia possible, and advocate for policies that enable Wikimedia and free knowledge to thrive.

The Wikimedia Foundation is a charitable, not-for-profit organization that relies on donations. We receive donations from millions of individuals around the world, with an average donation of about $15. We also receive donations through institutional grants and gifts. The Wikimedia Foundation is a United States 501(c)(3) tax-exempt organization with offices in San Francisco, California, USA.

By Srishti Sethi, Senior Developer Advocate

Small wiki toolkit participants, by Srishti Sethi, CC BY-SA 4.0

Small wiki toolkits is an initiative to build technical capacity in smaller language wikis by developing toolkits, conducting workshops, and providing technical support. This initiative was kicked off in 2019 at Wikimania Stockholm to address the need for smaller wikis to draw technical contributors and equip them with the necessary skills to do important technical work to help support, grow and maintain their wikis. It aims to bridge the skill gap in smaller communities and to build a global network of individuals interested in training others and in discussing and resolving the challenges of these wikis.

In the initiative’s first year, over 200 people participated in 16 informational sessions, technical workshops, and group discussions through 6 Wikimedia events (local, regional, and international) online and offline. Through these events, the initiative engaged people from many smaller wikis from all over the world.

It took some time to adapt to virtual events and the new normal brought on by the COVID-19 pandemic, but the initiative continued to advance during its second year.

At WMF, there was ongoing research to identify emerging technical communities by looking at their content growth, number of editors, and number of bot edits over time, and by finding correlations between them. For example, if a community has reached a tipping point in its content and isn’t yet utilizing bots and tools, then the community likely falls in the category of emerging technical communities.
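
The example rule in the previous paragraph can be sketched in TypeScript as follows. The field names, thresholds, and sample data are all hypothetical, since the post does not give the actual metrics or cut-off values used in the research. For brevity, the sketch uses only content size and bot activity and leaves out the number of editors.

```typescript
// Hypothetical per-wiki statistics; the field names are illustrative only.
interface WikiStats {
  name: string;
  articleCount: number;
  botEditShare: number; // fraction of recent edits made by bots, 0 to 1
}

// Hypothetical thresholds; the post does not state the values used.
const CONTENT_TIPPING_POINT = 10000;
const LOW_BOT_SHARE = 0.05;

// In this sketch, a wiki counts as an emerging technical community when its
// content has passed the tipping point but bots make few of its edits.
function isEmergingTechnicalCommunity(wiki: WikiStats): boolean {
  return (
    wiki.articleCount >= CONTENT_TIPPING_POINT &&
    wiki.botEditShare < LOW_BOT_SHARE
  );
}

// Made-up sample data, not real statistics for any wiki.
const sampleWikis: WikiStats[] = [
  { name: "wiki-a", articleCount: 25000, botEditShare: 0.01 },
  { name: "wiki-b", articleCount: 3000, botEditShare: 0 },
  { name: "wiki-c", articleCount: 90000, botEditShare: 0.4 },
];

// Keep only the names of wikis that match the rule.
const emergingWikis = sampleWikis
  .filter(isEmergingTechnicalCommunity)
  .map(function (wiki) {
    return wiki.name;
  });
```

With this sample data, only wiki-a matches: wiki-b has not reached the content threshold, and wiki-c already relies heavily on bots.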

A few wikis like Minangkabau in South East Asia, Quechua in South America, and Kurdish in the Middle East were highlighted as they had a significantly high number of native speakers. Then, user groups, chapters, and initiatives that focus on these language wikis were approached to investigate if they could benefit from the technical capacity-building efforts. Based on the needs and priorities of the wiki communities, the following two initiatives were kicked off:

  • Small wiki toolkits <> South Asia – This initiative supports Indic language wikis in the South Asia region. In the first year, the workshops were run for the Indic language wikis only in India; after receiving an overwhelmingly positive response they were opened up to the broader community. This initiative is run in collaboration with Indic-TechCom.

As part of these two initiatives, 8 technical workshops were conducted over the past year, one per month for each community, with around 50 participants from 12 countries and language wikis.

Map of small wiki toolkit countries, created with by Srishti Sethi, CC BY-SA 4.0

Some of the sessions conducted as part of these efforts were around integrating growth features on wikis, writing Wikidata queries, running bots, debugging template errors, populating infoboxes using Wikidata, etc. 

Through a follow-up survey and virtual meeting, participants shared that they find these workshops useful and feel a strong need to continue to learn and grow their technical skills. They also reported increased familiarity with technical topics and said they had used the skills gained to solve various problems on their wikis.

Here is what one of our attendees had to say about the workshops:

I believe Small wiki toolkits project has a good impact on the Kurdish community in Wikimedia. First, it made it better and easier to understand some technical topics and introduced new tools which will help us to manage Kurdish Wikipedia. Second, the technical questions we had were able to get answers through the workshops as most of the workshops depend on our needs, including more understanding of Wikidata and linking Wikidata to Wikipedia articles. Meanwhile, the Small wiki toolkits project introduced us to a group of technicians who are able to solve our issues. So in the future, if we face obstacles and technical difficulties we can approach them or the project.

Muhammed Serdar, Kurdish Wikimedians User Group

Small wiki toolkit participants, by Srishti Sethi, CC BY-SA 4.0

Based on the learnings from the last year, some roles and responsibilities for organizers and mentors have been documented here. As a few next steps, the initiative plans to grow its pool of mentors, to provide trainings for topics in module format, and to offer watch-party style trainings for previously recorded sessions to save mentors some time. And, finally, Small Wiki Toolkits wants to work with many more communities in the next year! 

If you are a mentor interested in helping run a technical workshop, add yourself to this wiki page here.

If you are working with a small wiki and you are interested in kicking off a technical capacity building initiative for your community, contact us on the talk page.

Lastly, a huge thank you to our past mentors and organizers who helped us run the trainings last year – Andre Klapper, Benoît Evellin, Birgit Mueller, Doug Taylor, Jay Prakash, Joaquin Oltra Hernandez, Krishna Chaitanya Velaga, Mahir Morshed, Mike Peel, Mohammed Sardar Noori, Satdeep Gill!

About this post

Featured image credit: File:Hand tools.svg, SAİT71, CC BY-SA 4.0

Towards a National Collection

09:13, Tuesday, 21 2021 September UTC



£14.5m awarded to transform online exploration of UK’s culture and heritage collections through harnessing innovative AI.

To connect the UK’s cultural artefacts and historical archives in new and transformative ways, the Arts and Humanities Research Council (AHRC) has awarded £14.5m to the research and development of emerging technologies, including machine learning and citizen-led archiving.

Five major projects form the largest investment of Towards a National Collection, a five-year research programme. The announcement reveals the first insights into how thousands of disparate collections could be explored by public audiences and academic researchers in the future.

The five ‘Discovery Projects’ will harness the potential of new technology to dissolve barriers between collections. They will open up public access and facilitate research across a range of sources and stories held in different physical locations. One of the central aims is to empower and diversify audiences by involving them in the research and creating new ways for them to access and interact with collections. In addition to innovative online access, the projects will generate artist commissions, community fellowships, computer simulations, and travelling exhibitions.

Daria Cybulska, Director of Programmes and Evaluation at Wikimedia UK said “We were delighted that two Discovery Projects we are involved in were selected for this round of funding by the AHRC. Increasing and diversifying access to UK’s collections and supporting open research is of strategic importance for Wikimedia UK. Through our involvement we will aim to demonstrate the benefits of open knowledge, especially for public engagement, and its abilities to help dissolve barriers between separate collections. We see an exciting amount of overlap between the projects we are directly supporting and our priorities, for example amplifying underrepresented community heritage in the Our Heritage, Our Stories project, and linked open data, including Wikidata in the Congruence Engine project.”

The investigation is the largest of its kind to date. It extends across the UK, involving 15 universities and 63 heritage collections and institutions of different scales, with over 120 individual researchers and collaborators.

Together, the Discovery Projects represent a vital step in the UK’s ambition to maintain leadership in cross-disciplinary research, both between different humanities disciplines and in their work with other sectors. Towards a National Collection will set a global standard for other countries building their own collections, enhancing collaboration between the UK’s renowned heritage and national collections worldwide.


Read more about the two Discovery Projects Wikimedia UK are engaged in:

The Congruence Engine: Digital Tools for New Collections-Based Industrial Histories

Principal Investigator: Dr Timothy Boon, Science Museum Group

Project partners: British Film Institute, National Museums Scotland, Historic Buildings & Monuments Commission for England (Historic England/English Heritage), National Museum Wales, National Museums Northern Ireland, The National Archives, National Trust, The V&A, universities of Leeds, London, and Liverpool, BBC History, Birmingham Museums Trust, BT Heritage & Archives, Grace’s Guide to Industrial History, Isis Bibliography of the History of Science, Saltaire World Heritage Education Association, Society for the History of Technology, Whipple Museum of the History of Science (Tools of Knowledge Project), Tyne & Wear Archives & Museums (Discovery Museum), Bradford Museums and Galleries, Wikimedia UK and Manchester Digital Laboratory (MadLab).

The Congruence Engine will create the prototype of a digital toolbox for everyone fascinated by our industrial past to connect an unprecedented range of items from the nation’s collection to tell the stories they want to tell. What was it like then? How does our past bear on our present and future? Until now, historians and curators have become acclimated to a world where it has only been possible to work with a small selection of the sources – museum objects, archive documents, pictures, films, maps, publications etc – potentially relevant to the history they want to explore.

The Congruence Engine will use the latest digital techniques to connect collections held in different locations to overcome this major constraint on the histories that can be created and shared with the wider public in museums, publications and online. Digital researchers will work alongside professional and community historians and curators.

Through iterative exploration of the textiles, energy and communications sectors, the project will tune collections-linking software to make it responsive to user needs. It will use computational and AI techniques – including machine learning and natural language processing – to create and refine datasets, provide routes between records and digital objects such as scans and photographs, and create the tools by which participants will be able to enjoy and use the sources that are opened to them.


Our Heritage, Our Stories: Linking and searching community-generated digital content to develop the people’s national collection

Principal Investigator: Professor Lorna Hughes, University of Glasgow

Project partners: The National Archives, Tate, British Museum, University of Manchester, Association for Learning Technology, Digital Preservation Coalition, Software Sustainability Institute, Archives+, Dictionaries of the Scots Language, National Lottery Heritage Fund, National Library of Scotland, National Library of Wales, Public Record Office of Northern Ireland & Wikimedia UK

In the past two decades communities have gathered, recorded, and digitised their collections in a form of ‘citizen history’ that has created a truly democratic and vast reservoir of new knowledge about the past, known as community-generated digital content (CGDC). However, CGDC has proved extraordinarily resistant to traditional methods of linking and integration, for lack of infrastructure and the multilingual, multidialectal, and multicultural complexity of the content.

Our Heritage, Our Stories will dissolve existing barriers and develop scalable linking and discoverability for CGDC, through co-designing and building sophisticated automated AI-based tools to discover and assess CGDC ‘in the wild’, in order to link it and make it searchable. This new accessibility will be showcased through a major public-facing CGDC Observatory at The National Archives, where people can access, reuse, and remix these newly-integrated collections.

The project will make CGDC more discoverable and accessible whilst respecting and embracing its complexity and diversity. Through this, it will help tell the stories of communities through their rich collections of CGDC, which are at present hidden from wider view. By dissolving barriers between these and showcasing their content, the project will help centre diverse community-focused voices within our shared national collection.

Further information & images


How Wikidata is helping the CU Boulder library

15:11, Monday, 20 2021 September UTC
Chris Long

Wikidata’s overlap with cataloging has increased in recent times, prompting many librarians to transition into using it more. Chris Long, who is the Director of the Resource Description Services Team at the University of Colorado Boulder’s library, has been an avid user of Wikidata since 2019, creating and editing a variety of items.

Though Long’s experience with Wikidata is extensive, he participated in our recent Wikidata Institute course to learn more about how its sources and tools can be implemented in his university’s library. His institution is participating in the Library of Congress’s Program for Cooperative Cataloging (PCC) pilot using Wikidata, so Long spent time editing items related to that initiative.

“As a cataloger and cataloging manager, it is important to keep abreast of emerging cataloging trends. The Library of Congress and PCC are increasingly exploring the efficacy of Wikidata in cataloging work, so learning to use it is important to stay current,” Long says. “Being able to provide Wikidata training for my colleagues affords them the chance to do some hands-on linked data work.”

With Wikidata’s useful tools, Long learned the importance of constructing data models for effective querying. He believes the services Wikidata offers can greatly increase the impact of any project or collection.

“While there are a number of library linked data projects in existence, many are either small-scale ‘proof of concept’ projects, or require a great deal of institutional support to participate,” Long says. “Conversely, Wikidata is a low-barrier way for librarians to actively create usable linked data that can have a large impact.”

Wikidata can enhance any library’s collection, he believes.

“It affords the opportunity to de-silo our library metadata and let it ‘play’ on the Semantic Web, allowing for the discovery of associations among persons and concepts that would otherwise not be possible,” Long says.

Long is currently preparing a Wikidata project for his team involving the University of Colorado Boulder faculty. The project consists of creating Wikidata items for faculty as well as doing revisions on existing ones to try and associate them with the university. Projects like this are common in universities as they allow faculty to be displayed on tools like Scholia.

To take a course like the one Chris took, please visit

Image credits: Gribeco, CC BY-SA 3.0, via Wikimedia Commons; Chris Evin Long, CC BY-SA 4.0, via Wikimedia Commons

German influences in Indian ornithology

12:40, Monday, 20 2021 September UTC
I have noted before that many non-English works in science, even from Europe, are often given a quick pass-over in English works. Cases range from the failure to cite junior synonyms in taxonomic monographs [Ixos fisquetti Eydoux & Souleyet, 1842 from a French source has been ignored as a synonym of Pycnonotus priocephalus (Jerdon, 1839) by nearly all taxonomists] to glossing over major contributions like those of the German anatomist and cladist Max Fuerbringer. Some of this may be due to the wars but there is a sense that even much later works often do not give enough credit where it is due.

Some inkling of how German research was actively ignored during the war years can be found in T.B. Fletcher's address at a meeting of entomologists in 1919 where he called attention to Sir George Hampson's decision to not cite any German papers in a taxonomic revision.

T.B. Fletcher's call to boycott German literature and products.
Report of the proceedings of the third entomological meeting : held at Pusa on the 3rd to 15th February 1919 (1920)

In Tim Birkhead's preface to his history of ornithology, Ten Thousand Birds, he notes the ratings of his friends for the most influential ornithologists: 
"David Lack was the clear leader (30 votes), followed by Ernst Mayr (23), Niko Tinbergen (21), Robert MacArthur (11), Peter Grant (11), Nick Davies (11), Erwin Stresemann (11), Charles Sibley (11), Konrad Lorenz (9), and Donald Farner (8)."
Title page of Volume 7 part 2 of Handbuch der Zoologie (1934)
This is obviously a questionable sample size but the presence of three Germans in the list (with Stresemann at the root of the academic genealogy of the other two - Mayr and Lorenz) should be a useful indicator. A much richer view of influence and the genealogy of ornithology in Germany can be found in the writings of Jürgen Haffer. Haffer, who died a few years ago, was a student of Ernst Mayr who in turn was a student of Stresemann. Stresemann's influence was far-reaching, extending into India through Salim Ali who spent some time with Stresemann at the Zoological Museum, Berlin. An invitation to visit Berlin for a Wikipedia-related meeting allowed me to pursue my research on Stresemann's work and the Salim Ali connection. Ali notes in his biography that it was through the Germans and their Heligoland observatory that he picked up his studies of live birds in the hand and ringing.*

In 1914, at the age of 25, while still a doctoral student in medicine, Stresemann was offered the task of writing an entry on the birds in the Handbuch der Zoologie series since the bigger names in German ornithology were too busy to take up the job. This offer from the series editor Willy Kükenthal was to be crucial in his later career. The draft version, which followed a structure suggested by Kükenthal, arrived in 1920, delayed by the First World War, and when it was published in 1934, it consisted of 900 pages. The book led to Stresemann's future career at the Berlin museum, where he was picked in preference to many bigger and more dominant names. In producing the book, Stresemann had clearly conducted a great deal of research into the literature before him, both new and old, which also led him to reflect later on the historical development of ornithology. That led to another magnificent work, translated into English as Ornithology from Aristotle to the Present, a (signed) copy of which apparently went to Salim Ali and was passed on to the late S.A. Hussain (who mentioned it over a coffee one evening not too long ago).

Now Birkhead's ornithological history does not do a good job of telling us what went into Stresemann's Handbuch der Zoologie. This book had 2200 printed copies but only 536 were sold by 1934 and 156 in 1944 and the remaining copies were burnt at the end of World War II (see Bock, 2001). I browsed through a copy of the book in the library of the Zoological Museum at Berlin and have extracted the table of contents, which gives a good overview of the topics covered (I have removed the page numbers and hopefully there are no major transcription errors; use an online translator to see what the terms mean, but be prepared for mis-translations):

Stresemann (left) in Finland during the Ornithological Congress of 1958. Photo from the Alexander Wetmore album courtesy of Smithsonian Institution / Biodiversity Heritage Library.

Einleitung [Introduction]
Erforschungsgeschichte [Research history]

Haut und Hautgebilde [Epidermis]
Haut: Cutis - Epidermis - Schnabel [beak] - Stirnplatten - Nagel [nails]: Zehennägel - Fingernägel - Sporen - Federn [feathers] - Konturfedern [contour feathers] - Augenwimpern [eyelash], Tastfedern - Pelzdunen - Puderfedern [powder down] - Pinselfedern - Fadenfedern - Afterschaft - Nestdunen - Stellung der Federn [position of feathers] - Anordnung der Federn [arrangement of feathers] - Schwingfedern [flight feathers] - Deckfedern [coverts] - Diastataxie [diastataxy] - Afterflügel - Oberarmdecken [upper wing coverts] - Steuerfedern [control feathers] - Mauser [moult] - Schnelligkeit des Federwachstums [rate of feather growth] - Mauserperioden [moult periods] - Umfang der Mauser - Doppelte und dreifache Mauser [double and triple moult] - Reihenfolge des Federwechsels [sequence of feather replacement] - Abhängigkeit der Mauser von äusseren und inneren Einflüssen - Schuppen [scales]: Deck-, Lauf-schuppen - Fersenschuppen - Hautdruesen - Färbung von Haut und Hautgebilden [colours of skin and skin structures]: Melanine und Lipochrome - Bildungsort der Lipochrome - Periodischer Faerbungswechsel [periodic colour change] - Federzeichnung - Farbeneindruck - Schillerfarben - Farbaberrationen [aberrant colours] - Komplizierte Mutationen

Skelett [Skeleton]
Schädel [skull] - Ersatzknochen - Deckknochen - Bewegungen im Schädel - Unterkiefer - Zweiter Schlundbogen - Dritter Schlundbogen - Pneumatizität der Schädelknochen [pneumatization of the skull] - Wirbelsäule [spine] - Rippen - Brustbein [sternum] - Schultergürtel - Vordere Extremität - Pneumatizität des Rumpf- und Extremitätenskeletts [pneumatization of the trunk and limb skeleton] - Ossifikation der Markknochen [ossification of the medullary bone]

Muskelsystem [Muscular system]
Viszeralmuskulatur - Somatische Muskulatur - Augenmuskulatur [eye muscles] - Parietale Muskeln - Glatte Federnmuskeln [smooth feather muscle] - Hautmuskeln [skin muscles] - Rote und weisse Muskulatur - Muskelkerne

Nervensystem [Nervous system]
Rückenmark [spinal cord] - Spinalnerven [spinal nerves] - Gehirn [brain] - Gehirnnerven III-XII - Kleinhirn [cerebellum] - Mittelhirn [midbrain] - Zwischenhirn [Diencephalon] - Vorderhirn [forebrain] - Autonomes Nervensystem - Paraganglien - Parasympathisches System

Sinnesorgane [Sense organs]
Hautsinnesorgane [skin sensory organs] - Geschmacksorgan [taste organs] - Geruchsorgan [olfactory organs] - Hörorgan [hearing organ] - Labyrinth - Schneckenteil - Vestibularteil - Bogengangteil - Mittelohr - Paratympanisches Organ - Äußerer Gehörgang [external ear canal] - Auge [eyes] - Retina - körper - Akkomodation - Cornea - Sclera - Bulbus - Augenmuskeln - Lidapparat - Assoziation beider Augen - Augendrüsen [eye glands]

Verdauungssystem [Digestive system]
Mund-Rachenhöhle - Zunge - Histologie der Mund-Rachenhöhle - Drüsen - Faerbung der Mundhöhle - Oesophagus - Magen - Druesenmagen - Muskelmagen - Innervierung des Magens - Darm - Duodenalschleife - Ileum - Diverticulum caecum vitelli - Blinddärme - Enddarm - Struktur der Darmwand - Kloake - Bursa Fabricii - Innervierung des Darmes - Verdauung - Leber - Pankreas

Kiementaschenderivate und Thyreoidea [Pharyngeal pouch derivatives and thyroid gland]

Atmungsorgane [Respiratory system]
Atemweg - Pharyngo-nasale Luftsäcke - Kehlspalt - Stützgerüst des Kehlkopfes - Trachea - Freie Bronchien - Syrinx - Syrinxmuskeln - Innervierung der Syrinx - Sexualdimorphismus im Syrinxbau - Lunge - Pulmonale Luftsäcke [pulmonary air sacs] - Histologie der Luftsäcke - Bronchialbaum - Physiologie der Atmung - Funktionen der Luftsäcke - Thoraxbewegungen - Kammerung der Leibeshöhle

Zirkulationsorgane [Blood circulation]
Herz  - Wärmeschutz -Schutz gegen Ueberhitzung  - Körpertemperatur - Arterien: Schicksal der Aortenbögen -Arterien der vorderen Extremität - Arterien der hinteren Extremität - Intersegmentale Arterien - Arterien des Darmkanales - Arterien der Nieren und Keimdrüsen - Venen: Embryonale Entwicklung  - Gebiet der Vena cardinalis posterior - Gebiet der Vena cava posterior - Gebiet der Vena hypogastrica - Nierenpfortaderkreislauf - Gebiet der Venae portae - Gebiet der Vena cardinalis anterior - Gebiet der Vena jugularis - Gebiet der Vena vertebralis communis und  Vena  subclavia  - Blutzellen:   Leukozyten  und  Erythrozyten  -Thrombozyten - Lymphgefäßsystem  — Milz

Urogenitalsystem [Urinogenital system]
Harnapparat — Harn  — Nebenniere — Geschlechtsapparat. Entwicklung: Urgeschlechtszellen - Entwicklung der Keimdrüsen  — Entwicklung des Müllerschen Ganges - Zustand beim Männchen: Hoden — Reste der Urniere beim Männchen - Nebenhoden - Samenleiter - Übertragung des Sperma - Phalloides Organ - Zustand beim Weibchen: Schwund des rechten Ovars und rechten Ovidukts - Geschlechtsumwandlung - Ovar  — Reste von Urniere und Wolffschem  Gang beim Weibchen — Ovidukt

Keimzellen [Germ cells]
Ei. Eierstockei - Dotterbildung - Große Wachstumsperiode des Eies - Bilateraler Bau der Oozyte und des Follikels - Follikelsprung - Bau des Reifeies - Sekundäre Eihüllen - Kalkschale - Färbung der Schale - Schalendicke - Eiform - Legeakt - Eiweiß - Verhältnis des Dottergewichts zum Eiweißgewicht - Zusammensetzung des Eiweißes - Eigröße - Spermium.

Embryonale Entwickelung [Embryo development]
Befruchtung - Furchung - Gastrulation - Primitivstreifen - Kopffortsatz - Mesoderm - Orientierung der Embryonalanlage - Drehung auf die linke Seite - Eihäute - Dottersack - Resorption des Dotters - Gefäße des Dottersackes - Amnion - Serosa - Allantois - Gefäße der Allantois - Eiweißsack - Bau der Eiweißsackwandung - Resorption des Eiweißes - Verbindungen der Allantoisgefäße - Stellung des Embryo im letzten Drittel der Bebrütung - Schlüpfakt - Abbau des Schalenkalkes - Aufnahme des Dottersackes in die Bauchhöhle - Stellung des Eies während der Bebrütung - Physiologie der Embryonalentwickelung - Entwickelungsdauer - Brutdauer.

Postembryonale Entwickelung [Post-embryonic development]
Nestflüchter und Nesthocker - Dottervorrat - Gewichtszunahme - Nahrungsmenge - Erste Befiederung - Nestlingsdunen - Färbung des Dunenkleides - Tragdauer des Jugendkleides - Eigenschaften der ersten Flugfedern - Färbungsentwickelung - Nestlingszeit - Proportionsverschiebungen - Nahrungsaufnahme der Jungen - Leitmale

Geschlechtsdimorphismus [sexual dimorphism]
Geschlechtschromosomen - Zahlenverhältnis der Geschlechter - Gynandromorphe - Sexualhormone - Geschlechtsunterschiede in der Färbung - Größenverschiedenheit der Geschlechter - Geschlechtsunterschiede im Skelettbau - Geschlechtsunterschiede und Werbung - Unterschiede im Stimmapparat - Periodischer Wechsel des Geschlechtsdimorphismus - Geschlechtsunterschiede im Mauserverlauf - Geschlechtsdimorphismus und Brutpflege - Übertragung männlicher Eigenschaften auf das Weibchen - Rassenunterschiede im geschlechtlichen Färbungsabstand - Mutative Vergrößerung des Geschlechtsdimorphismus

Fortpflanzung [Reproduction]
Werbung. Erreichung der Geschlechtsreife - Fortpflanzungsperiode - Zusammenhalt der Geschlechter - Verlobung - Balz - Psychische Selektion - Begattung - Nest. Ort der Eiablage - Nestbautrieb - Nestform - Standort des Nestes - Baukunst als ererbte Anlage - Baustoffe - Verarbeitung der Baustoffe - Dauer des Nestbaues - Aushöhlen von Holz und Erdreich - Benutzung von Ameisen- und Termitenbauten [use of ant and termite nests] - Fehlen des Nestbautriebes - Wiederbenutzung des alten Nestes - Bautätigkeit nach Brutbeginn - Organveränderungen zur Nestbau-Zeit - Ei: Eiablage und Klima - Eierzahl [number of eggs] - Beziehungen zwischen Eigewicht und Zahl der Eier [relations between egg weight and number of eggs] - Nachlegen - Nachgelege - Brut [brood] - Polyandrie - Legeabstand - Bebrütung - Schutzfärbung der Eier - Anteil der Geschlechter am Brutgeschäft - Bebrütung durch beide Gatten - Ablösung beim Brüten - Verständigungsmittel der Gatten - Triebhandlungen im Dienste der Brutsicherung - Bebrütung durch nur einen Partner - Brutflecke - Kompensation mangelnder Brutflecke - Bebrütung über die normale Brutdauer hinaus - Schlüpfakt - Jungenpflege - Verhalten der Nesthocker - Verhalten der Nestflüchter - Zusammenhalt der Familien - Geselliges Brüten - Polygynie - Ehelosigkeit - Geselliges Leben der Pinguine - Erbrütung der Eier durch Bodenwärme - Brutparasitismus - bei Cuculiden - Färbungsanpassung der Kuckuckseier - Größenanpassung der Kuckuckseier - bei Icteriden - bei Ploceiden - bei Indicatoriden - bei Heteronetta - Rasche Embryonalentwickelung der Brutschmarotzer - Schädigung der Wirtsvögel.

Lebensdauer [life spans]

Tag- und Nachtvögel [day and night birds]

Ernährung [nutrition]
Nahrung - Nahrungswahl - Nahrungsaufnahme - Erweiterung von Spalträumen - Bogenförmige Schnäbel - Zusammenspiel von Schnabel und Zunge - Mundwerkzeuge der Nektarsauger - Saugakt - Ornithophile Blüten - Zungenapparat der Spechte - Nahrung der Spechte - Mundwerkzeuge der körnerfressenden Passeres - Mundwerkzeuge der Papageien - Jagd auf fliegende Beutetiere - Nahrungsaufnahme bei den Raubvögeln - Scharren - Nahrungsaufnahme aus dem Wasser - Vorrat-Sammeln - Zerkleinerung der Nahrung durch Zerrupfen oder Zerschlagen - Prüfung der Nahrung mit dem Geschmackssinn - Tastsinn im Bereich der Mundwerkzeuge - Bildung des Werkzeuges nach dem Bedürfnis - Ausnutzung der Nahrung. Zellulosereiche Nahrung - Darmbakterien - Fleischnahrung - Gallenfarbstoffe - Resorption und Anbau pflanzlicher Farbstoffe - Endozoische Samenverbreitung durch Vögel

Stoffwechsel und Energiewechsel [Metabolism and energy metabolism]
Chemie des Eies — Zusammensetzung des Dotters — Zusammensetzung des Eierklars — Zusammensetzung von Schalenhaut und Kalkschale — Stoffwechsel des Embryo— Stoffwechsel des Erwachsenen. Erhaltung des ernährungsphysiologischen Gleichgewichts — Mineralstoffwechsel — Eiweißabbau und Harn — Grundumsatz und Leistungszuwachs — Periodisches Schwanken des Fettansatzes — Stoffwechsel im Hunger — Hungerresistenz und Körpergröße — Wasserhaushalt.

Bewegung [Movement]
Bewegungen der Wirbelsäule - Brustwirbelsäule - Halswirbelsäule - Schwanzwirbelsäule - Bewegungen der hinteren Extremität - Schlafstellung - Ortsbewegung: Laufen und Hüpfen - Gang - Ortsbewegung der Schwimmvögel auf festem Boden - Klettern - Bewegungsform und Bauplan - Längenverhältnis der Zehenglieder - Längenverhältnis von Lauf und Unterschenkel - Bewegungen der vorderen Extremität - Flügelskelett - Schultergürtel - Schultergelenk - Ellenbogengelenk - Handgelenk - Gelenke zwischen Mittelhand und Fingern - Flügelmuskeln - Muskeln zur Bewegung des - Muskeln zur Bewegung des Vorderarms - Muskeln zur Bewegung der Mittelhand und der Finger - a) Ursprung am Oberarm - b) Ursprung am Vorderarm - c) Ursprung am Metacarpus - Flügelfläche - Propatagium und Metapatagium - Schwungfedern - Bau der Schwungfedern - Spannung der Schwungfedern - Wirkung des Luftdruckes an den Federstrahlen - Erteilung des Vortriebes - Flug: Ruderflug - a) große Vögel - b) kleine Vögel - Flügeltypen - Hubflügel - Schnellflügel - Schwebeflügel - Zahl der Flügelschläge - Zusatzbelastung - Hüpfender Flug - Schwebeflug der Kleinvögel - Gleitflug - Schwirrflug - Rütteln - Flugleistung - Flugarbeit - Ausnutzung der Windkräfte - Statischer Segelflug - Dynamischer Segelflug - Änderung der Höhe - Änderung der Richtung - Abflug - Landung - Aufgaben des Schwanzes - Verlust des Flugvermögens - Schwimmen - Tauchen - Fußtaucher - a) Kormorane - b) Podiceps, Colymbus, Tauchenten - Flügeltaucher - Wechselbeziehungen zwischen Tauch- und Flugvermögen - Tauchleistungen

Tonerzeugung [Sound generation]
Syrinx und Trachea als Zungenpfeife - Akustik der Zungenpfeifen - Akustik der Stimmwerkzeuge der Vögel - Wirkung der Stimm-Muskeln - Paarige und unpaarige Stimmapparate - Veränderung des von den schwingenden Membranen erzeugten Tones - Resonanzapparate - Biologische Bedeutung der Stimmlaute - Instrumentalmusik

Geographische Verbreitung [Geographical distribution]
Alter des Vogelstammes, der Arten und Rassen - Arten-Zahl - Ausbreitungsschranken - Verbreitungsmittel - Räumliche Sonderung der Populationen als Vorbedingung der Artenvermehrung - Artvermehrung als Folge ökologischer Umstellung - Diskontinuierliche Verbreitung als Ergebnis erdgeschichtlichen Geschehens - Regionale Verbreitung der Vögel

Wanderungen [Migration]
Ökologische Ursachen der Wanderungen - Winteraufenthalt - Dauer des Aufenthaltes im Überwinterungsgebiet - Ökologische Ansprüche an das Winterquartier - Traditionelles Festhalten am Winterquartier - Beziehungen zwischen Urheimat und Winterquartier - Überwandern südlicher Populationen durch nördliche - Unbeständige Lage der Winterquartiere - Winteraufenthalt der Albatrosse - Räumliche Ausdehnung des Überwinterungsgebietes - Wanderwege - Ökologisch begründete Umwege - Schleifenförmiger Zugweg - Historisch begründete Umwege - Verlassen der traditionellen Zugbahnen - Breite der Zuggebiete - Leistungen: Beispiele für lange Wanderwege - Beispiele für lange Flugstrecken - Häufigkeit und Dauer der Rasten - Vergleich der täglichen Flugleistungen während der Brutzeit und der Zugzeit - Energiequellen - Orientierung - Optische Orientierung - Flug in großen Höhen - Richtungssinn - Richtungsgefühl und Richtungstrieb - Andressiertes Richtungsgefühl - Artgedächtnis - Steigerung der Orientierungsfähigkeit durch Selektion - Verdriftung - Aufsuchen neuer Brutgebiete - Veranlassung zum Aufbruch - Unterscheidung zwischen Wettervögeln und Instinktvögeln - Verkettung von Zugtrieb und Fortpflanzungszyklus - Zusammenhänge zwischen Zugtrieb und endokrinem System - Beeinflussung des Zuges durch meteorologische Faktoren - Windrichtung - Beziehungen zwischen Zugzeiten und Dauer des Fortpflanzungszyklus - Beziehungen zwischen Zugzeiten und Länge des Wanderweges - Veranlassung zur Einstellung der Wanderung - Trennung nach Alter und Geschlecht - a) im Herbst - b) im Frühjahr - Geselliges Wandern - Zug und Mauser - Stammesgeschichtliches Alter der Zugvögel

Parasiten [Parasites]
Vermes: Trematoden — Cestoden — Nematoden — Acanthocephalen — Pentastomiden — Arthropoden: Acari — Flöhe — Wanzen — Fliegen — Mallophagen

Stammesgeschichte [Evolutionary history]

Stresemann's work is also well illustrated and it makes use of graphs to show how conclusions were arrived at. For instance there is a graph that shows the numbers of male and female larks collected at Danish lighthouses which points to protandry in Spring migration. It clearly was a truly illuminating and broad overview of ornithology in the 1930s and one that was widely appreciated. It is unclear if Salim Ali went through the contents of this work. The only major biography of Stresemann is by Jürgen Haffer, Erich Rutschke and Klaus Wunderlich - all three of whom are no more. Their biography includes several interesting sections but the ones that stand out are by Haffer and include scientometric approaches to examining the life and work of Stresemann. Unfortunately most of the book is in German and there is only a short summary in English. Haffer provides a chronological view of Stresemann's research focus over time using a graphical timeline.

A chronology of Stresemann's research focus from Haffer et al., 2000.
Haffer's phylogeny of avian taxonomy

Photo: Z thomas (Creative Commons)

Haffer notes that one of Stresemann's major activities was his review of literature and I think this kind of reflective approach is especially important to the development of any field. It is clear that this showed the direction for further research for ornithology in Germany. That Germany was at the forefront of ornithology is also evident from the many German technical terms still in use in ornithology, like zugunruhe (or migratory restlessness). In fact it was Stresemann who coined the German word "einemsen" in 1935 for describing the then undocumented behaviour of birds anointing themselves with live ants. Salim Ali, who was clearly in touch with Stresemann at that time, found a suitable English verb for it, "anting", in a note published in the Journal of the Bombay Natural History Society - a word that has stayed ever since in the English ornithologist's dictionary.

The correspondence archive** at the Museum of Natural History in Berlin has only two letters from Ali to Stresemann [Reference: S IV Nachl. Stresemann/Akte Salim, A.; MfN d. Hub, HBSB]. One written (typed) on 24 July 1964 is in response to a letter of condolence to Salim Ali on the death of Loke Wan Tho. The other (29 April 1966) is a bit of an apology for not studying the moults of birds:

33 Pali Hill, Bandra
29 April 1966

Dear Stresemann,

Many thanks for the prompt reply to my query about age, moult, and leg colour in Philomachus. This clarifies the position nicely.

I feel guilty and unhappy not to make fuller use of the exceptional opportunities one gets for studying moults etc. when handling such large numbers of birds for ringing. But unless we can have a much larger team of helpers in our migration study camps than circumstances permit - including some devoted entirely to moult study - this is very difficult. We have to collect arthropod parasites and blood samples from the birds for virus studies, and the various operations connecting with ringing - measuring, weighing, etc - use up all the time and facilities available. How, and for how long, to detain the birds during and after all these operations without harming them, when several hundred birds have to be dealt with under more or less alfresco conditions is another problem. All the same it seems a great pity that such wonderful opportunities cannot be more fully utilized!

With warmest regards, Yours ever
As can be seen from the graph of Haffer, Stresemann really moved into the study of moult towards the 1960s and until the end of his life. He was greatly aided in his research on moult by his (second) wife Vesta, an ornithologist in her own right about whom rather little has been written. Salim Ali notes in his autobiography that Stresemann was his guru and that he routinely wrote enquiries to which detailed replies would be sent without fail but in a difficult cursive handwriting. Perhaps someone can find the archives of Ali's letters and see what is to be learnt there. Ali notes that Stresemann was warm and welcoming in his letters even before he met him, a reason for Ali to choose Berlin over the British Museum. He also wondered how Stresemann managed to keep up with his correspondence given the number of people who wrote to him.

I suspect that a reflection on the state of knowledge of Indian birds with respect to their patterns of moult will not be particularly uplifting but reflect we must. The maintenance of a system of privileges (most often passively by not fighting against privilege) for a few ringers will ensure the poverty of local expertise that still continues.

The entrance to Waldfriedhof Dahlem (4 April 2017)
That afternoon, I went round to Waldfriedhof Dahlem (the Dahlem forest cemetery) to look for Stresemann's grave - which curiously is shared with that of his guru Ernst Hartert. It must be the only tombstone shared by two unrelated ornithologists. Actually Stresemann had wished to be beside his mentor after his death and was cremated with the ashes interred into the grave of Hartert. The grave is maintained by the Berlin district but despite weaving through the blocks, I failed to spot it!

* Ali's early ringing in India included field assistance from the Swiss ornithologist Alfred Schifferli (1912–2007, for a biography in German see - apparently Schifferli's namesake father essentially founded Swiss ornithology and a son Luc also continued in the same field) - there is a mention in Zafar Futehally's autobiography of a field assistant who had grouped the three bird-ringers as the three "Alis", which included "Schiffer-Ali"!
** Salim Ali evidently gifted about 200 bird specimens to the collection of the Berlin museum; the species list suggests that they were mostly from peninsular India.


Wikimedia Foundation invited me to attend the Wikimedia Conference at Berlin. I visited the archives of the Museum für Naturkunde on the 4th of April and the library on the 5th of April 2017. Thanks are due to Dr Sabine Hackethal and Sandra Miehlbradt, archivists at the Museum of Natural History Berlin for tracing the correspondence between Ali and Stresemann and for allowing their contents to be shared here. Thanks are also due to Martina Rißberger, librarian at Museum für Naturkunde Berlin for access to the Handbuch der Zoologie 7-2 and the biography of Erwin Stresemann. Thanks also to Kalpana Das for assistance.

One of the reasons for posting this is to point out that Indian scientists and amateurs alike have a rather narrow view of the field of ornithology. In fact, at a meeting to consider founding an ornithologists' association, many big names were asked if poultry came under ornithology, and those present decided that their field and organization should restrict themselves to the study of wild birds. Even today bibliographic compilations on India routinely skip references to parasitology, ethno-ornithology, paleontology, molecular biology, behaviour, biomechanics and a host of other areas while tending to focus on bird records and regional avifaunal lists - the last was one of the things that Stresemann explicitly banned from the Journal für Ornithologie during his editorship.

Tech News issue #38, 2021 (September 20, 2021)

00:00, Monday, 20 2021 September UTC

weeklyOSM 582

09:47, Sunday, 19 2021 September UTC



Castle Dossier Map Switzerland [1] © IFS Geometa Lab | map data © OpenStreetMap contributors

Mapping campaigns

  • The Swiss OSM Project of the Month for September is the mapping of electric vehicle charging stations (de).


  • A long thread on the Swiss mailing list, split over several months, discussed how to map the border between Switzerland and Italy around Monte Generoso, where the precise line is uncertain. Earlier discussions occurred in July and August.
  • John Stanworth wants to further improve his mapping of smoothness=* and mtb:scale=* and asked other cycling contributors for comments on his views.
  • Hiddenhausen asked (de), on the German forum, what the use of the tag landuse=street_green might mean. It appears to be an ad hoc attempt to refine the use of landuse=grass for grassy highway verges and medians.
  • Voting is underway, until Saturday 25 September, for headlight=* to reflect the legal requirement on some roads to use your headlights.
  • SK53 clarified with examples that, in his opinion, there is widespread misuse of the sac_scale=* tag for real alpine climbing routes, which the Swiss Alpine Club grades with a completely different scale.
  • User Koreller asked for feedback on their contribution, a guide to mapping North Korea, available on the OSM wiki.


  • Amanda McCann shared with us what she did in OpenStreetMap during August 2021.
  • Numerous OSM contributors have received ‘friend’ requests through the OSM website which appear to be phishing attacks. Not surprisingly, this has received comments on the German forum (de) and the OSM subreddit.


  • Marius David (marius851000) intends (fr) to import open data about restaurants in the Pays de la Loire (France), and is seeking input on how to do it before starting.

OpenStreetMap Foundation

  • Simon Poole justified his insistence that the OSMF register trademarks, whilst he served on the LWG, by referring to a recent trademark debacle involving PostgreSQL. A Spanish not-for-profit registered ‘PostgreSQL’ as a mark in Spain, and has recently applied for broader coverage in the European Union and USA. The latter application was recently abandoned, but the ‘PostgreSQL Community’ is still live.


  • The UN Mappers team has announced the launch of an internship programme. Several open positions relate to OSM in mapping and feature extraction, social media and communication, geospatial analysis, and map visualisation and design. The internships are fully remote or in-person at the UN duty stations in Brindisi (Italy) and Valencia (Spain), and are open to all nationalities. These internships are an opportunity to work in an international environment, gaining experience and developing advanced skills in the humanitarian field. The deadline to apply is Saturday 9 October.

Humanitarian OSM

  • The monthly HOT Tech Working Group meeting has been put on hold. The hot_tech team has convened a number of working groups specific to individual technologies or projects such as the OSM Galaxy Project. Anyone who wishes to participate in these is requested to fill out a form.
  • HOT announced the appointment of Dr Ibrahima Cisse as Director of the Western and Northern Africa Open Mapping Hub.
  • The Open Mapping Hub Eastern and Southern Africa (ESA), organised by HOT, announced a range of Open Mapping grants. Expressions of interest are invited immediately, with formal proposals expected during October.
  • The HOT Quality Control and Assurance Working Group have developed and released the ‘Problem User Escalation Document’, which outlines the steps in escalating data quality problems encountered with users (mappers/validators).


  • [1] The Institute for Software at the Technical University of East Switzerland (IFS OST) invites (Video (en)) us to check out their video regarding mapping technology for the Castle Dossier Map. The video forms part of an entry for the Prix Carto of the Swiss Society of Cartographers.
  • Jaisen Nedumpala reported on how he used OSM to help map a 3 km buffer zone during the recent Nipah virus outbreak in Kerala (India), with the personal support of Heinz_V from Germany and OSM contributors from Kerala.


  • Sustrans, the UK cycling infrastructure charity, asked for local input for improving a national cycle route in the city of Durham. Gregory Marler’s (user livingwithdragons) advice, given somewhat tongue-in-cheek, was to switch to OSM.


  • The momepy Python library for analysing urban form is now at version 0.5.0. Of particular interest is the addition of the COINS algorithm for classifying street hierarchies.

Did you know …

  • … what the charging regime is for using the HOT Tasking Manager? Also known as the Tasking Manager Sustainability Model.
  • … there is a page on the OSM Wiki for noting sites that use OSM but don’t provide correct attribution?
  • … that OpenCage post geographical trivia via Twitter on the last Friday of each month (hashtag #fridaygeotrivia)?

Other “geo” things

  • The Colorado Department of Transportation falsely changed the status of an open road to closed, not only on their own site (albeit briefly), but directly on Google, Waze, TomTom and AppleMaps. A major highway, Interstate 70, was actually closed due to mudslides, causing an increase in traffic on a minor road. The fictitious road closure was meant to discourage further traffic build-up.
  • Using the example of the ‘Chinese Name of Lidl’ Yunus reflected on the challenge for western companies to select a Chinese name for their brand.

Upcoming Events

Where | What | When
- | OSM Africa Monthly Mapathon: Map Malawi | 2021-09-04 – 2021-10-04
Karlsruhe | Karlsruhe Hack Weekend | 2021-09-17 – 2021-09-19
Anderlecht | Software Freedom Day | 2021-09-18
Nantes | Journées européennes du patrimoine 2021, Nantes | 2021-09-18
Grenoble | Atelier OpenStreetMap – retrouvailles et initiation ! | 2021-09-20
Lyon | Rencontre mensuelle Lyon | 2021-09-21
Bonn | 143. Treffen des OSM-Stammtisches Bonn | 2021-09-21
Berlin | OSM-Verkehrswende #27 (Online) | 2021-09-21
Lüneburg | Lüneburger Mappertreffen (online) | 2021-09-21
- | DRK Missing Maps Online Mapathon | 2021-09-23
- | [Online] OpenStreetMap Foundation board of Directors – public meeting | 2021-09-24
Düsseldorf | Düsseldorfer OSM-Treffen (online) | 2021-09-24
Amsterdam | OSM Nederland maandelijkse bijeenkomst (online) | 2021-09-25
- | FOSS4G 2021 Buenos Aires – Online Edition | 2021-09-27 – 2021-10-02
Bremen | Bremer Mappertreffen (Online) | 2021-09-27
Grenoble | Mapathon Missing Maps – Cartographier des cartes humanitaires sur un mode collaboratif et libre. | 2021-09-28
San Jose | South Bay Map Night | 2021-09-29
Bruxelles – Brussel | Virtual OpenStreetMap Belgium meeting | 2021-09-28
okres Žilina | Missing Maps mapathon Slovakia online #4 | 2021-09-30
京田辺市 | 京都!街歩き!マッピングパーティ:第26回 Re:一休寺 | 2021-10-02
Greater London | Missing Maps London Mapathon | 2021-10-05
Landau an der Isar | Virtuelles Niederbayern-Treffen | 2021-10-05
Stuttgart | Stuttgarter Stammtisch (Online) | 2021-10-05
Hlavní město Praha | Online validation mapathon | 2021-10-07
Nordrhein-Westfalen | OSM-Treffen Bochum (Oktober) | 2021-10-07
- | UN Mappers: MaPathon – le Università a servizio della cooperazione internazionale | 2021-10-08
Berlin | 160. Berlin-Brandenburg OpenStreetMap Stammtisch | 2021-10-08

If you would like to see your event here, please add it to the OSM calendar. Only data entered there will appear in weeklyOSM.

This weeklyOSM was produced by Nordpfeil, PierZen, SK53, TheSwavu, arnalielsewhere, derFred.

Todd Siegel joins Wiki Education’s Advisory Board

15:46, Friday, 17 2021 September UTC
Todd Siegel. Image courtesy Todd Siegel, all rights reserved.

Todd Siegel, product designer, prototyper, and wordsmith, has been appointed to Wiki Education’s Advisory Board, which is focused on growing our network and generating new revenue.

“I’m excited to join Wiki Education’s Advisory Board to help expand its community, and further educate students about the full range of Wikipedia’s powers,” Todd says.

Todd has more than 15 years of experience serving startups as an independent contractor and advisor. He educates a wide range of audiences on rapidly expressing product ideas with prototypes that look and feel real — without needing to code. He gave live prototyping presentations at Xerox PARC, Copenhagen Institute of Interaction Design, and Product School, plus hackathons ranging from Cisco to Cedars Sinai Medical Center.

He played a pivotal role in designing and evangelizing the leading web-based tool for prototyping apps without coding, used by more than 400,000 people.

As a wordsmith, Todd co-founded the literary series Word Performances, and performed his poetry in over 20 shows including the Litcrawl literary festival, and was called the ee cummings of San Francisco tech culture.

I’m thrilled to add Todd’s experience, skillset, and connections to the San Francisco community to Wiki Education’s Advisory Board.

By Muniza A, Isaac Johnson, and Martin Gerlach

What articles are most likely to lead readers to the Astrology article (Hint: the clicks are Fast and Furious.)? What topics are readers most interested in after reading about velociraptors, and how do those topics change across languages (For example, do readers only go to English Wikipedia to learn about the Fighting Dinosaurs fossil)? 

These are just some of the questions that can be answered by the Wikipedia clickstream, a publicly available dataset that shows how readers in Wikipedia get to an article and where they go from there. Wikipedia clickstream consists of weighted (source, destination) pairs extracted from the internal pageview logs of Wikipedia. This data is maintained in 11 languages and is updated every month, thanks to Wikimedia’s Data Engineering team. Analyzing this data is a multi-step process that involves downloading it, getting familiar with its structure, finding the right tools and methods to process it, and choosing appropriate visualizations for it.
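To give a feel for the data's shape, the sketch below parses a few tab-separated rows in the layout the public clickstream dumps use (previous page, current page, link type, count) and ranks the sources of one article by their share of its incoming clicks. The sample rows and counts are made up for illustration, and `top_sources` is a hypothetical helper, not part of WikiNav or any Wikimedia tool.

```python
import csv
import io
from collections import Counter

# Made-up sample rows in the clickstream layout: prev, curr, type, n.
SAMPLE = (
    "other-search\tAstrology\texternal\t600\n"
    "Zodiac\tAstrology\tlink\t300\n"
    "Horoscope\tAstrology\tlink\t100\n"
    "Astrology\tZodiac\tlink\t50\n"
)


def top_sources(lines, title, limit=10):
    """Return (source, clicks, share) tuples for pages leading to `title`."""
    counts = Counter()
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Rows are read one at a time, so a full dump never has to fit in memory.
    for prev, curr, _link_type, n in reader:
        if curr == title:
            counts[prev] += int(n)
    total = sum(counts.values())
    return [(src, n, n / total) for src, n in counts.most_common(limit)]


sources = top_sources(io.StringIO(SAMPLE), "Astrology")
for source, clicks, share in sources:
    print(f"{source}: {clicks} clicks ({share:.0%})")
```

Passing an open file handle instead of the `io.StringIO` sample would apply the same logic to a downloaded dump.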

In order to make answers to questions like those above accessible to everyone, and not just data scientists or folks with programming knowledge, we’ve created a visualization tool called WikiNav. This tool was developed as part of an Outreachy internship from May to August 2021.

About WikiNav

WikiNav processes the Wikipedia clickstream to generate various visualizations, each of which focuses on a certain aspect of the data. This can help users get to insights faster, eliminating the need for them to crunch the numbers themselves.

  • What paths do readers take to or from a Wikipedia article?

WikiNav generates a Sankey chart that can visualize the top sources and destinations to and from an article and indicates the percentage of views sent or received by each source and destination.

Image credit: Muniza A, CC-BY-SA 4.0
  • How does the composition of traffic change from month to month?

WikiNav plots an article’s traffic for the current month alongside its traffic from the previous month, which can help visualize changes in the nature of traffic over time.

Image credit: Muniza A, CC-BY-SA 4.0
  • How does the composition of traffic differ across languages?

Users can also visualize how sources and destinations to and from an article change across different languages. This is done by looking up the top sources and destinations for the current article across clickstream data for selected languages.

Image credit: Muniza A, CC-BY-SA 4.0

For these questions and others, explore your favorite articles at

Technical details

Our aim for WikiNav was to create an interface that was reasonably fast and responsive without requiring extensive computational resources and that could be used by people with varying levels of programming experience.

The project started out in the form of a Jupyter notebook on PAWS during the Outreachy initial contribution round. This posed two issues: 

  • Since all of the data could not be loaded into memory at once, it had to be processed in chunks or be looped over, which was slow. 
  • Published PAWS notebooks are static, so visualizing other articles would still require Python knowledge and a Mediawiki account.

This led to our current setup as described below:


The WikiNav API resides on a Cloud VPS instance along with the clickstream SQLite files. We chose SQLite because of its small footprint, hassle-free configuration, portability, and low read latency. Converting a whole clickstream snapshot to SQLite takes a small Python script and a few minutes, and we’ve set up a cron job that does this every time a new snapshot becomes available. Since we’re only concerned with reads here, we did not have to worry about concurrent writes, which are serialized in SQLite and might pose a problem if your application is write-heavy. As of now, the WikiNav API provides access to only the latest two snapshots of the clickstream data, due to limited storage on the VPS.
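A conversion along these lines might look like the following minimal sketch. It is not the project's actual script: the table name `clickstream`, the column names, and the index choices are assumptions made for illustration, and an in-memory database stands in for the SQLite file on the VPS.

```python
import csv
import io
import sqlite3


def load_clickstream(conn, lines):
    """Copy tab-separated clickstream rows into an indexed SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clickstream "
        "(prev TEXT, curr TEXT, type TEXT, n INTEGER)"
    )
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # executemany consumes the generator lazily, so the dump is never
    # held in memory all at once. Malformed rows are skipped.
    conn.executemany(
        "INSERT INTO clickstream VALUES (?, ?, ?, ?)",
        (row for row in reader if len(row) == 4),
    )
    # Indexes on both columns keep "sources of X" and "destinations of X"
    # lookups fast (assumed layout, chosen for this example).
    conn.execute("CREATE INDEX IF NOT EXISTS idx_curr ON clickstream (curr)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_prev ON clickstream (prev)")
    conn.commit()


# Made-up rows standing in for a monthly dump file.
sample = io.StringIO(
    "Zodiac\tAstrology\tlink\t300\n"
    "Horoscope\tAstrology\tlink\t100\n"
)
conn = sqlite3.connect(":memory:")
load_clickstream(conn, sample)
row_count = conn.execute("SELECT COUNT(*) FROM clickstream").fetchone()[0]
print(row_count)
```

Building the indexes after the bulk insert, rather than before, is a common choice because it avoids updating them row by row during the load.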

Requests to the WikiNav API are handled by Nginx which acts as a reverse proxy for Gunicorn. This ensures that requests from and responses to slow clients are buffered and provides added benefits such as production-grade load balancing, caching, and enhanced security. Gunicorn is a WSGI HTTP server that processes requests from a web server such as Nginx and then communicates those requests to Flask. Finally, depending on the URL of the request and the parameters supplied with it, Flask creates an SQL query for the clickstream SQLite files, sorts the results, extracts the requested subset from them, and returns them in the form of a JSON response.
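The query step at the end of that chain could be sketched as below. This is an illustrative stand-in rather than WikiNav's code: the function name `top_sources_json`, the table layout, and the response fields are all assumptions. The Flask routing that would call such a function is left out so the example runs with the standard library alone, and the sorting and truncation are pushed into SQL for brevity.

```python
import json
import sqlite3


def top_sources_json(conn, title, limit=10):
    """Return the top sources for `title` as a JSON string."""
    rows = conn.execute(
        # Parameters are bound, never interpolated, so user-supplied
        # titles cannot alter the SQL.
        "SELECT prev, n FROM clickstream "
        "WHERE curr = ? ORDER BY n DESC LIMIT ?",
        (title, limit),
    ).fetchall()
    payload = {
        "title": title,
        "results": [{"source": prev, "views": n} for prev, n in rows],
    }
    return json.dumps(payload)


# Hypothetical in-memory table with made-up counts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clickstream (prev TEXT, curr TEXT, type TEXT, n INTEGER)"
)
conn.executemany(
    "INSERT INTO clickstream VALUES (?, ?, ?, ?)",
    [
        ("Zodiac", "Astrology", "link", 300),
        ("Horoscope", "Astrology", "link", 100),
        ("other-search", "Astrology", "external", 600),
    ],
)
response = top_sources_json(conn, "Astrology", limit=2)
print(response)
```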


The frontend for WikiNav is hosted on Toolforge. It acts as a dashboard for the clickstream data by generating statistics and visualizations for it and provides features that allow users to interact with and manipulate those visualizations. We chose React for developing the frontend since there is a lot of data flowing between visualizations and this data needs to be updated frequently. React simplifies the management of this state and helps make the process of updating and rerendering the visualizations a lot more efficient. 

The WikiNav app lets users select a language and title of interest and then queries the WikiNav API to get the relevant data for generating visualizations for the user’s selection. It also makes additional calls to other APIs such as the MediaWiki API to obtain contextual information about the title (for example, the corresponding titles in other languages).


Getting our data from multiple APIs meant that we had to make multiple asynchronous HTTP requests for every article. Streamlining those requests, managing the flow of responses across components, and handling errors associated with each source required some refactoring. We also made sure to cache API responses so that two components asking for the same data don’t result in repeated requests and, as a result, higher load times. 
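The caching idea can be illustrated with a short sketch. The actual frontend is written in React, so this Python version only mirrors the concept: the `ResponseCache` class, the `fake_fetch` stand-in, and the URL path are all hypothetical. It stores the pending request itself, so a second caller that arrives while the first request is still in flight shares it instead of starting another.

```python
import asyncio


class ResponseCache:
    """Share one in-flight or completed fetch per URL among all callers."""

    def __init__(self, fetch):
        self._fetch = fetch  # coroutine function: url -> response
        self._tasks = {}     # url -> asyncio.Task

    async def get(self, url):
        task = self._tasks.get(url)
        if task is None:
            task = asyncio.create_task(self._fetch(url))
            self._tasks[url] = task
        try:
            return await task
        except Exception:
            # Drop failed requests so a later call can retry.
            self._tasks.pop(url, None)
            raise


calls = []


async def fake_fetch(url):
    """Stand-in for a real HTTP request; records each call it receives."""
    calls.append(url)
    await asyncio.sleep(0)
    return {"url": url}


async def main():
    cache = ResponseCache(fake_fetch)
    # Two "components" ask for the same data at the same time.
    return await asyncio.gather(
        cache.get("/sources/Astrology"),
        cache.get("/sources/Astrology"),
    )


results = asyncio.run(main())
print(len(calls))
```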

Another challenge was making sure that our API could keep up with the interactivity of our frontend. This required experimentation with different database structures and settings so that we could perform fast lookups on the clickstream data.

Example: comparing languages

To see this setup in action, here’s an example featuring the language comparison charts. 

Each time you add a new language to compare, a handler fires queries to the WikiNav API to get the sources and destinations for the current title in the selected language. It also sends requests to the Wikipedia Langlinks API to get translations for the top sources and destinations in the selected language. Once done, it aligns the results from those API calls to get new datasets for the incoming and outgoing pageviews bar charts, respectively.
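The alignment step can be sketched roughly as follows. This is an assumption-laden illustration, not WikiNav's implementation: `align_by_langlinks` is a hypothetical helper, the langlinks mapping is passed in as a plain dictionary rather than fetched from the API, and all titles and counts are made up.

```python
def align_by_langlinks(base_counts, other_counts, langlinks):
    """Pair counts for the same topic across two language editions.

    base_counts:  {title in base language: views}
    other_counts: {title in other language: views}
    langlinks:    {title in other language: title in base language}
    Titles without a known translation are kept under their own name.
    """
    translated = {}
    for title, views in other_counts.items():
        key = langlinks.get(title, title)
        translated[key] = translated.get(key, 0) + views
    # Keep the base language's order, then append topics seen only in
    # the other language.
    titles = list(base_counts) + [t for t in translated if t not in base_counts]
    return [(t, base_counts.get(t, 0), translated.get(t, 0)) for t in titles]


# Made-up counts for the sources of one article in two languages.
english = {"Zodiac": 300, "Horoscope": 100}
german = {"Tierkreis": 40, "Sterndeutung": 10}
links = {"Tierkreis": "Zodiac"}  # "Sterndeutung" has no known translation here

aligned = align_by_langlinks(english, german, links)
for title, en_views, de_views in aligned:
    print(title, en_views, de_views)
```

Each resulting row holds one topic with its count in both languages, which is the shape a side-by-side bar chart needs.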

Image credit: Muniza A, CC-BY-SA 4.0

Explore further

Ready to explore further? You can check out the WikiNav tool yourself. In addition, you can see the clickstream data visualized in situ on Wikipedia articles with this user script that uses WikiNav’s API. If you have additional suggestions on how to improve the tool, you can leave feedback on the talk page on meta or comment directly on the GitHub repository containing the source code and documentation.

Thanks to the Data Engineering and Cloud Services teams for their support around data and infrastructure and to the Outreachy program for providing the opportunity for this project as part of an internship.

About this post

Featured image credit: File:Sextante, Acervo do Museu Paulista da USP (6).jpg, José Rosael/Hélio Nobre/Museu Paulista da USP, CC BY-SA 4.0

Improving Asian American journalists’ biographies

15:25, Thursday, 16 2021 September UTC

Wiki Education recently collaborated with the Wikimedia Foundation and the Asian American Journalists Association (AAJA) to host a 6-week training course in order to give AAJA members and others the time and space to learn how to add Asian American and Pacific Islander journalists’ biographies to Wikipedia.

One course participant was Pamela Ng, homepage editor for Fox News Digital and part of the executive leadership program at AAJA. Previously, she worked for the New York Daily News and PIX11 News. She says she joined to improve her research skills and learn more about Wikipedia. She also wanted to address the dearth of coverage of Asian American journalists on Wikipedia.

“I was shocked to learn that only 4 percent of English Wikipedia’s biographies of American journalists are of people of Asian descent. Representation is important and I hope my contributions will encourage others to help make Wikipedia content more diverse,” says Ng.

Ng appreciates the vast number of Asian American journalists she learned about while doing her own research to add content to Wikipedia. As part of the course, Ng created the biography of CeFaan Kim, an ABC News correspondent and reporter for WABC-TV in New York City. With her contributions on a website viewed by millions of people every day, she feels it is especially important that these biographies gain recognition. Ng is currently writing more biographies of Asian American journalists as well as expanding existing pages in an effort to increase Wikipedia’s Asian American representation.

Ng’s hands-on experience in the AAJA Wiki Scholars course increased her confidence in consulting Wikipedia for information, as she’s now familiar with the extensive work that goes into ensuring Wikipedia’s content is high quality and based on reliable sources. She hopes other detractors will soon join the hundreds of millions of people who use Wikipedia, and realize how impactful Wikipedia can be.

“Wikipedia is often viewed as an unreliable source because anybody can contribute to it. If more people learned about what goes into Wikipedia, I think there’d be less hesitancy in using it as a jumping off point for research or other projects,” says Ng.

To take or sponsor a course similar to the one Pamela took, please visit

NEW YORK — In a divided opinion, the Fourth Circuit dismissed an appeal brought by the Wikimedia Foundation, which challenges the National Security Agency’s mass interception and searching of Americans’ international internet communications. The American Civil Liberties Union, Knight First Amendment Institute at Columbia University, and the law firm Cooley LLP represent the Wikimedia Foundation in the litigation, Wikimedia Foundation v. NSA.

Although the court held that Wikimedia had provided public evidence that its communications with Wikipedia users around the world are subject to NSA surveillance, the court went on to hold that further litigation would expose sensitive information about the government’s spying activities — and that the “state secrets privilege” required dismissal of the suit. The court rejected Wikimedia’s argument that the special procedures Congress enacted in the Foreign Intelligence Surveillance Act (FISA) preempt the state secrets privilege and allow the case to go forward. 

“We are extremely disappointed that the court wrongly credited the government’s sweeping secrecy claims and dismissed our client’s case,” said Patrick Toomey, senior staff attorney with the ACLU’s National Security Project. “Every day, the NSA is siphoning Americans’ communications off the internet backbone and into its spying machines, violating privacy and chilling free expression. Congress has made clear that the courts can and should decide whether this warrantless digital dragnet complies with the Constitution.”

At issue in this lawsuit is the NSA’s “Upstream” surveillance, through which the U.S. government systematically monitors Americans’ private emails, internet messages, and web communications with people overseas. With the help of companies like Verizon and AT&T, the NSA has installed surveillance devices on the high-capacity internet circuits that carry Americans’ communications in and out of the country. It searches that traffic for key terms, called “selectors,” that are associated with hundreds of thousands of targets. In the course of this surveillance, the NSA copies and combs through vast amounts of internet traffic. 

“We respectfully disagree with the Fourth Circuit’s ruling. Now more than ever, it is crucial that people are able to access accurate, well-sourced information, without concern about government surveillance,” said James Buatti, senior legal manager at the Wikimedia Foundation. “In the face of extensive public evidence about NSA surveillance, the court’s reasoning elevates extreme claims of secrecy over the rights of Internet users. We call upon the United States government to rein in these harmful practices, and we will continue to advocate for the privacy and free expression rights of Wikimedia readers, contributors, and staff.” 

Judge Diana Gribbon Motz, who dissented from the court’s state secrets ruling, warned that the majority’s opinion “stands for a sweeping proposition: A suit may be dismissed under the state secrets doctrine, after minimal judicial review, even when the Government premises its only defenses on far-fetched hypotheticals.” 

“For years, the NSA has vacuumed up Americans’ international communications under Upstream surveillance, and to date, not a single challenge to that surveillance has been allowed to go forward,” said Alex Abdo, litigation director of the Knight First Amendment Institute at Columbia University. “The Supreme Court should make clear that NSA surveillance is not beyond the reach of our public courts.”

Wikimedia and its counsel are considering their options for further review in the courts.

For more information about the case: 

The opinion is available here: 

Allegra Harpootlian, 303-748-4051
Lorraine Kenny, 917-532-1623
Gwadamirai Majange

Today marks the start of a Heritage Month focused on celebrating the history, culture, and influence of Latinx communities in the United States. 

The official name of the month itself (National Hispanic Heritage Month) is a living example of the power of language — its history and inequities in who controls it, and its impact on the perceptions and identities of people and their communities.

At the Wikimedia Foundation, the nonprofit that operates Wikipedia and its companion free knowledge projects, we know words matter. We are committed to creating an inclusive, equitable living record of history, stories, and contexts. This often includes righting the historical record — and expanding it to include the perspectives of people left out by systems of power and privilege. 

This Latinx Heritage Month — what we have chosen to call this annual celebration — we are expanding this traditionally US-specific commemoration to celebrate the richness of our global Latinx Wikimedia community, while recognizing the work still needed to be done to achieve authentic representation online. 

We invite you to explore the origins of the term Hispanic; consider the legacies of colonization and the impact of language; and to hear firsthand from some of our Latinx Wikimedia contributors around the world on the importance of filling knowledge gaps about Latinx people and topics on Wikipedia and other Wikimedia projects. 

Why “Latinx Heritage Month” 

The term Hispanic commonly applies to countries with a cultural and historical link to Spain. In other words, it applies to countries previously colonized by Spain. In the US, “Hispanic” has become a broad catchall, referring to persons with a historical and cultural relationship with Spain regardless of their race and ethnicity. For these reasons, many have contested the term and flagged its negative connotations and racist undertones.

When it comes to describing their individual identities, recent research from Pew reveals that just over half of “Hispanic” and “Latino” people have no preference between the two terms. In some cases, the labels are used interchangeably. Another more recent identity label to emerge is “Latinx.” Although not widely adopted, it is considered a more gender- and LGBTQI-inclusive term — and what we have chosen to use during our celebration this month. 

Latinx content gaps on Wikimedia projects 

Wikipedia and other Wikimedia projects, sadly, do not currently reflect the world’s diversity. This results in a less rich, complex, and accurate picture of our world, its people, and its knowledge on our projects. 

Preliminary data from a recent Foundation survey of people in the US indicates that Latinx people, especially women, are dramatically underrepresented among Wikipedia contributors and readers in the United States. The data show that just 22% of Latinx women feel represented on Wikipedia, and only 31% of Latinx women in the US use Wikipedia. Data from our annual Community Insights Report also shows Latinx people in the US are severely underrepresented in our communities, representing only 5.2% of Wikimedia contributors. 

When it comes to the content represented on Wikipedia at large, we know from the Oxford Internet Institute that there are more Wikipedia articles written about Antarctica than many countries in Latin America.

Perspectives of Latinx Wikimedia contributors 

Nearly 20 years ago, the New York Times said that one day, the name of this month may change to “Colombian-Dominican-Cuban-Mexican-Puerto Rican-and-Other Heritage Month.” Why? Because the Latinx community is not monolithic. It is richly, beautifully complex, made up of an array of different identities, cultures, and experiences.

Our goal is for Wikimedia projects and contributors to reflect this rich diversity. Wikipedia is a mirror of the world’s biases — to deliver on our commitment to knowledge equity, we must address barriers that prevent people from both accessing and contributing to free knowledge.

To shed light on our efforts to do just that, we interviewed five Latinx Wikimedia contributors on their experiences in our movement, why they are committed to closing knowledge gaps, and what they want people to know about their heritage:

Carmen Alcázar

Carmen Alcázar (User:Wotancito) is a member of Wikimedia Mexico and a Wikimedian of the Year 2021 Honourable Mention winner. She started the Editatona project in 2015 to increase gender diversity on Spanish Wikipedia; it has since grown to host 60 events in Latin America.

Mónica Bonilla

Mónica Bonilla-Parra (User:Mpbonillap) is on the board of Wikimedia Colombia. She is a linguist and researcher who uses Wikimedia projects to preserve and promote the culture and histories of Indigenous communities in Latin America. She coordinates the Wayuu Digital Project of ISUR and Fundación Karisma, supporting media literacy processes in schools of the Wayuu community in the Colombo-Venezuelan Guajira.

Carla Toro

Carla Toro Fernández (User:Soylacarli) works with Wikimedia Chile to host editing events on Wikimedia projects to improve content on gender, human rights, culture, science, heritage, and more. She also edits Wikipedia in a volunteer capacity, watching for vandalism and verifying information. On Wikidata, she does data quality control and uses queries to identify content gaps. 

Chola fashion in Gran Poder

User:Carlillasa is a member of the Wikimedistas de Bolivia user group. She writes Wikipedia articles about Bolivia, uploads photos, and gives editing workshops in collaboration with fellow volunteers. One of the first articles she wrote was on her favorite Bolivian novel, Íntimas, by author Adela Zamudio.

Selene Yang, who works on the DEI team at the Wikimedia Foundation, is a co-founder of Geochicas, a group of women who work to close the gender gap in the OpenStreetMap community and to bridge the mapping community with the Wikimedia community. She has also led edit-a-thons for the Art+Feminism initiative with TEDIC, a digital rights defender organization in Paraguay, to produce historiographic reviews on the roles of women in the construction of the modern Paraguayan state and to raise awareness of the importance of Wikipedia for the restoration of collective memory.

Why should people care about filling knowledge gaps about Latinx people and topics on Wikipedia and other Wikimedia projects?

  • “As on Wikipedia, we need all versions of history. What we write, do, share is not complete without the vision of women, of the global south, of postcolonial realities, of dissent. We have to ensure that there are seats for women. We have to commit to them having a good experience in our space.” —Carmen Alcázar
  • “The history of Colombia and Colombians on Wikipedia has been told and narrated from places other than Colombia, a situation that generates many biases in the information, but that we can change by involving more Colombians in the projects, in the communities and in their construction. To the extent that we involve more people, more voices, more languages, we will truly fulfill the mission of the Wikimedia movement: to empower and encourage people around the world to gather and develop neutral educational content under a free content license or in the public domain, and to disseminate it effectively and globally. Ultimately, closing the gaps in content, participation, and representation will strengthen and grow the community of volunteers, who make the community exist, continue, and advance.” —Mónica Bonilla-Parra
  • “I feel this is very important because people access the internet — and particularly Wiki projects — to find information and to know more about a subject. So, what happens when the information is simply not there, or when the information provided is shown from an outsider’s perspective? It’s crucial for content about Latin American issues and people to be written from a local point of view, to avoid stereotypes. Furthermore, in these times when the internet is the place where we preserve our history, the fact that there are information gaps on Latin American topics makes us invisible and keeps us out of history. That is basically what information gaps do these days, they leave you out of history, which is unacceptable.” —Carla Toro Fernández
  • “We are not represented on the Wikimedia projects, which are the window to knowledge on the internet. It’s difficult for the rest of the world to understand 1) how complex and diverse our reality is, and 2) we, ourselves, can understand the diversity of the region that we live in. I believe it’s fundamental to be able to go on Wikipedia and see a photo of your city, a photo of your favorite regional dish, an article about your favorite national author. We need to create content for and by us, to not have to feel like orphans of the internet anymore.” —User:Carlillasa
  • “History is always told by those who have the privilege of narrating it; however, the struggle for the living memory of people, collectives and communities is what becomes invisible through epistemic injustice. This has its foundations in the systems of oppression that emerge in the face of any form of disruption of the established order. Closing the gap in the production of knowledge about Latin America not only breaks down the material and symbolic barriers on access to information and the visibility of our memory, but also empowers, from the recognition of ourselves, those of us who historically have not been able to tell our own story.” —Selene Yang

What is one thing you wish people knew about your community, culture, or history?

  • “There are many annual festivals in my country, but the one that I like the most for its cultural importance and its high importance to the family is the Day of the Dead—imagining that on that day my grandma comes to my house for a coffee with milk and a pan de muerto makes me smile. It’s a bit difficult to understand outside of Mexico, but that’s what Wikipedia is for.” —Carmen Alcázar
  • “In Colombia, there are currently about 68 Indigenous languages that have been affiliated to 13 different linguistic families. Wikipeetia is the Wikipedia in Wayuunaiki, a project that has been built by the Wayuu people, who are located in La Guajira Colombo-Venezolana (the ancestral territory of the Wayuu people).” —Mónica Bonilla-Parra
  • “The truth is that I’d like for them to know about so much! Our history is composed of our many native civilizations, which have diverse cultures, traditions, and histories of their own. There is a Wikipedia category titled Culture of Chile, which gathers articles on Chilean culture, from Chilean tea culture to the article about cantineras, female soldiers who fought in the War of the Pacific.” —Carla Toro Fernández
  • “I would like for the world to know that Bolivia is a very diverse country and that all of its social and cultural representations (from the most popular to the most academic) are worthy of attention and respect. In that sense, I consider the article about Bolivian gastronomy and all the articles that have been created lately about food in Bolivia to be a valuable testament of not just the culinary diversity of my country, but also the cultural processes linked to food from prehispanic times, through the colonial times up until our current globalized reality.” —User:Carlillasa
  • “El Güeguense is one of the first plays in America translated from Nahuatl into Spanish. It satirically represents through music, dances, and dramaturgy the convergence between Indigenous cultures and their relationship with the Spanish conquest. It is the force of comedy and wit in protest against the tragedy of the conquest. Currently the play comes to life during the patron saint festivities of my hometown city of Diriamba, Nicaragua.” —Selene Yang

What motivates you to contribute to Wikimedia projects?

  • “In addition to contributing to a greater common good, what motivates me most is that there is so much more to write. … At every opportunity, the story of an incredible woman whose trajectory has been overturned by the patriarchy jumps onto my edit list, so it renews my energy to keep doing this. I stay motivated even if not everything goes well and the attitudes of other male Wikipedians are not appropriate, although sometimes after organizing events and all that entails, there are still people in 2021 who, despite the explicit and clear rules of the projects, still think of Wikimedia projects that do not correspond to the world we live in.” —Carmen Alcázar
  • “The collective construction of humanity’s knowledge. I am passionate about understanding other ways of learning, teaching and building the world, and that is why I have worked and built projects with invisible communities, not only on the Internet but in society. I am also a fan of languages and technology and in Wikimedia I find a special place where my profession, my passion and my motivation connect.” —Mónica Bonilla-Parra
  • “I work in the field of science, where data is almost always kept behind paywalls that prevent people from accessing this information. The Wiki ecosystem changed this by making knowledge accessible to anyone with an internet connection, putting it at a click’s reach. Another thing that motivates me is the fight against fake news, and since everything in Wikipedia needs to have a reliable source, I feel it is the perfect place where trusted information can be found and used to counter the fake information that is generated around some issues, as was the case this last year with vaccines.” —Carla Toro Fernández
  • “It is important to me that my country, with all of its diversity, is well-represented in Wikimedia. Also, I like in general that articles are well written.”  —User:Carlillasa
  • “Currently I contribute more directly with the OpenStreetMap community through the Geochicas collective; however, our projects are also intertwined with Wikipedia. For example, the Streets of Women initiative seeks to generate a visualization where you can count the nomenclature of city streets according to their gender and if the streets named after a woman have an article in Wikipedia. This initiative has led us to generate meetings, editatonas, and workshops to find those women that the public sphere has left out of history. The most motivating thing about these shared learning processes is to recognize the relevance of the relationships between communities and how we all somehow find ourselves fighting for the same goal, such as greater participation and representation of women both in the world’s largest encyclopedia (Wikipedia) as well as in today’s most important open and collaborative geographic database (OpenStreetMap).” —Selene Yang

Jorge Vargas is Senior Regional Partnerships Manager at the Wikimedia Foundation. You can follow him on Twitter at @jorgeavargas.

Improving Wikipedia’s coverage of OER

16:06, Wednesday, 15 2021 September UTC
Virginia Clinton-Lisell
Image courtesy Virginia Clinton-Lisell, all rights reserved.

Although Wikipedia is the world’s largest open educational resource (OER), its coverage of OER-related topics left something to be desired. That’s why Wiki Education collaborated with the GO-GN Global OER Graduate Network and the Hewlett Foundation to run two Wiki Scholars courses aimed at improving Wikipedia’s coverage of OER, broadly defined.

The call for participants snagged the attention of Virginia Clinton-Lisell, an associate professor of educational foundations and research at the University of North Dakota. Virginia’s one of the primary researchers at her institution’s Open Education Group, so she was a natural fit for the course.

“I think often Wikipedia is scoffed at because ‘anyone can edit and write,’” Virginia says. “But the process of learning how to edit and write is quite involved and there are very clear criteria. It was excellent to be taken step by step through everything and get feedback before making my changes live.”

During our Wiki Scholars courses, participants like Virginia work with the course instructor and training materials to learn about the steps involved in adding new content to Wikipedia. The aim is not only to teach participants how Wikipedia works, but also to give them the time and space to make a tangible impact to Wikipedia and the readers who come to learn about these topics.

Virginia improved the article on open textbooks because it’s the primary area of her research. Thanks to her additions, when someone comes to learn about open textbooks, they’ll see that while commercial textbooks produce no difference in learning performance compared to open textbooks, their costs continue to increase. Perhaps making this information more accessible to the public, including school administrators, will help further increase the adoption of open textbooks.

During the course, Virginia also added information about North Dakota legislation to the policy section of the article on open educational resources.

“I liked getting to write about North Dakota’s legislation (even though it was a small addition) just because I’m excited about the initiatives the legislators have passed here,” she says. “I really hope that people who use Wikipedia to learn about OER realize that this movement is big and well researched.”

The course served another purpose for Virginia: It inspired her to incorporate Wikipedia editing into the courses she teaches, using Wiki Education’s Wikipedia Student Program. This fall, students in her introduction to the foundations of education course will further improve Wikipedia’s coverage.

“I realized that having my students edit Wikipedia would be a fantastic way to have them actively be involved as creators of Open Educational Resources,” she says. Having students create OERs — often called Open Educational Practice — is a hallmark of Wiki Education’s programmatic activities, and we’re thrilled when our Wiki Scholars alumni see the value in their own learning experience and choose to pass this on to their students.

Interview conducted by Reema Haque. Hero image credit: MatthewUND, CC BY-SA 3.0, via Wikimedia Commons; image of Virginia courtesy Virginia Clinton-Lisell, all rights reserved.

Wikimedia Projects & AI Tools: Vandalism Detection

10:20, Wednesday, 15 2021 September UTC

There is a machine learning service available to interested Wikimedia projects and communities called ORES. It aims to recognise if an edit, for instance on Wikipedia, is damaging or done in good faith. Of course, false predictions cannot be avoided and thus remain a major risk. Here’s how we try to handle it.  

ORES: A system designed to help detect vandalism

ORES (Objective Revision Evaluation Service) is a web service and API that provides machine learning as a service for Wikimedia projects and is designed to help automate critical wiki-work – for example, vandalism detection and removal. In practice it aims to help human editors, in this case patrollers (volunteers who review others’ edits), identify potentially damaging edits. Importantly, the decision whether an edit is kept or reverted isn’t made by the algorithm; it always remains with the human patroller.

In order to make a prediction about edits, ORES looks at the edit history across Wikipedia and calculates two general types of scores – “edit quality” and “article quality”. In this post, we will focus on the former for the sake of simplicity. 

“Edit Quality” Scores

One of the most critical concerns about Wikimedia’s open projects is the review of potentially damaging contributions. There’s also a need to identify good-faith contributors – who are inadvertently causing damage – and offer them support. 

In its most basic functionality, the ORES machine looks at the history of edits on Wikipedia and assumes that most damaging edits have been reverted rather quickly by human patrollers, whereas good faith edits stay untouched longer. Based on this, ORES gathers statistical data on edits, which it then groups into “features” – things like “curse words added”, “length of edit”, “citation provided”, or “repeated words”. 
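
To picture the quick-revert assumption, here is a minimal sketch in Python. It is an illustration only, not ORES's actual labelling code: the `Edit` record, the 48-hour cut-off and the sample history are all made up for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-off: a revert within 48 hours counts as "quick".
QUICK_REVERT_SECONDS = 48 * 3600


@dataclass
class Edit:
    rev_id: int
    # Seconds until a patroller reverted the edit; None if it was never reverted.
    reverted_after: Optional[int]


def label(edit: Edit) -> str:
    """Assign a training label based on the quick-revert assumption."""
    if edit.reverted_after is not None and edit.reverted_after <= QUICK_REVERT_SECONDS:
        return "damaging"
    return "good"


# Made-up history: reverted after 5 minutes, never reverted, reverted after 30 days.
history = [Edit(1, 300), Edit(2, None), Edit(3, 30 * 24 * 3600)]
labels = {edit.rev_id: label(edit) for edit in history}
```

Note that the third edit is labelled "good" even though it was eventually reverted. Whether that is right depends entirely on the assumption baked into the cut-off.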

These features, the system assumes, help assess whether an edit is made in good faith or damaging. ORES makes a prediction and presents it to a human for further consideration. It is important to emphasise again and again what such machines do: make predictions based on assumptions. The results might change considerably if we were to use different features, which takes us to the next important functionality: feature injection.

Features: The numbers that let machines recognise correlations

The flow of a diff to features to a machine prediction is shown visually. Author: EpochFail, License: CC-BY-SA via Wikimedia Commons
A Wikipedia edit is analysed by the machine using features to predict its quality.

In the pictured example, features of an edit are measured (e.g. “words added” and “curse words added”), because under some circumstances, they correlate with damaging edits. A machine learning algorithm can use features like these to identify the patterns that correlate with vandalism and other types of damage in order to do something useful—like help with counter-vandalism work.

Features are how machines see the world. If some characteristic of an edit suggests that the edit is vandalism, but that characteristic is not captured in any of the features that are measured and provided to a machine learning algorithm, then the algorithm cannot learn any patterns related to it. 
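
A toy version of this pipeline, from edit text to features to a score, might look like the sketch below. Everything in it is hypothetical: the word list, the three features, the weights and the bias were picked by hand for illustration, whereas a real system like ORES learns its model from data.

```python
import math
import re

# Hypothetical word list and hand-picked weights, for illustration only.
CURSE_WORDS = {"damn", "crap"}
WEIGHTS = {"words_added": 0.01, "curse_words_added": 2.0, "citation_provided": -1.5}
BIAS = -2.0


def extract_features(added_text: str) -> dict:
    """Measure the few characteristics this toy model is able to 'see'."""
    words = re.findall(r"[a-z']+", added_text.lower())
    return {
        "words_added": len(words),
        "curse_words_added": sum(word in CURSE_WORDS for word in words),
        "citation_provided": int("<ref>" in added_text),
    }


def damaging_probability(features: dict) -> float:
    """Combine the features into a score between 0 and 1 (logistic function)."""
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1 / (1 + math.exp(-score))


clean = extract_features("The treaty was signed in 1516.<ref>Smith 2004</ref>")
vandal = extract_features("this article is crap crap")
```

With these made-up weights the second edit scores as far more likely to be damaging than the first. Anything that `extract_features` does not measure, such as who made the edit, cannot influence the score at all.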

Feature Injection

Features depend largely on human input and data available. They are thus, just like data and human history, prone to biases. This is why it is crucial that we are open and transparent about how a result was determined and which features were decisive. Wikimedia offers a functionality that allows us to ask ORES which features it used when it made a prediction about an edit. For instance, we can check which features it used when assessing revision number 21312312 of the article French Renaissance on English Wikipedia.

We can go even a step further. ORES lets us toy around with the assessment above by adding and removing features as we see fit. We can either use authentic data from Wikipedia or add synthetic data we came up with in order to see how that would influence the result. What if the article was properly referenced? What if the vocabulary contained more sophisticated terms? We can test those things on an actual article. 
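
As a rough sketch, such a request could be assembled like this. The URL layout (`/v3/scores/<wiki>/<revision>/<model>`), the `features` flag and the `feature.<name>=<value>` parameters are written from our recollection of the ORES v3 API and should be treated as assumptions. The feature name in the example is a placeholder too. The snippet only builds the URL and does not contact any server.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Assumed base URL of the ORES v3 scoring endpoint.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def score_url(wiki: str, rev_id: int, model: str,
              injected: Optional[Dict[str, str]] = None) -> str:
    """Build a URL asking for a score, the features used, and any injected values."""
    params = [("features", "true")]
    # Each injected feature overrides the value measured from the real edit.
    for name, value in (injected or {}).items():
        params.append((name, value))
    return f"{ORES_BASE}/{wiki}/{rev_id}/{model}?{urlencode(params)}"


# Which features were used for the revision mentioned above?
plain = score_url("enwiki", 21312312, "damaging")

# What if the edit had come from a logged-in user? (placeholder feature name)
what_if = score_url("enwiki", 21312312, "damaging",
                    {"feature.revision.user.is_anon": "false"})
```

Comparing the responses to the two URLs would show how much that single feature moves the prediction.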

Feature Injection for Everyone

To be ethical and human centred, machine learning systems must be open and offer people a way to test them, to figure out how they work and react. Not because of the things we know, but because of the things we don’t always immediately realise. We have a gender bias on Wikipedia. Users who identify as female have reported that their edits were seen more critically than when they didn’t report their gender. Biographies about women are fewer and shorter. This alone is enough for a machine learning tool to pick up biases and magnify them and it might be very hard for us to recognise this. Similar biases exist with race, language, age and many other categories. And that is precisely why we need systems that are transparent. Ideally they will be open to everyone, and ideally not only on Wikipedia. No, this won’t solve the issue of bias, but it might limit it. And it will give civil society and researchers a tool to counter negative developments.

A miniseries on machine learning tools: Machine Learning and Artificial Intelligence technologies have the potential to benefit free knowledge and improve access to trustworthy information. But they also come with significant risks. Wikimedia is building tools and services around these technologies with the main goal of helping volunteer editors in their work on free knowledge projects. But we strive to be as human centred and open as possible in this process. This is a miniseries of blog posts that will present tools that Wikimedia develops and uses, the unexpected and sometimes undesired results and how we try to mitigate them.

14 September 2021, San Francisco, California — Today, the Board of Trustees of the Wikimedia Foundation announced the appointment of Maryana Iskander as the organization’s new CEO. She is a globally recognized social entrepreneur and an expert in building cross-sector partnerships that combine innovative technology with community-led solutions to close opportunity gaps.  

As CEO of Wikimedia Foundation, the global nonprofit organization that supports Wikipedia and 12 other free knowledge projects, Maryana will champion the organization’s goal to ensure that people everywhere can access and share knowledge freely. She will formally begin on January 5, 2022 and report to the Foundation’s Board of Trustees. 

Since 2013, Maryana has served as CEO of Harambee Youth Employment Accelerator, a South African non-profit social enterprise focused on building African solutions for the global challenge of youth unemployment. Under her leadership, Harambee received the prestigious Skoll Award for Social Entrepreneurship in 2019 for its model, which now supports over 1.5 million youth with access to learning and earning opportunities. Throughout her career, Maryana has sought to break down barriers that impede access to information and opportunity.

“Maryana’s approach to leadership is based around collaboration and community,” said Nataliia Tymkiv, Acting Chair of the Board of Trustees of the Wikimedia Foundation. “She has deep appreciation for the role that volunteer-led communities can play in addressing social challenges. Throughout her career, she has driven tangible impact on issues from healthcare to unemployment. We believe that she will be a powerful champion to grow the Wikimedia movement and increase global access to free knowledge.”

“Today, societies across the world are confronted by systemic challenges that require the best of human-led and technology-enabled solutions. This remarkable global movement demonstrates how powerful that combination can be in ensuring that every human can freely share in the sum of all knowledge,” said Maryana Iskander. “I am honored to support this inspiring vision and build a more equitable future for knowledge together.”   

The Wikimedia Foundation currently has over 500 employees around the world, with an annual budget of over $100 million. The Foundation operates the technology infrastructure that enables more than 18 billion visits to Wikipedia monthly and advocates for policies globally that protect and advance access to information. In her role as CEO, Maryana will take on the urgent task of expanding access to, and participation in, free knowledge globally, as the threats of widespread misinformation and online censorship grow more dire.  

Wikimedia also supports a movement of over 280,000 volunteer contributors who edit Wikipedia and its sister sites every month. Maryana will collaborate closely with the volunteer movement to make progress towards a shared vision for the future of Wikimedia which prioritizes knowledge equity, helping to close knowledge gaps in content on Wikimedia projects and increasing the diversity of contributors to the sites by reducing barriers to knowledge that prevent women and marginalized communities from equitable participation. 

Maryana also brings experience from leadership roles in the public, private, and social sectors. She spent more than half a decade as the Chief Operating Officer of Planned Parenthood Federation of America, a volunteer-led social movement focused on access to healthcare. Maryana was also the Advisor to the President of Rice University, an associate at global consulting firm McKinsey & Company, and a law clerk on the United States Court of Appeals for the Seventh Circuit. 

Born in Cairo, Egypt, Maryana was educated in the United States and the United Kingdom. She holds a B.A. magna cum laude from Rice University, an M.Sc. from Oxford University as a Rhodes Scholar, and a J.D. from Yale Law School, where she received a Distinguished Alumna Award. Maryana is also a Truman Scholar, a Henry Crown Fellow, and a member of the Aspen Global Leadership Network. She serves on the board of World Education Services.

About the Wikimedia Foundation

The Wikimedia Foundation is the nonprofit organization that operates Wikipedia and the other Wikimedia free knowledge projects. Our vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge, and that everyone should be able to access that knowledge freely. We host Wikipedia and the Wikimedia projects, build software experiences for reading, contributing, and sharing Wikimedia content, support the volunteer communities and partners who make Wikimedia possible, and advocate for policies that enable Wikimedia and free knowledge to thrive. 

The Wikimedia Foundation is a charitable, not-for-profit organization that relies on donations. We receive donations from millions of individuals around the world, with an average donation of about $15. We also receive donations through institutional grants and gifts. The Wikimedia Foundation is a United States 501(c)(3) tax-exempt organization with offices in San Francisco, California, USA.

A little-known naturalist from Chikkaballapur

11:22, Tuesday, 14 2021 September UTC
Bangalore, being an administrative centre with a mild climate, has historically had a fair share of colonial natural history collectors and naturalists. We know a fair bit about the botanists who walked this region and a bit about hunters of larger game but rather little about those who studied insects. A few years ago I became aware of the Campbell brothers from Ireland (but of Scottish origin). It took some time to put together the Wikipedia entries on them, which is where more straightforward biographical details may be found.

After a trip to the Nandi Hills [to examine a large number of heritage Eucalyptus trees (nearly 200 years old) that the Horticulture Department had decided to cut down to the stump, supposedly because falling branches were seen by the Archaeological Survey of India as a threat to heritage buildings nearby], some of us decided to visit Chikballapur to examine the place of work of Dr Thomas Vincent Campbell (1863 - 16 December 1930). "T.V.", as he was known to his friends, was a missionary doctor with the London Missionary Society and had worked briefly at Jammalamadugu, where his older brother William Howard Campbell (20 September 1859 - 18 February 1910) had worked as a missionary. Another brother back in Derry, David Callender Campbell (1860-1926), was also a keen observer of moths and a botanist. In their younger days in Derry, they and their siblings had put together a "family" museum of natural history that was said to be among the best in the region! William was the oldest of nine siblings and appears to have been the sturdiest, considering that he was a champion rugby player at Edinburgh University. He moved to Cuddapah in 1884 and he may well have been the first person to see Jerdon's courser in life - Jerdon, Hume, and others appear to have dealt only with specimens obtained from local hunters. William collected moths, many of which appear to have gone to Lord Rothschild, and nearly 60 taxa were described on their basis by Hampson. In 1909, he was to become director of the United Theological College, Bangalore, but ill health (sprue) forced him to return to Europe and he died in 1910 in Italy. His Cuddapah-born son Sir David Callender Campbell (1891 – 1963) became a prominent Northern Ireland politician. William's life is covered in some detail by Alan Knox while examining the only known egg of Jerdon's courser. A biography (a bit hagiographic though) of William in Telugu also exists.

T.V.'s life, on the other hand, was hard to find information on, although we knew of his insect specimens. He was in contact with E.A. Butler, who specialized in the life histories of insects, and T.V. seems to have taken after him: he not only collected bugs (i.e. Hemiptera) but also made notes on them, which were used by Distant in the Fauna of British India. Several insects that T.V. collected have never been seen again. T.V. moved to Chikaballapur and worked at the Ralph Wardlaw Thompson Memorial Hospital, which is now just known as the CSI Hospital and is largely in disrepair. The hospital in its heyday was among the few in the region and treated a large number of patients. After suffering from tuberculosis, he also established a TB sanatorium at Madanapalli. Campbell treated nearly a thousand cases of cataract and was awarded a Kaisar-i-Hind medal for his work in 1908. Campbell appears to have made a very large collection of insects from Cuddapah, Chikballapur, and from the Ooty area (where he would have spent summers). Many of these are now in the Natural History Museum in London and a good number are type specimens (i.e. the specimens on the basis of which new species were described). Professor C.A. Viraktamath, entomologist and specialist on the leafhoppers, has for many years searched for a supposedly wingless Gunhilda noctua which was collected from the Nilgiris. Based on T.V.'s connections, I believe the place to look for it would be somewhere in the vicinity of the church in Ketti. Considering the massive alteration in habitats, there is a slight chance that the species has gone extinct but it is doubtful that it was so narrow in its distribution.
W.H. Campbell

Dr T.V. Campbell
T.V.'s former home in Chikaballapur

Dr TV attending to patients in Chikaballapur, c. 1912

A lane inside the hospital premises named after T.V.

Foundation stone of the hospital

The Wardlaw Thompson Hospital c. 1914

Gunhilda noctua - a monotypic genus never seen since T.V. found them for W.L. Distant to describe in 1918. From The Fauna of British India. Rhynchota Vol.II

The Wikipedia entries can be found at T.V. Campbell and W.H. Campbell. Many people helped in the development of these articles. Roy Vickery kindly obtained a hard-to-find obituary of T.V., Alan Knox sent me some additional sources on W.H.C., and Susan Daniel, librarian at the United Theological College, was extremely helpful. Arun Nandvar drove and S. Subramanya joined our little adventure in Chikaballapur. Dr Eric Lott made enquiries with the SOAS and LMS archives but found little. My entomologist friends and mentors, Prashanth Mohanraj and Yeshwanth H.M., shared their enthusiasm in discovering more about T.V.

Asian American Journalists on Wikipedia

15:54, Monday, 13 2021 September UTC

Heather J. Sharkey has been working with undergraduate and graduate students on Wikipedia projects since 2019, with the goal of promoting public-facing scholarship. She is a professor in the Department of Near Eastern Languages and Civilizations at the University of Pennsylvania.

Dr. Heather Sharkey
Image by CallMeBarcode, CC BY-SA 4.0 via Wikimedia Commons.

The Asian American Journalists Association (AAJA) partnered with Wiki Education to host a Wiki Scholars training course, funded by the Wikimedia Foundation, in July 2021. The goal was to increase representation of journalists of Asian origin by equipping participants with skills to improve existing articles or write new ones. Though neither a journalist nor a person of Asian heritage, I was privileged to join the AAJA group when a spot opened up.  I used the opportunity to write four new articles about Asian American journalists, including Nancy Yoshihara, a longtime reporter for The Los Angeles Times, who was one of the AAJA’s founders.  In the process, I strengthened editing and coding skills that I expect to apply at the University of Pennsylvania, where I teach modern and contemporary Middle Eastern history, and where I have been incorporating writing for Wikipedia into my courses during the past two years.

The AAJA has a mission, which is “advancing diversity in newsrooms [to] ensure fair and accurate coverage of communities of color.” Established in California in 1981, the AAJA has grown to welcome journalists of Asian and Pacific Islander heritage and supporters of other backgrounds within North America and the world.  It understands Asia widely: everything from the eastern Mediterranean region (western Asia, including part of the Middle East) to South, Central, East, and Southeast Asia, and into the Pacific arena.

The AAJA originally had a strong American focus.  Its founders were responding to a history of popular American anti-Asian sentiment which went back to the nineteenth century and gained expression through laws like the Chinese Exclusion Act of 1882.  Conscious of this past, the AAJA’s website continues to affirm its goal of promoting “equitable and accurate coverage of Asian Americans and Pacific Islanders (AAPIs) and AAPI issues,” largely by encouraging AAPI students to enter media careers and by offering mentorship to journalists of Asian and Pacific Islander origin or heritage.

To prepare for this Wiki Education course, I read about the AAJA – and that was when I realized that its co-founder Nancy Yoshihara lacked a page on Wikipedia.  Given Wikipedia’s well-known gender gap, the absence did not entirely surprise me, and I was determined to address it.  In looking for sources about Yoshihara’s career, I found an interview that she gave on C-SPAN in 1997 in conjunction with an AAJA meeting in Boston that featured a panel on “The Price of Asian Political Involvement.”  Yoshihara, then president of the AAJA’s Los Angeles chapter, cited the 1996 election in Washington State of Gary Locke (b. 1950), who became the first Asian-American governor in the continental United States.  She also cited concerns about disturbing portrayals of Asians and Asian Americans in American mass media, which in some cases entailed propagation of Charlie-Chan- and martial-arts-style stereotypes and allegations of political manipulation through campaign donations. Nearly twenty-five years have passed since Yoshihara discussed these phenomena in her C-SPAN interview.  And yet, the recent upsurge in anti-Asian hate crimes in the United States – including the March 2021 Atlanta spa shootings – points to the persistence of the problem of American xenophobia towards people of Asian background and the continuing relevance of the AAJA’s efforts to promote inclusion and understanding via reporting.

Another journalist about whom I wrote for Wikipedia is Arun Venugopal, who grew up in Texas as the son of parents who immigrated from India.  In print media and on radio, Venugopal has addressed issues facing Asian American and other communities.  He has discussed, for example, popular discourses about Asian Americans as a “model minority” and how such ideas have contributed to broader patterns of racism and xenophobia towards immigrants and people of color in the United States.

By participating in this Wiki Education course, I realized that while the AAJA may have been an American organization upon its foundation, its scope has steadily widened. Now stretching far beyond California, and counting more than 1,500 members, the organization is increasingly international. This point became clear in the weekly meetings that Wiki Education’s Will Kent led by Zoom for AAJA program participants, who tuned in from places ranging from Denver to Delhi and Seoul.

Developing these articles for this AAJA Wiki Scholars course also alerted me to a particular challenge of writing journalists’ biographies: journalists tend to write about others, not themselves, which can make it hard to find basic information about them.  Perhaps their instinct for security and confidentiality – for their sources, as for themselves – explains this discretion.  Despite my best efforts, for example, I could not find a birth year for either Yoshihara or Venugopal.  Fortunately, since Wikipedia articles are always works-in-progress, future researchers may find the information and fill these gaps later on.  These lessons about sourcing and revision are ones that I will pass on to my students at Penn.

Hero image of Penn campus: Kevin83002, Public domain, via Wikimedia Commons

Production Excellence #35: August 2021

13:23, Monday, 13 2021 September UTC

How’d we do in our pursuit of operational excellence last month? Read on to find out!


Zero documented incidents last month. Isn't that something!

Learn about past incidents at Incident status on Wikitech. Remember to review and schedule Incident Follow-up tasks in Phabricator; these are preventive measures and other action items to learn from.

Image from Incident graphs.


In August we resolved 18 of the 156 reports that carried over from previous months, and reported 46 new failures in production. Of the new ones, 17 remain unresolved as of writing and will carry over to next month.

The number of new error reports in August was fairly high at 46, compared to 31 reports in July and 26 reports in June.

The backlog of "Old" issues saw no progress this past month and remained constant at 146 open error reports.

💡 Did you know:

You can zoom in to your team's error reports by using the appropriate "Filter" link in the sidebar of our shared workboard.

Take a look at the workboard and look for tasks that could use your help.

View Workboard


Last few months in review:

Jan 2021 (50 issues) 3 left.
Feb 2021 (20 issues) 6 > 5 left.
Mar 2021 (48 issues) 13 > 10 left.
Apr 2021 (42 issues) 18 > 17 left.
May 2021 (54 issues) 22 > 20 left.
Jun 2021 (26 issues) 11 > 10 left.
Jul 2021 (31 issues) 16 > 12 left.
Aug 2021 (46 issues) + 17 new unresolved issues.


156 issues open, as of Excellence #34 (July 2021).
-18 issues closed, of the previously open issues.
+17 new issues that survived August 2021.
155 issues open, as of today (3 Sep 2021).
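
The tally above is a simple running balance. As a minimal sketch (the function and parameter names are illustrative, not part of any Wikimedia tooling), the same arithmetic in Python looks like this:

```python
def open_after_month(open_before, closed_old, new_unresolved):
    """Return the open-issue count after one month of triage.

    open_before    -- issues open at the previous report
    closed_old     -- previously open issues closed this month
    new_unresolved -- issues reported this month that are still open
    """
    return open_before - closed_old + new_unresolved


# Figures taken from the August 2021 report above.
open_now = open_after_month(open_before=156, closed_old=18, new_unresolved=17)
print(open_now)  # 155
```

Only the new reports that remain unresolved are added; the other 29 of the 46 reports filed in August were already closed and never entered the backlog.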

For more month-over-month numbers refer to the spreadsheet.


Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

Tech News issue #37, 2021 (September 13, 2021)

00:00, Monday, 13 2021 September UTC

weeklyOSM 581

09:28, Sunday, 12 2021 September UTC



Mapping individual parking spaces [1] © Lejun | map data © OpenStreetMap contributors

Mapping campaigns

  • Радченко Алексей invited (ru) us to participate in the OpenStreetMap markup contest (ru) > en running from 10 to 30 September. There are simple instructions and dozens of prizes to be won. Join in.
  • The OSM-US project to map playgrounds in Philadelphia (we reported earlier) was completed on 1 September.


  • Jeroen Hoek noticed a page on the wiki relating to a proposal to regularise the tag china_population. The tag seems to be designed to change the way some larger cities in China are rendered.
  • The authorities of the French commune of Torsac have recently completed a project to name streets and assign house numbers. Wilfried, a commune representative, asked (fr), on the OSM-FR forum, why these addresses and street names do not yet appear in OSM. The answer turned out to be somewhat complex, as the long thread in the forum shows. However, the issue was resolved by Christian Quest.
  • Andrés (User AngocA) from Bogotá explained (es) > en in a diary entry how to analyse Notes and how, if there are many Notes open in the area of interest, you can develop a strategy to close them.
  • [1] User Lejun has written up some interesting tips for mapping individual parking spaces.
  • Monika Tota asked (pl) members of the OpenStreetMap Polska group how they tag properties that are somewhat subjective, like smoothness=good vs intermediate or wheelchair=yes vs limited. People responded with many methodologies they use, pointing out that reality often resists clear-cut classification and that this question is related to the issue of cartographic generalisation.
  • Voting is open until Sunday 19 September for the proposal club=cadet, intended to be used to map the locations where various Youth Cadet groups meet, together with details of each group.


  • User Assange expressed their dismay at mapping changes in China and Taiwan which they perceive to be politically driven. For example, is this a vacation school or concentration camp?
  • GeOsm has launched its open-source, globally distributed map database service for countries around the world.
  • JaLooooNz wrote a diary entry about the use of service=driveway and the ‘need’ for a service=residential_driveway.
  • OpenStreetMap Belgium has chosen Constantine Tumwine from Tanzania as Mapper of the Month and interviewed him.
  • Noé discovered (fr) that a town in Ivory Coast with a population of over 30,000 bears his first name, Noé. It wasn’t mapped on #OpenStreetMap, so he took care of it.
  • Feye Andal shared that Youthmappers in the Philippines grew from four to nine local chapters in just one year.

Humanitarian OSM

  • HOT published their Annual Report for 2020 (covering the period from July 2020 to June 2021). Note the report is large and may load slowly.
  • The Open Mapping Hub – Asia Pacific, organised by HOT, is launching a newsletter to share highlights of open mapping projects, communities, and opportunities across the region and invites you to subscribe.


  • has released a 2021 Fall foliage prediction map showing the progressive change of leaves’ colour through the USA.
  • used (fr) > en data from the French national statistical institute, INSEE, in order to map 4,000 hairdressers in France with, mostly bad, puns in their names. While using OpenStreetMap as a baselayer, clicking on a shop marker links to Google Maps.
  • The Ordnance Survey, Britain’s national mapping agency, withdrew support for OS OpenSpace on 31 August. OpenSpace was launched in 2008 using OpenLayers. OSM’s founder, Steve Coast, worked as a consultant to the OS on the initial project. Replacement services continue to be available via the new hub, but a number of existing applications and websites no longer work.

Open Data

  • After a small pilot project in 2017, the Church of England announced their intention to map graves in all 19,000 churchyards in their care. The mapping, funded by Historic England, the National Lottery Heritage Fund and Caring for God’s Acre, uses LiDAR and is expected to take seven years. It is hoped that the resulting data will be open.



  • Christian Quest wrote (fr) > en about his summer work on optimising performance for the new French tile server.


  • Bryan Housel announced release 1.1.7 of RapiD, including bugfixes and validation improvements.
  • A September release (2021.09.01-6-android) of Organic Maps on Android is now available on Play Store.

Did you know …

  • … the OpenParkingMap? Forked by jakecoppinger from zlant’s Parking Lanes Viewer, this version has Australian parking signs.
  • … that OSMCha is a tool for checking OSM changesets for data quality? The tool was developed by Wille Marcel in 2015, and Wille remains the maintainer.
  • … that you can find user names or the user id associated with a partial user name using Who’s That? Particularly helpful if you can’t remember how a user name is capitalised.

OSM in the media

  • Natfoot provided an audio-visual version of weeklyOSM 580 on YouTube.

Other “geo” things

  • The Guardian previewed the new book Atlas of the Invisible by UCL geographer James Cheshire and graphics designer Oliver Uberti. The book contains many visualisations of geographic data relating to climate change.
  • Episode 84 of the Geomob podcast featured Dave Gee, creator of hand-drawn Doodle Maps.
  • OpenCage’s latest #geoweirdness thread is focused on Canada.

Upcoming Events

Where | What | When
- | OSM Africa Monthly Mapathon: Map Malawi | 2021-09-04 – 2021-10-04
Bogotá Distrito Capital | Agreguemos y editemos rutas de transporte en OpenStreetMap | 2021-09-11
Zürich | Mapping-Party/132. OSM-Treffen Zürich | 2021-09-11
Arlon | Réunion des contributeurs OpenStreetMap, Arlon | 2021-09-13
臺北市 | OpenStreetMap x Wikidata 月聚會 #32 | 2021-09-13
Hamburg | Hamburger Mappertreffen | 2021-09-14
- | PHXGeo Meetup (Phoenix, AZ, US) | 2021-09-15
- | The ISPRS SC Webinar Series: Collaborative Humanitarian Mapping with PoliMappers and UN Mappers | 2021-09-16
Karlsruhe | Karlsruhe Hack Weekend | 2021-09-17 – 2021-09-19
Nantes | Journées européennes du patrimoine 2021, Nantes | 2021-09-18
Grenoble | Atelier OpenStreetMap – retrouvailles et initiation ! | 2021-09-20
Lyon | Rencontre mensuelle Lyon | 2021-09-21
Bonn | 143. Treffen des OSM-Stammtisches Bonn | 2021-09-21
Berlin | OSM-Verkehrswende #27 (Online) | 2021-09-21
Lüneburg | Lüneburger Mappertreffen (online) | 2021-09-21
- | DRK Missing Maps Online Mapathon | 2021-09-23
- | [Online] OpenStreetMap Foundation board of Directors – public meeting | 2021-09-24
Düsseldorf | Düsseldorfer OSM-Treffen (online) | 2021-09-24
Amsterdam | OSM Nederland maandelijkse bijeenkomst (online) | 2021-09-25
- | FOSS4G 2021 Buenos Aires – Online Edition | 2021-09-27 – 2021-10-02
Bremen | Bremer Mappertreffen (Online) | 2021-09-27
Grenoble | Mapathon Missing Maps – Cartographier des cartes humanitaires sur un mode collaboratif et libre. | 2021-09-28
Bruxelles – Brussel | Virtual OpenStreetMap Belgium meeting | 2021-09-28

If you would like to see your event here, please put it into the OSM calendar. Only events entered there will appear in weeklyOSM.

This weeklyOSM was produced by Lejun, Nordpfeil, PierZen, SK53, Supaplex, TheSwavu, YoViajo, derFred.

This Month in GLAM: August 2021

00:31, Sunday, 12 2021 September UTC

Improving Wikipedia’s coverage of 9/11

18:46, Friday, 10 2021 September UTC

Tomorrow is the 20th anniversary of the September 11 attacks — and as many people reflect on the milestone, some will turn to Wikipedia to read about this moment in history and the widespread impacts of it. The attacks occurred in Wikipedia’s first year of existence, and played an important role in shaping the culture of the nascent encyclopedia project. A recent article in Slate by Stephen Harrison provides a nice overview of Wikipedia’s coverage and explores how Wikipedia and the War on Terror “grew up together”. But as the 20th anniversary approaches, Wikipedia’s articles related to the attacks and their aftermath don’t get the sort of editing attention they once did, and it shows. The Guantanamo military commission article, for example, had a banner informing readers that its “factual accuracy may be compromised due to out-of-date information” — a banner that someone added to the page in November 2010.

For the last two months, Wiki Education, in collaboration with our partners at ReThink Media, has been addressing content gaps within Wikipedia’s articles related to September 11, the War on Terror, and related topics. We’ve been leading a ReThink Media Wiki Scholars course, where we brought together a group of peace and security studies experts to identify content gaps in Wikipedia’s coverage. We taught them to edit Wikipedia and navigate policy, something that’s especially important when working in an area where strong feelings persist.

One of Wikipedia’s most active WikiProjects, or collectives of editors tackling a particular topic area, is WikiProject Military History. Articles related to the military often have extensive coverage of the specifics of war — but this approach has led to gaps in the context of humanitarian implications. During the course, we had several conversations about whether Wikipedia articles should include this kind of information, or whether the goal was to primarily provide accounts of campaigns and operations. We came to the consensus that Wikipedia’s goal is to provide an overview of all relevant information, which necessarily includes the humanitarian impacts of war. As a result of this, the participants updated information in the article on the War on terror, including adding a previously absent section on civilian casualties in various countries and war zones.

Other articles improved by the group include the September 11 attacks article, in which a contributor added subsections to the “domestic response” section about discrimination and racial profiling of Arab Americans and interfaith efforts to educate people about the Muslim faith. Another tackled the Post-9/11 article, adding a section about discriminatory backlash. And the Islamophobia in the United States article now has a section on Islamophobia in places of worship, thanks to a participant in the course.

A previously short article about Holy Shrine Defenders got an overhaul from another participant, resulting in a significant expansion. Information related to the Guantanamo Bay detention camp and the United States v. Khalid Sheikh Mohammed case also saw significant edits from course participants. One Wiki Scholar updated and rewrote the Guantanamo military commission article, finally allowing the removal of that 11-year-old warning banner.

But sometimes smaller changes can have a big impact. In the lead section of the September 11 attacks article, al Qaeda is described as “Wahhabi”. One participant removed that term because it was inaccurate. Their edit was reverted by a Wikipedian because the statement was sourced, and the discussion on the article’s talk page didn’t come to a resolution. In our class session, the Wiki Scholar asked how best to proceed. Looking at the sources, it was fairly obvious that two were weak, but one came from an academic source, which meant it wasn’t the sort of thing that could be dismissed out of hand. But then a course participant who had the book on their own bookshelf referenced the cited page and found the relevant quote: “Because Osama bin Laden and most of the hijackers are Saudi nationals, it was assumed that al-Qaeda is an expression of Wahhabism. That is not the case.” Once the precise quote was supplied, the editors engaging on the talk page were able to reach consensus quickly.

Real world events overtook our course as people had to miss sessions to do press interviews after the Fall of Kabul, and many of them were personally impacted as they worried about the safety of colleagues and friends who were trying to escape Afghanistan. But despite that, they continued to work to improve Wikipedia, understanding that improvements like these were critical in the weeks leading up to the 20th anniversary of the September 11 attacks, when readership of these articles is skyrocketing. In a few short weeks, the articles our subject-matter experts improved have received more than 1.4 million page views – and we expect that number to rise even more tomorrow and in the coming weeks. That means millions of people searching for neutral, fact-based information around this anniversary now get a more nuanced picture of the impacts the attacks have had over the past 20 years.

For as Slate’s Stephen Harrison writes, “As we approach the 20th anniversary of Sept. 11, Facebook users are likely to see 9/11 tributes selected by an algorithmic assessment of that user’s content preferences, part of the personalized, polarized social media experience. On the other hand, every English Wikipedia user who visits the current page for the September 11 attacks this week will see the same article regardless of their demographic profile.”

Interested in partnering with Wiki Education to improve Wikipedia’s coverage of a subject area? Visit

Image credit: Carol M. Highsmith, Public domain, via Wikimedia Commons

How writing for Wikipedia helps journalists

16:16, Friday, 10 2021 September UTC

Yiwen Lu is a student journalist based in Chicago and the Communications Director of the Asian American Journalists Association (AAJA). She was a participant in Wiki Education’s recent AAJA Wiki Scholars course, which was made possible by the Wikimedia Foundation. In the course, AAJA’s members worked together to increase Wikipedia’s coverage of Asian American and Pacific Islander journalists’ biographies. 

“I use Wikipedia a lot personally and professionally, and it struck me that minority journalists are underrepresented in Wikipedia biographies. That inspired me to take this opportunity and learn more about how people edit Wikipedia articles that end up being a full encyclopedia,” Lu says.

Lu created an entry for Jiayang Fan, who is a staff writer for The New Yorker. She compiled many of Fan’s interviews and other sources to be able to write a comprehensive biography.

Taking the AAJA Wiki Scholars course gave Lu a fresh perspective on how contributing to Wikipedia works. Previously an avid reader of Wikipedia, she went from being a consumer to being a writer confident in creating articles from scratch.

“Previously as a user/reader of Wikipedias, I would only pay attention to the actual Wikipedia page, but the course taught me about different sections in the writing process, various pages in addition to the main Wikipedia, as well as many fun resources,” Lu says. “It introduced me to the behind-the-scenes parts of Wikipedia. So after taking the course, I am comfortable with creating an article from scratch, adding citations, as well as improving existing articles through using talk pages, for example.” 

As journalists emphasize the importance of reporting accurate information from reliable sources, the AAJA Wiki Scholars course taught Lu that Wikipedians share similar values. A major feature of Wikipedia is including citations in every article, a tool journalists can use to pinpoint sources for their own work.

“Beyond learning about the topic itself through contributing, I think learning how citations are made in Wikipedia would be helpful for journalists when we are looking into an issue, as it helps us to trace back to the original sources,” Lu says. “I personally found the process of creating a Wikipedia entry helpful for me to read about multiple sources on the topic.”

Throughout the course, Lu enjoyed engaging in genuine conversations with other Wiki Scholars while picking up on how new Wikipedia articles come to life.

“Looking at the back end of things – the talk pages and the edit history have been really fun to look at. Those helped me visualize how one builds a Wikipedia from scratch,” Lu says. “Reading user pages of individual users has also been a really fun part; I would never imagine that Wikipedia can also serve this additional social function to bring the community together.” 

Because the information on Wikipedia is publicly accessible, Lu sees great benefit in teaching others how to contribute to it, and in publishing information that gives credit where it’s due.

“As something that is openly accessible to everyone on the internet, Wikipedia is in a really unique and important position to educate people, so having diverse perspectives makes sure that we are telling the stories that need to be heard,” Lu says.

Lu hopes her contribution to Wikipedia provides representation for other aspiring journalists, especially Asian Americans. Her experiences when starting out in her journalism career explain why she took the initiative to enroll in the AAJA Wiki Scholars course.

“Personally, when I started to get interested in journalism during college, there were few AAPI journalists around me, and there was no role model for me to look at. As a result, I have been lost for the first couple of months of navigating this career,” Lu says. “The more I report and write, the more I realized that there were few coverages of the AAPI community because there are not many AAPI journalists. A lot of the time, journalists don’t look like the community they cover. Therefore, for the sake of both helping aspiring journalists who hope to get into the industry as well as helping the community find the right person to tell their stories, I found it important to participate in the AAJA Wiki Scholars course and contribute to the representation of minority journalists – and not even journalists, but people of color in all professions.”

To take a course like Yiwen’s, please visit

Image courtesy Yiwen Lu, all rights reserved.

Outreachy report #24: August 2021

00:00, Friday, 10 2021 September UTC

Highlights

  • We said goodbye to many of our wonderful May 2021 cohort interns
  • We had to deal with an increasing number of extensions, one including a CoC incident
  • We processed their final feedback
  • We found wonderful initial application reviewers!

May 2021 cohort

Interns (now alums) who didn’t have extensions finished their internships this month. In the last conversation we all had, many of them expressed their appreciation for our bi-weekly chats.

Sharing the accomplishments of an amateur scientist

15:33, Thursday, 09 2021 September UTC
Britt Forsberg

Britt Forsberg has extensive experience in science. Currently, she coordinates training and service opportunities for the Minnesota Master Naturalist program, where she prepares volunteers for service in conservation and connects them to stewardship, research, and education volunteer opportunities.

When Forsberg learned about the Wiki Scientists course through 500 Women Scientists, she was eager to take up this opportunity to increase representation of women in STEMM on Wikipedia.

Forsberg edited the page for Miriam Rothschild, a British natural scientist who contributed to zoology, entomology, and botany. She selected this scientist because, she says, she wanted to acknowledge so-called ‘amateur’ scientists, who she believes deserve the same recognition as those with traditional academic credentials.

“Even though she didn’t have the academic credentials that many people find necessary, she was incredibly knowledgeable and made huge contributions to entomology,” Forsberg said.  “I also found her previous page disappointing in the ways it called attention to her lack of educational background and her husband’s remarriage after their divorce instead of her scientific achievements.”

Forsberg says that if it had not been for the Wiki Scientists course, she would not have had the chance to properly dive into researching Rothschild. Her dedicated work paid off: within a short time of being published on Wikipedia, it had drawn many views.

“I was amazed at the number of page views our articles had just in the small time we worked in the cohort so I think it’s clear that Wikipedia is a major player and that people pay attention to what is posted there.  It’s very important that Wikipedia users can see themselves somewhere in Wikipedia,” Forsberg says.

What made Forsberg’s time in the Wiki Scientists course memorable was the chance to work and connect with others towards similar goals. She hopes others in academia who still dismiss Wikipedia as a good starting place will soon see its purpose and place in the information landscape.

“I think some people in academia can dismiss Wikipedia as a source but participating in the course would show them what a rich resource it is,” she says.

Forsberg looks forward to spreading the word and knowledge about Wiki Education’s useful services and how contributions like this impact others.

“We’re starting to look at how we can use Wikipedia in our program,” she says. “We’re trying to represent more diverse perspectives in our field and while we could manage that information, having our participants work in Wikipedia means that their information will find a much larger audience. It also solves some technical problems for us in that we don’t have to maintain a website, host server space, etc., and other Wikipedia users can help us stay on top of things like plagiarism. Everyone benefits this way; our program, our participants, and Wikipedia users across the world.”

To take a course like Britt’s, please visit

Image credits: Open Media Ltd., CC BY-SA 3.0, via Wikimedia Commons; Mountainairy, CC BY-SA 4.0, via Wikimedia Commons.

By Miriam Redi, Fabian Kaelin, Tiziano Piccardi

Colossal octopus by Pierre Denys de Montfort, Public Domain, and EMS VCS 3 by Standard Deviant, CC BY-SA 2.0

It’s often said that an image is worth a thousand words, but for the millions of images and billions of words on Wikipedia, this idiom doesn’t always apply. Images are essential for knowledge diffusion and communication, but less than 50% of Wikipedia articles are illustrated at all! Moreover, images on articles are not stand-alone pieces of knowledge: they often require large captions to be properly contextualized and to support meaning construction. 

More than 300M people in the world have visual impairments, and billions of people in the Global South with limited internet access would benefit from text-only documents. These groups rely entirely on the descriptive text to help contextualize images in Wikipedia articles. But only 46% of images in English Wikipedia come with a caption text, and only 10% have some form of alt-text, with 3% having an alt-caption that is appropriate for accessibility purposes. This lack of contextual information not only limits the accessibility of visual and textual content on Wikipedia, but it also affects the way in which images can be retrieved and reused across the web.

Several Wikimedia teams and volunteers have successfully deployed algorithms and tools to help editors fix the problem of lack of visual content on Wikipedia articles. While very useful, these methods have limited coverage: on average, good candidate image matches are found for only 15% of articles.

Existing automated solutions for image captioning are also difficult to incorporate in editors’ workflows: the most advanced computer vision-based image to text generation methods aren’t suitable for the complex, granular semantics of Wikipedia images and are not generally available for languages other than English.

The Wikipedia Image/Caption Matching Competition on Kaggle

As part of our initiatives to address Wikipedia’s knowledge gaps, we are organizing the “Wikipedia Image/Caption Matching Competition.” We are inviting communities of volunteers, developers, data scientists, and machine learning enthusiasts to help us solve the hardest problems in the image space. 

The  “Wikipedia Image/Caption Matching Competition” is designed to foster the development of systems that can automatically associate images with their corresponding image captions and article titles. The Research Team at the Wikimedia Foundation will be hosting the competition through Kaggle starting September 9th, 2021. This competition was made possible thanks to collaborations with Google Research,  EPFL, Naver Labs Europe, and Hugging Face, who massively helped with the data preparation and the competition design. Given the highly novel, open, and exploratory nature of the challenge proposed, the first edition of the competition comes in a “playground” format.

Participation is completely online and open to anyone with access to the internet. In this competition, participants will be provided with content from Wikipedia articles in 100+ language editions. They will be asked to build systems that automatically retrieve the text (an image caption, or an article title) closest to a query image. The best models will account for the semantic granularity of Wikipedia images and operate across multiple languages. 
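
To make the retrieval task concrete, here is a minimal sketch of ranking candidate texts against a query image. It assumes that images and texts have already been embedded into a shared vector space by some model; the three-dimensional vectors and the function names below are made up for illustration and are not the competition's baseline or scoring method.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_captions(image_vec, caption_vecs):
    """Return candidate texts ordered from most to least similar to the image."""
    scored = [
        (cosine_similarity(image_vec, vec), text)
        for text, vec in caption_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


# Toy data: hypothetical embeddings, not taken from the real dataset.
image_vec = [0.9, 0.1, 0.0]
caption_vecs = {
    "Spacetime curvature around Earth": [0.8, 0.2, 0.1],
    "Colossal octopus attacking a ship": [0.0, 0.1, 0.9],
    "EMS VCS 3 synthesizer": [0.1, 0.9, 0.2],
}

ranking = rank_captions(image_vec, caption_vecs)
print(ranking[0])
```

A real system would replace the toy vectors with the output of a multilingual image-text model and would search millions of candidates, typically with an approximate nearest-neighbour index instead of a full sort.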

The collaborative nature of the platform helps lower barriers to entry and encourages broad participation. Kaggle is hosting all data needed to get started with the task, example notebooks, a forum for participants to share and collaborate, and submitted models in open-sourced formats. With this competition, we hope to provide a fun and exciting opportunity for people around the world to grow their technical skills while contributing to one of the largest online collaborative communities and the most widely used free online encyclopedia. 

A large dataset of Wikipedia image files and features

Space-time distortion made by Earth, GNU Free Documentation License

bn: সাধারণ আপেক্ষিকতা তত্ত্ব অণুয়ায়ী সময় এবং কাল এর বক্রতা একটি দ্বি-মাত্রিক চিত্রের সাহায্যে উপস্থাপন করা হয়েছে।
ja: 一般相対性理論によって記述される、2次元空間と時間の作る曲面。地球の質量によって空間が歪むとして記述して、重力を特殊相対性理論に取り入れる。実際の空間は3次元であることに注意すべし。
ko: 일반상대성이론에서 묘사된 시공의 곡률을 2차원으로 표현한 그림.
it: Una celebre illustrazione divulgativa della curvatura dello spaziotempo dovuta alla presenza di massa, rappresentata in questo caso dalla Terra.
en: Two-dimensional projection of a three-dimensional analogy of spacetime curvature described in general relativity
ckb: دەرھاوێشتەیەکی دووڕەھەندی لە چەمانەوەی کاتـجێ لە بۆشایییەکی سێڕەھەندیدا، کە لە تیۆریی ڕێژەیی ئاینشتایندا دێتە بەر باس.
my: နှိုင်းရသီအိုရီအရ သုံးဖက်မြင် အာကာသအချိန် ကွေးညွတ်ပုံအား နှစ်ဘက်အမြင်ဖြင့် ဖော်ပြထားပုံ

Participants will work with one of the largest multimodal datasets ever released for public usage. The core training data is taken from the Wikipedia Image-Text (WIT) Dataset, a large curated set of more than 37 million image-text associations extracted from Wikipedia articles in 108 languages that was recently released by Google Research.

The WIT dataset offers extremely valuable data about the pieces of text associated with Wikipedia images. However, due to licensing and data volume issues, the Google dataset only provides the image name and corresponding URL for download and not the raw image files.

Getting easy access to the image files is crucial for participants to successfully develop competitive models. Therefore, today, the Wikimedia Research team is releasing its first large image dataset. It contains more than six million image files from Wikipedia articles in 100+ languages, which correspond to almost[1] all captioned images in the WIT dataset. Image files are provided at a 300-px resolution, a size that is suitable for most of the learning frameworks used to classify and analyze images. The total size of the dataset released is around 200 GB, partitioned into 200 files of around 1 GB each.

With this large release of visual data, we aim to help the competition participants—as well as researchers and practitioners who are interested in working with Wikipedia images—find and download the large number of image files associated with the challenge, in a compact form.

While making the image files publicly available is a first step towards making Wikipedia images accessible to larger audiences for research purposes, the sheer size of the raw pixels makes the dataset less usable in lower-resource settings. To improve the usability of our image data, we are releasing an additional dataset, containing an even more compact version of the six million images associated with the competition. We compute and make publicly available the images’ ResNet-50 embeddings. We describe each image with a 2048-dimensional signature extracted from the second-to-last layer of a ResNet-50 neural network trained with ImageNet data. These embeddings contain rich information about the image content and layout, in a compact form. Images and their embeddings are stored on Kaggle and on our Wikimedia servers.

Here is some sample PySpark code to read image files and embeddings:

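# File format:
## Pixels columns: image_url, b64_bytes, metadata_url
### b64_bytes are the image bytes as a base64-encoded string
## Embedding columns: image_url, embedding
### embedding: a comma-separated list of 2048 float values

import base64
from io import BytesIO
from PIL import Image
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
# Placeholder paths (not given in this post): point them at the downloaded files.
EMBEDDINGS_PATH = "path/to/embeddings_file"
PIXELS_PATH = "path/to/pixels_file"

# embeddings
@F.udf(returnType='array<float>')
def parse_embedding(emb_str):
   return [float(e) for e in emb_str.split(',')]
# parse embedding array (the tab column separator is an assumption)
first_emb = (spark
   .read.csv(path=EMBEDDINGS_PATH, sep="\t")
   .select(F.col('_c0').alias('image_url'), parse_embedding('_c1').alias('embedding'))
   .take(1)[0]
)
print(len(first_emb.embedding))
# 2048
# pixels
first_image = (spark
   .read.csv(path=PIXELS_PATH, sep="\t")
   .select(F.col('_c0').alias('image_url'), F.col('_c1').alias('b64_bytes'), F.col('_c2').alias('metadata_url'))
   .take(1)[0]
)
# parse image bytes
pil_image = Image.open(BytesIO(base64.b64decode(first_image.b64_bytes)))
print(pil_image.size)
# (300, 159)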
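
For readers without a Spark cluster, the same embedding rows can be handled with the Python standard library alone. The sketch below is only an illustration: it assumes the two columns of an embeddings row are separated by a tab, and it uses toy 4-value embeddings with made-up URLs instead of real 2048-value rows. The helper names `parse_embedding_row` and `cosine_similarity` are hypothetical and not part of the released data or tooling.

```python
import math


def parse_embedding(emb_str):
    """Turn a comma-separated string of floats into a list of floats."""
    return [float(e) for e in emb_str.split(",")]


def parse_embedding_row(line, sep="\t"):
    """Split one embeddings row into (image_url, embedding).

    The tab separator is an assumption about the file layout.
    """
    image_url, emb_str = line.rstrip("\n").split(sep, 1)
    return image_url, parse_embedding(emb_str)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy rows: 4 values instead of 2048, and made-up URLs.
row_a = "https://example.org/a.jpg\t0.5,0.0,1.0,0.25"
row_b = "https://example.org/b.jpg\t1.0,0.0,2.0,0.5"
url_a, emb_a = parse_embedding_row(row_a)
url_b, emb_b = parse_embedding_row(row_b)
print(url_a, url_b, cosine_similarity(emb_a, emb_b))
```

Because the second toy embedding is a scaled copy of the first, the printed similarity is close to 1. With real rows, the same comparison gives a quick way to look for visually similar images without downloading any pixels.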

This is an initial step towards making most of the image files publicly available and usable on Commons in a compact form. We are looking forward to releasing an even larger image dataset for research purposes in the near future!

We encourage everyone to download our data and participate in the competition. This is a novel, exciting, and complex scientific challenge. With your contribution, you will be advancing the scientific knowledge on multimodal and multilingual machine learning. At the same time, you will be providing open, reusable systems that could help thousands of editors improve the visual content of the largest online encyclopedia. 


We would like to thank everyone who contributed to this amazing project, starting with our WMF colleagues: Leila Zia, head of Research, for believing in this project and for overseeing every stage of the process, Stephen La Porte and Samuel Guebo who supported the legal and security aspects of the data release, Ai-Jou (Aiko) Chou for the amazing data engineering work, Fiona Romeo for the data about alt text quality, and Emily Lescak and Sarah R. Rodlund for helping with the release of this post.

Huge thanks to the Google WIT authors (Krishna Srinivasan, Karthik Raman, Jiecao Chen, Michael Bendersky, and Marc Najork) for creating and sharing the dataset, and for collaborating closely with us on this competition, and to the Kaggle team (Addison Howard, Walter Reade, Sohier Dane), who worked tirelessly to make the competition happen.

All this would not have been possible without the valuable suggestions and brainstorming sessions with an amazing team of researchers from different institutions: thank you Yannis Kalantidis, Diane Larlus, and Stephane Clinchant from Naver Labs Europe; Yacine Jernite from Hugging Face; and Lucie Kaffee from the University of Copenhagen, for your excitement and dedication to this project!


  1. We are publishing all images having a non-null “reference description” in the WIT dataset. For privacy reasons, we are not publishing images where a person is the primary subject, i.e., where a person’s face covers more than 10% of the image surface. To identify faces and their bounding boxes, we use the RetinaFace detector. In addition, to avoid the inclusion of inappropriate images or images that violate copyright constraints, we have removed from the dataset all images that are candidates for deletion on Commons.
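
The 10% rule in this note can be sketched in a few lines of Python. This is not the team's actual filtering code: it assumes that face boxes arrive as `(x1, y1, x2, y2)` pixel corners from some detector, and that the rule is applied to the largest single face rather than to all faces combined. The function names are hypothetical.

```python
def box_area(box):
    """Area in pixels of an (x1, y1, x2, y2) box; degenerate boxes count as 0."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def largest_face_fraction(image_size, face_boxes):
    """Fraction of the image surface covered by the largest detected face."""
    width, height = image_size
    image_area = width * height
    if image_area <= 0:
        raise ValueError("image must have a positive area")
    return max((box_area(b) for b in face_boxes), default=0) / image_area


def is_person_primary_subject(image_size, face_boxes, threshold=0.10):
    """True when a face covers more than `threshold` of the image surface."""
    return largest_face_fraction(image_size, face_boxes) > threshold


# A 300x159 image with one 80x80 face box and one 30x30 face box.
size = (300, 159)
print(is_person_primary_subject(size, [(100, 20, 180, 100)]))  # large face
print(is_person_primary_subject(size, [(10, 10, 40, 40)]))     # small face
print(is_person_primary_subject(size, []))                     # no faces
```

In this toy case the 80x80 face covers roughly 13% of the image, so only the first call reports the person as the primary subject. Summing the areas of all faces instead of taking the largest would be an equally plausible reading of the rule.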

About this post

Featured image credit: Wikipedia20 Knowledge.svg, Wikimedia Foundation, CC0 1.0

8 September 2021 - Today, the Wikimedia Foundation, the global nonprofit organization that supports Wikipedia and other free knowledge projects, announced six inaugural grants as part of the newly launched Knowledge Equity Fund, an effort to close knowledge gaps and address racial inequities in its projects. The first round of grants will be given to six global nonprofit organizations: Arab Reporters for Investigative Journalism (ARIJ), the Borealis Philanthropy Racial Equity in Journalism Fund, Howard University School of Law and the Institute for Intellectual Property and Social Justice (IIPSJ), InternetLab, STEM en Route to Change (SeRCH) Foundation, and the Media Foundation for West Africa (MFWA).

“As a movement dedicated to the sum of all knowledge, we must take a more active role in breaking down the barriers to knowledge that have disproportionately impacted communities of color throughout history,” said Lisa Gruwell, Chief Advancement Officer at the Wikimedia Foundation and an advisor on the Equity Fund Committee. “Racism has skewed the historical record and continues to deny communities of color access to knowledge as a human right. Through the Equity Fund, we are thrilled to support organizations working directly to address these inequities, so that the work of free knowledge can finally reflect the world’s rich diversity.”

The Equity Fund is a $4.5 million fund created by the Wikimedia Foundation to advance more equitable, inclusive representation in Wikimedia projects, including Wikipedia. Through the fund, the Foundation will build a robust ecosystem of institutional partners working at the intersection of free knowledge and racial justice. The Equity Fund extends the Foundation’s explicit goal to support communities that have been left out by structures of power and privilege. It was conceptualized in June 2020, in the wake of global protests about police brutality and racial injustice in the United States. 

The first grant recipients of the Equity Fund are:

  • Arab Reporters for Investigative Journalism (ARIJ), Jordan ($250,000): To provide a one-year investment to expand the investigative journalism ecosystem in 16 countries in the Middle East. With our support, ARIJ will expand the training and support they provide for Arab journalists around racial equity and accessibility, and advocate for increased coverage of marginalized communities throughout the region.
  • Borealis Philanthropy’s Racial Equity in Journalism Fund, United States ($250,000): To provide a one-year investment to support US-based journalism organizations led by and for people of color, helping expand news and public affairs coverage in communities of color. Through the Racial Equity in Journalism Fund, we will seek to increase media coverage and, subsequently, source citations for Wikimedia projects about issues and leaders that impact diverse communities.
  • Howard University School of Law and the Institute for Intellectual Property and Social Justice (IIPSJ), United States ($260,000): To create a two-year Wikimedia Race and Knowledge Equity Fellowship to produce white papers and academic research exploring how free knowledge can be used to advance racial equity and socio-economic empowerment throughout the intellectual property landscape. This Fellowship would also develop recommendations to address gaps in the free knowledge ecosystem that exacerbate systemic racism and block progress to advance racial equity.  
  • InternetLab, Brazil ($200,000): To create a two-year Wikimedia Race and Knowledge Equity Fellowship to research the impact of systemic racism and digital access for African descendants in Brazil, explore the most pressing barriers to the participation of Black people in knowledge online, and identify how racial inequality is reflected in the availability of online content in Portuguese and in Brazil.  The Fellowship will work to identify how national and local policies create barriers related to online knowledge, and potential policy solutions to address intellectual property, access, and education among others.
  • Media Foundation for West Africa (MFWA), Ghana ($150,000): To provide a one-year investment to support MFWA’s work providing journalist training and advocacy for journalist rights. With this grant, MFWA will expand their work to cover racial equity through funding for investigative journalism, promoting and protecting freedom of expression and digital rights in the region.  
  • STEM en Route to Change (SeRCH) Foundation, United States ($250,000): To provide a two-year investment to the SeRCH Foundation to support the expansion of their signature program, #VanguardSTEM, which amplifies the voices of Black, Indigenous, women of color and non-binary people of color in STEM fields. The SeRCH Foundation will leverage cultural production, including multimedia storytelling, to advance non-traditional forms of knowledge creation, to build freely licensed and open rich media content about STEM leaders of color, and address inequitable representation throughout scientific fields.

Racial equity is directly tied to our movement’s focus on knowledge equity, part of our long-term strategy for 2030. Knowledge equity is defined as supporting the knowledge and communities that have been excluded by historical structures of power and privilege. Many of the barriers that prevent people from accessing and contributing to knowledge are rooted in systems of racial oppression. Due to colonization and slavery, knowledge from Black and Indigenous communities, along with other historically marginalized groups, has been systematically excluded and erased from the historical canon. The Equity Fund will directly address the barriers to free knowledge experienced by Black, Indigenous, and communities of color around the world. Investments from the Equity Fund will address one or more of five focus areas: 

  • Supporting scholarship & advocacy focused on free knowledge and racial equity; 
  • Expanding media and journalism efforts focused on people of color around the world; 
  • Addressing unequal internet access; 
  • Improving digital literacy skills, the lack of which impedes access to knowledge;
  • Investing in non-traditional records of knowledge such as oral histories. 

Grant recipients are chosen based on their past record of impact, their alignment with Wikimedia’s vision of access to knowledge, and their potential to benefit free knowledge. Following this first round of grantees, the Equity Fund will continue to look for additional grantees that align with our goals of addressing racial inequities in free knowledge through subsequent rounds of funding. The next round will likely take place in the next year.

About the Wikimedia Foundation

The Wikimedia Foundation is the nonprofit organization that operates Wikipedia and the other Wikimedia free knowledge projects. Our vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge, and that everyone should be able to access that knowledge freely. We host Wikipedia and the Wikimedia projects, build software experiences for reading, contributing, and sharing Wikimedia content, support the volunteer communities and partners who make Wikimedia possible, and advocate for policies that enable Wikimedia and free knowledge to thrive. 

The Wikimedia Foundation is a charitable, not-for-profit organization that relies on donations. We receive donations from millions of individuals around the world, with an average donation of about $15. We also receive donations through institutional grants and gifts. The Wikimedia Foundation is a United States 501(c)(3) tax-exempt organization with offices in San Francisco, California, USA.