Thank You To Each and Every Amazing Volunteer!

15:40, Friday, 18 June 2021 UTC
WMUK what we do header by Katie Crampton (WMUK) Licence CC BY-SA 4.0

As Small Charity Week draws to a close, we send our thanks and appreciation to every single Wikimedia UK volunteer.

The previous 12 months have seen all of us challenged like never before. Collectively and individually, we have faced new ways of living and changes to how we work and volunteer.

As a staff and trustee team, we are sending our appreciation and thanks to each one of you for your dedication, commitment, passion and belief that time volunteered on open knowledge is time well spent.

Thank You!

By George Colbourn, Fundraising Development Coordinator at Wikimedia UK

This year, Wikimedia UK is taking part in ‘Small Charity Week’, an annual event to celebrate the contributions made by the small charity sector to individuals, communities and causes across the country. From 14 to 19 June 2021, we will be joining other charities with an annual income of less than £1 million to highlight the importance of our work and the impact it has on our beneficiaries.

The small charity sector comprises thousands of organisations across the country that, despite restricted funding, provide crucial support, services, campaigns and events. Through the extensive efforts of our volunteers and partner organisations, as well as the vital support of our donors, Wikimedia UK is able to work on projects that increase knowledge equity around the world, develop digital literacy and contribute to the development and improvement of Wikipedia and other Wikimedia projects. Small Charity Week provides us with a great opportunity to showcase this work and the efforts of the wider Wikimedia community.

Throughout the week, we will be taking part in a number of activities to highlight our achievements and impact, with each day focusing on a specific aspect of our work.

  • Monday 14th June: ‘I love small charities day’. A chance for our staff, volunteers, members and supporters to express their admiration for the small charity sector.
  • Tuesday 15th June: ‘Big advice day’. We will be hosting an online Q&A session focusing on our work, the open-knowledge movement and the small charity sector as a whole.
  • Wednesday 16th June: ‘Policy day’. Here we will be focusing on recent work that Wikimedia UK has undertaken to advocate for policy changes at a regional and national level, helping to promote the importance of open access to knowledge and its benefits to society.
  • Thursday 17th June: ‘Fundraising day’. An opportunity for us to engage with both new and longstanding supporters, informing them about how to get involved with our work and start fundraising.
  • Friday 18th June: ‘Small charity big impact day’. We will be looking back at some of our recent achievements that have had a profound impact on the open knowledge movement.
  • Saturday 19th June: ‘Appreciation day’. The work we do is only possible through the commitment of our supporters, volunteers and members. Appreciation day will give us an opportunity to express our gratitude and thanks to all of those people who make up the national Wikimedia movement.

There are numerous ways to get involved in Small Charity Week, whether you’re a volunteer, a member or new to Wikimedia UK. Follow our Facebook, Twitter and Instagram channels to keep updated on our activities over the course of the week and how you can be a part of them. A full schedule of Small Charity Week can be found here.

Please also use the hashtags #SmallButVital and #SmallCharityWeek, and follow @WikimediaUK, to keep up to date with our campaign involvement.

A view upwards towards the domes of a mosque, taking in all the intricate artistic details.
The intricate artwork inside the Sultan Ahmed Mosque dome, by Ank Kumar – Own work, CC BY-SA 4.0

New research identifies a strong fixation on the Western canon in Wikipedia’s coverage of visual arts, but offers ways towards a more truly global perspective.

If you were asked to name artworks or artists, how many would be non-Western? You might visit the free encyclopedia Wikipedia to support your search; after all, 1.7 billion visitors a month do just this. However, new research highlights that even with Wikipedia’s approach of ensuring anyone can edit and add content, there is still a bias towards Western artworks and artists.

Artist and scholar Waqas Ahmed and veteran Wikipedian Dr. Martin Poulter were well placed to investigate Wikipedia’s perspective on the visual arts. For example, they observed that on the English-language version of Wikipedia, “its ‘List of sculptors’ is 99% Western, ‘List of painters by nationality’ is around 75% European and its ‘List of contemporary visual artists’ is 80% European”. They then examined whether this problem was confined to those articles, or to English-language Wikipedia. “There appears to be a systemic cultural bias against non-Western visual art and artists across all Wikipedia platforms and in various languages”, Ahmed says. “We hope that this research will remind people that the Western artistic canon is but one of many worldwide – each deserving respect and appreciation on its own terms.”

With a commitment to identifying and overcoming barriers to diversity online, Wikimedia UK, the national charity for the Wikimedia open knowledge movement, funded the research. Director of Programmes and Evaluation Daria Cybulska says “The vision of the Wikimedia movement has always been ‘a world in which every single human being can freely share in the sum of all knowledge.’ In recent years, we began more critically examining what that means, and what are the repercussions of existing biases and gaps in content on Wikipedia. Some biases on Wikipedia are better known than others – and this research shines a new light on cultural biases, and what can be done about them.”

The researchers measured the coverage of visual arts across the hundreds of different language versions of Wikipedia. They compared 100 artists from the Western canon to 100 significant artists from other cultures. Poulter pointed out that “Even equal coverage of the Western artists and the artists from all of the rest of the world would still be a pro-Western bias, because Europe is just one sixth of the world.” The research found that on average Wikipedia coverage was seven times greater for artists in the Western canon than for their non-Western counterparts.

One example compared the coverage of the Sistine Chapel in the Vatican and the Sultan Ahmed Mosque (Blue Mosque) in Istanbul. Both places of worship receive approximately 5 million visitors each per year, and have enormous cultural and artistic importance. Whilst both Michelangelo and Syed Kasim Gubari are considered geniuses within their respective cultures, Michelangelo’s Wikipedia articles are in total over 440 times longer than Gubari’s, and the Blue Mosque ceiling does not have a single entry across the Wikimedia projects.

Past research has identified geographical biases and a gender gap on Wikipedia, where a small (but growing) minority of biographies are about women. This new research demonstrates and measures a specifically cultural bias. Ahmed and Poulter suggest we can all play our part through extending the coverage of art and artists outside the Western canon. For individual wiki contributors, this can involve creating, translating, or extending articles. Cultural institutions can help by sharing their knowledge and images.

As the research states, “societal biases have a long and well-documented history, rooted in systems of hegemony and oppression like imperialism.” These biases inevitably shape narratives online and are reinforced through echo chambers. The first step to creating an online world which truly reflects global cultures and histories is the awareness that we are far from there – yet.

The research paper is currently undergoing peer review but can be freely accessed as a pre-print through

A lush green swamp forest, with rays of light streaming through the trees.
Nawas Sharif’s 3rd Place Wiki Loves Earth 2020, Swamp surrounded by Mystery

Upload your photographs during June to be in with a chance of winning country and international prizes.

This year, Wales is taking part in the international photography competition ‘Wiki Loves Earth’, organised by the Wikimedia movement. Founded nine years ago with a focus on natural heritage, the competition aims to raise awareness of protected sites globally.

Robin Owain, who leads the Wikimedia UK projects across Wales, said: “Today we’re excited to be launching Wiki Loves Earth 2021 in Wales. This is one of the largest photography competitions in the world focusing on National Parks, Sites of Special Scientific Interest, Sites of Outstanding Natural Beauty and all other protected areas.” Robin explained: “The biodiversity and geology of Wales is unique, and this competition allows Welsh photographers to show the beauty of their landscape, the flora and fauna of their protected areas on a world stage.”

Organisations supporting this exciting competition include Natural Resources Wales, the Pembrokeshire and Snowdonia National Parks, the Welsh Mountaineering Club, the Edward Llwyd nature society, the National Library of Wales, Wikimedia UK, WiciMon and others.

Jason Evans, National Wikimedian at the National Library of Wales, said: “The National Library of Wales is thrilled to support this competition, which will encourage people to explore and document Wales’ diverse wildlife and landscapes. This aligns with our commitment to community engagement and will complement our current Welsh Government funded project to support the improvement of Welsh language data and mapping services.”

This year, the international winning photos will represent two categories: landscapes (including individual trees, if they have a preservation order) and macro shots or close-ups of animals, plants, fungi and so on. Examples of past winners can be seen at #WikiLovesEarth. For instructions on how to upload, Google ‘Wiki Loves Earth 2021 in Wales’, or follow this link.

Photographs taken at any time in the past (even on a phone) can be uploaded during June, with prizes at both country and international level for the winners. Robin added: “The competition is open to everyone. We ask our friends, volunteers and staff to put Wales on the international map by entering their photographs of our beautiful and diverse country.”

Read more about Wiki Loves Earth 2021 in Wales here on Wikimedia Commons.
More on Wiki Loves Earth can be found here.

Global Nature Photography Competition – Wales Competes

This year Wales is taking part in the photography competition ‘Wiki Loves Earth’ (Wici’r Holl Fyd is its Welsh name), organised by the international Wikimedia movement, the mother of Wikipedia!

Among the other organisations supporting the competition are Natural Resources Wales, the Snowdonia and Pembrokeshire National Parks, the Welsh Mountaineering Club, Cymdeithas Edward Llwyd, the National Library of Wales, Wikimedia UK, WiciMon and others.

According to Robin Owain, who leads Wikimedia UK’s work in Wales: “Today is a very exciting day, as we are launching Wici’r Holl Ddaear 2021 in Wales. This is one of the largest photography competitions in the world focusing on National Parks, Sites of Special Scientific Interest, Areas of Outstanding Natural Beauty and other protected areas.” Robin explained: “The biodiversity and geology of Wales are unique, and this competition allows photographers from Wales to record and showcase its beauty: the landscape and the scenery, and the wealth of life found here: animals, fungi and plants.”


Jason Evans, National Wikimedian for Wales, said: “The National Library of Wales is pleased to support this competition, which will encourage people to explore and document the diverse wildlife and landscapes of Wales. This fits with our commitment to engaging with the community, and will complement our current project, funded by the Welsh Government, to support the improvement of Welsh-language data and mapping services.”

To find instructions on how to upload, and more about the Welsh entry, Google Wici’r Holl Ddaear 2021 Cymru, or follow the link:

You can upload photos you have taken at any time in the past, or get out and take new ones between now and the end of June. According to Robin, “Anyone can enter, using their phone camera or a dedicated camera! What matters is that we compete, as a nation, at international level!”

Link to the Wales page: here on Wikimedia Commons

Human Rights and the role of digital literacy

11:07, Wednesday, 12 May 2021 UTC
File: Graphics for Wiki for human rights campaign 2021 by Jasmina El Bouamraoui and Karabo Poppy Moletsane. CC0 1.0

By George Colbourn, Fundraising Development Coordinator at Wikimedia UK

The accelerating use of digital technology means that it now plays a major role in most aspects of our lives. For many of us it is vital to be able to learn, work and communicate in these online environments, making digital literacy a key aspect of education in the 21st century. Yet acquiring these digital skills isn’t just important for our personal and professional lives; a digitally literate society has now become a necessity in preserving one of our most fundamental human rights. This article addresses why freedom of information has taken on a new context in the digital age, and the efforts Wikimedia UK are making to ensure the safe and responsible democratization of knowledge.

A few years ago, I was working for an American charity that provided support and refuge for North Korean refugees. It was here that I attended an event hosted by Yeon-Mi Park, a young North Korean who had fled her home country and been granted citizenship in the United States. Since then she has become a best-selling author, documenting her life in North Korea and her perilous escape. Listening to her account was eye-opening; the lack of nutrition and health care that her family endured, and the brutal repression forced on her community, seemed almost too dystopian to comprehend.

One particular aspect of her account that stuck in my mind was her realisation of the world around her once she had found freedom. Her newfound ability to embrace previously unknown cultures and philosophies, to argue and to voice her opinions, was entirely new to her, and it was this aspect of her story that made the most lasting impression on me. Stories like Yeon-Mi’s reaffirmed to me the importance of access to unbiased information, both for human beings to feed their natural curiosity and for the overall development of societies.

The concept of ‘freedom of information’ is vast in scope, an extension of the right to free expression and free speech, amongst others. Such liberties are largely taken for granted in free democracies. For those living in countries like the UK, the suppression of political opposition and curtailed press freedom seem confined to autocratic, oppressive regimes like the one Yeon-Mi escaped.

While the protection of this fundamental right in the UK is grounded in democratic ideals and international declarations, easily accessed online misinformation now poses a unique challenge to current understandings of credible, unbiased content. The rapid emergence of digital media has led to an unprecedented democratisation of knowledge, which has altered not just who can access information, but who can produce and distribute it.

Fundamentally, this enables a greater capacity for individual expression and knowledge acquisition, key indicators of development. Yet its shortcomings arise in the form of misinformation that spreads erroneous content, as well as disinformation, which intends to deceive its recipients for political or monetary gain. False information such as this enables the politics of inequality and division that are emerging in the Global North. In order to defend ourselves from such a threat, it is vital that societies become more resilient to the spread of false information and aware of its ramifications.

The Wikimedia community can play a key role in advocating and promoting the need for responsible, credible knowledge sharing. At Wikimedia UK, we have been taking measures to counter this threat to democracy by launching programmes focused on digital, political and media literacy. Our intention is to provide our beneficiaries with the tools they need to become resilient to fake news and manipulative content, and to become contributors of high-quality, reliable online content in their own right. In addition, through our work with cultural and academic institutions across the country, we are increasing both the amount and quality of our content, meaning that all users can benefit from the discoveries and teachings of academics across a variety of disciplines.

This work takes place across the country. Last year, for example, we partnered with Menter Môn and the National Library of Wales to launch information literacy projects across the Isle of Anglesey as part of the new Wikimedia module on the Welsh Baccalaureate, and we are now working in schools across the area. The WikiMôn project was presented with Mentrau Iaith Cymru’s technology award in January 2020 in recognition of its impact.

Our projects and programmes that aim to increase digital literacy amongst communities can have a wider impact on society as a whole; a population equipped with the skills needed to identify misinformation and fake news will be better prepared to act safely and responsibly in online environments. As a result, the internet can become a tool that reinforces freedom of information in the 21st century, rather than acting as a threat to it.

The emergence of the internet has had drastic impacts on how we communicate across the world, and this includes the spread of ideas and factual information. Over the coming years, it is vital that we come to acknowledge the impact this can have on truth and fact, and how we prepare future generations to utilise the internet and open knowledge in the most responsible and effective way. For Wikimedia UK, this is one of our highest priorities.

This year’s WikiForHumanRights campaign centres on the “Right to a Healthy Environment”, connecting the 20th birthday “Human” theme with global conversations about COVID-19, environmental crises such as climate change, and human rights. The content-writing challenge, with prizes, runs from 15 April to 15 May. Learn more about the challenge and sign up here. There is also a draft list of articles to be created or enriched with UN Human Rights.

What does decolonisation mean for Wikipedia?

14:57, Saturday, 03 April 2021 UTC
File:Wikidata Map November 2019 Huge by Addshore CC0 1.0

By Richard Nevell, Project Coordinator at Wikimedia UK

Wikipedia is a magnificent tool for sharing knowledge with an enormous reach. Its pages are read 20 billion times every month. But like the world around us, it reflects some long-standing inequalities. At Wikimedia UK one of our strategic priorities is to increase the engagement and representation of marginalised people and subjects. We want to challenge these inequalities and help redress them by sharing information.

Universities and museums – keystones of education and heritage communication – are currently working out how to decolonise their curricula and their collections. Doing so widens the variety of voices in interpreting and understanding our heritage. Within the context of higher education, courses often focus on the work of white, Western people. Decolonisation aims to bring more diverse research into the curriculum.

Finding the best path to decolonisation can be tricky. Removing a subject from a curriculum, or a group of authors from a reading list, does not address the underlying structural inequalities which mean that reading lists tend to be predominantly white. Widening the range of sources used and exploring how the colonial past influences how particular subjects are approached and researched today is important.

Wikipedia itself needs to be decolonised. It began as an English-language project, and while it is available in more than 300 languages, English is still by far the largest. The dominance of English and a small group of languages risks eroding smaller languages. People who speak more than one language tend to gravitate to where there is more content, even if it is in a language they are less fluent in. Language is one factor that Wikipedia needs to address, but the issue extends to the content of our pages. The Wikipedia article about historians has images of eight people, all of whom are male and European.

Wikimedia UK is taking active steps to combat this. We run events improving Wikipedia’s coverage of under-represented subjects; we support residencies such as those at Coventry University and the Khalili Collections; and we run an annual conference supporting small language communities, the Celtic Knot. The conference has showcased the work of Welsh, Cornish, Irish, and Suomi communities amongst others and helped foster their work.

Coventry University has a programme of activity which aims to decolonise the curriculum, and our Wikimedian in Residence there, Andy Mabbett, is helping lecturers use Wikipedia as a way students can make information about a wide range of topics more accessible. The Khalili Collections comprise 35,000 items, including collections about Islamic art and Japanese culture. The Resident, Martin Poulter, has been sharing information about the collections through Wikipedia, Wikidata, and Wikimedia Commons, bringing them to a new audience. The images are already seen by a million people a month through Wikimedia.

We are working with organisations such as the London College of Communication, UAL, which has established a Decolonising Wikipedia Network. In the process of learning about Wikipedia and how it works, the students have an opportunity to redress some of the imbalances within Wikipedia. Many of our events try to highlight marginalised communities and figures who have otherwise been overlooked by Wikipedia; it is an ongoing process, often culturally sensitive, and one which will take years.


By Lucy Hinnie, Wikimedian in Residence at the British Library, on Twitter at BL_Wikimedian

Hello, I’m Dr Lucy Hinnie and I’ve just joined the Digital Scholarship team at the British Library as the new Wikimedian-in-Residence, in conjunction with Wikimedia UK and the Eccles Centre. My role is to work with the Library to develop and support colleagues with projects using Wikidata, Wikibase and Wikisource.

I am delighted to be working alongside Wikimedia UK in this new role. Advocacy for both the development of open knowledge and the need for structural change has never been more pressing, and the opportunity to work with Wikimedia and the British Library to deliver meaningful change is immeasurably exciting.

Bringing underrepresented people and marginalised communities to the fore is a huge part of this remit, and I am looking to be as innovative in our partnerships as we can be, with a view to furthering the movement towards decolonisation. I’m going to be working with curators and members of staff throughout the Library to identify and progress opportunities to accelerate this work.

Wanuskewin Heritage Park, Saskatoon, December 2020

I have recently returned from a two-year stay in Canada, where I lived and worked on Treaty Six territory and the homeland of the Métis. Working and living in Saskatchewan was a hugely formative experience for me, and highlighted the absolute necessity of forward-thinking, reconciliatory work in decolonisation.

2020 was my year of immersion in Wikimedia – I participated in a number of events, including outreach work by Dr Erin O’Neil at the University of Alberta, Women in Red edit-a-thons with Ewan McAndrew at the University of Edinburgh and the Unfinished Business edit-a-thon run by Leeds Libraries and the British Library. In December 2020 I coordinated and ran my own Wikithon in conjunction with the National Library of Scotland, as part of my postdoctoral project ‘Digitising the Bannatyne MS’.

Front page of the Bannatyne MS, National Library of Scotland, Adv MS 1.1.6. (CC BY 4.0)

Since coming into post at the start of March, I have worked hard to make connections with organisations such as IFLA, Code the City and Art+Feminism. I’ve also been creating introductory materials to engage audiences with Wikidata, and thinking about how best to use the coming months.

Andrew Gray took up post as the first British Library Wikipedian in Residence nearly ten years ago; you can read more about this earlier residency here and here. So much has changed since then, but reflection on the legacy of Wikimedia activity is a crucial part of ensuring that the work we do is useful, engaging, vibrant and important. I want to use creative thinking to produce output that opens up BL digital collections in relevant, culturally sensitive and engaging ways.

I am excited to get started! I’ll be posting on the British Library’s Digital Scholarship blog regularly about my residency, so please do subscribe to the blog to follow my progress.


Digital Skills for Heritage poster by Katie Crampton (WMUK). CC BY-SA 4.0.

Wikimedia UK, the national charity for the global Wikimedia movement, is among the successful organisations awarded funding through The National Lottery Heritage Fund’s Digital Skills for Heritage initiative, which aims to raise digital skills and confidence across the UK heritage sector. National Lottery funded Digital Skills for Heritage has expanded thanks to an additional £1 million from the Government’s £1.57 billion Culture Recovery Fund.

Wikimedia UK’s project ‘Developing open knowledge skills, tools and communities of practice for sustainable digital preservation’ is one of 12 grants announced today, awarded to address three distinct areas: driving digital innovation and enterprise, providing answers to organisations’ most pressing concerns, and empowering collaborative work to achieve common aims.

Digital skills are more relevant and necessary than ever as heritage organisations affected by the coronavirus pandemic look toward a more resilient future. In October 2020, The National Lottery Heritage Fund published the findings of its survey of over 4,000 staff, trustees and volunteers at 281 heritage organisations, identifying the current digital skills and attitudes of the sector. The results highlighted what tools and training organisations needed to weather the coronavirus pandemic and move forward into a more resilient and creative future. 

Wikimedia UK has a strong track record of collaborating across heritage and cultural organisations, developing strategies to embed open knowledge and engaging with wider virtual audiences. Over two years, £119,000 of funding will develop skills, tools and communities of practice for the sustainable digital preservation of heritage. Engagement will be through a range of opportunities, from short webinars explaining the role of open knowledge and the scope it holds for sharing and engaging with collections, through to close collaboration on the development and delivery of strategic plans for open knowledge, enabling participating organisations to ensure that heritage is better explained as well as preserved for the long term.

Chief Executive of Wikimedia UK, Lucy Crompton-Reid, said “Wikimedia UK is excited to have been awarded funding from the National Lottery Heritage Fund to deliver this vital and timely project. Our ambition is to equip heritage staff and volunteers with the skills and tools to share their content and collections online, with a particular focus on increasing access to underrepresented cultural heritage. We look forward to working in partnership with the heritage sector to make this happen, and to ensure that the extraordinary reach and longevity of Wikipedia and the other Wikimedia projects benefits everyone.”  

Josie Fraser, Head of Digital Policy at The National Lottery Heritage Fund, said, “Throughout the coronavirus pandemic we have all seen the essential role that digital skills have played in helping heritage organisations continue to work, communicate and connect. We are proud that our National Lottery funded Digital Skills for Heritage projects have provided the sector with practical support when it has been most needed.  

“The £1 million Culture Recovery Fund boost from DCMS recognises the value of digital skills and allows us to expand the initiative. These new grants focus on what organisations have told us they need most – digital innovation, enterprise and business skills to improve and rethink how the sector operates.”

Caroline Dinenage, Minister for Digital and Culture, said,
“I have been really impressed by the innovative ways that sites and projects have already pivoted during the pandemic, but now more than ever it is essential that our heritage sector has the latest digital skills to bring our history to life online. This £1 million boost from the Culture Recovery Fund will ensure that staff and volunteers have the skills they need to keep caring for the past and conserving for the future through the sector’s reopening and recovery.”

Read more about the awards here.

Further information & images:

About The National Lottery Heritage Fund
Using money raised by the National Lottery, we inspire, lead and resource the UK’s heritage to create positive and lasting change for people and communities, now and in the future. Website:

Follow @HeritageFundUK on Twitter, Facebook and Instagram and use #NationalLotteryHeritageFund and #HereForDigital

Women EmpowerED: Wikipedia Editathon

15:20, Wednesday, 03 March 2021 UTC

By Sarah Lappin, final year computer science and artificial intelligence student at the University of Edinburgh, and President for Edinburgh University Women in STEM (EUWiSTEM). 

In summer 2020, I organised my first Wikipedia Editathon as part of the Women in STEM Connect series. With an eye-opening talk, physicist and Wikipedia diversity advocate Dr Jess Wade introduced me to the issue of underrepresentation on Wikipedia and left me, and seemingly all our attendees, feeling inspired to start editing Wikipedia. With training from University of Edinburgh Wikimedian in Residence Ewan McAndrew, and a total of 15 editors, we were able to contribute fourteen thousand words, edit forty-two articles and add five new articles to Wikipedia by the end of the event. I created my first Wikipedia article on Dr Jessica Borger, an Australian T-cell immunologist, and caught the editing bug!

Women EmpowerED by Sarah Lappin.

To celebrate International Women’s Day 2021, Edinburgh University Women in STEM (EUWiSTEM) has joined forces with 5 other female and gender minority led societies: Edinburgh University Women in Business, Women in Law, Women in Politics and International Relations, EconWomen, and Hoppers, the society of women and gender minorities in Informatics. Together we are hosting Women EmpowerED, a week-long celebration aiming to showcase the achievements of women in different fields and discuss the issues women currently face, with a focus on cross-disciplinary inclusion.

The theme of International Women’s Day 2021 is #ChooseToChallenge. In keeping with that theme, it is important that we acknowledge the achievements of diversity and inclusion initiatives, but it is equally crucial that we continue to challenge the norms and push for further improvements. Women EmpowerED has chosen to kick off our celebrations on March 6th with an event that fits these aims perfectly – a Wikipedia Edit-a-thon. At this event, we aim to improve the representation of women and gender minorities on Wikipedia, focusing on those who have chosen to challenge societal norms, and inspirational women in the host societies’ fields. Ewan McAndrew will be providing editing training during the event, helping us to make Wikipedia editing accessible to all.

We are pleased to welcome Dr Linda Bauld, Bruce and John Usher Professor of Public Health in the Usher Institute, to give a talk at the Edit-a-thon on the importance of representation online. Online platforms now have a massive influence on society, and most of them are rife with internet ‘trolls’ and political agendas. As the fifth most visited website worldwide, and one designed to be a source of reliable information, Wikipedia must be kept free from bias and abuse. As Jess Wade explained in a 2018 TEDxLondon talk, “the majority of history has been written by men, about men, for other men.” 

But we can start to change that through Wikipedia. 

At our event, we will not only add and improve articles about women and gender minorities, but we also hope to increase the diversity of Wikipedia’s editors, making a lasting impact on the site. 

If you would like to learn how to edit articles or to help in our mission to improve diversity on Wikipedia, you can get tickets to our editathon on Eventbrite.

For more information on Women EmpowerED visit our website.

The wiki gender gap and Women’s History Month

11:16, Monday, 01 2021 March UTC
Women’s History Month banner by Katie Crampton (WMUK). CC BY-SA 4.0.

By Lucy Crompton-Reid, Chief Executive of Wikimedia UK.

Wikipedia’s vision is a world in which everyone has access to the sum of the world’s knowledge, but to do this, we must have representation from all the world’s voices. For the past five years Wikimedia UK has been working to address inequality and bias across the projects, with a key strategic aim being to increase the engagement and representation of marginalised people and subjects on Wikipedia. Whilst there are all sorts of ways in which structures of power and privilege can exclude people, during Women’s History Month we will be shining a light on the gender gap, and thinking critically about how women are represented on Wikipedia and the Wikimedia projects. 

In a world where women are still systematically oppressed in many countries – and where, even in countries with gender equality written into the legislative framework, systemic bias still pervades – the ‘gender gap’ can feel like an intractable issue. We have seen how the Covid-19 pandemic has disproportionately affected women, and deepened pre-existing inequalities, despite men being more likely to die from the disease. Globally, according to the UN, even the limited gains made in the past decade on issues such as education, early marriage and political representation are at risk of being rolled back. Here in the UK, a report by the House of Commons Women and Equalities Committee acknowledges the particular and disproportionate economic impact on people who are already vulnerable, and highlights how existing gendered inequalities have been ignored and sometimes exacerbated by the pandemic policy response. 

Within this context, working to increase and improve the representation of women, non-binary people and related subjects on Wikimedia is more important than ever. If ‘you can’t be what you can’t see’ then we need to make sure that the world’s free knowledge resource – which is read more than 15 billion times a month – is telling everyone’s story. This includes women, people of colour, disabled people, LGBTQ+ folks and those living outside the United States and Western Europe. Those people, and those stories, exist – they don’t need to be written into Wikipedia to come alive. But for many of us, Wikipedia is the ‘first stop’ when we want to learn about the world. By writing women’s stories into Wikipedia and the wider information ecosystem – by making them more discoverable – we will be helping women around the world discover who they are and can be. 

People of any gender can, and do, commit time and energy to addressing gender inequality on Wikimedia. This might be by creating new articles about women, training new female editors, raising awareness of the gender gap or myriad other things. Increasingly, editing Wikipedia is being recognised as a form of knowledge activism which helps to address gaps in information, and generate discussions about how knowledge and information is created, curated and contested online. 

Fixing the gender gap on Wikimedia is a huge challenge. Much has been written about the reasons for this, as well as the many initiatives and tools that have been developed to try to address the lack of parity on Wikipedia and the other Wikimedia projects. I’m not going to repeat that here, but instead I’m going to introduce some of the extraordinary people involved in this work. I’m pleased and proud that Wikimedia UK will be talking to the following four amazing women as part of a special Women’s History Month series of interviews, with one video to be released every Monday in March:

  • Kira Wisniewski, Executive Director of Art+Feminism – an intersectional feminist non-profit organisation that directly addresses the information gap about gender, feminism, and the arts on the internet. 
  • Dr Rebecca O’Neill, Project Co-ordinator at Wikimedia Ireland, Vice-Chair of Women in Technology and Science Ireland and Secretary of the National Committee for Commemorative Plaques in Science and Technology.
  • Dr Victoria Leonard – Fellow of the Royal Historical Society, Postdoctoral Researcher in Late Ancient History and Founder and Co-Chair of the Women’s Classical Committee – who will be talking about her work to increase the visibility of women in classics on Wikipedia, through #WCCWiki.
  • Dr Alice White, Digital Editor at Wellcome Collection and former Wikimedian in Residence at Wellcome Library. 

I will be giving the final interview in March to round up the series and reflect on the future priorities, challenges and opportunities for Wikimedia and the gender gap.

Recognising our own privilege when it comes to knowledge and information is important. I feel very privileged to be writing this blogpost for Women’s History Month for Wikimedia UK; to have this platform when so many women’s voices aren’t heard. On that note, if there’s anything you would like me to include in my interview later this month, please let me know. I’m keen to hear about and to showcase all the different ways in which people and communities are addressing underrepresentation on Wikimedia, so please contact me on or by twitter (@lcromptonreid) if you would like me to share your story.

Gender and deletion on Wikipedia

20:36, Monday, 06 2019 May UTC

So, a really interesting question cropped up this weekend:

I’m trying to find out how many biographies of living persons exist on the English Wikipedia, and what kind of data we have on them. In particular, I’m looking for the gender breakdown. I’d also like to know when they were created; average length; and whether they’ve been nominated for deletion.

This is, of course, something that’s being discussed a lot right now; there is a lot of emerging push-back against the excellent work being done to try and add more notable women to Wikipedia, and one particular deletion debate got a lot of attention in the past few weeks, so it’s on everyone’s mind. And, instinctively, it seems plausible that there is a bias in the relative frequency of nomination for deletion – can we find if it’s there?

My initial assumption was, huh, I don’t think we can do that with Wikidata. Then I went off and thought about it a bit more, and realised we could get most of the way there with some inferences. Here are the results, and how I got there. Thanks to Sarah for prompting the research!

(If you want to get the tl;dr summary – yes, there is some kind of difference in the way older male vs female articles have been involved with the deletion process, but exactly what that indicates is not obvious without data I can’t get at. The difference seems to have mostly disappeared for articles created in the last couple of years.)

Statistics on the gender breakdown of BLPs

As of a snapshot of yesterday morning, 5 May 2019, the English Wikipedia had 906,720 articles identified as biographies of living people (BLPs for short). Of those, 697,402 were identified as male by Wikidata, 205,117 as female, 2,464 had some other value for gender, 1,220 didn’t have any value for gender (usually articles on groups of people, plus some not yet updated), and 517 simply didn’t have a connected Wikidata item (yet). Of those with known gender, it breaks down as 77.06% male, 22.67% female, and 0.27% some other value. (Because of the limits of the query, I didn’t try to break those down in any more detail.)
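Those percentages use only the 904,983 BLPs with a known gender (697,402 + 205,117 + 2,464) as the denominator. The 1,220 with no gender value and the 517 without a Wikidata item are left out. The minimal shell sketch below reproduces the percentages from the counts above. The name `gender_shares` is only illustrative.

```shell
#!/usr/bin/env bash
# gender_shares MALE FEMALE OTHER
# Prints each group's share of all BLPs with a known gender.
# The counts passed in below are the 5 May 2019 figures quoted in the text.
gender_shares() {
  awk -v m="$1" -v f="$2" -v o="$3" 'BEGIN {
    known = m + f + o          # unknown-gender and unlinked BLPs are excluded
    printf "known: %d\n", known
    printf "male: %.2f%%\n", 100 * m / known
    printf "female: %.2f%%\n", 100 * f / known
    printf "other: %.2f%%\n", 100 * o / known
  }'
}

gender_shares 697402 205117 2464
```

Run as written, it prints `known: 904983`, then `male: 77.06%`, `female: 22.67%` and `other: 0.27%`, matching the figures in the text.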

This is, as noted, only articles about living people; across all 1,626,232 biographies in the English Wikipedia with a gender known to Wikidata, it’s about 17.83% female, 82.13% male, and 0.05% some other value. I’ll be sticking to data on living people throughout this post, but it’s interesting to compare the historic information.

So, how has that changed over time?

BLPs by gender and date of creation

This graph shows all existing BLPs, broken down by gender and (approximately) when they were created. As can be seen, and as might be expected, the gap has closed a bit over time.

Percentage of BLPs which are female over time

Looking at the ratio over time (expressed here as a percentage of the male plus female total), the relative share of female BLPs was ~20% in 2009. In late 2012, the rate of creation of female BLPs kicked up a gear, and from then on it has been noticeably above the long-term average (almost hitting 33% in late 2017, but dropping back since then). This has driven the overall share steadily upwards, to 22.7% now (as noted above).

Now the second question, do the article lengths differ by gender? Indeed they do, by a small amount.

BLPs by current article size and date of creation

Female BLPs created at any time since 2009 are slightly longer on average than male ones of similar age, with only a couple of brief exceptions. The gap may have widened over the past year, but it is probably too soon to say for sure. The average difference is about 500 bytes, or a little under 10% of mean article size: not dramatic, but probably not trivial either. (Pre-2009 articles, not shown here, are about even on average.)

Note that this is raw bytesize – actual prose size will be smaller, particularly if an article is well-referenced; a single well-structured reference can be a few hundred characters. It’s also the current article size, not size at creation, hence why older articles tend to be longer – they’ve had more time to grow. It’s interesting to note that once they’re more than about five years old they seem to plateau in average length.

Finally, the third question – have they been nominated for deletion? This was really interesting.

Percentage of BLPs which have previously been to AFD, by date of creation and gender

So, first of all, some caveats. This only identifies articles which go through the structured “articles for deletion” (AFD) process – nomination, discussion, decision to keep or delete. (There are three deletion processes on Wikipedia; the other two are more lightweight and do not show up in an easily traceable form). It also cannot specifically identify if that exact page was nominated for deletion, only that “an article with exactly the same page name has been nominated in the past” – but the odds are good they’re the same if there’s a match. It will miss out any where the article was renamed after the deletion discussion, and, most critically, it will only see articles that survived deletion. If they were deleted, I won’t be able to see them in this analysis, so there’s an obvious survivorship bias limiting what conclusions we can draw.

Having said all that…

Female BLPs created 2009-16 appear noticeably more likely than male BLPs of equivalent age to have been through a deletion discussion at some point in their lives (and, presumably, all have been kept). Since 2016, this has changed and the two groups are about even.

Alongside this, there is a corresponding drop-off in the number of articles created since 2016 that have associated deletion discussions. My tentative hypothesis is that articles created in the last few years are generally less likely to be nominated for deletion. This may be because the growing use of things like the draft namespace (and associated reviews) means that articles are more robust when first published. Conversely, though, it’s possible that nominations continue at the same rate, but the deletion process is now more rigorous. In that case a higher proportion of nominated articles would get deleted (and so disappear from our data). We can’t tell.

(One possible explanation that we can tentatively dismiss is age. An article can be nominated at any point in its lifespan, so you would expect the share to creep up slowly as articles get older. However, I would expect the majority of deletion nominations to come in the first few weeks, and to be fairly evenly distributed after that. As such, the drop-off seems far too rapid to be explained by article age alone.)

What we don’t know is what the overall nomination for deletion rate, including deleted articles, looks like. From our data, it could be that pre-2016 male and female articles are nominated at equal rates but more male articles are deleted; or it could be that pre-2016 male and female articles are equally likely to get deleted, but the female articles are nominated more frequently than they should be. Either of these would cause the imbalance. I think this is very much the missing piece of data and I’d love to see any suggestions for how we can work it out – perhaps something like trying to estimate gender from the names of deleted articles?

Update: Magnus has run some numbers on deleted pages, doing exactly this – inferring gender from pagenames. Of those which were probably a person, ~2/3 had an inferred gender, and 23% of those were female. This is a remarkably similar figure to the analysis here (~23% of current BLPs female; ~26% of all BLPs which have survived a deletion debate female)

So in conclusion

  • We know the gender breakdown: skewed male, but growing slowly more balanced over time, and better for living people than historical ones.
  • We know the article lengths: slightly longer for women than men for recent articles, and about equal for those created a long time ago.
  • We know that there is something different about the way male and female biographies created before ~2017 experience the deletion process, but we don’t have clear data to indicate exactly what is going on, and there are multiple potential explanations.
  • We also know that deletion activity seems to be more balanced for articles in both groups created from ~2017 onwards, and that these also have a lower frequency of involvement with the deletion process than might have been expected. It is not clear what the mechanism is here, or if the two factors are directly linked.

How can you extract this data? (Yes, this is very dull)

The first problem was generating the lists of articles and their metadata. The English Wikipedia category system lets us identify “living people”, but not gender; Wikidata lets us identify gender (property P21), but not reliably “living people”. However, we can creatively use the PetScan tool to get the intersection of a SPARQL gender query and the category. Instructing it to explicitly use Wikipedia (“enwiki” in other sources > manual list) and give output as a TSV, then waiting for about fifteen minutes, leaves you with a nice clean data dump. Thanks, Magnus!

(It’s worth noting that you can get this data with any characteristic indexed by Wikidata, or any characteristic identifiable through the Wikipedia category schema, but you will need to run a new query for each aspect you want to analyse – the exported data just has article metadata, none of the Wikidata/category information)

The exported files contain three things that are very useful to us: article title, pageid, and length. I normalised the files like so:

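grep '[0-9]' enwiki_blp_women_from_list.tsv | cut -f 2,3,5 > women-noheader.tsv

This drops the header line (it’s the only one with no numeric characters) and extracts only the three values we care about: title, pageid and length (conveniently saving about 20MB). The pattern is quoted so that the shell can’t expand [0-9] as a filename glob if a matching file happens to exist.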

This gives us two of the things we want (age and size) but not deletion data. For that, we fall back on inference. Any article that is put through the AFD process gets a new subpage created at “Wikipedia:Articles for deletion/PAGENAME”. It is reasonable to infer that if an article has a corresponding AFD subpage, it’s probably about that specific article. This is not always true, of course – names get recycled, pages get moved – but it’s a reasonable working hypothesis and hopefully the errors are evenly distributed over time. I’ve racked my brains to see if I could anticipate a noticeable difference here by gender, as this could really complicate the results, but provisionally I think we’re okay to go with it.

To find out if those subpages exist, we turn to the enwiki dumps. Specifically, we want “enwiki-latest-all-titles.gz” – which, as it suggests, is a simple file listing all page titles on the wiki. Extracted, it comes to about 1GB. From this, we can extract all the AFD subpages, as so:

grep "Articles_for_deletion/" enwiki-latest-all-titles | cut -f 2 | sort | uniq | cut -f 2 -d / | sort | uniq > afds

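This extracts all the AFD subpages, removes any duplicates (talk pages are listed here as well, for example), and sorts the list alphabetically. There are about 424,000 of them. One limitation: cut -f 2 -d / keeps only the text between the first and second slash. As a result, the rare article titles that themselves contain a slash will be truncated and won’t match.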
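To check the inference for a single article, a lookup against the afds list needs to be an exact, whole-line, fixed-string match. Page titles often contain characters such as `.` or `(`, which grep would otherwise read as regular-expression syntax. The small sketch below does this. The helper name `has_afd` is hypothetical, and it assumes titles are written with underscores, as they are in the dump.

```shell
#!/usr/bin/env bash
# has_afd TITLE [AFDS]
# Succeeds (exit status 0) if TITLE appears as a whole line in the afds list.
# AFDS defaults to the file named "afds" produced by the pipeline above.
has_afd() {
  local title="$1" afds="${2:-afds}"
  # -F: fixed string, so "." and "(" in titles are taken literally
  # -x: whole-line match, so "Smith" does not match "John_Smith"
  # -q: no output, just an exit status
  grep -Fxq -- "$title" "$afds"
}

# Usage: has_afd Jessica_Borger && echo "has an AFD subpage"
```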

Going back to our original list of articles, we want to bin them by age. To a first approximation, pageid is sequential with age – it’s assigned when the page is first created. There are some big caveats here; for example, a page being created as a redirect and later expanded will have the ID of its initial creation. Pages being deleted and recreated may get a new ID, pages which are merged may end up with either of the original IDs, and some complicated page moves may end up with the original IDs being lost. But, for the majority of pages, it’ll work out okay.

To correlate pageID to age, I did a bit of speculative guessing to find an item created on 1 January and 1 July every year back to 2009 (eg pageid 43190000 was created at 11am on 1 July 2014). I could then use these to extract the articles corresponding to each period as so:

awk -F '\t' '$2 >= 41516000 && $2 < 43190000' < men-noheader.tsv > bins/2014-1-M
awk -F '\t' '$2 >= 43190000 && $2 < 44909000' < men-noheader.tsv > bins/2014-2-M

This finds all items with a pageid (in column #2 of the file) between the specified values, and copies them into the relevant bin. Run once for men and once for women.
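Rather than writing one awk line per half-year by hand, the bin boundaries can be kept in a small file, with the same awk filter run once per consecutive pair. This is a sketch under assumptions. The `boundaries.tsv` file, its label<TAB>pageid layout and the `make_bins` name are hypothetical. Only the three pageids from the commands above come from the text; the other boundaries back to 2009 would have to be found by the same guessing process.

```shell
#!/usr/bin/env bash
# make_bins INPUT SUFFIX BOUNDARIES
#   INPUT      title<TAB>pageid<TAB>length, e.g. men-noheader.tsv
#   SUFFIX     gender letter appended to each bin name, e.g. M
#   BOUNDARIES hypothetical "label<TAB>first_pageid" lines in ascending order;
#              the last line only marks where the final bin ends.
# Each bin covers first_pageid <= pageid < the next line's first_pageid.
make_bins() {
  local input="$1" suffix="$2" boundaries="$3" label lo hi
  mkdir -p bins
  # Pair each boundary with the start of the next one: "label lo hi".
  awk -F '\t' 'NR > 1 { print prev_label "\t" prev_start "\t" $2 }
               { prev_label = $1; prev_start = $2 }' "$boundaries" |
  while IFS=$'\t' read -r label lo hi; do
    # Same filter as the hand-written commands, with numeric comparison forced.
    awk -F '\t' -v lo="$lo" -v hi="$hi" \
      '($2 + 0) >= (lo + 0) && ($2 + 0) < (hi + 0)' < "$input" \
      > "bins/${label}-${suffix}"
  done
}

# Example boundaries.tsv, using the pageids from the commands above:
#   2014-1  41516000
#   2014-2  43190000
#   end     44909000
# make_bins men-noheader.tsv M boundaries.tsv
# make_bins women-noheader.tsv F boundaries.tsv
```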

Then we can run a short report, along these lines (the original had loops in it):

  cut -f 1 bins/2014-1-M | sort > temp-M
  echo -e "2014-1-M\tM\t$(wc -l < bins/2014-1-M)\t$(awk -F '\t' '{ total += $3; count++ } END { print total/count }' bins/2014-1-M)\t$(comm -1 -2 temp-M afds | wc -l)" >> report.tsv

This adds a line to the file report.tsv with (in order) the name of the bin, the gender, the number of entries in it, the mean value of the length column, and a count of the entries whose titles also appear in the afds file. (The temp-M file is needed because the comm tool requires properly sorted input.)
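The loops from the original scripts aren't shown, so here is one possible reconstruction. It walks every bin file and appends the same five columns: bin, gender, count, mean length and AFD matches. It assumes bin files named like 2014-1-M, as in the commands above. `make_report` and `temp-titles` are illustrative names, not the author's.

```shell
#!/usr/bin/env bash
# make_report AFDS REPORT
# Appends one line per file in bins/ to REPORT:
#   bin name, gender, number of articles, mean length, number with an AFD page.
# Assumes bins are named LABEL-G (e.g. 2014-1-M) with columns
# title<TAB>pageid<TAB>length, and that AFDS was sorted in the same locale.
make_report() {
  local afds="$1" report="$2" f name gender
  for f in bins/*; do
    [ -s "$f" ] || continue             # skip empty bins (the mean would divide by zero)
    name=$(basename "$f")
    gender=${name##*-}                  # text after the last "-", i.e. M or F
    cut -f 1 "$f" | sort > temp-titles  # comm needs sorted input
    printf '%s\t%s\t%s\t%s\t%s\n' "$name" "$gender" \
      "$(( $(wc -l < "$f") ))" \
      "$(awk -F '\t' '{ total += $3; count++ } END { print total / count }' "$f")" \
      "$(( $(comm -1 -2 temp-titles "$afds" | wc -l) ))" >> "$report"
  done
  rm -f temp-titles
}

# Usage: make_report afds report.tsv
```

The `$(( ... ))` wrappers normalise the counts, because some `wc` implementations pad their output with spaces.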

After that, generating the data is lovely and straightforward – drop the report into a spreadsheet and play around with it.
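If you'd rather stay on the command line than use a spreadsheet, the ratios plotted earlier can be derived straight from report.tsv. These are the female share of male plus female BLPs in each period, and the percentage of each gender's BLPs with a matching AFD page. The sketch below assumes the five-column layout described above, with bin names ending in -M or -F. `summarise_report` is a hypothetical name.

```shell
#!/usr/bin/env bash
# summarise_report REPORT
# REPORT lines: bin<TAB>gender<TAB>count<TAB>mean_length<TAB>afd_matches,
# with bin names ending in -M or -F (e.g. 2014-1-M).
# Prints, per period: label, female % of (male + female),
# % of male BLPs with an AFD page, % of female BLPs with an AFD page.
summarise_report() {
  awk -F '\t' '
    {
      label = $1
      sub(/-[MF]$/, "", label)          # 2014-1-M -> 2014-1
      n[label, $2] = $3
      afd[label, $2] = $5
      seen[label] = 1
    }
    END {
      for (l in seen) {
        m = n[l, "M"]; f = n[l, "F"]
        if (m == 0 || f == 0) continue  # need both genders for a comparison
        printf "%s\t%.1f\t%.1f\t%.1f\n", l, 100 * f / (m + f), 100 * afd[l, "M"] / m, 100 * afd[l, "F"] / f
      }
    }' "$1" | sort                      # awk array order is unspecified
}

# Usage: summarise_report report.tsv
```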

George Ernest Spero, the vanishing MP

15:04, Sunday, 17 2019 March UTC

As part of the ongoing Wikidata MPs project, I’ve come across a number of oddities – MPs who may or may not have been the same person, people who essentially disappear after they leave office, and so on. Tracking these down can turn into quite a complex investigation.

One such was George Ernest Spero, Liberal MP for Stoke Newington 1923-24, then Labour MP for Fulham West 1929-30. His career was cut short by his resignation in April 1930; shortly afterwards, he was declared bankrupt. Spero had already left the country for America, and nothing more was heard of him. The main ambiguity was when he died. Various sources claimed either 1960 or 1976, without it being clear which was more reliable, and without any real detail on what happened to him after 1930. In correspondence with Stephen Lees, who has been compiling an incredibly useful comprehensive record of MPs’ death dates, I did some work on this last year and eventually confirmed the 1960 date. I’ve just rediscovered my notes and, since it was an interesting little mystery, thought I’d post them.

George Spero, MP and businessman

So, let’s begin with what we know about him up to the point at which he vanished.

George Ernest Spero was born in 1894. He began training at the Royal Dental Hospital in 1912, and served in the RNVR as a surgeon during the First World War. He had two brothers who also went into medicine; Samuel was a dentist in London (and apparently also went bankrupt, in 1933), while Leopold was a surgeon or physician (trained at St. Mary’s, RNVR towards the end of WWI, still in practice in the 1940s). All of this was reasonably straightforward to trace, although oddly George’s RNVR service records seem to be missing from the National Archives.

After the war, he married Rina Ansley (nee Rina Ansbacher, born 14 March 1902) in 1922; her father was a wealthy German-born stockbroker, resident in Park Lane, who had naturalised in 1918. They had two daughters, Rachel Anne (b. 1923) and Betty Sheila (b. 1928). After his marriage, Spero went into politics in Leicester, where he seems to have been living, and stood for Parliament in the 1922 general election. The Nottingham Journal described him as for “the cause of free, unfettered Liberalism … Democratic in conviction, he stands for the abolition of class differences and for the co-operation of capital and labour.” However, while this was well-tailored to appeal to the generally left-wing voters of Leicester West, and his war record was well-regarded, the moderate vote was split between the Liberal and National Liberal candidates, with Labour taking the seat.

The Conservative government held another election in 1923, aiming to strengthen a small majority (does this sound familiar?), and Spero – now back in London – contested Stoke Newington, then a safe Conservative seat, again as a left Liberal. With support from Labour, who did not contest the seat, Spero ran a successful campaign and unseated the sitting MP. He voted in support of the minority Labour government on a number of occasions, and was one of the small number of Liberal rebels who supported them in the final no-confidence vote. However, this was not enough to prevent Labour fielding a candidate against him in 1924; the Conservative candidate took 57% of the vote, with the rest split evenly between Labour and Liberal.

Spero drifted from the Liberals into the Labour Party, probably a more natural home for his politics, joining it in 1925. By the time of the next general election, in May 1929, he had become the party’s candidate for Fulham West, winning it from the Conservatives with 45% of the vote.

He was a moderately active Government backbencher for the next few months, including being sent as a visitor to Canada during the recess in September 1929, travelling with his wife. While overseas, she caused some minor amusement to the British papers after reporting the loss of a £6,000 pearl necklace – they were delighted to report this alongside “socialist MP”. He was last recorded voting in Hansard in December, and did not appear in 1930. In February and March he was paired for votes, with a newspaper report in early March stating that he had been advised to take a rest to avoid a complete nervous breakdown about the start of the year, and had gone to the South of France, but “hopes to return to Parliament before the month is out”. However, on 9th April he formally took the Chiltern Hundreds (it is interesting that a newspaper report suggested his local party would choose whether to accept the resignation).

However, things were moving quickly elsewhere. A case was brought against him in the High Court for £10,000, arising from his sale of a radio company in 1928-29. During the court hearing, at the end of May, it was discovered that a personal cheque for £4000 given by Spero to guarantee the company’s debts had been presented to his bank in October 1929, but was not honoured. He had at this point claimed to be suing the company for £20,000, buying six months legal delay, sold his furniture, and – apparently – left the country for America. Bankruptcy proceedings followed later that year (where he was again stated to be in America) and, unsurprisingly, his creditors seem to have received very little.

At this point, the British trail and the historic record draw to a gentle close. But what happened to him?

The National Portrait Gallery gave his death as 1960, while an entry in The Palgrave Dictionary of Anglo-Jewish History reported that they had traced his death to 1976 in Belgrade, Yugoslavia (where, as a citizen, it was registered with the US embassy). Unfortunately, it did not go into any detail about how they worked this out, and this just heightened the mystery – if it was true, how had a disgraced ex-MP ended up in Yugoslavia on a US passport three decades later? And, conversely, who was it had died in 1960?

George Spears, immigrant and doctor

We know that Spero went to America in 1929-30; that much seemed to be a matter of common agreement. Conveniently, the American census was carried out in April 1930, and the papers are available. On 18 April, he was living with his family in Riverside Drive, upper Manhattan; all the names and ages line up, and Spero is given as a medical doctor, actively working. Clearly they were reasonably well off, as they had a live-in maid, and it seems to be quite a nice area.

In 1937, he petitioned for American citizenship in California, noting that he had lived there since March 1933. As part of the process, he formally notified that he intended to change his name to George Ernest Spears. (He also gave his birthdate as 2 March 1894, of which more later).

While the names and dates of the family make us reasonably confident these are the same man, the match is neatly confirmed by the photograph on the citizenship papers, which can be compared with an older newspaper one. There are fifteen years between them, but we can see the similarities between the prospective MP of 27 and the older man of 43.

George Spears, with the same family, then reappears in the 1940 census, back in Riverside Drive. He is now apparently practising as an optician, and doing well, with an income upwards of $6000. Finally, we find a draft record for him living in Huntington, Long Island at some point in 1942. Note his signature here, which is visibly the same hand as in 1937, except that it reads “E. Spears” rather than “Ernest Spero”.

It is possible he reverted to his old name for a while – there are occasional appearances of a Dr. George Spero, optometrist, in the New York phone books between the 1940s and late 1950s. Not enough detail to be sure either way, though.

So at this point, we can trace Spero/Spears continually from 1930 to 1942. And then nothing, until on 7 January 1960, George E. Spears, born 2 March 1894, died in California. Some time later, in June 1976, George Spero, born 11 April 1894, died in Belgrade, Yugoslavia, apparently a US citizen. Which one was our man?

The former seemed more likely, but can we prove it? The death details come from an index, which gives a mother’s maiden name of “Robinson” – unfortunately the full certificate isn’t there and I did not feel up to trying to track down a paper Californian record to see what else it said.

If we return to the UK, we can find George Spero in the 1901 census in Dover, with his parents Isidore Sol [Solomon], a ‘dental mechanic’, and Rachel, maiden name unknown. The family later moved to London, the parents naturalised, Isidore died in 1925 – and probate goes to “George Ernest Spero, physician”, which seems to confirm that this is definitely the right family and not a different George Spero. The 1901 censuses note that two of the older children were born in Dublin, so we can trace them in the Irish records. Here we have an “Israel S Spero” marrying Rachel Robinson in 1884, and a subsequent child born to Solomon Israel Spero and Rachel Spero nee Robinson. There are a few other Speros or Spiros appearing in Dublin, but none married around the right time, and none with such similar names. If Israel Solomon Spero is the same as Isidore Solomon Spero, this all ties up very neatly.

It leaves open the mystery, however, of who died in Yugoslavia. It seems likely this was a completely different man (who had not changed his name), but I have completely failed to trace anything about him. A pity – it would have been nice to definitively close off that line of enquiry.

Generalized classification of claims’ meaningworthiness

17:12, Thursday, 03 2019 January UTC

Generalizing a Foucault comment from 1970 on accepted shared knowledge, truth, and power:

The system of [assigning value to statements] is essential to the structure and functioning of our society.  There is a constant battle around this – the ensemble of rules according to which [valued and devalued statements] are separated and specific effects of power are attached to the former.  This is a battle about the status of truth and the practical and political role it plays. It is necessary to think of these political problems not in terms of science and ideology, but in terms of accepted knowledge and power.

Here are a few propositions, to be further tested and evaluated:

  1. Let τ be a system of ordered procedures for the production, regulation, distribution, [evaluation], and operation of statements.  A system linked in a circular way with systems of power that produce and sustain it, and with the effects of power which it induces and which extend it.  A regime of systems.  Such a regime is not merely ideological or superstructural; its [early stage] was a condition of the formation and development of its environment.
  2. The essential [social, political] problem for designers and maintainers of τ is not to criticize its ideology or [relation] to science, or to ensure a particular scientific practice is [correct], but to ascertain how to constitute new politics of knowledge. The problem is not changing people’s beliefs, but the political, practical, institutional regime of producing and evaluating statements about the world.
  3. This is not a matter of emancipating τ from systems of power (which would be an illusion, for it is already power) but of detaching its power from the forms of hegemony [social, economic, cultural], within which it operated [when it was designed].
  4. These [political, social, economic, cultural, semantic] questions are not error, illusion, ideology, or distraction: they illuminate truth itself.

I have been thinking about this in the context of recent work with the Knowledge Futures Group and the Truth & Trust coalition gathered around TED.

(from an interview with Foucault first published in L’Arc 70.)

Anonymizing data on the users of Wikipedia

16:22, Wednesday, 25 2018 July UTC

Updated for the new year: with specific things we can all start doing 🙂

Wikipedia currently tracks and stores almost no data about its readers and editors.  This persistently foils researchers and analysts inside the WMF and its projects; and is largely unnecessary.

Not tracked last I checked: sessions, clicks, where on a page readers spend their time, time spent on page or site, returning users.  There is a small exception: data that can fingerprint a user’s use of the site is stored for a limited time, made visible only to developers and checkusers, in order to combat sockpuppets and spam.

This is all done in the spirit of preserving privacy: not gathering data that could be used by third parties to harm contributors or readers for reading or writing information that some nation or other powerful group might want to suppress.  That is an essential concern, and Wikimedia’s commitment to privacy and pseudonymity is wonderful and needed.

However, the data we need to improve the site and understand how it is used in aggregate doesn't require storing personally identifiable data that could meaningfully be used to target specific editors. Rather than throwing out data that we worry would expose users to risk, we should be fuzzing and hashing it to preserve the aggregates we care about. Browser fingerprints, including the username or IP, can be hashed; timestamps and anything that could be interpreted as geolocation can have noise added to them.
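As a rough illustration of the hashing and fuzzing idea, here is a minimal shell sketch, assuming GNU coreutils. It replaces an identifier with a salted SHA-256 pseudonym and rounds timestamps down to the hour, a simpler stand-in for adding random noise. The PSEUDONYM_SALT variable is hypothetical; a real scheme would need a properly guarded secret (or an HMAC) and expert review of which fields are safe to keep at all.

```shell
#!/usr/bin/env bash
# Sketch: pseudonymise identifiers and coarsen timestamps before storage.
# PSEUDONYM_SALT is a hypothetical secret; it must be kept away from the
# data (and rotated), or the hashes of common usernames/IPs can be guessed.
PSEUDONYM_SALT="${PSEUDONYM_SALT:-change-me}"

# Replace a username or IP with a stable salted SHA-256 pseudonym
# (first 16 hex characters). Same input + same salt -> same pseudonym.
pseudonymize() {
  printf '%s:%s' "$PSEUDONYM_SALT" "$1" | sha256sum | cut -c1-16
}

# Round a Unix timestamp down to the start of its hour, so hourly
# aggregates survive but exact times of individual visits do not.
coarsen_hour() {
  echo $(( $1 / 3600 * 3600 ))
}

# Example: a log entry "alice 1532535720" becomes "<pseudonym> 1532534400".
printf '%s %s\n' "$(pseudonymize alice)" "$(coarsen_hour 1532535720)"
```

Coarsening is deterministic, which keeps the example testable; random jitter would blur timing further but makes aggregates noisier.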

We could then know things such as, for instance:

  • the number of distinct users in a month, by general region
  • how regularly each visitor comes to the projects; which projects + languages they visit [throwing away user and article-title data, but seeing this data across the total population of ~1B visitors]
  • particularly bounce rates and times: people finding the site, perhaps running one search, and leaving
  • the number of pages viewed in a session, its tempo, or the namespaces they are in [throwing away titles]
  • the reading + editing flows of visitors on any single page, aggregated by day or week
  • clickflows from the main page or from search results [this data is gathered to some degree; I don’t know how reusably]

These are just rough descriptions: great care must be taken to vet each aggregate to make sure it preserves privacy. But this is a known practice, and one we could adopt with expert attention.
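To make one of the aggregates above concrete, the sketch below counts distinct pseudonymous users per month and region and drops any cell smaller than a threshold K. The input format is hypothetical, and small-cell suppression is only the crudest safeguard; techniques such as differential privacy go much further.

```shell
#!/usr/bin/env bash
# Count distinct pseudonymous users per (month, region) cell and suppress
# cells with fewer than K users, so rare combinations are not published.
# Input (hypothetical format), tab-separated on stdin:
#   <pseudonym> <TAB> <YYYY-MM> <TAB> <region>
distinct_users_by_region() {
  local k="${1:-10}"   # minimum cell size; 10 is an arbitrary default
  awk -v k="$k" '
    BEGIN { FS = OFS = "\t" }
    # Count each user at most once per (month, region).
    !seen[$1, $2, $3]++ { count[$2 OFS $3]++ }
    END {
      for (cell in count)
        if (count[cell] >= k) print cell, count[cell]
    }' | sort
}

# Usage: distinct_users_by_region 50 < visits.tsv
```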

What keeps us from doing this today? Some aspects of this are surely discussed in places, but those discussions are hard to find. Past discussions I recall were brought to an early end by [devs worrying about legal] or [legal worrying about what is technically possible].

Discussion of obstacles and negative-space is generally harder to find on wikis than discussion of works-in-progress and responses to them: a result of a noun-based document system that requires discussions to be attached to a clearly-named topic!

What we can do, both researchers and data fiduciaries:

  • As site-maintainers: Start gathering this data, and appoint a couple of privacy-focused data analysts to propose how to share it.
    • Identify challenges, open problems, solved problems that need implementing.
  • Name the (positive, future-crafting, project-loving) initiative to do this at scale, and the reasons to do so.
    • By naming the positive aspect, distinguish this from a tentative caveat to a list of bad things to avoid, which leads to inaction. (“Never gather data! Unless you have extremely good reasons, someone else has done it before, it couldn’t possibly be dangerous, and no one could possibly complain.”)
  • As data analysts (internal and external): write about what better data enables.  Expand the list above, include real-world parallels.
    • How would this illuminate the experience of finding and sharing knowledge?
  • Invite other sociologists, historians of knowledge, and tool-makers to start working with stub APIs that at first may not return much data.

Without this we remain in the dark; and, like libraries that have found patrons leaving their privacy-preserving (but less helpful) environs for data-hoarding (and very handy) book-explorers, we remain vulnerable to disuse.

Back in January, I wrote up some things I was aiming to do this year, including:

Firstly, I’d like to clear off the History of Parliament work on Wikidata. I haven’t really written this up yet (maybe that’s step 1.1) but, in short, I’m trying to get every MP in the History of Parliament database listed and crossreferenced in Wikidata. At the moment, we have around 5200 of them listed, out of a total of 22200 – so we’re getting there. (Raw data here.) Finding the next couple of thousand who’re listed, and mass-creating the others, is definitely an achievable task.

Well, seven months later, here’s where it stands:

  • 9,372 of a total 21,400 (43.7%) History of Parliament entries have been matched to records for people in Wikidata.
  • These 9,372 entries represent 7,257 people – 80 have entries in three HoP volumes, and 1,964 in two volumes. (This suggests that, when complete, we will have about ~16,500 people for those initial 21,400 entries – so maybe we’re actually over half-way there).
  • These are crossreferenced to a lot of other identifiers. 1,937 of our 7,257 people (26.7%) are in the Oxford Dictionary of National Biography, 1,088 (15%) are in the National Portrait Gallery database, and 2,256 (31.1%) are linked to their speeches in the digital edition of Hansard. There is a report generated each night crosslinking various interesting identifiers.
  • Every MP in the 1820-32 volume (1,367 of them) is now linked and identified, and the 1790-1820 volume is now around 85% complete. (This explains the high showing for Hansard, which covers 1805 onwards.)
  • The metadata for these is still limited – a lot more importing work to do – but in some cases pretty decent; 94% of the 1820-32 entries have a date of death, for example.
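The ~16,500 figure is a simple ratio: scale the total number of entries by the people-per-entry rate seen so far (about 1.29 entries per person). A small awk sketch of that arithmetic, assuming the matched entries are typical of the unmatched ones:

```shell
#!/usr/bin/env bash
# Estimate the eventual number of distinct people from a partial match:
#   people_total ≈ entries_total × (people_matched / entries_matched)
# This assumes the matched entries are typical of the unmatched ones.
estimate_people() {
  local total_entries="$1" matched_entries="$2" matched_people="$3"
  awk -v t="$total_entries" -v e="$matched_entries" -v p="$matched_people" \
    'BEGIN { printf "%.0f\n", t * p / e }'
}

estimate_people 21400 9372 7257   # prints 16571, i.e. "about ~16,500"
```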

Of course, there’s a lot more still to do – more metadata to add, more linkages to make, and so on. It still does not have any reasonable data linking MPs to constituencies, which is a major gap (but perhaps one that can be filled semi-automatically using the HoP/Hansard links and a clever script).

But as a proof of concept, I’m very happy with it. Here’s some queries playing with the (1820-32) data:

  • There are 990 MPs with an article about them in at least one language/WM project. Strikingly, ten of these don’t have an English Wikipedia article (yet). The most heavily written-about MP is – to my surprise – David Ricardo, with articles in 67 Wikipedias. (The next three are Peel, Palmerston, and Edward Bulwer-Lytton).
  • 303 of the 1,367 MPs (22.1%) have a recorded link to at least one other person in Wikidata by a close family relationship (parent, child, spouse, sibling) – there are 803 links, to 547 unique people – 108 of whom are also in the 1820-32 MPs list, and 439 of whom are from elsewhere in Wikidata. (I expect this number to rise dramatically as more metadata goes in).
  • The longest-surviving pre-Reform MP (of the 94% indexed by deathdate, anyway) was John Savile, later Earl of Mexborough, who made it to August 1899…
  • Of the 360 with a place of education listed, the most common is Eton (104), closely followed by Christ Church, Oxford (97) – there is, of course, substantial overlap between them. It’s impressive to see just how far we’ve come. No-one would ever expect to see anything like that for Parliament today, would we.
  • Of the 1,185 who’ve had first name indexed by Wikidata so far, the most popular is John (14.4%), then William (11.5%), Charles (7.5%), George (7.4%), and Henry (7.2%):
  • A map of the (currently) 154 MPs whose place of death has been imported:

All these are of course provisional, but it makes me feel I’m definitely on the right track!

So, you may be asking, what can I do to help? Why, thank you, that’s very kind…

  • First of all, this is the master list, updated every night, of as-yet-unmatched HoP entries. Grab one, load it up, search Wikidata for a match, and add it (property P1614). Bang, one more down, and we’re 0.01% closer to completion…
  • It’s not there? (About half to two thirds probably won’t be). You can create an item manually, or you can set it aside to create a batch of them later. I wrote a fairly basic bash script to take a spreadsheet of HoP identifiers and basic metadata and prepare it for bulk-item-creation on Wikidata.
  • Or you could help sanitise some of the metadata – here’s some interesting edge cases:
    • This list is ~680 items who probably have a death date (the HoP slug ends in a number), but who don’t currently have one in Wikidata.
    • This list is ~540 people who are titled “Honourable” – and so are almost certainly the sons of noblemen, themselves likely to be in Wikidata – but who don’t have a link to their father. This list is the same, but for “Lord”, and this list has all the apparently fatherless men who were the 2nd through 9th holders of a title…
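For the bulk-creation step, one plausible target is the QuickStatements tool’s tab-separated command format; the post does not say what the bash script above actually outputs, so treat the following as a hypothetical sketch. It assumes a two-column TSV (HoP identifier, name) and emits a CREATE block per row with an English label, P31 (instance of) Q5 (human), and the HoP identifier under P1614. It also includes a helper for the “slug ends in a number” heuristic used for the death-date list.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: turn "<HoP identifier><TAB><name>" rows on stdin into
# QuickStatements commands that create one new item per row.
hop_to_quickstatements() {
  awk 'BEGIN { FS = OFS = "\t" }
       NF >= 2 {
         print "CREATE"
         print "LAST", "Len", "\"" $2 "\""     # English label
         print "LAST", "P31", "Q5"             # instance of: human
         print "LAST", "P1614", "\"" $1 "\""   # History of Parliament ID
       }'
}

# Heuristic from the list above: a slug ending in a number suggests the
# person probably has a death date recorded.
slug_ends_in_number() {
  case "$1" in
    *[0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: hop_to_quickstatements < new-mps.tsv > batch.txt
```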

The Web is highly distributed and in flux; the people using it, even more so. Many projects exist to optimize its use, including:

  1. Reducing storage and bandwidth:  compressing parts of the web; deduplicating files that exist in many places, replacing many with pointers to a single copy of the file [Many browsers & servers, *Box]
  2. Reducing latency and long-distance bandwidth:  caching popular parts of the web locally around the world [CDNs, clouds, &c]
  3. Increasing robustness & permanence of links: caching linked pages (with timestamps or snapshots, for dynamic pages) [Memento, Wayback Machine, perma, amber]
  4. Increasing interoperability of naming schemes for describing or pointing to things on the Web, so that it’s easier to cluster similar things and find copies or versions of them [HvdS’s 15-year overview of advancing interop]
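Point 1’s deduplication rests on content hashing: files with the same SHA-256 digest are, for all practical purposes, byte-for-byte identical, so all but one copy could become pointers. A minimal single-machine sketch, assuming GNU coreutils (sha256sum, and uniq’s -w and --all-repeated options):

```shell
#!/usr/bin/env bash
# List groups of byte-identical files under a directory, one group per
# block separated by blank lines. Each line is "<sha256>  <path>".
# Assumes GNU coreutils: uniq -w64 compares only the 64-hex-digit digest.
find_duplicates() {
  find "$1" -type f -exec sha256sum {} + |
    sort |
    uniq -w64 --all-repeated=separate
}

# Usage: find_duplicates ~/Downloads
```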

This week I was thinking about the 3rd point. What would a comprehensively backed-up Web of links look like?  How resilient can we make references to all of the failure modes we’ve seen and imagined?  Some threads for a map:

  1. Links should include timestamps; important ones should request archival permalinks.
    • When creating a reference, sites should notify each of the major cache-networks, asking them to store a copy.
    • Robust links can embed information about where to find a cache in the HTML anchor tag that generates the link (and possibly a fuzzy content hash?).
    • Permalinks can use an identifier system that allows searching for the page across any of the nodes of the local network, and across the different cache-networks. (Browsers can know how to attempt to find a copy.)
  2. Sites should have a coat of amber: a local cached snapshot of anything linked from that site, stored on their host or a nearby supernode.  So as long as that site is available, snapshots of what it links to are, too.
    • We can comprehensively track whether sites have signalled they have an amber layer.  If a site isn’t yet caching what they link to, readers can encourage them to do so or connect them to a supernode.
    • Libraries should host amber supernodes: caches for sites that can’t host those snapshots on their host machine.
  3. Snapshots of entire websites should be archived regularly
    • Both public snapshots for search engines and private ones for long-term archives.
  4. A global network of mirrors (a la [C]LOCKSS) should maintain copies of permalink and snapshot databases
    • Consortia of libraries, archives, and publishers should commit to a broad geographic distribution of mirrors.
      • mirrors should be available within any country that has expensive interconnects with the rest of the world;
      • prioritization should lead to a kernel of the cached web that is stored in ‘seed bank’ style archives, in the most secure vaults and other venues
  5. There should be a clear way to scan for fuzzy matches for a broken link. Especially handy for anyone updating a large archive of broken links.
    • Is the base directory there? Is the base URL known to have moved?
    • Are distant-timestamped versions of the file available?  [some robustlink implementations do this already]
    • Are there exact matches elsewhere in the web for a [rare] filename?  Can you find other documents with the same content hash? [if a hash was included in the link]
    • Are there known ways to contact the original owner of the file/directory/site?
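Two of these threads can be sketched in a few lines of shell. The first function decorates a link in the style of the Robust Links proposal (data-versionurl and data-versiondate attributes) with a Wayback Machine URL for a given date; the second lists the parent directories to try when a link breaks. Both are illustrative helpers with hypothetical names, not part of any existing tool, and the first assumes an archived copy near that date exists.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for robust links; names are illustrative only.

# Print an <a> tag carrying an archived copy's URL and date, in the style
# of the Robust Links data-versionurl / data-versiondate attributes.
# Args: original URL, snapshot date (YYYY-MM-DD), link text.
robust_link() {
  local url="$1" date="$2" text="$3"
  local stamp="${date//-/}"   # 2018-07-25 -> 20180725 (Wayback timestamp)
  printf '<a href="%s" data-versionurl="https://web.archive.org/web/%s/%s" data-versiondate="%s">%s</a>\n' \
    "$url" "$stamp" "$url" "$date" "$text"
}

# For a broken link, print each parent directory, nearest first,
# as candidates for "is the base directory there?".
parent_candidates() {
  local scheme="${1%%://*}" rest="${1#*://}"
  local host="${rest%%/*}" path=""
  case "$rest" in */*) path="${rest#*/}" ;; esac
  path="${path%/}"
  while [ -n "$path" ]; do
    case "$path" in
      */*) path="${path%/*}" ;;   # drop the last path segment
      *) path="" ;;               # reached the site root
    esac
    if [ -n "$path" ]; then
      printf '%s://%s/%s/\n' "$scheme" "$host" "$path"
    else
      printf '%s://%s/\n' "$scheme" "$host"
    fi
  done
}
```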

Related questions: What other aspects of robustness need consideration? How are people making progress at each layer?  What more is needed to have a mesh of archived links at every scale? For instance, WordPress supports a chunk of the Web; top CDNs cache more than that. What other players can make this happen?  What is needed for them to support this?

Most popular videos on Wikipedia, 2015

23:12, Thursday, 14 2016 January UTC

One of the big outstanding questions for many years with Wikipedia was the usage data of images. We had reasonably good data for article pageviews, but not for the usage of images – we had to come up with proxies like the number of times a page containing that image was loaded. This was good enough as it went, but didn’t (for example) count the usage of any files hotlinked elsewhere.

In 2015, we finally got the media-pageviews database up and running, which means we now have a year’s worth of data to look at. In December, someone produced an aggregated dataset of the year to date, covering video & audio files.

This lists some 540,000 files, viewed an aggregated total of 2,869 million times over about 340 days – equivalent to 3,080 million over a year. This covers use on Wikipedia, on other Wikimedia projects, and hotlinked by the web at large. (Note that while we’re historically mostly concerned with Wikipedia pageviews, almost all of these videos will be hosted on Commons.) The top thirty:

14436640 President Obama on Death of Osama bin Laden.ogv
10882048 Bombers of WW1.ogg
10675610 20090124 WeeklyAddress.ogv
10214121 Tanks of WWI.ogg
9922971 Robert J Flaherty – 1922 – Nanook Of The North (Nanuk El Esquimal).ogv
9272975 President Obama Makes a Statement on Iraq – 080714.ogg
7889086 Eurofighter 9803.ogg
7445910 SFP 186 – Flug ueber Berlin.ogv
7127611 Ward Cunningham, Inventor of the Wiki.webm
6870839 A11v 1092338.ogg
6865024 Ich bin ein Berliner Speech (June 26, 1963) John Fitzgerald Kennedy trimmed.theora.ogv
6759350 Editing Hoxne Hoard at the British Museum.ogv
6248188 Dubai’s Rapid Growth.ogv
6212227 Wikipedia Edit 2014.webm
6131081 Newman Laugh-O-Gram (1921).webm
6100278 Kennedy inauguration footage.ogg
5951903 Hiroshima Aftermath 1946 USAF Film.ogg
5902851 Wikimania – the Wikimentary.webm
5692587 Salt March.ogg
5679203 CITIZENFOUR (2014) trailer.webm
5534983 Reagan Space Shuttle Challenger Speech.ogv
5446316 Medical aspect, Hiroshima, Japan, 1946-03-23, 342-USAF-11034.ogv
5434404 Physical damage, blast effect, Hiroshima, 1946-03-13 ~ 1946-04-08, 342-USAF-11071.ogv
5232118 A Day with Thomas Edison (1922).webm
5168431 1965-02-08 Showdown in Vietnam.ogv
5090636 Moon transit of sun large.ogg
4996850 President Kennedy speech on the space effort at Rice University, September 12, 1962.ogg
4983430 Burj Dubai Evolution.ogv
4981183 Message to Scientology.ogv

(Full data is here; note that it’s a 17 MB TSV file)
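For anyone exploring the full file, here is a rough sketch of the two calculations in this post: annualising a total counted over about 340 days, and pulling out the top N rows. The two-column layout (views, then file name, tab-separated) is an assumption; check the file’s actual columns first.

```shell
#!/usr/bin/env bash
# Helpers for a "<views><TAB><file name>" TSV (column layout assumed).

# Scale a total counted over N days to a 365-day year.
annualise() {
  awk -v total="$1" -v days="$2" 'BEGIN { printf "%.0f\n", total * 365 / days }'
}

# Print the N most-viewed rows from stdin, highest first.
top_files() {
  sort -t "$(printf '\t')" -k1,1nr | head -n "$1"
}

annualise 2869 340   # millions of views over ~340 days -> prints 3080
# Usage: top_files 30 < media-views-2015.tsv   (file name hypothetical)
```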

It’s an interesting mix – and every one of the top 30 is a video, not an audio file. I’m not sure there’s a definite theme there – though “public domain history” does well – but it’d reward further investigation…

“When I was 5 years old, my mother always told me that happiness was the key to life.  When I went to school, they asked me what I wanted to be when I grew up.  I wrote down happy. They told me I didn’t understand the assignment, and I told them they didn’t understand life.”  —Lennon

From the BODYWORLDS exhibit in Amsterdam, full of flayed and preserved human bodies.

Taking pictures with flying government lasers

20:38, Friday, 02 2015 October UTC

Well, sort of.

A few weeks ago, the Environment Agency released the first tranche of their LIDAR survey data. This covers (most of) England, at varying resolution from 2m to 25cm, made via LIDAR airborne survey.

It’s great fun. After a bit of back-and-forth (and hastily figuring out how to use QGIS), here are two rendered images I made of Durham, one with buildings and one without, now on Commons:

The first is shown with buildings, the second without. Both are at 1m resolution, the best currently available for the area. Note in particular the very striking embankment and cutting for the railway viaduct (top left). These look like they could be very useful things to produce for Commons, especially since it’s – effectively – very recent, openly licensed, aerial imagery…

1. Selecting a suitable area

Generating these was, on the whole, fairly easy. First, install QGIS (simplicity itself on a linux machine, probably not too much hassle elsewhere). Then, go to the main data page and find the area you’re interested in. It’s arranged on an Ordnance Survey grid – click anywhere on the map to select a grid square. Major grid squares (Durham is NZ24) are 10km by 10km, and all data will be downloaded in a zip file containing tiles for that particular region.

Let’s say we want to try Cambridge. The TL45 square neatly cuts off North Cambridge but most of the city is there. If we look at the bottom part of the screen, it offers “Digital Terrain Model” at 2m and 1m resolution, and “Digital Surface Model” likewise. The DTM is the version just showing the terrain (no buildings, trees, etc) while the DSM has all the surface features included. Let’s try the DSM, as Cambridge is not exactly mountainous. The “on/off” slider will show exactly what the DSM covers in this area, though in Cambridge it’s more or less “everything”.

While this is downloading, let’s pick our target area. Zooming in a little further will show thinner blue lines and occasional superimposed blue digits; these define the smaller squares, 1 km by 1 km. For those who don’t remember learning to read OS maps, the number along the bottom (the easting) followed by the number up the side (the northing), taken together, define the square. So the sector containing all the colleges along the river (a dense clump of black-outlined buildings) is TL4458.

2. Rendering a single tile

Now your zip file has downloaded, drop all the files into a directory somewhere. Note that they’re all named something like tl4356_DSM_1m.asc. Unsurprisingly, this means the 1m DSM data for square TL4356.
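The naming follows the grid directly, so the files you need can be worked out in advance. A small sketch, assuming the tl4356_DSM_1m.asc pattern holds for other products and resolutions (only the 1m DSM name is confirmed above): one function maps a 1 km square to its file name, the other lists the hundred 1 km squares inside a 10 km square such as TL45.

```shell
#!/usr/bin/env bash
# Map an OS 1 km square such as TL4458 to its tile file name, following
# the tl4356_DSM_1m.asc pattern; product/resolution default to DSM, 1m.
tile_file() {
  local ref product="${2:-DSM}" res="${3:-1m}"
  ref=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf '%s_%s_%s.asc\n' "$ref" "$product" "$res"
}

# List the hundred 1 km squares inside a 10 km square such as TL45.
# Each adds an easting digit after the first digit and a northing digit
# after the second, so TL45 runs from TL4050 to TL4959.
squares_in() {
  local letters="${1%??}" e="${1: -2:1}" n="${1: -1}" i j
  for i in 0 1 2 3 4 5 6 7 8 9; do
    for j in 0 1 2 3 4 5 6 7 8 9; do
      printf '%s%s%s%s%s\n' "$letters" "$e" "$i" "$n" "$j"
    done
  done
}

tile_file TL4458   # prints tl4458_DSM_1m.asc
```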

Fire up QGIS, go to Layer > Add raster layer, and select your tile – in this case, TL4458. You’ll get a crude-looking monochrome image, immediately recognisable by a broken white line running down the middle. This is the Cam. If you’re seeing this, great, everything’s working so far. (This step is very helpful for checking that you are looking at the right area.)

Now, let’s make the image. Project > New to blank everything (no need to save). Then Raster > Analysis > DEM (terrain models). In the first box, select your chosen input file. In the next box, the output filename – with a .tif suffix. (Caution, linux users: make sure to enter or select a path here, otherwise it seems to default to home). Leave everything else as default – all unticked and mode: hillshade. Click OK, and a few seconds later it’ll give a completed message; cancel out of the dialogue box at this point. It’ll be displaying something like this:

Congratulations! Your first LIDAR rendering. You can quit out of QGIS (you can close without saving, your converted file is saved already) and open this up as a normal TIFF file now; it’ll be about 1MB and cover an area 1km by 1km. If you look closely, you can see some surprisingly subtle details despite the low resolution – the low walls outside King’s College, for example, or cars on the Queen’s Road – Madingley Road roundabout by the top left.

3. Rendering several tiles

Rendering multiple squares is a little trickier. Let’s try doing Barton, which conveniently fits into two squares – TL4055 and TL4155. Open QGIS up, and render TL4055 as above, through Raster > Analysis > DEM (terrain models). Then, with the dialogue window still open, select TL4155 (and a new output filename) and run it again. Do this for as many files as you need.

After all the tiles are prepared, clear the screen by starting a new project (again, no need to save) and go to Raster > Miscellaneous > Merge. In “Input files”, select the two exports you’ve just done. In “Output file”, pick a suitable filename (again ending in .tif). Hit OK, let it process, then close the dialog. You can again close QGIS without saving, as the export’s complete.

The rendering system embeds coordinates in the files, which means that when they’re assembled and merged they’ll automatically slot together in the correct position and orientation – no need to manually tile them. The result should look like this:

The odd black bit in the top right is the edge of the flight track – there’s not quite comprehensive coverage. This is a mainly agricultural area, and you can see field markings – some quite detailed, and a few bits on the bottom of the right-hand tile that might be traces of old buildings.

So… go forth! Make LIDAR images! See what you can spot…

4. Command-line rendering in bulk

Richard Symonds (who started me down this rabbit-hole) points out this very useful post, which explains how to do the rendering and merging via the command line. Let’s try the entire Durham area; 88 files in NZ24, all dumped into a single directory –

for i in *.asc ; do gdaldem hillshade -compute_edges "$i" "$i.tif" ; done

gdal_merge.py -o NZ24-area.tif *.asc.tif

rm *.asc.tif

In order, that a) runs the hillshade program on each individual source file; b) assembles them into a single giant image file with gdal_merge.py; c) removes the intermediate images (optional, but you may as well tidy up). The -compute_edges flag helpfully removes the thin black lines between sectors – I should have turned it on in the earlier sections!

Lists of favorite Facebook groups

09:33, Thursday, 13 2015 August UTC

Linterweb is releasing on its website Allingroups a new page containing lists of favorite Facebook groups on specific topics. The listed groups are the Facebook groups with the most members, or groups we like for other reasons. The topics are chosen according to the interests of Linterweb employees.


The current lists focus on the following topics:

Brittany (this topic is important to us as our company is currently located in this region, in the northwest of France); video games; sports; music; shared accommodation; Erasmus; online sales; cooking; car-sharing (with the United Nations Climate Change Conference soon to be held in Paris, in December 2015, this topic seems more relevant than ever); job search (as Linterweb tries to be a corporate citizen, we are very concerned about unemployment); downloads; books; films; art and creation; concerts (the latter four topics are important to us, as Linterweb has always been interested in, and supported, artistic creation, including the performing arts); news; fishing; health and welfare; series.

All these lists are displayed on the page Indicateur des groupes.

We’ll add new lists on new topics on a regular basis. If you have suggestions for topics or groups that you think we should add to our selection, please contact us at

Best regards, Pascal Martin, manager of Linterweb.

Linterweb is a web company that has been developing various Wikipedia-oriented programs for several years now, including:

  • Wikiwix, a semantic web search engine that only returns results from the databases of the Wikimedia Foundation projects;
  • Okawix, an offline Wikipedia browser, free of copyright restrictions and free of charge, that lets you read the articles of the various Wikimedia Foundation projects offline;
  • a program that archives the external web pages cited in Wikipedia articles (that is, web pages outside Wikipedia but linked from a Wikipedia article), so that their content remains available and those external links don’t break; this program is used, in particular, for all external links on all French-speaking Wikimedia projects, as well as on the Romanian- and Hungarian-speaking Wikipedias;
  • Allingroups Facebook auto-poster, a service dedicated to automatically sending messages to your Facebook groups.

WMF Audit Committee update – Call for Volunteers

23:07, Friday, 05 2015 June UTC

The Wikimedia Foundation has an Audit Committee that represents its Board in overseeing financial and accounting matters.  This includes reviewing the foundation’s financials, its annual tax return, and an independent audit by KPMG. For details, and the current committee members, see the WMF’s Audit Committee page and the Audit Committee charter.

I currently serve as the Audit Committee chair.  We are forming the committee for 2015-16, and are looking for volunteers from the community.

Members serve on the Committee for one year, from July through July.  The Foundation files its annual tax return in the U.S. in April, and publishes its annual plan in June.  Committee members include trustees from the Foundation’s board and contributors from across the Wikimedia movement.

Time commitment for the committee is modest: reviews are carried out via three or four conference calls over the course of the year.  The primary requirement is financial literacy: some experience with finance, accounting or auditing.

If you are interested in joining the Committee for the coming year, please email me at sj at with your CV, and your thoughts on how you could contribute. Thank you!

Allingroups Release

11:19, Wednesday, 13 2015 May UTC

Today I’m glad to present Allingroups, a new service from Linterweb.

For quite a few years, here at Linterweb, we’ve been working mainly on Wikimedia-oriented services, like Wikiwix, a Wikipedia-oriented search engine; Okawix, an offline Wikipedia browser; and Wikiwix Archives, a service used for instance on the French, Romanian and Hungarian Wikipedias, which keeps a copy of all Internet sources cited in Wikipedia articles so that they are not permanently lost, even if the original source has been moved or removed.

We’re now working on something quite different, aimed more at the Facebook community: Allingroups, a Facebook auto-poster.


Facebook members can post messages on their fan pages or in their groups (groups of friends, relatives, fans, business groups… of which they are a member). Posting one message in one group or on one page is convenient. The problem begins when you want to publish the same message in various groups or pages. To post the same message in ten groups or pages, you have to repeat the process ten times: go into the group or onto the page, write your message, post it, go to the second group or page, do the same, etc. With ten groups or pages, that is boring but still possible; with several hundred, you don’t want to spend all day doing this, as it would take up all your time!

That’s what Allingroups is all about: saving you time!

Allingroups saves you time by automatically posting messages to some or all of your groups or pages, while taking care to keep you from being blocked or banned under Facebook’s quite restrictive publishing rules (which are being tightened these days).


What’s more, unlike most other Facebook auto-posters, Allingroups doesn’t need to be installed on your computer: the program runs on our servers. As a consequence, your computer doesn’t need to be switched on and connected to Facebook while your messages are being published. So not only will you save a lot of time, you’ll also save a huge amount of money and electricity: a single one of our servers dedicated to publishing messages on Facebook uses much less energy than the ten thousand computers of our ten thousand current Allingroups users. At a time when climate change is becoming an increasingly important concern, this is worth noting: many small energy gains of this kind can, all together, add up to significant energy savings for the planet.

How does it work?

Our Facebook auto-poster is easy to use:

  1. you sign up to Allingroups with your Facebook e-mail address;
  2. you go to the page Create a campaign, where you may write your message, choose a Facebook or any other web page to add to your message;
  3. you may select the option “Add the affiliation link to your message”, which will add to your message a link to Allingroups; doing so, you get free Allingroups credits;
  4. you select the groups and pages to which you want to post the message in the list of your Facebook groups and pages;
  5. you click on the Save campaign button. And that’s all!


How expensive is it?

Not very expensive, in my opinion: basically, one Allingroups credit allows you to post one message to one group or page. So 100 Allingroups credits allow you to post one message to one hundred groups or pages, or one hundred messages to one group or page, or two messages to fifty groups or pages, or five messages to twenty groups or pages: share your credits as you like among your groups and pages.

In addition, you currently get 500 free Allingroups credits when you sign up, plus 200 more credits with the promo code MATT LINTERWEB, to be entered while signing up; plus other free credits through the affiliation link: once a person joins Allingroups through this link, your account will be credited with 200 credits and you will become their sponsor.

When anyone you sponsor signs up to a plan, you will benefit from 10% of the credits that they receive. When somebody you sponsor sponsors somebody else, you will benefit from 8% of the credits that they acquire. At the third level, you will benefit from 6%, 4% at the 4th, and 2% at the 5th level.

And you can also buy Allingroups credits on the My Credits tab. Our prices may vary in the future, but currently an Allingroups credit costs €0.01. That is not even the cost of the electricity if you did the publishing yourself. It means you can send 100 messages for €1. Considering the time and the electricity saved, I think it’s worth it!


If you use Facebook a lot, you should definitely give it a try! You’ll save time for yourself, and electricity for the sake of the planet. With the United Nations Climate Change Conference soon to be held in Paris, in December 2015, this issue seems more relevant than ever.


We, the people of Linterweb, have always tried to keep the Wikipedia community informed of our ideas and our work concerning the Wikimedia projects, especially through our blog. We’ll keep on writing articles about Wikimedia on this blog on a regular basis. In addition, from now on we’ll publish all articles concerning Allingroups and the Facebook community on a new blog dedicated to Allingroups:


Hope to see you soon on Allingroups, and on our new blog! Matthews, Allingroups team member 🙂

Wikidata and identifiers – part 2, the matching process

19:39, Thursday, 27 2014 November UTC

Yesterday, I wrote about the work we’re doing matching identifiers into Wikidata. Today, the tools we use for it!


The main tool we’re using is a beautiful thing Magnus developed called mix-and-match. It imports all the identifiers with some core metadata – for the ODNB, for example, this was names and dates and the brief descriptive text – and sorts them into five groups:

  • Manually matched – these matches have been confirmed by a person (or imported from data already in Wikidata);
  • Automatic – the system has guessed these are probably the same people but wants human confirmation;
  • Unmatched – we have no idea who these identifiers match to;
  • No Wikidata – we know there is currently no Wikidata match;
  • N/A – this identifier shouldn’t match to a Wikidata entity (for example, it’s a placeholder, a subject Wikidata will never cover, or a cross-reference with its own entry).

The goal is to work through everything and move as much as possible to “manually matched”. Anything in this group can then be migrated over to Wikidata with a couple of clicks. Here’s the ODNB as it stands today:

(Want to see what’s happening with the data? The recent changes link will show you the last fifty edits to all the lists.)

So, how do we do this? Firstly, you’ll need a Wikipedia account, and to log in to our “WiDaR” authentication tool. Follow the link on the top of the mix-and-match page (or, indeed, this one), sign in with your Wikipedia account if requested, and you’ll be authorised.

On to the matching itself. There are two methods: manual matching, or a semi-automated “game mode”.

How to match – manually

The first approach works line-by-line. Clicking on one of the entries (here, unmatched ODNB) brings up the first fifty entries in that set. Each one has options on the left-hand side to search Wikidata or English Wikipedia, using either the internal search or Google. On the right-hand side, there are three options: “set Q”, to provide it with a Wikidata ID (these are all of the form “Q” followed by a number, and so we often call them “Q numbers”); “No WD”, to list it as not on Wikidata; and “N/A”, to record that it’s not appropriate for Wikidata matching.

If you’ve found a match on Wikidata, the ID number should be clearly displayed at the top of that page. Click “set Q” and paste it in. If you’ve found a match via Wikipedia, you can click the “Wikidata” link in the left-hand sidebar to take you to the corresponding Wikidata page, and get the ID from there.
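Since a Q-number is just "Q" followed by digits, it is easy to sanity-check what gets pasted into "set Q". The sketch below is a hypothetical helper, not part of mix-and-match: it accepts a bare Q-number or a Wikidata page URL of the form `https://www.wikidata.org/wiki/Q…` and returns the normalised ID. Resolving a Wikipedia URL would need a live lookup, so that case is deliberately left out.

```python
import re

# Hypothetical validator: a bare Q-number (any case) or a wikidata.org/wiki/ URL.
_QID = re.compile(r"^(?:https?://(?:www\.)?wikidata\.org/wiki/)?([Qq][1-9]\d*)/?$")


def normalise_qid(text: str) -> str:
    """Return e.g. 'Q209715', or raise ValueError for anything else."""
    match = _QID.match(text.strip())
    if match is None:
        raise ValueError(f"not a Q-number or Wikidata URL: {text!r}")
    return match.group(1).upper()
```

Rejecting malformed input up front is a cheap complement to the précis check the tool displays afterwards.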

After a moment, it’ll display a very rough-and-ready précis of what’s on Wikidata next to that line, which makes it easy to spot if you’ve accidentally pasted in the wrong code! Here, we’ve identified one person (with rather limited information currently in Wikidata, just gender and date of death) and marked another as definitely not found.

If you’re using the automatically matched list, you’ll see something like this:

– it’s already got the data from the possible matches but wants you to confirm. Clicking on the Q-number will take you to the provisional Wikidata match, and from there you can get to relevant Wikipedia articles if you need further confirmation.

How to match – game mode

We’ve also set up a “game mode”. This is suitable when we expect a high number of the unmatched entries to be connectable to Wikipedia articles; it gives you a random entry from the unmatched list, along with a handful of possible results from a Wikipedia search, and asks you to choose the correct one if it’s there. You can start it by clicking [G] next to the unmatched entries.

Here’s an example, using the OpenPlaques database.

In this one, it was pretty clear that their Roy Castle is the same as the first person listed here (remember him?), so we click the blue Q-number; it’s marked as matched, and the game generates a new entry. Alternatively, we could look him up elsewhere and paste the Q-number or Wikipedia URL in, then click the “set Q” button. If our subject’s not here – click “skip” and move on to the next one.
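The game loop described above (random unmatched entry, a few candidates, pick one or skip) can be sketched as two small functions. This is an assumption-laden illustration rather than the tool's implementation: the candidate source is a caller-supplied function standing in for the Wikipedia search, and the `plaque:…` identifiers are hypothetical.

```python
import random


def next_round(unmatched, candidates_for, rng=random):
    """Pick a random unmatched identifier plus its candidate Q-numbers, or None."""
    if not unmatched:
        return None
    ident = rng.choice(sorted(unmatched))  # sorted so a seeded rng is reproducible
    return ident, candidates_for(ident)


def answer(unmatched, matched, ident, choice):
    """Record the player's pick; choice=None means 'skip' and leaves it unmatched."""
    if choice is not None:
        unmatched.discard(ident)
        matched[ident] = choice
```

Skipping changes nothing, so an uncertain player can move on without leaving a wrong match behind.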

Finishing up

When you’ve finished matching, go back to the main screen and click the [Y] at the end of the list. This allows you to synchronise the work you’ve done with Wikidata – it will make the edits to Wikidata under your account. (There is also an option to import existing matches from Wikidata, but at the moment the mix-and-match database is a bit out of synch and this is best avoided…) There’s no need to do this if you’re feeling overly cautious, though – we’ll synchronise them soon enough. The same page will also report any cases where two distinct Wikidata entries have been matched to the same identifier, which (usually) shouldn’t happen.
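The conflict report mentioned above amounts to grouping matches by identifier and flagging any identifier attached to more than one distinct Q-number. A minimal sketch, assuming the matches are available as (Q-number, identifier) pairs; it is not the tool's own code.

```python
from collections import defaultdict


def conflicting_matches(pairs):
    """Return {identifier: [qids]} for identifiers matched to 2+ distinct items.

    `pairs` is an iterable of (qid, identifier) tuples; exact repeats are harmless.
    """
    by_ident = defaultdict(set)
    for qid, ident in pairs:
        by_ident[ident].add(qid)
    return {ident: sorted(qids) for ident, qids in by_ident.items() if len(qids) > 1}
```

Such a conflict usually means either a duplicate Wikidata item that should be merged or a mistaken match.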

If you want a simple export of the matched data, you can click the [D] link for a TSV file (Q-number, identifier, identifier URL & name if relevant), and some stats on how many matches to individual wikis are available with [S].
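A reader for that TSV export might look like the following. It assumes, going only by the description above, that there is no header row and that the columns are Q-number, identifier, identifier URL and an optional name; check a real download before relying on it.

```python
import csv
import io
from typing import NamedTuple, Optional


class Match(NamedTuple):
    qid: str
    identifier: str
    url: str
    name: Optional[str]


def read_export(text: str) -> list[Match]:
    """Parse the (assumed) headerless, tab-separated export into Match tuples."""
    # QUOTE_NONE: treat quote characters in names as plain text.
    rows = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    matches = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        qid, identifier, url, *rest = row
        name = rest[0] if rest and rest[0] else None
        matches.append(Match(qid, identifier, url, name))
    return matches
```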

Brute force

Finally, if you have a lot of matched data, and you are confident it’s accurate without needing human confirmation, then you can adopt the brute-force method – QuickStatements. This is the tool used for pushing data from mix-and-match to Wikidata, and can be used for any data import. Instructions are on that page – but if you’re going to use it, test it with a few individual items first to make sure it’s doing what you think, and please don’t be shy about asking for help…
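As a rough illustration, QuickStatements accepts tab-separated lines of item, property and value, with string values in double quotes. The sketch below turns matched pairs into such lines. The property ID is deliberately a parameter: `P9999` in the test is a placeholder, so look up the real identifier property on Wikidata, and, as the post advises, try a handful of lines before a bulk run.

```python
def quickstatements(matches, prop):
    """Build QuickStatements lines adding `prop` = identifier to each item.

    `matches` is an iterable of (qid, identifier) pairs; `prop` is a property ID
    such as 'P1234' (supply the real one; nothing here checks it).
    """
    lines = []
    for qid, identifier in matches:
        # Tab-separated: item, property, value; string values go in double quotes.
        lines.append(f'{qid}\t{prop}\t"{identifier}"')
    return "\n".join(lines)
```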

So, we’ve covered a) what we’re doing; and b) how we get the information into Wikidata. Next instalment, how to actually use these identifiers for your own purposes…

Wikidata identifiers and the ODNB – where next?

21:59, Wednesday, 26 November 2014 UTC

Wikidata, for those of you unfamiliar with it, is the backend we are developing for Wikipedia. At its simplest, it’s a spine linking together the same concept in different languages – so we can tell that a coronation in English matches Tacqoyma in Azeri or Коронація in Ukrainian, or any of thirty-five other languages in between. This all gets bundled up into a single data entry – the enigmatically named Q209715 – which then gets other properties attached. In this case, a coronation is a kind of (or subclass of, for you semanticians) “ceremony” (Q2627975), and is linked to a few external thesauruses. The system is fully multilingual, so we can express “coronation – subclass of – ceremony” in English as easily as “kroning – undergruppe af – ceremoni” in Danish.
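The reason one statement can be read in any language is that it is stored as three language-neutral IDs, with labels attached separately. The toy model below uses only the labels and IDs quoted above plus P279, Wikidata's "subclass of" property; it is a hand-written sketch, not the real data model, which carries far more per item.

```python
# Toy subset of two items and one property; real items hold many more fields.
ITEMS = {
    "Q209715": {"labels": {"en": "coronation", "uk": "Коронація",
                           "az": "Tacqoyma", "da": "kroning"}},
    "Q2627975": {"labels": {"en": "ceremony", "da": "ceremoni"}},
}
PROPERTIES = {"P279": {"en": "subclass of", "da": "undergruppe af"}}

# The statement itself is stored only as IDs: (subject, property, value).
STATEMENTS = [("Q209715", "P279", "Q2627975")]


def render(statement, lang):
    """Spell out a statement using the labels for one language."""
    subject, prop, value = statement
    return " – ".join([
        ITEMS[subject]["labels"][lang],
        PROPERTIES[prop][lang],
        ITEMS[value]["labels"][lang],
    ])
```

Adding a language then means adding labels; the statement itself never changes.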

So far, so good.

There has been a great deal of work around Wikipedia in recent years in connecting our rich-text articles to static authority control records – confirming that our George Washington is the same as the one the Library of Congress knows about. During 2012-13, these were ingested from Wikipedia into Wikidata, and as of a year ago we had identified around 420,000 Wikidata entities with authority control identifiers. Most of these had a VIAF identifier; around half had one from the German GND database, around half one from ISNI, and a little over a third an LCCN identifier. Many had all four (and more). We now support matching to a large number of library catalogue identifiers, but – speaking as a librarian – I’m aware this isn’t very exciting to anyone who doesn’t spend much of their time cataloguing…

So, the next phase was to move beyond simply “authority” identifiers and move to ones that actually provide content. The main project that I’ve been working on (along with Charles Matthews and Magnus Manske, with the help of Jo Payne at OUP) is matching Wikidata to the Oxford Dictionary of National Biography – Wikipedia authors tend to hold the ODNB in high regard, and many of our articles already use it as a reference work. We’re currently about three-quarters of the way through, having identified around 40,000 ODNB entries that have been clearly matched to a Wikidata entity, and the rest should be finished some time in 2015. (You can see the tool here, and how to use that will be a post for another day.) After that, I’ve been working on a project to make links between Wikidata and the History of Parliament (with the assistance of Matthew Kilburn and Paul Seaward) – looking forward to being able to announce some results from this soon.

What does this mean? Well, for a first step, it means we can start making better links to a valuable resource on a more organised basis – for example, Robin Owain and I recently deployed an experimental tool on the Welsh Wikipedia that will generate ODNB links at the end of any article on a relevant subject (see, eg, Dylan Thomas). It means we can start making the Wikisource edition of the (original) Dictionary of National Biography more visible. It means we can quickly generate worklists – you want suitable articles to work on? Well, we have all these interesting and undeniably notable biographies not yet covered in English (or Welsh, or German, or…)
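A worklist of that kind is essentially a filter: items with an ODNB link but no sitelink to the target wiki. In the sketch below the item dictionaries are hypothetical and simplified. The sitelink keys follow Wikidata's usual naming (`enwiki`, `cywiki` for Welsh); the `odnb` field is an invented stand-in for the real identifier statement.

```python
def worklist(items, wiki):
    """Q-numbers of items with an ODNB ID but no article on `wiki`, sorted.

    `items` maps qid -> {"odnb": id or None, "sitelinks": {wiki_key: title}}
    (a hypothetical, simplified shape, not the Wikidata JSON format).
    """
    return sorted(qid for qid, item in items.items()
                  if item.get("odnb") and wiki not in item.get("sitelinks", {}))
```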

For the ODNB, it opens up the potential for linking to other interesting datasets (and that without having to pass through Wikidata – all this can be exported). At the moment, we can identify matches to twelve thousand ISNIs, twenty thousand VIAF identifiers, and – unexpectedly – a thousand entries in IMDb. (Ten of them are entries for “characters”, which opens up a marvellous conceptual can of worms, but let’s leave that aside…).

And for third parties? Well, this is where it gets interesting. If you have ODNB links in your dataset, we can generate Wikipedia entries (probably less valuable, but in oh so many languages). We can generate images for you – Wikidata knows about openly licensed portraits for 214,000 people. Or we can crosswalk to whatever other project we support – YourPaintings links, perhaps? We can match a thousand of those. It can go backwards – we can take your existing VIAF links and give you ODNB entries. (Cataloguers, take note.)
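A crosswalk like the VIAF-to-ODNB one is a join on the shared Q-number. The sketch below assumes plain dictionaries and one-to-one mappings; real data can map one identifier to several items, in which case `invert` would silently keep only one of them. All identifiers in the test are placeholders.

```python
def invert(mapping):
    """Swap keys and values; assumes the mapping is one-to-one."""
    return {value: key for key, value in mapping.items()}


def crosswalk(source_to_q, q_to_target):
    """Map source IDs to target IDs by joining on the shared Q-number."""
    return {src: q_to_target[q] for src, q in source_to_q.items() if q in q_to_target}
```

Going "backwards" is just a matter of inverting the maps before the join, as the test below does for VIAF to ODNB.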

And, best of all, we can ingest that data – and once it’s in Wikidata, the next third party to come along can make the links directly to you, and every new dataset makes the existing ones more valuable. Right now, we have a lot of authority control data, but we’re lighter on serious content links. If you have a useful online project with permanent identifiers, and you’d like to start matching those up to Wikidata, please do get in touch – this is really exciting work and we’d love to work with anyone wanting to help take it forward.

Update: Here’s part 2: on how to use the mix-and-match tool.

Successful communities have learned a few things about how to maintain healthy public spaces. We could use a handbook for community designers gathering effective practices. It is a mark of the youth of interpublic spaces that spaces such as Twitter and Instagram [not to mention niche spaces like Wikipedia, and platforms like WordPress] rarely have architects dedicated to designing and refining this aspect of their structure, toolchains, and workflows.

Some say that ‘overly’ public spaces enable widespread abuse and harassment. But the “publicness” of large digital spaces can help make them more welcoming than physical spaces, where it is harder to remove graffiti or eggs from homes or buildings, and than niche spaces, where clique formation and systemic bias can dominate. For instance, here are a few ‘soft’ (reversible, auditable, post-hoc) tools that let a mixed ecosystem review and maintain its own areas in a broad public space:

Allow participants to change the visibility of comments: let each control what they see, and promote or flag it for others.

  • Allow blacklists and whitelists, in a way that lets people block out harassers or keywords entirely if they wish. Make it easy to see what has been hidden.
  • Rating (both average and variance) and tags for abuse or controversy can allow for locally flexible display. Some simple models make this hard to game.
  • Allow things to be incrementally hidden from view. Group feedback is more useful when the result is a spectrum.
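Here is one way the three visibility ideas above could combine: per-viewer block lists, a mean rating to judge consensus, the variance to tell "unwelcome" from merely "controversial", and a graded result rather than a yes/no. Everything here is a hypothetical sketch: the rating scale (-1 to +1), the thresholds and the whitespace tokenising are illustrative choices, not taken from any real platform.

```python
from dataclasses import dataclass, field
from statistics import mean, pvariance


@dataclass
class Comment:
    author: str
    text: str
    ratings: list = field(default_factory=list)  # -1 (flag) .. +1 (promote)


@dataclass
class Viewer:
    blocked_users: set = field(default_factory=set)
    blocked_words: set = field(default_factory=set)
    hide_below: float = -0.5   # mean rating under which a comment may be hidden
    controversy: float = 0.5   # variance above which it is only collapsed


def visibility(comment: Comment, viewer: Viewer) -> str:
    """Return 'hidden', 'collapsed' or 'shown' for this viewer."""
    words = set(comment.text.lower().split())  # naive tokenising, for illustration
    if comment.author in viewer.blocked_users or words & viewer.blocked_words:
        return "hidden"
    if not comment.ratings:
        return "shown"
    avg, var = mean(comment.ratings), pvariance(comment.ratings)
    if avg < viewer.hide_below:
        # Agreement that it is unwelcome hides it; strong disagreement only collapses it.
        return "collapsed" if var > viewer.controversy else "hidden"
    return "collapsed" if avg < 0 else "shown"


def hidden_for(comments, viewer):
    """Everything not fully shown, so the viewer can easily see what was hidden."""
    return [c for c in comments if visibility(c, viewer) != "shown"]
```

Because the decision is made per viewer and nothing is deleted, every step stays reversible and auditable, in line with the ‘soft’ tools described above.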

Increase the efficiency ratio of moderation and distribute it: automate review, filter and slow down abuse.

  • Tag contributors by their level of community investment. Many who spam or harass try to cloak themselves in new or fake identities.
  • Maintain automated tools to catch and limit abusive input. There’s a spectrum of response: from letting only the poster and moderators see the input (cocooning), to tagging and not showing by default (thresholding), to simply tagging as suspect (flagging).
  • Make these and other tags available to the community to use in their own preferences and review tools.
  • For dedicated abuse: hook into penalties that make it more costly for those committed to spoofing the system.
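The spectrum of responses above (flagging, thresholding, cocooning), adjusted by community investment, can be sketched as a small decision function. The abuse score is assumed to come from some automated classifier on a 0 to 1 scale, and the "established" test and every threshold are hypothetical values chosen for illustration.

```python
from enum import Enum


class Response(Enum):
    ALLOW = "allow"
    FLAG = "flag"            # shown, but tagged as suspect
    THRESHOLD = "threshold"  # tagged and not shown by default
    COCOON = "cocoon"        # visible only to the poster and moderators


def respond(abuse_score: float, account_age_days: int, edit_count: int) -> Response:
    """Pick a response from a 0..1 abuse score (all cut-offs are illustrative)."""
    # Hypothetical investment signal: established accounts get some benefit of the doubt.
    established = account_age_days >= 30 and edit_count >= 100
    score = abuse_score - 0.2 if established else abuse_score
    if score >= 0.9:
        return Response.COCOON
    if score >= 0.7:
        return Response.THRESHOLD
    if score >= 0.4:
        return Response.FLAG
    return Response.ALLOW
```

The graded output is what lets moderation scale: most borderline input is merely flagged, and only the worst is cocooned for human review.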

You can’t make everyone safe all of the time, but you can dial down behavior that is socially unwelcome (to any significant subgroup) by a couple of orders of magnitude. Of course, these ideas are simple and only work so far. For instance, in a society at civil war, where each half is literally threatened by the sober political and practical discussions of the other half, public speech may simply not be safe.

[You may do UTTERLY ANYTHING with this work.]



Utter details and variants

Laws on Wikidata

19:17, Tuesday, 09 September 2014 UTC

So, I had the day off, and decided to fiddle a little with Wikidata. After some experimenting, it now knows about:

  • 1516 Acts of the Parliament of the United Kingdom (1801-present)
  • 194 Acts of the Parliament of Great Britain (1707-1800)
  • 329 Acts of the Parliament of England (to 1707)
  • 20 Acts of the Parliament of Scotland (to 1707)
  • 19 Acts of the Parliament of Ireland (to 1800)

(Acts of the modern devolved parliaments for NI, Scotland, and Wales will follow.)

Each has a specific “instance of” property – Q18009569, for example, is “act of the Parliament of Scotland” – and each of these classes is set up as a subclass of the general “act of parliament”. At the moment, there are detailed subclasses for the UK and Canada (which has a separate class for each province’s legislation) but nowhere else. Yet…
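With this setup, deciding whether an item is an act of parliament means following its "instance of" (P31) class up through "subclass of" (P279) links. The breadth-first sketch below shows the idea on a tiny hand-made hierarchy: only Q18009569 comes from the post, and the other keys are hypothetical labels standing in for Q-numbers, not real Wikidata classes.

```python
from collections import deque

# Illustrative fragment: class -> list of parent classes ("subclass of").
SUBCLASS_OF = {
    "Q18009569": ["act of parliament"],  # act of the Parliament of Scotland
    "provincial act (hypothetical)": ["Canadian act (hypothetical)"],
    "Canadian act (hypothetical)": ["act of parliament"],
}


def is_subclass(cls, ancestor, subclass_of=SUBCLASS_OF):
    """Walk 'subclass of' links breadth-first; True if `ancestor` is reachable."""
    queue, seen = deque([cls]), {cls}
    while queue:
        current = queue.popleft()
        if current == ancestor:
            return True
        for parent in subclass_of.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def is_act_of_parliament(instance_of, subclass_of=SUBCLASS_OF):
    """True if any of an item's 'instance of' classes leads to 'act of parliament'."""
    return any(is_subclass(c, "act of parliament", subclass_of) for c in instance_of)
```

Tracking visited classes keeps the walk safe even if the class graph ever contains a cycle.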

These numbers are slightly fuzzy – it’s mainly based on Wikipedia articles and so there are a small handful of cases where the entry represents a particular clause (eg Q7444697, s.4 and s.10 of the Human Rights Act 1998), or cases where multiple statutes are treated in the same article (eg Q1133144, the Corn Laws), but these are relatively rare and, mostly, it’s a good direct correspondence. (I’ve been fairly careful to keep out oddities, but of course, some will creep in…)

So where next? At the moment, these almost all reflect Wikipedia articles. Only 34 have a link to (English) Wikisource, though I’d guess there are about 200-250 statutes currently on there. Matching those up will definitely be valuable; for legislation currently in force and on the Statute Law Database, it would be good to be able to crosslink to that as well.

Good morning, Frankfurt

08:33, Sunday, 25 May 2014 UTC


Dreikönigskirche (Epiphany Church), along the Main River, on my morning jog in Frankfurt during a visit for some Wikipedia community activities.

Lila Tretikov named as Wikimedia’s upcoming ED

21:49, Thursday, 01 May 2014 UTC

And there was much rejoicing. Welcome, Lila!

The Wikimedia Foundation has an Audit Committee which represents the Board in oversight of financial and accounting issues, including planning, reporting, audits, and internal controls (see the Audit Committee page on the Foundation wiki for details). The Committee serves for one year, from July through the late spring when the Foundation files its annual tax return in the U.S. This past year the committee included representatives both from the Foundation’s board and from across the Wikimedia movement. I currently serve as the Committee Chair.

We’re now forming the 2014-2015 Audit Committee and would like to call for volunteers from the community. The time commitment is modest: review the Foundation’s financial practices and financial statements/filings, and then join three or four conference calls during the year with the staff and our independent auditors at KPMG (see the Audit Committee charter for full duties). The primary requirement is “financial literacy”: some experience with finance, accounting or audit. As it is a governance and oversight role, Committee members cannot serve under a pseudonym.

If you are interested in serving on the Committee, please email me at stu at with your resume/CV and your thoughts on how you could contribute. Thank you.