If you asked people to name an American women’s suffragist, a few names would come up over and over: Alice Paul, Susan B. Anthony, or Elizabeth Cady Stanton. Some might be able to name a woman of color, though perhaps only Ida B. Wells. The stories of black suffragists are often excluded from the narrative of how women won the vote. Through our courses run in collaboration with the National Archives, however, our Wiki Scholars have made significant strides in telling a more inclusive—and more accurate—story of how women campaigned for suffrage. Here’s how they improved Wikipedia this summer:

  • One African-American suffragist no longer overlooked on Wikipedia is Helen Appo Cook. Since the article was created in mid-July, over 2,000 readers have had the chance to learn more about Cook, a leader of the black women’s community in Washington, DC.
  • Another biography added was that of Maud E. Craig Sampson Williams, a civil rights leader and community organizer from El Paso, Texas, who helped create the El Paso Equal Franchise League.
  • Myra Virginia Simmons was also added to Wikipedia. A leader of the Colored American Equal Suffrage League, Simmons helped educate women on their right to vote after California passed women’s suffrage in 1911.
Ora Brown Stokes Perry, who was the president of the Virginia Negro Women’s League of Voters. (Public domain)

Our Wiki Scholars also improved existing biographies of black suffragists.

  • In Tennessee, there was Juno Frankie Pierce. Along with her work for suffrage, Pierce helped organize successful community demonstrations for public restrooms for colored women, as Nashville lacked any.
  • Mattie E. Coleman, also from Tennessee, was one of the first black women to become a physician in the state. She contributed to alliance building between black and white suffragists in Nashville.
  • Belle Squire co-founded the Alpha Suffrage Club with Ida B. Wells. Squire refused to pay property taxes until women had the right to vote.

In the new article Women’s suffrage in Virginia, one of our trained Wiki Scholars fully confronts the discriminatory views of white suffragists, who viewed excluding black women “not [as] a matter of principle but of expediency”, since supporting suffrage for black women would have cost them support in the South. A woman’s suffrage organization in the state even distributed pamphlets “arguing that ‘the enfranchisement of Virginia women would increase white supremacy’”. Because black women were unable to join suffrage organizations in the state, they formed the Virginia Negro Women’s League of Voters. Black women, consistently denied a seat at the table by their white peers who also advocated for suffrage, formed many such organizations, including the Colored Women’s Progressive Franchise, another article created by one of our Wiki Scholars. There, readers can learn about an organization that was ahead of its time, which ultimately led to its disintegration.

Interested in learning more about our course offerings? Visit learn.wikiedu.org.

If you’re interested in buying out a customized professional development course for faculty at your institution, contact Director of Partnerships Jami Mathewson at jami@wikiedu.org.

Header images via Wikimedia Commons.

Over 180 photographs documenting Armenian history are now available to everyone in the world, free of charge, thanks to a collaborative project between Wikimedia Norge (Norway), Wikimedia Armenia—both independent Wikimedia affiliate organizations—and the National Archives of Norway.

All of the photographs were taken by Norwegian missionary and nurse Bodil Biørn (1871–1960), who proselytized within modern-day Armenia from 1907 until 1935. She witnessed the horror and carnage of the Armenian Genocide, carried out by the Ottoman Empire during the First World War, and captured photographs to document Armenia and Armenian history.

Having recognized their importance, Wikimedia Norge and Wikimedia Armenia initiated the Bodil Biørn project in January 2018. It aimed to digitize the photos and transcribe/translate descriptions on Wikimedia Commons, the free media repository that holds most of the photographs used on Wikipedia and Wikimedia’s other free knowledge projects.

Collaborating with another Wikimedia affiliate on a GLAM—galleries, libraries, archives, and museums—project significantly boosted the project’s productivity and success. Wikimedia affiliates, as in the case of Wikimedia Norge and Wikimedia Armenia, can serve as a bridge between GLAM organizations separated by geography. For example, the Bodil Biørn project helped connect the National Archives in Oslo to the Armenian Genocide Institute-Museum in Yerevan. GLAM organizations are also more willing to participate when partners from other countries are involved: international involvement eases the process of convincing GLAM organizations to collaborate and helps the project reach more communities.

The success of this collaborative project has allowed Wikimedia Norge and Wikimedia Armenia to expand the project’s scope to include Biørn’s letters written during this period. From a historical perspective, opening her letters to interested readers and researchers is extremely valuable as a contrast to the male-dominated primary and secondary sources available from and about this region and time period.

The project has also inspired Wikimedia Armenia to launch another collaborative GLAM project—this time with Wikimedia Austria and with the aim of digitizing the best of the library and museum of Vienna’s Mekhitarist congregation.

David Saroyan, Wikimedia Armenia
Aleksander Grønnestad, Student Intern (May 2019), Wikimedia Norway

You can see Biørn’s photographs for yourself on Wikimedia Commons.

Tech News issue #38, 2019 (September 16, 2019)

00:00, Monday, 16 September 2019 (UTC)

Boats moored in Zoebigker Harbor on Lake Cospuden, located near Markkleeberg, Saxony, Germany.

This photo comes to us from Wikimedia Commons, the freely licensed media repository whose holdings are extensively used on Wikimedia’s many projects, including Wikipedia. You can use the photo for just about any purpose as long as you credit the author (Ansgar Koreng), copyright license (CC BY-SA 4.0), and link to the original URL.

Ed Erhart, Senior Editorial Associate, Communications
Wikimedia Foundation

This post is the tenth installment of a weekly series, and you can sign up for our MailChimp mailing list to be notified when the next edition is published.

Public archaeology at its most effective

17:40, Friday, 13 September 2019 (UTC)

“Wikipedia’s popularity and reach mean that archaeologists should actively engage with the website by adding and improving archaeological content.”


Academia is changing its mind about Wikipedia. Peer-reviewed research studies published in the last few years have found value in teaching students how to evaluate the site, rather than turning them away from using it altogether. One such study from 2019, conducted by Katherine Grillo and Daniel Contreras and published by Cambridge University Press, focuses on teaching with Wikipedia in archaeology courses. The paper concludes that having students write Wikipedia articles related to archaeological course topics:

  • brings academic information out of the silo and into the hands of the public,
  • helps correct misinformation prevalent in mainstream narratives of history, and
  • gives students skills to communicate what they’re learning to non-specialist audiences.


“We conclude that Wikipedia’s utopian mission aligns with many of the goals of public archaeology and argue that archaeology has much to gain by engaging with—rather than ignoring or even shunning—Wikipedia.”


The importance of public engagement

As in many academic disciplines, archaeologists are seeking new ways to engage the public in their field. “Without public engagement, awareness and funding of archaeological research and conservation of archaeological sites are unlikely to find continued popular support,” the paper states.

Why is Wikipedia the answer?

With 500 million readers each month, Wikipedia offers a massive and effective public educational tool. “The publics whom archaeologists seek to reach now overwhelmingly rely on Wikipedia as a source of basic information,” Dr. Grillo and Dr. Contreras assert, citing Schroeder 2018:122–124.

Wikipedia presents opportunities not only to improve coverage of archaeology in a publicly accessible place, but also to correct misinformation by pointing people to academic knowledge when they seek information about popular culture. Students of archaeology can improve a Wikipedia article about Indiana Jones, for example, by linking to articles about the real archaeological sites.

“The need for accurate and widely available information about archaeology is acute, especially considering our discipline’s notable problems with the proliferation of pseudoscientific theories in the public realm (Ancient Aliens, Legends of the Lost, and similar television programs are obvious and egregious offenders),” write Dr. Grillo and Dr. Contreras.

How do we bring more archaeological information to Wikipedia?

Students are well-positioned to do this public outreach work on behalf of the archaeological field. They’re already consulting academic sources to write research papers. A Wikipedia writing assignment can channel that work into a public place. Students also remember what it’s like learning about these topics for the first time and often have a good perspective on how to speak about them to a non-specialist audience. The process of synthesizing course topics also “invest[s] and immerse[s] students in the production of archaeological content.”

Student learning outcomes

“By actively editing Wikipedia pages themselves, students improve their abilities to critically evaluate problematic Wikipedia pages, and they gain experience in reviewing academic literature,” the paper concludes. “Student motivation and dedication to producing quality writing is overall high, given the understanding that thousands of people will be reading their work.”

“Given the growing emphasis on job training in higher education, activities such as participation in the Wiki Education program that both provide marketable skills and promote media literacy, may be particularly valuable.”

It’s now Dr. Grillo’s fourth term teaching a Wikipedia writing assignment. Since she began in 2016, the total impact her students have made on Wikipedia is this: They’ve added 69,000 words and 900 references to 70 articles. They’ve created 18 new articles. And all of that work has been viewed by Wikipedia’s readers more than 3 million times.


“In practice, taking advantage of Wiki Education has proven effective both as pedagogy and as public outreach.”


Want to get involved? Wiki Education can help.

With Wiki Education’s free assignment management tools and student trainings, instructors new to Wikipedia can feel well-equipped to guide their students in editing assignments. Visit teach.wikiedu.org to access those resources.

Read more perspectives from instructors who have done this assignment here. Explore other research studies here.

Header image by Maclemo, CC BY-SA 3.0 via Wikimedia Commons. Thumbnail image by Florenceguillot, CC BY-SA 4.0 via Wikimedia Commons.

ALTC Personal Highlights

15:05, Friday, 13 September 2019 (UTC)

I’ve already written an overview and some thoughts on the ALTC keynotes; this post is an additional reflection on some of my personal highlights of the conference.

I was involved in three sessions this year: Wikipedia belongs in education with Wikimedia UK CEO Lucy Crompton-Reid and UoE Wikimedian in Residence Ewan McAndrew; Influential voices – developing a blogging service based on trust and openness with DLAM’s Karen Howie; and Supporting Creative Engagement and Open Education at the University of Edinburgh with LTW colleagues Charlie Farley and Stewart Cromar. All three sessions went really well, with lots of questions and engagement from the audience.

It’s always great to see that lightbulb moment when people start to understand the potential of using Wikipedia in the classroom to develop critical digital and information literacy skills.    There was a lot of interest in (and a little envy of) UoE’s Academic Blogging Service and centrally supported WordPress platform, blogs.ed.ac.uk, so it was great to be able to share some of the open resources we’ve created along the way including policies, digital skills resources, podcasts, blog posts, open source code and the blogs themselves.  And of course there was a lot of love for our creative engagement approaches and open resources including Board Game Jam and the lovely We have great stuff colouring book.  

Stewart Cromar also did a gasta talk and poster on the colouring book, and at one point I passed a delegate standing alone in the hallway quietly colouring in the poster. As I passed, I mentioned that she could take one of the colouring books home with her. She nodded and smiled and carried on colouring. A lovely quiet moment in a busy conference.

It was great to hear Charlie talking about the enduringly popular and infinitely adaptable 23 Things course, and what made it doubly special was that she was co-presenting with my old Cetis colleague R. John Robertson, who is now using the course with his students at Seattle Pacific University.   I’ve been very lucky to work with both Charlie and John, and it’s lovely to see them collaborating like this.

Our Witchfinder General intern Emma Carroll presented a brilliant gasta talk on using Wikidata to geographically locate and visualise the different locations recorded within the Survey of Scottish Witchcraft Database. It’s an incredible piece of work, and several delegates commented on how confidently Emma presented her project. You can see the outputs of Emma’s internship here: https://witches.is.ed.ac.uk/about

Emma Carroll, CC BY NC 2.0, Chris Bull for Association for Learning Technology

I really loved Kate Lindsay’s thoughtful presentation on KARE, a kind, accessible, respectful, ethical scaffolding system to support online education at University College of Estate Management.  And I loved her Rosa Parks shirt. 

Kate Lindsay, CC BY NC, Chris Bull for Association for Learning Technology

I also really enjoyed Claudia Cox’s engaging and entertaining talk Here be Dragons: Dispelling Myths around BYOD Digital Examinations.  Claudia surely wins the prize for best closing comment…

Sheila MacNeill and Keith Smyth gave a great talk on their conceptual framework for reimagining the digital university which aims to challenge neoliberalism through discursive, reflective digital pedagogy.  We need this now more than ever.

Keith Smyth, CC BY, Lorna M. Campbell

Sadly I missed Helen Beetham’s session Learning technology: a feminist space? but I heard it was really inspiring. I think I can count on one hand the number of times I’ve been able to hear Helen talk; we always seem to be programmed in the same slot! I also had to miss Laura Czerniewicz’s Online learning during university shut downs, so I’m very glad it was recorded. I’m looking forward to catching up with it as soon as I can.

The Learning Technologist of the Year Awards were truly inspiring as always. Lizzie Seymour, Learning Technology Officer at the Royal Zoological Society of Scotland’s Edinburgh Zoo, was a very well-deserved winner of the individual award, and I was really proud to see the University of Edinburgh’s Lecture Recording Team win the team award. So many people across the University were involved in this project, so it was great to see their hard work recognised.

UoE Lecture Recording Team, CC BY NC, Chris Bull for Association for Learning Technology

Without doubt though the highlight of the conference for me was Frances Bell’s award of Honorary Life Membership of the Association for Learning Technology. Frances is a dear friend and an inspirational colleague who really embodies ALT’s core values of participation, openness, collaboration and independence, so it was a huge honour to be invited to present her with the award. Frances’ nomination was led by Catherine Cronin, who wasn’t able to be at the conference, so it gave me great pleasure to read out her words.

“What a joy to see Frances Bell – who exemplifies active, engaged and generous scholarship combined with an ethic of care – being recognised with this Honorary Life Membership Award by ALT.

As evidenced in her lifetime of work, Frances has combined her disciplinary expertise in Information Systems with historical and social justice perspectives to unflinchingly consider issues of equity in both higher education and wider society.

Uniquely, Frances sustains connections with people across higher education, local communities and creative networks in ways which help to bridge differences without ignoring them, and thus to enable understanding.

Within and beyond ALT, we all have much to thank her for.” 

I confess I couldn’t look at Frances while I was reading Catherine’s words as it was such an emotional moment.   I’m immensely proud of ALT for recognising Frances’ contribution to the community and for honouring her in this way.

Frances Bell, Honorary Life Member of ALT, CC BY NC, Chris Bull for Association for Learning Technology

And finally, huge thanks to Maren, Martin and the rest of the ALT team for organising another successful, warm and welcoming conference. 

Monthly Report, July 2019

21:17, Wednesday, 11 September 2019 (UTC)


  • July felt like Wikidata month at Wiki Education! This month we launched two exciting ways to learn about Wikidata: in-person workshops and online courses. We facilitated our very first Wikidata workshop, a day-long, in-person meeting hosted by METRO in New York. The 14 participants in this course added nearly 160 references to Wikidata and edited over 80 items; there were more than 300 total edits, and eight new items were created. We also debuted our virtual Wikidata courses this month! We started our beginner’s Wikidata course, Join the Open Data Movement, and an intermediate course, Elevate Your Collections. These two virtual courses are geared toward connecting library resources and librarians to Wikidata. We are eager to get to know the 22 participants and look forward to the contributions they will make to Wikidata.
  • In July we shipped major new features from two of our summer interns, along with an initial batch of improvements to the student user experience. Amit Joki completed the main feature planned for his Google Summer of Code internship project: a much more intuitive way to pick which wikis to track for programs on the global Programs & Events Dashboard. During the program setup process (or later on), program leaders can now explicitly see and change the set of wikis that will be tracked. Khyati Soneji also completed the main feature planned for her Outreachy internship project: the Dashboard now keeps track of the number of references added by each revision (limited to English Wikipedia and Wikidata so far).



Wikipedia Student Program

Status of the Wikipedia Student Program for Summer 2019 in numbers, as of July 31:

  • 42 Wiki Education courses were in progress (25, or 60%, were led by returning instructors)
  • 685 student editors were enrolled
  • 64% of students were up-to-date with their assigned training modules
  • Students edited 555 articles, created 66 new entries, and added 335,000 words and 2,980 references

Our Summer 2019 courses were in full swing in July, and while 42 courses is a far cry from our Fall and Spring loads, it’s nothing to scoff at either. Though typically shorter in duration, many of our summer courses make the Wikipedia assignment a central part of the curriculum to ensure that students get the training they need to make successful Wikipedia contributions.

Wikipedia Student Program Manager Helaine Blumenthal spent much of July recruiting for the Fall 2019 term, resulting in 100 course pages ready to go on the Dashboard. She also spent a considerable amount of time reviewing the results of the Spring 2019 instructor survey and will be either implementing or exploring new ways to improve the quality of the Student Program during the coming academic year.

Senior Wikipedia Expert Ian Ramjohn joined 1,200 plant biologists at the Botany 2019 Conference “Sky Islands and Desert Seas” in Tucson, Arizona. Ian engaged conference attendees in conversation about the role of Wiki Education’s platform and tools in science education and science communication.

Student work highlights:

Several classes chose to edit or create new articles for books, and Matthew Dischinger’s class on Atlanta in Contemporary Culture at Georgia State University was no exception. His students focused on several books, two of which were Thomas Mullen’s Darktown and James Baldwin’s The Evidence of Things Not Seen. Both books deal with the history of Atlanta: Darktown is a historical fiction novel set in Atlanta’s Darktown during the 1940s, while Baldwin’s book is non-fiction and examines the Atlanta Child Murders. Mullen’s Darktown follows two African-American police officers, both fictionalized depictions of Atlanta’s first eight African-American police officers. They are assigned to the murder of a young African-American woman, but the racism present both within the legal system and without makes finding justice for the deceased difficult. The author was inspired to write the novel after reading Where Peachtree Meets Sweet Auburn: A Saga of Race and Family and finished his first draft after Michael Brown was shot. The novel was met with critical praise upon its release and so impressed Jamie Foxx that he purchased the rights in order to turn it into a television series.

The origin of The Evidence of Things Not Seen began when Walter Lowe, the first black editor at Playboy magazine, asked James Baldwin to write a story concerning Atlanta’s missing children. From there Baldwin began researching and soon began work on the book. Within its pages Baldwin looks at not only the murders, but also race relations within the city. He found that the relationship between the African-American community and the police, particularly the African-American police officers, was a difficult one: the community felt that they couldn’t trust the police or even the African-American officers, something that Baldwin noted seemed to sting for the black officers. He wanted the book to be more than just a record of the crimes and resulting trial; he wanted to understand Atlanta and the people working to discover the killer and provide support to the parents of missing children. Since its release in 1985, the book has been met with praise and has been the focus of many journal articles.

Women have contributed much to the world of science and technology and will continue to do so, especially as many areas introduce or expand their STEM programs for young girls and women. However, even as we look to the future, it’s important to remember the past, especially as women are often overlooked or downplayed in the annals of scientific history. These thoughts were no doubt at the forefront of the minds of the students attending Alexandra Edwards’ Writing Women Back into Tech History class at the Georgia Institute of Technology. Her students created 11 new articles, two of which are on Erna Hamburger and Anne-Marie Staub. Erna Hamburger was a Swiss engineer and the first woman in the history of Switzerland to be named a professor at a STEM university, when she became professor of electrometry at the University of Lausanne. Prior to this appointment, she challenged gender roles by becoming the first female student in her engineering classes, receiving an engineering-electrician diploma and a doctorate in technical sciences from the École Polytechnique Fédérale de Lausanne (EPFL). After this she joined the Swiss army and worked as an electrical engineer at Paillard SA in Sainte-Croix, Switzerland, before becoming head of work at the electrotechnical laboratory at EPFL, among other positions. One of her major innovations was an apparatus for radio-wave reception, and her radio-wave research included topics such as a system of optical registration from tone frequencies and ultra-short waves. Hamburger was also a staunch advocate of higher education, and in 2006 the Erna Hamburger Prize was created to honor her lasting legacy and memory.

The life and career of Anne-Marie Staub is just as impressive. Staub was a French biochemist who spent most of her career at the Institut Pasteur. Her work on antihistamines, serology, Salmonella, tyvelose, and immunology earned her several awards and honors. From the start Staub proved herself extraordinary, learning to read and write from her mother and earning several degrees in general mathematics, chemistry and general physics, physiology, and biochemistry from the Sorbonne. She added to her education by attending microbiology courses at the Institut Pasteur from 1935 to 1936, before joining the institute to work on her PhD thesis. She worked alongside Daniel Bovet, and her first published works helped lead to the discovery of antihistamines. Like so many women in STEM fields, her contributions to Bovet’s Nobel Prize-winning research have been largely forgotten, despite Bovet crediting her in his 1957 Nobel lecture and the award presenter mentioning her in his speech. Not one to let anything slow her down, Staub worked on vaccines for anthrax while also teaching French, German, and first aid to soldiers engaged in World War II. She continued to work in her field, and from 1955 to 1975 took part in research on antigens of Salmonella, identifying tyvelose as a component of the O-antigen of Salmonella. It’s thanks to Alexandra’s students that Wikipedia now has an article that shows her indelible mark on history, so that current and future generations can read of her accomplishments.

When Randolph County High School was founded in Wedowee, Alabama, principal Moses R. Weston could not have predicted the events that would unfold over the course of the next 85 years. Within three months of the school opening its doors, the first school building was destroyed by a large fire, necessitating the construction of a new high school, and the students were moved to a new location in 1938. The school would later become the focus of a controversy in 1994, when then-principal Hulond Humphries threatened to cancel the prom if any students chose to attend with someone of a different race. This threat was bravely challenged by ReVonda Bowen, a young woman of mixed race who asked whom she should take to prom. Rather than admit that what he was doing was wrong, Humphries told her that her parents “made a mistake, having a mixed-race child.” Bowen filed a lawsuit against Humphries over his remarks and he was put on paid leave; however, he was reinstated after only two weeks, as neither he nor the school district admitted wrongdoing. In August 1994 the school caught fire, and media outlets such as Jet linked the fire to Humphries’ racist remarks about a year prior. As a result Humphries resigned as principal and took a position with the school district’s central office, overseeing the rebuilding of the school. The son of a local black protest leader was charged with the crime; however, as there was no evidence, the young man’s lawyer was able to get him acquitted. This school and its tumultuous history would likely not have had its own article if not for a student in Dr. Michel Aaij’s English Comp I class at Auburn University at Montgomery.

The altars at St. Peter Catholic Church during the Easter season are really beautiful. Image uploaded by a student in Dr. Michel Aaij’s English Comp I class at Auburn University at Montgomery.
Adorable goats lounge around Jackson Lake Island. Image uploaded by a student in Dr. Michel Aaij’s English Comp I class at Auburn University at Montgomery.
After filming for Big Fish was completed, the set for the Town of Spectre was abandoned and is now a tourist attraction. Image uploaded by a student in Dr. Michel Aaij’s English Comp I class at Auburn University at Montgomery.

This summer Salisbury University professor Dr. Lena Woodis wanted her students to look at Big Ideas in Chemistry. They did more than just look, as one student chose to expand the article on Andreas Libavius. Hailed as a Renaissance man, Libavius taught at the University of Jena and then became a physician at the Gymnasium in Rothenburg before founding the Gymnasium at Coburg. His interests were not limited to this, however, as Libavius also practiced alchemy, the foundations of which paved the way for modern-day chemistry. He wrote several books on the topic, one of which, Alchemia, was one of the first chemistry textbooks ever written. Like many alchemists of the time, Libavius believed in chrysopoeia (χρυσοποιία), the ability to transmute a base metal into gold. Along with alchemy texts, Libavius also wrote on the subject of medicine, publishing one of the first German medical texts as well as a four-volume collection of lectures on natural science. However, while much is known about his works, little is known about his personal life. It is known that he had four children and that he died in July 1616, but the name of his wife is uncertain.

Many people are aware of the cruel witch hunts that took place throughout the world, but some might be surprised to learn that people are still accused of witchcraft today. Possibly the most heinous accusations are those against people who are in no position to defend themselves or escape a bad situation, and children fall squarely into that category. This is likely why one student in Joanna L. Pearce’s Applied History Project at the University of Waterloo chose to edit on the topic. Accusers may use one or more rationales to back up their claims, such as the child’s parentage: if a parent was a witch, the child must be as well. In Africa some believe that AIDS-related illnesses and deaths are a result of witchcraft. If a child is accused, the family may abandon the child and the community may shun or punish them. Groups such as Safe Child Africa (formerly Stepping Stones Nigeria) aim to help these children and raise awareness, in hopes of reducing the chances of this happening in the future.

Smoked meat is a popular form of meat preservation that is practiced throughout the world. It has been used in the past to preserve food, as people lacked easy access to cold storage, and it helped prepare them for times when food would otherwise be scarce. There’s almost no end to the types of meat that can be smoked: in Africa much of the fish caught is smoked via a heat smoker, as this is the taste preferred by local consumers. In the United States many citizens are fond of bacon and American barbecue, which evolved from smoking techniques from Europe and Central Asia combined with Native American techniques. If you traveled to Scotland, or to a grocery store that has it available, you could have Finnan haddie: cold-smoked haddock, representative of a regional method of smoking with green wood and peat that had its start in the village of Findon during medieval times. With all of this variety, it’s no surprise that a student in Timothy Henningsen’s Research, Writing, and the Production of Knowledge class at the College of DuPage chose to expand this article.

An image of American bacon, added to the smoked meat article edited by Timothy Henningsen’s Research, Writing, and the Production of Knowledge class at the College of DuPage.

Scholars & Scientists Program

Louise DeKoven Bowen
Helen Appo Cook

This month we had five Scholars & Scientists courses active on Wikipedia, in addition to our new Wikidata courses. Four of the five courses started to hit their stride this month, with a lot of exciting contributions to articles. The fifth just got started, but is already identifying some crucial topics for improvement.

In the course we’re running with the Colorado Alliance of Research Libraries, participants have been hard at work improving a wide range of scholarly and historical subjects:

  • Research question and classical reception studies. Wikipedia is known to struggle with articles on basic academic terms and on academic fields themselves. We were thus excited to see two scholars competently tackle these important topics which otherwise don’t see much activity on Wikipedia.
  • Florence Knoll, an architect, interior designer, furniture designer, and entrepreneur who died earlier this year. She had a major impact on office design. Among other things, she fought against gender stereotypes in professionalizing the field of interior design.
  • Great Lakes Theater, a classic theater company in Ohio founded in 1962.
  • Milicent Patrick (1915–1998), who worked as makeup artist, actor, special effects designer, and animator. She is known for creating the head costume for the titular Creature from the Black Lagoon in 1954.
  • Black Girl Magic, a movement the Huffington Post said aims to “celebrate the beauty, power, and resilience of Black women.”

In our two courses run in collaboration with the National Archives, Scholars continued to improve articles on women’s suffrage and suffragists, celebrating the centennial of the Nineteenth Amendment. We have been consistently impressed with the great impact Scholars are having on Wikipedia’s coverage of these topics at a time when we know the public is looking for more information about them:

We are also running two courses with the Society of Family Planning. Though the second course just started this month, participants in the first course are making great contributions to Wikipedia articles on abortion and contraception:

  • Tubal ligation, the surgical procedure commonly known as having one’s “tubes tied,” is one of the most popular forms of contraception. The article receives more than 550 pageviews every day. Over the past month it was significantly expanded and improved by a Wiki Scientist, who is now responsible for 89% of the article.
  • An abortion fund is a non-profit that provides assistance to low-income women who cannot afford the costs of an abortion. The article was expanded, many parts rewritten, and many sources added or replaced by a Wiki Scientist.
  • The osmotic dilators article was a short stub with some outdated citations before a Wiki Scientist updated it and more than doubled its content.
  • Other Wiki Scientists fixed errors, updated statistics, expanded sections, or improved citations in the articles on emergency contraception, dilation and evacuation, late termination of pregnancy, mifepristone, and the main abortion article.


In July, we launched two exciting ways to learn about Wikidata: in person workshops and online courses.

Wiki Education Wikidata workshop at METRO, NYC July 2019

We facilitated our very first Wikidata workshop, a day-long, in-person meeting hosted by METRO in New York. The curriculum for this workshop blended beginner and intermediate Wikidata concepts, balancing presentations on various Wikidata fundamentals with live editing sessions. Fourteen participants signed up for this workshop, all of whom contributed to Wikidata during the course of the day. President of Wikimedia-NYC Megan Wacha and Wikimedian-in-Residence at the University of Virginia’s Data Science Institute Lane Rasberry were kind enough to take the time to speak to the participants about ways to participate in the Wikidata/Wikimedia movement on a regular basis. In addition to fostering community around Wikidata with librarians from the New York region, we’re excited to further develop our in-person curricula as well as our online course offerings.

Participants in this course added nearly 160 references to Wikidata and edited over 80 items. There were more than 300 total edits and eight new items were created! We are extremely pleased with the results of the workshop and look forward to facilitating more in the future.

Speaking of online courses, we also debuted our virtual Wikidata courses this month! We started our beginner’s Wikidata course, Join the Open Data Movement, and an intermediate course, Elevate Your Collections. These two virtual courses are geared toward connecting library resources and librarians to Wikidata. They will both be six weeks long, covering Wikidata policies, showcasing tools, sharing best editing practices, and connecting participants to the Wikidata community. The participants come from libraries, museums, and beyond. We are eager to get to know the 22 participants and look forward to the contributions they will make to Wikidata.

Visiting Scholars Program

Northeastern University Visiting Scholar Rosie Stephenson-Goodknight wrote two articles on notable women writers this month. Emily Gilmore Alden (1834–1914) was an author and educator on the faculty of Monticello Seminary, serving as the school’s poet for almost half a century. Amelia Minerva Starkweather (1840–1926) was an educator and author who spent much of her life working on philanthropic and charitable enterprises. Neither of these women had articles on Wikipedia before Rosie wrote their stories.

In 1954, President Dwight Eisenhower vetoed a coin intended to commemorate the 150th anniversary of the Louisiana Purchase. The bill passed both houses of Congress, but Eisenhower worried both about falling into the habit of costly commemoratives when it wasn’t clear there was much public interest, and about the potential for counterfeiting. The article about that proposed coin, the Louisiana Purchase Sesquicentennial half dollar, was brought to Featured Article level this month by George Mason University Visiting Scholar Gary Greenbaum.

Andrew Newell, Visiting Scholar with the Deep Carbon Observatory, started an interesting new article on the deep biosphere, the part of the biosphere that begins a few meters below the surface and extends more than 5 kilometers below the continental surface and 10.5 kilometers below the sea surface.

Amelia Minerva Starkweather



In July, we were thrilled to confirm a new Wiki Scientists course with the National Science Policy Network (NSPN). NSPN is sponsoring an upcoming Wiki Scientists course as part of their 2020 Election Initiative. The initiative highlights the importance of rigorous science in policy making in the United States and offers early career scientists a way to get involved. 

The course will run from September 6–November 22, as our team of Wikipedia experts guides participants through the process of contributing high quality content to Wikipedia. While these early career scientists make critical scientific knowledge more accessible to the public, they’ll hone their science communication skills.


In July, we got a commitment for an 18-month unrestricted grant of $700K from the William and Flora Hewlett Foundation. This grant will give us the freedom and flexibility to work toward our current annual plan goals, as well as our longer-term strategic goals.

We continued our fundraising efforts by submitting letters of inquiry to the Bernard and Audre Rappoport Foundation and to the Leighty Foundation. We received a welcome response from the Leighty Foundation, with an indication that they will make a small, unrestricted donation (less than $10K) to Wiki Education.

We submitted our mid-year report to the Wikimedia Foundation and the report was approved. We expect a release of our second payment for this grant in early August.

Finally, Chief Advancement Officer TJ Bliss had conversations with several funder partners who agreed to make introductions and connections to other potential funders. He also spoke with a high-net-worth individual who is interested in work that Wiki Education might do in the future related to K–12 education.


We featured some guest blog posts on our site this month. Alliana Drury, an undergraduate student at Indiana University of Pennsylvania, urges educators in higher education to adopt Wikipedia writing assignments after completing one herself. And Dr. Jason Todd shared the story of his students at Xavier University of Louisiana, who dramatically improved the Wikipedia article about their local town, saving the area from erasure in cultural memory.

And as a conclusion to the Spring term, Wikipedia Student Program Manager Helaine Blumenthal shared some take-aways instructors had once their Wikipedia assignments ended.

NC State University College of Natural Resources published a great article on the impact of Wikipedia assignments conducted at the university. And the Deep Carbon Observatory shared the impact of their members taking our professional development courses to learn Wikipedia editing themselves.

Blog posts:

External media:


In July we shipped major new features from two of our summer interns, along with an initial batch of improvements to the student user experience.

Amit Joki completed the main feature planned for his Google Summer of Code internship project: a much more intuitive way to pick which wikis to track for programs on the global Programs & Events Dashboard. During the program setup process (or later on), program leaders can now explicitly see and change the set of wikis that will be tracked. This was previously possible only through a somewhat confusing workaround, and the change addresses the most frequently asked question from new program leaders.

Khyati Soneji also completed the main feature planned for her Outreachy internship project: the Dashboard now keeps track of the number of references added by each revision (limited to English Wikipedia and Wikidata so far). This is the first time any metrics tool we know of in the Wikimedia ecosystem has supported ‘references added’ statistics.

Both Amit and Khyati started work on stretch goals this month. Amit’s next feature will allow individual articles to be ‘untracked’ for a course or program — a feature especially useful for thematic editathons where active editors participate and may make unrelated edits during the time period being tracked. Khyati’s new feature will add the option to scope a program’s statistics based on a query from the powerful PetScan tool, which provides finer-grained control than the category- or template-based scoping we’ve had previously.

The ‘My Articles’ section of Dashboard course pages for students saw several improvements, including better handling of sandbox locations for the article(s) students are working on. In particular, when students are working in a group, they are now all pointed to the same sandbox to collaborate. This lays the groundwork for bigger changes that Software Developer Wes Reid has been designing for the Fall 2019 and Spring 2020 terms.

Finance & Administration

Overall expenses in July were $165K, ($25K) less than the budgeted plan of $190K. Programs were under by ($18K), due mainly to travel ($12K) and Communications ($2K). General and Administration was under by ($6K) due to a combination of timing items, including taxes renewed the prior month ($5K), the Cultivation Event ($4K), and Professional Services ($2K), offset by an uptick in Shared expenses (+$6K). General and Administration and Fundraising were each also under by ($1K) due to vacation use accrual. Governance was right on budget for the month of July.

Wiki Education Expenses 2019-07

Office of the ED

Current priorities:

  • Preparing for the upcoming quarterly finance and audit committee meeting
  • Continuing to experiment with different earned income models
  • Improving the coordination of work between the Advancement and the Programs department

In July, Frank went on kin care for the first half of the month, so he could take care of his wife who underwent neurosurgery. During the second half of the month, Frank conducted a postmortem for the NARA pilot, prepared his keynote for the quality track at this year’s Wikimania in Stockholm, provided feedback to the Wikimedia Foundation’s new strategy, created a draft budget for one of our new grant proposals, and performed other day-to-day duties in his double role as Executive Director and acting CFO. Also in July, Frank met twice remotely with Meaghan Duff, Senior Vice President of Partnerships and Strategy at Faculty Guild, a faculty professional development project, in order to learn from each other’s experiences in generating earned revenue.


* * *

This Month in GLAM: August 2019

07:22, Wednesday, 11 September 2019 UTC

Measuring Wikipedia page load times

00:04, Wednesday, 11 September 2019 UTC

This post shows how we measure and interpret load times on Wikipedia. It also explains what real-user metrics are, and how percentiles work.

Navigation Timing

When a browser loads a page, the page can include program code (JavaScript). This program will run inside the browser, alongside the page. This makes it possible for a page to become dynamic (more than static text and images). When you search on Wikipedia.org, the suggestions that appear are made with JavaScript.

Browsers allow JavaScript to access some internal systems. One such system is Navigation Timing, which tracks how long each step takes. For example:

  • How long to establish a connection to the server?
  • When did the response from the server start arriving?
  • When did the browser finish loading the page?

Where to measure: Real-user and synthetic

There are two ways to measure performance: Real user monitoring, and synthetic testing. Both play an important role in understanding performance, and in detecting changes.

Synthetic testing can give high confidence in change detection. To detect changes, we use an automated mechanism to continually load a page and extract a result (e.g. load time). When there is a difference between results, it likely means that our website changed. This assumes that other factors in the test environment, such as network latency, operating system, and browser version, remained constant.

This is good for understanding relative change. But synthetic testing does not measure the performance as perceived by users. For that, we need to collect measurements from the user’s browser.

Our JavaScript code reads the measurements from Navigation Timing, and sends them back to Wikipedia.org. This is real-user monitoring.

How to measure: Percentiles

Imagine 9 users each send a request: 5 users get a result in 5ms, 3 users get a result in 70ms, and one user gets a result in 560ms. The average is 88ms. But the average does not match anyone’s real experience. Let’s explore percentiles!

Diagram showing 9 labels: 5ms, 5ms, 5ms, 5ms, 5ms, 70ms, 70ms, 70ms, and 560ms.

The first value past the lower half (i.e. the middle value) is the median, or 50th percentile. Here, the median is 5ms. The first value past the lower 75% is 70ms (the 75th percentile). We can say that "for 75% of users, the service responded within 70ms". That’s more useful.
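These definitions are easy to express in code. Below is a minimal Python sketch using the nearest-rank definition of percentile, reproducing the numbers from the example (the `percentile` function is our illustration, not part of any Wikimedia tooling):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at
    least p% of all samples are at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

times_ms = [5, 5, 5, 5, 5, 70, 70, 70, 560]

average = sum(times_ms) / len(times_ms)  # about 88ms, matching nobody's experience
median = percentile(times_ms, 50)        # 5ms
p75 = percentile(times_ms, 75)           # 70ms
```

The 75th-percentile result reads as: for 75% of these users, the service responded within 70ms.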

When working on a service used by millions, we focus on the 99th percentile and the highest value (100th percentile). Using medians, or percentiles lower than 99%, would exclude many users. A problem with 1% of requests is a serious problem. To understand why, it is important to understand that 1% of requests does not mean 1% of pageviews, or even 1% of users.

A typical Wikipedia pageview makes 20 requests to the server (1 document, 3 stylesheets, 4 scripts, 12 images). A typical user views 3 pages during their session (on average).

This means our problem with 1% of requests could affect 20% of pageviews (20 requests × 1% = 20%, or ⅕), and 60% of users (3 pages × 20 requests × 1% = 60%, or ⅗). Even worse, over a long period of time, it is likely that every user will experience the problem at least once. This is like rolling dice in a game: with a roughly 17% (⅙) chance of rolling a six, if everyone keeps rolling, everyone should get a six eventually.
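The arithmetic above can be spelled out directly. This is just the text's simple multiplicative estimate, treating the affected fractions as rough upper bounds:

```python
p_bad_request = 0.01        # 1% of requests have a problem
requests_per_pageview = 20  # 1 document + 3 stylesheets + 4 scripts + 12 images
pageviews_per_session = 3   # a typical user views 3 pages per session

# Rough upper-bound estimates, as in the text:
pageviews_affected = requests_per_pageview * p_bad_request   # about 0.20, i.e. 20%
users_affected = pageviews_per_session * pageviews_affected  # about 0.60, i.e. 60%
```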

Real-user variables

The previous section focussed on performance as measured inside our servers. These measurements start when our servers receive a request, and end once we have sent a response. This is back-end performance. In this context, our servers are the back-end, and the user’s device is the front-end.

It takes time for the request to travel from the user’s device to our systems (through cellular or WiFi radio waves, and through wires.) It also takes time for our response to travel back over similar networks to the user’s device. Once there, it takes even more time for the device’s operating system and browser to process and display the information. Measuring this is part of front-end performance.

Differences in back-end performance may affect all users. But differences in front-end performance are influenced by factors we don’t control, such as network quality, device hardware capability, browser, browser version, and more.

Even when we make no changes, the front-end measurements do change. Possible causes:

  • Network. ISPs and mobile network carriers can make changes that affect network performance. Existing users may switch carriers, and new users may come online with a different distribution of carriers than current users.
  • Device. Operating system and browser vendors release upgrades that may affect page load performance. Existing users may switch browsers. New users may choose browsers or devices differently than current users.
  • Content change. Especially for Wikipedia, the composition of an article may change at any moment.
  • Content choice. Trends in news or social media may cause a shift towards different (kinds of) pages.
  • Device choice. Users that own multiple devices may choose a different device to view the (same) content.

The most likely cause for a sudden change in metrics is ourselves. Given our scale, the above factors usually change only for a small number of users at once. Or the change might happen slowly.

Yet, sometimes these external factors do cause a sudden change in metrics.

Case in point: Mobile Safari 9

Shortly after Apple released iOS 9 (in 2015), our global measurements were higher than before. We found this was due to Mobile Safari 9 introducing support for Navigation Timing.

Before this event, our metrics only represented mobile users on Android. With iOS 9, our data increased its scope to include Mobile Safari.

iOS 9, or the networks of iOS 9 users, were not significantly faster or slower than Android’s. The iOS upgrade affected our metrics because we now include an extra 15% of users – those on Mobile Safari.

Desktop latency is around 330ms, while mobile latency is around 520ms. Having more metrics from mobile skewed the global metrics toward that category.

Line graph for responseStart metric from desktop pageviews. Values range from 250ms to 450ms. Averaging around 330ms.
Line graph for responseStart metric from mobile pageviews. Values range from 350ms to 700ms. Averaging around 520ms.

The above graphs plot the "75th percentile" of responseStart for desktop and mobile (from November 2015). We combine these metrics into one data point for each minute. The graphs show data for one month, and there is only enough space on the screen for each point to represent 3 hours. This works by taking the mean of the per-minute values within each 3-hour block. While this provides a rough impression, the graph does not show the 75th percentile for November 2015. The next section explains why.

Average of percentiles

Opinions vary on how bad it is to take the average of percentiles over time. But one thing is clear: the average of many 1-minute percentiles is not the percentile for those minutes combined. Every minute is different, and the number of values also varies from minute to minute. To get the percentile for one hour, we need all the values from that hour, not the percentile summary from each minute.

Below is an example with values from three minutes of time. Each value is the response time for one request. Within each minute, the values sort from low to high.

Diagram with four sections. Section One is for the minute 08:00 to 08:01, it has nine values with the middle value of 5ms marked as the median. Section Two is for 08:01 to 08:02 and contains five values, the median is 560ms. Section Three is 08:02 to 08:03, contains five values, the median of Section Three is 70ms. The last section, Section Four, is the combined diagram from 08:00 to 08:03 showing all nineteen values. The median is 70ms.

The average of the three separate medians is (5 + 560 + 70) / 3 ≈ 211.7ms. The actual median of all the values combined is 70ms.
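This is easy to verify with Python's statistics module. The per-minute values below are hypothetical (the diagram only fixes the medians and counts), chosen to match the stated medians of 5ms, 560ms, and 70ms:

```python
import statistics

# Hypothetical per-minute response times, matching the diagram's medians.
minute_1 = [5, 5, 5, 5, 5, 70, 70, 70, 560]  # median 5ms
minute_2 = [5, 5, 560, 600, 700]             # median 560ms
minute_3 = [5, 5, 70, 200, 300]              # median 70ms

avg_of_medians = statistics.mean(
    [statistics.median(m) for m in (minute_1, minute_2, minute_3)]
)  # (5 + 560 + 70) / 3, about 211.7ms

true_median = statistics.median(minute_1 + minute_2 + minute_3)  # 70ms
```

The two numbers disagree because the minutes contain different numbers of values, and the per-minute medians discard everything except one summary point.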


To compute the percentile over a large time period, we must have all the original values. But it’s not efficient to store data about every visit to Wikipedia for a long time, and we could not quickly compute percentiles over such a dataset either.

A different way of summarising data is by using buckets. We can create one bucket for each range of values. Then, when we process a time value, we only increment the counter for that bucket. When using a bucket in this way, it is also called a histogram bin.

Let’s process the same example values as before, but this time using buckets.

There are four buckets. Bucket A is for values below 11ms. Bucket B is for 11ms to 100ms. Bucket C is for 101ms to 1000ms. And Bucket D is for values above 1000ms. For each of the 19 values, we find the associated bucket and increase its counter.

After processing all values, the counters are as follows. Bucket A holds 9, Bucket B holds 4, Bucket C holds 6, and Bucket D holds 0.

Based on the total count (19), we know that the median (the 10th value) must be in bucket B, because bucket B covers the 10th through 13th values. And the 75th percentile (the 15th value) must be in bucket C, because it covers the 14th through 19th values.

We cannot know the exact millisecond value of the median, but we know the median must be between 11ms and 100ms. (This matches our previous calculation, which produced 70ms.)

When we used exact percentiles, our goal was for that percentile to stay below a certain number. For example, if our 75th percentile today is 560ms, this means that for 75% of users a response takes 560ms or less. Our goal could be to reduce the 75th percentile to below 500ms.

When using buckets, goals are defined differently. In our example, 6 out of 19 responses (32%) are above 100ms (buckets C and D), and 13 of 19 (68%) are at or below 100ms (buckets A and B). Our goal could be to reduce the percentage of responses above 100ms. Or the opposite: to increase the percentage of responses within 100ms.
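A small Python sketch of this bucket counting, using 19 hypothetical sample values chosen to match the bucket counts in the text (A: 9, B: 4, C: 6, D: 0):

```python
import bisect

# Bucket upper bounds in ms: A covers values up to 10, B covers 11-100,
# C covers 101-1000, and D covers everything above 1000.
bounds = [10, 100, 1000]
labels = ["A", "B", "C", "D"]

values = [5, 5, 5, 5, 5, 70, 70, 70, 560,  # hypothetical response times
          5, 5, 560, 600, 700,
          5, 5, 70, 200, 300]

counts = {label: 0 for label in labels}
for v in values:
    # bisect_left finds the index of the first bound >= v,
    # which is exactly the bucket the value belongs to.
    counts[labels[bisect.bisect_left(bounds, v)]] += 1

total = len(values)
share_over_100ms = (counts["C"] + counts["D"]) / total    # 6/19, about 32%
share_within_100ms = (counts["A"] + counts["B"]) / total  # 13/19, about 68%
```

Because only one counter per bucket is stored, this summary stays tiny no matter how many values are processed, which is what makes histogram bins practical at Wikipedia's scale.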

Rise of mobile

Traffic trends are generally moving towards mobile. In fact, April 2017 was the first month where Wikimedia mobile pageviews reached 50% of all Wikimedia pageviews. And after June 2017, mobile traffic has stayed above 50%.

Bar chart showing percentages of mobile and desktop pageviews for each month in 2017. They mostly swing equal at around 50%. Looking closely, we see mobile first reaches 51% in April. In May it was below 50% again. But for June and every month since then mobile has remained above 50%. The peak was in October 2017, where mobile accounted for 59% of pageviews. The last month in the graph, November 2017 shows 53% of mobile pageviews.

Global changes like this have a big impact on our measurements. This is the kind of change that drives us to rethink how we measure performance, and (more importantly) what we monitor.

Further reading

The Wikimedia Foundation is excited to announce $2.5 million in support from Craig Newmark Philanthropies that will help to ensure the security of Wikipedia, as well as the organization’s other sites and global community of volunteers.

At a time of increased cybersecurity threats, this philanthropic investment from the organization of craigslist founder Craig Newmark will help the Wikimedia Foundation vigorously monitor and thwart risks to its free knowledge projects. This effort will also help to protect information about the organization’s users and projects as well as provide everyone around the world with safe and secure access to its platforms on all devices.

“Wikipedia’s continued success as a top-10 website that has hundreds of millions of users makes it a target for vandalism, hacking, and other cybersecurity threats that harm the free knowledge movement and community,” said John Bennett, Director of Security at the Wikimedia Foundation. “That’s why we are working proactively to combat problems before they arise. This investment will allow us to further expand our security programs to identify current and future threats, create effective countermeasures, and improve our overall security controls.”

With this support, the Wikimedia Foundation’s security team will grow and mature a host of security controls and services. These include application security, risk management, incident response, and more.

“As disinformation and other security threats continue to jeopardize the integrity of our democracy, we must invest in systems that protect the services that work so hard to get accurate and trustworthy information in front of the public,” said Newmark. “That’s why I eagerly continue to support the Wikimedia Foundation and its projects—like Wikipedia, the place where facts go to live.”

This investment builds on Craig Newmark’s long-time support of the free knowledge movement and the Wikimedia Foundation. Prior to this contribution, Newmark had donated nearly $2 million to the organization’s projects. This includes initial philanthropic funding for the Community Health Initiative, an effort that supports better tools and policies for addressing harassment on Wikimedia Foundation projects, as well as contributions to the Wikimedia Endowment.

“Our platforms were built on the belief that security and privacy sustain freedom of expression,” said Katherine Maher, Executive Director of the Wikimedia Foundation. “Now, more than ever before, there is an urgent need to invest in tools and practices that protect our users and our platforms. With Craig’s generous support, we will be able to better respond to security threats while building a sustainable source for free knowledge for everyone around the world.”

• • •

About the Wikimedia Foundation 

The Wikimedia Foundation is the international non-profit organization that operates Wikipedia and the other Wikimedia free knowledge projects. Our vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge, and that everyone should be able to access that knowledge freely. We host Wikipedia and the Wikimedia projects, build software experiences for reading, contributing, and sharing Wikimedia content, support the volunteer communities and partners who make Wikimedia possible, and advocate for policies that enable Wikimedia and free knowledge to thrive. The Wikimedia Foundation is a charitable, not-for-profit organization that relies on donations. We receive financial support from millions of individuals around the world, with an average donation of about $15. We also receive donations through institutional grants and gifts. The Wikimedia Foundation is a United States 501(c)(3) tax-exempt organization with offices in San Francisco, California, USA.

About Craig Newmark Philanthropies

Craig Newmark Philanthropies was created by craigslist founder Craig Newmark to support and connect people and drive broad civic engagement. It works to advance people and grassroots organizations that are getting stuff done in areas that include trustworthy journalism & information security, voter protection, gender diversity in technology, and veterans & military families.

Moving Plants

02:59, Tuesday, 10 September 2019 UTC
All humans move plants, most often by accident and sometimes with intent. Humans, unfortunately, are only rarely moved by the sight of exotic plants. 

Unfortunately, the history of plant movements is often difficult to establish. In the past, the only way to tell a plant's homeland was to look at the number of related species in a region for clues about its area of origin. This idea was firmly established by Nikolai Vavilov before he was sent off to Siberia, thanks to Stalin's crank-scientist Lysenko, to meet an early death. Today, the genetic relatedness of plants can be examined by comparing the similarity of DNA sequences (although this is apparently harder than with animals due to issues with polyploidy). Some recent studies on individual plants and their relatedness have provided insights into human history: a 2015 study established the East African geographical origins of baobabs in India, and a 2011 study did the same for coconuts. These are hopefully just the beginnings. They demonstrate ancient human movements that have never received much attention in most standard historical accounts.
Inferred transfer routes for baobabs - source

Unfortunately there are a lot of older crank ideas that untrained readers can find difficult to tell apart from sound scholarship. I recently stumbled on a book by Grafton Elliot Smith, a Fullerian professor who succeeded J.B.S. Haldane but descended into crankdom. The book "Elephants and Ethnologists" (1924) can be found online, and it is just one among several similar works by Smith. It appears that Smith used a skewed and misapplied cousin of Dollo's Law: according to him, cultural innovations tended to occur only once and were then carried along by human migrations. Smith was subsequently labelled a "hyperdiffusionist", a disparaging term used by ethnologists. When he saw illustrations of Mayan sculpture he envisioned an elephant where others saw, at best, a stylized tapir. Not only were they elephants, they were Asian elephants, complete with mahouts and Indian-style goads, and he saw this as definite evidence for an ancient connection between India and the Americas! An idea that would please some modern-day Indian cranks and zealots.

Smith's idea of the elephant as emphasised by him.
The actual Stela in question
"Fanciful" is the current consensus view on most of Smith's ideas, but let's get back to plants.

I happened to visit Chikmagalur recently and revisited the beautiful temples of Belur on the way. The "Archaeological Survey of India-approved" guide at the temple did not flinch when he described an object in the hand of a carved figure as maize, which he called a symbol of prosperity. Now maize is a crop that was imported to India, by most accounts only after the Portuguese reached the Americas in 1492 and made sea incursions into India in 1498. In the late 1990s, a Swedish researcher identified similar carvings (actually another one, at Somnathpur) from 12th-century temples in Karnataka as maize cobs. This was subsequently debunked by several Indian researchers from IARI and from the University of Agricultural Sciences, where I was then studying. An alternate view is that the object is a mukthaphala, an imaginary fruit made up of pearls.
Somnathpur carvings. The figures to the
left and right hold the purported cobs in their left hands.
(Photo: G41rn8)

The pre-Columbian oceanic trade ideas however do not end with these two cases from India. The third story (and historically the first, from 1879) is that of the sitaphal, or custard apple. The founder of the Archaeological Survey of India, Alexander Cunningham, described a fruit in one of the carvings from Bharhut, a fruit that he identified as custard-apple. The custard-apple and its relatives are all from the New World. The Bharhut Stupa is dated to 200 BC, and the custard-apple, as quickly pointed out by others, could only have been in India post-1492. The Hobson-Jobson has a long entry on the custard apple that covers the situation well. In 2009, a study raised the possibility of custard apples in ancient India. The ancient carbonized evidence is hard to evaluate unless one has examined all the possible plant seeds and what remains of their microstructure. The researchers, however, establish a date of about 2000 B.C. for the carbonized remains and attempt to demonstrate that they look like the seeds of sitaphal. The jury is still out.
The Hobson-Jobson has an interesting entry on the custard-apple
I was quite surprised that there are not many writings on the Internet that synthesize and comment on the history of these ideas, and somewhat oddly I found no mention of these three cases in the relevant Wikipedia article on pre-Columbian trans-oceanic contact theories (naturally, fixed now with an entire new section).

There seems to be value in someone putting together a collation of plant introductions to India, along with sources, dates and locations of introduction. Some of the old specimens of introduced plants may well be worthy of further study.

Introduction dates
  • Pithecellobium dulce - Portuguese introduction from Mexico to the Philippines and India on the way, in the 15th or 16th century. The species was described by William Roxburgh from specimens taken from the Coromandel region (i.e. the type locality is outside the native range).
  • Eucalyptus globulus? - There are some claims that Tipu planted the first of these (see my post on this topic). It appears that the first person to move eucalyptus plants (probably E. globulus) out of Australia was Jacques Labillardière. Labillardière was surprised by the size of the trees in Tasmania: the lowest branches were 60 m above the ground and the trunks were 9 m in diameter (27 m circumference). He saw flowers through a telescope and had some flowering branches shot down with guns! (original source in French) His ship was seized by the British in Java around 1795 and released in 1796. All subsequent movements seem to have been post-1800 (i.e. after Tipu's death). If Tipu Sultan did indeed plant the Eucalyptus here he must have got it via the French through the Labillardière shipment. The Nilgiris were apparently planted up starting with the work of Captain Frederick Cotton (Madras Engineers) at Gayton Park(?)/Woodcote Estate in 1843.
  • Muntingia calabura - when? - I suspect that Tickell's flowerpecker populations boomed after this, possibly with a decline in the Thick-billed flowerpecker.
  • Delonix regia - when?
  • In 1857, Mr New from Kew was made Superintendent of Lalbagh, and in the following years he introduced several Australian plants from Kew including Araucaria, Eucalyptus, Grevillea, Dalbergia and Casuarina. Mulberry plant varieties were introduced in 1862 by Signor de Vicchy. The Hebbal Butts plantation was established around 1886 by Cameron along with Mr Rickets, Conservator of Forests, who became Superintendent of Lalbagh after New's death - rain trees, ceara rubber (Manihot glaziovii), and shingle trees(?). Apparently Rickets was also involved in introducing a variety of potato (kidney variety) which got named "Ricket". - from Krumbiegel's introduction to "Report on the progress of Agriculture in Mysore" (1939) [Hebbal Butts would be the current-day Air Force Headquarters]

Further reading
  • Johannessen, Carl L.; Parker, Anne Z. (1989). "Maize ears sculptured in 12th and 13th century A.D. India as indicators of pre-columbian diffusion". Economic Botany 43 (2): 164–180.
  • Payak, M.M.; Sachan, J.K.S (1993). "Maize ears not sculpted in 13th century Somnathpur temple in India". Economic Botany 47 (2): 202–205. 
  • Pokharia, Anil Kumar; Sekar, B.; Pal, Jagannath; Srivastava, Alka (2009). "Possible evidence of pre-Columbian transoceanic voyages based on conventional LSC and AMS 14C dating of associated charcoal and a carbonized seed of custard apple (Annona squamosa L.)" Radiocarbon 51 (3): 923–930. - Also see
  • Veena, T.; Sigamani, N. (1991). "Do objects in friezes of Somnathpur temple (1286 AD) in South India represent maize ears?". Current Science 61 (6): 395–397.
  • Rangan, H., & Bell, K. L. (2015). Elusive Traces: Baobabs and the African Diaspora in South Asia. Environment and History, 21(1):103–133. doi:10.3197/096734015x1418317996982 [The authors however make a mistake in using Achaya, K.T. Indian Food (1994) who in turn cites Vishnu-Mittre's faulty paper for the early evidence of Eleusine coracana in India. Vishnu-Mittre himself admitted his error in a paper that re-examined his specimens - see below]
Dubious research sources
  • Singh, Anurudh K. (2016). "Exotic ancient plant introductions: Part of Indian 'Ayurveda' medicinal system". Plant Genetic Resources 14(4): 356–369. doi:10.1017/S1479262116000368. [Among the claims here is that Bixa orellana was introduced prior to 1000 AD, on the basis of Sanskrit names assigned to that species; the paper does not indicate its basis or original dated sources. The author works in the "International Society for Noni Science"!]
  • The same author has rehashed this content with several references and published it in no less than the Proceedings of the INSA - Singh, Anurudh Kumar (2017). "Ancient Alien Crop Introductions Integral to Indian Agriculture: An Overview". Proceedings of the Indian National Science Academy 83(3). There is a series of cherry-picked references, many of whose claims were subsequently dismissed by others or remain under serious question. In one case there is a claim for the early occurrence of Eleusine coracana in India, to around 1000 BC. The reference cited is in fact a secondary one; the original work was by Vishnu-Mittre, and the sample was re-examined by another group of scientists who clearly showed that it was not even a monocot. Vishnu-Mittre himself accepted the error. The original paper was Vishnu-Mittre (1968). "Protohistoric records of agriculture in India". Trans. Bose Res. Inst. Calcutta. 31: 87–106, and the re-analysis of the samples can be found in Hilu, K.W.; de Wet, J.M.J.; Harlan, J.R. (1979). "Archaeobotanical Studies of Eleusine coracana ssp. coracana (Finger Millet)". American Journal of Botany 66(3): 330–333. Clearly INSA does not have great peer review and has gone with argument by claimed authority.
  • PS 2019-August. Singh, Anurudh K. (2018). "Early history of crop presence/introduction in India: III. Anacardium occidentale L., Cashew Nut". Asian Agri-History 22(3): 197–202. Singh has published another article claiming that cashew was present in India well before the Columbian exchange, with "evidence" from J.L. Sorenson of a sketch purportedly made from a Bharhut stupa balustrade carving (the original of which is not found here) and a carving from the Jambukeshwara temple with a "cashew" arising singly and placed atop a stalk that rises from below like a lily! He also claims that some Sanskrit words and translations (from texts/copies of unknown provenance or date) confirm its ancient existence. I happened to ask whether he had examined his sources carefully and received a rather interesting response, which I find very useful as a classic symptom of the problems of science in India. More interestingly, I learned that John L. Sorenson is well known for his affiliation with the Church of Jesus Christ of Latter-day Saints; part of the Mormon foundational claim is that Mesoamerican cultures were of Semitic origin, and much of the "research" of their followers has attempted to bolster support for this by various means.

Women in Red at UMass Lowell is a one-year project to build digital literacy capacity in higher education and address the gender gap on campus and in Wikipedia. To kick off this project, 13 faculty at the University of Massachusetts Lowell will be accepted into Wiki Education’s synchronous Wiki Scholars course, meeting online once a week with a team of Wikipedia experts for an in-depth training about how to contribute content to Wikipedia. The application for interested UMass Lowell faculty is open now and closes September 20th!

There are three key components to this project:

1) faculty learning how to edit Wikipedia together;

2) student participation in Wikipedia; and

3) expanding public knowledge of notable women.


1) Wiki Scholars course teaches faculty how to edit Wikipedia

Over 10 weeks, Wiki Education’s team of Wikipedia experts will facilitate collaborative group sessions among UMass Lowell faculty to immerse them in Wikipedia’s technical, procedural, and cultural practices. Wiki Education will help these scholars incorporate published information about notable women from the community or from their field of study into Wikipedia. The Center for Women & Work, a research center at UMass Lowell, has compiled a list of women scholars for inclusion in the project. Each scholar will use publications available through UMass Lowell, this list of women scholars, and other relevant sources to significantly expand at least two biographies of notable women.

Upon course completion, participants will receive a shareable, electronic certificate issued by UMass Lowell and Wiki Education, designating them as UMass Lowell Wiki Scholars. They’ll have developed the technical skills and Wikipedia know-how to disseminate their knowledge to the public and build Wikipedia-writing assignments into their curriculum.

2) Faculty pass on their new skills to students

Once participating faculty members complete their Wiki Scholars course, they will commit to teaching with Wikipedia in the following academic year. That’s when they’ll sign up to receive free tools and support for their classrooms through our Student Program. Building curriculum around Wikipedia-writing assignments is a great way to engage students and enhance their 21st-century digital literacy skills.

We believe this training period will give UMass Lowell instructors added confidence and competence to implement Wikipedia-writing projects into their curricula for the very first time.

3) The public benefits from more information about notable women

Institutions like UMass Lowell have archives about historic women in the community that they are eager to share with the public. Faculty who are accepted into this Wiki Scholars course will help uncover the lives and contributions of those women by bringing the information to Wikipedia. In doing so, they’ll also be helping correct a serious imbalance in Wikipedia’s content: only 17.97% of Wikipedia’s biographies are of women.

If you search for ‘Jacqueline Moloney’, you’ll find that there is no Wikipedia page for her. She is listed as Chancellor on the UMass Lowell Wikipedia page, one of the six women mentioned on the page (four of whom have buildings named after them, like Dierexa Southwick). Professor Holly Yanco, Distinguished University Professor, is mentioned as well, in the description of the NERVE Center.

If you search for UMass Lowell, you’ll find two Wikipedia pages: one for the University itself and one for the River Hawks. But under ‘notable’ people of UMass Lowell are four individuals, all men: Andre Dubus III, novelist and UML faculty member; Craig MacTavish, former NHL player and GM; Marty Meehan, former Congressman and current president of the UMass system; and James Costos, former ambassador to Spain and Andorra.

Nowhere are the notable women of UMass Lowell mentioned. Not Thelma Todd, the film actress, who attended UMass Lowell when it was the Lowell Normal School. Or Mary Agnes Hallaren, who graduated from Lowell Teacher’s College in 1927 and went on to become the head of the Women’s Army Corps. Hallaren was the first woman to serve as a regular Army officer. She commanded the first women’s battalion to go overseas in 1943. Mary Agnes Hallaren was awarded the Legion of Honor, the Bronze Star, and the Army Commendation Medal. She was featured in Tom Brokaw’s The Greatest Generation. Todd and Hallaren have their own Wikipedia pages, but are not mentioned as ‘notable’ on the University’s own page. Is there a bias here?

For UMass Lowell faculty interested in joining this project, visit the landing page for more information and to apply by September 20, 2019.

This project is a collaboration among the UMass Lowell College of Education (supported by Judith Davidson), UMass Lowell Libraries (supported by Sara Marks and Anthony Sampas), and the Center for Women & Work (supported by June Lemen). These departments have sponsored 13 seats in this course. Participation for accepted UMass Lowell faculty is free.

If you’re interested in buying out a customized professional development course for faculty at your institution, contact Director of Partnerships Jami Mathewson at jami@wikiedu.org.

Tech News issue #37, 2019 (September 9, 2019)

00:00, Monday, 09 September 2019 UTC

weeklyOSM 476

16:40, Sunday, 08 September 2019 UTC



The tourism organisation of the Durmitor National Park in Žabljak, Montenegro, recommends the use of OSM [1] | Photo © CC0


  • Stolpersteine (literally “stumbling blocks”) are small brass-plated cubes laid, around Europe, in front of the last-known residences or workplaces of those who were driven out or murdered by the Nazis. Reclus asked (automatic translation) if the 8700 Stolpersteine with a Wikidata entry are linked to OpenStreetMap.
  • Hauke Stieler has made (automatic translation) a map of objects tagged shop=yes in Germany. Rendering of shop=yes was dropped in OpenStreetMap Carto v4.22.0.
  • amilopowers’s proposal to tag the possibility of withdrawing cash in a shop or amenity can now be voted on.
  • Klumbumbus proposes traffic_calming=dynamic_bump for the new type of dynamic traffic calming, whose impact depends on the driver’s speed, and asks for your opinion.
  • Vadim Shlyakhov is proposing leisure=sunbathing to mark outdoor locations where people can sunbathe.


  • The OSM Operations Team announced that anonymous users will shortly no longer be able to comment on notes. The reasons and more background information can be found in a GitHub issue which was opened two years ago.
  • Samuel Darkwah Manu, founder of the Kwame Nkrumah University of Science and Technology YouthMappers and a member of the OSM Ghana community, shares his experience participating in the Open Cities Accra project.
  • OpenStreetMap encourages all mappers to vote for the OpenStreetMap Awards 2019. Voting ends on 18 September so vote now.

OpenStreetMap Foundation

  • For the upcoming 2019 OSMF board elections, Kate Chapman announced that Mikel Maron will be up for re-election, while Frederik Ramm and Kate Chapman herself will be stepping down and not running again.


  • The State of the Map is looming. In less than two weeks the annual OSM conference, with many interesting sessions, will start in Heidelberg, Germany. The State of the Map will take place from 21st to 23rd September, directly following the HOT Summit at the same venue.
  • Lukas and Fabian, from HeiGIT, presented a 90-minute lab about analysing OpenStreetMap data history with the ohsome platform at the FOSS4G 2019 conference in Bucharest. The teaching material and code have been made available as linked snippets in the GIScience HD GitLab.

Humanitarian OSM

  • HOT reports about the work on setting up an effective solid waste collection system with Open Source Tools in Dar es Salaam.


  • [1] The tourism organisation of the Durmitor National Park in Žabljak, Montenegro, recommends the use of OSM on their mountain bike maps and also on their homepage with the wording “For successful orientation in the region of Durmitor and Sinjajevina, we recommend that you use OpenStreetMap and Open Cycle Maps which are regularly updated and new information is added every day”.

Open Data

  • The German federal state Saxony has released (automatic translation) aerial images, digital topographic maps, elevation and landscape models, and cadastre data to the public. Unfortunately the new data is licensed with a CC-BY-like licence and, hence, not compatible with our requirements. However, orthophotos and a public map with roads, road names, building footprints and house numbers were already and will continue to be available to OSM mappers.
  • The CCC is hosting a video, from the Free and Open Source Software for Geospatial (FOSS4G) event in Bucharest, about how to use OSM and Wikidata together with data science tools and Python.


  • tchaddad explains, in his user diary, how Wikidata queries using SPARQL and the API work, and how Wikidata could be used to improve Nominatim.


  • Paul Norman informed us about an update to Carto, OSM’s main map style. The minor improvements include bug fixes, performance and code cleanup, as well as some optical changes such as retail colour fill on malls, and the rendering of historic=citywalls the same as barrier=city_wall.
  • Sarah Hoffmann announced a new release of osm2pgsql. The new version (1.0.0) drops support for old-style multipolygons and has received major functionality improvements.

Did you know …

  • … Matt Mikus answered the question of whether Minnesota has more shoreline than California, Hawaii and Florida, combined. With the help of OSM data he found that the answer is yes if you include rivers and streams in your calculation.
  • … about the ski resort CopenHill in the Danish capital Copenhagen? It was built, as a globally unique project, on the green roof of a waste incineration plant to give Danes an opportunity to spend their ski holidays in their own country. In OSM it looks like this.

OSM in the media

  • The Austrian newspaper Der Standard wrote (de) an article about China’s practice of distorting its maps. While Russia ended its efforts to falsify roads, rivers and even city quarters at the end of the 80s, in light of upcoming satellite imagery based maps, China continues its own efforts for unknown reasons. A map law, with 68 paragraphs, requires that only approved maps are allowed to be published, with “correct” borders of course. Only 14 Chinese companies have a licence to produce and publish maps. The distortion is assumed to range between 50 and 700 metres and can be seen when comparing a satellite image on Google maps with the road map layer. The article mentions OSM as an alternative with “controversial legality”.

Other “geo” things

  • On Day 3 of Pista ng Mapa, Leigh deployed her DJI Phantom 4 to survey the event venue and its surrounding community. Leigh uploaded all of the drone-derived data into OpenAerialMap, including the elevation models. Maning Sambale demonstrates how to use QGIS to extract the heights from the derived DSM/DTM and to use them for visualising building polygons from OpenStreetMap.
  • The Africa Geospatial Data and Internet Conference 2019 will be held in Accra, Ghana from 22nd to 24th October. The conference aims to bring people together in discussions on public policy issues relating to geospatial and open data, ICTs and the Internet in Africa.
  • For more than 30 years WGS84 has acted as a “pivot” datum through which one datum can be transformed into another. Michael Giudici explains how, in a world that demands sub-metre accuracy, the WGS84 pivot has outlived its usefulness. The “GDAL Coordinate System Barn Raising” is currently working on improving GDAL, PROJ, and libgeotiff so they can handle time-dependent coordinate reference systems and accurately transform between datums.
  • Katja Seidel investigated (automatic translation) alternatives to GPSies to use after it is merged into AllTrails.

Upcoming Events

Where What When Country
Minneapolis State of the Map U.S. 2019 [1] 2019-09-06-2019-09-08 united states
Taipei OSM x Wikidata #8 2019-09-09 taiwan
Bordeaux Réunion mensuelle 2019-09-09 france
Toronto Toronto Mappy Hour 2019-09-09 canada
Salt Lake City SLC GeoBeers 2019-09-10 united states
Hamburg Hamburger Mappertreffen 2019-09-10 germany
Lyon Rencontre mensuelle pour tous 2019-09-10 france
Wuppertal OSM-Treffen Wuppertaler Stammtisch im Hutmacher 18 Uhr 2019-09-11 germany
Leoben Stammtisch Obersteiermark 2019-09-12 austria
Munich Münchner Stammtisch 2019-09-12 germany
Berlin 135. Berlin-Brandenburg Stammtisch 2019-09-12 germany
San José Civic Hack Night & Map Night 2019-09-12 united states
Budapest OSM Hungary Meetup reboot 2019-09-16 hungary
Bratislava Missing Maps mapathon Bratislava #7 2019-09-16 slovakia
Habay Rencontre des contributeurs du Pays d’Arlon 2019-09-16 belgium
Cologne Bonn Airport Bonner Stammtisch 2019-09-17 germany
Lüneburg Lüneburger Mappertreffen 2019-09-17 germany
Reading Reading Missing Maps Mapathon 2019-09-17 united kingdom
Salzburg Maptime Salzburg Mapathon 2019-09-18 austria
Edinburgh FOSS4GUK 2019 2019-09-18-2019-09-21 united kingdom
Heidelberg Erasmus+ EuYoutH OSM Meeting 2019-09-18-2019-09-23 germany
Heidelberg HOT Summit 2019 2019-09-19-2019-09-20 germany
Heidelberg State of the Map 2019 [2] 2019-09-21-2019-09-23 germany
Nantes Journées européennes du patrimoine 2019-09-21 france
Bremen Bremer Mappertreffen 2019-09-23 germany
Nottingham Nottingham pub meetup 2019-09-24 united kingdom
Mannheim Mannheimer Mapathons 2019-09-25 germany
Lübeck Lübecker Mappertreffen 2019-09-26 germany
Düsseldorf Stammtisch 2019-09-27 germany
Dortmund Mappertreffen 2019-09-27 germany
Nagoya 第2回まちマップ道場-伊勢湾台風被災地を訪ねる- 2019-09-28 japan
Strasbourg Rencontre périodique de Strasbourg 2019-09-28 france
Kameoka 京都!街歩き!マッピングパーティ:第12回 穴太寺(あなおじ) 2019-09-29 japan
Prizren State of the Map Southeast Europe 2019-10-25-2019-10-27 kosovo
Dhaka State of the Map Asia 2019 2019-11-01-2019-11-02 bangladesh
Wellington FOSS4G SotM Oceania 2019 2019-11-12-2019-11-15 new zealand
Grand-Bassam State of the Map Africa 2019 2019-11-22-2019-11-24 ivory coast

Note: If you would like to see your event here, please put it into the calendar. Only data which is in the calendar will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Polyglot, Rogehm, SK53, SunCobalt, TheSwavu, YoViajo, derFred, geologist, jinalfoflia.

2019 Youth Film Festival in Charlottesville

11:13, Sunday, 08 September 2019 UTC

On Saturday 7 September 2019 I attended the 18th Annual Youth Film Festival in Charlottesville, presented by a nonprofit organization called Light House Studio.

I like that there is an organization which provides a channel for youth to produce and publish films locally. Because I am a media access advocate, I liked less that all the films had a tag “copyright Lighthouse Studio”, which communicates that the nonprofit organization acquires the copyright from all creators in the program. I do not necessarily mind them acquiring the copyright but they also assert a conventional copyright license after the manner of a film studio, and I would prefer that either they use a free and open license or permit creators to retain the copyright. The other context I have for my view is that I have seen repeatedly that nonprofit organizations of this sort invest no budget, expertise, or consideration of the long-term management of their media collections, and typically they lose the cataloging metadata of content which they produce. The usual outcome is that the media becomes mostly undiscoverable in a few years, when I would rather it be archived for the long term. This is all speculation based on my past experience, observations, their copyright notice, and their lack of published archiving procedure.

I enjoyed the works. I liked the two documentaries more than the others. One was interviews with local Charlottesville students who were immigrants from Central or South America. Those students said that people in Charlottesville harassed them either for being Latino or for speaking Spanish. This seems believable to me because, as a recent arrival, I continually see strange racism and prejudice here. Local people typically express the idea that Charlottesville is a friendly place, but they compare it to other towns in the region, which they describe as either ignorant or sometimes proudly hatemongering. I still get surprised when I see historically oppressed demographics here (women, black people, Latinos, LGBT+ people, and the rest) act deferentially to an oppressive norm.

Another documentary had students visit local nursing homes, ask residents where they would like to virtually visit, and then put virtual reality headsets on them. It followed the genre of videos that expose someone to technology not of their generation. The people in this video had little awareness of virtual reality and were moved by the experience.

Young people are capable of meaningful media creation and publishing when they have the opportunity to do so. I expect that participants in the program take great inspiration for years from the work they produce. Probably the video production for these films happens in a week, so as a life experience, this entire program seems high impact at relatively low cost. I recognize that a complicated nonprofit network must exist in a community for this to work, including funding to the host organization but also to the youth organizations which make the student participants ready to join these programs.

Some of the homes and locations featured in this program were evidence that at least one young person on the production team was from a wealthy family. I try to notice when there is a nonprofit community resource which offers benefits within easier reach of the wealthy as compared to the underserved. I appreciate that the host organization in this case is seeking diversity, but of course diversity costs money and the rich kids’ families pay the participation fee.

The entire event was great and would compare favorably with anything similar.

Language barriers to @Wikidata

14:49, Saturday, 07 September 2019 UTC
Wikidata is intended to serve all the languages of all the Wikipedias, for starters. It does so in one very important way: all the interwiki links, the links between articles on the same subject, are maintained in Wikidata.

For most other purposes Wikidata serves the "big" languages best, particularly English. This is awkward because it is precisely people reading other languages who stand to gain the most from Wikidata. The question is: how do we chip away at this language barrier?

Giving Wikidata data an application is the best way to entice people to give Wikidata a second look. Here are two:
  • Commons is being wikidatified and it now supports a "depicts" statement. As more labels become available in a language, finding pictures in "your" language becomes easy and obvious. It just needs an application.
  • Many subjects are likely to be of interest in a language. Why not have projects like the Africa project with information about Africa shared and updated by the Listeria bot? Add labels and it becomes easier to use, link to Reasonator for understanding and add articles for a Wikipedia to gain content.
Key is the application of our data. Wikidata includes a lot; the objective is to find the labels, and we will when the results are immediately applicable. It will also help when we consider the marketing opportunities that help foster our goals.


@Wikidata - #Quality is in the network

13:15, Saturday, 07 September 2019 UTC
What amounts to quality is a recurring and controversial subject. For me, quality is not so much in the individual statements of a particular Wikidata item as in how it links to other items.

As always, there has to be a point to it. You may want to write Wikipedia articles about chemists, artists, or award winners. You may want to write to make the gender gap less in your face, but who to write about?

Typically, connecting to small subsets is best. However, we want to know about the distribution of genders, so it is very relevant to add a gender. Statistically it makes no difference in the big picture, but for subsets like the co-authors of a scientist, a profession, or an award, additional data helps us understand how the gender gap manifests itself.

The inflation of "professions" like "researcher" is such that it is no longer distinctive; at most it helps with disambiguation from, for instance, soccer stars. When a more precise profession is known, like "chemist" or "astronomer" (both subclasses of researcher), it is best to remove researcher as it is implied.

Lists like the members of the "Young Academy of Scotland" have their value when they link as widely as possible. Considering only Wikidata misses the point; what matters is particularly the links to the organisations, the authorities (ORCID, Google Scholar, VIAF), and also Twitter, as for this psychologist. We may have links to all of them, the papers, the co-authors. But do we provide quality when people do not go down the rabbit hole?

Today, Wikipedia was hit with a malicious attack that has taken it offline in several countries for intermittent periods. The attack is ongoing and our Site Reliability Engineering team is working hard to stop it and restore access to the site.

As one of the world’s most popular sites, Wikipedia sometimes attracts “bad faith” actors. Along with the rest of the web, we operate in an increasingly sophisticated and complex environment where threats are continuously evolving. Because of this, the Wikimedia communities and Wikimedia Foundation have created dedicated systems and staff to regularly monitor and address risks. If a problem occurs, we learn, we improve, and we prepare to be better for next time.

We condemn these sorts of attacks. They’re not just about taking Wikipedia offline. Takedown attacks threaten everyone’s fundamental rights to freely access and share information. We in the Wikimedia movement and Foundation are committed to protecting these rights for everyone.

Right now, we’re continuing to work to restore access wherever you might be reading Wikipedia in the world. We’ll keep you posted.

Wikipedia tells us that “the Umeda Sky Building is the nineteenth-tallest building in Osaka Prefecture, Japan, and one of the city’s most recognizable landmarks. It consists of two 40-story towers that connect at their two uppermost stories, with bridges and an escalator crossing the wide atrium-like space in the center.”

This photo comes to us from Wikimedia Commons, the freely licensed media repository whose holdings are extensively used on Wikimedia’s many projects, including Wikipedia. You can use the photo for just about any purpose as long as you credit the author (Martin Falbisoner), copyright license (CC BY-SA 4.0), and link to the original URL.

Ed Erhart, Senior Editorial Associate, Communications
Wikimedia Foundation

This post is the ninth installment of a weekly series, and you can sign up for our MailChimp mailing list to be notified when the next edition is published.

The anatomy of search: In search of…

22:00, Thursday, 05 September 2019 UTC

A galloping overview

As we have done before, let’s get a bird’s-eye view of the parts of the search process: text comes in and gets processed and stored in a database (called an index); a user submits a query; documents that match the query are retrieved from the index, ranked based on how well they match the query, and presented to the user.

Today we’ll look at the user’s query and how potential results are matched and ranked.

In search of…

At this point, we’ve done a lot of preparation getting our documents ready to be searched. In previous installments, we’ve looked at how we break text into tokens (which are approximately words, in English), normalize them (converting to lowercase, removing diacritics, and more), remove stop words, stem the remainder (converting them to their root forms), and store the results in an index (which is, roughly, a database of the words in all our documents). Now that we have our text stored in an index or two, things are going to move along much faster as we take advantage of all that preparation: we are ready for users to submit queries!

The first thing we do when we get a user query is process it pretty much like the text we’ve indexed: we tokenize, normalize, and stem the query, so that the words of the query are in the same form as the words from our documents in our index.

Suppose a user—having unconsciously recast The Cat in the Hat in the mold of Of Mice and Men—searches for Of Cats and Hats.[1] Our tokens are Of, Cats, and, Hats. These get regularized—here just lowercased—to of, cats, and, hats. Removing stop words (of, and) and stemming the rest leaves cat, hat.
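As a sketch, that whole pipeline fits in a few lines of Python; the stop list and the one-rule “stemmer” below are toy stand-ins for the real analysis chain:

```python
# Toy query-processing pipeline: tokenize, normalize, remove stop words, stem.
STOP_WORDS = {"of", "and", "the", "in", "a", "to"}

def stem(token):
    # Toy stemmer: strip a plural "s" (real stemmers do far more than this).
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def process_query(query):
    tokens = query.split()                               # tokenize (naive split)
    tokens = [t.lower().strip(".,!?") for t in tokens]   # normalize
    tokens = [t for t in tokens if t not in STOP_WORDS]  # remove stop words
    return [stem(t) for t in tokens]                     # stem the remainder

print(process_query("Of Cats and Hats"))  # → ['cat', 'hat']
```

Because the same processing was applied at indexing time, the query terms now line up exactly with the tokens stored in the index.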

Screenshot of the results of searching for Of Cats and Hats on English Wikipedia (in August 2019). The top three matches are all related to The Cat in the Hat, in part because stop words (like of, and, the, and in) are mostly ignored, and stemming equates cats with cat, and hats with hat. The sister search results from Wikisource are not too exciting, but the Wikiquote results look promising!

Looking up the two remaining words—cat and hat—in our index (which for now we’ll say has only five documents in it, D1–D5), we find the following document and word position information. (Note that “W4, W27” in column D3 for cat means that cat was found in Document #3 as Word #4 and #27):
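Sketched in Python, the same information might be stored as a term → document → positions map; the positions for D2 and D3 below are the ones used in the examples that follow, while those for D1, D4, and D5 are invented placeholders:

```python
# Toy inverted index: for each term, the documents it appears in and the
# word positions within each document. D1/D4/D5 positions are hypothetical.
index = {
    "cat": {"D1": [12], "D2": [3], "D3": [4, 27]},           # D1 position assumed
    "hat": {"D2": [45], "D3": [28], "D4": [7], "D5": [19]},  # D4/D5 positions assumed
}
```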

Okay… what next?

AND and OR or NOT—Wait, what?

The classic search engine approach for combining search terms uses Boolean logic, which combines true and false values with AND, OR, and NOT operators.[2] In the search context, true and false refer to whether your search term is in a particular document. Basic search engines generally use an implicit AND operator. This means that a query like cat dog mouse is equivalent to cat AND dog AND mouse—requiring all three terms to be in a document for it to be shown as a result.

In our Of Cats and Hats example above, that means that both cat and hat need to be in a document for it to be shown as a result. We can see from the inverted index snippets above that both cat and hat occur in D2 and D3, while only cat is in D1 and only hat is in D4 and D5. So an implicit (or explicit[4]) AND operator would return the intersection of the two lists, namely D2 and D3 for the query Of Cats and Hats.

The OR operator returns the union of the two lists—so any document with either cat or hat is a good result. Thus, D1, D2, D3, D4, and D5 would all be results for cat OR hat.

Classic boolean search also allows you to exclude results that have a particular word in them, using NOT. So cat NOT hat would only return D1. (D2 and D3, while they do have cat in them, would be excluded because they also have some form of the now unwanted hat.)
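Since the index tells us which documents contain each term, the three operators reduce to plain set operations. Here is the example worked through in Python, with the document sets taken from the discussion above:

```python
# Boolean retrieval over the toy index: which documents contain each term.
cat_docs = {"D1", "D2", "D3"}
hat_docs = {"D2", "D3", "D4", "D5"}

print(sorted(cat_docs & hat_docs))  # cat AND hat → ['D2', 'D3']
print(sorted(cat_docs | hat_docs))  # cat OR hat  → ['D1', 'D2', 'D3', 'D4', 'D5']
print(sorted(cat_docs - hat_docs))  # cat NOT hat → ['D1']
```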

As of August 2019, on-wiki search does not actually support boolean querying very well. By default, search behaves as if there were an implicit AND between query terms, and supports NOT for negation (you can use the NOT operator, or use a hyphen or an exclamation point). Parentheses are currently ignored, and explicit AND and OR are not well supported, especially when combined.[6]

Three diagrams representing the result of applying various logical operators; A is the circle on the left, B is the circle on the right. The first diagram represents A (AND) NOT B. The second is the inverse of the first: (NOT A) OR B. The third represents A IFF B (“A if and only if B”),[7] which is equivalent to (A AND B) OR (NOT A AND NOT B). I’ve never heard of a search system that supports IFF as a boolean operator; it does not seem like it would normally be useful—but there is probably a lawyer somewhere who knows exactly how they would want to use it.

Billions and Billions

If, as in our toy example above, you only get a handful of search results, their order doesn’t really matter all that much. But on English Wikipedia, for example, Of Cats and Hats gets over 15,000 results—with eight of the top ten all related to The Cat in the Hat. How did those results get to the top of the list?

We’ll again look at the classic search engine approach to this problem. Not all query terms are created equal: stop words are an explicit attempt to address the problem, but filtering them out causes its own problems. For example, “to be or not to be” is made up entirely of likely stop words, and there is a UK band called “The The”.[8] Stop words divide the world into black and white “useful” and “useless” categories, when reality calls for context-dependent shades of grey!

Enter term frequency (TF) and inverse document frequency (IDF), two extremely venerable measures of relevance in information retrieval. First, let’s recall our inverted index information for cat and hat that we have in our example index:

Term frequency is a measure of how many times a word appears in a given document. We see that cat appears in D1 and D2 only once, giving each a TF of 1, while it appears in D3 twice, giving a TF of 2. In some sense, then, D3 is arguably more “about” cats than D1 or D2.

Document frequency is a measure of how many different documents a word appears in. As we stipulated earlier, we have only five documents in our collection: D1–D5. We see that cat appears in 3/5 of all documents, while hat appears in 4/5. Inverse document frequency is the reciprocal of the document frequency (more or less, see below!)—so cat has an IDF of 1.67, while hat has an IDF of 1.25, making cat a slightly more valuable query term than hat.

The simplest version of a TF–IDF score for a word in a given document is just the TF value multiplied by the IDF value. You can then combine each document’s TF–IDF scores for all the words in your query to get a relevance score for each document, hopefully floating the best ones to the top of the results list.
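Here is that simplest version worked through in Python for the toy index; cat’s term frequencies are the ones given above, while hat’s are assumed to be 1 per document for illustration:

```python
# Simplest TF-IDF scoring over the toy five-document index.
N = 5  # total number of documents
tf = {
    "cat": {"D1": 1, "D2": 1, "D3": 2},
    "hat": {"D2": 1, "D3": 1, "D4": 1, "D5": 1},  # assumed TFs for hat
}

def idf(term):
    df = len(tf[term])  # document frequency: how many docs contain the term
    return N / df       # plain reciprocal IDF (no logarithm yet)

def score(doc, query_terms):
    # Sum of TF x IDF over the query terms (0 if the term is absent).
    return sum(tf[t].get(doc, 0) * idf(t) for t in query_terms)

for doc in ["D1", "D2", "D3", "D4", "D5"]:
    print(doc, round(score(doc, ["cat", "hat"]), 2))
```

D3 comes out on top, matching the intuition that it is the document most “about” cats and hats.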

Characteristic, yet distinctive

I like to say that term frequency measures how characteristic a word is of a document, while inverse document frequency is a measure of how distinctive a word is in a corpus—because TF and IDF are not absolute metrics for particular words, but depend heavily on context—i.e., on the corpus they find themselves in.

To see what we mean by “characteristic” and “distinctive”, consider the word the in a corpus of general English documents. It may appear hundreds of times in a given document (like this one!!), making it very characteristic of that document. On the other hand, it probably occurs in about 99% of English documents of reasonable length, so it is not very distinctive. In a multilingual collection of documents, though, with, say, only 5% of documents in English, the would probably be both characteristic and distinctive of English documents because it doesn’t appear in most non-English documents—unless the French are discussing tea (and you are folding diacritics).

Similarly, in a collection of articles on veterinary medicine for household pets, the word cat is probably not very distinctive, while in English Wikipedia it is moderately distinctive, because it appears in only about 10% of documents.

Good stemming can improve the accuracy of TF and IDF scores. For TF, we don’t want to compute that cat is somewhat characteristic of a document, and also compute that cats is somewhat characteristic of the same document when we could combine the counts for the two and say that cat(s) is very characteristic of the document. Similarly for IDF, combining the counts for forms of a word that are somewhat distinctive—e.g., hat appears in some documents, hats appears in some others—may reveal that collectively hat(s) is actually less distinctive than either form alone appears to be.

On English Wikipedia, one of my favorite words—homoglyph—is very characteristic of the article on homoglyphs, where it appears 28 times, while being very distinctive within Wikipedia, because it only occurs in 112 articles (out of almost 6 million).

Speaking of millions of articles, when the numbers start getting big, TF and IDF values can get kind of crazy—especially IDF. While there is probably some minute difference in distinctiveness between being in 10,001 documents rather than 10,005, it’s probably less than the difference between being in 1 document and being in 5. Therefore IDF formulations tend to use a logarithm to help keep their scale in check.
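A quick numerical check of that intuition, with an illustrative corpus size:

```python
import math

# Log-scaled IDF compresses huge document-frequency ranges: the gap between
# df=10_001 and df=10_005 nearly vanishes, while df=1 vs df=5 stays visible.
N = 6_000_000  # roughly the size of English Wikipedia, for illustration

def idf_plain(df):
    return N / df

def idf_log(df):
    return math.log(N / df)

print(idf_plain(1) / idf_plain(5))        # plain IDF: a full 5x difference
print(idf_log(1) - idf_log(5))            # log IDF: difference of log(5) ≈ 1.6
print(idf_log(10_001) - idf_log(10_005))  # log IDF: nearly zero difference
```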

On-wiki, we now use a variant of TF called BM25, which “saturates”, moving quickly towards a maximum value, so that eventually additional instances of a word in a document no longer meaningfully increase the TF score. Really, 48 instances of cat in an article isn’t that much more “about” cats than 45 instances—either way, that’s a lot of cats.

Simple TF just increases linearly without limit, while BM25 increases much more slowly, and towards a maximum value.

Another complication for TF is document length, especially if—as in Wikipedia—document length can vary wildly. If a document is 10,000 words long and contains cat three times, is that document more or less “about” cat than a 500-word document that contains cat twice? Various versions of length normalizations for TF try to account for this concern (as does BM25).
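A sketch of BM25’s term-frequency component, which addresses both concerns at once; k1 and b are its usual tuning parameters, and the document lengths here are illustrative:

```python
# BM25 term-frequency component: saturating in tf, normalized by document
# length relative to the average length. k1 and b use common default values.
def bm25_tf(tf, doc_len, avg_len, k1=1.2, b=0.75):
    norm = k1 * (1 - b + b * doc_len / avg_len)  # length normalization
    return tf * (k1 + 1) / (tf + norm)           # saturates below k1 + 1

# Saturation: 48 occurrences score barely above 45.
print(bm25_tf(45, 1000, 1000), bm25_tf(48, 1000, 1000))

# Length normalization: 3 hits in a 10,000-word doc vs 2 hits in a 500-word doc.
print(bm25_tf(3, 10_000, 1000), bm25_tf(2, 500, 1000))
```

The first pair of values barely differ (saturation), while in the second pair the short document wins despite having fewer occurrences in total (length normalization).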

Math nerds can dig into more of the variants of TF–IDF on Wikipedia.

Modern problems require modern solutions

As I mentioned above, TF–IDF is quite venerable—it dates back to the 1960s! So of course modern search systems use more advanced techniques[9]—though many of these have admittedly been around (and on-wiki) for donkey’s years.

Proximity: Let us harken back once again to our sample index statistics:

D2 has both cat and hat in it, but they are located at Word #3 and Word #45, respectively. That’s probably the same paragraph, so it’s pretty good. But D3 has them at Word #27 and Word #28—right next to each other! All other things being equal, we might want to give D2 a little boost and D3 a bigger boost in scoring.
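A minimal version of such a proximity signal is the smallest distance between any cat position and any hat position, using the word positions quoted above:

```python
# Word positions for cat and hat in D2 and D3, from the example index.
positions = {
    "D2": {"cat": [3], "hat": [45]},
    "D3": {"cat": [4, 27], "hat": [28]},
}

def min_distance(doc):
    # Smallest gap between any cat position and any hat position.
    return min(abs(c - h)
               for c in positions[doc]["cat"]
               for h in positions[doc]["hat"])

print(min_distance("D2"))  # 42 — probably the same paragraph
print(min_distance("D3"))  # 1  — adjacent words, a much stronger signal
```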

Document structure: Query terms that are in the title of an article are probably better matches than those found in the body. Matches in the opening text of a Wikipedia article are probably more relevant than those buried in the middle of the article. Matches in the body are still better than those in the auxiliary text (image captions and the like).

Quotes: Quotes actually get overloaded in many search systems to mean search for these words as a phrase and search for these exact words. Obviously, if you only have one word in quotes—like "cats"—there’s not really a phrase there, so you are only matching the exact word. Searching for "the cats in the hats" is asking for those exact words in that exact order. On English Wikipedia, this gets zero results because searching for the exact words prevents stemmed matches to “the cat in the hat”.

A generally under-appreciated feature of on-wiki search is quotes-plus-tilde: putting a tilde after a phrase in quotes keeps the phrase order requirement, but still allows for stemming and other text processing. So "the cats in the hats"~ will in fact match “the cat in the hat”!

Query-independent features: Sometimes there are good hints about which documents are better than others that don’t depend on the query! For example, the popularity or quality of an article can be used to give those articles a boost in the rankings. (In commercial contexts, other query-independent features—like what’s on sale or overstocked, what you bought last time, or even paid ranking—could also be used to give results a boost.)

Multiple indexes: As mentioned before (in footnote 8 and in previous installments of this series), we actually use multiple indexes for on-wiki search. The “text” index uses all the text processing we’ve described along the way—tokenization, normalization, stop words, and stemming—while the “plain” field only undergoes tokenization and normalization. We can score results from each index independently and then combine them.

Multiple scores: However, when you try to combine all of the useful signals you may have about a document and a query—the TF and IDF of the query terms, where in the document the matches come from (title, opening text, auxiliary text), the popularity of the article, matches from the “text” index, matches from the “plain” index, etc., etc.—it can become very difficult to find the ideal numerical combination of all the weights and factors.

Tuning such a scoring formula can turn into a game of whack-a-mole. Every time you make a small improvement in one area, things get a little worse in another area. Many of the basic ideas that go into a score are clear—a title match is usually better than a match in the body; an exact match is usually better than a stemmed match; a more popular document is probably better than a less popular one. The details, however, can be murky—is a title match twice as good as a match in the body, or three times? Is a more popular document 10% better than a less popular document, or 15%? Does popularity scale linearly, or logarithmically?

To prevent any scoring formula from getting totally out of control, you generally have to commit to a single answer to these questions. A very popular article is indeed probably best if the user’s query is just one word. However, if the user’s query is several words and also an exact title match for a much less frequently visited article, then that’s probably the best result. How do you optimize for all of that at once? And will the answer be the same for English and Russian Wikipedia? Chinese? Japanese? Hebrew? (Hint: The answer is “probably not”.)

So what do we do? Enter machine learning…

Machine learning: Machine learning is too big a topic to even begin to address properly here. One of the important things to know is that machine learning lets you combine a lot of potential pieces of information in a way that selects the most useful bits. It also combines them in complex and conditional ways—for example, maybe article popularity is less important when you have more search terms.

Detail of The “Tree of Knowledge”. CirrusSearch uses decision trees to build our predictive models for ranking.

And one of the best things about machine learning is that the entire process is largely automated—if you have enough examples of the “right” answer to learn from. The automation also means that you can learn different models for different wikis, update your models regularly, and adapt to changing patterns in the way people search.

We’ve built, tested, and deployed machine learning–based models for ranking to the 19 biggest Wikipedias (those with >1% of all search traffic): Arabic, Chinese, Dutch, English, Finnish, French, German, Hebrew, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Swedish, and Vietnamese.

Further reading / Homework

If you found footnote 1 and footnote 4 really interesting, read up on the “Use–mention distinction”.

I put together a video back in January of 2018, available on Commons, that covers the Bare-Bones Basics of Full-Text Search. It starts with no prerequisites, and covers tokenization and stemming, inverted indexes, basic boolean and proximity retrieval operations, TF/IDF and the vector space model of similarity, field-level indexing, and using multiple indexes, and then touches on some of the elements of scoring. If you’ve read and enjoyed all the posts in this series, then this video is a great way to reinforce the basic ideas and expand a bit into a few new areas.

A screenshot from “Bare-Bones Basics of Full-Text Search”, showing a worked example of mapping TF–IDF values into a vector space of words for cosine-based similarity comparison—which is some fun stuff we didn’t cover in this blog series. Check it out!


[1] Formatting searches is hard. If I write that a user searched for “Of Cats and Hats,” does that mean they used quotes, which gives very different results? Also—independent of the quotes—was the comma included in the search string? Some people have suggested using slashes for /search strings/. This can give linguists, particularly phoneticians, mild fits, but it also causes a different kind of ambiguity if your search supports regular expressions, like /insource:/cat in the hat/i/. I prefer italics because people can’t normally type in italics in a search box—weird 𝑈𝑛𝑖𝑐𝑜𝑑𝑒 characters notwithstanding—but it still doesn’t solve the problem of a final period or comma!

[2] Similar to the issue of how to format queries in footnote 1, we need to be able to distinguish AND and OR the search operators from and and or the normal words.[3] Typically, we do that by capitalizing them.

[3] Speaking of normal words, and and or in common usage are not as precise as their Boolean counterparts. It’s not that hard to construct sentences where and means OR and or means AND. To wit: “I need a shirt I can wear to work or to a party.” Clearly, if someone then offered you a shirt that you could wear to a party but not to work, you would not be pleased. You are asking for a shirt that can do both (i.e., work AND party). “I have friends who play football, baseball, and soccer professionally.” You almost certainly mean you have friends who each play one of the listed sports (i.e., football OR baseball OR soccer). (Even Bo Jackson only played two of them professionally—and, chances are, you don’t know Bo.)

[4] In the old days—that is, through the 1990s—before public search engines were available on the internet and most search engine users were carefully trained lawyers, librarians, or other scholars, search operators like AND and OR were generally explicit, and queries were sometimes much more complex, with nested parentheses to make up for the lack of normalization, stemming, or thesauri. Thus, (attorney OR attorneys OR lawyer OR lawyers OR law OR laws) AND (dog OR dogs OR cat OR cats OR pet OR pets OR animal OR animals)[5] would be used, where today we expect decent results from pet lawyer—and indeed “Animal law” is in the top five results on English Wikipedia as of this writing. (See also footnote 6.)

[5] Another side effect of the lack of stemming was that nouns were easier to search for than verbs—at least in English and many other European languages—because nouns often have many fewer forms. You could easily go over your query length limit trying to fit in several dozen verb forms from a Romance language, for example. Nevertheless, this is one kind of query expansion that can optionally be performed when your index doesn’t use stemmed forms of words.

[6] The current (August 2019) situation with on-wiki searching and boolean operators is complex and my advice is to avoid using OR or explicit AND in your queries for now because they can give extremely unintuitive results—especially with keywords and namespaces—as the result of a low-level translation of the binary boolean formalism into an incompatible unary formalism. See more at Help:CirrusSearch/Logical operators on MediaWiki.

[7] The word iff is a weird one. Conjunctions are usually considered a closed class, so adding new ones is rare. In English it’s also generally hard to pronounce iff in a way that distinguishes it from regular if—though ifffffffffff might work. A lot of the strangeness of iff has been calqued into other languages by following a similar recipe—take the word for “if” and double one of the letters, preferably a consonant at the end of the word if one is available—giving the following pairs: Danish and Norwegian hvis/hviss, Finnish jos/joss, French si/ssi, Hebrew אםם/אם, Portuguese se/sse, Romanian dacă/ddacă, Spanish si/sii (bucking the consonant-based trend), and Swedish om/omm.

[8] As mentioned in the previous installment on indexing, on-wiki we alleviate this problem by having multiple indexes. The “text” index uses all the text processing we’ve described along the way—tokenization, normalization, stop words, and stemming—while the “plain” field only undergoes tokenization and normalization, making it possible to find “to be or not to be” and “The The”, especially as titles.

[9] Advances in search have been enabled in part by more sophisticated algorithms, but they have also been helped immeasurably by Moore’s Law. Moore’s Law roughly predicts that computing power doubles every two years; it may have started slowing down in the last few years, but it held true for decades. Thus 50 years of technological improvements have led to computers in the 2010s that are tens of millions of times faster than those of the 1960s. All that extra horsepower makes many more things possible!

To get a little perspective on how much Moore’s Law matters, consider the following. A program that took one minute to run in 2015 might have taken almost 64 years to run on comparable hardware from 1965. So, in 1965 it would actually have been faster to wait 30 years and then run the program on new hardware in 1995, when it would have taken about 17 hours. That’s not exactly fast, but you’d only have to let it run overnight. Ten years later in 2005 and you could probably run it while you get lunch because it would take just over half an hour. If Moore’s Law does manage to hold up through 2025, the wait will be down to a couple of seconds. Exponential growth is amazing.[10]
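The arithmetic is easy to check: with a doubling every two years, relative runtime scales as 2 to the power of (2015 − year)/2. A quick sketch:

```python
# Runtime of the footnote's 1-minute (2015) program on older/newer hardware,
# assuming Moore's Law: computing power doubles every two years.
def runtime_minutes(year, base_year=2015, base_minutes=1):
    return base_minutes * 2 ** ((base_year - year) / 2)

print(runtime_minutes(1965) / (60 * 24 * 365.25))  # ≈ 64 years
print(runtime_minutes(1995) / 60)                  # ≈ 17 hours
print(runtime_minutes(2005))                       # ≈ 32 minutes
print(runtime_minutes(2025) * 60)                  # ≈ 1.9 seconds
```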

[10] [!! MATH ALERT !!] For smaller rates of growth—i.e., well under 20% (note that Moore’s Law works out to about 41% per year, so this doesn’t apply)—you can estimate doubling time for r% growth as 72/r. It’s not accurate enough for planning the date of your retirement party, but it’s good for back-of-the-envelope calculations. 3% annual growth will double in 24 years, but in that time 6% growth will double twice—i.e., quadruple.
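The exact doubling time at r% growth is log(2)/log(1 + r/100); a quick comparison shows how close the rule of 72 comes:

```python
import math

# Rule-of-72 estimate vs the exact doubling time for small growth rates.
def doubling_time_exact(rate_percent):
    return math.log(2) / math.log(1 + rate_percent / 100)

for r in (3, 6, 9):
    print(r, round(72 / r, 1), round(doubling_time_exact(r), 1))
```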

A graph of 3% interest vs 6% interest growth over 75 years. At 24 years, 6% growth gives roughly double that of 3%. At 72 years—admittedly beyond most investment horizons—6% growth is roughly eight times as much as 3% growth—approximately 64x vs 8x.

New Dashboard feature simplifies student peer review

19:26, Thursday, 05 September 2019 UTC

Wiki Education launched the Dashboard in 2015 and it has truly been a game changer for instructors and students running Wikipedia assignments. Since that time, we’ve been steadily improving the Dashboard so that instructors can more easily facilitate their Wikipedia assignments; students can make meaningful and lasting contributions to Wikipedia; and so we here at Wiki Education can provide the most effective and efficient support possible to all of our program participants. This fall, we’ve launched a series of changes that will streamline how and where students do different portions of their Wikipedia assignment.

One of the more challenging aspects of the Wikipedia assignment is understanding Wikipedia’s “namespaces”. It can certainly be confusing to discern sandboxes from main space, and userpages from talk pages. That’s why we’ve now built in special pages for different aspects of the Wikipedia assignment to eliminate these ambiguities, all of which are incorporated into the Dashboard and student trainings.

Two examples of these changes are where students will do their Article Evaluations and their Peer Reviews (two exercises that fall in different weeks in our assignment template). As students walk through these exercises they’ll be instructed to launch special pages that we’ve populated with useful templates.

Students launch the article evaluation exercise through the Dashboard once they have a Wikipedia article to work on. It now takes them to a page where they complete the evaluation.

When students launch these pages, they’ll be presented with clearly laid out instructions on how to critically evaluate Wikipedia articles or how to review their peers’ contributions.

Students are now taken to a subpage on Wikipedia for the “evaluate an article” exercise, which will make the exercise easier for instructors to find and grade.

We’ve developed similar templates where students can house their bibliographies and begin their drafts, among other exercises.

Another new feature on the Dashboard is the My Articles section that students will see on the Home page of their Dashboard course page. Students can now more easily assign themselves articles to work on, as well as find and review their peers’ work. Students will be able to access many of these templates from the My Articles section so they’ll always know where to find their work.

The My Articles section is visible to students on the Home tab of their Dashboard course page. It helps students keep track of where their drafts are on Wikipedia and where to complete their peer review(s) for classmates.

The Wikipedia assignment is no doubt a departure for most instructors and students, and we hope that the above changes make it a bit less daunting for newcomers to the site!

For more posts about Dashboard feature updates, check out the digital infrastructure category on our blog. To learn more about the Dashboard, visit teach.wikiedu.org.

On a germ trail

04:22, Thursday, 05 September 2019 UTC

Hidden away in the little Himalayan town of Mukteshwar is a fascinating bit of science history. Cattle and livestock mattered a great deal in the pre-engine past, for transport and power, on farms and in cities, but especially to people in power. Hyder Ali and Tipu were famed and feared for their ability to move their guns rapidly, most famously by making use of bullocks of the Amrut Mahal and Hallikar breeds. The subsequent British conquerors saw their value and maintained large numbers of them, at the Commissariat farm in Hunsur for instance.

The Commissariat Farm, Hunsur
Photo by Wiele & Klein, from: The Queen's Empire. A pictorial and descriptive record. Volume 2.
Cassell and Co. London (1899). [p. 261]
The original photo caption, given below, while racy, was most definitely inaccurate:
these cattle were not maintained for beef:

It is said that the Turkish soldier will live and fight upon a handful of dates and a cup of water, the Greek upon a few olives and a pound of bread—an excellent thing for the commissariats of the two armies concerned, no doubt! But though Turk and Greek will be satisfied with this Spartan fare, the British soldier will not—not if he can help it, that is to say. Sometimes he cannot help it, and then it is only just to him to admit that he bears himself at a pinch as a soldier should, and is satisfied with what he can get. But what the British soldier wants is beef, and plenty of it : and he is a wise and provident commander who will contrive that his men shall get what they want. Here we see that the Indian Government has realised this truth. The picture represents the great Commissariat Farm at Hunsur in Mysore, where the shapely long-horned bullocks are kept for the use of the army.
Report of the cattle plague commission
led by J.H.B. Hallen (1871)

Imagine the situation when cattle die off in their millions - the estimated deaths of cows and buffaloes in 1870 alone came to one million. Around 1871, this rang alarm bells high enough for a commission to be appointed to examine the situation. Britain had had a major "cattle plague" outbreak in 1865, so the matter was not unknown to the public. The generic term for the mass deaths was "murrain", a rather old-fashioned word for an epidemic disease in sheep and cattle, derived from the French word morine, or "pestilence," with roots in Latin mori, "to die." A commission headed by Staff Veterinary Surgeon J.H.B. Hallen went across what would best be called the "cow belt" of India and noted, among other things, that the cattle in the hills were doing better and that rivers helped isolate the disease. Remarkably, there were two little-known Indian members - Mirza Mahomed Ali Jan (a deputy collector) and Hem Chunder Kerr (a magistrate and collector). The report includes six maps with spots where the outbreaks occurred in each year from 1860 to 1866, and the spatial approach to epidemiology is dominant. This is perhaps unsurprising, given that the work of John Snow would have been fresh in medical minds. One point in the report that caught my eye was "Increasing civilization, which means in India clearing of jungle, making of roads, extended agriculture, more communication with other parts, buying and selling, &c, provides greater facilities for the spread of contagious diseases of stock." The commission identified the largest number of deaths as being caused by rinderpest. Rinderpest has a very long history, and its attacks in Europe are quite well documented. There had been two veterinary congresses in Europe that looked at rinderpest. One of the early researchers was John Burdon Sanderson (a maternal grand-uncle of J.B.S. Haldane), who noted that the blood of infected cattle was capable of infecting others even before the source individual showed any symptoms of the disease.
He also examined the relationship to smallpox and cowpox through cross-vaccination and examination for resistance. C.A. Spinage, in his brilliant (though European-focused) book The Cattle Plague - A History (2003), notes that rinderpest belongs to the paramyxoviruses, a morbillivirus that probably existed in Pleistocene bovids; perhaps the first relative to jump to humans was measles, associated with the domestication of cattle. The English believed that the origin of rinderpest lay in Russia. The Russians believed it came from the Mongols.
Gods slaandehand over Nederland, door de pest-siekte onder het rund vee
[God's lashing hand over the Netherlands, due to the plague disease among cattle]
Woodcut by Jan Smits (1745) - cattle epidemics evoked theological explanations
The British government made a grant of £5,000 in 1865 for research into rinderpest, apparently the biggest investment in medical research up to that point. This was also a period of epidemic cholera, mainly affecting the working class, and it was noted that hardly any money was spent on it. (Spinage:328) The result was that a very wide variety of cures was proffered, and Spinage provides an amusing overview. One cure claim came from a Mr. M. Worms of Ceylon and involved garlic, onion, and asafoetida. Worms was somehow related to Baron Rothschild, and the cure was apparently tested on some of Rothschild's cattle, with some surprising recoveries. Inoculation, as in smallpox treatment, was tried by many, and it often resulted in infection and death of the animals.

As for the Indian scene, it appears that the British government did not do much based on the Hallen committee report. There were attempts to regulate the movement of cattle, but the idea that the disease could be prevented through inoculation or vaccination had to wait. In the 1865 outbreak in Britain, one of the control measures was the killing and destruction of infected cattle at the point of import; this finally brought an end to outbreaks in 1867. Several physicians in India tried experiments in inoculation. In India natural immunity was noted, and animals that overcame the disease were valued by their owners. In 1890 Robert Koch was called into service in the Cape region on the suggestion of Dr J. Beck. In 1897 Koch announced that bile from infected animals could induce resistance on inoculation. Koch was then sent on to India to examine the plague, leaving behind William Kolle to continue experiments in a disused mine building at Kimberley belonging to De Beers. Around the same time, experiments were conducted by Herbert Watkins-Pitchford and Arnold Theiler, who found that serum from cattle that had recovered worked as an effective inoculation. They, however, failed to publish and received little credit. That Koch, a German, beat the English researchers was a cause of hurt pride.

The Brown Institution was destroyed in 1944
by German bombing
It is interesting to see how much national pride was involved in all this. The French had established an Imperial Bacteriological Institute at Constantinople with Louis Pasteur as their leading light; it was mostly headed by Pasteur Institute alumni. Maurice Nicolle and Adil-Bey were involved in rinderpest research there, and demonstrated that the causal agent was small enough to pass through bacterial filters. In India, Alfred Lingard was chosen in 1890 to examine the problems of livestock diseases and to find solutions. Lingard had gained his research experience at the Brown Animal Sanatory Institution, whose workers included John Burdon Sanderson. About six years earlier, Robert Koch had caused more embarrassment to the British establishment by identifying the cholera-causing bacterium in Calcutta. Koch had, however, not demonstrated that his bacterial isolate could cause disease in uninfected animals - thereby failing one of the required tests for causality that now go by the name of Koch's postulates. There were several critiques by British researchers who had been working for a while on cholera in India - these included David Douglas Cunningham (who was also a keen naturalist and wrote a couple of general natural history books) and T.R. Lewis (who had spent some time with German researchers). The British government (whose bureaucrats were especially worried about quarantine measures for cholera and had a preference for old-fashioned miasma theories of disease) felt the need for a committee to examine the conflict between the English and German claims - and presumably chose someone with a knowledge of German for it: Emanuel Edward Klein, assisted by Heneage Gibbes. Klein was also from the Brown Animal Sanatory Institution and had worked with Burdon Sanderson. Now Klein, the Brown Institution, Burdon Sanderson, and many of the British physiologists had come under attack from the anti-vivisection movement.
During the court proceedings that examined the anti-vivisectionists' claims of cruelty to animals, Klein, an East European of Jewish descent with a poor command of English, made rather shocking statements that served as fodder for some science fiction written in that period, with evil characters bearing a close resemblance to him. Even Lingard had been accused of cruelty, for feeding chickens with the lungs of tuberculosis patients to examine whether the disease could be transmitted. E.H. Hankin, the man behind the Ganges bacteriophages, had also been associated with the vivisection researchers, and the British Indian press had even called him a vivisector who had escaped to India.

Lingard initially worked in Pune, but he found the climate unsatisfactory for working on anti-rinderpest sera. In 1893 he moved the laboratory to the then remote mountain town of Mukteshwar (or Muktesar, as the British records have it), where his first laboratory burnt down in a fire. In 1897 Lingard invited Koch and others to visit, and Koch's bile method was demonstrated. The institution, by then given the grand name of Imperial Bacteriological Laboratory, was rebuilt and continues to exist as a unit of the Indian Veterinary Research Institute. Lingard was able to produce rinderpest serum in this facility, producing 468,853 doses between 1900 and 1905, with the mortality of inoculated cattle as low as 0.43%. The institute grew to produce 1,388,560 doses by 1914-15. Remarkably, several countries joined hands in 1921 to attack rinderpest and other livestock diseases, and it is claimed that rinderpest is now the second virus (after smallpox) to have been eradicated. The Muktesar institution and its surroundings were also greatly modified with dense plantations of deodar and other conifers. Today this quiet little village, centred around a temple to Shiva, is visited by waves of tourists, and all along the route one can see the horrifying effects of land being converted for housing and apartments.

The Imperial Bacteriological Laboratory c. 1912 (rebuilt after the fire)
The commemorative column, as seen in 2019.
Upper corridor
A large autoclave made by Manlove & Alliott, Nottingham.
Stone marker
A cold storage room built into the hillside
Koch in 1897 at Muktesar
Seated: Lingard, Koch, Pfeiffer, Gaffky

The habitat c. 1910. One of the parasitologists, a Dr Bhalerao,
described parasites from king cobras shot in the area.

The crags behind the Mukteshwar institute, Chauli-ki-Jhali, a hole in a jutting sheet of rock (behind and not visible)
is a local tourist attraction.
Here then are portraits of three scientists who were tainted in the vivisection debate in Britain, but who were able to work in India without much trouble.
E.H. Hankin

Alfred Lingard

Emanuel Edward Klein

The cattle plague period coincides nicely with some of the largest reported numbers of Greater Adjutant storks, and perhaps also with a period when vultures prospered, feeding on the dead cattle. We have already seen that Hankin was quite interested in vultures. Cunningham notes the decline in adjutants in his Some Indian Friends and Acquaintances (1903). The anti-vivisection movement, like other minority British movements such as the vegetarian movement, found friends among many educated Indians, and we know of the participation of such people as Dr Pranjivan Mehta in it thanks to the work of the late Dr S. R. Mehrotra. There was also an anti-vaccination movement; we know it caused (and continues to cause) enough conflict in the case of humans, but there appears to be little literature on opposition to vaccinating livestock in India.

Further reading
Thanks are due to Dr Muthuchelvan and his colleague for an impromptu guided tour of IVRI, Mukteshwar.
The Imperial Bacteriologist - Alfred Lingard in this case in 1906 - was apparently made "Conservator" for the "Muktesar Reserve Forest" and the 10 members of the "Muktesar Shikar Club" were given exemption from fees to shoot carnivores on their land in 1928. See National Archives of India document.
Klein, Gibbes and D.D. Cunningham were also joined by H.V. Carter (who contributed illustrations to Gray's Anatomy - more here).

Perf Matters at Wikipedia 2015

19:51, Wednesday, 04 September 2019 UTC

Hello, WANObjectCache

This year we achieved another milestone in our multi-year effort to prepare Wikipedia for serving traffic from multiple data centres.

The MediaWiki application that powers Wikipedia relies heavily on object caching. We use Memcached as a horizontally scaled key-value store, and we’d like to keep the cache local to each data centre. This minimises dependencies between data centres, and makes better use of storage capacity (based on local needs).

Aaron Schulz devised a strategy that makes MediaWiki caching compatible with the requirements of a multi-DC architecture. Previously, when source data changed, MediaWiki would recompute and replace the cache value. Now, MediaWiki broadcasts “purge” events for cache keys. Each data centre receives these and sets a “tombstone”, a marker lasting a few seconds that limits any set-value operations for that key to a minuscule time-to-live. This makes it tolerable for recache-on-miss logic to recompute the cache value using local replica databases, even though they might have several seconds of replication lag. Heartbeats are used to detect the replication lag of the databases involved during any re-computation of a cache value. When that lag is more than a few seconds (a large portion of the tombstone period), the corresponding cache set-value operation automatically uses a low time-to-live. This means that large amounts of replication lag are tolerated.
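
The tombstone scheme can be sketched as a minimal in-memory model. This is illustrative only, not MediaWiki’s actual PHP implementation; the class name, constants, and injectable clock are all assumptions made for the sake of a runnable example:

```javascript
// Minimal sketch of a tombstone-based purge protocol, in the spirit of
// WANObjectCache. Names and constants are illustrative.
const TOMBSTONE_SEC = 10; // how long a broadcast "purge" tombstone lasts
const INTERIM_SEC = 1;    // tiny TTL forced while a tombstone is live

class TombstoneCache {
  constructor(clock) {
    this.clock = clock;          // injectable clock (seconds), for testing
    this.values = new Map();     // key -> { value, expiresAt }
    this.tombstones = new Map(); // key -> time the purge was received
  }

  // Broadcast purge: rather than recomputing centrally, each DC just
  // drops the value and marks the key with a tombstone.
  purge(key) {
    this.values.delete(key);
    this.tombstones.set(key, this.clock());
  }

  // A set during the tombstone period is capped to a minuscule TTL, so a
  // value recomputed from a lagged replica cannot stick around for long.
  set(key, value, ttlSec) {
    const purgedAt = this.tombstones.get(key);
    if (purgedAt !== undefined && this.clock() - purgedAt < TOMBSTONE_SEC) {
      ttlSec = Math.min(ttlSec, INTERIM_SEC);
    }
    this.values.set(key, { value, expiresAt: this.clock() + ttlSec });
  }

  get(key) {
    const entry = this.values.get(key);
    if (!entry || entry.expiresAt <= this.clock()) return undefined;
    return entry.value;
  }
}
```

The key property: a stale value written shortly after a purge expires within `INTERIM_SEC`, after which recache-on-miss will pick up the fresh data from a caught-up replica.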

This and other aspects of WANObjectCache’s design allow MediaWiki to trust that cached values are not substantially more stale than a local replica database, provided that cross-DC broadcasting of the tiny in-memory tombstones is not disrupted.

First paint time now under 900ms

In July we set out a goal: improve page load performance so our median first paint time would go down from approximately 1.5 seconds to under a second – and stay under it!

I identified synchronous scripts as the single biggest task blocking the browser between the start of a page navigation and the first visual change seen by Wikipedia readers. We had used async scripts before, but converting these last two scripts to be asynchronous was easier said than done.

There were several blockers to this change, including the use of embedded scripts by interactive features. These were partly migrated to CSS-only solutions. For the other features, we introduced the notion of “delayed inline scripts”. Embedded scripts now wrap their code in a closure and add it to an array. Once the module loader arrives, it processes the closures from the array and executes the code within.
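
The pattern can be sketched like this (a simplified stand-in; the actual queue name and loader hooks in MediaWiki differ):

```javascript
// Sketch of the "delayed inline script" pattern: embedded scripts queue
// closures instead of executing immediately, and the module loader
// drains the queue once it has arrived. Names are illustrative.
const deferredScripts = []; // stand-in for a page-global queue

// What each embedded <script> tag does instead of running its code:
function inlineScript(fn) {
  deferredScripts.push(fn);
}

// What the module loader does once it (and its dependencies) are ready:
// drain the queue and execute each closure in order.
function flushDeferredScripts() {
  while (deferredScripts.length) {
    const fn = deferredScripts.shift();
    fn();
  }
}
```

Because nothing executes until the loader flushes the queue, the scripts no longer block first paint, yet they still run in document order.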

Another major blocker was the subset of community-developed gadgets that didn’t yet use the module loader (introduced in 2011). These legacy scripts assumed a global scope for variables, and depended on browser behaviour specific to serially loaded, synchronous scripts. Between July and August 2015, I worked with the community to develop a migration guide. And, after a short deprecation period, the legacy loader was removed.

Line graph that plots the firstPaint metric for August 2015. The line drops from approximately one and a half seconds to 890 milliseconds.

Hello, WebPageTest

Previously, we only collected performance metrics for Wikipedia from sampled real-user page loads. This is super and helps detect trends, regressions, and other changes at large. But, to truly understand what makes a page load a certain way, we need synthetic testing as well.

Synthetic testing offers frame-by-frame video captures, waterfall graphs, performance timelines, and above-the-fold visual progression. We can run these tests automatically (e.g. every hour) for many URLs, on many different browsers and devices, and from different geographic locations. They allow us to analyse performance in depth, and to compare runs over any period of time and across different factors. They also give us snapshots of how pages were built at a certain point in time.

The results are automatically recorded into a database every hour, and we use Grafana to visualise the data.

In 2015 Peter built out the synthetic testing infrastructure for Wikimedia, from scratch. We use the open-source WebPageTest software. To read more about its operation, check Wikitech.

The journey to Thumbor begins

Gilles evaluated various thumbnailing services for MediaWiki. The open-source Thumbor software came out as the most promising candidate.

Gilles implemented support for Thumbor in the MediaWiki-Vagrant development environment.

To read more about our journey to Thumbor, read The Journey to Thumbor (part 1).

Save timing reduced by 50%

Save timing is one of the key performance metrics for Wikipedia. It measures the time from when a user presses “Publish changes” when editing until the user’s browser starts to receive a response. During this time, many things happen. MediaWiki parses the wiki-markup into HTML, which can involve page macros, sub-queries, templates, and other parser extensions. These inputs must be saved to a database. There may also be some cascading updates, such as the page’s membership in a category. And last but not least, there is the network latency between the user’s device and our data centres.

This year saw a 50% reduction in save timing. At the beginning of the year, median save timing was 2.0 seconds (quarterly report). By June, it was down to 1.6 seconds (report), and in September 2015, we reached 1.0 seconds! (report)

Line graph of the median save timing metric, over 2015. Showing a drop from two seconds to one and a half in May, and another drop in June, gradually going further down to one second.

The effort to reduce save timing was led by Aaron Schulz. The impact that followed was the result of hundreds of changes to MediaWiki core and to extensions.

Deferring tasks to post-send

Many of these changes involved deferring work to happen post-send. That is, after the server sends the HTTP response to the user and closes the main database transaction. Examples of tasks that now happen post-send are: cascading updates, emitting “recent changes” objects to the database and to pub-sub feeds, and doing automatic user rights promotions for the editing user based on their current age and total edit count.
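
A deferred-update queue of this kind can be sketched as follows. This is a simplified model under assumed names; MediaWiki’s actual DeferredUpdates machinery is considerably richer (transaction scoping, job-queue fallback, and so on):

```javascript
// Sketch of a post-send deferred update queue: expensive side effects
// are queued during the request and run only after the HTTP response
// has been flushed to the client. Names are illustrative.
const postSendUpdates = [];

// Called anywhere during request handling to defer a side effect.
function deferUpdate(fn) {
  postSendUpdates.push(fn);
}

// Called after the response has been sent and the main database
// transaction committed: drain the queue in FIFO order.
function runPostSendUpdates() {
  while (postSendUpdates.length) {
    postSendUpdates.shift()();
  }
}
```

The user-visible latency is just the pre-send work; cascading updates, feed events, and rights promotions all happen afterwards without delaying the response.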

Aaron also implemented the “async write” feature in the multi-backend object cache interface. MediaWiki uses this for storing the parser cache HTML in both Memcached (tier 1) and MySQL (tier 2). The second write now happens post-send.

By re-ordering these tasks to occur post-send, the server can send a response back to the user sooner.

Working with the database, instead of against it

A major category of changes was improvements to database queries: for example, reducing lock contention in SQL, refactoring code to reduce the amount of work done between two write queries in the same transaction, splitting large queries into smaller ones, and avoiding the use of database master connections whenever possible.

These optimisations reduced the chances of queries being stalled, and allowed them to complete more quickly.

Avoid synchronous cache re-computations

The aforementioned work on WANObjectCache also helped a lot. Whenever we converted a feature to use this interface, we reduced the amount of blocking cache computation that happened mid-request. WANObjectCache also performs probabilistic preemptive refreshes of near-expiring values, which can prevent cache stampedes.
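
A probabilistic preemptive refresh can be sketched as a simple decision function. This is a simplified, linear version of the idea (constants and the function name are illustrative, not WANObjectCache’s actual tuning):

```javascript
// Sketch of probabilistic early expiration: as a cached value nears its
// expiry, an increasing fraction of readers treat it as a miss and
// regenerate it, so an entire fleet doesn't stampede at the same moment.
const LOW_TTL_SEC = 30; // window before expiry in which refreshes may begin

function shouldPreemptivelyRefresh(ageSec, ttlSec, rand = Math.random) {
  const remaining = ttlSec - ageSec;
  if (remaining <= 0) return true;            // already expired: real miss
  if (remaining >= LOW_TTL_SEC) return false; // far from expiry: never refresh
  // Chance of refreshing rises linearly from 0 to 1 across the window.
  const chance = 1 - remaining / LOW_TTL_SEC;
  return rand() < chance;
}
```

With this in place, regeneration load is spread across the final seconds of a value’s lifetime instead of spiking at the instant it expires.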

Profiling can be expensive

We disabled the performance profiler of the AbuseFilter extension in production. AbuseFilter allows privileged users to write rules that may prevent edits based on certain heuristics. Its profiler would record how long the rules took to inspect an edit, allowing users to optimise them. The way the profiler worked, though, added a significant slowdown to the editing process. Work began later in 2016 to create a new profiler, which has since been completed.

And more

Lots of small things, including fixing the User object cache, which existed but wasn’t working, and avoiding caching values in Memcached when computing them is faster than the Memcached latency required to fetch them!

We also improved latency of file operations by switching more LBYL-style coding patterns to EAFP-style code. Rather than checking whether a file exists, is readable, and then checking when it was last modified – do only the latter and handle any errors. This is both faster and more correct (due to LBYL race conditions).

So long, Sajax!

Sajax was a library for invoking a subroutine on the server, and receiving its return value as JSON from client-side JavaScript. In March 2006, it was adopted in MediaWiki to power the autocomplete feature of the search input field.

The Sajax library had a utility for creating an XMLHttpRequest object in a cross-browser-compatible way. MediaWiki deprecated Sajax in favour of jQuery.ajax and the MediaWiki API. Yet, years later in 2015, this tiny part of Sajax remained popular in Wikimedia's ecosystem of community-developed gadgets.

The legacy library was loaded by default on all Wikipedia page views for nearly a decade. During a performance inspection this year, Ori Livneh decided it was high time to finish this migration. Goodbye Sajax!

Further reading

This year also saw the switch to encrypt all Wikimedia traffic with TLS by default. More about that on the Wikimedia blog.

— Timo Tijhof

Last week to sign up for September Wikidata courses!

16:29, Wednesday, 04 September 2019 UTC

It’s the last week to sign up for our September Wikidata how-to courses! Virtual sessions meet once a week for an hour and dive into how to use, as well as contribute to, the global data repository that is Wikidata. Over six weeks, course participants learn how open data practices best align with their professional and organizational goals. We have beginning and intermediate courses both starting the third week of September. Register below by September 9th!

Enroll with a colleague and get $200 off the $800 course price.

Join the Open Data Movement

An online course for beginners to linked data, or for those looking for a curriculum that covers data ethics, the advantages of linked data, and an overview of Wikidata policies.

  • Register here for: Tuesdays 10-11am PST, September 17 – October 22, 2019

Elevate your Collections

An online course for anyone already familiar with linked data or Wikidata, or those looking for a project-based course that explores specific Wikidata tools and approaches.

  • Register here for: Wednesdays 11am-12pm PST, September 18 – October 23, 2019

These courses are relevant for…

People working with data and collections across industries are turning to Wikidata to accomplish their organizational missions. Organizational use cases of Wikidata are varied, but a common thread is that Wikidata is one of the most direct ways to get accurate information to people who are looking for it. It’s machine readable, which means that digital assistants, AI, bots, and scripts all pull from Wikidata to answer users’ questions.

“Adding and editing content in Wikidata can raise the prominence of factual knowledge and improve the visibility of marginalized groups and knowledge.”

Library organizations have taken note. The ARL, IFLA, and the PCC have all identified Wikidata as being of strategic importance to librarians and the future of library collections.

Museums are excited about Wikidata too. The Met, the Cleveland Museum of Art, and MOMA have embarked upon Wikidata projects that have generated excitement.

Consider how opening your data through Wikidata will elevate access to your collections. Whether it’s gallery data, archival data, civic data or another collection, revamping your institution’s metadata practices through the use of Wikidata will help support the open data movement. We’ll help you or your staff identify what projects are best suited to your goals.

For more information about our Wikidata course offerings, visit data.wikiedu.org.

Wiki Loves Monuments 2019 Is Now Open!

23:14, Tuesday, 03 September 2019 UTC

Wikimedia volunteers around the world started the 10th edition of Wiki Loves Monuments (WLM) on September 1st, 2019, a remarkable achievement for a competition that has garnered the status of “the world’s largest open photography contest” in Guinness World Records. We are excited and look forward to running another successful round of adding monument(al) photos from all over the world to Wikipedia. Photos submitted through WLM illustrate the more than 1.4 million monuments on Wikipedia and help more people around the world learn about the history and national heritage of all participating countries.

WLM is for everyone! If you have ever wondered how to start giving back to the wealth of knowledge on Wikipedia that all of us use on a daily basis, this is a great way to start. Everybody can join the competition by submitting a photograph of a nationally registered monument to Wikimedia Commons before September 30, following the instructions for each country. You can participate in as many national competitions as you wish. Winning entries in WLM normally enjoy exposure in national and international headlines.

Wiki Loves Monuments is an annual photo competition celebrating built cultural heritage. It is organized by volunteers around the world, and up to ten top photographs from each country are selected for an international finale. The winners from 2018 included a mosque in Iran, the ruins of an ancient city in Jordan, a cathedral in the United Kingdom, an abandoned factory in Poland and other photos from Bangladesh, Russia, Romania, Nepal, Denmark and elsewhere. These may be everyday sights for some people, but thanks to Wiki Loves Monuments photographers, they will be better documented on Wikipedia and more accessible to everyone around the world, free of cost, forever.

Over the years, WLM has not only collected millions of photos for Wikipedia; it has also formed communities, strengthened local Wikipedia-related activities and brought photography and Wikipedia enthusiasts together. WLM photos are not just random photos: they come with personal stories, some of which, from the 2018 winners, you can read here. They help preserve monuments and occasionally bring awareness to ones that are endangered.

Wiki Loves Monuments is built on three simple criteria. First, all photos are freely licensed, like all other contributions to Wikipedia and Wikimedia Commons. Giving the public permission to share these photos ensures that the results remain widely available forever. Second, all photos must contain an identified monument, e.g. a building or artwork of historic significance; we want to know what heritage is in the photo, so that we can actually use it. Each country maintains a list of registered historic sites that are eligible for the competition. Third, the photo must be uploaded during the month of September. You are always welcome to contribute your photography to Wikimedia Commons, but photos uploaded before or after September may not be considered for the competition. For more details on Wiki Loves Monuments in your country, visit wikilovesmonuments.org/participate.

Whether you are going on a trip to visit someplace new, sharing a wonderful photo from a holiday trip many years ago, or taking a quick picture of a landmark where you live, we’re excited to see your photographs.

Good luck!

Production Excellence: July 2019

15:54, Tuesday, 03 September 2019 UTC

How’re we doing on that strive for operational excellence? Read this first anniversary edition to find out!

📊 Month in numbers
  • 5 documented incidents. [1]
  • 53 new Wikimedia-prod-error reports. [2]
  • 44 closed Wikimedia-prod-error reports. [3]
  • 218 currently open Wikimedia-prod-error reports in total. [4]

The number of recorded incidents over the past month, at five, is equal to the median number of incidents per month (2016-2019). – Explore this data.

To read more about these incidents, their investigations, and pending actionables; check Incident documentation § 2019.

*️⃣ One year of Excellent adventures!

Exactly one year ago this periodical started to provide regular insights on production stability. The idea was to shorten the feedback cycle between deployment of code that leads to fatal errors and the discovery of those errors. This allows more people to find reports earlier, which (hopefully) prevents them from sneaking into a growing pile of “normal” errors.

576 reports were created between 15 July 2018 and 31 July 2019 (tagged Wikimedia-prod-error).
425 reports got closed over that same time period.

Read the first issue in story format, or the initial e-mail.

📉 Outstanding reports

Take a look at the workboard and look for tasks that might need your help. The workboard lists error reports, grouped by the month in which they were first observed.


Or help someone who already started with their patch:
Open prod-error tasks with a Patch-For-Review

Breakdown of recent months (past two weeks not included):

  • November: 1 report left (unchanged). ⚠️
  • December: 3 reports left (unchanged). ⚠️
  • January: 1 report left (unchanged). ⚠️
  • February: 2 reports left (unchanged). ⚠️
  • March: 4 reports left (unchanged). ⚠️
  • April: 10 of 14 reports left (unchanged). ⚠️
  • May: 2 reports got fixed! (4 of 10 reports left). ❇️
  • June: 2 reports got fixed! (9 of 11 reports left). ❇️
  • July: 18 new reports from last month remain unsolved.

🎉 Thanks!

Thank you to @aaron, @Anomie, @ArielGlenn, @Catrope, @cscott, @Daimona, @dbarratt, @dcausse, @EBernhardson, @Jdforrester-WMF, @jeena, @MarcoAurelio, @SBisson, @Tchanders, @Tgr, @tstarling, @Urbanecm; and everyone else who helped by finding, investigating, or resolving error reports in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

Quote: 🎙 “Unlike money, hope is for all: for the rich as well as for the poor.”


[1] Incidents. – wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident…

[2] Tasks created. – phabricator.wikimedia.org/maniphest/query…

[3] Tasks closed. – phabricator.wikimedia.org/maniphest/query…

[4] Open tasks. – phabricator.wikimedia.org/maniphest/query…

Not content with liveblogging the ALTC keynotes, gasta sessions and AGM, I’m also going to be taking part in two presentations and one panel.  Yikes!  So if you’re interested in learning why Wikimedia belongs in education, how to develop an academic blogging service based on trust and openness, and supporting creative engagement through open education, why not come along and join us 🙂

Wikipedia belongs in education: Principles and Practice

Wikipedia belongs in educationTuesday Sep 3 2019, 2:45pm – 3:45pm, Room 2.14
Lucy Crompton-Reid, Ewan McAndrew, and Lorna Campbell

This panel session, featuring short presentations and audience Q&A, will outline the thinking and research that underpins Wikimedia UK’s education programme, present some of the work that’s been delivered as part of this programme over the past few years, and discuss opportunities for future educational partnerships. We’ll also highlight the ways that you can get involved in this work at an individual and/or institutional level, and the benefits of working with Wikimedia in education.

Read more.

Supporting Creative Engagement and Open Education at the University of Edinburgh 

Thursday Sep 5 2019, 12:15pm – 1:15pm, McEwan Hall
Lorna Campbell, Stephanie (Charlie) Farley, and Stewart Cromar

This joint presentation will introduce the University of Edinburgh’s vision and strategy for OER and playful engagement, showcase examples of some of the playful approaches we employ, demonstrate how these help to foster creative approaches to teaching, learning and engaging with our collections, and reflect critically on researching their effectiveness.  Come along and see real world examples of how supporting openness and playful engagement at the institutional level can foster creativity and innovation, and gain inspiration about how these approaches could be used in your own contexts and institution. You’ll also be able to pick up one of our free “We have great stuff” OER colouring books! 

Read more

Influential voices – developing a blogging service based on trust and openness 

Thursday Sep 5 2019, 2:00pm – 3:00pm, Room 2.14
Karen Howie and Lorna Campbell

This presentation will reflect on the first year of the University of Edinburgh’s new Academic Blogging Service.  We worked closely with academic colleagues to take a broad view of the different uses of blogs, including reflective blogging, writing for public audiences, group blogging and showcasing research, in order to develop a new academic blogging service that launched in October 2018. The service incorporates existing tools (inc. those built into our VLE and portfolio platforms), improved documentation, new digital skills workshops and materials, and a brand new centrally supported WordPress platform (blogs.ed.ac.uk) to support types of blogging that were not well catered for previously. The philosophy of our new blogging platform was to start from a position of openness and trust, allowing staff and students to develop their own voices.  Come along to learn more about our Academic Blogging Service and find out about the free and open resources we developed along the way.

Learn more. 

Look forward to seeing you at ALTC! 

Tech News issue #36, 2019 (September 2, 2019)

00:00, Monday, 02 September 2019 UTC
2019, week 36 (Monday 02 September 2019)

weeklyOSM 475

17:28, Saturday, 31 August 2019 UTC


lead picture

OSM-UK’s current quarterly project: mapping solar panels 1 | data © OpenStreetMap contributors


  • Vespucci now allows the creation of custom presets on-the-fly. Simon Poole tweets a short demonstration showing how it can be done.
  • Thejesh GN put out a call for volunteers to map CCTV cameras in West and South Bangalore.
  • Joseph Eisenberg is asking for comments on the campsite properties proposal. The proposal aims to allow adding more information about camping areas.
  • SK53 provided an update on OSM-UK’s current quarterly project: mapping solar panels. As of 23rd August around 67,500 solar installations had been mapped. It is expected that over 10% of rooftop installations and substantially more than half of solar farms will be in OSM at the end of the quarter.
  • Apple is going to start paid/directed editing in Malaysia, and is looking for community feedback.
  • EmBH suggested (automatic translation) that isced:level could be automatically added to schools in Germany. They suggested a Python script could add levels, based on the names of the features. A discussion of the merits and the foreseen difficulties of the proposal followed.
  • AkuAnakTimur, a local active mapper in Malaysia, has collected GPS traces and is attempting to help GlobalLogic/Grab paid mappers with their common mistakes of tagging and geometry.
  • Lübeck suggests (de) to look for hydrogen filling stations in your local area as he noticed missing stations when compared with h2.live.


OpenStreetMap Foundation

  • OpenStreetMap Ireland, OpenStreetMap Česká, and OSGeo Oceania have applied to become local chapters of the OpenStreetMap Foundation.
  • The OSMF Board released the minutes of their face-to-face meeting, which was held on 18 and 19 May in Brussels.


Humanitarian OSM

  • HOT has been experimenting with how machine learning could help them provide a better mapping experience and produce more accurate and effective maps for people responding to crises. HOT is hoping that, by using AI to estimate the size and difficulty of a mapping task, they can split the workload into quadrants that are of equal effort to complete.


  • citylab.com features an article about the Thomas Guide, a street map of Los Angeles with 3000 spiral-bound pages that drivers relied on to navigate before the era of electronic aids.
  • After a nostalgic article by Lora Bliss, Ilya Zverev remembered (ru) (automatic translation) his box of printed maps and wondered why we don’t spend as much time map-gazing nowadays. He outlined issues with mobile guidance and popular web maps, and found that, besides satellite imagery, OpenStreetMap is the only map suited for thorough in-depth exploring.


  • The German Environment Agency (Umweltbundesamt) has produced an online service that shows the air quality across Germany, displayed on an OpenStreetMap base map. The same data can be tracked, in real-time, in a new smartphone app (automatic translation).

Open Data

  • Guido Gehrt interviewed (automatic translation) Professor Paul Becker, the recently appointed president of the German Federal Agency for Cartography and Geodesy. The interview covers why the Professor chose to take up the role and what his goals for the new position are. Of particular interest were his comments on how the agency should make more use of OpenStreetMap data, given that it is produced quickly and in large quantities by highly motivated mappers. The Professor’s view is that OpenStreetMap is a data treasure that cannot remain untapped.
  • With the Geological Data Act, the German Federal Ministry of Economics and Energy has a legislative project (automatic translation) in the pipeline under which geological data will have to be made available to the public by the state geological offices on a far larger scale than hitherto. The law says nothing about licences and costs for citizens.


  • Christoph Hanser announced that the Trufi App is now open source. The app, developed by the Trufi Association, provides public transport information in Cochabamba, Bolivia. Please spread the word so that cities in emerging countries can make use of this public transport app.


  • gravitystorm opened a pull request on GitHub, as he’d like OSM to support multiple API versions, so that API 0.7 can be deployed while API 0.6 remains in use. API 0.7 has been discussed since 2009, and the wishlist for a new version is very long.
  • Is OpenStreetCam closed source now? Apparently not; summer holidays and a small team have made it hard to keep the GitHub repository up to date.
  • The University of Washington and yes! magazine report on a new OSM-based map and routing app for pedestrians in Seattle. It is intended to help walkers avoid hills, construction, and barriers to accessibility. The team is also working on creating a set of standards and toolkits that can improve the mapping of detailed, real-world conditions on pedestrian pathways and intersections. Examples include difficult sidewalk widths, problematic surfaces, or the existence of ramps, handrails and lighting.


  • JOSM has reached version 15322, i.e. release 19.08. The most obvious change is the new logo. The new version allows the addition of changeset tags via remote control, improves the display with new MapCSS and Mappaint functions and, as usual, includes many more enhancements.
  • Version 2.15.5 of iD has been released. A list of the improvements can be found in the GitHub comments.
  • beaconeer publishing has released a major update of its free POI finder iOS app AnyFinder. The app now caches collections of POIs locally, which greatly speeds up the user experience when selecting the kind of POI to search for. OSM tags are now updated dynamically, so the company can add support for new tags without having to release a new version of the app. AnyFinder 2.0 currently finds 88 different kinds of POIs and supports 63 OSM tags and 198 dedicated tag values.
  • Do you want to know which software is available for OSM? Here is the page to go to, with the current versions.

Did you know …

  • … how to tag the times a restaurant serves food? If they differ from the regular opening_hours, they can be tagged with opening_hours:kitchen.
  • … that there is an atlas of German road accidents? Not all states report the location of road accidents, hence the seemingly perfect driving records of, for example, Düsseldorfers.
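As a minimal sketch of the kitchen-hours tagging mentioned above (the times here are purely illustrative), a restaurant that is open all day but only serves food at lunch and dinner could carry both tags:

```
opening_hours=Mo-Su 10:00-24:00
opening_hours:kitchen=Mo-Su 12:00-14:30,18:00-21:30
```

Both values use the standard OSM opening_hours syntax, so data consumers that already parse opening_hours can reuse the same parser for the kitchen hours.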

OSM in the media

  • Andreas Baum and the rest of the team at Berlin’s Tagesspiegel have taken a deep dive (automatic translation) into the recently released accident data for Berlin. Along with extensive statistical analysis, you can view every single accident site on an OpenStreetMap base map.

Other “geo” things

  • The United States is a large and varied country. Inzitarie has attempted the impossible: to produce a map of the most distinctive “cultural regions” of the USA.
  • The Austrian National Library is using (de) crowdsourcing to categorise, tag and geolocate 5,000 digitized historical aerial photographs from the 1930s.
  • Do you know the expression “All roads lead to Rome!”? The website Move_Lab has developed the project Roads To Rome, which proves that this is true (we reported in 282, 284, 285, 415). All roads to Rome, and all other routes calculated during the course of this project, were navigated by GraphHopper. The routing was based on OpenStreetMap data.

Upcoming Events

Where What When Country
Suva Missing Maps Fiji 2019-08-27-2019-08-30 Fiji
Dortmund Mappertreffen 2019-08-30 Germany
Galway Galway mapping party 2019-08-31 Ireland
La Balme-de-Sillingy Incontro comunità piemontese 2019-08-31 Italy
Maebashi オープンストリートマップセミナーとマッピングパーティ 2019-09-01 Japan
London Missing Maps Mapathon London 2019-09-03 United Kingdom
Stuttgart Stuttgarter Stammtisch 2019-09-04 Germany
Bochum Mappertreffen 2019-09-05 Germany
Dresden Stammtisch Dresden 2019-09-05 Germany
Montrouge Rencontre des contributeurs de Montrouge et alentours 2019-09-05 France
Nantes Réunion mensuelle 2019-09-05 France
Minneapolis State of the Map U.S. 2019 [1] 2019-09-06-2019-09-08 United States
Salt Lake City SLC GeoBeers 2019-09-10 United States
Hamburg Hamburger Mappertreffen 2019-09-10 Germany
Leoben Stammtisch Obersteiermark 2019-09-12 Austria
Munich Münchner Stammtisch 2019-09-12 Germany
Berlin 135. Berlin-Brandenburg Stammtisch 2019-09-12 Germany
San José Civic Hack Night & Map Night 2019-09-12 United States
Bratislava Missing Maps mapathon Bratislava #7 2019-09-16 Slovakia
Habay Rencontre des contributeurs du Pays d’Arlon 2019-09-16 Belgium
Cologne Bonn Airport Bonner Stammtisch 2019-09-17 Germany
Lüneburg Lüneburger Mappertreffen 2019-09-17 Germany
Reading Reading Missing Maps Mapathon 2019-09-17 United Kingdom
Edinburgh FOSS4GUK 2019 2019-09-18-2019-09-21 United Kingdom
Heidelberg Erasmus+ EuYoutH OSM Meeting 2019-09-18-2019-09-23 Germany
Heidelberg HOT Summit 2019 2019-09-19-2019-09-20 Germany
Heidelberg State of the Map 2019 [2] 2019-09-21-2019-09-23 Germany
Prizren State of the Map Southeast Europe 2019-10-25-2019-10-27 Kosovo
Dhaka State of the Map Asia 2019 2019-11-01-2019-11-02 Bangladesh
Wellington FOSS4G SotM Oceania 2019 2019-11-12-2019-11-15 New Zealand
Grand-Bassam State of the Map Africa 2019 2019-11-22-2019-11-24 Ivory Coast

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Nakaner, Rogehm, SK53, SunCobalt, TheSwavu, YoViajo, derFred, keithonearth.

Value ranking of Wikimedia contributions

09:44, Saturday, 31 August 2019 UTC

Consider all of the people who engage in activities on the Wikimedia platforms. The Wikimedia community values some activities more than others. I want to share my own view of what I think are the community’s assessments of the most valuable engagement.

The most valuable contribution, according to community esteem, is sharing mainspace general encyclopedic content, ideally in a highly popular article which has existed for a long time but remained poor quality. The edit is more impressive when a large number of other editors have been trying to edit that article but failing to find consensus. The contribution should cite the most authoritative reliable source, and is more useful in a situation where that source was not previously identified in the article. Perhaps the edit to this one article solves a challenge across an entire field of articles: for example, lots of other articles may have attempted to describe some specific issue, but now that there is a summary of the general concept in a higher-level article, all of those other articles can forgo explaining the issue in favor of linking to the centralized explanation.

I personally am attracted to a variation on this theme: topic discovery of some topic in high demand for which there is currently no Wikipedia article. This situation occurs when people are discussing a concept which the media and sources either call by various names or do not name at all. Some articles which I created for odd topics which were popular but had no names include Facebook–Cambridge Analytica data scandal, 2G spectrum case, Antibiotic use in livestock, Health information on Wikipedia, and Pollution of the Hudson River. What all these topics have in common is that they all needed the Wikipedia editorial process to give a name to the concept. Topic discovery is one of the forms of original research which is appropriate to publish first in Wikipedia. I get impressed when I see popular articles with these kinds of arbitrary titles which someone had to create. In the case of the 2G spectrum case and the Facebook scandal, before I created those articles, the media discourse was directing much of the blame to the victims rather than the perpetrators. Any journalist feels pride at rerouting popular discourse, and many Wikipedia editors have a shared cultural experience of both being hobbyists and also pulling the levers of power which enable them to direct some conversation. Sharing stories like these is part of wiki community culture.

Among people who follow Wikimedia administration, the most commonly checked value rank is whether a user has ever been involved in wiki negativity. From this perspective, brilliant feats are much less valuable except on a foundation of exclusively positive social interactions with others. To get special user rights in Wikimedia projects, like administrator rights, two things get checked: has the person had diverse experience in the various basic processes, and in the course of experimenting with those functions, was the person conflict-prone? A person who can travel around being a slight net positive gets more community respect than someone who makes exceptionally good contributions but who also has negative exchanges. Somehow, for everyone, it is easier to get into conflict online or in Wikipedia than it is away from devices. I try to be forgiving to anyone who makes a commitment to being positive and avoiding negativity. I can only pray and hope that I retain and grow whatever ability I have to stay positive myself. There is always a way to collaborate and discuss with positivity so that everyone involved enjoys the exchange, feels enthusiastic, and gets respect.

Once a user establishes positivity in administration, actual wiki-specific activities come into view. People who make popular proposals, such as for policy changes or new processes, make themselves prominent, because to enact a proposal many people must read it, see the proposer’s name, and then sign onto it.

Similarly, after there is a debate, someone has to declare consensus. At any given time in English Wikipedia there are at least 5 raging debates where 100 people are arguing some point, typically with about half the participants on each side. People get enthusiastic about Wikipedia debates because they establish a discourse, so once someone makes a point, that point can become part of the corpus of discussion without needing to be made again. Because of this, arguing in Wikipedia necessarily advances toward some end. Social and technological infrastructure continuously improve the efficiency of arguments. When the time comes to end one, typically in 2-4 weeks, often one person does the close and drafts the result. Everyone who participates in that discussion watches that person doing the close, so that person is keen to issue a result which will earn them respect from both sides of the argument.

Some under-respected Wikimedia contributions are those which are quieter and do not attract attention. Many volunteer software developers use extensive training and experience to introduce new tools to Wikimedia projects, and for whatever social reasons, software developers tend to socialize in a way that passes beneath the attention of non-software developers. Tools routinely appear which might be entirely volunteer generated and save the Wikimedia community thousands or tens of thousands of labor hours, and yet the intervention leaves no media trace and earns no credit. The situation can be as if fire has fallen from the sky to primitive man, and suddenly society completely changes, but no one remarks on the instantaneous universal adoption of new technology and the radical behavior changes.

A huge number of people contribute Wikimedia content, whether Wikipedia prose, Commons media, Wikidata data, or whatever, and they decline to socialize in the forums. Of course this is the primary function of the Wikimedia platform, to enable people to share general reference media, but when people do this outside the context of socializing they might not get recognition. Some contributors may not even notice when other people thank them or seek them out to discuss their contributions. The people who make their contributions visible are necessarily the people who want to talk with others in Wikimedia projects, so this social class gets more esteem than the non-social class.
