Help my CI job fails with exit status -11

17:05, Thursday, 21 March 2019 UTC

For a few weeks, a CI job had PHPUnit tests abruptly ending with:

returned non-zero exit status -11

The connoisseur [ 1 ] would have recognized that the negative exit status indicates the process exited due to a signal. On Linux, 11 is the value of the SIGSEGV signal, which the kernel usually sends to a process that attempted an invalid memory access. The default behavior is to terminate the process (man 7 signal) and to generate a core dump file (I will come to that later).

But why? Some PHP code ended up triggering a code path in HHVM that would eventually try to read outside of its memory range, or some similar low level fault. The kernel knows that the process completely misbehaved and thus, well, terminates it. Problem solved, you never want your program to misbehave when the kernel is in charge.

The job had recently been switched to use a new container in order to benefit from more recent libraries and to match the OS distribution used by the Wikimedia production systems. My immediate recommendation was to roll back to the previous known-good state, but eventually I let the task sit and got absorbed by other work (such as updating MediaWiki on the infrastructure).

Last week, the job suddenly began to fail constantly. We prevent code from being merged when a test fails, so the code stays in a quarantine zone (Gerrit) and cannot be shipped. A whole team (the Language team) could not ship code for one of their flagship projects (ContentTranslation). That in turn prevents end users from benefiting from new features they are eager for. The issue had to be acted on and became an unbreak now! kind of task. And so my journey began.

returned non-zero exit status -11: that is a good enough error message to start from. A process in a Docker container is really just an isolated process, still managed by the host kernel. The first thing I did was to look at the kernel syslog facility on our instances, which yielded:

kernel: [7943146.540511] php[14610]:
  segfault at 7f1b16ffad13 ip 00007f1b64787c5e sp 00007f1b53d19d30
     error 4 in libpthread-2.24.so[7f1b64780000+18000]

php there is just HHVM invoked via a php symbolic link. The message hints at libpthread, which is where the fault occurred. But we need a stack trace to better pinpoint the problem, and ideally a reproduction case.

Thus, what I am really looking for is the core dump file I alluded to earlier. The file is generated by the kernel and contains an image of the process memory at the time of the failure. Given the full copy of the program instructions, the instructions it was running at that time, and all the memory segments, a debugger can reconstruct a human readable state of the failure. That is a backtrace, and is what we rely on to find faulty code and fix bugs.

The core file was not generated, or else the error message would have stated it had coredumped, i.e. that the kernel generated the core dump file. Our default configuration is to not generate any core file, but usually one can adjust it from the shell with ulimit -c XXX, where XXX is the maximum size a core file may occupy (in kilobytes, in order to prevent filling the disk). Docker being just a fancy way to start a process, it has a setting to adjust the limit. The docker run inline help states:

--ulimit ulimit Ulimit options (default [])

That is about as helpful as it gets; the option to set turns out to be --ulimit core=2147483648, i.e. up to 2 gigabytes. I updated the CI jobs and instructed them to capture a file named core, the default file name. After a few runs, although I could confirm failures, no files got captured. Why not?

Our machines do not use core as the default filename, though. The actual pattern can be found in the kernel configuration:

/proc/sys/kernel/core_pattern:
/var/tmp/core/core.%h.%e.%p.%t
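
On such a host the pattern can be inspected (and, if needed, adjusted) like this; the %-tokens are documented in man 5 core and expand to the hostname, the executable name, the PID and an epoch timestamp respectively:

cat /proc/sys/kernel/core_pattern
sysctl kernel.core_pattern

The core file captured later in this investigation, core.606eb29eab46.php.2353.1552570410, follows exactly that pattern, with the container hostname (its ID) filling the %h slot.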

I thus went on the hosts looking for such files. There were none.

Or maybe I mean None or NaN.

Nada, rien.

The void.

The next step was obvious: try to reproduce it! I ran a Docker container doing a basic while loop, and from the host I sent the SIGSEGV signal to the process. The host still had no core file. But, surprise, it was in the container. Although the kernel handles the fault from the host, it resolves the core_pattern path inside the container's filesystem, not the host's. My quest would soon end: I simply mounted a host directory into the containers at the expected place:

mkdir /tmp/coredumps
docker run --volume /tmp/coredumps:/var/tmp/core ....
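
For the record, that reproduction experiment, combined with the mount, amounts to roughly the following (the image name is illustrative and the host is assumed to use the core_pattern shown earlier):

# start a throwaway container that just loops, with the core limit raised
# and the host directory mounted where core_pattern points
docker run --rm --detach --name segv-test \
    --ulimit core=2147483648 \
    --volume /tmp/coredumps:/var/tmp/core \
    debian:stretch sh -c 'while true; do sleep 1; done'

# from the host, find the PID of the container's main process and send it
# SIGSEGV; the core file appears under /var/tmp/core in the container,
# i.e. /tmp/coredumps on the host
kill -SEGV "$(docker inspect --format '{{.State.Pid}}' segv-test)"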

After a few builds, I had harvested enough core files. The investigation is then very straightforward:

$ gdb /usr/bin/hhvm /coredump/core.606eb29eab46.php.2353.1552570410
Core was generated by `php tests/phpunit/phpunit.php --debug-tests --testsuite extensions --exclude-gr'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f557214ac5e in __pthread_create_2_1 (newthread=newthread@entry=0x7f55614b9e18, attr=attr@entry=0x7f5552aa62f8, 
    start_routine=start_routine@entry=0x7f556f461c20 <timer_sigev_thread>, arg=<optimized out>) at pthread_create.c:813
813    pthread_create.c: No such file or directory.
[Current thread is 1 (Thread 0x7f55614be3c0 (LWP 2354))]

(gdb) bt
#0  0x00007f557214ac5e in __pthread_create_2_1 (newthread=newthread@entry=0x7f55614b9e18, attr=attr@entry=0x7f5552aa62f8, 
    start_routine=start_routine@entry=0x7f556f461c20 <timer_sigev_thread>, arg=<optimized out>) at pthread_create.c:813
#1  0x00007f556f461bb2 in timer_helper_thread (arg=<optimized out>) at ../sysdeps/unix/sysv/linux/timer_routines.c:120
#2  0x00007f557214a494 in start_thread (arg=0x7f55614be3c0) at pthread_create.c:456
#3  0x00007f556aeebacf in __libc_ifunc_impl_list (name=<optimized out>, array=0x7f55614be3c0, max=<optimized out>)
    at ../sysdeps/x86_64/multiarch/ifunc-impl-list.c:387
#4  0x0000000000000000 in ?? ()

As @Anomie kindly pointed out, this is an issue that was fixed in libc6. Once the container was rebuilt to apply the package update, the fault disappeared.

One can now expect new changes to appear in ContentTranslation.


[ 1 ] connoisseur comes from an obsolete French verb meaning "to know": https://en.wiktionary.org/wiki/connoisseur . I guess the English language forgot to apply updates in due time and cannot make any such change now for fear of breaking backward compatibility or locution habits.

The task has all the technical details and log leading to solving the issue: T216689: Merge blocker: quibble-vendor-mysql-hhvm-docker in gate fails for most merges (exit status -11)

(Some light copyedits to above -- Brennen Bearnes)

Fulfilling your potential

18:38, Wednesday, 20 March 2019 UTC

In the decade since Bob Cummings asked Are We Ready to Use Wikipedia to Teach Writing?, the answer for hundreds of instructors has been a resounding “yes!” It’s easy to make a convincing case for using a Wikipedia assignment in the classroom. Writing a Wikipedia article teaches students valuable skills while offering an authentic experience. But a combination of theory, individual student experiences, and studies that focus on single classes can only go so far to establish the benefits of the assignment. To draw broader conclusions about the benefits of a Wikipedia writing assignment, you need to look across classes, across instructors, and look at what a whole range of students experienced.

In the Fall of 2016, Wiki Education partnered with Zachary McDowell to remedy this deficiency in the study of the benefits of Wikipedia assignments for students. From a pool of 6,000 college students participating in Wiki Education supported classes, a total of 1,627 students and 97 instructors completed surveys, and a subset of them took part in 13 follow-up focus groups. Preliminary results were published in June 2017 and were discussed in a previous blog post. Now Matthew Vetter, Zachary McDowell, and Mahala Stewart have published an article on this study in the journal Computers and Composition: "From Opportunities to Outcomes: The Wikipedia-Based Writing Assignment."

Overall, students and instructors found the experience to be positive; none of them found the Wikipedia assignment to be less valuable at teaching the skills they were surveyed about. Most found the assignment to be more valuable, especially in areas like developing digital literacy, learning about the reliability of online sources and learning to write clearly. There were differences between groups: medical students found the assignment to be better at developing critical thinking skills than did social science students, while both medical students and students in introductory writing classes found the assignment to be more valuable in terms of teaching peer review skills than did students in the arts and humanities. While most of these experiences were common across categories like gender, social class and race, there were some differences. Women were less likely to edit in areas of Wikipedia outside of the assignment, while Asian/Pacific Islander students were more likely than white students to find the assignment helpful for learning how to write a literature review.

Research like this helps to establish the value of a Wikipedia-based writing assignment as a tool for student learning. Theory, models, anecdotes and case studies are all valuable tools for establishing what should work, but in the end, models are only meaningful when they survive confrontation with data. It’s nice to see that ours does.


Interested in teaching with Wikipedia? Visit teach.wikiedu.org for all you need to get started.

Volunteer editor communities in four language Wikipedias—German, Czech, Danish, and Slovak—have decided to black out the sites on 21 March in opposition to the current version of the proposed EU Copyright Directive.

Those language editions of Wikipedia will redirect all visitors to a banner about the directive, blocking access to content on Wikipedia for 24 hours. A final vote on the directive is expected on 26 March.

These independent language communities decided to black out in the same way most decisions are made on Wikipedia—through discussion and consensus, something summarized in a statement from the German Wikipedia volunteer community: “Each of these independent Wikipedia communities has been engaging in public online discussions as to their course of action, and voting on whether and how to protest. They have done this according to their own rules of governance.”

• • •

This is not the first time volunteer editors have decided to take a stand on policy issues that may impact Wikipedia and the broader free and open internet.* However, it is something that happens rarely and with clear intention, and for good reason: Wikipedia’s volunteer communities recognize that the role any one Wikipedia plays in the world, and its authority as a collection of knowledge, rely upon the neutrality of its content. And as we said in another community-led civic engagement seven years ago, “Wikipedia’s articles are neutral, [but] its existence is not.”

The Wikimedia Foundation supports decisions from our community about how they choose to include the Wikimedia projects in policy issues that directly impact our mission and values. We also recognize that our community is often in the best position to understand the local policy context and make decisions about how the site gets involved.  The Wikimedia Foundation’s role is to respect each community’s decision and work to ensure that their action is supported.

This community governance is a part of what makes Wikipedia and the Wikimedia projects so robust and so unique. These sites are not just about millions of people compiling information, but about providing knowledge equity: ensuring that people everywhere have a say in contributing knowledge and how that knowledge is used.

That’s why the Wikimedia Foundation’s role in these blackouts is supportive, focused on ensuring that each community’s decisions are heard, respected, and not disrupted.

Sherwin Siy, Senior Public Policy Manager
Jan Gerlach, Senior Public Policy Manager
Wikimedia Foundation

*In 2011 and 2012, the Italian, English, and Russian Wikipedias went dark in opposition to proposed laws in Italy, the United States, and Russia (respectively).

• • •

The Wikimedia Foundation’s position on the EU Copyright Directive, separate from the opinions of the communities who edit Wikimedia sites, is clear: it would be a net loss for free knowledge in the world. To learn more and find out how to get involved, visit fixcopyright.wikimedia.org.

Our paper “A Walk on the Child Side: Investigating Parents’ and Children’s Experience and Perspective on Mobile Technology for Outdoor Child Independent Mobility” was accepted at CHI 2019 and also got an Honourable Mention. Wow!

You can read the paper and download the pdf at the paper page. Enjoy!

Inspiring students, pioneering women and virtual dragons

10:48, Wednesday, 20 March 2019 UTC

February and March are always busy months for Open Education and this year was no exception, with the University’s Festival of Creative Learning, Open Education Week and International Women’s Day all coming back to back.

Niko is unimpressed…, CC BY, Lorna M. Campbell

The fun and games kicked off with the Festival of Creative Learning in mid-February.  My OER Service colleague Charlie ran a really fun and thought-provoking 23 Things for Digital Confidence workshop.  The workshop challenged us to explore how we engage with technology in creative ways, and we also got to play with some really cool augmented reality toys.  Oh, and there were dragons!  I took them home but I don’t think my cat was very impressed :}

Later in the week I helped to run a Get Blogging! workshop with Karen, Lila and Mark from DLAM, which guided students through the process of setting up a blog on Reclaim Hosting and provided them with some pointers on the benefits of blogging and topics they could write about.  I don’t usually get to work directly with undergraduate students so it was a really rewarding experience.  Their enthusiasm was infectious and it was great to see how proud and excited they were to leave at the end of the day with their very own brand new blog.  The fabulous feedback the students left was just the icing on the cake.  My slides from the day are here: Why Blog?

At the beginning of March we celebrated Open Education Week. I’ve already written a post about the activities we planned over the course of the week, and they all went really well.  We curated eight blog posts from staff, students and graduates on the Open.Ed blog over the course of the week, each bringing a unique perspective on engaging with open education. You can read a round-up of these posts here.  I particularly like this quote from Martin Tasker, our very first Open Content Curation Intern, who is now building a career as a software engineer.

“In an age where the world is both more connected and less trusting than ever, the onus is on institutions such as universities to use their reputations and resources to promote open education. As well as benefiting the public, it benefits the institutions themselves – there’s little better in the way of marketing than having potential applicants having already experienced some learning at your institution.”

I’ve often quoted Martin’s Open Content Curation blog posts when I talk, and I’m sure I’ll be quoting his Open Education Week blog post, Reflecting on the Importance of Open Education, too.  

My daughter’s contribution to International Women’s Day, CC BY SA, RJ McCartney

International Women’s Day fell at the end of Open Education Week and Information Services marked the event by hosting a Women of Edinburgh Wikipedia Editathon and naming the Board Room in Argyle House after Brenda Moon, the first woman to head up a research university library in the 1980s, and who played a major role in bringing the University into the digital age. I spent part of the day updating the Wikipedia entry I’d previously written about Mary Susan McIntosh to include information about her work as a Women’s Rights Advocate campaigning for legal and financial rights for married and co-habiting women, defending the right to sexual expression, and arguing against censorship of pornography. 

The following week I was off down to UCL for their Open Education Symposium.  It was a privilege to be invited to share the University of Edinburgh’s strategic approach to Open Education, and it was great to hear about some of the ways that openness is supported across UCL.  I particularly enjoyed hearing a group of Arts and Sciences BASc students reflecting on their positive experience of engaging with Wikibooks.  Their comments reflected those of our Edinburgh students who have participated in Wikipedia assignments and editathons.

Somehow, in amongst all that, there were also several ALTC submissions, the launch of femedtech.net, and my daughter’s 13th birthday.  How the hell did that happen?!

The lack of tooling or support for tooling has been causing problems in complicated code bases like the codebase for our mobile site, so we carved out a proposal to create a bridge from our existing codebase to a more modern one using Webpack. I'll talk about what we did and why.

The majority of Wikipedia's front-end assets are served by a system called ResourceLoader, which has been part of the MediaWiki software since 2010. Of particular interest to us is how it packs JavaScript assets, by concatenating and compressing them. This capability predates Webpack (2012) and similar tools, so this should not be considered a case of Not invented here.

ResourceLoader has allowed developers to write front-end code without any build tooling of their own. Back in 2010, this was actually the norm. If you look at similar projects in the open source ecosystem that have been around for as long as ours, you'll find Makefiles concatenating JavaScript files - artifacts of this era.

As the JavaScript ecosystem has flourished, it's become impossible to build JavaScript without some reliance on tools. Needless to say, most front-end developers are now accustomed to working with tooling. Being unable to use tooling to build our JavaScript has arguably handicapped us a little, especially as we turn our attention to more complicated and ambitious projects such as the page previews feature.

Unlike many JavaScript module systems, the ResourceLoader system we use focuses on the collection and delivery of various loosely coupled assets, which it discovers from a manifest rather than from the files themselves. Currently, when writing JavaScript we need to define a module inside a special file called extension.json, which is then interpreted in PHP and sent to the user. A module definition looks like this:

"mobile.toc": {
        // define which environments to run this module
"targets": [
                "mobile",
                "desktop"
        ],
        // manifest of dependencies (think require/import)
"dependencies": [
                "mobile.startup",
                "mobile.toc.images"
        ],
// script files in order they need to be concatenated
        "scripts": [
                "resources/mobile.toc/TableOfContents.js"
        ],
        // styles that should be loaded via JavaScript
"styles": [
                "resources/mobile.toc/toc.less"
        ],
        // manifest of templates to load to support JavaScript
"templates": {
                "toc.hogan": "resources/mobile.toc/toc.hogan",
                "heading.hogan":  "resources/mobile.toc/tocHeading.hogan"
        },
        // message keys needed to load to support internationalization
"messages": [
                "toc"
        ]
},

Without a Node.js module system, our developers have had to work with a client-side library similar to Require.js, and with manual editing of what is essentially a manifest to discover JavaScript, all whilst keeping a mental picture of how it all fits together and how the code is split. We had to manage the JavaScript dependency trees ourselves. As my work colleague Stephen put it so nicely:

"we essentially had to fill out paperwork to create a file" and now "adding a new file is as easy as right click, new file".

Using Webpack to handle our JavaScript rather than our own in-house ResourceLoader has achieved several things for us:

  • Webpack manages complicated dependency trees for us which avoids loading errors (e.g. code loading in the wrong order)
  • gives us more control over public and private interfaces to community gadgets
  • allows us to expose interfaces for testing
  • encourages separation of logic into reusable modules (files) without any mental strain
  • allows delegation of problems such as code splitting to tooling rather than the human mind

Making code easier to work with

Previously, adding or even renaming any source file in our repository required not only creating the file but also registering it by listing it in an array in a JSON file (see developer notes at the end of this article for more). We had 86 JavaScript files, which we reduced to 19 files built via Webpack from 101 source files. The increase in source files reflects the team's ability to embrace Webpack and separate code responsibilities more effortlessly.

Now that we make use of Webpack, pulling in another file is just a require statement in the code itself. This might seem like basic stuff, but it has made a big difference.

Similarly, we've not had to worry about polluting the global namespace. Previously all our files were wrapped in an IIFE but now we've been able to remove that wrapper and also the indenting associated with it.

A familiar stack

While we didn't measure it, and it could just be because we hire awesome people, we've all noticed that our new hire got up and running with our code much quicker than previous hires.

We're using tooling that is well supported, and we have utilized new tooling to help us write better code. We're making use of bundlesize to track the size of our JavaScript assets in the code itself (and to prevent unexpected increases with our continuous integration stack). We're also making use of nyc to track code coverage (see below).

Faster unit tests

Previously, our unit tests couldn't be run cheaply on the command line in Node.js. Any npm script wanting to run them would have to boot up a browser, e.g. PhantomJS, and have a working MediaWiki instance. This was slow, manual and error-prone, as tests from other unrelated projects could break our own tests. It was pseudo-integration testing rather than unit testing. Now, with a few minor changes (mock libraries that emulate "MediaWiki"), we can run these tests from the command line in headless mode using the qunit node library. We use Webpack to build a file that can be run in the old way, to avoid confusing developers in other teams who are used to running tests that way. During our refactor, 44 QUnit tests were slowly migrated to run in Node.js.
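
A minimal sketch of what that headless run looks like, assuming the ported tests live under a tests/node-qunit directory and the MediaWiki mocks are registered in a setup file (both paths are illustrative, not our exact layout):

# run the ported QUnit tests in plain Node.js, no browser or MediaWiki needed
npm install --save-dev qunit
npx qunit --require ./tests/node-qunit/mocks.js 'tests/node-qunit/**/*.test.js'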

This is obviously much faster as it doesn't require a MediaWiki install and doesn't require booting up a browser. As a result, there is more motivation for engineers to write tests in the first place; in fact, we've already added 15 new test files.

Code coverage

Previously, our bespoke tooling made measuring test coverage very tricky and manual. As a result, we didn't measure it. Now that we are using Webpack and headless Node.js tests, we are able to work this out. After porting a file to the new module loading system, we made sure to document the code coverage of the file. It has shown us that roughly 50% of the JavaScript code we run is covered by tests. More alarmingly, 45 of our 81 files had 0% coverage. In particular, it became clear that we were avoiding writing tests where it meant exposing private interfaces on a global JavaScript variable.

As we refactor in our remaining project time we aim to get closer to 100% test coverage. Our tooling now makes it easier to write tests, and we are able to enforce coverage in the repository via the nyc library, meaning that code coverage can only get better.
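
Enforcing such a floor can be sketched like this (the threshold and test command are illustrative, not our actual configuration):

# fail the run if line coverage drops below the current baseline
npx nyc --check-coverage --lines 50 qunit 'tests/node-qunit/**/*.test.js'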

More future-proof code

While we've been forced to look at the code, we've been noticing ways to improve it and prepare it for a modern future. We've been replacing calls to jQuery with calls to a wrapper for jQuery, meaning it's becoming clearer about what we use jQuery for, and when and how we might not need it. By keeping the definition of our project around problems rather than solutions, it's been easy to justify and prioritize this work.

Similarly, we've been migrating to use ES5 functions where possible instead of jQuery.

While we're not removing jQuery from our stack just yet, we've found inspiration in other efforts to do this, such as GitHub's, to at least make this a real possibility.

Versioning

Our mobile front-end relies heavily on the Hogan template library. Previously, any vendor JavaScript had to be copied and pasted into the repository itself. When we started, we didn't actually know what version of Hogan we were using and had to diff it against several production versions! Now that we use Webpack, we can pull Hogan directly from the npm repository, so we know exactly what we are shipping and can upgrade easily if necessary. We are exploring leaning more on our tooling, with ideas to use transpiling and to include template source code in JavaScript files.
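
The dependency change itself boils down to a one-liner, assuming the library is consumed as the hogan.js package from npm:

# pin the template library as an ordinary, versioned npm dependency
npm install --save hogan.js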

Webpack didn't solve everything

While Webpack has helped us organize our JavaScript files better, it doesn't solve all our problems (yet). For example, since Wikipedia supports over 200 languages, we haven't found a way for Webpack to ship message strings in the scalable way that ResourceLoader does. I'm excited about the prospect of identifying these problems and filling in those blanks where necessary, but right now we have a nice balance of the best of ResourceLoader and Webpack in our codebase.

A buggy history

03:14, Tuesday, 19 March 2019 UTC
—I suppose you are an entomologist?—I said with a note of interrogation.
—Not quite so ambitious as that, sir. I should like to put my eyes on the individual entitled to that name! A society may call itself an Entomological Society, but the man who arrogates such a broad title as that to himself, in the present state of science, is a pretender, sir, a dilettante, an impostor! No man can be truly called an entomologist, sir; the subject is too vast for any single human intelligence to grasp.
The Poet at the Breakfast Table (1872) by Oliver Wendell Holmes, Sr. 
 
A collection of biographies with surprising gaps (ex. A.D. Imms)
The history of interest in Indian insects has been approached by many writers: there are several bits and pieces available in journals and various insights distributed across books. There are numerous ways of looking at how people historically viewed insects. One attempt is a collection of biographies, some of which are uncited verbatim (and not even within quotation marks) accounts from obituaries, by B.R. Subba Rao, who also provides something of a historical thread connecting the biographies. Keeping Indian expectations in view, Subba Rao and M.A. Husain play to the crowd. Husain was writing in pre-Independence times, when there was a genuine conflict between Indian intellectuals and their colonial masters. They begin with interpretations of mentions of insects in old Indian writings. As can be expected, there are mentions of honey, shellac, bees, ants, and a few nuisance insects in old texts. Husain takes the fact that the term Satpada (षट्पद), or six-legs, existed in the 1st-century Amarakosa to suggest that Indians were far ahead of their time, because Latreille's Hexapoda, the supposed analogy, was proposed only in 1825. Such histories gloss over the structures on which science is built, and one can only assume that the authors failed to find the development of such structures in the ancient texts they examined. The identification of species mentioned in old texts is often based on ambiguous translations, which should leave one wondering what the value of claiming Indian priority in identifying a few insects really is. For instance, K.N. Dave translates a verse from the Atharva-veda and suggests an early date for knowledge of shellac. This interpretation looks dubious, and sure enough, Dave has been critiqued by Mahdihassan. The indragopa (Indra's cowherd) is supposedly something that appears after the rains. Sanskrit scholars have identified it variously as the cochineal insect (the species Dactylopius coccus is South American!), the lac insect, a firefly (!), and as Trombidium (red velvet mite) - the last matches the blood-red colour mentioned in a text attributed to Susrutha. To be fair, ambiguities resulting from translation are not limited to those that deal with Indian writing. Dikairon (Δικαιρον), supposedly a highly valued and potent poison from India, was mentioned in the work Indika by Ctesias (398-397 BC). One writer said it was the droppings of a bird. Valentine Ball thought it was derived from a scarab beetle. Jeffrey Lockwood claimed that it came from the rove beetles Paederus sp. And finally a Spanish scholar states that all this was a misunderstanding and that Dikairon was not a poison but, believe it or not, a masticated mix of betel leaves, arecanut, and lime! One gets a far more reliable idea of ancient knowledge and traditions from practitioners: forest dwellers, the traditional honey-harvesting tribes, and similar people who have been gathering materials such as shellac and beeswax. Unfortunately, many of these traditions and their practitioners are threatened by modern laws, economics, and culture. These practitioners are being driven out of the forests where they live, and their knowledge was hardly ever captured in writing. The writers of the ancient Sanskrit texts were probably associated with temple-towns and other semi-urban clusters, and it seems that the knowledge of forest dwellers was not considered merit-worthy.

A more meaningful overview of entomology may be gained by reading and synthesizing a large number of historical bits, of which there are a growing number. The 1973 book published by Annual Reviews Inc. should be of some interest. I have appended a selection of sources that I have found useful in adding bits and pieces to form a historic view of entomology in India. It helps, however, to have a broader skeleton on which to attach these bits and minutiae. Here there are also truly verbose and terminology-filled systems developed by historians of science (for example, see ANT). I prefer an approach that is free of a jargon overload and like to look at entomology and its growth along three lines of action: cataloguing, where the main products are collections of artefacts and the assignment of names; communication and vocabulary-building, social actions involving groups of interested people who work together, with the products being scholarly societies and journals; and pattern-finding, where hypotheses are made and predictions tested. I like to think that anyone learning entomology also goes through these activities, often in this sequence. With professionalization there appears to be a need for people to step faster and faster into the pattern-finding way, which also means that less time is spent on the other two streams of activity. The fast stepping is often achieved by having comprehensive texts, keys, identification guides and manuals. The skills involved in the production of those works - ways to prepare specimens, observe, illustrate, or describe - are often not captured by the books themselves.

Cataloguing

The cataloguing phase of knowledge gathering, especially of the (larger and more conspicuous) insect species of India grew rapidly thanks to the craze for natural history cabinets of the wealthy (made socially meritorious by the idea that appreciating the works of the Creator was as good as attending church)  in Britain and Europe and their ability to tap into networks of collectors working within the colonial enterprise. The cataloguing phase can be divided into the non-scientific cabinet-of-curiosity style especially followed before Darwin and the more scientific forms. The idea that insects could be preserved by drying and kept for reference by pinning, [See Barnard 2018] the system of binomial names, the idea of designating type specimens that could be inspected by anyone describing new species, the system of priority in assigning names were some of the innovations and cultural rules created to aid cataloguing. These rules were enforced by scholarly societies, their members (which would later lead to such things as codes of nomenclature suggested by rule makers like Strickland, now dealt with by committees that oversee the  ICZN Code) and their journals. It would be wrong to assume that the cataloguing phase is purely historic and no longer needed. It is a phase that is constantly involved in the creation of new knowledge. Labels, catalogues, and referencing whether in science or librarianship are essential for all subsequent work to be discovered and are essential to science based on building on the work of others, climbing the shoulders of giants to see further. Cataloguing was probably what the physicists derided as "stamp-collecting".

Communication and vocabulary building

The other phase involves social activities: the creation of specialist language, groups, and "culture". The methods and tools adopted by specialists also help in producing associations and in identifying boundaries that could spawn new associations. The formation of groups of people based on interests is something that ethnographers and sociologists have examined in the context of science. Whereas some of the early learned societies were spawned by people with wealth and leisure, some of the later societies have had other economic forces in their support.

Like species, interest groups too specialize and split to cover more specific niches, such as those that deal with applied areas like agriculture, medicine, veterinary science and forensics. There can also be interest in behaviour and evolution, which, though they have applications, often do not find economic support.

Pattern finding
Eleanor Ormerod, an unexpected influence in the rise of economic entomology in India

The pattern-finding phase, when reached, allows a field to become professional - with paid services offered by practitioners. It is the phase in which science flexes its muscle, specialists gain social status, and are able to make livelihoods out of their interest. Lefroy (1904) cites economic entomology as starting with E.C. Cotes [Cotes' career in entomology was short; after marrying the famous Canadian journalist Sara Duncan in 1889 he too moved to writing] at the Indian Museum in 1888. But he surprisingly does not mention any earlier attempts, and one finds that Edward Balfour, that encyclopaedic surgeon of Madras, collated a list of insect pests in 1887 and drew inspiration from Eleanor Ormerod, who hints at the idea of getting government support, noting that it would cost very little given that she herself worked without remuneration to provide a service for agriculture in England. Her letters were also forwarded to the Secretary of State for India, and it is quite possible that Cotes' appointment was a result.

As can be imagined, economics, society, and the way science is supported - royal patronage, family, state, "free markets", crowd-sourcing, or mixes of these - impact the way an individual or a field progresses. Entomology was among the first fields of zoology that managed to gain economic value with the possibility of paid employment. David Lack, who later became an influential ornithologist, was wisely guided by his father to pursue entomology as it was the only field of zoology where jobs existed. Lack however found his apprenticeship (in Germany, 1929!) involving pinning specimens "extremely boring".

Indian reflections on the history of entomology

Kunhikannan died at the rather young age of 47
A rather interesting analysis of Indian science was made by the first native Indian entomologist to work with the official title of "entomologist" in the state of Mysore - K. Kunhikannan. Kunhikannan was deputed to pursue a Ph.D. at Stanford (for some unknown reason many of the pre-Independence Indian entomologists trained at Stanford rather than in England) through his superior Leslie Coleman. At Stanford, Kunhikannan gave a talk on science in India. He noted in his 1923 talk:

In the field of natural sciences the Hindus did not make any progress. The classifications of animals and plants are very crude. It seems to me possible that this singular lack of interest in this branch of knowledge was due to the love of animal life. It is difficult for Westerners to realise how deep it is among Indians. The observant traveller will come across people trailing sugar as they walk along streets so that ants may have a supply, and there are priests in certain sects who veil that face while reading sacred books that they may avoid drawing in with their breath and killing any small unwary insects.
He then examines science sponsored by state institutions, by universities and then by individuals. About the last he writes:
Though I deal with it last it is the first in importance. Under it has to be included all the work done by individuals who are not in Government employment or who being government servants devote their leisure hours to science. A number of missionaries come under this category. They have done considerable work mainly in the natural sciences. There are also medical men who devote their leisure hours to science. The discovery of the transmission of malaria was made not during the course of Government work. These men have not received much encouragement for research or reward for research, but they deserve the highest praise. European officials in other walks of life have made signal contributions to science. The fascinating volumes of E. H. Aitken and Douglas Dewar are the result of observations made in the field of natural history in the course of official duties. Men like these have formed themselves into an association, and a journal is published by the Bombay Natural History Association[sic], in which valuable observations are recorded from time to time. That publication has been running for over a quarter of a century, and its volumes are a mine of interesting information with regard to the natural history of India.
This then is a brief survey of the work done in India. As you will see it is very little, regard being had to the extent of the country and the size of her population. I have tried to explain why Indians' contribution is as yet so little, how education has been defective and how opportunities have been few. Men do not go after scientific research when reward is so little and facilities so few. But there are those who will say that science must be pursued for its own sake. That view is narrow and does not take into account the origin and course of scientific research. Men began to pursue science for the sake of material progress. The Arab alchemists started chemistry in the hope of discovering a method of making gold. So it has been all along and even now in the 20th century the cry is often heard that scientific research is pursued with too little regard for its immediate usefulness to man. The passion for science for its own sake has developed largely as a result of the enormous growth of each of the sciences beyond the grasp of individual minds so that a division between pure and applied science has become necessary. The charge therefore that Indians have failed to pursue science for its own sake is not justified. Science flourishes where the application of its results makes possible the advancement of the individual and the community as a whole. It requires a leisured class free from anxieties of obtaining livelihood or capable of appreciating the value of scientific work. Such a class does not exist in India. The leisured classes in India are not yet educated sufficiently to honour scientific men.
It is interesting that leisure is noted as important for scientific advance. Edward Balfour, mentioned earlier, also made a similar comment that Indians were too close to subsistence to reflect accurately on their environment!  (apparently in The Vydian and the Hakim, what do they know of medicine? (1875) which unfortunately is not available online)

Kunhikannan may be among the few Indian scientists who dabbled in cultural history, and political theorizing. He wrote two rather interesting books The West (1927) and A Civilization at Bay (1931, posthumously published) which defended Indian cultural norms while also suggesting areas for reform. While reading these works one has to remind oneself that he was working under and with Europeans and would not have been able to have many conversations on these topics with Indians. An anonymous writer who penned the memoir of his life in his posthumous work notes that he was reserved and had only a small number of people to talk to outside of his professional work.
Entomologists meeting at Pusa in 1919
Third row: C.C. Ghosh, Ram Saran, Gupta, P.V. Isaac, Y. Ramachandra Rao, Afzal Husain, Ojha, A. Haq
Second row: M. Zaharuddin, C.S. Misra, D. Naoroji, Harchand Singh, G.R. Dutt, E.S. David, K. Kunhi Kannan, Ramrao S. Kasergode, J.L.Khare, Jhaveri, V.G.Deshpande, R. Madhavan Pillai, Patel, A. Mujtaba, P.C. Sen
First row: Capt. Froilano de Mello, Robertson-Brown, S. Higginbotham, C.M. Inglis, C.F.C. Beeson, Gough, Bainbrigge Fletcher, Bentley, Senior-White, T.V. Rama Krishna Ayyar, C.M. Hutchinson, Andrews, H.L.Dutt


Entomologists meeting at Pusa in 1923
Fifth row (standing) Mukerjee, G.D.Ojha, Bashir, Torabaz Khan, D.P. Singh
Fourth row (standing) M.O.T. Iyengar, R.N. Singh, S. Sultan Ahmad, G.D. Misra, Sharma,Ahmad Mujtaba, Mohammad Shaffi
Third row (standing) Rao Sahib Y Rama Chandra Rao, D Naoroji, G.R.Dutt, Rai Bahadur C.S. Misra, SCJ Bennett (bacteriologist, Muktesar), P.V. Isaac, T.M. Timoney, Harchand Singh, S.K.Sen
Second row (seated) Mr M. Afzal Husain, Major RWG Hingston, Dr C F C Beeson, T. Bainbrigge Fletcher, P.B. Richards, J.T. Edwards, Major J.A. Sinton
First row (seated) Rai Sahib PN Das, B B Bose, Ram Saran, R.V. Pillai, M.B. Menon, V.R. Phadke (veterinary college, Bombay)

Note: As usual, these notes are spin-offs from researching and writing Wikipedia entries, in this case on several pioneering Indian entomologists. It is remarkable that even some people who held high office, such as P.V. Isaac, the last Imperial Entomologist and grandfather of noted writer Arundhati Roy, are largely unknown (except, in Isaac's case, as the near-fictional Pappachi in Roy's God of Small Things).


References
An index to entomologists who worked in India or described a significant number of species from India, with links to Wikipedia entries (where possible - the gaps are huge)
(woefully incomplete - feel free to let me know of additional candidates)

 
Edward Percy Stebbing - T.B. Fletcher - Edward Ernest Green - E.C. Cotes - Harold Maxwell Lefroy - Frank Milburn Howlett - S.R. Christophers - Leslie C. Coleman - T.V. Ramakrishna Ayyar - Yelsetti Ramachandra Rao - Magadi Puttarudriah - Hem Singh Pruthi - Shyam Sunder Lal Pradhan - James Molesworth Gardner - Vakittur Prabhakar Rao - D.N. Raychoudhary - C.F.W. Muesenbeck  - Mithan Lal Roonwal - Ennapada S. Narayanan - M.S. Mani - T.N. Ananthakrishnan - K. Kunhikannan - Muhammad Afzal Husain
 

    Teaching students how to communicate science

    17:32, Monday, 18 March 2019 UTC

    Thais Morata and Erin Haynes at the University of Cincinnati recognize the importance of students having robust science communication skills. So in Fall 2018, they incorporated a Wikipedia writing assignment into their course, where students could expand Wikipedia pages about science topics that were interesting to them.

    The course, Communicating Your Science, “will enable students in scientific disciplines to develop the skills needed to explain their research to non-specialists and public audiences. Class sessions will address a variety of communication areas, including speaking and writing to lay audiences, reporting research results to community members, preparing briefings for policy-makers, and communicating with different media outlets.” *

    The flexible nature of a Wikipedia assignment lets students guide their own research, find topics that they’re interested in, and ultimately make a difference for public knowledge about something they care about. Students are great “translators” of scientific information because they remember what it was like to learn about these complex topics for the first time. A Wikipedia assignment also offers students a great exercise in synthesizing information from a variety of sources.

    In end of term presentations, students walked through their process for selecting topics to improve on Wikipedia and what they thought of the assignment. The class created and expanded quite a variety of topics by the close of the project.

    One student set out to find a Wikipedia article that interested him and was also in need of improvements. He began his search at the article about lead. But seeing that it was a Featured Article (Wikipedia’s highest quality designation), he decided it wasn’t a good place to begin editing. He decided to get more specific, looking at the article for lead poisoning. Nope, that was a Good Article (the second highest quality designation on Wikipedia), so he likely couldn’t make a lot of meaningful contributions there. Looking at the “See also” section of the article for even more specialized topics, he found the article for the lead-crime hypothesis.

    As this student expanded upon in the article, the lead-crime hypothesis is the theory that high lead rates in a community are responsible for high rates of crime in that same community. The theory developed after environmental policies were passed to reduce blood-lead levels in a local population. Children who grew up during the period of reduced lead-levels committed fewer crimes in young adulthood than had previous generations. The lead-crime hypothesis was born to explain the change.

    In his presentation, the student explained that before he made changes, he saw that the article was lacking citations. “I set forth to add the necessary citations to the page, and then also identify information I could add without stepping on anyone’s toes.”

    Screenshot of the Dashboard’s authorship-highlighting ability. Content that the student added to the lead-crime hypothesis article appears in purple.

    Wikipedia is naturally a collaborative space. Volunteers build off each other’s work, expanding upon and deleting previous content when new information comes along. Sometimes students feel intimidated to enter this community space, which has particular norms and expectations about what information makes it into the encyclopedia.

    But once they get over the initial hurdles of making their first edits, many students find themselves feeling more confident about articulating course topics. And they feel a sense of pride having contributed both to a topic they care about and on a platform where their work can potentially be seen by millions.

    The student added more than 4,000 words to the existing article about the hypothesis. “A lot of that is improving the background material, expanding on what they’d already said trying to tie everything together,” he shared. He also linked content within the article to other articles, thinking about the topic in the context of the larger fields of sociology and chemistry. Wikipedia writing is inevitably interdisciplinary that way.

    By the time he was finished adding contributions over the period of the course, the article was deemed well-sourced enough that the warning template that had initially warned readers about flaws in the content could be removed.

    “Overall, I liked this project. It was something I’d never done before,” he concluded.

    Another student’s process for her Wikipedia writing assignment looked a bit different. As she shared in her final presentation about the project, she’s generally interested in “the role of microRNA in driving neurological disease, like fragile X syndrome and epilepsy.”

    At first, she had trouble finding a Wikipedia article that fit her interests and needed improvements. A lot of microRNA topics she was finding had already been extensively covered. “Until I realized that there wasn’t a page for microRNA 324-5p, which is the major microRNA that I’m focusing on right now in my research.”

    So the student started the Wikipedia article for MiR-324-5p. “It took a lot of research,” she said. “I went through the history, structure and targets, and its different functions. MicroRNAs have incredibly complex pathways; they interact with hundreds of molecules and proteins. So I wanted to compile all of that information.”

    When reflecting on her goals at the start of the project, she said, “I wanted to add something new and to learn more about my own research in the process.”

    2,500 words later, plus a helpful graphic she found on Wikimedia Commons, the article is accessible to Wikipedia’s millions of readers.

    Graphic describing the role of miRNA in a cancer cell.
    Image: File:Role of miRNA in a cancer cell.svg, Philippe Hupé, CC BY-SA 3.0, via Wikimedia Commons.

    “I was able to put all that information in place for someone who is maybe encountering this topic for the first time. They can come in and read all about it, and go through the sequence without having to go through a bunch of databases.”

    Wikipedia writing assignments are farther reaching than a traditional assignment, which might just be read by the instructor and filed away.

    “The article has already been seen 246 times,” the student gushed. “I’m impressed.”

    “And I linked in a bunch of articles. So for anything that was difficult or a science-y word, I put in a link for. And I also went back and linked some articles to this one, like epilepsy and other explorations of microRNA.”

    Not only does the act of linking make the article more accessible to the general reader, but it requires critical thinking of the student making the links. Both students thought about their topic within larger frameworks of knowledge. They thought about how it relates to other topics, formed a mental web within their discipline, and thought across that web.

    “I liked this assignment. It was an opportunity to learn about this new microRNA.”

    And although she hasn’t had interactions yet on the new article’s Talk Page (where other volunteers can discuss changes to the content), the student mentions that she was thanked by another editor for her work. “That was nice.”

    Interested in teaching with Wikipedia? Visit teach.wikiedu.org for everything you need to know to incorporate an assignment like this into your course.

    Tech News issue #12, 2019 (March 18, 2019)

    00:00, Monday, 18 March 2019 UTC

    George Ernest Spero, the vanishing MP

    15:04, Sunday, 17 March 2019 UTC

    As part of the ongoing Wikidata MPs project, I’ve come across a number of oddities – MPs who may or may not have been the same person, people who essentially disappear after they leave office, and so on. Tracking these down can turn into quite a complex investigation.

    One such was George Ernest Spero, Liberal MP for Stoke Newington 1923-24, then Labour MP for Fulham West 1929-30. His career was cut short by his resignation in April 1930; shortly afterwards, he was declared bankrupt. Spero had already left the country for America, and nothing more was heard of him. The main ambiguity was when he died – various sources claimed either 1960 or 1976, but without it being clear which was more reliable, or any real details on what happened to him after 1930. In correspondence with Stephen Lees, who has been working on an incredibly useful comprehensive record of MPs’ death-dates, I did some work on it last year and eventually confirmed the 1960 date; I’ve just rediscovered my notes, and since it was an interesting little mystery, thought I’d post them.

    George Spero, MP and businessman

    So, let’s begin with what we know about him up to the point at which he vanished.

    George Ernest Spero was born in 1894. He began training at the Royal Dental Hospital in 1912, and served in the RNVR as a surgeon during the First World War. He had two brothers who also went into medicine; Samuel was a dentist in London (and apparently also went bankrupt, in 1933), while Leopold was a surgeon or physician (trained at St. Mary’s, RNVR towards the end of WWI, still in practice in the 1940s). All of this was reasonably straightforward to trace, although oddly George’s RNVR service records seem to be missing from the National Archives.

    After the war, he married Rina Ansley (nee Rina Ansbacher, born 14 March 1902) in 1922; her father was a wealthy German-born stockbroker, resident in Park Lane, who had naturalised in 1918. They had two daughters, Rachel Anne (b. 1923) and Betty Sheila (b. 1928). After his marriage, Spero went into politics in Leicester, where he seems to have been living, and stood for Parliament in the 1922 general election. The Nottingham Journal described him as for “the cause of free, unfettered Liberalism … Democratic in conviction, he stands for the abolition of class differences and for the co-operation of capital and labour.” However, while this was well-tailored to appeal to the generally left-wing voters of Leicester West, and his war record was well-regarded, the moderate vote was split between the Liberal and National Liberal candidates, with Labour taking the seat.

    The Conservative government held another election in 1923, aiming to strengthen a small majority (does this sound familiar?), and Spero – now back in London – contested Stoke Newington, then a safe Conservative seat, again as a left Liberal. With support from Labour, who did not contest the seat, Spero ran a successful campaign and unseated the sitting MP. He voted in support of the minority Labour government on a number of occasions, and was one of the small number of Liberal rebels who supported them in the final no-confidence vote. However, this was not enough to prevent Labour fielding a candidate against him in 1924; the Conservative candidate took 57% of the vote, with the rest split evenly between Labour and Liberal.

    Spero drifted from the Liberals into the Labour Party, probably a more natural home for his politics, joining it in 1925. By the time of the next general election, in May 1929, he had become the party’s candidate for Fulham West, winning it from the Conservatives with 45% of the vote.

    He was a moderately active Government backbencher for the next few months, including being sent as a visitor to Canada during the recess in September 1929, travelling with his wife. While overseas, she caused some minor amusement to the British papers after reporting the loss of a £6,000 pearl necklace – they were delighted to report this alongside “socialist MP”. He was last recorded voting in Hansard in December, and did not appear in 1930. In February and March he was paired for votes, with a newspaper report in early March stating that around the start of the year he had been advised to take a rest to avoid a complete nervous breakdown, and had gone to the South of France, but “hopes to return to Parliament before the month is out”. However, on 9th April he formally took the Chiltern Hundreds (it is interesting that a newspaper report suggested his local party would choose whether to accept the resignation).

    However, things were moving quickly elsewhere. A case was brought against him in the High Court for £10,000, arising from his sale of a radio company in 1928-29. During the court hearing, at the end of May, it was discovered that a personal cheque for £4000 given by Spero to guarantee the company’s debts had been presented to his bank in October 1929, but was not honoured. He had at this point claimed to be suing the company for £20,000, buying six months legal delay, sold his furniture, and – apparently – left the country for America. Bankruptcy proceedings followed later that year (where he was again stated to be in America) and, unsurprisingly, his creditors seem to have received very little.

    At this point, the British trail and the historic record draw to a gentle close. But what happened to him?

    The National Portrait Gallery gave his death as 1960, while an entry in The Palgrave Dictionary of Anglo-Jewish History reported that they had traced his death to 1976 in Belgrade, Yugoslavia (where, as a citizen, it was registered with the US embassy). Unfortunately, it did not go into any detail about how they worked this out, and this just heightened the mystery – if it was true, how had a disgraced ex-MP ended up in Yugoslavia on a US passport three decades later? And, conversely, who was it who had died in 1960?

    George Spears, immigrant and doctor

    We know that Spero went to America in 1929-30; that much seemed to be a matter of common agreement. Conveniently, the American census was carried out in April 1930, and the papers are available. On 18 April, he was living with his family in Riverside Drive, upper Manhattan; all the names and ages line up, and Spero is given as a medical doctor, actively working. Clearly they were reasonably well off, as they had a live-in maid, and it seems to be quite a nice area.

    In 1937, he petitioned for American citizenship in California, noting that he had lived there since March 1933. As part of the process, he formally notified that he intended to change his name to George Ernest Spears. (He also gave his birthdate as 2 March 1894, of which more later).

    While we can be reasonably confident these are the same man due to the names and dates of the family, the match is very neatly confirmed by the fact that the citizenship papers have a photograph, which can be compared to an older newspaper one. There is fifteen years’ difference, but we can see the similarities between the prospective MP of 27 and the older man of 43.

    George Spears, with the same family, then reappears in the 1940 census, back in Riverside Drive. He is now apparently practicing as an optician, and doing well – income upwards of $6000. Finally, we find a draft record for him living in Huntingdon, Long Island at some point in 1942. Note his signature here, which is visibly the same hand as in 1937, except “E. Spears” not “Ernest Spero”.

    It is possible he reverted to his old name for a while – there are occasional appearances of a Dr. George Spero, optometrist, in the New York phone books between the 1940s and late 1950s. Not enough detail to be sure either way, though.

    So at this point, we can trace Spero/Spears continually from 1930 to 1942. And then nothing, until on 7 January 1960, George E. Spears, born 2 March 1894, died in California. Some time later, in June 1976, George Spero, born 11 April 1894, died in Belgrade, Yugoslavia, apparently a US citizen. Which one was our man?

    The former seemed more likely, but can we prove it? The death details come from an index, which gives a mother’s maiden name of “Robinson” – unfortunately the full certificate isn’t there and I did not feel up to trying to track down a paper Californian record to see what else it said.

    If we return to the UK, we can find George Spero in the 1901 census in Dover, with his parents Isidore Sol [Solomon], a ‘dental mechanic’, and Rachel, maiden name unknown. The family later moved to London, the parents naturalised, Isidore died in 1925 – and probate goes to “George Ernest Spero, physician”, which seems to confirm that this is definitely the right family and not a different George Spero. The 1901 census notes that two of the older children were born in Dublin, so we can trace them in the Irish records. Here we have an “Israel S Spero” marrying Rachel Robinson in 1884, and a subsequent child born to Solomon Israel Spero and Rachel Spero née Robinson. There are a few other Speros or Spiros appearing in Dublin, but none married around the right time, and none with such similar names. If Israel Solomon Spero is the same as Isidore Solomon Spero, this all ties up very neatly.

    It leaves open the mystery, however, of who died in Yugoslavia. It seems likely this was a completely different man (who had not changed his name), but I have completely failed to trace anything about him. A pity – it would have been nice to definitively close off that line of enquiry.

    weeklyOSM 451

    12:50, Sunday, 17 March 2019 UTC

    05/03/2019-11/03/2019


    MapRoulette has been revised – now version 3.2 | map data © OpenStreetMap contributors

    Upload filters – Article 13 of the EU Copyright Directive

    • Michael Reichert announces that openstreetmap.de will again protest against Article 13 of the EU Copyright Directive. (de) (automatic translation)
    • Heise Online reports (de) (automatic translation) that the German Wikipedia will be turned off for one day on 21 March as a protest against article 13 of the new EU Copyright Directive.

    Mapping

    • Following the decision to cease rendering leisure=common in OSM’s main map style Carto, a replacement is being discussed on the HOT mailing list. Apparently the requirements for the replacement are not too difficult; as Andrew Buck wrote “I would accept anything the community comes up with as long as it renders or the rendering rules are updated before the change.”
    • François Lacombe announced on the tagging mailing list the start of voting for the proposal to improve the tagging of substations and the network hierarchy in general.
    • The proposal for highway=via_ferrata and via_ferrata_scale=* for tagging protected climbing routes in the Alps and certain other locations has been rewritten. Comments and suggestions are welcomed.
    • SK53 wrote on his “Maps matter” blog about mapping roof-top solar panels in Nottingham, UK. He summarises what he learnt and makes some suggestions for finding them in other places. Also mcld found that solar panels in London were harder to find. Both activities arose because of an animated discussion on Twitter, initiated by Jack Kelly, about using machine learning to find solar panels.
    • naoliv announced in his user diary the availability of a new JOSM layer specific for Brazilian street names, which can be used to take over street names from IBGE‘s Base de faces de logradouros. In his post he explains the issues with the previous solution and how the new layer can be used.
    • raphaelmirc explains how to use MapRoulette. (automatic translation)

    Community

    • We recently reported on the issue of non-compliant attributions by OSM data consumers. Our commentary suggested that, based on the responses, it appears that the topic is not that important for other mappers, or they are resigned to current practices. However, we heavily underestimated the momentum the topic gained after writing the text and we missed the chance to incorporate the feedback right before publishing our last issue. We’re sorry for our oversight. Martin Koppenhöfer is right when complaining about it. The responses on the missing attribution are noteworthy. For example, Paul Norman points to the fact that sometimes users of our data find room for their logo but not for the attribution required when using our data. Martijn van Exel provided an example of how to attribute even on an iPhone screen. Simon Poole adds that the majority of data consumers are providing acceptable attribution or at least fix it when pointed to the issue. He also indicated that the community can expect a draft attribution guideline for discussion in the upcoming months. However, there are many, many more contributions to the discussion that are worth reading.
    • OpenStreetMap has been accepted as a mentoring organisation for this year’s Google Summer of Code. On the OSM developers’ mailing list, Tobias Knerr calls for more project ideas to be added to the project ideas wiki page.
    • The Swiss Federal Railways SBB has set up (automatic translation) an internal group of volunteers called “OSM at SBB” to discuss OSM related things.
    • The blog of OpenCage data, the operator of OpenCage Geocoder, has published an interview with Richard Fairhurst, the maker of Deriviste (and a few other things).
    • Jochen Topf wrote a blog post about the changes for openstreetmapdata.com users after the move to a new server following FOSSGIS’ funding agreement for the new server infrastructure.

    Imports

    • User Hokkosha published (automatic translation) a proposal for a data import (automatic translation) from Hokkaido Takushoku Bus and asked for feedback. This proposal will be produced in English after discussion in Japanese.

    Events

    • The OSM group of Nancy and its surroundings meets (automatic translation) regularly in partnership with the “Fabrique Collective de la Culture du Libre”.

    Humanitarian OSM

    • The World Bank Group has contracted the Humanitarian OpenStreetMap Team to digitise buildings in Tanzania from satellite imagery. By adding this data to OSM it is intended to help develop an off-grid energy sector, to complement the national grid, by allowing businesses to identify sites that could be served by mini-grids and solar home systems.
    • HOT published an article on the International Women’s Day 2019 with their highlights of gender diversity activity from the last 12 months.
    • sevendaysvt.com published an article about a humanitarian mapping event, co-hosted by the University of Vermont’s Humanitarian Mapping Club and Code for BTV, where attendees were introduced to OSM by mapping buildings in rural Africa.

    Maps

    switch2OSM

    • A German version of the NewsHereNow website/webapp is now available (English version). Simply click on the “Use my GPS location” button to find the physically nearest, non-chain (“local”) restaurants. If your favourite restaurant is missing, please make sure to add it to OpenStreetMap. For convenience, under the HERE/HIER tab is a button that takes you to OpenStreetMap centred at your nearest intersection.

    Open Data

    Software

    • The support for QGIS 2.18 LTR (Long Term Release) has ended. Users are asked to upgrade to QGIS 3.4 LTR.
    • Wille published a blog post that explains how to set up some tools to receive notifications from OSMCha via Telegram or e-mail. That way you will be notified whenever there are new changesets that match a filter you saved on OSMCha.
    • MapRoulette is now at version 3.2. The latest version of the micro-task platform for OSM features a new design, a new user dashboard, improved task completion flow, and much more. Martijn van Exel summarises the new features in his diary.
    • In his OSM diary Simon Poole celebrates the 10th anniversary of Vespucci and thanks all of the contributing developers. He also gives a preview of upcoming features such as multi-polygon support, turn-restrictions rendering, and a revamped style configuration.

    Programming

    • Amon Santana, from HeiGIT, reports about a number of interesting new features in the latest version of the interactive OpenRouteService API Playground.

    Releases

    • Version 3.3 of OsmAnd has been released. The team removed Facebook and Firebase analytics from the free version – the paid version OsmAnd+ already didn’t include them. The application now offers navigation for public transport (still in a beta phase), and has received a redesigned directions menu, an improved quick action menu, and better information about a route.

    Did you know …

    • … Anita Graser’s “Free and Open Source GIS Ramblings“? It is a series of discussions about GIS data analysis.
    • … about Mappics? This is an open-source, map-based travel photo gallery, with automatic place and weather descriptions for the very moment the photos were taken.
    • … that there are shops that sell goods with as little packaging as possible? Customers can bring their own reusable packaging to reduce waste. OSM recognises this using the tag bulk_purchase = yes
    • … the overview of all existing OSM Working Groups?

    Other “geo” things

    • Gonzalo López published on NOSOLOSIG an article called “Geocoding of addresses: a pending subject”. (es) (automatic translation)
    • Sascha Fendrich, from HeiGIT, describes a study about semantic association rule learning using OSM data. He presents an example to explain the technique used and its potential for finding exceptions to general rules that can be used for data quality analysis.
    • Martijn van Exel explains how to use the new OpenStreetCam upload scripts.

    Upcoming Events

    Where What When Country
    Dresden FOSSGIS 2019 2019-03-13-2019-03-16 germany
    Berlin 129. Berlin-Brandenburg Stammtisch 2019-03-14 germany
    Chemnitz Chemnitzer Linux-Tage 2019 2019-03-16-2019-03-17 germany
    Kyoto 京都!街歩き!マッピングパーティ:第6回 善峯寺 2019-03-17 japan
    Taipei OSM x Wikidata #2 2019-03-18 taiwan
    Cologne Bonn Airport Bonner Stammtisch 2019-03-19 germany
    Nottingham East Midlands Pub meetup 2019-03-19 england
    London #geomob London 2019-03-19 england
    Scotland Edinburgh 2019-03-19 uk
    Salt Lake City SLC Map Night 2019-03-19 united states
    Lüneburg Lüneburger Mappertreffen 2019-03-19 germany
    Toulouse Rencontre mensuelle 2019-03-20 france
    Karlsruhe Stammtisch 2019-03-20 germany
    Nagoya 図書で調べて編集するオープンデータワークショップ 2019-03-21 japan
    Heidelberg Mannheimer Mapathons Bei der VHS-Heidelberg für “Wochen gegen Rassismus” des Interkulturellen Zentrums 2019-03-21 germany
    Greater Vancouver area Metrotown mappy Hour 2019-03-22 canada
    Tokyo ミャンマーに絵本と地図を届けよう~ミャンマーに届ける翻訳絵本作り&自由な世界地図作り~ 2019-03-23 japan
    Bremen Bremer Mappertreffen 2019-03-25 germany
    Joué-lès-Tours Rencontre Mensuelle 2019-03-25 france
    Graz Stammtisch Graz 2019-03-25 austria
    Portmarnock Erasmus+ EuYoutH_OSM Meeting 2019-03-25-2019-03-29 ireland
    Zurich Missing Maps Mapathon Zurich 2019-03-27 switzerland
    Montpellier Réunion mensuelle 2019-03-27 france
    Université libre de Bruxelles (ULB) National Mapathon 2019-03-27 belgium
    UCL Louvain-la-Neuve National Mapathon 2019-03-27 belgium
    KUL Leuven National Mapathon 2019-03-27 belgium
    UMONS Mons National Mapathon 2019-03-27 belgium
    Lübeck Lübecker Mappertreffen 2019-03-28 germany
    VUB Brussel National Mapathon 2019-03-28 belgium
    ULIEGE Liège National Mapathon 2019-03-28 belgium
    UNAMUR Namur National Mapathon 2019-03-28 belgium
    UGENT Gent National Mapathon 2019-03-28 belgium
    Düsseldorf Stammtisch 2019-03-29 germany
    UCL Louvain-la-Neuve National Mapathon 2019-03-30 belgium
    ULIEGE Liège National Mapathon 2019-03-30 belgium
    Bochum Mappt die Innenstadt – Mappingtag für Einsteiger*innen und Fortgeschrittene 2019-03-31 germany
    Montpellier State of the Map France 2019 2019-06-14-2019-06-16 france
    Angra do Heroísmo Erasmus+ EuYoutH_OSM Meeting 2019-06-24-2019-06-29 portugal
    Minneapolis State of the Map US 2019 2019-09-06-2019-09-08 united states
    Edinburgh FOSS4GUK 2019 2019-09-18-2019-09-21 united kingdom
    Heidelberg Erasmus+ EuYoutH_OSM Meeting 2019-09-18-2019-09-23 germany
    Heidelberg HOT Summit 2019 2019-09-19-2019-09-20 germany
    Heidelberg State of the Map 2019 (international conference) 2019-09-21-2019-09-23 germany
    Grand-Bassam State of the Map Africa 2019 2019-11-22-2019-11-24 ivory coast

    Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it, where appropriate.

    This weeklyOSM was produced by Nakaner, Polyglot, Rogehm, SK53, SunCobalt, TheSwavu, YoViajo, derFred, k_zoar, kartonage, keithonearth.

    Sharing the sum of all knowledge is what we have always aimed for in our movement. In Commons we have realised a project that illustrates all Wikimedia projects and in Wikidata we have realised a project that links all Wikimedia projects and more.

    When we tell the world about the most popular articles in Wikipedia, it is important to realise that we do not say what the most popular subjects are. We could, but so far we don't. The popularity of a subject is the sum of the traffic of all Wikipedia articles on that subject. Providing this data is feasible; it is a "big data" question.

    We do have accumulated traffic data for articles on all Wikipedias, and we can link the articles to their Wikidata items. What follows is simple arithmetic. It is powerful because it will show that English Wikipedia is less than fifty percent of all traffic. That will help make the existing bias towards English Wikipedia and its subjects visible, particularly because it will be possible to answer a question like: "What are the most popular subjects that do not have an article in English?" and compare those to popular diversity articles.

    In Wikidata we know about the subjects of all Wikipedias, but it too is very much a project based on English. That is a pity when Wikidata is to be the tool that helps us find what subjects people are looking for that are missing in a Wikipedia. For some there is an extension to the search functionality that helps find information. It uses Wikidata and it supports automated descriptions.

    Now consider that this tool is available on every Wikipedia. We would share more information. With some tinkering, we would know what is missing where. There are other opportunities; we could ask logged-in users to help by adding labels for their language to improve Wikidata. When Wikidata does not include the missing information, we could ask them to add a Wikidata item and additional statements, or a description, to improve our search results.

    This data approach is based on the result of a process – the negative results of our own search – and on the active cooperation of our users. At the same time, we accumulate the negative results of search where there has been no interaction, link them to Wikidata labels, and gain an understanding of the relevance of these missing articles. This fits in nicely with the marketing approach to "what it is that people want to read in a Wikipedia".
    Thanks,
          GerardM

    Wikipedia’s article on the culture of India notes that the country contains “thousands of distinct and unique … languages, religion [and] customs [that] differ from place to place within the country.”

    To bring the vast knowledge of these many differing cultures to the world requires volunteers who both understand individual languages, religions and cultures of India and have high-quality sources at their fingertips to cite on Wikipedia. It’s the latter requirement that’s often challenging for local editors: high-quality academic articles are often behind paywalls, which make them difficult to obtain for editors who want to share their knowledge with the world.

    But now, the editing communities within India have one more resource for finding academic articles: the Wikipedia Library (TWL), which provides experienced Wikipedia editors with access to high-quality sources to use on the encyclopedia.

    Over the past year, TWL has been working with Indic-language Wikimedia communities to create new partnerships and run events. The Indian communities have adapted the TWL model, which provides free access to reliable sources through partnerships with journals and publishers. This is a model that can be used by other communities that want to add additional sources to their language Wikipedia. We’ll start with how Wikimedia India collaborated with an academic journal to provide accounts for Wikipedia editors.

    How Hindi Wikipedia adapted TWL for their own uses

    In 2018, a branch of TWL was created on the Hindi Wikipedia—the first in any Indian language—by Shweta Yadav, a volunteer coordinator.

    A volunteer with Wikimedia India then worked with TWL to sign an agreement with the academic social sciences journal Economic and Political Weekly. The agreement, the first of its kind in India, provided 10 free accounts from the journal for Wikipedians editing topics related to India. Wikipedia editors can use these accounts to access the journal’s latest issues as well as its archives dating back to the late 1940s. Additionally, EPW editorials are translated into seven Indian languages (Gujarati, Hindi, Tamil, Kannada, Telugu, Marathi, and Odia) and can be used as references on Wikimedia projects in those languages.

    A conference to share learnings across the sub-continent

    Once they successfully partnered with an academic journal, editors across the sub-continent began to discuss a conference focused on the future of reference support.

    TWLCon (short for The Wikipedia Library Conference) was conceptualised in early 2018 by Krishna Chaitanya Velaga, a volunteer Outreach Coordinator for TWL, as the first-ever national TWL convening. The conference, funded partly by a Wikimedia Foundation Rapid Grant and partly by CIS-A2K, brought over 20 people to Pune, India, in January 2019, where it was hosted at Gokhale Institute of Politics and Economics (GIPE), one of the most prominent economics research institutes in India. There, members of the Indian Wikipedia community discussed ways to create additional partnerships and resources for their members, as well as ways to participate in the #1Lib1Ref campaign, where librarians are asked to add references to Wikipedia.

    What was initially conceived as just a brainstorming session evolved into a packed two-day conference featuring community members from over 10 Wikipedia languages. It created an open, collaborative and thematic space for editors to learn, teach, and strategize ideas on prospective TWL projects.

    “TWL is a unique and impressive project with a lot of scope in [the] future,” says Rupika Sharma, one of TWLCon’s participants. “If developed proactively, this can be [a] turning point in bringing libraries and Wikimedia to collaborate for fruitful partnerships in the coming time. The paid journals collaborating with TWL is the ongoing success that can be made even more successful by encouraging local Indic language Wikipedia to improve their content.”

    Shweta Yadav, another TWLCon participant said, “Citations are the backbone of Wikipedia, and libraries/librarians can play a major role in providing the resources and expertise. Therefore, we need to introduce more and more librarians to Wikipedia and steps like [the] resource exchange, digitization of libraries and #1Lib1Ref not only help improve Wikipedia but also attract librarians towards editing.”

    One of the main takeaways of TWLCon was the significance of having a regional gathering where participants could mutually understand and reflect on the needs of their own community, and adapt their plans accordingly.

    The Indian communities are now moving ahead with their next projects, including the creation of a Punjabi and a Bengali TWL branch. There is a lot to look forward to.

    Aaron Vasanth, Global Coordinator, The Wikipedia Library, Community Programs
    Sam Walton, Program Officer, The Wikipedia Library, Community Programs

    If you’re interested in creating a local Wikipedia Library branch or have any questions about The Wikipedia Library, feel free to get in touch with the team at wikipedialibrary@wikimedia.org.

    This blog post has been updated to clarify the history of TWL and the Hindi Wikipedia. 

    Per Princeton instructor and African American Studies scholar Dr. Wendy Belcher, “African thought continues to be marginalized, even though radical black intellectuals have shaped a number of social movements and global intellectual history. African youths are innovating new models that are revolutionizing the sciences, law, social and visual media, fashion, etc.” She has taught multiple courses on African literature and in the fall of 2018, taught a class on radical African thought and revolutionary youth culture, where she tasked students with creating Wikipedia articles. This post looks at three articles created by these students.

    Emmanuel Blayo Wakhweya was a Ugandan politician and economist who served as the Ugandan Minister of Finance under Idi Amin from 1971 until his high-profile defection in London in 1975. Born and educated in Uganda, Wakhweya became a district administrative officer after completing his Master’s at Makerere University. He then became the Assistant Secretary to the Treasury in the Milton Obote administration, where his abilities caused him to rise in the ranks until he was appointed the Minister of the Treasury (1969) and then the Minister of Finance after Idi Amin’s 1971 Ugandan coup d’état. Amin wanted Wakhweya to help fix Uganda’s crumbling economy, which was now facing several complications such as high state spending. This chaos would result in Wakhweya defecting about four years later in early 1975, with him stating that he “can’t imagine how the ordinary people are still able to carry on because of the shortages of the simplest essentials of life and the soaring cost of living. Uganda is facing economic catastrophe. Either the economic forces will compel Amin to change his policies or there will be an explosion in the country because of popular discontent.” These actions were denounced in Uganda and any of his relatives who remained in Uganda were imprisoned, tortured, and killed. Now exiled from Uganda, Wakhweya moved to the United States, where he worked for the World Bank and the United Nations Economic Commission for Africa. After he retired, Wakhweya was able to return to Uganda, where he died in 2001.

    Another new article that students created was one on the short-lived French literary journal Tropiques, which was founded by Martinican intellectuals such as Aimé and Suzanne Césaire. Published in Martinique from 1941 to 1945, the journal’s issues contained poetry, essays, and fiction, and, thanks to the contributions of surrealist André Breton, it became a leading voice of surrealism in the Caribbean. Other topics discussed in Tropiques included colonialism, a vital topic for both its readers and writers, as Martinique was controlled by the French State during this time. This Vichy-supported government tried to shut down the journal and even tried to deny it paper for publishing. This did not work, as Tropiques simply resumed publication once the Free French arrived and even released a double issue to make up for the prior censored publication.

    Finally, student movements in Uganda are a vital part of the country’s history and culture. Dating back to the 1930s, students have fought against various forms of oppression and injustice, including colonial rule and, most recently, a tax on social media that critics state is aimed at silencing protesters. Students and instructors at Makerere University have often participated in these protests, and in 2016 the university was closed in response to a student- and instructor-led strike over finance concerns, budget cuts, and tensions over the Kasese Massacre.

    For some, Wikipedia is the easiest way to learn about a new concept or topic, which is why contributions by students and instructors using the site as an educational tool can make such a big difference in the world. If you would like to include Wikipedia writing as a learning tool with your class, visit teach.wikiedu.org to find out how you can gain access to tools, online trainings, and printed materials.


    Image: File:Makerere University, Main Administration Block(main building).JPG, Eric Lubega and Elias Tuheretze, CC BY-SA 3.0, via Wikimedia Commons.

    Work progresses on CI tool evaluation

    15:13, Thursday, 14 March 2019 UTC

    The working group to consider future tooling for continuous integration is making progress (see previous blog post J148 for more information). We're looking at and evaluating alternatives and learning of new needs within WMF.

    If you have CI needs that are not covered by building from git in a Linux container, we would like to hear from you. For example, building iOS applications is difficult without a Mac/OS X build worker, so we're looking into what we can do to provide that. What else is needed?

    We're currently aiming to make CI much more "self-serve" so that as much as possible can be done by developers themselves, without having to go through the Release Engineering team.

    Our list of candidates includes systems that are not open source or are "open core" (open source, but with optional proprietary parts). We will be self-hosting, and open source is going to be a hard requirement. "Open core" may be an acceptable compromise for a system that is otherwise very good. We want to look at all alternatives, however, so that we know what's out there and what's possible.

    We track our work in Phabricator, ticket T217325.

    2019: Next steps

    13:29, Thursday, 14 March 2019 UTC

    We’re excited to announce that our recent Project Grant proposal has been approved. 🙂 This means there will be lots of improvements coming up in 2019, with focus on improving stability and the upload experience for users.

    Our first priority will be rewriting the legacy backend code to adhere to modern standards and reduce complexity (especially the network layer, which currently uses a deprecated API). This is aimed at resolving a few major lingering bugs (especially upload failures for a few users), as well as creating a solid technical foundation to base future improvements on. Several new features are slated for release after that, including filters and bookmarks for the “Nearby places that need pictures” feature, a pause and resume function for uploads, and a “limited connection” mode.

    Thank you so much to everyone who has supported us thus far, especially in the last rocky year! 🙂 At the conclusion of this grant, we hope to deliver a much better app to you.

    In Fall 2018, Dr Rebecca Barnes of Colorado College began asking her environmental science students to write Wikipedia pages for women scientists. In response, her students have risen to the occasion, producing a total of 52 new biographies for women in a wide variety of STEM fields since then.

    “I am excited for my students to learn about the myriad of paths scientists take in their lives – what they study – how they have contributed to their field – why they got interested in science and what led them to becoming a scientist in the first place,” wrote Dr. Barnes at the inception of the project.

    She even reached out to her following on Twitter to brainstorm a list of scientists in need of a biography on the fifth most visited site on the internet – leading to a collaborative Google Doc documenting the work that needs to be done.

    Biographies students wrote

    Every student from her most recent course this Spring created brand new articles for women scientists.

    Erika Marín-Spiotta “is a biogeochemist and ecosystem ecologist. … She is best-known for her research of the terrestrial carbon cycle and is an advocate for underrepresented groups in the sciences, specifically women.”

    Claudia Benitez-Nelson “is an Associate Dean in the College of Arts and Science and professor in the School of the Earth, Ocean & Environment at the University of South Carolina. Her research is in chemical oceanography and marine biogeochemistry.”

    Jill S. Baron “is an American ecosystem ecologist specializing in studying the effects of atmospheric nitrogen deposition in mountain ecosystems.”

    Sharon J. Hall is an ecosystem ecologist and Associate Professor at the School of Life Sciences at Arizona State University. Her research focuses on ecosystem ecology and the ways that human activity interacts with the environment.

    From left: Erika Marín-Spiotta, Claudia Benitez-Nelson, Jill S. Baron, Sharon J. Hall.

    …to name a few!

    Why is this important?

    Dr. Barnes was initially inspired to pursue this project after reading about Dr. Jess Wade’s Wikipedia work (Dr. Wade wrote 270 biographies of women in science in one year!) featured in the Guardian. Dr. Barnes had been following her on Twitter, along with her other “Twitter inspiration” (as she calls her) Dr. Maryam Zaringhalam – both of whom have been fierce advocates for inclusion in science.

    “I thought – I can do this. Better yet, I work at a liberal arts college – my students can also do this! Why? …to increase students’ sense of belonging and to pay it forward.”

    Dr. Jess Wade and Dr. Maryam Zaringhalam even spoke to Dr. Barnes’ first class via Skype. They “discussed why on earth we are doing this and why they have dedicated many of their precious hours to writing Wikipedia biographies of women in STEM,” wrote Dr. Barnes.

    Dr. Jess Wade and Dr. Maryam Zaringhalam talking to Dr. Barnes’ class over Skype.
    Image: File:Dr. Jess Wade and Dr. Maryam Zaringhalam join a Colorado Class writing biographies of Women in STEM.jpg, Waterbarnes, CC BY-SA 4.0, via Wikimedia Commons.

    Having students write Wikipedia biographies for women in STEM not only demonstrates to them that diversity and inclusion belong in STEM, it asserts that to the world. “Research illustrates that a sense of belonging is critical to success (e.g. Dennehy & Dasgupta 2017 PNAS),” writes Dr. Barnes. And less than 18% of biographies on Wikipedia are about women. If the fifth most visited site on the internet reflects the world’s gendered biases, what effects will that continue to have on the future of women in STEM? How can we be catalysts for change?

    What do students think?

    Recognizing the accomplishments of women in STEM shows young people what career paths are open to them. It challenges stereotypes about what a scientist looks like. And it shows young women what’s possible. Dr. Barnes’ students say that the exercise “humanizes science” for them.

    “Gender, race, class – they affect every bit of our society, including science; this is my attempt at explicitly including an ongoing discussion of these topics in my courses,” says Dr. Barnes.

    And some of these new Wikipedia biographies have already been viewed more than 100 times. Keep up the great work!


    Interested in teaching with Wikipedia in your own classroom? Visit teach.wikiedu.org for everything you need to know.


    Interested in learning how to write Wikipedia yourself? Visit learn.wikiedu.org for more information about our online courses.


    Header image: Dr. Barnes, rights reserved.

    Wiki Education publishes program evaluation update

    21:39, Tuesday, 12 March 2019 UTC

    In early 2018, Wiki Education piloted a new program, which we called Wikipedia Fellows, and which we’ve since re-launched as Wiki Scholars & Scientists. In the program, we empowered subject matter experts to contribute their knowledge to Wikipedia through a structured online synchronous 12-week course. We ran one course in early 2018, after which we published an evaluation report. In the intervening months, we’ve run 11 additional courses, and today, we’ve published an update to our evaluation report documenting our additional learnings.

    We ran six courses in summer 2018 and five courses in fall 2018, all told training an additional 163 subject matter experts to contribute content to Wikipedia. Participants added 265,000 words to 572 articles, including creating 65 new articles, and demonstrated the impact that bringing academic expertise to Wikipedia can have. This program tackles article topics that are otherwise hard for existing Wikipedians — who may not have the subject matter expertise — or new academics — who may not have the Wikipedia knowledge — to improve. Examples improved through the program include:

    • Feminist poetry is a new article created by a program participant. Creating a new article on a broad topic like this requires a broad understanding of the subject-matter, which is the kind of thing an expert can provide.
    • The hometown association article was heavily tagged and poorly organized. It was the kind of article that accretes content over time, but lacks coherence. A program participant was able to put the pieces together and give it the coherence it was missing.
    • Bette Korber‘s biography, which was created by a program participant, successfully captures her achievements and puts them in the proper context. Again, it’s easy to write a biography as a series of events that give little sense of the importance of their work. It’s harder to put that in context, and show the most important aspects of their professional achievements. It’s the kind of thing an expert, who understands the importance and can contextualize it, is better-equipped to do than someone with less breadth and depth of understanding.

    That’s why we think this program is so meaningful to improving Wikipedia content — and why we hope other Wikimedia organizations will adopt similar programs in their language Wikipedias. In order to facilitate this happening, we’ve published an update to our pilot evaluation, with more details on what we learned in the next 11 cohorts, and an explanation of where we are taking the program in the future. The report was written by Program Managers Ryan McGrady and Will Kent, with input from Senior Wikipedia Expert Ian Ramjohn and Director of Partnerships Jami Mathewson. We firmly believe detailed evaluation reports like this are an important part of participating in the Wikimedia movement, and we are committed to continually documenting our learnings for the benefit of others in the Wikimedia movement. I’ll be speaking about this program at the forthcoming Wikimedia+Education Conference in Donostia, in an effort to encourage more Wikimedia education groups to adopt our model.

    Emilee Helm is a student at the University of Washington. This term, she learned how to create and expand articles on Wikipedia as an assignment in Nathan TeBlunthuis’ Interpersonal Media – Online Communities course. Here, she reflects on what she got out of the experience.

    University of Washington student Emilee Helm.
    Image: File:Headshot-Helm E.jpg, Emileehelm, CC BY-SA 4.0, via Wikimedia Commons.

    When I began working with Wikipedia, I could not have imagined I would be so satisfied with my experience. Considering the website has been in existence nearly my whole life, I have known about Wikipedia for as long as I can remember, utilizing it often. However, the thought of contributing to the site never crossed my mind until this quarter. It was an experience that seemed beyond me—the interface looks old, and something as simple as having to know basic code felt frightening. It was intimidating and felt risky; I did not want to mess it up, or worse, have my ideas rejected by the tight-knit community. With all the tools and assistance provided, I was able to gain confidence and develop a final product that I am undoubtedly proud of.

    Part of Wikipedia’s charm is the ability to write about nearly any topic imaginable. I chose to explore topics in online gaming, creating an article for a popular Twitch.tv streamer. Online gaming and its community are dear to me, and to be able to write about a figure in that realm was really exciting. By finding this intrinsic motivator, my urge to develop a polished article became much stronger than I ever thought it could.

    The Wiki Education tutorials and trainings were pivotal to my success, working to bridge the gap between the confusing Wikipedia interface and the creative process of writing an article. This training provided the opportunity to practice contributing without the fear of ruining an article. These low-risk exercises encouraged me to keep going and allowed me to build confidence in using the system. However, not everyone has the privilege of a university course to provide additional structure to the trainings. As Wikipedia continues to push for more editors, they may benefit from a program like Wiki Education for new users: A place with a clean, easy-to-use interface that allows users to learn community expectations, test their skills and immerse themselves into the community at whatever pace they see fit.

    In my classroom, we spent a lot of time deciphering the many motivations that help retain members of a community and the idea of intrinsic motivation was a factor present in nearly all of them. As I continued to work, I began to embody these intrinsic philosophies. The task at hand became less about the required course work and more about a topic and community that I cared about. We additionally discussed “fun” as being one of the biggest motivators to drive participation in online communities. Perhaps Wikipedia could capitalize on “fun” by helping new users get acquainted with articles, people and community projects that align with their interests from the start. This could be done through a simple survey upon registration, for example. The intrinsic motivations of potential editors are challenged by the user-experience. I think a process such as this would instill a sense of purpose in users from the beginning.

    While Wikipedia has its own communities nested within the broader website, many of these groups are inactive. I believe participation would benefit from a more user-friendly interface. By removing the need for HTML to utilize talk pages and respond to ideas and issues, new users may feel better equipped to participate in conversations and ask questions. I enjoyed using the Wiki Education Dashboard because it was pleasing to look at and easy to follow; it is hard to say the same about Wikipedia.

    Throughout this experience, I learned the importance of notability, sourcing, and producing quality content. Wikipedia is much more than a place where anyone can write anything they want on an article. Rather, Wikipedia is a space for knowledge to be shared and refined. My perceptions shifted as I was challenged by my instructor and the Wikipedia Expert supporting my course to establish notability more clearly in my article. It takes more than popularity to establish noteworthiness — there has to be evidence of impact by the individual. Wikipedia’s standards aim to meet the criteria of an encyclopedic text, providing reputable information for its visitors.

    After my experience, I plan to contribute to places to which I feel drawn, to make simple edits in passing, and to provide feedback if I feel compelled. This new understanding of what goes on behind the scenes of each article has completely altered my perspective of what Wikipedia’s mission is. Furthermore, I have been equipped with the proper tools to help. My hope for Wikipedia is that they continue to make improvements that facilitate participation through a user-friendly experience.


    Interested in incorporating a Wikipedia writing assignment into your own course? Visit teach.wikiedu.org for all you need to know to get started.

    The anatomy of search: A place for my stuff

    16:00, Tuesday, 12 March 2019 UTC

    A galloping overview

    As we have done before, let’s get a bird’s-eye view of the parts of the search process: text comes in and gets processed and stored in a database (called an index); a user submits a query; documents that match the query are retrieved from the index, ranked based on how well they match the query, and presented to the user. That sounds easy enough, but each step hides a wealth of detail. Today we’ll focus on the step where “text is stored in an index”.

    Also keep in mind that, as discussed in the first installment in this series, humans and computers have very different strengths, and what is easy for one can be incredibly hard for the other.

    A place for my stuff

    In previous installments, we’ve looked at how we break text into tokens (which are approximately words,[1] in English), normalize them (by converting to lowercase, removing diacritics, and more), remove stop words, and stem the remainder (converting them to their root forms). Now that we have our normalized stemmed words, where do we put them? In an inverted index!

    An inverted index is a more complex, database-like version of the kind of index you find in the back of a book or—as more mature readers may recall using at some point—a card catalog. At a minimum, we want to record where each word occurred, so we can find it again easily.

    Depending on the search features you want to support, you might just record the document ID[2] for each occurrence of a word, though typically the word number (roughly, “it’s the 354th word”) is recorded for more complex searching, and the character offset (“it spans characters 93 through 98”) is also recorded to facilitate, for example, highlighting in search results.

    Let’s take a look at how this works with a couple of sample documents.

    • Document 1: “A dog chased the cats.”
    • Document 2: “The cat chases the mice.”

    After tokenizing, normalizing (here, just lowercasing), removing stop words (a, and the), and stemming (chased/chaseschase, catscat, micemouse), we could have the following inverted index, where “D1:W2” means “in document #1, this is word #2”:

    • cat: D1:W5, D2:W2
    • chase: D1:W3, D2:W3
    • dog: D1:W2
    • mouse: D2:W5

    The words in the index are not necessarily stored in alphabetical order as shown here, or like the index in the back of a book would be, but often use more complex data structures to maximize lookup speed. Really, that’s pretty much it for the basics of building an index, because all the hard work was figuring out what words to put in the index—by tokenizing, normalizing, dropping stop words, stemming, etc.
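    As a rough illustration, here is a minimal sketch in Rust of how that bookkeeping could be done for the two sample documents. The tokenizer, stop-word list, and stemmer are toy stand-ins with just enough rules for this example; a real analysis chain does far more.

        use std::collections::HashMap;

        fn main() {
            let docs: Vec<(u32, &str)> = vec![
                (1, "A dog chased the cats."),
                (2, "The cat chases the mice."),
            ];
            let stop_words = ["a", "the"];
            // Toy stemmer: just enough mappings for this example.
            let stem = |w: &str| -> String {
                match w {
                    "chased" | "chases" => "chase".to_string(),
                    "cats" => "cat".to_string(),
                    "mice" => "mouse".to_string(),
                    other => other.to_string(),
                }
            };

            // term -> postings list of (document id, 1-based word number)
            let mut index: HashMap<String, Vec<(u32, usize)>> = HashMap::new();
            for (doc_id, text) in &docs {
                let tokens = text
                    .split(|c: char| !c.is_alphanumeric())
                    .filter(|t| !t.is_empty());
                for (pos, token) in tokens.enumerate() {
                    let normalized = token.to_lowercase();
                    // Stop words still count toward word positions, but are not indexed.
                    if stop_words.contains(&normalized.as_str()) {
                        continue;
                    }
                    let term = stem(normalized.as_str());
                    index.entry(term).or_default().push((*doc_id, pos + 1));
                }
            }

            // Prints, for example: cat: [(1, 5), (2, 2)] -- that is, D1:W5, D2:W2.
            // (Sorted here only to get stable output.)
            let mut terms: Vec<_> = index.into_iter().collect();
            terms.sort();
            for (term, postings) in terms {
                println!("{}: {:?}", term, postings);
            }
        }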

    The first page of the index of Martino Martini’s Novus Atlas Sinensis, published in 1655.

    I want more, more, more

    As mentioned above, we might want to record more information than just the document ID and word position. Character offset—e.g., dog occupies the 3rd through 5th characters[4] of Document 1 above—can be used for highlighting search terms in text snippets. Character offsets are especially useful when your original word may not be the exact word that was indexed, as is the case with stemmed words and thesaurus terms. For example, our mini index above says that cat is word 5 in document 1, but the actual word in the original text is cats, so knowing its offset (18–21) makes it easier to highlight. Also, if you have indexed lawyer when your original word was attorney, you are really going to need those offsets to know where it occurred in the original text.
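    Continuing the toy sketch above, a posting that also carries character offsets might look like this. The counting is 1-based to match the prose; footnote 4 below explains why real engines usually count from zero.

        // A posting that records character offsets alongside the word number, so
        // the original surface form ("cats") can be highlighted even though the
        // indexed term is the stem ("cat").
        struct Posting {
            doc_id: u32,
            word_pos: usize,   // 1-based word number within the document
            start_char: usize, // first character of the original word
            end_char: usize,   // last character of the original word
        }

        fn main() {
            let cats = Posting { doc_id: 1, word_pos: 5, start_char: 18, end_char: 21 };
            println!(
                "cat -> D{}:W{}, chars {}-{}",
                cats.doc_id, cats.word_pos, cats.start_char, cats.end_char
            );
        }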

    An index can also keep track of additional positional information, like which sentence, paragraph, or section a word was found in, which could be useful for scoring, though nowadays search engines often don’t bother with sentence- or paragraph-level tracking, and as a result you will sometimes see the last word of one sentence and the first word of the next highlighted.[5] That doesn’t usually indicate the same quality of match as two adjacent words in the same sentence, but it’s a shortcut that works well enough most of the time. Search on Wikipedia does, however, give extra weight during scoring to matches found in the title or in the opening text of an article.

    Depending on the use cases you want or need to support there are several ways of keeping track of these different kinds of information. You could annotate a word in the index to indicate that it’s a stemmed word or thesaurus word (and not the original word), which might then be used during scoring and ranking. You could also annotate structural information—like the fact that a word came from the title of an article—or you could incorporate that information into the term itself, indexing something analogous to title:dog to keep track of instances of dog that appear in titles. Another option would be to have a completely separate index just for titles. As with any software, different use cases can point toward different architectures. A title-only index is going to be smaller and may return results faster, but it might also make it harder to combine title-based results with results from another index that contains everything but the title.

    Despite the potential complications, it is often worthwhile to have multiple indexes because they can allow you to search in very different ways.

    Wisdom of the crowd

    One of the problems with stop words is that, albeit rarely, important phrases can be made up entirely of stop words, like the opening words of Hamlet’s famous soliloquy: “To be, or not to be”. It’s unusual for words that usually carry little specific meaning to end up being so weighty!

    For on-wiki search we solve the stop word problem by having two indexes, which we call “text” and “plain”. All of the processing we’ve talked about so far is applied to the text field, while the plain field only undergoes tokenization and normalization—no stemming is applied and no stop words are removed—so to be or not to be is readily found in the plain index.
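    Here is a minimal sketch of that idea, with toy stand-ins for the two analysis chains (the real analyzers do far more than lowercasing and a short stop-word list): the same phrase survives intact in the plain field but vanishes entirely from the text field.

        // Toy "analyzers": the text pipeline drops stop words (and would also stem),
        // while the plain pipeline only tokenizes and lowercases.
        fn analyze(input: &str, remove_stop_words: bool) -> Vec<String> {
            let stop_words = ["a", "an", "the", "to", "be", "or", "not", "and"];
            input
                .split(|c: char| !c.is_alphanumeric())
                .filter(|t| !t.is_empty())
                .map(|t| t.to_lowercase())
                .filter(|t| !remove_stop_words || !stop_words.contains(&t.as_str()))
                .collect()
        }

        fn main() {
            let phrase = "To be, or not to be";
            println!("text field:  {:?}", analyze(phrase, true));  // [] - nothing left to match
            println!("plain field: {:?}", analyze(phrase, false)); // ["to", "be", "or", "not", "to", "be"]
        }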

    In the more typical case, we combine the scoring results from matches in the text field and matches in the plain field to rank the final results. So while the searches hope and change, and hoping changing—both of which are reduced to hope change in the text field—get the same number of results on English Wikipedia (85,736 at the time of writing), the ordering of the results is different. “Hope and Change” is a redirect to the article on the “Barack Obama 2008 presidential campaign”, so matching that exact phrasing pulls it to the top of the results. The song “Everybody’s Changing” (from the album Hopes and Fears) jumps from somewhere between #1400 and #1500 for hope and change to #2 for hoping changing, in large part based on the exact match to one of the words in the title.

    We have other indexes that support search for other forms of the text found in the various wikis. The keyword insource: searches the raw wikitext version of an article, which allows you to match not only markup (like template names and specific tags) but also “hidden” text—like text in links or comments—that a reader doesn’t normally see, but which an editor might find quite interesting. The keyword intitle: is similar, but only applies to titles and redirects. Both intitle: and insource: support regular expression searches, too, which use specialty indexes that allow you to search for really complex patterns.[6] There are keywords that search indexes for category tags (incategory:) and templates (hastemplate:) found in each article as well.

    Of course there are also indexes for different collections of documents (as opposed to multiple indexes of the same documents), such as for each namespace (User, Talk, Help, etc., etc.)

    Further reading / Homework

    If you can’t wait for next time, I put together a video back in January of 2018, available on Commons, that covers the Bare-Bones Basics of Full-Text Search. It starts with no prerequisites, and covers tokenization and stemming, inverted indexes, basic boolean and proximity retrieval operations, TF/IDF and the vector space model of similarity, field-level indexing, using multiple indexes, and then touches on some of the elements of scoring.

    Up next

    In my next blog post, we’ll look at query processing and basic boolean retrieval operations.

    Trey Jones, Senior Software Engineer, Search Platform
    Wikimedia Foundation

    • • •

    Footnotes

    1. In most texts in English and other European languages, most tokens are words, and most words are tokens, so the two terms get used interchangeably. I tend to use “words” rather than “tokens” because the term is more familiar to most readers, but in the context of searching and indexing, I really mean “tokens”—see the blog post “A token of my affection” for more details on tokenization and why it’s harder than it looks.

    2. Document IDs are often just numbers, assigned as the documents are added to the index. Wikipedia and other wiki projects work that way, so the ID for the article on inverted indexes is 3125116. Using increasing numbers is convenient because the IDs are guaranteed to be unique. Titles or any other attribute may not be guaranteed unique—Wikipedia article titles happen to be unique, but in another data set—like general web pages—they need not be. Using increasing numbers also means that some numbers may be unassigned, because the corresponding document has been deleted. So, English Wikipedia doesn’t have an article with ID #1. At the moment, the smallest ID still in use is 10, for AccessibleComputing,[3] which is now a redirect to Computer accessibility.

    3. Note that “AccessibleComputing” is in CamelCase, which was used in the very early days of wikis to indicate links to other topics. Thanks to the fact that we still have the complete edit history of the AccessibleComputing page, we can see the first version of it, which does seem to have used CamelCase for links, including “LegalIssuesInAccessibleComputing”—which makes it clear why CamelCase was ditched in favor of more explicit linking syntax.

    4. Readers who are detail-oriented may have counted to make sure I got my numbers right. Readers who are programmers will complain that the correct span is “2nd through 4th” because counting starts at 0 in many programming languages. In fact, that may be what actually happens internally in some/many/most search engines[citation needed]; it’s what our underlying search engine, Elasticsearch, does.

    The practice stems from the computational stone age, when computers were literally millions of times slower than today and even “subtract one” was a moderately costly operation. Since the first item in a list is right there at the beginning of the array, you don’t need to do anything to get to it. To get to the second item, you just have to skip the first. To get to the 15th, you have to skip 14. Back in the old days, it was worth it to instead tell the computer that the beginning of the list starts at 0, so you skip 0 things to get to it, the next one is #1, and you skip 1 to get to it. To get to #14, you skip 14, etc. Today, it may seem kind of silly—but it’s tradition!

    5. As an example, if you search English Wikipedia for snails dominated, you might be hoping to find an article about a time where snails dominated something or other. Instead you get this snippet as the top result: “in the markets and its surrounding streets, from car parts to land snails. Dominated by women traders, the market sells fresh produce, manufactured and”. To be fair, I had to go looking for an example that would behave like this, but they do crop up from time to time in the wild, on-wiki and in other search engines.

    6. The details of regular expressions and regular expression searching are a bit beyond the scope of this blog post, but if you are familiar with the basics of regular expressions, the specialty index we use is pretty neat! For really complex patterns, there’s nothing to do but skim through all the text looking for spans of text that match the pattern. This is very inefficient and on big wikis like the larger Wikipedias, it will time out and not return full results. One trick we use is a trigram index, which tokenizes documents into overlapping three-character sequences and indexes those. We parse regex queries and try to find trigrams in them that we can use to limit the scope of what has to be inefficiently scanned. For example, descendant and descendent are both relatively common, so you might try to search for terms related to both on Wiktionary with the query intitle:/descend[ae]nts/. The search engine first finds documents that have all of these trigrams in the title: des, esc, sce, cen, end, and nts, and then scans just those for the full pattern—it’s a lot less work that way!
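    As a rough sketch of the trick (ignoring Unicode niceties, word boundaries, and how the literal runs are actually pulled out of a parsed regex), extracting trigrams from the fixed parts of a pattern might look like this:

        // Overlapping three-character windows over a literal string.
        fn trigrams(text: &str) -> Vec<String> {
            let chars: Vec<char> = text.to_lowercase().chars().collect();
            chars.windows(3).map(|w| w.iter().collect::<String>()).collect()
        }

        fn main() {
            // For /descend[ae]nts/ the literal runs are "descend" and "nts"; their
            // trigrams are looked up in the trigram index, and only the documents
            // containing all of them get the expensive full regex scan.
            println!("{:?}", trigrams("descend")); // ["des", "esc", "sce", "cen", "end"]
            println!("{:?}", trigrams("nts"));     // ["nts"]
        }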

    Of course, regexes over titles is faster and easier than regexes over the full text of Wikipedia articles, just because there is so much less text to scan when limited to titles. And the trigram trick can’t help you if you use a crazy regex like /[Ѐ-ԯA-Za-zÀ-ɏɐ-ʯ]*([Ѐ-ԯ][A-Za-zÀ-ɏɐ-ʯ]|[A-Za-zÀ-ɏɐ-ʯ][Ѐ-ԯ])[Ѐ-ԯA-Za-zÀ-ɏɐ-ʯ]*/, which I sometimes do in my volunteer capacity, to find and fix errant Latin/Cyrillic homoglyphs. Everybody needs a hobby.

    Dealing with the Rust

    15:52, Tuesday, 12 March 2019 UTC

    Rust is an up-and-coming programming language, developed by the Mozilla Foundation, and is used in the Firefox rendering engine, as well as the Node Package Manager, amongst others. There is a lot to say about Rust; suffice it to say that it’s designed to be memory-safe, fast (think: C or better), it can compile to WebAssembly, and has been voted “most loved language” on StackOverflow in 2017 and 2018. As far as new-ish languages go, this is one to keep an eye on.

    Rust comes with a fantastic package manager, Cargo, which, by default, uses crates (aka libraries) from crates.io, the central repository for Rust code. As part of my personal and professional attempt at grasping Rust (it has a bit of a learning curve), I wrote a tiny crate to access the API of MediaWiki installations. Right now, it can

    • run the usual API queries, and return the result as JSON (using the Rust serde_json crate)
    • optionally, continue queries for which there are more results available, and merge all the results into a single JSON result
    • log in your bot, and edit

    This represents a bare-bones approach, but it would already be quite useful for simple command-line tools and bots. Because Rust compiles into stand-alone binaries, such tools are easy to run; no issues with Composer and PHP versions, node.js version/dependency hell, Python virtualenvs, etc.

    The next functionality to implement might include OAuth logins, and a SPARQL query wrapper.

    If you know Rust, and would like to play, the crate is called mediawiki, and the initial version is 0.1.0. The repository has two code examples to get you started. Ideas, improvements, and Pull Requests are always welcome!

    Update: A basic SPARQL query functionality is now in the repo (not the crate – yet!)

    Associate Professor Dr. Tamar Carroll and Librarian Lara Nicosia use our resources to teach students at Rochester Institute of Technology how to edit and create new Wikipedia pages related to women’s and gender history. Here they reflect on why having students improve the living, public archive is so important. 

    Movements like #MeToo are drawing increased attention to the systemic discrimination facing women in a range of professional fields, from Hollywood and journalism to banking and government.

    Discrimination is also a problem on user-driven sites like Wikipedia. Wikipedia is the fifth most popular website worldwide. In January, the English-language version of the online encyclopedia had over 8.2 billion page views, more than 2000 percent higher than other online reference sites such as IMDb or Dictionary.com.

    The volume of traffic on Wikipedia’s site – coupled with its integration into search results and digital assistants like Alexa and Siri – makes Wikipedia the predominant source of information on the web. YouTube even started including Wikipedia links below videos on highly contested topics. But studies show that Wikipedia underrepresents content on women.

    At the Rochester Institute of Technology, we’re taking steps to empower our students and our global community to address issues of gender bias on Wikipedia.

    Signs of bias

    Driven by a cohort of over 33 million volunteer editors, Wikipedia’s content can change in almost real time. That makes it a prime resource for current events, popular culture, sports and other evolving topics.

    But relying on volunteers leads to systemic biases – both in content creation and improvement. A 2013 study estimated that women only accounted for 16.1 percent of Wikipedia’s total editor base. Wikipedia co-founder Jimmy Wales believes that number has not changed much since then, despite several organized efforts.

    If women don’t actively edit Wikipedia at the same rate as men, topics of interest to women are at risk of receiving disproportionately low coverage. One study found that Wikipedia’s coverage of women was more comprehensive than Encyclopedia Britannica online, but entries on women still constituted less than 30 percent of biographical coverage. Entries on women also more frequently link to entries on men than vice-versa and are more likely to include information on romantic relationships and family roles.

    What’s more, Wikipedia’s policies state that all content must be “attributable to a reliable, published source.” Since women throughout history have been less represented in published literature than men, it can be challenging to find reliable published sources on women.

    An obituary in a paper of record is often a criterion for inclusion as a biographical entry in Wikipedia. So it should be no surprise that women are underrepresented as subjects in this vast online encyclopedia. As The New York Times itself noted, its obituaries since 1851 “have been dominated by white men” – an oversight the paper now hopes to address through its “Overlooked” series.

    Categorization can also be an issue. In 2013, a New York Times op-ed revealed that some editors had moved women’s entries from gender-neutral categories (e.g., “American novelists”) to gender-focused subcategories (e.g., “American women novelists”).

    Wikipedia is not the only online resource that suffers from such biases. The user-contributed online mapping service OpenStreetMap is also more heavily edited by men. On GitHub, an online development platform, women’s contributions have a higher acceptance rate than men’s, but a study showed that the rate drops noticeably when the contributor could be identified as a woman through their username or profile image.

    Gender bias is also an ongoing issue in content development and search algorithms. Google Translate has been shown to overuse masculine pronouns and, for a time, LinkedIn recommended men’s names in search results when users searched for a woman.

    What can be done?

    The solution to systemic biases that plague the web remains unclear. But libraries, museums, individual editors and the Wikimedia Foundation itself continue to make efforts to improve gender representation on sites such as Wikipedia.

    Organized edit-a-thons can create a community around editing and developing underrepresented content. Edit-a-thons aim to increase the number of active female editors on Wikipedia, while empowering participants to edit entries on women during the event and into the future.

    Later this month, our university library will host its second annual Women on Wikipedia Edit-a-thon in celebration of Women’s History Month. The goal is to improve the content on at least 100 women in one afternoon.

    For the past four years, students in our school’s American Women’s and Gender History course have worked to create new or substantially edit existing Wikipedia entries about women. One student created an entry on deaf-blind pioneer Geraldine Lawhorn, while another added roughly 1,500 words to jazz artist Blanche Calloway’s entry.

    This class was supported by Wiki Education, which encourages educators and students to contribute to Wikipedia in academic settings.

    Through this assignment, students can immediately see how their efforts contribute to the larger conversation around women’s history topics. One student said that it was “the most meaningful assignment she had” as an undergraduate.

    Other efforts to address gender bias on Wikipedia include Wikipedia’s Inspire Campaign; organized editing communities such as Women in Red and Wikipedia’s Teahouse; and the National Science Foundation’s Collaborative Research grant.

    Wikipedia’s dependence on volunteer editors has resulted in several systemic issues, but it also offers an opportunity for self-correction. Organized efforts help to give voice to women previously ignored by other resources.


    This article is republished from The Conversation under a Creative Commons license. Read the original article from March 2018 here: http://theconversation.com/why-wikipedia-often-overlooks-stories-of-women-in-history-92555.

    Creating a Dockerfile for the Wikibase Registry

    14:51, Monday, 11 2019 March UTC

    Currently the Wikibase Registry (setup post) is deployed using the shoehorning approach described in one of my earlier posts. After continued discussion on the Wikibase User Group Telegram chat about different setups and upgrade woes, I have decided to convert the Wikibase Registry to use the preferred approach of a custom Dockerfile building a layer on top of one of the wikibase images.

    I recently updated the Wikibase Registry from Mediawiki version 1.30 to 1.31 and described the process in a recent post, so if you want to see what the current setup and docker-compose file look like, head there.

    As a summary, the Wikibase Registry uses:

    • The wikibase/wikibase:1.31-bundle image from docker hub
    • Mediawiki extensions:
      • ConfirmEdit
      • Nuke

    Creating the Dockerfile

    Our Dockerfile will likely end up looking vaguely similar to the wikibase base and bundle docker files, with a fetching stage, a possible composer stage, and a final wikibase stage, but we won’t have to redo anything that is already done in the base image.

    FROM ubuntu:xenial as fetcher
    # TODO add logic
    FROM composer as composer
    # TODO add logic
    FROM wikibase/wikibase:1.31-bundle
    # TODO add logic

    Fetching stage

    Modifying the logic used in the wikibase Dockerfile, the extra Wikibase Registry extensions can be fetched and extracted.

    Note that I am using the convenience script for fetching Mediawiki extensions from the wikibase-docker git repo, matching the version of Mediawiki I will be deploying.

    FROM ubuntu:xenial as fetcher
    
    RUN apt-get update &amp;&amp; \
        apt-get install --yes --no-install-recommends unzip=6.* jq=1.* curl=7.* ca-certificates=201* &amp;&amp; \
        apt-get clean &amp;&amp; rm -rf /var/lib/apt/lists/*
    
    ADD https://raw.githubusercontent.com/wmde/wikibase-docker/master/wikibase/1.31/bundle/download-extension.sh /download-extension.sh
    
    RUN bash download-extension.sh ConfirmEdit;\
    bash download-extension.sh Nuke;\
    tar xzf ConfirmEdit.tar.gz;\
    tar xzf Nuke.tar.gz

    Composer stage

    None of these extensions require a composer install, so there will be no composer step in this example. If, for example, Nuke required a composer install, the stage would look like this:

    FROM composer as composer
    COPY --from=fetcher /Nuke /Nuke
    WORKDIR /Nuke
    RUN composer install --no-dev

    Wikibase stage

    The Wikibase stage needs to pull in the two fetched extensions and make any other modifications to the resulting image.

    In my previous post I overrode the entrypoint with something much simpler, removing logic to do with ElasticSearch that the Registry is not currently using. In my Dockerfile I have simplified this even further, inlining the creation of a simple five-line entrypoint that overwrites the one provided by the wikibase image.

    I have left the default LocalSettings.php in the image for now, and I will continue to override this with a docker-compose.yml volume mount over the file. This avoids the need to rebuild the image when all you want to do is tweak a setting.

    FROM wikibase/wikibase:1.31-bundle
    
    COPY --from=fetcher /ConfirmEdit /var/www/html/extensions/ConfirmEdit
    COPY --from=fetcher /Nuke /var/www/html/extensions/Nuke
    
    RUN echo $'#!/bin/bash\n\
    set -eu\n\
    /wait-for-it.sh $DB_SERVER -t 120\n\
    sleep 1\n\
    /wait-for-it.sh $DB_SERVER -t 120\n\
    docker-php-entrypoint apache2-foreground\n\
    ' > /entrypoint.sh

    If the composer stage were used to run a composer command on something that was fetched, then you would likely need to COPY that extension --from the composer layer rather than the fetcher layer.

    Building the image

    I’m going to build the image on the same server that the Wikibase Registry is running on, as this is the simplest option. More complicated options could involve building in some Continuous Integration pipeline and publishing to an image registry such as Docker Hub.

    I chose the descriptive name “Dockerfile.wikibase.1.31-bundle” and saved the file alongside my docker-compose.yml file.

    There are multiple approaches that could now be used to build and deploy the image.

    1. I could add a build configuration to my docker-compose file specifying the location of the Dockerfile (as described here), then build the service image using docker-compose (as described here, and sketched just after this list).
    2. I could build the image separately from docker-compose, giving it an appropriate name, and then simply use that image name (which will exist on the host) in the docker-compose.yml file.
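
    For completeness, option 1 boils down to letting docker-compose drive the build once a “build:” section pointing at the Dockerfile has been added to the service definition; a rough sketch (not what I actually ran):

    # Option 1 (not used here): with a "build:" section referencing
    # Dockerfile.wikibase.1.31-bundle in the wikibase-131 service definition,
    # docker-compose can build and tag the image itself:
    docker-compose build wikibase-131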

    I’m going with option 2.

    docker build --tag wikibase-registry:1.31-bundle-1 --pull --file ./Dockerfile.wikibase.1.31-bundle .

    docker build documentation can be found here. The command tells docker to build an image from the “Dockerfile.wikibase.1.31-bundle” file, pulling new versions of any images being used, and giving the image the name “wikibase-registry” with tag “1.31-bundle-1”.

    The image should now be visible in the docker images list for the machine.

    root@wbregistry-01:~/wikibase-registry# docker images | grep wikibase-registry
    wikibase-registry         1.31-bundle-1       e5dad76c3975        8 minutes ago       844MB

    Deploying the new image

    In my previous post I migrated from one image to another by having two Wikibase containers running at the same time with different images.

    For this image change, however, I’ll be going for more of a “big bang” approach, as I’m pretty confident in the new image.

    The current wikibase service definition can be seen below. This includes volumes for the entrypoint, extensions, LocalSettings and images, some of which I can now get rid of. Also, I have removed the need for most of these environment variables by using my own entrypoint file and overriding LocalSettings entirely.

    wikibase-131:
        image: wikibase/wikibase:1.31-bundle
        restart: always
        links:
          - mysql
        ports:
         - "8181:80"
        volumes:
          - mediawiki-images-data:/var/www/html/images
          - ./LocalSettings.php:/var/www/html/LocalSettings.php:ro
          - ./mw131/Nuke:/var/www/html/extensions/Nuke
          - ./mw131/ConfirmEdit:/var/www/html/extensions/ConfirmEdit
          - ./entrypoint.sh:/entrypoint.sh
        depends_on:
        - mysql
        environment:
          MW_ADMIN_NAME: "XXXX"
          MW_ADMIN_PASS: "XXXX"
          MW_SITE_NAME: "Wikibase Registry"
          DB_SERVER: "XXXX"
          DB_PASS: "XXXX"
          DB_USER: "XXXX"
          DB_NAME: "XXXX"
          MW_WG_SECRET_KEY: "XXXX"
        networks:
          default:
            aliases:
             - wikibase.svc
             - wikibase-registry.wmflabs.org

    The new service definition has an updated image name, removed redundant volumes and reduced environment variables (DB_SERVER is still used as it is needed in the entrypoint I added).

    wikibase-131:
        image: wikibase-registry:1.31-bundle-1
        restart: always
        links:
          - mysql
        ports:
         - "8181:80"
        volumes:
          - mediawiki-images-data:/var/www/html/images
          - ./LocalSettings.php:/var/www/html/LocalSettings.php:ro
        depends_on:
        - mysql
        environment:
          DB_SERVER: "mysql.svc:3306"
        networks:
          default:
            aliases:
             - wikibase.svc
             - wikibase-registry.wmflabs.org

    For the big bang switchover I can simply reload the service.

    root@wbregistry-01:~/wikibase-registry# docker-compose up -d wikibase-131
    wikibase-registry_mysql_1 is up-to-date
    Recreating wikibase-registry_wikibase-131_1 ... done

    Using the docker-compose images command I can confirm that it is now running from my new image.

    root@wbregistry-01:~/wikibase-registry# docker-compose images | grep wikibase-131
    wikibase-registry_wikibase-131_1    wikibase-registry        1.31-bundle-1   e5dad76c3975   805 MB

    Final thoughts

    • This should probably be documented in the wikibase-docker git repo which everyone seems to find, and also in the README for the wikibase image.
    • It would be nice if there were a single place to pull the download-extension.sh script from, perhaps with a parameter for version?

    The post Creating a Dockerfile for the Wikibase Registry appeared first on Addshore.

    wikibase-docker, Mediawiki & Wikibase update

    14:50, Monday, 11 2019 March UTC

    Today on the Wikibase Community User Group Telegram chat I noticed some people discussing issues with upgrading Mediawiki and Wikibase using the docker images provided for Wikibase.

    As the wikibase-registry is currently only running Mediawiki 1.30, I should probably update it to 1.31, which is the next long term stable release.

    This blog post was written as I performed the update and is yet to be proofread, so expect some typos. I hope it can help those that were chatting on Telegram today.

    Starting state

    Documentation

    There is a small amount of documentation in the wikibase docker image README file that talks about upgrading, but this simply tells you to run update.php.

    Update.php has its own documentation on mediawiki.org.
    None of this helps you piece everything together for the docker world.

    Installation

    The installation creation process is documented in this blog post, and some customization regarding LocalSettings and extensions was covered here.
    The current state of the docker-compose file can be seen below with private details redacted.

    This docker-compose file is found in /root/wikibase-registry on the server hosting the installation. (Yes, I know that’s a dumb place, but that’s not the point of this post.)

    version: '3'
    
    services:
      wikibase:
        image: wikibase/wikibase:1.30-bundle
        restart: always
        links:
          - mysql
        ports:
         - "8181:80"
        volumes:
          - mediawiki-images-data:/var/www/html/images
          - ./LocalSettings.php:/var/www/html/LocalSettings.php:ro
          - ./Nuke:/var/www/html/extensions/Nuke
          - ./ConfirmEdit:/var/www/html/extensions/ConfirmEdit
        depends_on:
        - mysql
        environment:
          MW_ADMIN_NAME: "private"
          MW_ADMIN_PASS: "private"
          MW_SITE_NAME: "Wikibase Registry"
          DB_SERVER: "mysql.svc:3306"
          DB_PASS: "private"
          DB_USER: "private"
          DB_NAME: "private"
          MW_WG_SECRET_KEY: "private"
        networks:
          default:
            aliases:
             - wikibase.svc
             - wikibase-registry.wmflabs.org
      mysql:
        image: mariadb:latest
        restart: always
        volumes:
          - mediawiki-mysql-data:/var/lib/mysql
        environment:
          MYSQL_DATABASE: 'private'
          MYSQL_USER: 'private'
          MYSQL_PASSWORD: 'private'
          MYSQL_RANDOM_ROOT_PASSWORD: 'yes'
        networks:
          default:
            aliases:
             - mysql.svc
      wdqs-frontend:
        image: wikibase/wdqs-frontend:latest
        restart: always
        ports:
         - "8282:80"
        depends_on:
        - wdqs-proxy
        environment:
          BRAND_TITLE: 'Wikibase Registry Query Service'
          WIKIBASE_HOST: wikibase.svc
          WDQS_HOST: wdqs-proxy.svc
        networks:
          default:
            aliases:
             - wdqs-frontend.svc
      wdqs:
        image: wikibase/wdqs:0.3.0
        restart: always
        volumes:
          - query-service-data:/wdqs/data
        command: /runBlazegraph.sh
        environment:
          WIKIBASE_HOST: wikibase-registry.wmflabs.org
        networks:
          default:
            aliases:
             - wdqs.svc
      wdqs-proxy:
        image: wikibase/wdqs-proxy
        restart: always
        environment:
          - PROXY_PASS_HOST=wdqs.svc:9999
        ports:
         - "8989:80"
        depends_on:
        - wdqs
        networks:
          default:
            aliases:
             - wdqs-proxy.svc
      wdqs-updater:
        image: wikibase/wdqs:0.3.0
        restart: always
        command: /runUpdate.sh
        depends_on:
        - wdqs
        - wikibase
        environment:
          WIKIBASE_HOST: wikibase-registry.wmflabs.org
        networks:
          default:
            aliases:
             - wdqs-updater.svc
    
    volumes:
      mediawiki-mysql-data:
      mediawiki-images-data:
      query-service-data:

    Backups

    docker-compose.yml

    So that you can always return to your previous configuration, take a snapshot of your docker-compose file.

    If you have any other mounted files, it might also be worth taking a quick snapshot of those.
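
    Something as simple as a dated copy is enough; for example (file names here are only an illustration):

    # keep dated copies of the compose file and any other mounted config
    cp docker-compose.yml docker-compose.yml.bak-20190129
    cp LocalSettings.php LocalSettings.php.bak-20190129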

    Volumes

    The wikibase docker-compose example README has a short section about backing up docker volumes using the loomchild/volume-backup docker image.
    So let’s give that a go.

    I’ll run the backup command for all 3 volumes used in the docker-compose file, which cover the 3 locations holding data that I care about.

    docker run -v wikibase-registry_mediawiki-mysql-data:/volume -v /root/volumeBackups:/backup --rm loomchild/volume-backup backup mediawiki-mysql-data_20190129
    docker run -v wikibase-registry_mediawiki-images-data:/volume -v /root/volumeBackups:/backup --rm loomchild/volume-backup backup mediawiki-images-data_20190129
    docker run -v wikibase-registry_query-service-data:/volume -v /root/volumeBackups:/backup --rm loomchild/volume-backup backup query-service-data_20190129

    Looking in the /root/volumeBackups directory I can see that the backup files have been created.

    ls -lahr /root/volumeBackups/ | grep 2019
    -rw-r--r-- 1 root root 215K Jan 29 16:40 query-service-data_20190129.tar.bz2
    -rw-r--r-- 1 root root  57M Jan 29 16:40 mediawiki-mysql-data_20190129.tar.bz2
    -rw-r--r-- 1 root root  467 Jan 29 16:40 mediawiki-images-data_20190129.tar.bz2

    I’m not going to bother checking that the backups are actually complete here, but you might want to do that!
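
    If you do want a quick sanity check, listing the first few entries of each archive is usually enough; a minimal sketch:

    # confirm each backup archive is readable and not empty
    for f in /root/volumeBackups/*_20190129.tar.bz2; do
        echo "== $f =="
        tar -tjf "$f" | head -n 5
    done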

    Prepare the next version

    Grab new versions of extensions


    The wikibase-registry has a couple of extensions shoehorned into it via volume mounts in the docker-compose file (see above).

    We need new versions of these extensions for Mediawiki 1.31 while leaving the old versions in place for the still running 1.30 version.

    I’ll do this by creating a new folder, copying the existing extension code into it, and then fetching and checking out the new branch.

    # Make copies of the current 1.30 versions of extensions
    root@wbregistry-01:~/wikibase-registry# mkdir mw131
    root@wbregistry-01:~/wikibase-registry# cp -r ./Nuke ./mw131/Nuke
    root@wbregistry-01:~/wikibase-registry# cp -r ./ConfirmEdit ./mw131/ConfirmEdit
    
    # Update them to the 1.31 branch of code
    root@wbregistry-01:~/wikibase-registry# cd ./mw131/Nuke/
    root@wbregistry-01:~/wikibase-registry/mw131/Nuke# git fetch origin REL1_31
    From https://github.com/wikimedia/mediawiki-extensions-Nuke
     * branch            REL1_31    -> FETCH_HEAD
    root@wbregistry-01:~/wikibase-registry/mw131/Nuke# git checkout REL1_31
    Branch REL1_31 set up to track remote branch REL1_31 from origin.
    Switched to a new branch 'REL1_31'
    root@wbregistry-01:~/wikibase-registry/mw131/Nuke# cd ./../ConfirmEdit/
    root@wbregistry-01:~/wikibase-registry/mw131/ConfirmEdit# git fetch origin REL1_31
    From https://github.com/wikimedia/mediawiki-extensions-ConfirmEdit
     * branch            REL1_31    -> FETCH_HEAD
    root@wbregistry-01:~/wikibase-registry/mw131/ConfirmEdit# git checkout REL1_31
    Branch REL1_31 set up to track remote branch REL1_31 from origin.
    Switched to a new branch 'REL1_31'

    Define an updated Wikibase container / service

    We can run a container with the new Mediawiki and Wikibase code alongside the old container without causing any problems; it just needs a name.

    So below I define this new service, called wikibase-131, using the same general details as my previous wikibase service but pointing to the new versions of my extensions, and add it to my docker-compose file.

    Note that no port is exposed, as I don’t want public traffic here yet, and also no network aliases are yet defined. We will switch those from the old service to the new service at a later stage.

    wikibase-131:
        image: wikibase/wikibase:1.31-bundle
        restart: always
        links:
          - mysql
        volumes:
          - mediawiki-images-data:/var/www/html/images
          - ./LocalSettings.php:/var/www/html/LocalSettings.php:ro
          - ./mw131/Nuke:/var/www/html/extensions/Nuke
          - ./mw131/ConfirmEdit:/var/www/html/extensions/ConfirmEdit
        depends_on:
        - mysql
        environment:
          MW_ADMIN_NAME: "private"
          MW_ADMIN_PASS: "private"
          MW_SITE_NAME: "Wikibase Registry"
          DB_SERVER: "mysql.svc:3306"
          DB_PASS: "private"
          DB_USER: "private"
          DB_NAME: "private"
          MW_WG_SECRET_KEY: "private"

    I tried running this service as is but ran into an issue with the change from 1.30 to 1.31. (Your output will be much more verbose if you need to pull the image)

    root@wbregistry-01:~/wikibase-registry# docker-compose up wikibase-131
    wikibase-registry_mysql_1 is up-to-date
    Creating wikibase-registry_wikibase-131_1 ... done
    Attaching to wikibase-registry_wikibase-131_1
    wikibase-131_1   | wait-for-it.sh: waiting 120 seconds for mysql.svc:3306
    wikibase-131_1   | wait-for-it.sh: mysql.svc:3306 is available after 0 seconds
    wikibase-131_1   | wait-for-it.sh: waiting 120 seconds for mysql.svc:3306
    wikibase-131_1   | wait-for-it.sh: mysql.svc:3306 is available after 1 seconds
    wikibase-131_1   | /extra-entrypoint-run-first.sh: line 3: MW_ELASTIC_HOST: unbound variable
    wikibase-registry_wikibase-131_1 exited with code 1

    The wikibase:1.31-bundle docker image includes the Elastica and CirrusSearch extensions, which were not part of the 1.30 bundle, and due to the entrypoint infrastructure added along with them I will need to change some things to continue without using Elastic for now.

    Fix MW_ELASTIC_HOST requirement with a custom entrypoint.sh

    The above error message shows that the error occurred while running extra-entrypoint-run-first.sh, which is provided as part of the bundle.
    It is automatically loaded by the base image entrypoint.
    The bundle also runs some extra steps as part of the install for wikibase that we don’t want if we are not using Elastic.

    If you give the entrypoint file a read through you can see that it does a few things:

    • Makes sure the required environment variables are passed in
    • Waits for the DB server to be online
    • Runs extra scripts added by the bundle image
    • Does the Mediawiki / Wikibase install on the first run (if LocalSettings does not exist)
    • Runs apache

    This is a bit excessive for what the wikibase-registry requires right now, so let’s strip this down and save it next to our docker-compose file, i.e. /root/wikibase-registry/entrypoint.sh.

    #!/bin/bash
    
    REQUIRED_VARIABLES=(MW_ADMIN_NAME MW_ADMIN_PASS MW_WG_SECRET_KEY DB_SERVER DB_USER DB_PASS DB_NAME)
    for i in ${REQUIRED_VARIABLES[@]}; do
        eval THISSHOULDBESET=\$$i
        if [ -z "$THISSHOULDBESET" ]; then
        echo "$i is required but isn't set. You should pass it to docker. See: https://docs.docker.com/engine/reference/commandline/run/#set-environment-variables--e---env---env-file";
        exit 1;
        fi
    done
    
    set -eu
    
    /wait-for-it.sh $DB_SERVER -t 120
    sleep 1
    /wait-for-it.sh $DB_SERVER -t 120
    
    docker-php-entrypoint apache2-foreground

    And mount it in the wikibase-131 service that we have created by adding a new volume.

    volumes:
          - ./entrypoint.sh:/entrypoint.sh
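
    One small note: the service runs the script via /bin/bash (as the docker-compose ps output later shows), so the executable bit is not strictly required, but it does no harm to set it:

    # optional: make the custom entrypoint executable on the host
    chmod +x /root/wikibase-registry/entrypoint.sh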

    Run the new service alongside the old one

    Running the service now works as expected.

    root@wbregistry-01:~/wikibase-registry# docker-compose up wikibase-131
    wikibase-registry_mysql_1 is up-to-date
    Recreating wikibase-registry_wikibase-131_1 ... done
    Attaching to wikibase-registry_wikibase-131_1
    {snip, boring output}

    And the service appears in the list of running containers.

    root@wbregistry-01:~/wikibase-registry# docker-compose ps
                  Name                             Command               State          Ports
    -------------------------------------------------------------------------------------------------
    wikibase-registry_mysql_1           docker-entrypoint.sh mysqld      Up      3306/tcp
    wikibase-registry_wdqs-frontend_1   /entrypoint.sh nginx -g da ...   Up      0.0.0.0:8282->80/tcp
    wikibase-registry_wdqs-proxy_1      /bin/sh -c "/entrypoint.sh"      Up      0.0.0.0:8989->80/tcp
    wikibase-registry_wdqs-updater_1    /entrypoint.sh /runUpdate.sh     Up      9999/tcp
    wikibase-registry_wdqs_1            /entrypoint.sh /runBlazegr ...   Up      9999/tcp
    wikibase-registry_wikibase-131_1    /bin/bash /entrypoint.sh         Up      80/tcp
    wikibase-registry_wikibase_1        /bin/bash /entrypoint.sh         Up      0.0.0.0:8181->80/tcp

    Update.php

    From here you should now be able to get into your new container with the new code.

    root@wbregistry-01:~/wikibase-registry# docker-compose exec wikibase-131 bash
    root@40de55dc62fc:/var/www/html#

    And then run update.php

    In theory, updates to the database, and anything else, will always be backward compatible for at least one major version, which is why we can run this update while the site is still being served from Mediawiki 1.30.

    root@40de55dc62fc:/var/www/html# php ./maintenance/update.php --quick
    MediaWiki 1.31.1 Updater
    
    Your composer.lock file is up to date with current dependencies!
    Going to run database updates for wikibase_registry
    Depending on the size of your database this may take a while!
    {snip boring output}
    Purging caches...done.
    
    Done in 0.9 s.

    Switching versions

    The new service is already running alongside the old one, and the database has already been updated, now all we have to do is switch the services over.

    If you want a less big-bang approach, you could probably set up a second port exposing the updated version and direct a different domain or subdomain to that location, but I don’t go into that here.

    Move the “ports” definition and “networks” definition from the “wikibase” service to the “wikibase-131” service. Then recreate the container for each service using the updated configuration. (If you have any other references to the “wikibase” service in the docker-compose.yml file, such as in depends_on, then you will also need to change those.)

    root@wbregistry-01:~/wikibase-registry# docker-compose up -d wikibase
    wikibase-registry_mysql_1 is up-to-date
    Recreating wikibase-registry_wikibase_1 ... done
    root@wbregistry-01:~/wikibase-registry# docker-compose up -d wikibase-131
    wikibase-registry_mysql_1 is up-to-date
    Recreating wikibase-registry_wikibase-131_1 ... done

    If everything has worked, you should see Special:Version reporting the newer version, which we now see on the wikibase-registry.
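
    A quick way to confirm this from the host, assuming the wiki is reachable at its public name and uses the default /wiki/ article path:

    # grab the reported MediaWiki version from Special:Version
    curl -s https://wikibase-registry.wmflabs.org/wiki/Special:Version | grep -o 'MediaWiki 1\.31[^<]*' | head -n 1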

    Cleanup

    Now that everything is updated we can stop and remove the previous “wikibase” service container.

    root@wbregistry-01:~/wikibase-registry# docker-compose stop wikibase
    Stopping wikibase-registry_wikibase_1 ... done
    root@wbregistry-01:~/wikibase-registry# docker-compose rm wikibase
    Going to remove wikibase-registry_wikibase_1
    Are you sure? [yN] y
    Removing wikibase-registry_wikibase_1 ... done

    You can then do some cleanup:

    • Remove the “wikibase” service definition from the docker-compose.yml file, leaving “wikibase-131” in place.
    • Remove any files or extensions (older versions) that are only loaded by the old service that you have now removed (a sketch of this follows below).
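
    For the wikibase-registry that boils down to something like the following (paths as used earlier in this post; double-check before deleting anything):

    # remove the old 1.30 extension checkouts that were only mounted into the old service
    rm -rf /root/wikibase-registry/Nuke /root/wikibase-registry/ConfirmEdit
    # optionally free disk space by removing the now-unused 1.30 image
    docker rmi wikibase/wikibase:1.30-bundle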

    Further notes

    There are lots of other things I noticed while writing this blog post:

    • It would be great to move the env vars out of the docker-compose and into env var files.
    • The default entrypoint in the docker images is quite annoying after the initial install and if you don’t use all of the features in the bundle.
    • We need a documentation hub? ;)

    The post wikibase-docker, Mediawiki & Wikibase update appeared first on Addshore.

    It is crucial for organizations that the documentation in their wiki is stored in a comprehensible and revision-proof manner. BlueSpice MediaWiki now supports you in the balancing act between traceability and data protection. Against the background of the General Data Protection Regulation (GDPR), we have revised our enterprise wiki software with regard to the protection of personal data.

    The GDPR defines a number of basic rights that are also relevant for BlueSpice MediaWiki users:

    • Information: Users have the right to know what data about them is stored in the system.
    • Data portability: Users must be able to extract the data stored on a platform and transfer it to another system.
    • Correction, erasure or blocking: Users may request or arrange for their data to be corrected. They may also request the deletion or blocking of access to their data.
    • Forgetting: It must be possible to cancel the linking / assignment of data to individual users.
    • Consent: Users must be able to give their differentiated consent to the storage and use of their data.

    Implementation in BlueSpice MediaWiki: the privacy center

    To support the protection of privacy, BlueSpice MediaWiki now delivers the Privacy Center, which every user can access via the personal menu. Various actions can be performed here:

    • Request anonymization: A user can request that his name be made unrecognizable. To do this, he can either assign a pseudonym himself or accept the system’s suggestion.
    • Request deletion: A user can request that his account and all associated data be removed. For reasons of consistency and traceability, however, this is not completely possible in BlueSpice when it comes to assigning content contributions. We follow a pool approach here. This means that all data that must continue to be held are assigned to a collection user and are therefore no longer individually identifiable.

    By the way, the correction of personal data can normally be carried out by the users themselves, as they can edit the content and their profiles themselves.

     

    BlueSpice privacy center

     

    Information about the data collected

    The privacy center also includes a function to provide information about the data collected. These are determined at the push of a button and include all personal details (e.g. name and e-mail), work data (e.g. saved reminders or workflows), log data (e.g. when which article was edited) and mentions of the person in the content. In a further step, these can be exported as HTML or CSV files.

    Consent to the privacy policy and use of cookies

    Last but not least, the privacy center also allows users to give or withdraw their consent to the privacy policy and to the use of cookies. This will also be requested when the user logs in for the first time. A refusal does not initially have any direct consequences. In the administration interface, however, authorized persons can see which employees have given their consent. They can then decide how to proceed in the event of rejection.

    Balancing user interest and accountability

    In a company wiki, the legitimate interest of the user in the protection of personal data is matched by the requirements of traceability and accountability of the company. For example, companies may have to prove exactly which person made which content change at which time. For this reason, both anonymization and deletion must be confirmed by an authorized person. In the future, this will be done via a central administration interface.

    Summary and outlook

    The privacy extension is delivered with all editions of BlueSpice MediaWiki. It supports the operators of platforms in complying with the requirements of the GDPR and maps them in the software. The team of Hallo Welt! GmbH works continuously to adapt the software to a rapidly changing legal environment and to meet the current interpretations of the GDPR.

    Let’s wiki together!

    Author: Markus Glaser, Hallo Welt! GmbH

    The post Dokumentation according to data protection rights – BlueSpice MediaWiki and the GDPR appeared first on BlueSpice Blog.

    Tech News issue #11, 2019 (March 11, 2019)

    00:00, Monday, 11 2019 March UTC

    Quite! Thanks!

    All the time people want to read articles in a Wikipedia, articles that are not there. For some Wikipedias that is obvious because there is so little content and, based on what people read in other Wikipedias, recommendations have been made suggesting what would generate new readers. This has been the approach so far; a quite reasonable approach.

    This approach does not consider cultural differences; it does not consider what is topical in a given "market". To find an answer to the question of what people want to read, there are several strategies. One is what researchers do: they ask panels, write papers, and once it is done there is a position to act upon. There are drawbacks:
    • you can only research so many Wikipedias
    • for all the other Wikipedias there is no attention
    • the composition of the panels is problematic particularly when they are self selecting
    • there are no results while the research is being done
    The objective of a marketing approach is centered around two questions: 
    • what is it that people are looking for now (and cannot find) 
    • what can be done to fulfill that demand now
    The data needed for this approach: negative search results. People search for subjects all the time and there are all kinds of reasons why they do not find what they are looking for. Spelling, disambiguation and nothing to find are all perfectly fine reasons for a no-show.

    The "nothing to find" scenario is obvious; when a subject is sought often, we want an article. Exposing a list of missing articles is one motivator for people to write. Once they have written, we do have the data on how often an article was read. When the most popular new articles of the last month are shown, it is vindication for authors to have written popular articles. It is easy, obvious, and it should be part of the data the Wikimedia Foundation already collects. In this way the data is put to use. It is also quite FAIR to make this data available.

    For the "disambiguation" issue, Wikidata may come to the rescue. It knows what is there, and it is easy enough to add items with the same name for disambiguation purposes. Combine this with automated descriptions and all that is required is a user interface to guide people to what they are looking for. When there is "only" a Wikidata item, it follows that its results feature in the "no article" category.

    The "spelling" issue is just a variation on a theme. Wikidata does allow for multiple labels, and the search results may make use of them as well. Common spelling errors are also a big part of the problem; with a bit of ingenuity they are not much of a problem either.

    Marketing this marketing approach should not be hard. It just requires people to accept what is staring them in the face. It is easy to implement, it works for all 280+ languages, and it is likely to give a boost not only to all the other Wikipedias but also to Wikidata.
    Thanks,
            GerardM

    What are WikiJournals?

    17:53, Saturday, 09 2019 March UTC

    This article was jointly authored by Thomas Shafee and Jack Nunn from the WikiJournals board, and edited by John Lubbock of Wikimedia UK.

    The WikiJournals are a new group of peer-reviewed, open-access academic journals which are free to publish in. The twist is that articles published in them are integrated into Wikipedia. At the moment, there are three: WikiJournal of Medicine, WikiJournal of Science, and WikiJournal of Humanities.

    WikiJournals are also highly unusual for academic journals, as they’re free for both readers and authors!

    What WikiJournals hope to achieve:

    The aim of these journals is to generate new, high-quality peer-reviewed articles, which can form part of Wikipedia. As well as new articles, submissions can include existing Wikipedia pages, which are then subjected to the exact same rigour as any other submission.

    The hope is that this new way of publishing peer-reviewed content will encourage academics, researchers, students and other experts to get involved in the process of creating and reviewing high-quality content for the Wikimedia project. It also allows participants a way of putting their contributions on their CV with an easily definable output (including DOI links and listing in indexes like Google Scholar).

    When an article gets through the peer review process, there are two copies. The Journal copy can now be reliably cited and stays the same as a ‘version of record’ alongside the public reviewer comments. The Wikipedia version is free to evolve in the normal Wikipedia way as people update it over time, and is linked to the Journal article.

    Since 2014, articles have been published on massive topics like Radiocarbon Dating and niche topics like Æthelflæd. They’ve also published meta analyses, original research, case studies, teaching material, diagrams and galleries!

     

    Some submissions are written from scratch. Others are adapted from existing Wikipedia material. The journal editors invite academic peer reviewers to publicly comment. If published, suitable material is integrated back into Wikipedia to improve the encyclopedia. From ref.

    How to get involved!

    If this sounds like the sort of thing that you’d like to get involved in, support, or just spread the word on, there’s plenty of ways to contribute!

    School projects

    So here’s an example for a teacher. You have a class of 30 keen students who would normally all write an essay on a subject, have it read once, and then never seen again. An alternative could be to have students in groups of 5 each choose a section of a neglected Wikipedia article to update and overhaul (there are millions of stub and start class articles to choose from). Each group writes a section of the article, then proofreads each other’s sections (WikiEdu has a great dashboard for this). Once the article is up to scratch, it’s submitted to the relevant WikiJournal, which reaches out to experts in the topic to give in-depth feedback on what can be improved. If you and your students are able to fully address those comments then the article can be published, and you and your students have just generated a new Wikipedia article read by thousands, and an academic article to put on their CVs!

    Teachers who would consider using this method as an assessed class exercise can ask for advice from Wikimedia UK. We think that this workflow offers a useful alternative to simply having students write parts of Wikipedia articles in class, which may be harder to assess, and doesn’t provide a final product as tangible as a published journal article.

    Academic outreach

    The current priorities for the WikiJournals are to expand and improve representation on their editorial boards, and to invite article submissions. If you would like to volunteer in these roles, we encourage you to talk to the WikiJournal organisers.

    If you are based in a UK academic institution on a course that has a strong overlap with Wikimedia UK’s strategic priorities, you can also email education@wikimedia.org.uk to talk to us about providing advice on using WikiJournals as part of your course.

    Individuals

    The journals always welcome new submissions. Whether they’re written by a professor or a student, all go through the same process. You could get a team together to submit a brand new article. Or maybe you could overhaul and submit an existing Wikipedia page. You could even help translate an existing article.

    They have a public discussion forum (typical for a wiki, unusual for a journal!) where you can share ideas for improvements, other projects they could reach out to or point out gaps in Wikipedia’s content where they could invite researchers to write an article.

    Each journal has a Twitter and Facebook account (@WikiJMed, @WikiJSci and @WikiJHum), so feel free to chat with them there. You can even suggest social media posts or accounts to follow. Not into social media? Maybe put a poster in your university tearoom.

    Monthly Report, January 2019

    22:23, Friday, 08 2019 March UTC

    Highlights

    • Wiki Education hosted 2019’s first in-person board meeting in the Presidio of San Francisco in late January. On this occasion, board and staff celebrated Wiki Education’s fifth anniversary.
    • We started the third round of our Wiki Scholars professional development course with the National Archives and Records Administration (NARA). Ten people from a wide range of backgrounds have come together to learn to share their knowledge with the public through Wikipedia.
    • In January, Wes Reid joined the Technology department as Wiki Education’s first Software Developer.

    Programs

    Wikipedia Student Program

    Status of the Wikipedia Student Program for Spring 2019 in numbers, as of January 31:

    • 290 Wiki Education-supported courses were in progress (176, or 61%, were led by returning instructors)
    • 5,083 student editors were enrolled
    • 70% of students were up-to-date with their assigned training modules
    • Students edited 715 articles, created 35 new entries, and added more than 209,000 words to Wikipedia.

    As always, January saw a flurry of course pages coming through on the Dashboard. Wikipedia Student Program Manager Helaine Blumenthal spent the majority of her time ensuring that all of Wiki Education’s Spring 2019 courses are set up for success. This meant welcoming back returning instructors and providing that bit of extra support to first-time participants in the Student Program.

    Students are enrolling on the Dashboard and getting their feet wet as they learn to navigate what’s likely the most familiar, unfamiliar site on the web.

    Student work highlights:

    There are always people making new innovations in the world of science, as University of Michigan students in Kush Patel and Anne Cong-Huyen’s class Digital Pedagogy with U-M Library could tell you. One article that they have created so far is the one on Alison R.H. Narayan, a William R. Roush assistant professor at the Department of Chemistry in the College of Literature, Science, and the Arts at the University of Michigan. A Michigan native, Narayan engineered cytochrome P450 enzymes to perform C-H functionalization in non-native substrates during her postdoc, and for her doctoral thesis she wrote “New Reactions and Synthetic Strategies toward Indolizidine Alkaloids and Pallavicinia Diterpenes”. Her work has been recognized by such major organizations as the American Chemical Society, who named her one of their “Talented 12” in a 2016 issue of Chemical & Engineering News, and by the Research Corporation, who made her one of their 2019 Cottrell Scholars.

    The class also expanded the article on Melanie Sanford, a chemist who also teaches at the University of Michigan and holds the positions of Moses Gomberg Collegiate Professor of Chemistry and Arthur F. Thurnau Professor of Chemistry. She earned her BS and MS at Yale University and went on to gain her Ph.D. from the California Institute of Technology, where she worked with future Nobel Prize recipient Robert H. Grubbs. Sanford followed this up by performing her postdoctoral work at Princeton University. Her work has been recognized by numerous organizations, earning her awards and accolades such as the Royal Society of Chemistry Fluorine Prize and the prestigious MacArthur Fellowship!

    While Wikipedia has a goal of holding the sum of human knowledge, it still isn’t there yet – which is why it’s so important for people to contribute their time and effort to expand articles on not only the very well known topic areas, but those that have not yet reached common knowledge world-wide. An area like Ongamira may not be as much of a household name as, say, Paris, but it holds just as much of a treasure trove of history and culture. Ongamira is a valley near the city of Córdoba, Argentina that contains caves and grottoes of immense archaeological and natural significance. The valley was formerly home to the Comechingones, who settled in the region. Many of them died as a result of battles over the land between the Comechingones and Spanish forces led by conquistadors such as Blas de Rosales, who was granted the lands by Jerónimo Luis de Cabrera. Thanks to efforts by a Paradise Valley Community College student in Kande Mickelsen and Sheila Afnan-Manns’s spring course, this historic valley now has an article on Wikipedia.

    Scholars & Scientists Program

    This month we started a new round of our Wiki Scholars professional development course with the National Archives and Records Administration (NARA). Ten people from a wide range of backgrounds have come together to learn to share their knowledge with the public through Wikipedia. In particular, we will focus on improving articles about women’s suffrage in the United States in advance of NARA’s upcoming exhibit, Rightfully Hers, which will be opening in Washington, DC in May 2019.

    Participants are in the process of selecting their first of two articles to improve. Most will be expanding the articles on prominent (as well as less prominent) suffragists. The first step is exploring areas for opportunity, then participants conduct an evaluation of the articles while learning how to edit Wikipedia and becoming familiar with its policies and guidelines. Finally, the editing begins, improving the articles with resources from NARA and elsewhere.

    Here are some of the articles current Wiki Scholars have chosen to improve:

    • Annie Smith Peck (1850–1935), a mountaineer, adventurer, suffragist, and lecturer. The northern peak of a Peruvian mountain chain was named in her honor.
    • Mary Birdsall (1828–1894), a journalist, suffragist, and temperance worker who served as president of the Indiana Women’s Suffrage Association.
    • Woman suffrage parade of 1913 (also known as the Woman Suffrage Procession), the first suffragist parade in Washington, D.C.
    • Women’s suffrage movement in Washington, focusing on events, people, publications, and activities that took place in Washington State.
    • Lillian Exum Clement (1894–1925), who served in the North Carolina General Assembly (the first woman to do so—and the first woman to serve in any state legislature in the Southern United States).
    • Mabel Ping-Hua Lee (1897–1966), a Chinese religious and women’s rights leader.
    • Amanda Way (1828–1914), a Civil War nurse, minister, and pioneer in the temperance and women’s rights movements.
    Lillian Exum Clement, the first woman to serve in any state legislature in the Southern United States.
    Image: File:Lillian Exum Clement (33361507400).jpg, open license, via Flickr.

    Visiting Scholars Program

    This month Visiting Scholars contributed several high-quality articles to Wikipedia, including both a Featured Article (FA) and a Good Article (GA). These designations are reserved for the highest quality articles and require peer review to ensure they meet strict criteria.

    In 1971, West German stamp dealer Hermann Sieger secretly paid the Apollo 15 astronauts, David Scott, Alfred Worden, and James Irwin, to bring 400 unauthorized postal covers to the surface of the Moon, to be sold on their return. Worden had already received permission to take 144 other covers with him for a stamp collector friend, based on the understanding they would not be sold until after the Apollo program ended. The covers were postmarked the morning of the launch and again after splashdown. When NASA learned Worden’s friend was selling the covers, they warned the astronaut about commercializing their activities. When they learned about Sieger, the three were reprimanded, removed as backup crew members for Apollo 17, and required to give the money back. These events are known as the Apollo 15 postal covers incident, an article developed by George Mason University Visiting Scholar Gary Greenbaum, using resources from the GMU library. Gary brought the article to Good Article level last year, and continued his extensive work on it before it was finally promoted to Featured Article this month.

    Hortensius (On Philosophy) is a lost dialogue written by Cicero in 45 BCE. One might think that a dialogue that’s been lost since the 6th century might be difficult to write a Wikipedia article about. Paul Thomas, Visiting Scholar at the University of Pennsylvania, did just that, bringing it up to Good Article quality. The work was notable at the time and scholars have developed a picture of its content and style based on other writings. It is said to have inspired such important historical figures as Seneca the Younger, Tacitus, Boethius, and even Augustine of Hippo. It’s through the writings of the latter, as well as Nonius Marcellus and others, that we have what pieces of the dialogue still remain.

    Rosie Stephenson-Goodknight, Visiting Scholar at Northeastern University, wrote an impressive 14 articles about women writers this month, including the creation of articles about 12 women who previously were not represented on Wikipedia. For example, Emily Parmely Collins (1814–1909), a suffragist, women’s rights activist and writer. She established the first suffragists’ society in 1848: the Woman’s Equal Rights Union. Frances Manwaring Caulkins (1795–1869) was a historian and genealogist who wrote histories of towns in Connecticut. She was elected to be a member of the Massachusetts Historical Society in 1849, and was the first woman to join. Rosa Louise Woodberry (1869–1932) was a journalist, educator, and stenographer. Her philosophy and science writing appeared in journals around the country and she was on the staff of both The Augusta Chronicle and Savannah Press.

    Emily Parmely Collins (1814–1909), a suffragist, women’s rights activist and writer. Collins established the first suffragists’ society in 1848.
    Image: File:EMILY PARMELY COLLINS.jpg, public domain, via Wikimedia Commons.

    Advancement

    In January, the Advancement Team began implementing its newly established charter by establishing regular team meetings, enacting team norms, and identifying/documenting team policies and processes.

    Fundraising

    In January, we received our first installment, totaling $233,000, of the $400,000 Annual Planning Grant from Wikimedia Foundation’s Fund Dissemination Committee. Chief Advancement Officer TJ Bliss had calls and dinner meetings with several potential new funders, all of which asked for concept notes or other follow-up documents. TJ also developed a draft proposal for a funder briefing on Wikipedia, hosted by one of our major funders. This briefing will help build awareness about the importance of Wikipedia to furthering philanthropic efforts generally. TJ and Director of Partnerships Jami Mathewson visited the Stanton Foundation in Boston, gave an oral report on our current work, and requested funding for a Scholars and Scientists course related to public policy. Jami also visited with Program Officers at the Simons Foundation in New York City and described our Wiki Scholars and Scientists efforts and our interest in Wikidata. Customer Success Manager Samantha Weald continued her research and identification of new funders and began drafting outreach letters.

    Scholars & Scientists partnerships and collaborations

    Samantha worked closely with new participants in our NARA Scholars & Scientists course to ensure their needs were met as they began the course.

    At the end of the month, TJ and Jami presented to faculty at the Massachusetts Institute of Technology, sharing collaboration opportunities to share high quality open knowledge via Wikipedia.

    Communications

    January 15th was Wikipedia Day, a day of reflection and celebration for the Wikipedia community across the globe. We published a year in review blog post about what we’ve accomplished since this day last year. We were also featured in The Washington Post in a piece by Stephen Harrison celebrating Wikipedia’s 18th birthday. Later in the month, the National Institute for Occupational Safety and Health recognized our Wikipedia Student Program (and specifically a course we support at the Harvard T.H. Chan School of Public Health) as an effective way to make occupational safety information available to the public.

    Blog posts:

    External media:

    Technology

    Wes Reid.
    Image: File:Wes-reid-profile-image-wiki-education.jpg, Wes (Wiki Ed), CC BY-SA 4.0, via Wikimedia Commons.

    In January, Wes Reid joined the Technology department as Wiki Education’s first Software Developer. Our focus was to bring Wes up to speed on the breadth of our codebase and the broader Wikimedia technology ecosystem, and to prepare for the major technical projects on the horizon. We also had a bevy of volunteer contributions this month, including a large set of improvements to our test suite and a final set of enhancements to enable complete translation of the Dashboard’s training modules.

    In addition to fixing numerous bugs, Wes improved the course approval workflow so that we can more easily keep track of new instructors and how they found out about the Wikipedia Student Program. At the end of the month, Wes and Chief Technology Officer Sage Ross also began transferring Programs & Events Dashboard to a new server for the first time since it was set up in 2015; the operating system on the original server was deprecated for use on the Wikimedia Cloud platform, and would have become inaccessible soon. (The transfer was completed, with minimal downtime, in early February.)

    Outreachy intern Cressence continued her work on the event creation workflow, which now lets users of the global Programs & Events Dashboard choose which type of program they are running, with detailed explanations of the differences. Now she’s turning her attention to the start and end date interface, which has been a frequent point of confusion for global Dashboard users.

    Finance & Administration

    Total expenses for January were $213,000, exactly on target with the budgeted amount of $213,000. The Board meeting occurred in January; however, the budget split its costs between January and February, so the overage we see for the Board in January will balance out in February, where an additional $9K is allocated and will not be used. General and Administrative costs were over budget, a combination of overages in Furniture and Equipment (+$5K) and Indirect Expenses (+$13K), partly offset by spending under budget on Staff Meetings ($4K), Professional Services ($8K), and Rent ($1K). Programs were very close to budget, under by ($2K). Technology was under budget ($10K), across Payroll ($3K), Professional Services ($5K), and Occupancy Costs ($2K).

    Year-to-date expenses are $1.2M, ($240K) under the budget of $1.44M. We expected Fundraising to be under by ($160K) due to a change in plan for professional services ($149K) and the decision not to hold a cultivation event ($11K). Programs were under by ($56K) due to a few changes in processes: Professional Services ($17K), Travel ($28K), Printing and Reproduction ($11K), Communication ($4K), and Indirect Expenses ($21K), partly offset by overages in Payroll (+$21K) and Furniture and Equipment (+$4K). General and Administrative costs are under by ($23K) due to reductions in Payroll ($14K), professional fees mostly relating to audit and tax preparation ($13K), and administrative costs ($8K), while spending over budget on Occupancy, combined direct and indirect (+$11K), and Travel (+$2K). As mentioned above, the Board was over budget for January due to expense accrual and will be under budget come February. Technology is under budget by ($15K): a change of plans in utilizing the budgeted professional fees ($17K) and additional rent ($5K) was partly offset by increases in Payroll (+$3K) and Furniture and Equipment (+$4K).

    Office of the ED

    • Current priorities:
      • Coordinating and overseeing work on the Annual Plan & Budget for FY 2019/20
      • Improving the organization’s resilience to staffing changes
    Board and senior leadership team members gather at the former Officers Club of the Presidio for the January in-person board meeting

    In January, Executive Director Frank Schulenburg worked with our auditors from Hood & Strong on finalizing our audit for fiscal year 2017/18. This year’s audit is our fourth voluntary audit since 2015, and the board approved the audit report during its in-person board meeting on January 25. Once our work on Form 990 is done, we’ll publish both the report from Hood & Strong and Form 990 on our website.

    Board chair PJ Tabit and Sage Ross (as the youngest member of the senior leadership team) jointly cut the anniversary cake, celebrating five years of Wiki Education

    As in previous years, January is the month for one of our two in-person board meetings. The meeting serves to look back at the organization’s performance during the first half of the current fiscal year and to provide the board with a high-level understanding of what to expect for the next fiscal year. The meeting kicked off with Frank looking back at five years of Wiki Education (Wiki Education officially started operations in mid-February 2014). TJ then walked the board through our current fundraising efforts and provided an analysis of our work in opening up a second revenue stream through our fee-for-service Scholars & Scientists Program. Chief Programs Officer LiAnna Davis provided the board with an update on programs while outlining how our current and future activities connect to the organizational strategy approved by the board in June of last year. LiAnna also shared that Wiki Education now brings in 19% of all new contributors to the English Wikipedia and 9% of all new Wikipedia editors globally. Sage walked the meeting’s attendees through a presentation that outlined how our digital infrastructure supports our own program participants as well as program leaders in other parts of the world. As this was Sage’s first presentation at a board meeting, he also walked the board through his general philosophy behind making our Dashboard adaptable and sustainable. On the evening of the first meeting day, board and staff joined the crowd at the Internet Archive to celebrate Public Domain Day 2019. The second day of the board meeting was dedicated to a longer discussion about new board member recruitment and a report from board member Richard Knipel about new developments in the U.S. Wikimedia landscape. This in-person board meeting was the first to take place in the Presidio, where Wiki Education’s office is based.

    On the evening prior to the board meeting, board member Ted Yang led a staff education event at Wiki Education’s office around the topic of “Planning for retirement.” This event kicked off our new effort of providing staff with education around topics that are relevant to their employment and life planning.

    Also in January, Frank met virtually with Jens Ohlig and Nicola Zeuner from Wikimedia Deutschland, the German chapter of the Wikimedia movement. Since Wikimedia Deutschland is the organization responsible for the technical infrastructure of Wikidata, and Wiki Education plans to provide Wikidata-related trainings, the two organizations decided to collaborate more closely. As more and more people in the U.S. get information through Wikidata instead of through Wikipedia due to the rise of virtual digital assistants (which rely heavily on structured data), Wikidata will play a more prominent role in Wiki Education’s future programmatic offerings.

    * * *
