en.planet.wikimedia

October 18, 2017

Wikimedia Scoring Platform Team

Status update (October 6, 2017)

New language support for Bengali, Greek, and Tamil. New advanced edit quality support for Albanian and Romanian. We cleaned up the old 'reverted' models where better support is available. We're working on moving to a new dedicated cluster. We improved some models by exploring new sources of signal and cleaning datasets. We started work on JADE and presented on The Keilana Effect at Wikimania.

See more details below.

New language support

We deployed basic edit quality support for Bengali, Greek, and Tamil. We also deployed advanced edit quality support for Albanian and Romanian. Progress was made towards new models for Latvian, Croatian, Bosnian, and Spanish, but these aren't deployed yet.

T166049: Deploy reverted model for elwiki
T156357: Deploy edit quality campaign for Romanian Wikipedia
T163009: Train/test damaging & goodfaith models for Albanian Wikipedia
T162031: Add language support for Latvian (lv)
T166048: Deploy reverted model for tawiki
T170490: Train reverted model for Bengali Wikipedia
T170491: Train reverted model for Greek Wikipedia
T174572: Reverted model for hrwiki
T173087: Add language support for Bosnian
T175628: Add LV dictionary to install.
T172046: Add language support for Croatian (hr.wiki)
T131963: Complete eswiki edit quality campaign
T174687: Add language support for Serbian

See the full support table for reference:
https://www.mediawiki.org/wiki/ORES/Support_table

Moving to the new, dedicated cluster

Until now, we've been running ORES on a shared Services cluster. We're happy to announce that the ORES API will be served from a dedicated cluster, probably in a matter of weeks. Stress tests showed some issues that we're still resolving.
T117560: New Service Request: ORES
T169246: Stress/capacity test new ores* cluster

Cleaning up Wikilabels data

@Natalia found some systematic errors in our training data, and corrected several. We also improved the structure of the labeling form to make it more difficult to make cognitive mistakes while labeling.

https://meta.wikimedia.org/wiki/Research_talk:Automated_classification_of_edit_quality/Work_log/2017-07-24
T171491: Unlabeled goodfaith observations are assumed "false" -- should be "true"
T171497: Review training set to check strange examples of labels
T171493: Change "yes/no" in damaging_goodfaith form to "damaging/good" and "good-faith/bad-faith"

Maintenance and documentation

We've been working with Releng on git-lfs (Large File Storage) so that our repositories stay smaller while we can still maintain historical model versions.
T171619: ORES should use a git large file plugin for storing serialized binaries

We were able to begin work with @srodlund to improve our technical and user documentation.

See https://www.mediawiki.org/wiki/ORES/FAQ

Remove "reverted" model where advanced editquality models are available

This was a noteworthy cleanup: on any wiki where the "damaging" and "goodfaith" models are available, these should be used instead of the "reverted" model. To that end, we're removing the reverted model from these wikis. We held an RFC and no concerns were raised.
T171059: [RfC] Should we remove all reverted models when there is a damaging one?
T172370: Remove reverted models from editquality repo

More, better signal

We experimented with adding Flagged Revs data to our training set.
https://meta.wikimedia.org/wiki/Research_talk:Automated_classification_of_edit_quality/Work_log/2017-07-26
T166235: Flagged revs approve model to fiwiki

@Sumit ran several experiments to see if word sentiment analysis could improve our classifier health. We were able to get marginal benefits and so implemented the strategy.
T167305: Experiment with Sentiment score feature for draftquality
T170177: Test draftquality sentiment feature on Editquality

@Natalia ran some experiments with including image-removals in the edit quality models and that didn't seem to affect performance.
T172049: [Investigate] Get signal from adding/removing images

@Nettrom cleaned up the article quality data for English Wikipedia, which allowed us to boost fitness in strange cases (e.g. redirect pages).
T170434: Improve cleaning of article quality assessment datasets

@Ladsgroup added strategies for scanning labels and descriptions for badwords.
T162617: Use 'informals', 'badwords', etc. in Wikidata feature set
T170834: Add basic bad word check to Wikidata feature set

New model proposal: Draft topic prediction

We're working on better ways for routing new page drafts to subject matter experts for review. See our documentation pages. We'll have datasets and some modeling experiments completed soon.

https://meta.wikimedia.org/wiki/Research:Automatic_new_article_topics_suggestion
https://commons.wikimedia.org/wiki/File:New_article_routing.with_ORES.svg

At Wikimania 2017

https://wikimania2017.wikimedia.org/wiki/Submissions/The_Keilana_Effect:_Visualizing_the_closing_coverage_gaps_with_ORES
T170015: [Workshop] How can I get ORES for my wiki?

JADE schema and design

We've spent some time planning how we'll implement the JADE system, which enables ORES users to give us feedback and have that feedback integrated into score results.

T175192: Design JADE scoring schema

For more info see the project's home page (https://www.mediawiki.org/wiki/JADE) and sub-pages https://www.mediawiki.org/wiki/JADE/Schema & https://www.mediawiki.org/wiki/JADE/Implementations

We're actively recruiting ORES stakeholders to be part of our working group.

Thresholds

We're in the process of rolling out a major refactor of the core revscoring library. One of the most exciting new features is the ability for ORES API consumers to fine-tune the thresholds used to define prediction intervals, e.g. "Very likely damaging". These thresholds will be different on every wiki, and the new interface allows us to query statistics built into the model, and satisfy criteria like "get me the threshold with the maximum filter rate, with a recall of at least 90%".
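As a rough illustration, a threshold query of that kind can be expressed as a single API request. The sketch below is illustrative only: the exact model_info parameter syntax and the response structure shown here are assumptions based on the description above, not a confirmed API reference.

// Hedged sketch: ask ORES for the "damaging" threshold that maximizes
// filter rate while keeping recall at or above 0.9. The model_info
// syntax here is an assumption and may differ from the deployed API.
$query = http_build_query( [
    'models' => 'damaging',
    'model_info' => 'statistics.thresholds.true."maximum filter_rate @ recall >= 0.9"',
], '', '&', PHP_QUERY_RFC3986 );
$response = file_get_contents( 'https://ores.wikimedia.org/v3/scores/enwiki/?' . $query );
$data = json_decode( $response, true );
print_r( $data['enwiki']['models']['damaging']['statistics'] ?? $data );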

For more details, see the blog post: More/better model information and "threshold optimizations"

by awight (Adam Roses Wight) at October 18, 2017 06:53 PM

Mahmoud Hashemi

Unlocking the human potential of Wikidata

Wikidata is amazing. Thanks to the amazing Wikidata evangelists out there, I feel confident that, given at least five minutes, I can convince anyone that Wikidata offers a critical service necessary for Wikipedia, other wiki projects, and generally the future of knowledge. Challengers welcome. :)

But Wikidata has a problem. Right now it’s optimized to ingest and grow. We’ve written about how it’s not ideal for maintenance of the datasets, but automated ingestion of datasets is what Wikidata does best.

All this automated growth doesn’t necessarily connect well with the organic growth of Wikipedia and other projects. And we can see that Wikidata hasn’t truly captured the positive attention of existing editor communities.

For that human touch Wikidata needs ever so much, it must reach out to the projects that gave rise to it.

One idea for doing this would be to make human editing of Wikidata easier. Make editing Wikidata as easy as adding a citation to Wikipedia. Literally.

Highlight a statement, click a button like “Structure this statement” and grow Wikidata, all without leaving your home wiki.


What it might look like to edit Wikidata from Wikipedia.

While Wikitext will always have its place for me, I’ve quite warmed up to the visual editor, and prefer its interface for adding citations. While it’s only an idea for an experiment at the moment, how might an inline Wikidata editor following the same pattern change the game for Wikidata?

A lot of data already cites back to home wikis, but a powerful enough editor could pull the citation on a statement through to the Wikidata entry, along with the import source.


So much data is already coming from Wikipedia, but humans can do even better.

I have always thought it would be great to see Wikidata’s support for multi-valued properties leveraged more fully. A language-agnostic knowledgebase will be a new space to compare and resolve facts. A meeting of the many minds across different languages of Wikipedia could spell better information for all.

An advanced enough system could encourage contributions on the basis of coverage, highlighting cited statements which have not yet been structured.

And of course, at the very least, we would speed up building an intermediary representation of knowledge, not tied to a specific language: people sharing knowledge across wikis, helping to further bootstrap a Wikidata community with close ties to its older siblings.

October 18, 2017 02:00 PM

Wikimedia Foundation

Why I write about Star Trek

Cast members from the original Star Trek in front of the real-life space shuttle Enterprise. Photo by NASA, public domain.

Editor’s note: David Fuchs has written 49 “featured” articles on Wikipedia; all have been assessed by peer editors against stringent criteria. Several are about Star Trek, the science fiction series that has astounded and inspired millions of people since its debut in 1966, including articles on all six of the original films and on the villainous Khan Noonien Singh.

There’s no time in my life I can’t recall watching the science fiction franchise Star Trek. Even at a young age, I can recall watching old VHS tapes of the original 1960s series at my uncle’s; the Halloween episode “Catspaw” gave young me nightmares. As I grew up, airings of the newest episodes of Deep Space Nine, Voyager, and Enterprise were occasions when my entire family would get together to watch. The end of high school coincided with the ignominious cancellation of the latter series, and I was in college when the rebooted J.J. Abrams-helmed Star Trek was released.

I’ve focused on a lot of different subjects during my career on Wikipedia—paleontological history, video games, books—but my work on Star Trek is something I’m particularly proud of. Alone, or occasionally with a few excellent collaborators, I’ve contributed to twelve “featured” articles on the Star Trek franchise, including the entire run of films from Star Trek: The Motion Picture to Star Trek VI: The Undiscovered Country.

I’ve always been drawn to writing about media on Wikipedia because I see it as a way of using my enthusiasm as a form of entertainment and education. As entertaining or boring as a movie might be, there’s an entirely different story lurking behind what ends up on screen, and I’ve found that those stories are often as engrossing as the final products. Much like Wikipedia, viewers or readers see a final product that was created only with the toil of many people behind the scenes, and in researching the Star Trek films and episodes in books, magazines, and on the web, stories ranging from the remarkable to the quiet, from the tragic to the comic, come up time and time again. As someone who found William Shatner’s performance in The Wrath of Khan remarkably affecting and subdued compared to the iconic scenery-chewing actor pop culture remembers him as, it was interesting to learn that director Nicholas Meyer elicited that performance from Shatner by boring the actor with repeated takes until he would stop hamming it up for the screen.

Delving into the production of Star Trek also gives some useful context for understanding the finished product. The fifth film, The Final Frontier, is often considered one of the worst in the series. Fault is often laid at director and cast member William Shatner’s feet, but in researching the subject I found that Shatner faced challenges that would have been tough for even an experienced director to handle, let alone a first-time one, on a production beset by numerous setbacks, including a writers strike and sabotage of production vehicles. His ambitious, operatic film was cut down by budget woes, and the money poured into an unproven special effects house produced terrible results; the poor end result was less a product of Shatner’s purported ego and more that of a novice director lacking support. The media we enjoy doesn’t spring fully formed; it’s the effort of many people working together with different ideas and views, and in researching these articles you learn a bit more about them.

Finally, I enjoy writing about Star Trek because I hope that someone might read an article and get interested in the franchise as well. In today’s trying world climate, I think Star Trek is needed more than ever—an optimistic and remarkably apolitical vision of the future, where humanity has learned to come together for the greater good. It’s a hopeful spirit that is shared with the Wikipedia mission—that through easy access to knowledge, we can find some more understanding, and turn fiction into a better reality.

David Fuchs, Wikimedian

by David Fuchs at October 18, 2017 12:26 PM

Shyamal

Shocking tales from ornithology

Manipulative people have always made use of the dynamics of ingroups and outgroups to create diversions from bigger issues. The situation is made worse when misguided philosophies are peddled, especially by governments and when economics is placed ahead of ecology. The pursuit of easily gamed targets such as GDP makes it easy to gain support through economics, which is man-made. Nationalism, pride, other forms of chauvinism, the creation of enemies and the magnification of war threats are all additional tools in the arsenal of Machiavelli that can be effectively used for misdirecting the masses. One might imagine that the educated, especially the scientists, would be smart enough not to fall into these traps but cases from recent history should quickly dampen any hopes for such optimism.

There is a very interesting book in German by Eugeniusz Nowak called "Wissenschaftler in turbulenten Zeiten", or "scientists in turbulent times", that deals with the lives of ornithologists, conservationists and other naturalists during the Second World War. Preceded by a series of recollections published in various journals, the book was published in 2010, but I became aware of it only recently while translating some biographies (mostly linked here) into the English Wikipedia. I have not yet actually seen the book (it has about five pages on Salim Ali as well) and have to go by secondary quotations in other English content. Nowak was a student of Erwin Stresemann (with whom the first chapter deals) and he writes about several European (but mostly German and Russian) ornithologists and their lives during the turbulent 1930s and 40s. Although Europe is pretty far from India, it did make some ripples far away. Incidentally, Nowak's ornithological work includes studies on the expansion in range of the collared dove (Streptopelia decaocto), which the Germans called the Tuerkentaube, literally the "Turkish dove", a name with a baggage of cultural prejudices.

Nowak's first "recollections" paper notes that he - presents the facts not as accusations or indictments, but rather as a stimulus to the younger generation of scientists to consider the issues, in particular to think “What would I have done if I had lived there or at that time?” - a thought to keep as you read on.

A shocker from this period is a paper by Dr Günther Niethammer on the birds of Auschwitz (Birkenau). This paper (read it online here) was published while Niethammer was posted on security duty at the main gate of the concentration camp. You might be forgiven if you thought he was just a victim of the war. Niethammer was a proud nationalist and volunteered to join the Nazi forces in 1937, leaving his position as a curator at the Museum Koenig at Bonn.
The contrast of Niethammer looking at the birds on one side while ignoring the inhumanity on the other provided novelist Arno Surminski with the title for his 2008 novel, Die Vogelwelt von Auschwitz - i.e. the birdlife of Auschwitz.

G. Niethammer
Niethammer studied birds around Auschwitz and also shot ducks in numbers for himself and to supply the commandant of the camp Rudolf Höss (if the name does not mean anything please do go to the linked article / or search for the name online).  Upon the death of Niethammer, an obituary (open access PDF here) was published in the Ibis of 1975 - a tribute with little mention of the war years or the fact that he rose to the rank of Obersturmführer. The Bonn museum journal had a special tribute issue noting the works and influence of Niethammer. Among the many tributes is one by Hans Kumerloeve (starts here online). A subspecies of the common jay was named as Garrulus glandarius hansguentheri by Hungarian ornithologist Andreas Keve in 1967 after the first names of Kumerloeve and Niethammer. Fortunately for the poor jay, this name is a junior synonym of  G. g. anatoliae described by Seebohm in 1883.

Meanwhile inside Auschwitz, the Polish artist Wladyslaw Siwek was making sketches of everyday life in the camp. After the war he became a zoological artist of repute. Unfortunately there is very little about him that is readily accessible to English readers on the internet.
Siwek, the artist who documented life at Auschwitz before working as a wildlife artist.
 
Hans Kumerloeve
Now for Dr Kumerloeve, who also worked in the Museum Koenig at Bonn. His name was originally spelt Kummerlöwe and he was, like Niethammer, a doctoral student of Johannes Meisenheimer. Kummerloeve and Niethammer made journeys on a small motorcycle to study the birds of Turkey. Kummerlöwe's political activities started earlier than Niethammer's: he joined the NSDAP (German: Nationalsozialistische Deutsche Arbeiterpartei = The National Socialist German Workers' Party) in 1925 and started the first student union of the party in 1933. Kummerlöwe soon became part of the Ahnenerbe, a think tank meant to give "scientific" support to the party ideas on race and history. In 1939 he wrote an anthropological study on "Polish prisoners of war". At the museum in Dresden which he headed, he thought up ideas to promote politics, and he published them in 1939 and 1940. After the war, it is thought that he went to all the European libraries that held copies of this journal (anyone interested in hunting it down should look for copies of Abhandlungen und Berichte aus den Staatlichen Museen für Tierkunde und Völkerkunde in Dresden 20:1-15) and purged them. According to Nowak, he even managed to get his hands on copies held in Moscow and Leningrad! The Dresden museum was also home to the German ornithologist Adolf Bernhard Meyer (1840–1911). He translated the 1858 papers of Charles Darwin and Alfred Russel Wallace into German and introduced the ideas of evolutionary theory to a whole generation of scientists. Among Meyer's amazing works is a series of avian osteological studies which use photography and depict birds in nearly life-like positions - a less artistic precursor to Katrina van Grouw's 2012 book The Unfeathered Bird. Meyer's skeleton images can be found here. In 1904 Meyer was eased out of the Dresden museum because of rising anti-semitism.

Nowak's book includes entries on the following scientists: (I keep this here partly for my reference as I intend to improve Wikipedia entries on several of them as and when time and resources permit. Would be amazing if others could pitch in!).
In the first of his recollection papers he writes about the reason for writing them (in his 1998 article): he saw that the obituary to Prof. Ernst Schäfer carefully avoided any mention of his wartime activities. And this brings us to India. In a recent article in Indian Birds, Sylke Frahnert and others have written about the bird collections from Sikkim in the Berlin natural history museum. In their article there is a brief statement that "The collection in Berlin has remained almost unknown due to the political circumstances of the expedition". This might be a bit cryptic for many, but the best read on the topic is Himmler's Crusade: The true story of the 1939 Nazi expedition into Tibet (2009) by Christopher Hale. Hale writes about Himmler:
He revered the ancient cultures of India and the East, or at least his own weird vision of them.
These were not private enthusiasms, and they were certainly not harmless. Cranky pseudoscience nourished Himmler’s own murderous convictions about race and inspired ways of convincing others...
Himmler regarded himself not as the fantasist he was but as a patron of science. He believed that most conventional wisdom was bogus and that his power gave him a unique opportunity to promulgate new thinking. He founded the Ahnenerbe specifically to advance the study of the Aryan (or Nordic or Indo-German) race and its origins
From there Hale goes on to examine the motivations of Schäfer and his team. He looks at how much of the science was politically driven. Swastika signs dominate some of the photos from the expedition - as if the symbol provided a natural tie with Buddhism in Tibet. It seems that Himmler gave Schäfer the opportunity to rise within the political hierarchy. The team that went to Sikkim included Bruno Beger. Beger was a physical anthropologist, but with less than innocent motivations - motivations that would be much harder to ascribe to pursuits like botany and ornithology. One of the results from the expedition was a film made by the entomologist of the group, Ernst Krause - Geheimnis Tibet - or secret Tibet - a copy of this 1 hour and 40 minute film is on YouTube. At around 26 minutes, you can see Bruno Beger creating face casts - first as a negative in Plaster of Paris from which a positive copy was made using resin. Hale talks about how one of the Tibetans, put into a cast with just straws to breathe through, went into an epileptic seizure from the claustrophobia and fear induced. The real horror, however, is revealed when Hale quotes a May 1943 letter from an SS officer to Beger - ‘What exactly is happening with the Jewish heads? They are lying around and taking up valuable space . . . In my opinion, the most reasonable course of action is to send them to Strasbourg . . .’ Apparently Beger had to select some prisoners from Auschwitz who appeared to have Asiatic features. Hale shows that Beger knew the fate of his selection - they were gassed for research conducted by Beger and August Hirt.
SS-Sturmbannführer Schäfer at the head of the table in Lhasa

In all, Hale makes a clear case that the Schäfer mission had quite a bit of political activity underneath. We find that Sven Hedin (Schäfer was a big fan of his in his youth; Hedin was a Nazi sympathizer who funded and supported the mission) was in contact with fellow Nazi supporter Erica Schneider-Filchner and her father Wilhelm Filchner in India, both of whom were later interned at Satara, while Bruno Beger made contact with Subhash Chandra Bose more than once. [Two of the pictures from the Bundesarchiv show a certain Bhattacharya - who appears to be a chemist working on snake venom - one wonders if he is Abhinash Bhattacharya.]

Of course the war had impacts across the entire region, and my review of Nowak's book must be unique in that I have never managed to access it beyond some online snippets. It is clearly not the last word, as there were many other interesting characters, including the Russian ornithologist Malchevsky, who survived German bullets thanks to a fat bird observation notebook in his pocket! In the 1950s Trofim Lysenko, the crank scientist who controlled science in the USSR, sought Malchevsky's help in proving his own pet theories - one of which was the idea that cuckoos were the result of feeding hairy caterpillars to young warblers!

Issues arising from race and perceptions are of course not restricted to this period or region. One of the less glorious stories of the Smithsonian Institution concerns the honorary curator Robert Wilson Shufeldt (1850–1934), who in the infamous Audubon affair turned his personal troubles with his second wife, a grand-daughter of Audubon, into a matter of race. He also wrote such books as America's Greatest Problem: The Negro (1915), in which we learn of the ideas of other scientists of the period like Edward Drinker Cope! Like many other obituaries, Shufeldt's is a classic whitewash.

Even as recently as 2015, the University of Salzburg withdrew an honorary doctorate that they had given to the Nobel prize winning Konrad Lorenz for his support of the political setup and racial beliefs. It should not be that hard for scientists to figure out whether they are on the wrong side of history even if they are funded by the state. Perhaps salaried scientists in India would do well to look at the legal contracts they sign with their employers, the state, more carefully.

PS: Mixing natural history with war sometimes led to tragedy for the participants as well. In the case of Dr Manfred Oberdörffer who used his cover as an expert on leprosy to visit the borders of Afghanistan with entomologist Fred Hermann Brandt (1908–1994), an exchange of gunfire with British forces killed him although Brandt lived on to tell the tale.

by Shyamal L. (noreply@blogger.com) at October 18, 2017 03:32 AM

October 17, 2017

Wikimedia Tech Blog

How we collaborated to build a new open source plugin to improve search results across language-wikis

The Tower of Babel by Pieter Brueghel the Elder, public domain.

The Wikimedia Foundation’s Search Platform team recently worked with Daniel Worley and Doug Turnbull from Open Source Connections on a Learning to Rank plugin for Elasticsearch (the software that powers search on Wikimedia sites), designed to apply machine learning to search relevance ranking. We recently chatted with Erik Bernhardson, a software engineer at the Wikimedia Foundation, on how the plugin will help improve search results across language wikis.

Q: Erik, I know you initially used your 10% time to think through this idea. Can you share a little on how that led to thinking about a plug-in?

Erik: Initially I started thinking about this due to a talk I saw at the Lucene/Solr Revolution 2015 conference. I had also been pondering it a bit because Justin Ormont, an advisor to the search team, had suggested that machine learning could be a reasonable way to solve a subset of our search optimization problems.

One of the problems with optimizing search rankings is that we have to manually choose weights that decide how important various features are. Adding new features means manually retuning the weights. As my colleague Trey Jones noted in a recent blog post on search relevance, those features might include:

“How common each individual word is overall. (As the most common word in the English language, "the" is probably less important to a query than the somewhat rarer "friggatriskaidekaphobia".)

How frequently each word appears in a given article. (Five matches is probably better than four matches, right?)

Whether there are any matching words in the title or a redirect. (Well, if you search for just "the", an article on English’s definite article does look pretty good.)

How close the words are to each other in the article or title. (Why is there a band called “The The”?)

Whether words match exactly or as related forms. (The query term resume also matches both resuming and résumé.)

How many words are in the query.

How many words are in the article. (Okay, maybe five matches in a twenty thousand word article might be worse than four matches in a five hundred word article.)

How often an article gets read. (Popular articles are probably better.)

How many other articles link to an article. (Did I mention that popular articles are probably better?)”

Applying machine learning to this problem puts the complicated question of how to weigh these bits of evidence in the hands of a machine. The machine is much better equipped to handle optimizing the tradeoffs of, for example, increasing the importance given to how well the search query matches the categories of an article, as it is able to take into account what effect that has on a large sampling of queries. Machine learning is additionally able to apply non-linear transformations. We do this to some extent ourselves with the traditional search ranking, but it’s much easier for a computer.

In my 10% time I built out an offline machine learning pipeline. This is something where I can feed clickthrough data in one end, and at the other end it will output example result pages for search queries. This was useful for proving machine learning was a workable idea, but rather useless for answering a user’s search query in a fraction of a second. That’s why we had the idea to build a plugin for Elasticsearch that was able to store machine learned models and apply them in response to user search queries.

Tell me a little bit about the Elasticsearch Learning To Rank plugin—what does it do, and what problem does it solve?

Erik: The Elasticsearch Learning To Rank plugin primarily allows us to apply a machine learned ranking algorithm for ranking search results. This is not an end-to-end solution, because collecting data for the machine learning to evaluate, deciding what features to provide to the algorithm, and training the actual models is all handled separately.

But this plugin provides a couple of important pieces of the puzzle. It acts as a data store for features—that is, definitions of how to calculate the individual data points that the machine learning utilizes—and models. It provides support for collecting feature vectors needed for training the model. And it performs the final, critical, step of actually using that model to rank queries in response to a user’s search request.
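To make that final step concrete, here is a hedged sketch of what a search request using a stored model might look like. The "sltr" rescore query and its "model"/"params" fields follow the plugin's public documentation, but the model name, index name, and field name below are made up for illustration.

// Hedged sketch: rescore the top candidates with a stored LTR model.
// "enwiki_v1", "enwiki_content", and the "text" field are made up here.
$searchBody = [
    'query' => [
        'match' => [ 'text' => 'learning to rank' ],
    ],
    'rescore' => [
        'window_size' => 1000,
        'query' => [
            'rescore_query' => [
                'sltr' => [
                    'params' => [ 'keywords' => 'learning to rank' ],
                    'model' => 'enwiki_v1',
                ],
            ],
        ],
    ],
];

$context = stream_context_create( [ 'http' => [
    'method' => 'POST',
    'header' => 'Content-Type: application/json',
    'content' => json_encode( $searchBody ),
] ] );
echo file_get_contents( 'http://localhost:9200/enwiki_content/_search', false, $context );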

This is certainly not the only way to rank queries, but we have some unique constraints that make it particularly enticing. We run many wikis in many different languages. Ideally the search algorithm would be tuned for each site individually. The appropriate importance of various features may vary between Arabic Wikipedia, English Wikipedia, and German Wikipedia, for example. There are even larger differences between projects, such as Wikipedia and Wiktionary, even in the same language. But we have limited engineering resources devoted to search—and it is next to impossible for us to hand tune the search algorithm for each site.

In the past, we have tuned the search algorithm for English Wikipedia, and every other project and language gets that same tuning regardless of how well it works for their particular site. With machine learning, however, we can learn a model for each wiki that has enough search traffic for us to look at logs of searches and clicks to use statistical modeling to determine what results are preferred.

You mentioned parts of the machine learning that are not handled by the plugin, how does that work?

Erik: We have built up an additional project, highly specialized to our use case, called MjoLniR. MjoLniR does three main things: it applies a User Browsing Model to transform logs of searches and clicks into a relevance score for query/article pairs; it communicates with the Elasticsearch LTR plugin to collect feature vectors for query/article pairs; and it performs training and hyperparameter optimization of ML models in our analytics cluster using XGBoost.

While all of those parts are important, and we couldn’t move forward without them, the most important part of any machine learning process is the training data — a query, an article, and a relevance score denoting how well the query matches the article—used to train the model. We are currently building our training data based on user clickthroughs on search result pages. Clickthroughs from search to an article page are logged and stored for 90 days, per our privacy policy. We then apply a bit of clustering to group together queries that are similar but not exactly the same and apply a user browsing model to transform those clickthroughs into a relevance score.

For example, love, Love and LOVE go together, and loving, the lovings, The lovings, the loving and loving get grouped together, and so on. These groups of clickthroughs are fed into an algorithm called a DBN, or more generally a User Browsing Model. This model takes into account expected user behavior on search pages, such as expecting that if a user clicks the third search result, then they probably also looked at the first and second and rejected them.
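As a toy illustration of that grouping step (the real clustering in MjoLniR is more sophisticated than simple case folding), similar queries can be bucketed under a normalized key:

// Toy sketch only: group queries that differ by case, whitespace, or a
// leading article. MjoLniR's actual clustering is more sophisticated.
function normalizeQuery( string $query ): string {
    $query = strtolower( trim( $query ) );
    $query = preg_replace( '/\s+/', ' ', $query ); // collapse whitespace
    return preg_replace( '/^the\s+/', '', $query ); // drop a leading "the"
}

$groups = [];
foreach ( [ 'love', 'Love', 'LOVE', 'loving', 'the loving', 'The lovings' ] as $query ) {
    $groups[ normalizeQuery( $query ) ][] = $query;
}
print_r( $groups );
// [ 'love' => [ 'love', 'Love', 'LOVE' ], 'loving' => [ 'loving', 'the loving' ], 'lovings' => [ 'The lovings' ] ]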

The DBN also considers that if the user clicked the third result, then comes back to the search page to click on the fifth result, then the third result is attractive and more suitable than the first or second results, but that the fifth result is probably better for some reason. The DBN aggregates together user behavior for the groups of queries to estimate the relevancy of article and query pairs. We currently require at least ten distinct search sessions in each group of queries, but will be testing variations to determine how many sessions are actually required for reliable relevance data. For more information, I suggest reading Olivier Chapelle and Ya Zhang’s paper “A dynamic bayesian network click model for web search ranking.”

I’m curious about how the search algorithm might differ based on language—can you go into more detail about the ways in which they might be different?

Erik: It’s hard to explain exactly, but the various scores we compose together to create the final score per article/query pair do not have the same distribution across different wikis. Because the distribution varies, the optimal way to combine those values into a final score also varies. Perhaps one way to think of this would be the popularity score we use as a ranking signal—which is generally about the second strongest signal, right after how well the query matches the article title. This popularity score is the percentage of page views over the course of a week that went to that particular article.

Wikis with more articles are going to have lower average values for this, so the algorithm needs to apply slightly different weights. Because this gets a bit complicated we never actually started using the popularity score on wikis other than English Wikipedia prior to applying machine learning. There are of course statistical methods to handle this without going all the way to machine learning; a bit of math could be applied to re-center the data and make it look relatively similar between the wikis, but it would still fundamentally differ between languages such that no exact weight would be correct for all wikis.

Can anyone use this plug-in and if so, how? How can they get more deeply involved in the project?

Erik: Anyone using Elasticsearch, a popular open source search engine, can use the plugin. As mentioned above though the plugin is only the final step. The best way to get involved is to visit the project page on GitHub and try it out. There is a demo included in the repository that demonstrates a full pipeline of building a training set, training a model, loading the model into Elasticsearch, and then ranking search results with the model.

 

How do you know the machine learning results are as good as the hand-tuned results? How much better are they?

Erik: There are a variety of evaluation metrics that can be applied both offline, with judgement lists, and online, with A/B tests. For offline testing we use the same judgement lists that are used to train the model with cross validation to estimate the performance of the model.

Currently we use judgement lists generated by applying statistical models of user behavior to clickthroughs on queries. We are looking to augment this with human judgements via the relevance surveys currently being run on article pages. The offline evaluation of the machine learning models with our first generation feature set shows an improvement of between 20% and 40% of the possible improvement (this varies per wiki) on the NDCG@10 evaluation metric.
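For readers unfamiliar with the metric, NDCG@k compares the discounted gain of the returned ranking against an ideal ordering of the same results. The sketch below shows one common formulation; the exact variant used in these evaluations may differ in detail.

// One common formulation of NDCG@k (gain 2^rel - 1, log2 position discount);
// the variant used in the evaluations above may differ in detail.
function dcgAtK( array $relevances, int $k ): float {
    $dcg = 0.0;
    foreach ( array_slice( $relevances, 0, $k ) as $i => $rel ) {
        $dcg += ( 2 ** $rel - 1 ) / log( $i + 2, 2 );
    }
    return $dcg;
}

function ndcgAtK( array $relevances, int $k ): float {
    $ideal = $relevances;
    rsort( $ideal ); // best possible ordering of the same labels
    $idealDcg = dcgAtK( $ideal, $k );
    return $idealDcg > 0 ? dcgAtK( $relevances, $k ) / $idealDcg : 0.0;
}

// Relevance labels of returned results, in ranked order.
echo ndcgAtK( [ 3, 2, 3, 0, 1, 2 ], 10 );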

Our online evaluation of the machine learning models has only been completed on English Wikipedia thus far, but we are currently running A/B tests on 18 more language Wikipedias which represent almost all wikis with > 1% of full text search volume.

On English Wikipedia the A/B test showed a clickthrough rate of 106% of the baseline, and session abandonment (sessions that search but don’t click on any results) at 98% of the baseline. While these numbers are not particularly huge, they only represent our initial foray into machine learned ranking. The goal of the very first model is to match or slightly exceed the quality of the existing search results, as there are quite a few pieces to build and tie together to get to this point. Even matching the existing performance means the process of collecting click data, generating judgement lists, collecting feature vectors, training models, and running those models in production is all working correctly. With all those pieces working it is now significantly easier for us to evaluate new ranking signals and improve the scoring function going forward.

Does the machine learning model get out of date?

Erik: Yes. Over time the distribution of feature values changes due to changes in the content of the wiki and the model needs to be retrained with the new feature values. The judgement lists used to train the model also change over time, as we aggregate together the last 90 days of search results and clickthroughs to build the judgement lists. On the largest wikis the percentage of content that is changed is low, relative to the total amount of content, so the feature values don’t change too much. We haven’t been running the machine learned ranking for long enough to say how much the judgement lists change over time, but I expect there to be some variation for two reasons: One is that user behavior can change over time, and second is that the results we return to users change over time, so the data the algorithms have to work with changes.

Do different people get different results?

Erik: While personalization of results can be a feature of machine learned ranking, we don’t apply any form of personalization at the user level or at higher levels, such as by geographic region. With respect to single user personalization, we don’t associate the data used for training, such as web requests or searches, to uniquely identifiable information that would allow for aggregating months of data together for individual users. This is a conscious decision to intrude as little as possible on user privacy while building out a ranking system that learns from user behavior.

Melody Kramer, Senior Audience Development Manager, Communications
Wikimedia Foundation

Thank you to Mairead Whitford Jones for asking really good questions for this post.

by Melody Kramer at October 17, 2017 04:36 PM

Wiki Education Foundation

Domestic Violence Awareness Month and Feminist Economics on Wikipedia

October is Domestic Violence Awareness Month, and has been observed as such since 1981. This week specifically, October 15–21, 2017, is the National Network to End Domestic Violence's Week of Action. During this time, organizations and advocacy groups work to educate the public on programs, services, and community resources to prevent violence and to support survivors. Knowing one's rights and how to access support can be vital and life-saving. Thus, Wikipedia serves as an important resource for such information literacy. While domestic violence affects all genders, student work on Wikipedia around resources and policies affecting women is the focus of our post today.

Women and girls make up half of the world’s population. Some make it their life’s work to increase awareness of matters affecting women and to advocate on behalf of other women who can’t speak for themselves. Others contribute by way of education. In the spring of 2017, Dr. Diana Strassmann taught a class on Feminist Economics and Public Policy at the University of Chicago, where students were tasked with writing about topics such as gender relations and the organization of domestic and market work; violence against women; and healthcare.

Domestic violence is an ongoing concern in the world that has yet to be resolved. Part of the issue lies in how society reacts to domestic violence on both the macro and micro level. This grows increasingly complicated as one examines domestic abuse in different countries, each with its own unique culture and way of addressing problems. One of Dr. Strassmann’s students addressed the subject of domestic violence in Brazil by greatly expanding the topic’s article to approximately 3-4 times its original size. Among the content they added was information about the Domestic Violence Law of 2006, which provided the country’s first legal form of protection for survivors of domestic violence. The law is named Lei Maria da Penha, after a woman by the same name who survived years of domestic abuse and advocated for stronger laws to protect other survivors.

Development aid can greatly assist a country or group with national goals around gender equality. One student chose to expand the article on development aid to include information on types of aid. They added information on the history of development aid for gender equality, beginning with the UN Decade for Women in 1975. Another student expanded Wikipedia’s coverage of the gender responsive approach for girls in the juvenile justice system. This new approach looks at specific issues that may cause female individuals to enter the justice system. The approach also analyzes their needs, as well as gender specific ways to help keep them from re-entering. A similar approach is also being used in some classes in the United States, where educators use culturally responsive curriculum to make the coursework more approachable to students from various backgrounds and cultures. And speaking of education, another student contributed content on female education in West Africa, showing how the region’s history and culture contributed to the quality and format of female education. Much work has been done to lessen the gender disparity in education, but there is still more that needs to be done and at all levels of education; women are still in the minority when it comes to people seeking higher education in West Africa.

Wikipedia has a wealth of knowledge; however, the site cannot grow without users contributing and correcting its information. Editing is a wonderful way to teach your students about technical writing, collaboration, and sourcing in a unique learning environment. In a Wikipedia assignment, your students also have the opportunity to contribute to political and social topics that impact important societal discussions.

If you are interested in using Wikipedia with your next class, please contact Wiki Education at contact@wikiedu.org to find out how you can gain access to tools, online trainings, and printed materials.

Image: MariaDaPenha.jpg, by Antonio Cruz/ABr, CC BY 3.0 BR, via Wikimedia Commons.

by Shalor Toncray at October 17, 2017 04:23 PM

Jeroen De Dauw

Introduction to Iterators and Generators in PHP

In this post I demonstrate an effective way to create iterators and generators in PHP and provide an example of a scenario in which using them makes sense.

Generators have been around since PHP 5.5, and iterators have been around since the Planck epoch. Even so, a lot of PHP developers do not know how to use them well and cannot recognize situations in which they are helpful. In this blog post I share insights I have gained over the years that, when shared, always got an interested response from fellow developers. The post goes beyond the basics, provides a real world example, and includes a few tips and tricks. To not leave out those unfamiliar with Iterators, the post starts with the “What are Iterators” section, which you can safely skip if you can already answer that question.

What are Iterators

PHP has an Iterator interface that you can implement to represent a collection. You can loop over an instance of an Iterator just like you can loop over an array:

function doStuff(Iterator $things) {
    foreach ($things as $thing) { /* ... */ }
}

Why would you bother implementing an Iterator subclass rather than just using an array? Let’s look at an example.

Imagine you have a directory with a bunch of text files. One of the files contains an ASCII NyanCat (~=[,,_,,]:3). It is the task of our code to find which file the NyanCat is hiding in.

We can get all the files by doing a glob( $path . '*.txt' ) and we can get the contents for a file with a file_get_contents. We could just have a foreach going over the glob result that does the file_get_contents. Luckily we realize this would violate separation of concerns and make the “does this file contain NyanCat” logic hard to test, since it would be bound to the filesystem access code. Hence we create a function that gets the contents of the files, and separate functions with our logic in them:

function getContentsOfTextFiles(): array {
    // glob and file_get_contents
}

function findTextWithNyanCat(array $texts) {
    foreach ($texts as $text) { if ( /* ... */ ) { /* ... */ } }
}

function findNyanCat() {
    findTextWithNyanCat(getContentsOfTextFiles());
}

While this approach is decoupled, a big drawback is that now we need to fetch the contents of all files and keep all of that in memory before we even start executing any of our logic. If NyanCat is hiding in the first file, we’ll have fetched the contents of all others for nothing. We can avoid this by using an Iterator, as they can fetch their values on demand: they are lazy.

class TextFileIterator implements Iterator {
    /* ... */
    public function current() {
        // return file_get_contents
    }
    /* ... */
}

function findTextWithNyanCat(Iterator $texts) {
    foreach ($texts as $text) { if ( /* ... */ ) { /* ... */ } }
}

function findNyanCat() {
    findTextWithNyanCat(new TextFileIterator());
}

Our TextFileIterator gives us a nice place to put all the filesystem code, while to the outside just looking like a collection of texts. The function housing our logic, findTextWithNyanCat, does not know that the text comes from the filesystem. This means that if you decide to get texts from the database, you could just create a new DatabaseTextBlobIterator and pass it to the logic function without making any changes to the latter. Similarly, when testing the logic function, you can give it an ArrayIterator.
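For reference, a complete version of the TextFileIterator sketched above could look like the following; the eager glob() listing in the constructor and the default 'texts/' path are assumptions made for this sketch, and a production version might list files lazily as well.

// A minimal, complete sketch of the TextFileIterator elided above.
// File contents are still fetched lazily, only in current().
class TextFileIterator implements Iterator {
    private $filePaths;
    private $position = 0;

    public function __construct( string $path = 'texts/' ) {
        // 'texts/' is a made-up example path.
        $this->filePaths = glob( $path . '*.txt' );
    }

    public function current() {
        return file_get_contents( $this->filePaths[$this->position] );
    }

    public function key() {
        return $this->position;
    }

    public function next() {
        $this->position++;
    }

    public function rewind() {
        $this->position = 0;
    }

    public function valid() {
        return isset( $this->filePaths[$this->position] );
    }
}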

function testFindTextWithNyanCat() {
    /* ... */
    findTextWithNyanCat(new ArrayIterator(['test text', '~=[,,_,,]:3']));
    /* ... */
}

I wrote more about basic Iterator functionality in Lazy iterators in PHP and Python and Some fun with iterators. I also blogged about a library that provides some (Wikidata specific) iterators and a CLI tool build around an Iterator. For more on how generators work, see the off-site post Generators in PHP.

PHP’s collection type hierarchy

Let’s start by looking at PHP’s type hierarchy for collections as of PHP 7.1. These are the core types that I think are most important:

  •  iterable
    • array
    • Traversable
      • Iterator
        • Generator
      • IteratorAggregate

At the very top we have iterable, the supertype of both array and Traversable. If you are not familiar with this type or are using a version of PHP older than 7.1, don’t worry, we don’t need it for the rest of this blog post.

Iterator is the subtype of Traversable, and the same goes for IteratorAggregate. The standard library iterator_ functions such as iterator_to_array all take a Traversable. This is important since it means you can give them an IteratorAggregate, even though it is not an Iterator. Later on in this post we’ll get back to what exactly an IteratorAggregate is and why it is useful.

Finally we have Generator, which is a subtype of Iterator. That means all functions that accept an Iterator can be given a Generator, and, by extension, that you can use generators in combination with the Iterator classes in the Standard PHP Library such as LimitIterator and CachingIterator.
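For example, a generator can be handed straight to SPL's LimitIterator; a minimal sketch:

// Minimal sketch: take only the first three values of an endless generator.
function naturalNumbers(): Generator {
    $n = 1;
    while ( true ) {
        yield $n++;
    }
}

foreach ( new LimitIterator( naturalNumbers(), 0, 3 ) as $number ) {
    echo $number, "\n"; // 1, 2, 3
}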

IteratorAggregate + Generator = <3

Generators are a nice and easy way to create iterators. Often you'll only loop over them once and not have any problem. However, beware that generators create iterators that are not rewindable, which means that if you loop over them more than once, you'll get an exception.

Imagine the scenario where you pass in a generator to a service that accepts an instance of Traversable:

$aGenerator = function() { /* ... yield ... */ };
$aService->doStuff($aGenerator());

public function doStuff(Traversable $things) {
    foreach ($things as $thing) { /* ... */ }
}

The service class in which doStuff resides does not know it is getting a Generator, it just knows it is getting a Traversable. When working on this class, it is entirely reasonable to iterate through $things a second time.

public function doStuff(Traversable $things) {
    foreach ($things as $thing) { /* ... */ }
    foreach ($things as $thing) { /* ... */ } // Boom if Generator!
}

This blows up if the provided $things is a Generator, because generators are non-rewindable. Note that it does not matter how you iterate through the value. Calling iterator_to_array with $things has the exact same result as using it in a foreach loop. Most, if not all, generators I have written do not use resources or state that inherently prevent them from being rewindable. So the double-iteration issue can be unexpected and seemingly silly.

There is a simple and easy way to get around it though. This is where IteratorAggregate comes in. Classes implementing IteratorAggregate must implement the getIterator() method, which returns a Traversable. Creating one of these is extremely trivial:

class AwesomeWords implements \IteratorAggregate {
    public function getIterator() {
        yield 'So';
        yield 'Much';
        yield 'Such';
    }
}

If you call getIterator, you’ll get a Generator instance, just like you’d expect. However, normally you never call this method. Instead you use the IteratorAggregate just as if it was an Iterator, by passing it to code that expects a Traversable. (This is also why usually you want to accept Traversable and not just Iterator.) We can now call our service that loops over the $things twice without any problem:

$aService->doStuff(new AwesomeWords()); // no boom!

By using IteratorAggregate we did not just solve the non-rewindable problem, we also found a good way to share our code. Sometimes it makes sense to use the code of a Generator in multiple classes, and sometimes it makes sense to have dedicated tests for the Generator. In both cases having a dedicated class and file to put it in is very helpful, and a lot nicer than exposing the generator via some public static function.

For cases where it does not make sense to share a Generator and you want to keep it entirely private, you might need to deal with the non-rewindable problem. For those cases you can use my Rewindable Generator library, which allows making your generators rewindable by wrapping their creation function:

$aGenerator = function() { /* ... yield ... */ };
$aService->doStuff(new RewindableGenerator($aGenerator));

A real-world example

A few months ago I refactored some code that is part of the Wikimedia Deutschland fundraising codebase. This code gets the filesystem paths of email templates by looking in a set of specified directories.

private function getMailTemplatesOnDisk( array $mailTemplatePaths ): array {
    $mailTemplatesOnDisk = [];

    foreach ( $mailTemplatePaths as $path ) {
        $mailFilesInFolder = glob( $path . '/Mail_*' );
        array_walk( $mailFilesInFolder, function( & $filename ) {
            $filename = basename( $filename ); // this would cause problems w/ mail templates in sub-folders
        } );
        $mailTemplatesOnDisk = array_merge( $mailTemplatesOnDisk, $mailFilesInFolder );
    }

    return $mailTemplatesOnDisk;
}

This code made the class bound to the filesystem, which made it hard to test. In fact, this code was not tested. Furthermore, this code irked me, since I like code to be on the functional side. The array_walk mutates its by-reference variable and the assignment at the end of the loop mutates the return variable.

This was refactored using the awesome IteratorAggregate + Generator combo:

class MailTemplateFilenameTraversable implements \IteratorAggregate {

	private $mailTemplatePaths;

	public function __construct( array $mailTemplatePaths ) {
		$this->mailTemplatePaths = $mailTemplatePaths;
	}

	public function getIterator() {
		foreach ( $this->mailTemplatePaths as $path ) {
			foreach ( glob( $path . '/Mail_*' ) as $fileName ) {
				yield basename( $fileName );
			}
		}
	}
}

Much easier to read/understand code, no state mutation whatsoever, good separation of concerns, easier testing and reusability of this collection building code elsewhere.

See also: Use cases for PHP generators (off-site post).

Tips and Tricks

Generators can yield key value pairs:

yield "Iterators" => "are useful";
yield "Generators" => "are awesome";
// [ "Iterators" => "are useful", "Generators" => "are awesome" ]

You can use yield in PHPUnit data providers.
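For example (the test class and provider names here are made up for illustration):

// Sketch of a PHPUnit data provider written as a generator;
// class and method names are made up for illustration.
class AdditionTest extends \PHPUnit\Framework\TestCase {

    public function additionProvider() {
        yield 'zero plus zero' => [ 0, 0, 0 ];
        yield 'one plus two' => [ 1, 2, 3 ];
    }

    /**
     * @dataProvider additionProvider
     */
    public function testAddition( int $a, int $b, int $expected ) {
        $this->assertSame( $expected, $a + $b );
    }
}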

You can yield from an iterable.

yield from [1, 2, 3];
yield from new ArrayIterator([4, 5]);
// 1, 2, 3, 4, 5

// Flattens iterable[] into Generator
foreach ($collections as $collection) {
    yield from $collection;
}

Thanks to Leszek Manicki and Jan Dittrich for reviewing this blog post.

by Jeroen at October 17, 2017 04:49 AM

October 16, 2017

Wikimedia Foundation

Raju Narisetti joins Wikimedia Foundation Board of Trustees

Photo by Niccolò Caranti, CC BY-SA 4.0.

The Wikimedia Foundation today announced the appointment of Raju Narisetti, a veteran media executive and journalist, to the Wikimedia Foundation Board of Trustees.

Raju brings more than 29 years of media experience across three continents. He is currently CEO of Univision Communications Inc’s Gizmodo Media Group, the publisher of Gizmodo, Jezebel, Lifehacker, The Root, and others.

“Raju has dedicated his life’s work to information as a public service. His commitment to editorial integrity, independence, and inclusion is deeply aligned with Wikimedia values. His passion and expertise in digital strategy and international growth will be invaluable to our movement’s future as we advance our global free knowledge mission,” said Wikimedia Foundation Executive Director Katherine Maher.

Prior to joining the Gizmodo Media Group, Raju served as Senior Vice President, Strategy, at News Corp, one of the largest media companies in the world and the publisher of The Wall Street Journal and The Times of London. In that role, Raju was responsible for identifying new digital growth opportunities globally for News Corp.

“There has never been more urgency in Wikipedia’s 16-year history than now, for upholding the values of free exchange of information and knowledge,” said Raju. “Despite mounting challenges around the world, rapid innovation is creating tremendous opportunities for the Wikimedia Foundation. I have much to learn, but am also looking forward to lending my nearly three decades of global media experiences to the movement, to help engage more digital and mobile audiences, particularly diverse young people, and harness their energy to benefit from—and support—the vital values that underpin all Wikimedia initiatives.”

Before joining News Corp, Raju spent nearly 25 years as a journalist and editor. He started at The Economic Times in India before moving to The Dayton Daily News (Ohio), The Wall Street Journal (WSJ), and The Washington Post. Starting out as a summer intern at WSJ, he eventually became Editor of The Wall Street Journal Europe and later Managing Editor of WSJ’s digital newsrooms. At The Washington Post, he was the Managing Editor who led the Post’s rethinking of its separate digital and print newsrooms and operations.

A native of Hyderabad, India, Raju is also the founder of Mint, currently India’s second-largest daily business newspaper by circulation.

“Raju’s extensive international and journalistic experience will add valuable perspective to the Board as we look to bring new voices from around the world into our movement. I am impressed by his willingness to learn about and embrace the values behind the Wikimedia movement, and look forward to working with him to support our free knowledge mission,” said Nataliia Tymkiv, Governance Chair for the Board.

Raju is currently the Vice-chair for the Board of Directors of the International Center for Journalists, as well as a member of the Board of Trustees for the Institute of International Education, which administers the Fulbright Scholarship programmes. He lives in Brooklyn, New York.

Raju joins eight other Foundation Trustees who collectively bring expertise in the Wikimedia community, financial oversight, governance, and organizational development; and a commitment to advancing Wikimedia’s mission of free knowledge for all.

He was approved unanimously by the Wikimedia Foundation Board of Trustees. His term is effective October 2017 and will continue for three years. Please see the Wikimedia Foundation’s Board of Trustees page for complete biographies.

by Wikimedia Foundation at October 16, 2017 06:25 PM

Wiki Education Foundation

The Right Fit

Marcia Harrison-Pitaniello is a Professor of Biological Sciences at Marshall University. In this post, she shares her experience integrating Wikipedia-based assignments into two different biology courses.

Marcia Harrison-Pitaniello Image: Photo-Harrison-Pitaniello2017.jpg, by Marcia Harrison-Pitaniello, CC BY-SA 4.0, via Wikimedia Commons.

I routinely teach writing intensive courses and am always interested in providing exercises that allow students to write for real audiences. When I first heard about Wiki Education at a plant physiology conference, I felt that a Wikipedia editing assignment would be a good fit for my courses. While I was eager to enroll my course, the biggest obstacle was that my fall course, Principles of Cell Biology, has over 70 students. I took the plunge and, now that I have two semesters’ worth of trials and errors, I can relate some of my successes and challenges in using these exercises in my different biology classes.

Principles of Cell Biology is a required core course for our majors. Students may enroll after completing the introductory sequence, but many have taken genetics, microbiology, or biochemistry. Therefore, the level of expertise varies quite a bit among the students. Additionally, peer-reviewed research articles in cell biology are very technical and difficult for undergraduate students.

Setting up the course was the easy part. The Wiki Education Dashboard is well organized and the tutorials are very helpful. The Dashboard’s assignment wizard suggested assigning a small addition per student when dealing with a large number of students. This established a timeline where the students were assigned to review one article and make one addition. Allowing students to select and review an article relevant to the class showed me their interests. Most students completed the initial training modules and the review assignment, which was a graded take-home assignment. However, even with Teaching Assistant help, it took some time to read and grade all the reviews. The editing portion of the assignment was also graded, but was less successful, with fewer than half of the students completing both the training and the article addition. Part of this was my fault. While training was included in the Add-to-an-Article assignment, I did not include it in the grading rubric. Therefore, many students skipped the modules as if they were optional rather than part of the editing process.

In the long run, I found 70 student editors cumbersome and the level of expertise in writing quite variable. I also found that some students selected topics that were beyond their expertise, and others lacked confidence to add their edits. Even though the additions were short, grading required extra time to consider how the wording fit into the current text of each article.

The next semester I taught a graduate seminar course which emphasizes scientific communication. Communicating to a general audience using Wikipedia was an ideal addition, since all the students were engaged in thesis work or had completed research reports. Therefore, article selection was easy and each student already had a set of good references to work with. For this class, I shortened the recommended timeline, since selecting and critiquing an existing article did not take much time. The exercise was organized so that the material the students used as background in the introductory information in their seminar presentation matched their Wikipedia entries. This became an excellent venue for discussing different ways to present the same scientific information. The students enjoyed the assignment, and, in a couple of cases, their assigned article was nicely improved. For example, one student added to the sections on reproduction, and feeding behavior and diet in the stingray article, making the article more balanced in terms of biology content. Another student expanded the introduction of the environmental DNA (eDNA) article to include its potential scientific uses, and added information about eDNA in terrestrial sediments. Overall, the students added almost 7,000 words to 13 articles which were viewed a total of 691,000 times.

Now I’m back teaching the larger cell biology course and plan on a more focused and hopefully saner approach. This time, each lab section will work on improving a single article. While each student will only make a minor edit, the intent is to collectively make a substantial improvement to the article. This will make it easier to find good references and review the edits. In preparation, I reviewed topics this summer, selecting several that needed substantial work and fit the course content. The students will vote on which article they prefer to work on, and we will work on the “winner”. Now that I know the system better, I find that it is easy to customize the course after the initial set-up. This is a short assignment, but I modified the timeline so that it begins early in the semester and ends by exam 3. This will give us time to cover the relevant material in class and discuss references. I have also incorporated editing sessions during the lecture so that the entire class is involved in providing suggested edits for portions of the articles. I have also added assignments to Blackboard so I can grade and provide feedback for reviews more easily. However, this time, completion of the training is required or the assignment is not graded.

Overall, I found that students enjoyed making a real world contribution to science education, some gained more confidence in writing about science, and all learned more about how Wikipedia works. I also found that I ended up working on some of the articles myself, especially if we found sections that needed more expertise.

Image: Huntington, WV (Marshall University).jpg, by dpursoo, CC BY-SA 3.0 via Wikimedia Commons.

by Guest Contributor at October 16, 2017 04:36 PM

Wikimedia Foundation

The crowdsourcing fallacy

Photo by Pasu Au Yeung, CC BY 2.0.

It’s easy to look at projects like Wikipedia, Reddit, Duolingo, StackOverflow, or Zooniverse and think, we just need to take problem X, add some “crowd” to it (like a magic spell or recipe), and then boom: breakaway success.

If only it worked like that.

Most crowdsourcing projects, like most human efforts, fail.

Crowds are phenomenal tools because they’re made up of people, and people are the most important resource in any initiative.

But.

Your crowdsourcing effort will most likely fail if…

  • your crowd is not diverse.
  • your crowd all thinks alike.
  • their task is not clear.
  • their mission is not compelling.
  • the technical platform is poorly designed or overly complicated.
  • there are not continued areas for growth and engagement over time.
  • the interface and the organizers are not responsive to change.
  • the community lacks social moderation or healthy behavioral norms.
  • it lacks mechanisms to address technical abuse and human harassment.
  • you do not recognize or empower the core users of your platform.
  • you lock it down and people have to jump through hoops to participate.
  • potential users lack free time, skills, access or awareness to contribute.
  • volunteers are hampered by legal restrictions or monetization attempts.
  • another more interesting or better crafted opportunity comes along.
  • you never attract enough people to have a crowd in the first place.

The above is not a guaranteed checklist. You can do them all and still fail. Or, if your project mysteriously gets popular really quickly because it scratches an irresistible itch or fills an unmet need, you might be an exception: despite gaps in the list, it mostly all comes together, no one knows what the magic secret was, and you wind up with the philosophical tagline, “it only works in practice; in theory it could never work” (an actual aphorism about Wikipedia).

Your crowdsourcing effort will fail, most of the time, because most things fail. And because important things are hard.

Success as a project also doesn’t mean you are perfect by any means. Wikipedia and Reddit still struggle with serious harassment issues. In some cases, you can “get away with” a level of performance that is sufficient but far from ideal. It’s also possible that you can only get away with it for so long before it threatens the core, sustainability, or growth of the project.

Knowing all this, next time you have a problem and want to add some crowd to it, at least consider the people, ideology, task, mission, platform, journey, adaptations, mores, resiliency, motivators, barriers to entry, prerequisites, distractions, competitors, and core users.

Then, just maybe, you too can harness the power of the crowd!

Jake Orlowitz, Wikimedian

This post is inspired by the work and writings of Clay Shirky, Jimmy Wales, Alexis Ohanian, Luis von Ahn, Joel Spolsky, Chris Lintott, Joseph Reagle, Anasuya Sengupta, Siko Bouterse, Jonathan Morgan, Andrew G. West, Aaron Halfaker, and Sue Gardner.

This text is licensed CC-BY-SA 4.0. It can be shared or reposted without permission under the terms of this Creative Commons license, which requires only attribution and that reusers keep the same terms.

———

Editor’s note: This post was republished from Jake Orlowitz’s Medium blog. While he is an employee of the Wikimedia Foundation, it was written in a personal volunteer capacity. The views expressed are the author’s alone and not necessarily held by the Wikimedia Foundation.

by Jake Orlowitz at October 16, 2017 03:27 PM

Tech News

Tech News issue #42, 2017 (October 16, 2017)

2017, week 42 (Monday 16 October 2017)
Other languages:
العربية • ‎čeština • ‎English • ‎español • ‎suomi • ‎français • ‎עברית • ‎italiano • ‎日本語 • ‎ಕನ್ನಡ • ‎português do Brasil • ‎русский • ‎svenska • ‎українська • ‎中文

October 16, 2017 12:00 AM

October 15, 2017

Wikimedia Foundation

Wikimedia Research Newsletter, July 2017

“Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia”

Reviewed by Thomas Niebler

This paper was also presented in the June 2017 WikiResearch showcase

 

In several Wikipedia-based systems and scientific analyses, researchers have assumed that no two articles in Wikipedia represent the same concept, i.e. a semantically closed description of a specific item, for example “New York City”. However, Lin et al. published a paper at CSCW ’17[1] showing that this “article-as-concept” assumption does not in fact hold: the article about “New York City” has a separate sub-article, “History of New York City”, which describes a topic very closely related to “New York City” and could easily be merged into the original article. This way of splitting lengthy articles into several smaller ones (“summary style“, more specifically “article size“) may improve readability for human readers, but seriously impairs many studies based on the “article-as-concept” assumption. Using a simple classification approach with features based on both the link structure and semantic aspects of the title and context, the authors found that 70.8% of the top 1000 most-visited pages have been split into an article and sub-articles, with an average of 7.5 sub-articles per article, and concluded that the existence of sub-articles is not the exception but the rule.

A drawback of the proposed sub-article relationship detection method, as stated in the paper, is that it is trained only on explicitly encoded sub-article relationships; it is not yet clear how to detect implicit relationships, i.e. cases where no editor has linked the sub-article to the main article. Still, this is a first step towards a deeper analysis of the Wikipedia page network that could make it both more readable for humans and more easily exploitable by algorithms.

Briefly

85% of German scientists use Wikipedia, and other European media survey results

Summary by Tilman Bayer

A survey among 1,354 German academic researchers about their professional use of social media found Wikipedia to be the most widely used site as of 2015, used by 84.7% of respondents.[2] Among German internet users in general, 79% use Wikipedia. Only 2% of these Wikipedia readers think it is “never reliable”, while 80% hold that it is “mostly” (“größtenteils”) reliable.[3] A report by the German Monopolkommission (which advises the government on antitrust matters) on potential monopoly problems in the Internet search engine market highlighted Wikipedia as the top-10 website in Germany that is by far the most dependent on Google, which accounts for around 80% of its traffic (according to third-party data from SimilarWeb that is not quite consistent with the Wikimedia Foundation’s own data).[4]

In France, surveys by the Institut national de la statistique et des études économiques (INSEE) found that from 2011 to 2013, the share of internet users consulting Wikipedia (“or any other collaborative online encyclopedia”) rose from 39% to 51%. Wikipedia usage was higher among younger internet users and among those with degrees – 82% among 16-24 year olds, 54% among 25-54 year olds, and only 31% among 55-74 year olds.[5] The corresponding Eurostat data gave 45% for the entire European Union as of 2015.[6]

In contrast, Ofcom found that only 2-4% of UK 12-15 year olds use Wikipedia as a first stop for information as of 2015.[7]

In the meantime, a 2016 Knight Foundation report, based on a study by Nielsen, found that “Among mobile sites [in the US], Wikipedia reigns in terms of popularity (the app does well too) and amount of time users spend on the entity. Wikipedia’s site reaches almost one-third of the total mobile population each month”.[8]

Conferences and events

See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.

Compiled by Tilman Bayer
  • “Intellectual interchanges in the history of the massive online open-editing encyclopedia, Wikipedia”[9] From the abstract: “[Its] open-editing nature may give us prejudice that Wikipedia is an unstable and unreliable source; yet many studies suggest that Wikipedia is even more accurate and self-consistent than traditional encyclopedias. Scholars have attempted to understand such extraordinary credibility, but usually used the number of edits as the unit of time, without consideration of real time. In this work, we probe the formation of such collective intelligence through a systematic analysis using the entire history of English Wikipedia articles, between 2001 and 2014. … [We] find the existence of distinct growth patterns that are unobserved by utilizing the number of edits as the unit of time. To account for these results, we present a mechanistic model that adopts the article editing dynamics based on both editor-editor and editor-article interactions. … [The] model indicates that infrequently referred articles tend to grow faster than frequently referred ones, and articles attracting a high motivation to edit counterintuitively reduce the number of participants. We suggest that this decay of participants eventually brings inequality among the editors, which will become more severe with time.”

This paper was also presented in the February 2017 Wikimedia Research showcase

  • “Not at Home on the Range: Peer Production and the Urban/Rural Divide”[10] From the abstract and paper: “We find that in both Wikipedia and OpenStreetMap, peer-produced content about rural areas is of systematically lower quality, is less likely to have been produced by contributors who focus on the local area, and is more likely to have been generated by automated software agents (i.e. ‘bots’)”, however there is a “substantial rural advantage in the per capita quantity of peer-produced information.”
  • “Understanding the Role of Participative Web within Collaborative Culture: The Case of Wikipedia”[11] From the abstract: “This article will use Wikipedia as an example to illustrate about what the term “participative webs” exactly means. From perspectives of collaborative culture, this study will emphasize the role that participative website plays in knowledge-creating and knowledge-sharing […] and discuss how collaborative culture reflects the role participative web is equipped.”
  • “From Freebase to Wikidata: The Great Migration”[12] From the abstract: “The two major collaborative knowledge bases are Wikimedia’s Wikidata and Google’s Freebase. Due to the success of Wikidata, Google decided in 2014 to offer the content of Freebase to the Wikidata community. In this paper, we report on the ongoing transfer efforts and data mapping challenges, and provide an analysis of the effort so far. […] Throughout the migration, we have gained deep insights into both Wikidata and Freebase, and share and discuss detailed statistics on both knowledge bases.”

References

  1. Lin, Yilun; Yu, Bowen; Hall, Andrew; Hecht, Brent (2017). Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia. CSCW ’17. New York, NY, USA: ACM. pp. 2052–2067. ISBN 9781450343350. doi:10.1145/2998181.2998274. 
  2. Siegfried, Doreen (2015-11-06). “Social Media: Forschende nutzen am häufigsten Wikipedia”. ZBW Website.  (in German)
  3. “Vier von fünf Internetnutzern recherchieren bei Wikipedia”. Bitkom. 2016-01-11. 
  4. http://www.monopolkommission.de/images/PDF/SG/SG68/S68_volltext.pdf
  5. “Ce que l’on sait sur les usages de Wikipedia en France”. 2017-07-10. 
  6. “Individuals using the internet for consulting wiki.”. Eurostat – Tables, Graphs and Maps Interface (TGM). 
  7. Children’s Media Use and Attitudes Report 2015 Section 6 – Knowledge and understanding of media among 8-15s (PDF). United Kingdom: Ofcom. 2015. p. 16. 
  8. Foundation, Knight (2016-05-11). “Mobile America: How Different Audiences Tap Mobile News”. 
  9. Yun, Jinhyuk; Lee, Sang Hoon; Jeong, Hawoong. “Intellectual interchanges in the history of the massive online open-editing encyclopedia, Wikipedia”. Physical Review E 93 (1): 012307. doi:10.1103/PhysRevE.93.012307.  Closed access, preprint: Yun, Jinhyuk; Lee, Sang Hoon; Jeong, Hawoong (2016-01-22). “Intellectual interchanges in the history of the massive online open-editing encyclopedia, Wikipedia”. Physical Review E 93 (1). ISSN 2470-0053. doi:10.1103/PhysRevE.93.012307. 
  10. Johnson, Isaac L.; Yilun, Lin; Li, Toby Jia-Jun; Hall, Andrew; Halfaker, Aaron; Schöning, Johannes; Brent, Hecht (2016-05-07). Not at Home on the Range: Peer Production and the Urban/Rural Divide (PDF). SIGCHI. San Jose, USA: SIGCHI. p. 13. ISBN 978-1-4503-3362-7. doi:10.1145/2858036.2858123. 
  11. He, Yang (2015-12-09). “Understanding the Role of Participative Web within Collaborative Culture: The Case of Wikipedia”. Current Trends in Publishing (Tendances de l’édition): student compilation étudiante 1 (2). 
  12. Tanon, Thomas Pellissier; Vrandecic, Denny; Schaffert, Sebastian; Steiner, Thomas; Pintscher, Lydia (2016-04-11). From Freebase to Wikidata: The Great Migration (PDF). 25TH INTERNATIONAL WORLD WIDE WEB CONFERENCE. Montreal, Quebec, Canada. p. 10. doi:10.1145/2872427.2874809. 

Wikimedia Research Newsletter
Vol: 7 • Issue: 7 • July 2017
This newsletter is brought to you by the Wikimedia Research Committee and The Signpost
Subscribe: Syndicate the Wikimedia Research Newsletter feed | Email | WikiResearch on Twitter | WikiResearch on Facebook | [archives] [signpost edition] [contribute] [research index]


by Tilman Bayer at October 15, 2017 11:51 PM

Gerard Meijssen

#Wikidata - motivation; thank you #Magnus

I added Baratunde A. Cola to Wikidata because he won the Alan T. Waterman Award. This month a Wikipedia article was written about him and I wanted to add some data to the item.

I did not, because functionality that is key to me was broken. A new property was added and all the work that I had done on categories no longer showed in Reasonator. There was no willingness to consider the consequential loss of functionality, and the result was a dip in my motivation.

Wikidata is important to me and I asked Magnus if he would help out and change Reasonator. He did.

Now I have added information to the item for Mr Cola based on his categories. It matters that a category like this one reflects all the people known to have played for the Vanderbilt Commodores football team.

The issue is that at Wikidata, we have lost sight of these collaborative aspects. Everybody does his own thing and we hardly consider why. It is why user stories are so important; they tell you why something is done and what the benefit is. In the end, without a benefit there is no reason to do it.
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at October 15, 2017 09:07 AM

October 14, 2017

Sam Wilson

Perth Heritage Days 2017

Perth CBD.

Saturday

Moana Chambers

Moana Chambers. The café only recently closed, but perhaps will re-open soon under new management. An amazing stairwell, certainly retrofitted, but it's hard to get an idea of the original layout.

Trinity Buildings

Trinity Arcade. A great place, I'd love to work in these offices. Lots of empty rooms by the looks of it.

The sign boards read as follows:

The first congregational church in Perth was founded by Henry Trigg who opened a chapel in 1846 on a site in William Street.

In 1863 the present site in Saint George's Terrace was purchased and a new building, Trinity Congregational Chapel, was opened by Governor Hampton in 1865. This colonial building, designed by Richard Roach Jewell in the Gothic Revival style known as Commissioners' Gothic, was located on a rise set well back from the Terrace.

Constructed in Flemish Bond brickwork and with a shingled roof, that building exists today and appears externally very much as it did in 1865. The original open space around the chapel has been replaced with arcades and buildings.

The new schoolroom was added to the northern side of the chapel in 1872. Constructed in Flemish Bond brickwork with a shingled roof, that building also survives but with some minor alterations and later additions to the western side.

The present Trinity Church on Saint George's Terrace was erected in front of the former chapel. Designed by Henry Trigg, the first architect to be born and trained in W.A. and grandson of the founder, the church was dedicated in December 1893. This building constructed in Flemish Bond brickwork with twin towers and elaborate stucco decoration was designed in the Victorian style known as "Dissenter's Mediaevalism."

Trinity Church maintains a presence in the city and an historic link with the past. The arcades provide a connection between the business centre in St. George's Terrace and the commercial district of the Hay Street Mall.

(Architects: Duncan Stephen and Mercer 1982.)

In 1927 Trinity Arcade and Buildings were constructed on the Hay Street frontage of the property to provide three floors of shops and commercial premises and a basement. The building was designed for the trustees of Trinity Congregational Church by E. Allwood, architect.

In 1981, Trinity Arcade was extended on three levels and the basement of Trinity Buildings upgraded to provide a shopping arcade link from St. George's Terrace and under Hay Street. Trinity Church and Halls were restored at the same time.

Mary Raine exhibition

A corporate exhibition with reasonable research and few artifacts. Lots to learn though, about Mary Raine. Shame about the venue (the foyer of the Bank West corporate offices).


by Sam at October 14, 2017 12:00 AM

October 13, 2017

Wikimedia Foundation

Wiki Loves Africa is back—this year celebrating people at work

Photo by Lucas Takerkart, CC BY-SA 4.0.

Wiki Loves Africa, the annual themed photo and media sharing competition, is back. This year, participants are documenting the lives of “people at work.” The contest, which runs from October to the end of November, is giving special attention to works covering women at work and endangered work practices.

Started in 2014 by Isla Haddow-Flood and Florence Devouard, Wiki Loves Africa is a public contest that encourages individuals on the African continent and around the world to contribute media files about the African environment.

“Everyone can contribute relevant photos from anywhere on the globe,” the organizers stated in a press release announcing this year’s contest. “Additionally people, groups or organizations are encouraged to host a series of events to build Wikipedia savvy communities around the competition.”

Haddow-Flood and Devouard first thought of Wiki Loves Africa in 2014 as “a fun and engaging way to rebalance the lack of visual representations and relevant content that exists about Africa on Wikipedia,” they explain.

For the 2017 edition of the contest, photo essays that document ‘rare and endangered work practices’ or ‘women at work’ were introduced as a new way to tell the story where a single photo is not enough to capture the whole scene. There will be a special prize for the best photo essay in each of the two topics.

Local Wikimedia communities in 13 African countries have been striving to provide the needed in-person support to the participants by holding Wiki Loves Africa events.

“These events take the form of introductory workshops, photographic excursions and uploading sessions,” Haddow-Flood explains. “[The events] are aimed at encouraging an ongoing pride in local heritage and cultural practice, as well as to foster a culture of contribution to the internet to shake up the single story of Africa.”

In the past three years, over 21,000 media files were uploaded to Wikimedia Commons as part of Wiki Loves Africa. The first edition in 2014 had African cuisine as its theme, with 873 participants contributing over 6,000 photographs. Cultural fashion was the theme for the 2015 competition, with over 7,500 photographs taken by 722 participants. The number of donated files grew to nearly 8,000 last year under the theme of music and dance, when 836 people joined the contest.

Participation is simple: grab your camera, take photos of people at work in Africa, or photos featuring Africa-related work, then upload your photos to Wikimedia Commons. You will both help increase the internet content about Africa and have a chance to win.

Samir Elsharbaty, Blog Writer
Wikimedia Foundation

by Samir Elsharbaty at October 13, 2017 06:21 PM

Wiki Education Foundation

The Written Word podcast features Wikipedia assignments

Wiki Education’s Classroom Program Manager Helaine Blumenthal

Wiki Education’s Classroom Program Manager Helaine Blumenthal is featured on the latest episode of The Written Word, a podcast about the craft of writing and its impact on culture. Helaine speaks with co-hosts Meredith May and Sean Tupa about her passion for learning and teaching, which she has applied broadly to the world of Wikipedia through Wiki Education. She touches on how collaborative writing has evolved online, what makes a Wikipedia assignment so unique, and some of her favorite courses over the years. Where does Wiki Education’s mission fit into the context of an increasingly open-access, collaborative online world? Listen to what Helaine has to say below!

by Cassidy Villeneuve at October 13, 2017 06:02 PM

Wikimedia UK

Libraries Week – how librarians can help improve Wikipedia

Librarian at the card files in a Minnesota High School (1974) – image by Environmental Protection Agency

Wikipedia’s greatest strength is the sheer number of people who contribute information to it. Every month the collective effort of some 70,000 writers keeps the world’s most popular encyclopedia up-to-date and makes sure that its content is verifiable. That accountability is central to Wikipedia’s reliability and usefulness. At the foot of any article should be details of where the information originally came from.

Wikipedia is a globally important website, and Wikimedia UK are playing an active part in helping people based at research organisations to engage with Wikipedia. In 2016 we took part in #1Lib1Ref for the first time, an initiative to get librarians editing. Next year 1Lib1Ref will be returning bigger than before in the last two weeks of January, this time in partnership with CILIP, the library and information association.

The idea is to encourage every librarian in the world to add one reference to Wikipedia, and make libraries and books even more accessible. Citing books in relevant Wikipedia pages in turn drives more people to do further reading about a subject they are exploring on Wikipedia.

There are around 3,850 public libraries in the UK, and it is more important to support them now than ever as public funding is falling. Our aim is to show that librarians should be using Wikipedia and that it can help to engage new audiences to do physical research in libraries as well as online. Libraries don’t have to remain places dedicated to analogue technologies, but can keep their relevance to the needs of contemporary users by hosting events like code clubs and Wikipedia workshops and providing 3D printers and other IT services. Scottish libraries are already making great advances in these areas and the Scottish Libraries and Information Council (SLIC) recently appointed their first Wikimedian in Residence, in partnership with Wikimedia UK.

Sara Thomas ([[user:lirazelf]]) is working with SLIC until February 2019 to advance open knowledge objectives in Scotland’s public libraries. Drawing on Scotland’s rich library collections, the overarching aim is to support Scotland’s public library staff and users to engage with Wikimedia projects. The project itself draws on Sara’s experience working with the museums sector during her residency with Museums Galleries Scotland, and takes inspiration from the work done in Catalonia’s public libraries.

The first editathon of the project took place on Friday 6 October, as a co-production between Dig It! 2017 and SLIC.  Part of Scotland’s year of History, Heritage and Archaeology, the Hidden Gems event took as its starting point Scotland’s best loved “hidden gems”, a group of lesser-known history, heritage and archaeology sites across the country. SLIC drew together representatives from different Scottish Library services to provide good quality secondary sources from their local history collections, which were used to improve and create articles, whilst also giving those library services an insight into how their collections could be used within Wikipedia.

Phase one of the project runs until #1Lib1Ref, with initial partners undertaking to nominate staff for training, explore the possibilities for working with Wikimedia in their service, and staging at least one editathon event before the end of January.  Phase two will review phase one, and seek to roll out a wider programme across the country.

Project page: https://en.wikipedia.org/wiki/Wikipedia:GLAM/SLIC

Wikimedia UK’s work with SLIC is the latest partnership with a group of libraries, and builds on the success of our current partnerships with Bodleian Libraries Oxford, the Wellcome Library, the National Library of Wales and the National Library of Scotland. These partnerships have helped to release a lot of content under open licences and helped people across the world find out about the libraries’ collections.

Meanwhile, in the USA, the Wikimedia Foundation has funded the OCLC Webjunction Wikipedia + Libraries course; a free, nine-week online training program for 300 US public library staff to learn to confidently engage with Wikipedia. As a result of the work that Wikimedia has done with libraries around the world, the International Federation of Library Associations (IFLA) has released the Opportunity papers to highlight how libraries are working with Wikipedia to verify information and encourage librarians worldwide to engage more with Wikipedia.

So if you’re a librarian, please let us know if you would like to be involved with 1Lib1Ref by emailing Communications Coordinator John (john.lubbock@wikimedia.org.uk) and following us on social media for more updates.

Facebook: facebook.com/wikimediauk

Twitter: twitter.com/wikimediauk

by John Lubbock at October 13, 2017 03:38 PM

Lorna M Campbell

The Benefits of Open Education and OER

This is a transcript of a talk I gave as part of the Open Med Project webinar series.

What is open education?

Open education is many things to many people and there’s no one hard and fast definition.

  • A practice?
  • A philosophy?
  • A movement?
  • A licensing issue?
  • A human right?
  • A buzz word?
  • A way to save money?

This is one description of the open education movement that I particularly like from the not for profit organization OER Commons…

“The worldwide OER movement is rooted in the human right to access high-quality education. The Open Education Movement is not just about cost savings and easy access to openly licensed content; it’s about participation and co-creation.”

Open education encompasses many different things. These are just some of the aspects of open education

  • Open textbooks
  • Open licensing
  • Open assessment practices
  • Open badges
  • Open online courses
  • MOOCs (debatably)
  • Open data
  • Open Access scholarly works
  • Open source software
  • Open standards
  • Open educational resources

Open educational resources (OER)

Open educational resources are central to open education. UNESCO define open educational resources as

“teaching, learning and research materials in any medium, digital or otherwise, that reside in the public domain or have been released under an open license that permits no-cost access, use, adaptation and redistribution by others with no or limited restrictions.”

OER World Congress

And the reason I’ve chosen this definition is that UNESCO is one of a number of organisations that actively supports the global adoption of open educational resources. Just a few weeks ago UNESCO and the Government of Slovenia hosted the second OER World Congress in Ljubljana, which brought together 550 participants and 30 government ministers, representing 111 member states.

The theme of the Congress was “OER for Inclusive and Equitable Quality Education: From Commitment to Action” and there was a strong focus on how OER can help to support United Nations Sustainable Development Goal 4.

 “Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all”

The main output of the Congress was the UNESCO Ljubljana OER Action Plan and accompanying Ministerial Statement.  Central to the OER Action plan is the acknowledgement of the role that OER can play in supporting quality education that is equitable, inclusive, open and participatory. The Action Plan outlines 41 recommended actions to mainstream OER and to help Member States to build knowledge societies and provide quality, lifelong education.

In his summing up at the end of the congress UNESCO Assistant Director for Education Qian Tang said

“to meet the education challenges, we can’t use the traditional way. In remote and developing areas, particularly for girls and women, OER are a crucial, crucial mean to reach SDGs. OER are the key.”

Creative Commons

One of the key characteristics of open educational resources is that they are either in the public domain or they are released under an open licence and generally that means a Creative Commons licence.

However, not all Creative Commons licences are equal and only resources that are licensed for adaptation and reuse can really be considered OER. Resources that are licensed with the “No Derivatives” licence cannot strictly be regarded as OER, and there is some debate about the status of “Non-Commercial” licensed resources.

At the recent OER World Congress, Creative Commons CEO Ryan Merkley emphasized that free is not the most important thing about OER, it’s the permission to modify and adapt resources that is most important, because that is what allows us to adapt educational resources to allow us to meet the specific and diverse needs of our learners.

University of Edinburgh OER Vision

At the University of Edinburgh we believe that open educational resources are strongly in line with our institutional mission to provide the highest quality learning and teaching environment for the greater wellbeing of our students, and to make a significant, sustainable and socially responsible contribution to Scotland, the UK and the world, promoting health and economic and cultural wellbeing.

Our vision for OER has three strands, building on our excellent education and research collections, traditions of the Scottish Enlightenment and the university’s civic mission.  These are:

  • For the common good – encompassing every day teaching and learning materials.
  • Edinburgh at its best – high quality resources produced by a range of projects and initiatives.
  • Edinburgh’s Treasures – content from our world class cultural heritage collections.

This vision is backed up by an OER Policy approved by our Learning and Teaching Committee, which encourages staff and students to use, create and publish OERs to enhance the quality of the student experience.  This OER Policy is itself CC licensed and is adapted from an OER Policy that has already been adopted by a number of other institutions in the UK, so please do feel free to take a look and adopt it or adapt it as you see fit.

And we also have an OER Service which provides staff and students with advice and guidance on creating and using OER, and which provides a one-stop shop where you can access open educational resources produced by staff and students across the university. Because we believe it’s crucially important to back up our policy and vision with support.

I want to focus now on some of the benefits of OER and I’m going to highlight these benefits with case studies from the University of Edinburgh.

OER ensures longevity of access to resources

So firstly, open licences help to ensure longevity of access to educational resources. It’s very common to think of openly licensed resources as primarily being of benefit to those outwith the institution; however, open licences also help to ensure that we can continue to use and reuse the resources that we ourselves have created. I’m sure you’ll all have come across projects that created content only for those resources to become inaccessible once the project ends, or great teaching and learning materials belonging to a colleague who has subsequently retired or moved on, where nobody quite knows if they can still be used or not. Unless teaching and learning resources carry a clear and unambiguous open licence, it is difficult to know whether and in what context they can be reused. This is a phenomenon that my colleague Melissa Highton has referred to as copyright debt. If you don’t get the licensing right first time round, it will cost you to fix it further down the line. And this is one of the best strategic reasons for investing in open educational resources at the institutional level: we need to ensure that we have the right to use, adapt, and reuse the educational resources we have invested in.

Continued access to educational resources can be particularly problematic when it comes to MOOCs.  MOOC content often gets locked into commercial platforms, regardless of whether or not it is openly licensed, and some platforms are now time limiting access to content.  So at the University of Edinburgh we are ensuring that all the content we have produced for our MOOCs is also freely available under open licence to download from our Open Media Bank on our Media Hopper platform.

OER can diversify the curriculum

OER can also make a significant contribution to diversifying the curriculum. For example, a number of studies have shown that lesbian, gay, bisexual and transsexual health is not well covered in medical curricula; however, knowledge of LGBT health and of the sensitivities needed to treat LGBT patients are valuable skills for qualifying doctors.

Using materials from the commons, a project at the University of Edinburgh sought to address the lack of teaching on LGBT health within the curriculum through OER.  The project remixed and repurposed resources originally created by Case Western Reserve University School of Medicine in Ohio, and then contributed these resources back to the commons as CC BY licensed OER.  New open resources including digital stories recorded from patient interviews and resources for Secondary School children of all ages were also created and released as CC BY OER.

OER improves digital skills

OER can also help to improve digital skills for both staff and students. 23 Things for Digital Knowledge is an award winning, open online course run by my colleague Stephanie Farley. 23 Things, was adapted from an open course originally developed by the University of Oxford, and it is designed to encourage digital literacy by exposing learners to a wide range of digital tools for personal and professional development. Learners spend a little time each week, building up and expanding their digital skills and are encouraged to share their experiences with others.  All course content and materials are licensed under a CC BY licence and the University actively encourages others to take and adapt the course. The course has already been used by many individuals and organisations outwith Edinburgh and it has recently been adapted for use by the Scottish Social Services Council as 23 digital capabilities to support practice and learning in social services.

OER engages students in co-creation

OER can also engage students in the co-creation of their own learning resources. One initiative that does this is the School of Geosciences Outreach and Engagement course. Over two semesters, students undertake an outreach project that communicates an element of GeoSciences outside the university community. Students have the opportunity to work with schools, museums, outdoor centres and community groups to create a wide range of resources for science engagement including  classroom teaching materials, leaflets, websites, and smartphone/tablet applications.  Students gain experience of science outreach, public engagement, teaching and learning, and knowledge transfer while working in new and challenging environments and developing a range of transferable skills that enhance their employability.

A key element of the Geosciences Outreach and Engagement Course is to develop resources with a legacy that can be reused and disseminated for use by other communities and organisations.  And the University is now taking this one step further by repurposing some of these materials to create open educational resources. For the last two years we have recruited Open Content Creation student interns, to take the materials created by the Geoscience students, make sure everything in those resources could be released under open license and then share them in places where they could be found and reused by other teachers and learners.

For example this resource on sea level variation is designed for students learning Geography at third and fourth level of the Scottish Curriculum for Excellence and it can be downloaded under a CC BY Share alike license from Open.Ed and TES.

OER promotes engagement with the outputs of open research

Open access makes research outputs freely accessible to all. It allows research to be disseminated quickly and widely, the research process to operate more efficiently, and has the potential to increase use and understanding of research by business, government, charities and the wider public.  However it is not always easy for those outwith academia to know how to access these outputs, even though they are freely and openly available.

In order to address this issue and to foster technology transfer and innovation, we’ve created a series of open educational resources in the form of video interviews, case studies and learning materials called Innovating with Open Knowledge.  These resources are aimed at creative individuals, private researchers, entrepreneurs and small to medium enterprises to provide guidance on how to find and access the open outputs of Higher Education.  The resources focus on developing digital and data literacy skills and search strategies and feature case study interviews with creative individuals and entrepreneurs engaging with the University of Edinburgh’s world class research outputs.

Innovating with Open Knowledge demonstrates how to find and use Open Access scholarly works, open research data, archival image collections, maker spaces and open source software, and features interviews about how these resources can be used to support creative writing, visual research, citizen science, community engagement, drug discovery and open architecture.  All these resources are released under open licence and the videos can be downloaded for reuse from this url.

OER contributes to the development of open knowledge

OER can contribute to the development of open knowledge, and one great way to do this is to engage with the world’s biggest open educational resource, Wikipedia. Wikipedia is a valuable learning tool that can be used to develop a wide range of digital and information literacy skills at all levels across the curriculum; however, it is not without bias. The coverage of subject matter on Wikipedia is neither uniform nor balanced and many topics and areas are underrepresented, particularly those relating to women.

At the University of Edinburgh we have employed a Wikimedian in Residence whose job it is to embed open knowledge in the curriculum, through skills training sessions and editathons, to increase the quantity and quality of open knowledge and enhance digital literacy. This project is also helping to improve the coverage and esteem of Wikipedia articles about women in science, art and technology, and redress the gender imbalance of contributors by encouraging more women to become Wikimedia editors. And I’m delighted to say that over the last year 65% of participants at our editathons were women.

OER enhances engagement with content and collections

This rather obscure 17th century map of Iceland was digitized by the University’s Centre for Research Collections and, because it was released under open licence, one of our colleagues was able to add it to the Wikipedia page about Iceland. Now, Iceland’s Wikipedia page normally gets about 15,000 hits a day; however, in June 2016 Iceland’s page got over 300,000 hits in a single day. That was the day that Iceland knocked England out of Euro 2016, so 300,000 people saw our obscure 17th century map because of a game of football. This story was subsequently picked up by Creative Commons, who included a little feature on the map in their 2016 State of the Commons report, resulting in further engagement with this historical gem.

Open Scotland

We believe that there are many benefits to using and sharing open educational resources within Higher Education and beyond, and this is one of the reasons that the University of Edinburgh supports Open Scotland, a cross-sector initiative that aims to raise awareness of open education, encourage the sharing of open educational resources, and explore the potential of open policy and practice to benefit all sectors of Scottish education.

Open Scotland has developed the Scottish Open Education Declaration which, in line with the UNESCO OER Action Plan, calls for all publicly funded educational resources to be made available under open licence.  I know colleagues in Morocco are already in the process of adopting a version of this Declaration and I would strongly urge you to follow their example.

Conclusion

I just want to finish up with a quote from one of our Open Content Interns that eloquently sums up the real value of OER. This is from Martin Tasker, an undergraduate Physics student who worked with us last summer. In a blog post titled “A Student Perspective on OER” he wrote:

“Open education has played such an integral part of my life so far, and has given me access to knowledge that would otherwise have been totally inaccessible to me. It has genuinely changed my life, and likely the lives of many others. This freedom of knowledge can allow us to tear down the barriers that hold people back from getting a world class education – be those barriers class, gender or race. Open education is the future, and I am both proud of my university for embracing it, and glad that I can contribute even in a small way. Because every resource we release could be a life changed. And that makes it all worth it.”

by admin at October 13, 2017 11:56 AM

Weekly OSM

weeklyOSM 377

03/10/2017-09/10/2017


Cycle node networks and mountain passes 1 | © Richard Fairhurst – Map data © OpenStreetMap contributors

About us

  • For more than two years now, TheFive has been developing a content management system called OSMBC (OSM Blog Collector) to meet the special requirements of the editors of weeklyOSM. Without this FOSS software, which has been continuously developed since then, it would be impossible to publish the blog almost simultaneously in six languages. Thank you, TheFive! TheFive has documented the development in a blog post worth reading, which shows that this tool can also easily be used in other environments.

Mapping

  • Harry Wood updates a list of the longest name tags in OSM. There are now 80 more of them than last year.
  • The tagging mailing list discusses whether aeroway=airstrip, which originates from the New Zealand LINZ import, is a legitimate tag or whether aeroway=runway should be used instead.
  • There is a proposal to start distinguishing between public transport bus and coach services. Coach would be a long-distance service that skips most of the stops regular buses serve and that uses coaches with a different level of comfort.
  • The feature proposal for sinkholes refinement needs more votes to pass, so voting was extended.
  • Martijn van Exel came up with a better way of tagging center turn lanes, compared to the one he used initially when turn:lanes mapping wasn’t established yet.
  • Martijn van Exel tries to formulate his own definition of highway=trunk in the US. He also notes a few places in Utah where the current tagging doesn’t meet his expectations and is asking for help.
  • User westnordost announced in the German forum that, from the next version, his application StreetComplete will tag maxspeed:type=*. There is criticism of his “top-down approach”.
  • Discussions about wikidata=* tags are ongoing (which we reported in our two previous issues):
    • Frederik Ramm asks to pause any automated, semi-automated, query-driven, “challenge” driven edits on Wikidata tags in OSM until ongoing discussions have come to an end.
    • People at the German OSM forum discuss (de) (automatic translation) how much Wikidata in OSM is useful.

Community

  • Michal Migurski tweets that OpenStreetMap contributors are listed in the credits of the new feature film Blade Runner 2049.
  • There is a call for membership to join OSMUK to improve OSM in the UK.
  • Around thirty members of the Projet EOF association started this week, in 8 different countries (Benin, Burkina Faso, Ivory Coast, Haiti, Mali, Niger, Senegal and Togo), to run 15 five-day workshops about OpenStreetMap, free geomatics and open data in capital and regional cities, plus 15 mapathons and 21 one-day workshops that will reach at least 150 people during October. Follow it on Twitter through #ActionOifProjetEOF.
  • Rebecca Firth has written (.pdf download, 4.5 MB) an article for the journal – “Science Without Borders: Making the SDGs successful” about OpenStreetMap and the Sustainable Development Goals (SDG). You can read more about the SDGs on Wikipedia.
  • On October 7, 2007, version 0.5 of the OpenStreetMap API (and thus of the data model) went into production – 0.6 is the current version. Martin Raifer reminds us of this in his user blog and that many API calls, which we find to be commonplace today, were introduced back then.

Imports

  • Christoph Hormann suggests prefixing any tag that was added by a bot with bot: in the key. He hopes this could help overcome the problem of the increasing number of (bad) mechanical edits.
  • Abishek Nagaraj published a paper with the title Information Seeding and Knowledge Production in Online Communities: Evidence from OpenStreetMap. Because data of two different ages and quality levels was imported during the TIGER import, he could analyse the effect of so-called information seeding on later contributions. He measured that counties with the older TIGER data got 38.8% more contributions than the control group and concludes that information seeding is not as beneficial as assumed. There is also a small discussion on the Talk mailing list.

OpenStreetMap Foundation

  • The minutes of the Communications Working Group IRC meeting of 28 September are online.

Events

  • In the town of Chiavari (Liguria, Italy), teams composed of one mapper and one volunteer from the Protezione Civile (Civil Protection, the governmental body which deals with exceptional events management) ran a pilot data collection project (organised by the municipality and the CIMA Foundation) focused on a flood scenario along the Rupinaro creek. While the volunteers collected information about the buildings (levels, entrances and flood preparedness), the mappers improved OpenStreetMap by adding details in the area and photomapped with Mapillary. Local newspapers (automatic translation) and TV stations reported on it.
  • The DINAcon 2017 conference will take place in Berne on 20th October. It is dedicated to the development of open data and digital sustainability. Among other things, there is a session on “Using OpenStreetMap for sitemaps and online stories” with Michael Spreng and Stefan Keller. The DINAcon Awards will also be presented at this event; Simon Poole, Michael Spreng and Stefan Keller are nominated, representing the community. Simon Poole will also offer a lightning talk.

Maps

  • Nils Vierus has published a map service to find ATMs, which has stimulated a lengthy discussion (de) (automatic translation) on the tags used for ATMs and banks: network=, operator= and brand=. Has there ever been a quarterly project on this subject? Maybe together with communities from other countries? 😉

Software

  • Anton Khorev wrote down (ru) (automatic translation) his opinion on the usability of MAPS.ME as an OpenStreetMap editor and on the resulting changesets.
  • In a comment on an old article by Roland about Overpass, user mmd refers to his branch and related test measurements, which suggest a significant reduction in daily CPU consumption.
  • [1] Richard Fairhurst enhanced turn-by-turn instructions on his website cycle.travel. They now mention node numbers of cycle node networks and the names of mountain passes along the route.

Programming

  • Sven Geggus wonders if the new PostgreSQL 10.0 combined with PostGIS 2.4 brings advantages for an osm2pgsql database. The gain is limited for the standard use case, but the increased parallel processing abilities will be beneficial if the database is used for further analysis.
  • Pascal Neis describes how to conveniently process compressed OSM.PBF files with Java.
  • Christoph Hormann has written a blog post about landcover rendering on small zoom levels.

Releases

  • The stable version of JOSM contains the following major enhancements:
    • Add links to external changeset viewers to the History and Changeset windows
    • Allow users to request feedback when uploading by adding the review_requested=yes changeset tag
    • Automatically use a proper node count, depending on the diameter, when creating circles
    • Extend the command line interface parameters
  • OSMaxx has been updated: In addition to the existing formats, the original OSM format (pbf) and the universal shape file of the future, GeoPackage (gpkg), can now be exported.
  • Mapbox announced its support for React Native Mapbox GL: it has started a rewrite of its current experimental React Native library and released an alpha.
  • … for further information please refer to the OSM Software Watchlist.

Did you know …

Other “geo” things

  • How are urban areas in Australia connected with Sydney? Topi Tjukanov visualized it using QGIS, GraphHopper and OpenStreetMap data.
  • Bollards were often re-used cannon.
  • Salvatore Fiandaca has collected the previous splash screens of QGIS.

Upcoming Events

Where What When Country
Hérault Opération libre à Jacou, Jacou 2017-10-13-2017-10-15 france
Calvados Cartoparties de la presqu’île (arbres, environnement et déchets) lors du « Turfu Festival » et de la Fête de la Science, au Dôme de Caen 2017-10-13-2017-10-15 france
Shizuoka まちゼミ オープンストリートマップで地図を作ろう 2017-10-15 japan
Bonn Bonner Stammtisch 2017-10-17 germany
Lüneburg Mappertreffen Lüneburg 2017-10-17 germany
Scotland Pub meeting, Edinburgh 2017-10-17 united kingdom
Karlsruhe Stammtisch 2017-10-18 germany
Brest Mapathon Missing Maps à l’UBO Open Factory 2017-10-19 france
Leoben Stammtisch Obersteiermark 2017-10-19 austria
Colorado State of the Map U.S. 2017, Boulder 2017-10-19-2017-10-22 united states
Hamburg Design the Smart Mobility Hackathon 2017-10-20-2017-10-21 germany
Kyoto 幕末京都マッピングパーティ#00:志士たちの今 2017-10-21 japan
Karlsruhe Hack Weekend October 2017 2017-10-21-2017-10-22 germany
Buenos Aires FOSS4G+State of the Map Argentina 2017 2017-10-23-2017-10-28 argentina
Bremen Bremer Mappertreffen 2017-10-23 germany
Taipei OpenStreetMap Taipei Meetup, MozSpace 2017-10-23 taiwan
Graz Stammtisch Graz 2017-10-23 austria
Dusseldorf Stammtisch Düsseldorf 2017-10-23 germany
Lyon Mapathon Missing Maps à La Tour Du Web 2017-10-24 france
Nottingham Nottingham Pub Meetup 2017-10-24 united kingdom
Chur Mapping Party Chur 2017-10-24 switzerland
Bern Mapping Party Bern 2017-10-25 switzerland
Leuven Leuven Monthly Meetup. Topic: Rendering for print 2017-10-25 belgium
Minsk byGIS meetup 2017-10-25 belarus
Brussels FOSS4G Belgium 2017 2017-10-26 belgium
Lübeck Lübecker Stammtisch 2017-10-26 germany
Lima State of the Map LatAm 2017 2017-11-29-2017-12-02 perú
Yaoundé State of the Map Cameroun 2017, lors des premières Journées nationales de la Géomatique 2017-12-01-2017-12-03 cameroun
Bonn FOSSGIS 2018 2018-03-21-2018-03-24 germany
Milan State of the Map 2018, (international conference) 2018-07-28-2018-07-30 italy

Note: If you would like to see your event here, please put it into the calendar. Only data which is in the calendar will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Anmaca, Nakaner, Peda, PierZen, Polyglot, SK53, SeleneYang, Softgrow, Spanholz, YoViajo, derFred, jcoupey, jinalfoflia.

by weeklyteam at October 13, 2017 06:03 AM

Shyamal

The many shades of citizen science

Everyone is a citizen but not all have the same kind of grounding in the methods of science. Someone with a training in science should find it especially easy to separate pomp from substance. The phrase "citizen science" is a fairly recent one which has been pompously marketed without enough clarity.

In India, the label of "scientist" is a status symbol; indeed many actually proceed on the academic path just to earn status. In many of the key professions (for example, medicine and law) authority is gained mainly by guarded membership, initiation rituals, symbolism and hierarchies. At its roots, science differs in being egalitarian, but the profession is at odds with this and its institutions are replete with tribal ritual and power hierarchies. Indian science tends to carry more than the ordinary share of ritual.

Long before the creation of the profession of science, "Victorian scientists" (who of course never called themselves that) pursued the quest for knowledge (i.e. science) and were for the most part quite good as citizens. In the field of taxonomy, specimens came to be the reliable carriers of information and they became a key aspect of most of zoology and botany. After all, what could you write or talk about if you did not have a name for the subject under study? Specimens became currency. Victorian scientists collaborated in various ways that involved sharing information, sharing/exchanging specimens, debating ideas, and tapping a network of friends and relatives for gathering more "facts". Learned societies and their journals helped the participants meet and share knowledge across time and geographic boundaries. Specimens, the key carriers of unquestionable information, were acquired for a price, and a niche economy was created with wealthy collectors, not-so-wealthy field collectors and various agencies bridging them. That economy also included the publishers of monographs, field guides and catalogues, who grew in power along with organizations such as museums and, later, universities. Along with political changes, there was also a move of power from private wealthy citizens to state-supported organizations. Power brings disparity, and the Victorian brand of science had its share of issues - but has there been progress in the way of doing science?

Looking at the natural world can be completely absorbing. The sights, sounds, textures, smells and maybe tastes can keep one completely occupied. The need to communicate our observations and reactions almost immediately makes one look for an existing structure and framework, and that is where organized knowledge, a.k.a. science, comes in. While the pursuit of science might be seen by individuals as value-neutral and objective, the settings of organized and professional science are decidedly not. There are political and social aspects to science, and at least in India the tendency is to view these aspects as undesirable and not to be talked about, lest one "appear" unprofessional.

Silent diplomacy probably adds to the problem. Not engaging in conversation or debate with "outsiders" (a.k.a. mere citizens) probably fuels the growing claims of the "arrogance" of scientists (or even of science itself). Once the egalitarian ideal of science is tossed out of the window, you can be sure that "citizen science" moves from useful and harmless territory to a region of conflict and potential danger. Many years ago I saw a bit of this tone in a publication boasting of the virtues of Cornell's ebird and commented on it. Ebird was not particularly novel (it was not the first either by idea or by implementation; lots of us had tinkered with such ideas, such as this one - BirdSpot - which aimed to be federated and peer-to-peer, ideally something like torrent), but Cornell obviously is well-funded enough to run PR campaigns. I think it is extremely easy to set up a basic software system that captures a specific set of data, but fitting it to grander visions and wider geographical scales - to more than the needs of a few American scientists - takes much more than mere software construction. I commented in 2007 that the wording used in ebird publicity sounded more like "scientists using citizens rather than looking upon citizens as scientists", the latter being in my view the nobler aim to achieve. Over time, ebird has gained global coverage, but it has remained "closed" code-wise and vision-wise. There are no open and public discussions on software construction, and the average contributor is not regarded as a stakeholder. It has, on the other hand, upheld traditional political hierarchies and processes that ensure conflict and a lack of progress. Indeed it reflects political and cultural systems based on hierarchies. (There is a saying in software engineering that the architecture of a piece of software mirrors the organization that built it.) As someone who has watched and appreciated the growth of systems like Wikipedia, it is hard not to see the philosophical differences - almost as stark as right-wing versus left-wing politics.

Do projects like ebird see the politics in "citizen science"? Arnstein's ladder is a nice guide to judge the philosophy behind a project.
I write this while noting that criticisms of ebird are slowly becoming more commonplace (after the initial glowing accounts). There are comments on how it is reviewed by self-appointed police (the problem is not just in the appointment - indeed, why could the software designers not have allowed anyone to question any record, and put in methods to suggest alternative identifications and to gather measures of confidence based on community queries and opinions?). There is supposedly a class of users who manage something called "filters" (the problem here is not just the idea of creating user classes but also the idea of using manually-defined "filters"; to an outsider like me who has some insight into software engineering, poor software construction is symptomatic of poor vision, a weak guiding philosophy and probably issues in project governance). There are issues with taxonomic changes (I heard someone complain about a user being asked to verify an identification because of a taxonomic split - and that too a split that allows one to unambiguously relabel older records based on geography; these could have been automatically resolved, but developers tend to avoid fixing problems and obviously prefer to get users to manage it by changing their way of using the system - trust me, I have seen how professional software development works). And there are now dangers to the birds themselves. There are also issues and conflicts associated with licensing, intellectual property and so on. Now, it is easy to fix all these problems piecemeal, but that does not make the system better; fixing the underlying processes and philosophies is the big thing to aim for. So how do you go from a system designed for gathering data to one where you want the stakeholders to be enlightened? Well, a start could be made by first discussing in the open.

I guess many of us who have seen and discussed ebird privately could just say "I told you so", but sadly many of the problems were easily foreseeable. One merely needs to read the history of ornithology to see how conflicts worked out between the center and the periphery (conflicts between museum workers and collectors); the troubles of peer review and openness; the conflicts between the rich and the poor (not just measured by wealth), or perhaps the haves and the have-nots. And then of course there are scientific issues - the conflicts between species concepts, not to mention conservation issues and local versus global thinking. Conflicting aims may not be entirely resolved, but you cannot have an isolated software development team, a bunch of "scientists", and citizens at large expected merely to key in data and be gone. There is perhaps a lot to learn from other open-source projects, and I think the lessons in the culture and politics of Wikipedia are especially interesting for citizen science projects like ebird. I am yet to hear of an organization where the head is forced to resign by the long tail that has traditionally been powerless in decision making; allowing for that is where a brighter future lies. Even better would be where the head and tail cannot be told apart.

Postscript: 

There is an interesting study of field guides and their users in Nature which essentially shows that everyone is quite equal in making misidentifications - just another reason why ebird developers ought to remove this whole system that creates an uber class involved in rating observations/observers.

Additionally one needs to examine how much of ebird data is actually from locals (perhaps definable as living within walking distance of the area being observed). India has a legacy of tourism-based research (not to mention, governance) - in fact there are entire institutions where students travel far afield to study when even their own campuses remain scientific blanks.

23 December 2016 - For a refreshingly honest and deep reflection on analyzing a citizen science project see -  Caroline Gottschalk Druschke & Carrie E. Seltzer (2012) Failures of Engagement: Lessons Learned from a Citizen Science Pilot Study, Applied Environmental Education & Communication, 11:178-188.
20 January 2017 - An excellent and very balanced review (unlike my opinions) can be found here - Kimura, Aya H.; Abby Kinchy (2016) Citizen Science: Probing the Virtues and Contexts of Participatory Research. Engaging Science, Technology, and Society 2:331-361.

by Shyamal L. (noreply@blogger.com) at October 13, 2017 02:51 AM

October 12, 2017

Gerard Meijssen

#Wikisource - the proof of the pudding

A user story for Wikisource could be: As Wikisourcerers we transcribe and format books so that our public may read these books electronically.

The proof of the pudding is therefore in the people who actually read the finished books. To future-proof the effort of the Wikisourcerers, it is vital to know all the books that are ready for reading. It is vital to know this for books in any and all supported languages.

There are two issues:
  • The status of the books is not sufficiently maintained in all the Wikisources
  • There is no tool that advertises finished books
To come to a solution, existing information could be maintained in Wikidata for all Wikisources, in a similar way as is already done with badges. With the information in Wikidata, queries can be formulated that show the books in whatever language, by whatever author.
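
As a rough sketch of what such a query could look like (my own illustration, not part of Gerard's post), the snippet below asks the Wikidata Query Service for English Wikisource pages whose sitelink carries a badge. Which badge items would actually mark a "finished" book is exactly the kind of convention that still needs to be agreed on, so no specific badge is assumed here.

<?php
// Sketch only: list English Wikisource works whose sitelink carries any badge.
$sparql = <<<'SPARQL'
SELECT ?work ?workLabel ?badge WHERE {
  ?page schema:about ?work ;
        schema:isPartOf <https://en.wikisource.org/> ;
        wikibase:badge ?badge .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 50
SPARQL;

// Ask the query service for JSON results, with a descriptive User-Agent.
$context = stream_context_create( [ 'http' => [
    'header' => "User-Agent: wikisource-badges-sketch/0.1\r\nAccept: application/sparql-results+json",
] ] );
$url = 'https://query.wikidata.org/sparql?query=' . rawurlencode( $sparql );
$data = json_decode( file_get_contents( $url, false, $context ), true );

foreach ( $data['results']['bindings'] as $row ) {
    echo $row['workLabel']['value'], "\n";
}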

Currently there are Wikisources that do not register this information at all. This does not prevent us from taking the necessary steps towards a queryable solution. After all, adding missing badges at a later date only adds to the size of the pudding, not to the proof of the pudding.
Thanks,
     GerardM

by Gerard Meijssen (noreply@blogger.com) at October 12, 2017 11:55 AM

October 11, 2017

Wiki Education Foundation

Wikipedia, Webinars, and Wiki Education

Each year I meet hundreds of instructors and librarians excited about the idea of bringing our Wikipedia assignment into their classes. Sometimes it’s an easy conversation about upcoming courses, updating syllabi, and designing the right project to achieve learning objectives. Other times, the conversation is a bit harder – what if someone is excited about the project but isn’t teaching in the near future? Or doesn’t have a course at the moment suited for this assignment? That’s where hosting Wiki Education for an online workshop or webinar comes in.

Samantha Weald joins remotely to present to instructors at Louisiana State University about the benefits of teaching with Wikipedia.

Instructors and staff interested in the project can host these events so that others on campus can get involved. And just last week, I ran two similar events for universities interested in learning more about Wiki Education, our work, and the details of what it means to “teach with Wikipedia.” The first was with a graduate-level course at George Mason University, where students were interested in “Higher Education in the Digital Age” and how Wiki Education’s work promotes open access, OER, and digital literacy. The second was with a group of instructors at Louisiana State University interested in implementing these assignments within their research and writing courses. At each event, I presented information about Wiki Education and showcased our Dashboard tool (which provides students with online trainings, assignment templates, and more; and helps instructors track the work their students do on Wikipedia). I answered questions about Wikipedia, Wiki Education, and anything else that was on attendees’ minds.

In preparation for the winter and spring 2018 terms, I’d love to host more events like these. If you’re interested in running an online workshop or webinar at your institution, please let us know! Here’s what we’ll need:

  • Someone on campus to organize and promote the event via listserves and emails to faculty
  • A room on campus or online space for everyone to meet
  • A date and time confirmed
  • An interested group of instructors, no minimum attendance necessary (although more than 1 is probably a good idea!)

Thanks again to George Mason and Louisiana State for hosting me last week. I look forward to moderating more of these events in the coming months! To host a webinar or learn more, email us at contact@wikiedu.org.

by Samantha Weald at October 11, 2017 06:17 PM

Magnus Manske

The Big Ones

Update: After fixing an import error and cross-matching BNF-supplied VIAF data, 18% of BNF people are matched in Wikidata. This has been corrected in the text.

My mix’n’match tool holds a lot of entries from third-party catalogs – 21,795,323 at the time of writing. That’s a lot, but it doesn’t cover “the big ones” – VIAF, BNF, etc., which hold many millions of entries each. I could “just” (not so easy) import those, but:

  • Mix’n’match is designed for small and medium-sized entry lists, a few hundred thousand at best. It does not scale well to larger catalog sizes
  • Mix’n’match is designed to work with many different catalogs, so the database structure represents the least common denominator – ID, title, short description. Catalog-specific metadata gets lost, or is not easily accessible after import
  • The sheer number of entries might require different interface solutions, as well as automated matching tools

To at least get a grasp of how many entries we are dealing with in these catalogs, and inspired by the Project soweego proposal, I have used a BNF data dump to extract 1,637,195 entries (fewer than I expected) into a new database, one that will hopefully hold other large catalogs in the future. There is much to do; currently, only 295,763 entries (~18%) exist on Wikidata, according to the SPARQL query service.

As one can glimpse from the screenshot, I have also extracted some metadata into a “proper” database table. All this is preliminary; I might have missed entries or good metadata, or gotten things wrong. For me, the important thing is that (a) there is some query-able data on Toolforge, and (b) (re-)import and matching of the data is fully automated, so it can be re-run if something turns out to be problematic.

I shall see where I go from here. Obvious candidates include auto-matching (via names and dates) to Wikidata, and adding BNF references to relevant statements. If you have a Toolforge user account, you can access the new database (read-only) as s51434__mixnmatch_large_catalogs_p. Feel free to run some queries or build some tools around it!
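
For anyone who wants to try that, here is a minimal sketch (my own, not Magnus's code) of running such a query from PHP on Toolforge. The ToolsDB host name and the table and column names below are assumptions/placeholders, so check the actual schema with SHOW TABLES and DESCRIBE before relying on them.

<?php
// Sketch only: read-only query from a Toolforge account.
// Credentials come from the standard replica.my.cnf file in the tool's home.
$config = parse_ini_file( getenv( 'HOME' ) . '/replica.my.cnf' );

// Host name is an assumption; table and column names below are hypothetical.
$pdo = new PDO(
    'mysql:host=tools.db.svc.eqiad.wmflabs;dbname=s51434__mixnmatch_large_catalogs_p;charset=utf8',
    $config['user'],
    $config['password']
);

// Hypothetical query: how many entries does each catalog hold?
$stmt = $pdo->query( 'SELECT catalog, COUNT(*) AS entries FROM entry GROUP BY catalog' );
foreach ( $stmt as $row ) {
    echo "{$row['catalog']}: {$row['entries']}\n";
}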

by Magnus at October 11, 2017 05:02 PM

Brion Vibber

Old Xeon E5520 versus PS3 emulator RPCS3

I’ve been fixing up my old Dell Precision T5500 workstation, which had been repurposed to run (slightly older) games, to run both work stuff & more current games if possible. Although the per-core performance of a 2009-era Xeon E5520 processor is not great, with 8 cores / 16 threads total (dual CPUs!) it still packs a punch on multithreaded workloads compared to a laptop.

When I stumbled on the RPCS3 Playstation 3 emulator, I just had to try it too and see what happened… especially since I’ve been jonesing for a Katamari fix since getting rid of my old PS3 a couple years ago!

The result is surprisingly decent graphics, but badly garbled audio:

My current theory, based on reading a bunch in their support forums, is that the per-thread performance is too low to run the thread doing audio processing, so it's garbling/stuttering at short intervals. Between the low clock speed (2.26 GHz / 2.5 GHz boost), the older processor tech, AND the emulation overhead, one Xeon core is probably not going to be as fast as one PS3 Cell SPE unit, so if that thread was running just close enough to full to work on the PS3, it'll be too slow on my PC…

Windows’ Task Manager shows a spread of work over 8-9 logical processors, but not fully utilized. Threads that are woken/slept frequently, like audio or video processing, tend to get broken up on this kind of graph (switching processors on each wake), so you can't easily tell from the graph whether one *OS* thread is maxed out.

This all leads me to believe the emulator’s inherently CPU-bound here, and really would do better with a more modern 4-core or 6-core CPU in the 3ish-to-4ish GHz range. I’ll try some of the other games I have discs still lying around for just for fun, but I expect similar results.

This is probably something to shelve until I’ve got a more modern PC, which probably means a big investment (either new CPU+RAM+motherboard or just a whole new PC) so no rush. 🙂

by brion at October 11, 2017 02:50 PM

Wikimedia Tech Blog

The what and how of code health

Photo by CEphoto, Uwe Aranas, CC BY-SA 3.0.

I first encountered the term “code health” in Max Kanat-Alexander’s post on the Google Testing Blog.  It is simply defined as: “…how software was written that could influence the readability, maintainability, stability, or simplicity of code“.

The basic premise behind code health is that a developer’s quality of work, productivity, and overall happiness can be drastically improved if the code they work with is healthy.

That’s a pretty broad definition, to say the least, but what’s equally important to the what of code health is the how: how it is then managed by a team. In Max’s post, he outlines how Google formed a small team called the Code Health Group. Each member of the team was expected to contribute an impactful percentage of their normal work effort towards the Code Health Group’s priorities.

At the Wikimedia Foundation, we have formed a similar group. The Wikimedia Code Health Group (CHG) was launched in August 2017 with a vision of improving code health through deliberate action and support.

The CHG is made up of a steering committee, which plans to focus our improvement efforts towards common goals, and sub-project teams. The group will not only come up with prospective improvement initiatives, but also act as a conduit for others to propose improvements. The steering committee will then figure out staffing and needed resources, based on the interest and availability of staff members.

In this post, I’ll talk a little bit about my definition of code health, and what we can do to manage it. Before that, however, I want to share why I became interested in working on this subject at the Foundation.

My deep dive into MediaWiki software

I’m a relatively new member of the Wikimedia Foundation—I joined the Release Engineering team in January of 2017 with the goal of helping the Foundation and broader technical community improve its software development practices.

One of my first tasks was to understand our development practices and ecosystem, and I started by talking with developers who deeply understood MediaWiki — both in terms of what we did well and where there was room for improvement.  My goal was to better understand the historical context for how MediaWiki was developed, and to learn more about areas that we could improve.  The result of these discussions is what I refer to as the “Quality Big Picture.” (I know, catchy name.)

What I learned during this discovery process was that there was room for improvement as well as a community of developers eager to improve the software.

Several weeks later, I had the opportunity to attend the Vienna Hackathon, where I hosted a session called “Building Better Software.”  There, I shared what I had learned, and the room discussed areas of concern, including the quality of MediaWiki extensions and 3rd party deployments.

Other topics came up: I heard from long-time developers that some previous efforts to improve MediaWiki software lacked sustained support and guidance, and that efforts were ad-hoc and often completed by volunteers in their spare time.

The challenge was therefore two-fold: how to define and prioritize what to improve, and how to actually devote resources to make those improvements happen. It was these two questions that led me to Max’s blog post, and the subject of code health.

Let’s define “code health”

With some of the background laid out, I’d like to spend a little time digging into what “code health” means.

At a basic level, code health is about how easily software can be modified to correct faults, improve performance, or add new capabilities. This is broadly referred to as “maintainability”. What makes software maintainable? What are the attributes of maintainable software? This leads to another question: what enables developers to confidently and efficiently change code? Enabling developers, after all, is what we are really targeting with code health.

Both a developer’s confidence and efficiency can vary depending on their experience with the codebase. But that also depends on the code health of the codebase: the lower the code’s health, the more experience it takes for a developer to code with both confidence and efficiency. Trying to parse a codebase with low code health is difficult enough for veteran developers, but it’s almost impossible for new or less experienced developers.

Interestingly, the more experienced a developer is with a code base, the more they want to see code health increase because code with lower health is more difficult and time-consuming to parse.  In other words, high code health is good for both experienced and inexperienced developers alike.

Attributes

So what are the attributes of code health?  For me, it boils down to four factors: simplicity, readability, testability, and buildability.

Simplicity

Let’s start with simplicity. Despite being subjective by nature, simplicity is the one attribute whose absence may be most responsible for low code health. Fundamentally, simplicity is all about making code easier to understand. This goes against the common sentiment that because software is often written to solve complex problems, the code must be complex as well. However, that’s not always true: hard problems can be solved with code that’s easy to parse and understand.

Code Simplicity encompasses a number of factors. The size and signatures of functions, the use of certain constructs such as switch/case,  and broader design patterns can all impact how easy a codebase is to understand and modify.

Despite its subjective nature, there are ways to measure code complexity, such as the cyclomatic and Halstead complexity measures. The former is already available for some of MediaWiki’s repos. But these tools come with a caveat: complexity measures can be misleading.
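
To make that concrete, here is a toy illustration of my own (not MediaWiki code): both functions below do the same thing, but the second replaces a chain of branches with a data table, which is easier to follow and scores lower on a cyclomatic complexity measure.

<?php
// Toy example only: every branch below is a path a reader has to trace.
function statusColorBranching( string $status ): string {
    if ( $status === 'open' ) {
        return 'green';
    } elseif ( $status === 'stalled' ) {
        return 'yellow';
    } elseif ( $status === 'closed' ) {
        return 'grey';
    }
    return 'red';
}

// Lower complexity: the data is a table, the logic is a single lookup.
function statusColorSimple( string $status ): string {
    $colors = [
        'open' => 'green',
        'stalled' => 'yellow',
        'closed' => 'grey',
    ];
    return $colors[$status] ?? 'red';
}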

Readability

Another factor that affects code health is readability. Readability becomes more important as a development community grows in size.  You can think of readability as the grammatical rules, sentence structures, and vocabulary that are present in any written human language.

Although a programming language’s syntax enforces a certain core set of rules, those rules are generally in place to provide a basic structure for a human to communicate with the computer, not with another human. The paragraphs below are an example of how something can become significantly more complex without a common, well-understood set of rules. Given some time, you can still make sense of the paragraphs, but it is more difficult and error-prone.

Much of what we see in terms of poor readability is rooted in the not-so-distant history of programming. With limited computing resources such as processing, memory, and communication, programmers were encouraged to optimize code for the computer — not another human reader. But optimization is not nearly as important as it once was (there are always exceptions to that rule, however, so don’t set that in stone). Today, developers can optimize their code to be human-friendly with very little negative impact.

Examples of readability efforts include creating coding standards and writing descriptive function and variable names. It’s quite easy to get entangled in endless debate about the merit of one approach over another — for example, whether to use tabs or spaces. However, it’s more important to have a standard in place — whether it’s tabs or spaces — than to quibble about whether having a standard is useful. (It is.)

Although not all aspects of readability are easily measured, there is a fair amount of automated tooling that can assist in enforcing these standards.  Where there are gaps, developers can encourage readability through regularly-scheduled code reviews.
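
As a small made-up example of the point about descriptive names (my own, not from the post): both functions below enforce the same rule, but only the second tells the reader what that rule actually is.

<?php
// Hard to read: the names say nothing about intent.
function chk( array $u, int $d ): bool {
    return $u['e'] && $d > 3;
}

// Easier to read: the rule is visible in the names themselves.
function isEligibleForReview( array $user, int $daysSinceRegistration ): bool {
    return $user['emailConfirmed'] && $daysSinceRegistration > 3;
}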

Testability

Testability is often missing from many discussions regarding code health.  I suspect that’s because the concept is often implied within other attributes.  However, in my experience, if software is not developed with testability in mind, it generally results in software that is more difficult to test.  This is not unlike other software attributes such as performance or scalability.  Without some forethought, you’ll be rolling the dice in terms of your software’s testability.

I’ve found that it’s not uncommon for a developer to say that something is very difficult to test.  Though this may sound like an excuse or laziness, it’s often pretty accurate.  The question becomes: Could the software have been designed and/or developed differently to make it easier to test? Asking this question is the first step to make software testable.

Why should a developer change anything to make it easier to test?  Remember the developer confidence I mentioned earlier?  A big part of developer confidence when modifying code is based on whether or not they broke the product.  Being able to easily test the code, preferably in an automated way, goes a long way to building confidence.
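
As a brief sketch of my own (not MediaWiki code) of what “designed differently to make it easier to test” can mean in practice: the second function receives the current time as a parameter instead of reading the system clock itself, so an automated test can hand it any date it likes.

<?php
// Hard to test: the dependency on the system clock is hidden inside.
function isWeekendNow(): bool {
    return (int)date( 'N' ) >= 6;   // 6 = Saturday, 7 = Sunday
}

// Easier to test: the time is passed in, so a test controls it completely.
function isWeekend( DateTimeImmutable $when ): bool {
    return (int)$when->format( 'N' ) >= 6;
}

// In production:  isWeekend( new DateTimeImmutable() );
// In a test:      isWeekend( new DateTimeImmutable( '2017-10-14' ) );   // a Saturday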

Buildability

The three attributes I’ve already mentioned are fairly well understood and are frequently mentioned when discussing healthy code. But there is a fourth attribute that I’d like to discuss. Code health is incomplete without including a discussion around Buildability, which I define as the infrastructure and ecosystem that the developer depends on to build and receive timely feedback on code changes they are submitting.

To be fair, you’d be hard pressed to find any material on code health that doesn’t mention continuous integration or delivery, but I think it’s important to elevate its importance in these discussions.  After all, not being able to reliably build something and receive timely feedback hampers developer productivity, code quality, and overall community happiness.

The How

Now that we’ve talked about what code health is, we can discuss our next question: How do we address it?   This is where we transition from talking about code to talking about people.

MediaWiki, like many successful software products/services, started with humble beginnings.  Both its code base and developer base have grown as the Wikimedia projects have matured, and the personality of the code has evolved and changed as the code base expanded.  All too often, however, this code is not refactored and “cruft”—or unwanted code—develops.

None of this of course is news to those of us at the Foundation and in the volunteer developer community that work on MediaWiki.  There are many stories of individual or groups of developers going above and beyond to fix things and improve code health.  However, these heroics are both difficult to sustain without formal support and resources, and often are limited in scope.

While speaking to developers during my first few months at the Foundation, I was inspired by what I heard, and I want to ensure that these kinds of efforts to improve our codebase become more sustainable and even more impactful. Luckily, enabling developers is core to the mission of the Release Engineering team.

Simply forming the CHG isn’t sufficient. We also need to build momentum through ongoing action and feedback loops that ensure that we’re successful over the long-term.  As a result, we’ve decided to take on the following engagement approach:

  1. The Code Health Group is now meeting on a regular cadence.

The goal of these monthly meetings is to discuss ongoing code health challenges, prioritize them, and spin up sub-project teams to work towards addressing them.

Agenda and notes from those meetings will be made available through the Code Health Group MediaWiki page.

Although the CHG has been formed and is meeting regularly, it’s far from complete.  There will be plenty of opportunities for you to get involved over the coming months.

  2. We’ll share what we learn.

The CHG will look to provide regular knowledge sharing through a series of upcoming blog posts, tech talks, and conferences.

We anticipate that the knowledge shared will come from many different sources, both from within the MediaWiki community and from the broader industry. If you have a code health topic that you’d like to share, please let us know.

  3. We plan to hold office hours.

For code health to really improve, we need to engage as a community, like we do for so many other things, and that involves regular communication.

Although we fully expect and support ad hoc discussions, we thought it might help those discussions along if we had some “office hours” where folks can gather on a regular basis to ask questions, share experiences, and just chat about code health.

These office hours will be held in IRC as well as a Google Hangout.  Choose your preferred tech and swing on by.  Check out the CHG Wiki page for more info.

What’s next?

Though the CHG is in a nascent stage, we’re happy with the progress we’ve made. We’re also excited about where we plan to go next.

One of the first areas we plan to focus on is identifying technical debt. Technical debt—which I’ll discuss in an upcoming series of posts—is closely aligned with code health. The newly launched Technical Debt Program will live under the Code Health Group umbrella. We believe that a significant portion of the technical debt in MediaWiki is due to code health challenges. The technical debt reduction activities will help build sound code health practices that we will then be able to use to avoid incurring additional technical debt and to reduce what currently exists.

Over the coming weeks, we will be releasing a series of blog posts on technical debt. This will be followed by a broader series of blog posts related to code health. As the code health hub, we’ll also share what we learn from the broader world. In the meantime, please don’t hesitate to reach out to us.

Jean-Rene Branaa, Senior QA Analyst, Release Engineering
Wikimedia Foundation

Thank you to Melody Kramer, Communications, for editing this post.

by Jean-Rene Branaa at October 11, 2017 02:38 PM

October 10, 2017

Wikimedia Foundation

Winners announced in Europeana’s First World War portfolio contest

Partial reproduction from the painting “Retragerea din Dobrogea” by Ion Stoica Dumitrescu (1916). Painting via the Romanian National Museum of History, public domain.

Wikimedians from Austria, Romania, Norway, and the United Kingdom were honored this week for utilizing and amplifying the impact of digitized First World War-era content held in galleries, libraries, archives, and museums across Europe.

The winning countries were four of thirteen who entered a contest organized by Europeana, the European Union’s online platform dedicated to cultural heritage, as part of its First World War/1914–1918 centenary project. The organization has focused on Wikimedia platforms for a number of years, including last year’s art history-focused contest on Wikipedia and Wikidata.

This year’s challenge asked Wikimedia groups and affiliates to put together portfolios of their work related to memories and experiences during the war years, thereby demonstrating how people around the world can understand the conflict through open-access heritage, and emphasizing how the affiliates worked with partner cultural institutions to accomplish the task.

“This challenge,” Europeana GLAMwiki Community Manager Liam Wyatt told us, “is part of Europeana’s long-term strategy to engage the Wikimedia community and integrate Wikimedia content into all our major projects.”

The contest’s winners included:

Special mentions were extended to:

  • Wikimedia Estonia, for its “innovative use of the Wikipedia platform to curate a multilingual virtual art exhibition”
  • Wikimedia Italy, for the “quality and diversity of the content produced for their portfolio”
  • Wikimedia Switzerland, for its “detailed and beautifully produced portfolio”

Comments from Europeana and the contest participants follow.

———

Romania

The jury noted that the Romanian portfolio was created by an entirely volunteer team that went to great lengths to participate in this project to the fullest extent using Europeana 1914–18 platform resources. The Romanian Wikimedia community ran a Wikisource/Transcribathon project, uploaded files to Wikimedia Commons using crowdsourced Europeana 1914–18 material, contributed their own material to the Europeana 1914–18 platform, entered the Europeana Transcribathon as a ‘team’, and improved the categorisation and Wikipedia usage of Europeana 14–18 material that had already been shared.

Măcreanu Iulian, Coordinator of the Wikimedians of Romania portfolio, said:

We have been working for more than three years at developing a project dedicated to participation of Romania in the First World War. [With this competition] we discovered that Europeana has a huge amount of valuable resources that can be easily used for developing our project and bring it to a new level. We learnt that it is easy and very useful to develop similar projects both on Wikipedia and Europeana, as they are very complementary. Last but not the least, we were “caught” and became enthusiastic by the new type of works discovered on Europeana: telling stories, bringing old photos in light, transcribe old war journals…

As part of the prize, a volunteer representative of the Romanian Wikimedia community will receive a scholarship to the Europeana 2018 conference, held in mid-May in the Netherlands, to present their portfolio to the Europeana Network Association.

Wikimedia Austria-sponsored edit-a-thon. Photo by Hubertl, CC BY-SA 4.0.

Austria

Coming first in the “most innovative project” category is Wikimedia Austria’s portfolio, which used Wikiversity to form a partnership with the education sector, and used various media to investigate propaganda. As Beppo Stuhl from the Wikimedia Austria Board of Trustees explained:

We felt we could provide a new point of view to the anniversary in writing articles in Wikiversity and Wikipedia about the media and other items that have been collected during Europeana’s 1914-1918 campaign. We saw some flyers in 2013 about Europeana’s aim to collect items like letters, postcards or military decorations of the time between 1914-1918. The first results of the collection inspired us to look for partners to work on that basis to get new insights. We found partners in different institutions like museums, libraries and archives as well as the University of Vienna. We will continue the series of courses at the University up to 2018.

Recruiting for Kitchener’s army. Photo via the Bodleian Library collection.

United Kingdom

The prize for the widest diversity of content was awarded to Wikimedia UK’s portfolio, which contained projects relating to many different cultural institutions; pertained to multiple countries (England, Wales, Scotland, India, and Canada); and worked across multiple media forms, including Wikipedia, Commons, and notably also Wikidata. Richard Nevell, Project Coordinator for Wikimedia UK, stated that:

The initiative led to us discussing data donations with the Imperial War Museum, territory which is still quite new for us and could have a big impact on how people access information and give our readers a lot more high-quality content.

Wikimedia UK have been running events around the commemoration of WWI for several years. The competition has been an opportunity to bring those various projects together and showcase the varied work we’ve been doing with our partners. We’re proud of how we’ve been able to work with the likes of Bodleian Libraries and the University of Edinburgh to shine a light on aspects of the war which don’t get as much attention – like the lives of nurses or the creation of ‘Vigilance Committees’ while soldiers were away at the front.

Girls celebrating Norwegian Constitution Day, 17 May 1914. Photo via the National Library of Norway, public domain.

Norway

In awarding the prize for highest quality produced, the jury also noted that Wikimedia Norway’s portfolio showed direct engagement in Europeana content (especially through the Transcribathon platform), extensive outreach to partners, and hosting of events as a result of this challenge. When learning of this award Astrid Carlsen, Executive Director of Wikimedia Norway, said that the experience of this project has led them towards further cultural partnerships:

Wikimedia Norway has not done any projects on transcribing before we uploaded three letter collections to Europeana and Wikisource. Because we now have some experience on transcribing projects, we are looking into how we can work with GLAMs and our communities on this topic; we are now applying for funding from The Arts Council to do a project to transcribe handwritten descriptions of images from Armenia, from around 1915.

———

Ed Erhart, Senior Editorial Associate
Wikimedia Foundation

This blog post is based on (and the individual sections are drawn directly from) Europeana’s winners announcement, freely licensed under CC BY-SA 4.0.

by Ed Erhart at October 10, 2017 07:09 PM

Wiki Education Foundation

The Web As She Is Spoke

Over the last two years, it’s been my responsibility to build and maintain the Wiki Education Dashboard, a complex website that has become our primary tool for keeping track of hundreds of courses and thousands of students each term. It’s been an amazing journey so far — one I started with almost no experience in web development — and I’ve learned little bits of lots of facets of writing software and running a website. One area where our Dashboard is better than most sites is accessibility — but unfortunately, that’s not saying much.

Since the beginning of the Dashboard project, one of my most important power users has been Wiki Education’s Helaine Blumenthal. As our Classroom Program Manager, Helaine needs to be able to use the Dashboard efficiently to keep up with now more than 300 courses per term. Helaine is blind and uses a screen reader to navigate the web, which means that if a site isn’t accessible by screen reader, it’s broken.

With an interface largely written in the popular React framework and built from a wide variety of open-source JavaScript tools, the technology story of the Dashboard is pretty typical of recent web apps. Modern JavaScript frameworks have made it dramatically easier to build slick visual user experiences, narrowing the gap between what a designer can imagine and what a browser can deliver. But maintaining an accessible website has sometimes meant swimming upstream against the currents of web technology. JavaScript can rewrite a page in an instant, bringing us interactive “single-page applications” where you don’t have to reload the entire page to get from one feature to another. A site can show you just the information you need, right when you need it. But this visual information design often overrides the information hierarchy of HTML, which is the main way a screen reader makes sense of a page as more than just a huge wall of text. JavaScript frameworks also mean that common widgets and tools are easy to add to your site. JavaScript can make anything on the page look and act like a link or a button, even without the standard HTML tags. And with enough extra JavaScript code, you can make the webpage do just about anything, without worrying much about the underlying HTML.

In the last few years, the JavaScript ecosystem has gained a reputation as the ‘wild west’ of web development, with many exciting developments and opportunities against a backdrop of uncertainty and chaos. I’ve chipped away a tiny bit at this chaos when it comes to accessibility, making as much of the Dashboard usable for Helaine and any other screen reader users as I’ve been able to while reporting accessibility problems to “upstream” open source projects. But I didn’t really understand the scope of the problem until I started looking into options for help desk software, and taking a peek at the underlying code.

To support the continued growth of our Classroom Program — hundreds of courses, supported by one Classroom Program Manager and two Wikipedia Experts — we decided to invest in a system for keeping track of who needs help and who is helping them. Help desk / customer support software is a relatively mature “Software as a Service” niche, and there are many well-funded companies competing for market share. After surveying the landscape, I identified a handful of the most promising ones worth putting through their paces. At the top of the list was Zendesk, which seemed to be the most mature of the newer generation of help desk services (with about 1700 employees and 100,000 customers). Unfortunately, after signing up for a trial and testing it out with Helaine, it quickly became clear that having a large customer base and large engineering staff is no guarantee of good accessibility practices. It was unusable by a screen reader for even the most basic tasks, and from what I can tell by inspecting their website, they hadn’t even put in the smallest bit of effort. (We ultimately went with Desk.com as our help desk service; while far from perfect, it’s generally proved accessible enough for our needs — even with a similarly JavaScript-heavy interface.)

I’m always looking for ways to improve the usability and accessibility of the Wiki Education Dashboard, and I’ve still got a lot to learn. Now that I’ve dipped my toe into web accessibility, I find myself losing patience quickly with big software companies and open source projects that don’t even try.

by Sage Ross at October 10, 2017 04:19 PM

Lorna M Campbell

Ada Lovelace Day – Professor Elizabeth Slater

Today is Ada Lovelace Day and unfortunately I am stuck at home waiting for a network engineer to come and fix my intermittent internet, rather than joining colleagues in Edinburgh for this year’s event celebrating Women in STEM.   Ada Lovelace Day is always one of my favourite events of the year so I’m gutted to miss it, especially as there will be periodic table cup cakes as designed by chemist Ida Freund! However I’m planning to participate remotely so that I can defend my Metadata Games crown and I am also hoping to write a Wikipedia article about a woman from this field who was an inspiration to me when I was a student.

Vulcan hammering metal at a forge watched by Thetis. Engraving by Daret after Jacques Blanchard. CC BY, Wellcome Images, Wikimedia Commons.

Dr Elizabeth Slater was a lecturer in archaeology when I studied at the University of Glasgow in the late 1980s and it was as a result of her lively and engaging lectures that I developed an interest in archaeological conservation.  I believe Liz started her academic career as a scientist, before developing an interest in archaeometallurgy, and from there moving into archaeology. Liz taught us material sciences, conservation, archaeometallurgy, early smelting techniques, chemical analysis, experimental archaeology and the use and abuse of statistics.  It was her lectures on archaeometallurgy that really fascinated me though and it was from Liz that I remember hearing the theory as to why smith gods in so many mythologies happen to be lame.  One of the most common native ores of copper, which was smelted before the development of bronze, is copper arsenic ore, and when it’s smelted it gives off arsenic gas.  Prolonged exposure to arsenic gas results in chronic arsenic poisoning, the symptoms of which include sensory peripheral neuropathy, or numbness of the extremities, and distal weakness of the hands and feet.  It’s not too difficult to imagine that early smiths must frequently have suffered from chronic arsenic poisoning and that, Liz suggested, was why smith gods were portrayed as being lame.  This appears to be a relatively widespread theory now and I’m not sure it can be credited to any one individual, however it’s a fascinating story and one I always associate with Liz.

Liz left Glasgow in 1991 to take up the Garstang Chair in Archaeology at the University of Liverpool.  At the time, I believe she was the only current female Professor of Archaeology in the UK, Professor Rosemary Cramp having retired from Durham University the previous year.  Liz had a long and active career at the University of Liverpool, where she also served as Dean of the Faculty of Arts.  She retired in 2007 and died in 2014 at the age of 68.   In 2015 the University of Liverpool commemorated her contribution to archaeological sciences by opening the Professor Elizabeth Slater Archaeological Research Laboratories.

Liz does not currently have a Wikipedia page, but hopefully I can do something about that this afternoon.

by admin at October 10, 2017 08:46 AM

Gerard Meijssen

#Wikipedia discovers #OpenLibrary

On Facebook, Dumisani Ndubane posted his discovery of Open Library:
I just discovered that The Internet Archive has a book loan system, which gives me access up to 5 books for 14 days. So I have a library on my laptop!!! This is awesomest!!!
And it is. Anybody can borrow books from the Open Library (it is part of the Internet Archive). What Dumisani did not know at the time is that there are books in other languages to be found as well.

Dumisani found out about Open Library by accident; he googled for an ebook called "Heart of Darkness" by Joseph Conrad. His next challenge: to find the books in Xitsonga, and to tell his fellow Wikipedians about it.
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at October 10, 2017 05:55 AM

October 09, 2017

Wiki Education Foundation

Roundup: Closing the gender gap in cinema on Wikipedia

The overwhelming success of the summer blockbuster Wonder Woman has brought new mainstream attention to the topic of gender and cinema, with many media outlets noting that women are less likely to be chosen for major creative and directorial positions or to perform as the lead protagonist. Gender equity is a very real issue, which makes Carleton University professor Laura Horak’s class on Topics in Cinema and Gender very timely. Students in this course created and expanded multiple articles on gender-related topics such as female creative professionals and their work.

Students created many new articles for the course, one of which is on Deanne Foley, a Canadian director, writer, and producer. Foley was inspired to become a filmmaker after attending a film festival aimed at women, the St. John’s International Women’s Film Festival. She has since created several short and feature length films, worked on three television series, and was awarded Artist of the Year at the Newfoundland and Labrador Arts Council Awards in 2015. Another new addition was the article for Kristiene Clarke, who is cited by several sources as being the first transgender film director in the world to have created documentaries addressing the topic. Through her work Clarke hopes to dispel common misconceptions about transgender persons, most notably the assumption that all transgender individuals are part of one homogeneous group and have the same values as one another. Along with film directing and producing, Clarke also works as an educator at the University of Kent, where she teaches moving image production, and serves as a guest lecturer and educator at other universities worldwide.

Born in South America, filmmaker and writer Michelle Mohabeer hails from Canada, where she has worked as an educator at many universities, including the University of Toronto. Her first short film, Exposure, received notice when it was released in 1990 and was credited as being the first up-front lesbian film from the National Film Board’s Studio D. Considered to be groundbreaking, Exposure explores the topics of race and racism, sexuality and homophobia, and cultural and ethnic identity. Students also expanded the article for Caroline Leaf, a Canadian-American filmmaker, animator, director, producer, and tutor known for her pioneering work at the National Film Board of Canada, where she developed the sand animation and paint-on-glass animation techniques.

Fans of French cinema may recognize the name Nadine Trintignant, a French film director, producer, editor, screenwriter, and novelist. Her 1967 film Mon amour, mon amour was nominated for the prestigious Palme d’Or at the Cannes Film Festival. Trintignant has also written several books and novels. Unafraid of controversy, Trintignant added her name to the Manifesto of the 343 – also known as the “Manifesto of the 343 Sluts” – in 1971. This manifesto was signed by 343 women and was printed in the French magazine Le Nouvel Observateur. In the manifesto the women declared that they had received abortions at some point in their lives, which was then illegal in France. The reason for this move was that the women sought to advocate for reproductive rights, something for which they were willing to face criminal prosecution. This proved effective, as it led to further protests and advocacy, which helped bring about the 1974/1975 adoption of a law that repealed penalties for receiving an abortion within the first ten weeks of pregnancy.

Wikipedia has a wealth of knowledge. However, the site cannot grow without users contributing to and correcting its information. Editing is a wonderful way to teach your students about technical writing, collaboration, and sourcing in a unique learning environment. If you are interested in using Wikipedia with your next class, please contact Wiki Education at contact@wikiedu.org to find out how you can gain access to tools, online trainings, and printed materials.

Image: Carleton_University_River.jpg, by Peregrine981, CC BY-SA 3.0, via Wikimedia Commons.

by Shalor Toncray at October 09, 2017 04:19 PM

Wikimedia UK

Get involved: WikiProject Social Housing in the United Kingdom

Brandon Estate, Southwark – Image by Jwslubbock

By John Lubbock, Wikimedia UK Communications Coordinator

There’s been a lot of discussion over the past six months about housing policy in the UK, and the rumblings of discontent about the housing crisis that is particularly affecting London and the South East have been going on for years. I used to work as a community organiser on a former council estate in South London, so in early July I decided to start the Social Housing in the United Kingdom WikiProject. Two weeks later, the Grenfell Tower fire happened.

Wikipedia’s role in these kinds of policy questions is to summarise the available information into an easily searchable introduction to the topic. We seek to provide a neutral summary of information which will help people discuss the subject and encourage people to find solutions. The conversation about how to deal with the housing crisis is long overdue, and it is a shame that it took a tragedy like Grenfell to finally put it on the political agenda. All we can hope to do is to give people the resources to have that discussion in the most productive way. This is what we did when we edited pages about the EU before the referendum last year, though sadly the pages received the biggest spike in traffic the day after the vote.


That’s why I began talking to Paul Watt, a lecturer in housing policy at Birkbeck, around a year ago. Paul writes and speaks about the history of housing and has a good collection of photos he has taken himself over the years which we hope to make available on Wikimedia Commons so that they can be used to improve articles. I hope to be able to organise a Housing editathon in 2018 to engage people who are interested in the topic to learn how to edit Wikipedia and improve related pages.

There are a few things you can do if you would like to help this project progress:

  1. Add your name to the list of participants at the bottom of the Wikiproject
  2. Help expand the list of articles needing improvement or creation
  3. Upload photos of social housing to Wikimedia Commons
  4. Get in touch with us if you need help, advice, or would like to help organise an event

Editing Wikipedia means that your efforts may be read by policymakers and people in the housing sector who may have influence over the future development of social housing in the UK. Your old photos of social housing areas could have historical value and be seen by thousands of people searching for information about them online. We would like to contribute to the development of better policy on housing, but we can’t do it without your help, so let us know if you want to get involved via one of the links below.

Facebook: facebook.com/wikimediauk

Twitter: twitter.com/wikimediauk

Email: john.lubbock@wikimedia.org.uk

by John Lubbock at October 09, 2017 02:32 PM

Jeroen De Dauw

Yield in PHPUnit data providers

Initially I started creating a general post about PHP Generators, a feature introduced in PHP 5.5. However, since I keep failing to come up with good examples for some cool ways to use Generators, I decided to do this mini post focusing on one such cool usage.

PHPUnit data providers

A commonly used PHPUnit feature is data providers. In a data provider you specify a list of argument lists, and the test methods that use the data provider get called once for each argument list.

Often data providers are created with an array variable in which the argument lists get stuffed. Example (including poor naming):

/**
 * @dataProvider provideUserInfo
 */
function testSomeStuff( string $userName, int $userAge ) {}

function provideUserInfo() {
    $return = [];

    $return[] = [ 'Such Name', 42 ];
    $return[] = [ 'Very Name', 23 ];
    $return['Named parameter set'] = [ 'So Name', 1337 ];

    return $return;
}

The not-so-nice thing here is that you have a variable (explicit state) and you modify it (mutable state). A more functional approach is to just return an array that holds the argument lists directly. However, if your argument list creation is more complex than in this example, requiring state, this might not work. And when such state is required, you end up with more complexity and a higher chance that the $return variable will bite you.
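
For the simple case above, that more functional version could look roughly like this (a quick sketch reusing the same example data):

function provideUserInfo() {
    // Return the argument lists directly: no intermediate variable, no mutation.
    return [
        [ 'Such Name', 42 ],
        [ 'Very Name', 23 ],
        'Named parameter set' => [ 'So Name', 1337 ],
    ];
}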

Using yield

What you might not have realized is that data providers do not need to return an array. They need to return an iterable, so they can also return an Iterator, and by extension, a Generator. This means you can write the above data provider as follows:

function provideUserInfo() {
    yield [ 'Such Name', 42 ];
    yield [ 'Very Name', 23 ];
    yield 'Named parameter set' => [ 'So Name', 1337 ];
}

No explicit state to be seen!

Stay tuned for more generator goodness if I can overcome my own laziness (hint hint :))

by Jeroen at October 09, 2017 06:16 AM

This month in GLAM

This Month in GLAM: September 2017

by Admin at October 09, 2017 03:58 AM

Tech News

Tech News issue #41, 2017 (October 9, 2017)

← previous | 2017, week 41 (Monday 09 October 2017) | next →
Other languages:
العربية • ‎čeština • ‎Deutsch • ‎English • ‎español • ‎suomi • ‎français • ‎עברית • ‎italiano • ‎日本語 • ‎português do Brasil • ‎русский • ‎svenska • ‎українська • ‎粵語 • ‎中文

October 09, 2017 12:00 AM

October 08, 2017

Brion Vibber

2009 Workstation recovery – libvpx benchmark

I’m slowly upgrading my old 2009-era workstation into the modern world. It’s kind of a fun project! Although the CPUs are a bit long in the tooth, with 8 cores total it still can run some tasks faster than my more modern MacBook Pro with just 2 cores.

Enjoy some notes from benchmarking VP9 video encoding with libvpx and ffmpeg on the workstation versus the laptop…

(end)

by brion at October 08, 2017 10:15 AM

October 07, 2017

Resident Mario

Wikimedia Foundation

New survey creates path for hearing from diverse Wikimedia communities

Photo by Jason Krüger/WMDE, CC BY-SA 4.0.

Imagine a survey designed to hear from Wikimedia contributors, people from every project and affiliate, used to influence the Wikimedia Foundation’s program strategies.

Welcome to Community Engagement Insights.

In 2016, the Wikimedia Foundation initiated Community Engagement Insights, under which we designed the Wikimedia Contributors and Communities Survey to cultivate learning, dialogue, and improved decision-making between the Foundation and multiple community audiences. Foundation staff designed hundreds of questions and organized them into a comprehensive online survey. We then sent it to many different Wikimedians, including editors, affiliates, program leaders, and technical contributors. After completing our basic analysis, we now want to share what we learned, and what we are going to do with it.

On Tuesday, October 10, at 10 am Pacific / 5 pm UTC, we will hold a public meeting to present some of the data we found and how different teams will use it, offer guidance on how to navigate the report, and open the floor for questions. You can join the conversation on a YouTube livestream and ask questions via IRC on #wikimedia-office.

How are these surveys different from others?

Looking at the history of surveys at the Foundation, we did not have a systematic approach to hearing input and feedback from communities about the Foundation’s programs and support. For example, between 2011 and 2015, there were only four comprehensive contributor surveys:[1] three Wikipedia Editor surveys, and one Global South User Survey. Between 2015 and 2016, however, we witnessed a growing demand for community survey data. In that year alone, ten different surveys were requested by ten different teams. While the first four surveys were exploratory, designed to learn from users about broad themes, newer surveys were looking for more specific feedback on projects and initiatives. However, individual teams didn’t have a structure to approach this type of inquiry for international audiences, nor was there a system in place for the Foundation to hear from the communities it serves on a regular basis.

This is when we started to think about a collaborative approach to surveying communities. At the beginning of the Community Engagement Insights project, we interviewed teams to learn about the specific audience groups they worked with and what information they wanted to learn from them. Understanding that the need for community survey data would continue to grow, we started thinking of a systematic, shared solution that could support this emerging demand. The Wikimedia Contributors and Communities Survey was the answer. It has three key characteristics:

  • It is annual: The Wikimedia Contributors and Communities Survey is held year over year to observe change over time.
  • There is a submission process: Teams can participate and submit survey questions they would like answered.
  • It is a collaborative effort: Survey expertise is spread across the organization, so people from different teams, especially those who have submitted questions, collaborate not only on survey design, sampling, and analysis, but also on outreach, messaging, translation, and communication.

Towards the end of the survey design process, 13 teams had submitted 260 questions, aimed at 4 audience groups: editors or contributors, affiliates, program leaders, and developers (also known as technical contributors).

What did we learn about Wikimedia communities in 2016, and how we serve them?

 

Male to female ratios.
Graphic from CE Insights 2016-17 report.

Regions editor respondents reported they come from.
Graphic from CE Insights 2016-17 report.

In terms of response rates, we had a 26% response rate from editors (4,100 responses), 53% from affiliates (127), and 46% from program leaders (241).[2] Volunteer developers were not sampled, and we got 129 responses from that audience group.

For Wikimedia Foundation programs that are community-facing, we collected data on three different areas. Personal information allowed us to understand the personal characteristics of the communities we serve, such as gender and demographics. Here, we found that while the percentage of women contributors is still below 15% across all regions, the number is higher when it comes to leadership roles: 25% of program leaders are women, as well as 28% of affiliate representatives. The majority of editors across all projects come from Western Europe (44%), followed by Eastern Europe (15%), South America (11%), and Western Asia (9%).[3]

The Wikimedia Contributors and Communities Survey also had questions about Wikimedia environments—the spaces that we are trying to make an impact on as a movement, such as the Wikimedia projects, software, or affiliates. Looking at the projects environment, we learned that 31% of all survey participants have felt uncomfortable or unsafe in Wikimedia spaces online or offline. Also, 39% agree or strongly agree that people have a difficult time understanding and empathizing with others. When participants rated issues on Wikipedia, the top three were vandalism, the difficulty in gaining consensus on changes, and the amount of work that goes undone. While 72% of the survey participants claim to be satisfied or very satisfied with the software they use to contribute, 20% reported being neither satisfied nor dissatisfied.

Activities in the last 12 months.
Graphic drawn from CE Insights 2016-17 report.

Percent of participants who began contributing from 2001–2016.
Graphic drawn from CE Insights 2016-17 report.

Wikimedia Foundation programs are the third area we explored. Programs include Annual Plan programs that aim to achieve a specific goal, like New Readers or Community Capacity Development, as well as other regular workflows such as improving collaboration and communication. Focusing on this goal, the Support and Safety team at the Foundation offers services to support Wikimedians. They learned that an average of 22% of editors across regions had engaged with staff or board members of the Foundation. Regionally, the Middle East and Africa represented the highest rate of engagement, at 37%.

These are only a few highlights. The full report has hundreds more data points in all three areas, covering community health, software development, fundraising, capacity development, brand awareness, and collaboration, among other topics and programs.

Get involved!

Since each Foundation team had specific questions that were tailored for particular initiatives and projects, each group is now in the process of analyzing the data they received. The end goal is to use this information for data-driven direction and annual planning. Some teams will be sharing their takeaways in the Tuesday, October 10 meeting.

We will continue to work with community organizers that can help spread the word about the survey, and engage more community members in taking the survey next year. If you are interested in joining this project, please email eval[at]wikimedia[dot]org.

María Cruz, Communications and Outreach manager, Community Engagement
Edward Galvez, Survey specialist, Community Engagement
Wikimedia Foundation

Footnotes

  1. “Community Surveys at the Foundation”, Community Engagement Insights, presentation at Wikimania 2017.
  2. See more about our sampling strategy on Commons.
  3. These numbers are heavily influenced by the sampling strategy and do not necessarily represent the population of editors.

by María Cruz and Edward Galvez at October 07, 2017 01:10 AM

October 06, 2017

Wikimedia Foundation

Community digest: UNESCO Challenge participants contribute massive amount of content to Wikimedia projects; news in brief

Photo by Véronique Dauge/UNESCO, CC BY-SA 3.0 IGO.

The UNESCO Challenge, a writing contest on Wikipedia, was organized for the first time this year, from 18 April (the International Day for Monuments and Sites) to 18 May 2017. We are glad to announce the results and share the lessons learned from this project.

96 participants signed up and contributed 6,917,069 bytes of content about world heritage sites in 28 languages on Wikipedia, equivalent to 1,729 A4-size print pages. In addition, 326 pictures uploaded as part of the Connected Open Heritage project were added to the articles.

The contest was organized by Wikimedia Sweden (Sverige), UNESCO, and the Swedish National Heritage Board as part of the Connected Open Heritage project. It focused on improving Wikipedia’s content about world heritage sites, giving special attention to threatened heritage sites.

103 articles were created about heritage sites in Sweden, and 35 existing articles were improved in 24 different languages. The 40 participants who worked on them were rewarded by the Swedish National Heritage Board with a book about the Swedish World Heritage sites. We were happy to announce the winners who helped make this a success.

Participant feedback

In the responses to the survey we sent out, the feedback was really positive. We are delighted that many of the participants stated that they took part mainly out of their inclination to improve Wikipedia, and that more than two thirds of them plan to keep improving the articles after the contest is over. Many appreciated that the organizers were quick to answer their questions and provide information when needed. Many felt that the rules were clear and easy to follow, and they liked the reward system. The theme was interesting for many, and the new photos released by UNESCO made it fun. In time for the contest, Wikimedia Sweden (Sverige) worked with UNESCO to release photos of thousands of world heritage sites on Wikimedia Commons.

However, we will take into account criticism of the slightly distorted scoring system for future competitions. Several participants felt that the score for adding pictures was too generous compared to the effort required to write an article. We will also look at how the points can be reported more easily by the participants, and whether additional topics should be added.

We are looking forward to the next edition of the challenge.

John Andersson, Wikimedia Sverige

In brief

  • Wikimedia Israel publishes an encyclopedic writing guide: The new guide helps contributors, especially academic contributors, understand how to add substantial content to Wikipedia. “The guide thus teaches how to assess the encyclopedic importance of a topic,” says Michal Lester, executive director of Wikimedia Israel, “how to find independent and reliable sources on that topic, how to structure the information according to Wikipedia’s article format, and how to produce neutral and succinct writing.” More on that on Wikimedia Israel’s website.
  • Bulgarian students are introduced to Wikipedia in I Can – Here and Now: 40 students in the Super Summer Academy I Can – Here and Now attended an introductory workshop on editing Wikipedia, led by the experienced Bulgarian Wikipedia editors Vassia Atanassova and Justine Toms. The Super Summer Academy has been held annually for the past six years; however, this is the first year Wikipedia editing has been included in the program. More details about the event are in the This Month In Education newsletter.
  • New edition of Wiki Loves Africa kicks off: With the theme of “African people at work,” Wiki Loves Africa, the annual media-sharing contest, kicks off this week. Through the end of November, participants will be uploading photos, audio, and video files that document the lives of people at work on the African continent. Details about the contest and how to participate are on Wikimedia Commons.
  • 2017 strategy update: Phase one of the Wikimedia movement’s shared strategic direction wrapped up last week. During the month of October, individuals and groups are invited to endorse the strategic direction. In the coming weeks, participants will be preparing for Phase two, which will involve developing specific plans for how to achieve the direction we have built together. More details on Meta.
  • Ombudsman commission call for volunteers: The Ombudsman commission works on all Wikimedia projects to investigate complaints about violations of the privacy policy, especially in use of CheckUser tools, and to mediate between the complaining party and the individual whose work is being investigated. Volunteers serving in this role should be experienced Wikimedians, active on any project, who have previously used the CheckUser tool OR who have the technical ability to understand the CheckUser tool and the willingness to learn it. More details on the requirements and how to apply are on Wikimedia-l.
  • Diversity award goes to the 2017 Wikimedia Hackathon mentoring program: The first Austrian Open Source Award in the “diversity” category went to the 2017 Wikimedia Hackathon mentoring program. The Austrian Open Source Award was established this year in order to raise awareness and visibility for our local Open Source Communities and their projects. More details on Wikimedia-l.

Samir Elsharbaty, Blog Writer
Wikimedia Foundation

by John Andersson and Samir Elsharbaty at October 06, 2017 08:29 PM

Wiki Education Foundation

Welcome, Cassidy!

Cassidy Villeneuve

I’m happy to announce that Cassidy Villeneuve has joined our team as Communications Associate. In this role, Cassidy is responsible for general communications tasks across departments, as well as maintaining the Wiki Education blog and social media channels.

Cassidy graduated from Scripps College this past May with a Bachelor’s degree in Interdisciplinary Humanities. Professionally, she has worked as a Digital Storytelling Workshop Facilitator for Bay Area nonprofit StoryCenter, as well as a Media and Design Consultant for LA nonprofit the It Gets Better Project.

Outside of work, Cassidy enjoys crafting, exploring her new neighborhood on foot, and swimming in the ocean (although she doesn’t brave the cold San Francisco waters often!). She also loves citizen-scientist books about cetacean biology and venturing out to Point Reyes to see sea life in action.

Welcome aboard, Cassidy!

by LiAnna Davis at October 06, 2017 04:03 PM

Weekly OSM

weeklyOSM 376

26/09/2017-02/10/2017

A map of unmapped places on OSM

A map of unmapped places around the world 1 | Copyright © Pascal Neis (neis-one.org) – Map data © OpenStreetMap contributors

Mapping

  • A new task on MapRoulette: help clean up non-existent schools from the GNIS US 2009 import.
  • Florian Lainez proudly announced on the French mailing list that 320 minibus lines with stops in Accra, the capital of Ghana, are now available in OSM thanks to Jungle Bus. A week after he asked for help with rendering them on a website, the Mapanica rendering was adapted, and now you can see all the lines here.
  • The discussions about large-scale modifications of the wikidata=* tag started by Yuri Astrakhan (we reported in our last issue) are still going on and have been extended by further discussion topics and threads.
  • In September we reported on a Statistics Canada workshop in Ottawa presenting a pilot project with the local OSM community to map all buildings in Ottawa. This time, on Talk-ca, StatCan proposes a goal for the OSM community to add all buildings in Canada by the year 2020. Some contributors on the talk-ca list expressed the opinion that this goal is not realistic, given the difficulty of negotiating licenses with municipalities and the lack of a clear commitment from StatCan.
  • Martijn van Exel from Telenav asks on the Talk-ca mailing list how to tag bilingual destination signs (e.g. “Rue Regent St”) with the key destination:street.
  • User Gorm presents a new JOSM plugin, Roundabout Expander, which helps map roundabouts starting from a single node.
  • Mateusz Konieczny proposes generating lists of objects that need our attention, like statues tagged as referring directly to a person instead of using subject:wikipedia/subject:wikidata tags, or OSM objects for which Wikidata has a web URL that could be added to OSM as a website tag.
  • Voting is open on the Fire Hydrant Extension wiki page proposal until October 15th. It adds new tags but also replaces existing ones.

Community

  • Matthew Darwin proposes on the Talk-ca mailing list to meet at State of the Map US (20th–22nd October) and discuss founding a local chapter of the OSMF.

Imports

  • Enock Seth Nyamador wants to import 216 District Borders into OSM and asks for advice.
  • The buildings import for the Denver metropolitan area is ready. Russell Deffner wrote an update on the import mailing list. On the wiki, you can read about the import in detail.

OpenStreetMap Foundation

  • Tom Hughes confirms on OSM-dev that the OSRM routing service is no longer available. The demo server on osm.org was provided by Mapbox and was shut down without any prior notice. There is an ongoing discussion in this pull request about commercial services and various possibilities around using them.
  • Joost Schouppe started a discussion on osmf-talk about sponsored OSM Foundation memberships, with the stated objective of helping to increase diversity among members.
  • Have you already participated in the DWG’s survey? (We reported on it last week.)

Events

  • Rob Nickerson, one of the organisers of State of the Map, shared a budget for a generic large State of the Map event; this will be a useful reference for organising similar events in the future.

Humanitarian OSM

  • The Daily Mail reports how Nigerien youth "armed with smartphones and boots" mapped flooded areas in Niamey.
  • The developers of Ushahidi, a tech platform for social activism, attended the HOT Summit 2017 and shared their experience in a blog post.

Maps

  • [1] Pascal Neis updated his map of unmapped places around the world.
  • Ilya Zverev wrote a script to inventory railway=subway stations in major cities around the world for use in MAPS.ME, which means it will be the first app that uses solely OSM data for routing passengers. Michael Reichert notes that Geofabrik is also developing a Public Transport version 2 validator.
  • An issue was raised on help.openstreetmap.org about a delay in updates between OpenStreetMap and Strava maps, which use Mapbox. Mapbox states that they can’t guarantee a timeframe for the edits to appear, but they appreciate everyone’s contributions and are continuing to work hard to make sure their maps are up to date.
  • The University of Heidelberg has produced a map “osmlanduse.org” that analyses OSM landuse in an area (and also uses some remote sensing data). An introduction is here and the map is here.

Open Data

  • Heise writes (automatic translation) about a proposed EU directive compelling open source platforms to monitor their data for copyright infringements at high expense.

Software

  • Kort Game, an app for Android and iOS for adding missing information to existing objects in OpenStreetMap, has been published. The app follows a gamification approach, paying out so-called Koins for each question answered; a response is only added to the OSM database once it has been verified by another user. The question remains how such verification will allow edits in rural areas to make it into OpenStreetMap. Development of Kort happens on GitHub and translation on Transifex. The app is already being used – and discussed.

Programming

  • On the OSM help forum, karussell, the account of the developers of the GraphHopper routing engine, asks which default access values a routing engine should assume for a ferry, a ford, and a barrier.
  • Chris Whong has written an express.js server that lets you quickly load Mapbox GL styles from any project into the Maputnik Style Editor.

Releases

  • Mapbox Navigation SDK for Android v0.6.0 adds a prebuilt user interface similar to the one in the iOS version.

Did you know …

  • … Achituv Cohen’s thesis (PDF) on “Building a Weighted Graph based on OpenStreetMap Data for Routing Algorithms for Blind Pedestrians”?
  • … Marble, the online globe for Linux, Mac, Windows and Android with OSM maps?
  • … WHODIDIT to analyze OpenStreetMap changesets? Update: Ilya recommends a faster fork. 😉

OSM in the media

  • The New York Times reports on “A Mapathon to Pinpoint Areas Hardest Hit in Puerto Rico”.
  • A Reddit user shares an animated GIF that shows hurricane Jose’s predicted path.
  • TV network WCPO reports about a mapping event at Miami University helping to address the hurricane damage in Puerto Rico. OSM is only mentioned in one brief direct quote, though.

Other “geo” things

  • An article in The Guardian reports on the wheelchair accessibility issues of metro networks around the world. The results are discouraging.
  • HOT is searching for a Director of Finance & Administration (Part-Time) to join their senior management team.
  • According to The Guardian, “Google Maps must improve if it wants cyclists to use it”. However, commenting users (apart from some militant pro- and anti-cycling trolls) seem to be managing just fine without it. Cited competitors such as CycleStreets and Maps.Me are not clearly identified as being part of the OSM ecosystem.
  • NASA has released a tool that makes it easy to discover satellite images of recent natural phenomena such as hurricanes, forest fires, icebergs, algae blooms, volcanoes and more. Additional information was shared by a developer on Reddit.

Upcoming Events

Where What When Country
Morbihan Opération Libre du Pays de Redon, Peillac et Les Fougerêts 2017-10-07-2017-10-08 france
Dortmund Mappertreffen Dortmund 2017-10-08 germany
Fukuchi Machi 福智町の歴史・文化まち歩きプロジェクト~自分のチカラで世界中にタカラ発信!~ 2017-10-08 japan
Rennes Réunion mensuelle 2017-10-09 france
Lyon Rencontre mensuelle ouverte 2017-10-10 france
Munich Stammtisch 2017-10-10 germany
Rostock Rostocker Treffen 2017-10-10 germany
Viersen OSM Stammtisch Viersen 2017-10-10 germany
Berlin 112. Berlin-Brandenburg Stammtisch 2017-10-13 germany
Hérault Opération libre à Jacou, Jacou 2017-10-13-2017-10-15 france
Tokyo 東京!街歩き!マッピングパーティ:第12回 旧古河庭園 2017-10-14 japan
Turin Muoversi a Torino: Torino mapping party 2017-10-14 italy
Caen Carto-Party : Environment et Arbres 2017-10-14 france
Bonn Bonner Stammtisch 2017-10-17 germany
Lüneburg Mappertreffen Lüneburg 2017-10-17 germany
Scotland Pub meeting, Edinburgh 2017-10-17 united kingdom
Karlsruhe Stammtisch 2017-10-18 germany
Leoben Stammtisch Obersteiermark 2017-10-19 austria
Colorado State of the Map U.S. 2017, Boulder 2017-10-19-2017-10-22
Karlsruhe Hack Weekend October 2017 2017-10-21-2017-10-22 germany
Buenos Aires FOSS4G+State of the Map Argentina 2017 2017-10-23-2017-10-28 argentina
Brussels FOSS4G Belgium 2017 2017-10-26 belgium
Lima State of the Map LatAm 2017 2017-11-29-2017-12-02 perú
Yaoundé State of the Map Cameroun 2017, lors des premières Journées nationales de la Géomatique 2017-12-01-2017-12-03 cameroun
Bonn FOSSGIS 2018 2018-03-21-2018-03-24 germany
Milan State of the Map 2018, (international conference) 2018-07-28-2018-07-30 italy

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Laura Barroso, Nakaner, PierZen, Polyglot, SeleneYang, SomeoneElse, Spanholz, Spec80, YoViajo, derFred, jinalfoflia, k_zoar, kreuzschnabel.

by weeklyteam at October 06, 2017 07:31 AM

October 05, 2017

Wikimedia Tech Blog

How Technical Collaboration is bringing new developers into the Wikimedia movement

A Wikimedia team won the Open Minds Award in the category “Diversity” for their work with a mentoring program during the 2017 Wikimedia Hackathon in Vienna. Photo by Jean-Frédéric, CC0/public domain.

We on the Technical Collaboration team at the Wikimedia Foundation are focusing our efforts on a single goal: recruiting and retaining new volunteer developers to work on Wikimedia software projects.

Onboarding new developers, and ensuring they are set up to succeed, is key to ensuring the long-term sustainability of the Wikimedia developer community, which works on projects seen by billions of people around the world.

The active developer community, which currently numbers in the hundreds, helps maintain more than 300 code repositories and makes more than 15,000 code contributions on a monthly basis. That puts the Wikimedia projects on par with some of the largest and most active free software development projects in the world, like the Linux kernel, Mozilla, Debian, GNOME, and KDE, among others.

But the developer community is not growing at the pace required to ensure the long-term health of our projects. Conscious of this, the Technical Collaboration team is focusing on bringing in new volunteer developers, connecting them with existing communities, and ensuring the success of both new and experienced technical members of the Wikimedia movement.

What we’re doing

Thinking closely about the ways we conduct outreach through formal programs.

We have participated in the developer training programs Google Summer of Code for 12 years and Outreachy, run by the Software Freedom Conservancy, for 10 rounds over 5 years. Part of our goal in working with those programs is to find and train new developers who continue to contribute to our projects once they complete the internship program. To improve the retention figures, we pair developers in the program with an experienced technical mentor who shares their interest. We are also thinking carefully about the social component of the program, and in helping developers find new challenges and roles after their internships end.

Thinking about the ways in which Wikimedia hackathons and technical events can bring in new developers.

We have changed our approach at Wikimedia hackathons and in technical spaces in order to focus on new developer outreach and retention. In the last editions of the Wikimedia Hackathon and the Hackathon at Wikimania, we put more attention towards supporting new developers, pairing them with mentors and creating spaces specifically for them on-wiki and in person. We have also promoted smaller regional hackathons to reach out to more developers, and we have modified our scholarship processes so that top newcomers from a local event have a better chance of joining our global events.

Where we plan to go next.

Outreach programs and developer events were obvious places to start our work because they already are touch points with outside developers.  However, it is also clear that in order to improve our retention of new developers, we have to pursue a variety of approaches. Here are some of the avenues we plan to focus on from our annual plan:

  • An explicit focus on diversity. We believe that diversity is an intrinsic strength in our developer community. We want to improve our outreach and support to identify developers from around the globe, invite them to join our community, and support them.
  • Quantitative and qualitative research. Most of our current knowledge and assumptions are not based on systematic research. We plan to focus on some key progress indicators to ensure that we are meeting our goals. Metrics include the number of current volunteer developers, the number of new volunteer developers who joined our projects over the last quarter, and the number of new developers who remain active after one year. We are also starting to survey all newcomers who contribute a first code patch, and we plan to survey new developers who seem to have left the projects. We want to learn more about their initial motivations and the first obstacles they faced, and also about the factors that influenced their decision to leave. We are going to compile the data, findings, and lessons learned in a quarterly report.
  • Featured projects for newcomers. We have been trying to connect potential new developers with any of the hundreds of Wikimedia projects, when in reality, the vast majority of them are not a good destination for volunteers. Many projects are inactive, and others are so active that the learning curve is rather steep. Still others don’t have mentors available or appropriate documentation. To help new developers succeed, we have decided to select a reasonable number of projects that are ready to welcome newcomers, and we are working closely with their mentors to lead newcomers to those areas—to see if this helps improve retention.
  • Multilingual documentation and support. Picking a limited set of featured projects also helps us support documentation in multiple languages for those projects. We have also thought about the pathways that we want new users to take. While we have traditionally sent new developers to read How to become a MediaWiki hacker, this may not be the right approach if developers want to contribute to tools, bots, gadgets, or mobile apps. We are now refreshing our developer documentation for newcomers, and plan to refresh the org homepage accordingly. We also plan to offer one support channel for new developers that is easy to find and maintain.

By connecting all these pieces, we aim to attract more developers from diverse backgrounds, and to offer pathways into our movement—professionally and personally—that motivate them to stick around.

For many of us, joining the Wikimedia movement was a life-changing experience. We want to help new developers (and their mentors!) walk their own paths in Wikimedia, to gain experience and contacts in our unique community of communities. We want to offer them opportunities to become local heroes fixing technical problems and creating missing features for the Wikimedia communities living in their regions or speaking their languages. We want to offer them opportunities to meet peers across borders and boundaries, working on volunteer or funded projects and traveling to developer events.

We plan to bring the Wikimedia technical community to the levels that one would expect from one of the biggest and most active free software projects, from probably the most popular free content creation project. The chances to succeed depend heavily on current Wikimedia developers (volunteers or professionals) willing to share some of their experience and motivation mentoring newcomers. It also depends heavily on Wikimedia chapters and other affiliates willing to scratch their own technical itches working with us, co-organizing local or thematic developer activities with our help. The first experiments have been very positive (and fun) so far. Join us for more!

Quim Gil, Senior Manager, Technical Collaboration
Wikimedia Foundation

by Quim Gil at October 05, 2017 10:29 PM

Wiki Education Foundation

American Studies Association to improve Wikipedia’s coverage of U.S. culture and history from multiple perspectives

Wiki Education has a new partnership with the American Studies Association (ASA). ASA promotes the development and dissemination of interdisciplinary research on U.S. culture and history in a global context, and they will encourage members to participate in Wiki Education’s Classroom Program to increase the availability of information about American Studies from multiple perspectives on Wikipedia.

The ASA is made up of some 5,000 researchers, teachers, students, writers, curators, community organizers, and activists from around the world committed to the study and teaching of U.S. history and culture from multiple perspectives, and to the circulation of that knowledge both within and outside of the academy. To disseminate a broader understanding of U.S. culture and history beyond the academy, they are looking to Wikipedia. Wikipedia is the 5th most-visited website in the world, meaning it’s the source people use to get crucial information. University students have unique access to peer-reviewed academic journals and interdisciplinary research that define the topics important to American Studies scholars, but most people can’t access those resources without a membership fee or in-depth understanding of the field. Students love Wiki Education’s Classroom Program because they get an opportunity to take their learning on campus and transfer it out into the world for others’ benefit. Student editors studying interdisciplinary American topics have already contributed valuable information, expanding Wikipedia’s coverage of African American women, American Indians, and Latinx Americans. Bringing more American Studies instructors and students into the Classroom Program will amplify this outcome and help close the well-documented diversity content gaps.

In the Classroom Program, university instructors assign students to write Wikipedia articles, empowering them to share knowledge with the world. Students research course-related topics that are missing or underrepresented, synthesize the available literature, and use our tools and trainings to add distilled information to Wikipedia. Essentially, students turn literature reviews into a resource for the world, learning how Wikipedia works along the way. While contributing cited, well-founded information, they help combat fake news on the internet. After supporting tens of thousands of students, we’ve proven this model brings high-quality academic information to the public and meaningful learning experiences to students.

To better understand the types of skills students obtain from contributing to Wikipedia as a course assignment, Wiki Education sponsored Dr. Zach McDowell, of the University of Massachusetts, Amherst, to conduct a study of our program participants during the Fall 2016 term. After careful analysis of both quantitative and qualitative data, the study found that Wikipedia-based assignments enhance students’ digital literacy and critical research skills, foster their ability to write for a public audience, promote collaboration, and motivate them more than traditional assignments. Students also gain a valuable understanding and appreciation for a source of information they use every day: Wikipedia.

We are thrilled to work with ASA’s members who are looking for support to improve their pedagogical practices and provide impactful learning experiences for students. Wiki Education provides that service and helps students learn how they can actively participate in the production of knowledge. In partnership, ASA can promote Wiki Education’s Classroom Program as one option for instructors to influence the broader perception of American studies. To join this initiative, email us at contact@wikiedu.org.

by Jami Mathewson at October 05, 2017 04:16 PM

Lorna M Campbell

Wiki Loves Monuments – An amazing contribution to the commons

The Wiki Loves Monuments competition came to a close last Friday and I’d be lying if I didn’t admit that I was still uploading ancient holiday snaps at quarter to midnight.  Who knew I had so many pics of ancient ruins?!  By the time the competition closed, a staggering 14,359 new images of UK scheduled monuments and listed buildings had been uploaded to Wikimedia Commons, over 2,000 of which came from Scotland.  And what’s even more impressive is that 1,351 of those images were uploaded by colleagues from the University of Edinburgh 💕 That’s an amazing contribution to the global commons and a wonderful collection of open educational resources that are free for all to use.  Our most prolific contributor was our very own Wikimedian in Residence Ewan McAndrew, who we have to thank for spurring us all on, closely followed by Anne-Marie Scott, who contributed some glorious images of Phoebe Traquair’s murals at the Mansfield Traquair Centre.  And the diversity of the images uploaded is just incredible.  Everything from castles, cathedrals, country houses, churches, cemeteries, chambered cairns, terraces, fountains, bars, bridges, brochs, botanic gardens, and even a lap dancing club (thank you Ewan…) I managed to upload a modest 184; my oldest monument was the Callanish Stones on my home island of Lewis and the most modern was a picture of Luma Tower in Glasgow that I took out the window of a passing bus!  You can see all my pics here, and not one of them was taken with an actual camera :}

by admin at October 05, 2017 09:32 AM

October 04, 2017

Wiki Education Foundation

Gen. Quon is Wikipedia Visiting Scholar at the University of Pennsylvania

I’m pleased to announce Paul Thomas as the newest Wikipedia Visiting Scholar, sponsored by the University of Pennsylvania!

Editing as User:Gen. Quon, Paul is an experienced Wikipedian with nearly 50,000 edits, including an incredible 267 Good Articles and several Featured Articles in a range of topic areas. Though he had access to some library resources that enabled him to write many high-quality articles, he did not have access to key scholarship in one of his major areas of interest, Classics. While many older primary texts are readily available online, contemporary translations, book reviews, and a great deal of the scholarship from the past century is locked behind costly paywalls.

Paul Thomas
Image: GQ in Rome 2014.jpg, by Paul Thomas, CC BY-SA 4.0, via Wikimedia Commons.

One of the things I like most about the Visiting Scholars program is that it works in two directions. Most often, we form a connection with an academic institution interested to see their library resources put to good use on Wikipedia, and then proceed to find the right Wikipedian to do so. Other times, an experienced Wikipedian like Paul will apply even when there are no current openings matching their interests, and we reach out to university libraries to find a good fit.

I can’t imagine a better fit than the University of Pennsylvania, the private Ivy League research university in Philadelphia. It is one of the oldest higher education institutions in the country, and its Department of Classical Studies has been providing undergraduate and graduate programs for more than two centuries. Paul has access to the vast digital resources available through the Penn Libraries system.

I asked Paul what sorts of topics he’ll be contributing to as Visiting Scholar:

“My focus is on pre-Imperial Rome, Latin, and history, but I also plan to work on articles about Imperial, Late, and Medieval Latin. I have been an active Wikipedia editor for over ten years, but my interest in the classics (and Latin, in particular) only began during my undergraduate career. During this time, I was a Classics major, and I spent much of my free periods editing Wikipedia pages about classical topics, supplementing the articles with information that I learned in my classes.”

“I have several goals, the first of which is to clean-up, expand, and promote a number of important articles to Good or Featured article status, such as Ennius’s Annales, Lucretius’s De rerum natura, Cicero’s Somnium Scipionis, and Lucan’s Pharsalia. My second goal is to create, expand, and possibly promote articles about lesser known but deserving topics, such as Faltonia Proba’s patchwork poem, Cento vergilianus de laudibus Christi, and Manilius’s Astronomica. My final goal is simply to create articles about classical topics that do not yet have a Wikipedia page.”

In the short amount of time since gaining access to Penn resources, Paul has already taken the article on Manilius’s first century poem, Astronomica, up to Featured Article status, and improved Liber physiognomiae (The Book of Physiognomy) by Michael Scot to the point that it is now pending a Good Article assessment.

At Penn Libraries, this Visiting Scholars relationship is largely thanks to the work of Rebecca Stuhr, Assistant Director for Liaison Services and Librarian for Classical Studies, who said this of her interest in the program:

“As a librarian, I am excited to work in collaboration with Penn’s Department of Classical Studies and the Wikipedia’s Visiting Scholars program. I am looking forward to the work that Paul Thomas will be completing during our year together for a few reasons. First, it allows us to support the work of a Wikipedian dedicated to developing materials within the scope of a Classical Studies curriculum. Second, I am happy to support Wikipedia’s efforts to make its ubiquitous go-to-source, a source our students are likely making use of, an ever higher quality resource. Finally, the program provides a framework for a parallel project through which I will be working with undergraduate classics students to develop their familiarity with specialized classical studies research tools as they identify Wikipedia articles of interest relevant to the field of Classical Studies and expand ‘further reading’ lists. I hope that if our Classical Studies Wikipedia project shows good results, it will be something that other librarians and departments at Penn will want to emulate.”

If you’re a Wikipedian interested in connecting with an academic library, or if you work at an academic institution and would like to see your resources put to good use on Wikipedia, visit the Visiting Scholars section of our website or email visitingscholars@wikiedu.org.

Image: Claudia Cohen Hall – panoramio.jpg, by Bohao Zhao, CC BY 3.0, via Wikimedia Commons.

by Ryan McGrady at October 04, 2017 06:00 PM

Welcome, Özge!

Özge Gündoğdu

I’m excited to announce that Özge Gündoğdu has joined Wiki Education as Executive Assistant and Office Manager. In her new role, Özge will be responsible for providing general administrative support to staff, as well as to the board. We’re very happy to have her.

Özge has spent the last couple of years working for Kimpton Hotels in San Francisco where she organized conferences and events, often juggling an average of 40 groups simultaneously from contract to departure. Özge holds a Bachelor’s Degree in International Relations from Gazi University in Ankara, Turkey, where she graduated in the top 10% of her class. She came to the United States in 2005 and received a Certificate in Business Administration from UC Berkeley in 2007.

Outside of work, Özge enjoys spending time with her husband and 18-month-old son. They love being outside and exploring the city through a toddler’s eye. Özge enjoys cooking, traveling to warm places, and going out for brunch. She appreciates and enjoys all cuisines San Francisco has to offer, although she’s not quite ready to try chicken feet or pig ears.

Welcome to the Wiki Education family, Özge!

by Frank Schulenburg at October 04, 2017 04:40 PM

Wikimedia Foundation

Research libraries and Wikimedia: A shared commitment to diversity, open knowledge, and community participation

Painting by Anna Marie Wirth, public domain.

In August of last year, representatives from the Wikimedia community, the Wikimedia Foundation, and the Association of Research Libraries came together to determine common goals and find areas to collaborate. During two days of discussing our respective cultures and roles in the open knowledge landscape, several themes emerged, two of which provide the framework for a new collaborative project with indigenous and tribal communities in the United States and Canada:

  1. The potential for linked open data to connect information from different, typically disconnected data sources, and mutually enrich libraries’ and Wikipedia content
  2. An overarching commitment to increase diversity and inclusion in library and Wikipedia culture and content.

Boxes of archival documents at York University. Photo by Smallison, CC BY-SA 4.0.

This project, Advancing Reconciliation and Social Justice in Libraries through Research Library and Community Collaboration in Wikimedia Projects, uses a case study approach to model community collaboration in the creation of linked open data, in this case for archival and special collections materials related to Indigenous communities in North America. The project focuses on getting local communities access to documentation related to their own local news and culture. These materials carry much cultural value for these communities, but are frequently under-described, held far from the communities they originated from, not digitized, and/or subject to contested or problematic concepts of ownership and custodial history.

Even if the content is accessible, under-description can be reflected in how information is organized and structured. Metadata describing content may be minimal, incorrect, exclusionary, or absent altogether, thereby obscuring the content’s contextual importance. Traditional library and archival practices of description, as well as other power dynamics, reinforce this structural exclusion. This project will contend with these problematic aspects of library and archival professional practice and bring them into conversation with Wikimedia community practices to demonstrate how a more inclusive and community-engaged approach can more accurately tell stories about Indigenous communities’ notable accomplishments and impact in the world.

The records of the Mariposa Folk Festival Foundation at York University contain information about numerous Indigenous artisans, storytellers, musicians, and activists. Cesar Newashish, for example, was a master canoe builder and traditional artisan featured in several Mariposa programs and newsletters, in addition to a film by the National Film Board of Canada. His work can be found in museums and displayed at cultural events all around Canada.

Despite all this, there is little information about Newashish online. One of the few places where you can read about him is the Atikamekw language version of Wikipedia, in an article created through the combined work of the Manawan Nation and Wikimedia Canada. Many others do not even have an article or item about them. The Festival archives at York contain rich documentation that could build content in existing entries like Newashish’s and in others, such as digitized programs with biographical information, newsletters with articles about the practice of artisans, and photographs documenting events, performances, and individuals.

To address this overall systemic problem, our project will focus on creating inclusive structured data in archival and manuscript collections to contextualize and highlight relationships among people, and in particular the stories of Indigenous groups.[1]

Goals for the project include:

  1. Creating and cultivating existing pathways for community participation and agency in the description of archival documents and special collections,
  2. Encouraging alignment between the development of structured data within the Library, Archives, and Museum (LAM) community and social justice work, and
  3. The creation of referenced structured data that can be used to connect reliable sources with topics to help justify “notability”[2] in Wikipedia, in order to redress a known diversity gap in Wikipedia’s coverage.

To maximize exposure of linkable structured metadata, this project will collaborate within communities to use Wikidata as the global, openly licensed knowledge base where the data can be deposited, queried, and reused.

We will work closely with different communities, community members, archivists, librarians, scholars and students, and the Wikipedia and Wikidata communities—and draw on existing theory and best practices—to find pathways for collaboration and community participation in the work of social justice.

This project is sponsored by the Association of Research Libraries (ARL) and led by Stacy Allison-Cassin at  York University Libraries. The project lead team includes Joy Kirchner, Dean, York University Libraries; Anna St. Onge, Archivist, Digital Projects and Outreach; Mark Puente, Director of Diversity and Leadership Programs; and Judy Ruttenberg, Program Director for Strategic Initiatives, Association of Research Libraries.

Stacy Allison-Cassin, W.P. Scott Chair in E-Librarianship, Associate Librarian
Scott Library, York University

Editor’s note: A detailed project page is being developed; we will update this blog post with a hyperlink when it is complete.

Footnotes

  1. For example, different members of the Tootoosis family of Saskatchewan made numerous appearances at the festival. While Wikidata can be used to document familial relationships and appearances at the festival, there are insufficient properties to appropriately document roles like “elder” or other roles related to Indigenous knowledge structures.
  2. The English-language Wikipedia’s policy on notability, linked above, is one of many, each developed by individual language communities. You can see its equivalents on Wikidata.

by Stacy Allison-Cassin at October 04, 2017 02:34 PM

Amir E. Aharoni

Five More Privileges of English Speakers, part 2: Language and Software

For the previous part in the series, see Five Privileges of English Speakers, part 1.

I’m continuing the series of posts in each of which I write about five privileges that English speakers have without giving it a lot of thought. The examples I give mostly come from my experience translating software, Wikipedia articles, blog posts, and some other texts between English, Hebrew, and Russian. Hebrew and Russian are the languages I know best. If you have interesting examples from other languages, I am very interested in hearing them and writing about them.

I’m writing them mostly as they come into my mind, without a particular order, but the five items in this part of the series will focus on usage of the English language in software, and try to show that the dominance of English is not only a consequence of economics and history, but that it’s further reinforced by features of the language itself.

1. Software usually begins its life in English

English is the main language of software development worldwide.

The world’s best-known place for software development is Silicon Valley, an English-speaking place. It is home to Facebook, Google, Apple, Oracle, and many others. California is also the home of Adobe.

There are several other hubs of software development in the United States: Seattle (Microsoft, Amazon), North Carolina (Red Hat), New York (IBM, CA), Massachusetts (TripAdvisor, Lotus, RSA), and more. The U.S. is also the source for much of computer science research and education, coming from Berkeley, MIT, and plenty of other schools. The U.S. is also the birthplace of the Internet, originally supported by the U.S. Department of Defense and several American universities. The world wide web, which brought the Internet to the masses, was created in Switzerland by an English speaker.

Software is developed in many other countries—India, Russia, Israel, France, Germany, Estonia, and more. But the dominance of the U.S. and of the English language is clear. The reason for this is not only that the U.S. is the source for much of computer technology, but also—and probably more importantly—that the U.S. is the biggest consumer market for software. So developers in all countries tend to optimize the product for the highest-paying consumers, and those consumers only need English.

When engineers write the user interface of their software in English, they often give no thought to other languages at all, or they make translation possible but complicated by English-centric assumptions about number, gender, text direction, text size, personal names, and plenty of other things, which will be explored in further points.
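
For example, a string assembled with English plural rules works only in English (a minimal, hypothetical PHP sketch; the function name and the message table are invented for illustration):

// English-centric: assumes "add an 's' for anything but 1", a rule that only works in English.
function newMessageNotice( int $count ): string {
    return "You have $count new message" . ( $count === 1 ? '' : 's' );
}

// A more translatable shape keeps whole sentences per plural form and lets each
// language define as many forms as its grammar needs (Russian, for instance,
// has more plural forms than English, and Hebrew also varies by gender).
$messages = [
    'en' => [ 'one' => 'You have $1 new message', 'other' => 'You have $1 new messages' ],
    // Other languages would add their own sets of forms here.
];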

2. Terminology

English is also the source for much of the computer world’s terminology. Other languages have to adapt terms like smartphone, network, token, download, authentication, and thousands of others.

Some language communities work hard to translate them all meticulously into native words; Icelandic, Lithuanian, French, Chinese, and Croatian are famous examples. This is nice, but requires effort on behalf of terminology committees, who need to keep up with the fast pace of technological development, and on behalf of the software translators, who have to keep up with the committees.

Some just transliterate most of them: the terms are kept essentially in English, but rewritten in the native alphabet. Hindi and Japanese are examples of that. This seems easy, but it is based on a problematic assumption: that the target language speakers who will use the software know at least some English! This assumption is correct for the translators, who don’t just know the English terms, but are probably also quite accustomed to them, but it’s not necessarily correct for the end users. Thus, the privilege is perpetuated.

Some languages, such as Hebrew, German, and Russian, are mid-way, with language academics and purists pulling to purer native language, engineers pulling to more English-based words, and the general public settling somewhere in between—accepting the neologisms for some terms, and going for English-based words for others.

For non-English languages, this provides fertile ground for arguments between purists and realists, in which the needs of the actual users are frequently forgotten. All the while, English speakers are not even aware of all this.

3. Easy binary logic word formation

One particular area of computer terminology is binary logic. This sounds complicated, but it’s actually simple: in electronics and software opposite notions such as true / false, success / failure, OK / Cancel, and so forth, are very common.

This translates to a great need for words that express opposites: enable / disable, do / undo, log in / log out, delete / undelete, block / unblock, select / deselect, online / offline, connect / disconnect, read / unread, configured / misconfigured.

Notice something? All of the above words are formed with the same root, with the addition of a prefix (un-, dis-, de-, mis-, a-), or with the words “on” and “off”.

A distinct but closely related need is words for repetition. Computers are famously good at doing things again and again, and that’s where the prefix re- is handy: reconnect, retry, redo, retransmit.

These features happen to be conveniently built into the English language. While English has extremely simple morphology for declension and conjugation (see the section “Spell-checking” in part 1 of the series), it has a slightly more complex morphology for word formation, but it’s still fairly easy.

It is also productive. That is, a software developer can create new words using it. For example, the MediaWiki software has the concept of “oversight”—hiding a problematic page in such a way that only users with a particular permission can read it. What happens if a page was hidden by mistake? Correct: “unoversight”. This word doesn’t quite exist elsewhere, but it doesn’t sound incorrect, because familiar English word formation rules were used to coin it.

As always happens, English-speaking software engineers either don’t think about it at all, or assume that other languages have similar word formation rules. If you haven’t guessed it already, that is not true. Some other European languages have similar constructs, but not necessarily as consistent as in English. And for Semitic languages like Hebrew it’s a disaster, because in Semitic languages prefixes are used for entirely different things, and the grammar doesn’t have constructs for repetition and negation. So when translating software user interface strings into Hebrew, we have to use different words as opposites. For example, the English pair connect / disconnect is translated as lehitḥabér / lehitnaték—completely different roots, which Hebrew is just lucky to have. Another option is to use negative words like lo and bilti, or bitul, but they are often unnatural or outright wrong. Having to deal with something like “Mark as unread” is every Hebrew software translator’s nightmare, even though it sounds pretty straightforward in English.

English itself also has pairs of opposites that are not formed using the above prefixes, for example next / previous and open / close, but in many other languages such pairs are much more common.

4. Verbing

“Verbing weirds language”, as one of the famous Calvin and Hobbes panels says.

Despite being a funny joke in the comic, it’s a real feature of the English language: because of how English morphology and syntax work, nouns can easily jump into the roles of adjectives and verbs without changing the way they are written.

For English, this is a useful simplification, and it works in labeling as well as in advertising. “Enjoy Coca-Cola” is something more than an imperative. The fact that it’s a short single word, and that it’s the same in all genders and numbers, makes it more usable as a call to action than it would be in other languages. And, other than advertising, where are calls to action very common? Software, of course. When you’re trying to tell a user to do something, a word that happens to be both the abstract concept and the imperative is quite useful.

Perhaps the most famous example of this these days is Facebook’s “Like”. Grammatically, what is it in English? An imperative? A noun describing an abstract action? Maybe a plain old noun, as in “chasing likes” (that’s a plural noun—English verbs don’t have a plural form!)? Answer: it’s all of them and more.

When translated to Hebrew in Facebook’s interface, it’s Ahávti, which literally means “I loved it”. Actually, this translation is mostly good, because it’s understandable, idiomatic, and colloquial enough without compromising correctness. Still, it’s a verb, which is not imperative, and it’s definitely not a noun, so you cannot use it in a sentence as if it was a noun. Indeed, Hebrew speakers are comfortable using this button, but when they speak and write about this feature, they just use its English name: “like” (in plural láykim). It even became a slightly awkward, but commonly used verb: lelaykék. Something similar happens in Russian.

It would be impossible in Hebrew and Russian to use the exact same word for the noun and the verb, especially in different persons and genders. Sometimes the languages are lucky enough to be able to adapt an English verb in a way that is more or less natural, but sometimes it’s weird, and hurts the user experience.

5. Word length

This one is relatively simple and not unique to English, but should be mentioned anyway: English words are neither very long nor very short. Examples of languages whose words are, on average, longer than English words are Finnish, Tamil, German, and occasionally Russian. Hebrew tends to be shorter, although sometimes a single English word has to be translated with several Hebrew words, so it can also get longer. This is true for pretty much any language, really.

In designing interfaces, especially for smaller screens, the length of the text is often important. If a button label is too long, it may overflow from the button, or be truncated, making the display ugly, or unusable, or both.

If you’re an English speaker, this probably won’t happen to you, because almost all software is designed with the word length of your language in mind. Other languages are almost always an afterthought.

Good practice for software engineers and designers is to make sure that translated strings can be longer. Their being shorter is rarely a problem, although sometimes a string is so short that the button becomes too small to click or tap conveniently.


Generally, what can you do about these privileges?

Whoever you are, remember it. If you know English, you are privileged: Software is designed more for you than for people who speak other languages.

If you are a software engineer or a designer, at the very least, make your software translatable. Try to stick to good internationalization practices and to standards like Unicode and CLDR. Write explanations for every translatable string in as much detail as possible. Listen to users’ and translators’ complaints patiently—they are not whining, they are trying to improve your software! The more internationalizable it is, the more robust it is for you as a developer, and for your English-speaking users too, because better design thinking will go into each of its components, and fewer problematic assumptions will be made.
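To make the “write explanations for every translatable string” advice a bit more concrete, here is a minimal Python sketch that writes message files in the MediaWiki-style i18n layout, where en.json holds the English strings and qqq.json holds the notes shown to translators. The message keys, wording, and file paths are invented for the example; treat it as a sketch of the idea rather than a recipe for any particular project.

    import json
    from pathlib import Path

    # English source strings. Whole sentences with placeholders ($1) translate
    # better than fragments glued together in code.
    messages_en = {
        "example-search-button": "Search",
        "example-results-found": "Found $1 {{PLURAL:$1|result|results}}",
    }

    # Documentation for translators: what the string is, where it appears,
    # and what each parameter means.
    messages_qqq = {
        "example-search-button": "Label of the button that starts a search. Keep it short.",
        "example-results-found": "Shown above the result list. $1 is the number of results.",
    }

    i18n_dir = Path("i18n")
    i18n_dir.mkdir(exist_ok=True)
    (i18n_dir / "en.json").write_text(
        json.dumps(messages_en, ensure_ascii=False, indent="\t") + "\n", encoding="utf-8")
    (i18n_dir / "qqq.json").write_text(
        json.dumps(messages_qqq, ensure_ascii=False, indent="\t") + "\n", encoding="utf-8")

Keeping the strings out of the code, and documenting them, is what lets translators handle languages with different plural rules, genders, and word orders without ever touching the source.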


Filed under: English, Hebrew, language, Russian, software, translation, Wikipedia

by aharoni at October 04, 2017 01:08 PM

Gerard Meijssen

#Wikimedia - A user story for libraries

The primary user story for libraries is something like: As a library we maintain a collection of publications so that the public may read them in the library or at home.

Whatever else is done, it is to serve this primary purpose. In the English Wikipedia, at the bottom of many author articles, you will find a reference to WorldCat. WorldCat is meant to entice people to come to their library.

It does not work for me.

My library is in Almere. I have stated in my WorldCat profile that I live in Almere, and I have indicated that my local library is my favourite. Yet WorldCat indicates that the Peace Palace Library is nearby. It isn't.

When it does not work for me, it does not work for other people reading Wikipedia articles, and consequently it needs to be fixed. So what does it take to fix WorldCat for the Netherlands, and for me? WorldCat serves a worldwide public, and all the libraries of the world may benefit when WorldCat gets some TLC.
Thanks,
     GerardM

by Gerard Meijssen (noreply@blogger.com) at October 04, 2017 11:36 AM

October 03, 2017

Wikimedia Cloud Services

New dedicated puppetmasters for cloud instances

Back in year zero of Wikimedia Labs, shockingly many services were confined to a single box. A server named 'virt0' hosted the Wikitech website, Keystone, Glance, LDAP, RabbitMQ, a puppetmaster, and a bunch of other things.

Even after the move from the Tampa data center to Ashburn, the model remained much the same, with a whole lot of different services crowded onto a single, overworked box. Since then we've been gradually splitting out important services onto their own systems -- it takes up a bit more rack space but has made debugging and management much more straightforward.

Today I've put the final finishing touches on one of the biggest break-away services to date: The puppetmaster that manages most cloud instances is no longer running on 'labcontrol1001'; instead the puppetmaster has its own two-server cluster which does puppet and nothing else. VMs have been using the new puppetmasters for a few weeks, but I've just now finally shut down the old service on labcontrol1001 and cleaned things up.

With luck, this new setup will gain us some or all of the following advantages:

  • fewer bad interactions between puppet and other cloud services: In particular, RabbitMQ (which manages most communication between OpenStack services) runs on labcontrol1001 and is very hungry for resources -- we're hoping it will be happier not competing with the puppetmaster for RAM.
  • improved puppetmaster scalability: The new puppetmaster has a simple load-balancer that allows puppet compilations to be farmed out to additional backends when needed.
  • less custom code: The new puppetmasters are managed with the same puppet classes that are used elsewhere in Wikimedia production.

Of course, many instances weren't using the puppetmaster on labcontrol1001 anyway; they use separate custom puppetmasters that run directly on cloud instances. In many ways this is better -- certainly the security model is simpler. It's likely that at some point we'll move ALL puppet hosting off of metal servers and into the cloud, at which point there will be yet another giant puppet migration. This last one went pretty well, though, so I'm much less worried about that move than I was before; and in the meantime we have a nice stable setup to keep things going.

by Andrew (Andrew Bogott) at October 03, 2017 11:23 PM

Wiki Education Foundation

Presentation at Howard University

Last month, I joined faculty at Howard University’s Center for Excellence in Teaching, Learning, and Assessment (CETLA) to encourage attendees to teach with Wikipedia. When I met Dr. Tracy Perkins at this year’s American Sociological Association’s annual meeting, I was thrilled when I learned she teaches at Howard and would organize a faculty workshop about Wikipedia assignments.

Howard University is consistently ranked highly among historically black colleges and universities (HBCUs), higher education institutions long known for championing and admitting students of all races. At Wiki Education, students attending HBCUs and participating in courses about racial diversity have added high quality information for Wikipedia’s readers. Students in classes or at institutions focusing on racial diversity often provide a new voice to Wikipedia, making important changes to articles’ meanings.

Take Dr. Fabian Neuner’s Black Lives and Deaths course at the University of Michigan this past spring 2017 term. Students made significant changes to articles about criminal stereotyping of African Americans, reparations for slavery, and the War on Drugs as it relates to race. Student editor User:Ujwalamurthy selected Wikipedia’s article about Kalief Browder, increasing the content in the article by a factor of eight—or 8,000 words! This is a significant expansion by any measure, but it’s outstanding for a first-time editor.

If you searched for information about Kalief Browder on March 3, 2017, you would learn the tragic tale of a young teenager who was arrested in New York and spent three years in prison without a conviction. Right there in the Wikipedia lead, you’d learn that his imprisonment and subsequent death have incited activists to demand criminal justice reform.

Kalief Browder’s article’s lead section in March 2017

But that doesn’t tell the whole story, and that’s where the power of Wikipedia classroom assignments can come in. See the student editor’s impact of adding just 26 words to the lead section:

Kalief Browder’s article’s lead section, with annotations, after a student editor’s Wikipedia assignment.

The student editor modified “young man” with “African American” in the first sentence, a crucial fact about Kalief Browder when you learn about the Black Lives Matter activism his case has ignited.

After the mention of his arrest, the student editor added that it was on suspicion of stealing a backpack. The fact that it was a minor crime likely leads readers to question immediately how an alleged backpack theft could lead to three years’ imprisonment without conviction — and again makes it more obvious why this case has generated activism.

The addition of another short phrase makes it even clearer: Kalief Browder spent the majority of his three years in prison in solitary confinement. Many psychologists argue that solitary confinement evokes such mental anguish in its victims that we should consider it cruel and unusual punishment, made illegal in the United States by the 8th Amendment. Highlighting this fact in the lead section of Kalief Browder’s Wikipedia article draws even more attention to why activists objected to his imprisonment.

Upon comparison, a single replaced word jumped out to me that signified the impact of these edits: the student editor changed one word in this opening paragraph to emphasize Kalief Browder, a sixteen-year-old boy who was imprisoned for three years, was imprisoned not without conviction but without trial. This word brings not only specificity into the article’s introduction, but it immediately points to why activists argue there was injustice in this case.

Lastly, the student brought the most harrowing piece of information from the body of the article to this summary: “Two years after his release, Browder committed suicide.”

Since this incredible change to Kalief Browder’s Wikipedia article in late March 2017, it has received 650,000 page views. In half a year, that’s 650,000 people who came to Wikipedia asking questions about Kalief Browder’s imprisonment and death and walked away with a more complete representation of the story. And when we’re talking about Wikipedia’s reach to the world’s curious citizens, let’s not forget the impact of search engines like Google. Pictured below are today’s Google results, where searchers can see the updated lead from Kalief Browder’s Wikipedia article right there in the Google Knowledge Graph, in the right-hand sidebar.

How many people use a search engine like Google when looking for information on the internet? How many activists? Policymakers? Young African Americans? Police officers?

Wikipedia is a neutral, fact-based encyclopedia, but it is only as good as the facts that are added to it. When an article like the biography of Kalief Browder doesn’t share all the facts, it can lead to an incomplete picture of reality. Kalief Browder’s legacy deserves wide readership of a complete narrative—and I couldn’t be more moved that one of our student editors in the Classroom Program has helped honor that legacy.

This is why we need more diverse voices and perspectives on Wikipedia, and I’m thrilled to support students at HBCUs like Howard University so they can learn how to add their voices to Wikipedia. Wiki Education is actively seeking more faculty who want to teach with Wikipedia, to provide a more complete description of their discipline. To join us, visit wikiedu.org/teach-with-wikipedia or email us at contact@wikiedu.org.

by Jami Mathewson at October 03, 2017 09:02 PM

Wikimedia Foundation

Knowing when to quit: What we’re doing to limit the number of fundraising banners you see on Wikipedia

Photo by Jorge Royan, CC BY-SA 3.0.

Every December, the Foundation’s online fundraising team conducts our most wide-reaching donor drive of the year, known internally as the Big English campaign. Coinciding with the end of the calendar year, considered the highest-value time period for charitable giving in the Western world, we present fundraising appeals on Wikipedia to readers in some of our largest English-speaking countries.

The Big English campaign is seen by millions of Wikipedia users, giving us an unparalleled opportunity to raise much needed revenue while educating readers about our non-profit status. But striking the right balance between education, persuasion, and saturation is a constant challenge.

Until December 2016, we throttled our fundraising banners on Wikipedia in two stages. For the first two weeks of a campaign, we would not cap the number of impressions shown to users. This meant that every reader would see a banner on every page visit to Wikipedia. After the two-week mark, we capped impressions at a maximum of 10 per device, having learned that the efficacy of our appeals consistently dropped over the course of a fundraising campaign.

Still, we didn’t have the hard data to determine exactly where our most effective impression cut-off was. Our compromise of 2 weeks at 100% effectively penalized power users of the site; for daily readers, every visit and every clickthrough meant another banner.

We are keenly attentive to any sign that our fundraising appeals are hurting traffic to the site. We know, from thousands of survey responses, that most Wikipedia readers don’t consider our fundraising content too intrusive or aggressive. But we also knew there had to be a better way to determine when to limit banners so that readers had a fair chance to contribute without experiencing burnout. And last December we found one.

Top: Donor impression counts in 2016. Bottom: Banner impressions per hour, 2015 vs. 2016. Graphs by the Wikimedia Foundation, CC BY-SA 4.0.

We crunched the data on last year’s banner impressions to determine which ones generated the largest number of donors (‘donor conversion’). Looking at the distribution across all devices, including mobile, tablet, and desktop/laptop computers, it became apparent that the first handful of banner impressions is where we converted the vast majority of donors, and the decline in conversion afterwards was undeniable.

Each subsequent banner impression after the first exhibits a rapid dropoff in conversion. After the tenth, the conversion rate is minuscule: only about 3–5% of donors donate after the 10th impression. With the data in hand, there was no need for debate; we would gladly trade that lost revenue to know that we aren’t burning out readers with ineffective and relentless fundraising messages.
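For the curious, the shape of such an analysis is simple enough to sketch. The following Python fragment is purely illustrative and uses made-up numbers, not the Foundation's data or pipeline: given, for each donation, the impression number at which it occurred, it reports what share of donations would arrive after a given cap.

    from collections import Counter

    # Hypothetical records: for each donation, the banner impression it came from.
    donation_impression_numbers = [1, 1, 2, 1, 3, 2, 5, 1, 12, 4, 1, 2, 7, 1, 3]

    counts = Counter(donation_impression_numbers)
    total = sum(counts.values())

    # Share of donations that would be lost under each candidate impression cap.
    for cap in (1, 3, 5, 10):
        after = sum(c for n, c in counts.items() if n > cap)
        print(f"after impression {cap}: {after / total:.1%} of donations")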

What’s more, we validated a central premise of our fundraising strategies: that our model depends upon a wide readership contributing small amounts of money to sustain Wikipedia and its related services. It would be both annoying and ineffective to run fundraising banners on Wikipedia constantly throughout the year. Our strategy of annual high traffic campaigns in a rotating lineup of countries ensures that our content stays fresh and engaging to readers, while offering readers around the world the opportunity to become a sustaining donor of the world’s independent encyclopedia.

Although we’ve reduced banner frequency, we’ve still been able to match or exceed revenue in our fundraising campaigns, compared to previous years. There is always room to improve our approach to fundraising. If you have a suggestion, please share it with us at fr-creative[at]wikimedia[dot]org.

Sam Patton, Senior Fundraising Campaign Manager, Online Fundraising
Wikimedia Foundation

Want to work on cool projects like this? See our current openings.

by Sam Patton at October 03, 2017 08:18 PM

Wiki Education Foundation

Student editors and Nobel Prizes

On Monday morning, news broke that the 2017 Nobel Prize in Physiology or Medicine had been awarded to Jeffrey C. Hall, Michael Rosbash, and Michael W. Young. Scientists are rarely well known by the public, and they had not been among the favorites to win the Nobel. Had you rushed to Wikipedia to find out who they were, you would have found well-written biographies that provided an overview of their academic careers and, more importantly, discussed the importance of their research. Had it not been for students in Wiki Education’s programs, however, you would have been disappointed.

Prior to Monday, the biographies of the three scientists each averaged about 10 views a day; on Monday, they jumped to more than 25,000 views in one day.

In Spring 2013, Washington University in St. Louis professor Erik Hertzog started integrating Wikipedia editing into his Chronobiology class. Students were asked to edit biographies of notable chronobiologists on Wikipedia as a class assignment. He’s repeated the assignment in three other classes, two more in spring 2015 and one in spring 2017. Through these assignments, Erik’s students have significantly expanded the information about chronobiology available on Wikipedia. That knowledge added by students became even more important yesterday when the Nobel Prize was awarded to three scientists whose biographies Erik’s students had improved.

Michael Rosbash’s biography was created by students in the Spring 2013 class. The students’ final version remained largely unchanged until the Nobel Prize was announced. Michael W. Young’s biography was just five sentences long when students in the Spring 2015 iteration of Erik Hertzog’s class started working on it, while Jeffrey C. Hall’s biography told of his achievements in four sentences. Now, all are fleshed out articles with detailed information about their research and careers.

When students write Wikipedia articles, they can make a substantial contribution to a body of information that’s accessible to the world. When a news story like this breaks and the Nobel Prize is awarded for something other than immunotherapy or CRISPR (the expected winners), that information is going to come from Wikipedia—if someone created an article about it—or it just isn’t going to be out there in a readily accessible format. And student editors are especially well-equipped to be that someone.

To get started with a Wikipedia assignment in your class, reach out to us at contact@wikiedu.org.

by Ian Ramjohn at October 03, 2017 05:19 PM

October 02, 2017

Wikimedia Foundation

The future of offline access to Wikipedia: The Kiwix example

Photo by Dietmar Rabich, CC BY-SA 4.0.

Senior Program Manager Anne Gomez leads the New Readers initiative, where she works on ways to better understand barriers that prevent people around the world from accessing information online. One of her areas of interest is offline access, as she works with the New Readers team to improve the way people who have limited or infrequent access to the Internet can access free and open knowledge.

Over the coming months, Anne will be interviewing people who work to remove access barriers for people across the world. In her first conversation for the Wikipedia Blog, Anne chats with Emmanuel Engelhart (aka “Kelson”), a developer who works on Kiwix, open source software that allows users to download web content for offline reading. In the eleven years since it was created, a number of organizations have put it to use, including World Possible and Internet in a Box. Still, it’s perhaps best known for its distribution of entire copies of Wikipedia in areas of low bandwidth, like Cuba.

As we noted in a 2014 profile of Kiwix, the software “uses all of Wikipedia’s content through the Parsoid wiki parser to package articles into an open source .zim file that can be read by the special Kiwix browser. Since Kiwix was released in 2007, dozens of languages of Wikipedia have been made available as .zim files, as has other free content, such as Wikisource, Wiktionary and Wikivoyage.”

In addition to Wikimedia content, Kiwix now contains TED talks, the Stack Exchange websites, all of Project Gutenberg, and many YouTube educational channels. Anne and Emmanuel chatted about how video and smart phones are changing the offline landscape—and where Kiwix plans to go from here.

———

Anne Gomez: A lot has changed in a decade. What can you do now that wasn’t possible when you started Kiwix?

Engelhart: A lot has changed, indeed. Around us, a lot more people now have broadband access, but 4 billion remain unconnected. At the same time, Internet censorship has increased. That’s not something we’d expected, and it forces us to constantly rethink offline access. On the Kiwix side, the technology has changed a lot and the project has become a lot stronger.

We now have a small and very motivated team of volunteers with a huge array of skills. Our budget, while still ridiculously low, has also increased and allows us to pay for services that are sorely needed to grow in scale.

Ten years ago, the dream was to create a technology to bring Wikipedia to people without Internet access. And we succeeded. But there still are too many folks out there who don’t know about the technology or can’t access it. Our next Big Dream, therefore, is to consolidate our solutions and be more efficient in bringing them to people who really need it.

———

What’s been the biggest surprise for you over the years?

I don’t know if I have a “best surprise ever” to tell… but I’m often impressed by the ingenuity and the resilience of our users. I think in particular about these people who travel, often in really precarious conditions, from school to school to install Wikipedia offline.

Another really dominant feeling I have is my gratefulness to the volunteers who make the project so lively. For the past 10 years, and now more than ever, they have joined and done what needed to be done so that free knowledge is available to all.

———

Smartphones have transformed the way people can access the internet, and recently you started building packaged apps for Wikimed. How has this changed the landscape and the way you view offline access? How do you see these devices impacting the future of educational resources?

In general, I have mixed feelings about the smartphone/tablet ecosystem: On the one hand, it has done a lot to make computers and internet access more affordable to people. It has also allowed for new kinds of software and features. And that’s good.

On the other hand, most ecosystems are closed or proprietary, making software development pretty expensive. They also tend to treat users as consumers and encourage that mindset. I see it as a real issue, in particular for collaborative and participatory movements like Wikimedia.

Most of our audience at Kiwix does not own a computer at all, and probably never will; our priority therefore is to have a great mobile-friendly portfolio. That’s why we spent the last two years developing dedicated apps for Android. These package Kiwix with topic-specific content (e.g. Medicine or Travel, but soon also History, Geography, or Movies). Wikimed has been a huge success, and showed us the way forward. The big lesson for us has been that users look for easily actionable content rather than super powerful technologies. When it comes to offline, the size of the content does matter, as you don’t want to download something you don’t need.

By making learning resources and tools available at any time and in (almost) every corner of the world, mobile devices have definitely helped people win a bit of freedom. That said, the software engineering challenges are still pretty big, and a lot of resources are still needed to make sure this paradigm shift will benefit everyone.

———

What hasn’t changed?

To be honest, we really would love for a technology like Kiwix to someday become obsolete. But unfortunately, this is not going to happen anytime soon. In some cases, it might even get worse. We are concerned that censorship will soon become the #1 problem for those who want to access free knowledge.

———

Video makes up over half of global bandwidth, and from the New Readers research we know that lots of people prefer to learn by video, but it’s expensive to store for offline. How are you thinking about video and other media?

I tend to think that the pedagogical value of videos is overrated. Being lazy myself, I might also prefer watching a video to using other means of learning. That does not mean it is the most appropriate way.

That said, there are lots of legitimate uses for video, and in general we try to stay away from editorial discussions: we only want to focus on building the best technology. And the ZIM format that Kiwix relies on is content-agnostic anyway: this means that you can use it to store whatever content you like. We are actually already distributing dozens of offline files with videos embedded in them.

But of course the reader needs to be able to display these videos efficiently. So far, Kiwix does it, but it could be better… This is something we have been working on and will keep working on in the near future. Hopefully our effort on this will be over next year when we release a new version of Kiwix for Windows/Linux.

———

Kiwix supports more than just Wikipedia—how do you think about what content packs to include?

We always look for content that is free (as in free speech). Most of the time, ideas come as feature requests from our users and partners.

———

Kiwix is foundational to a number of other offline educational projects (IiaB, RACHEL, etc.). How do you balance supporting end users and reusers?

We try to support both as much as we can, but we consider integration projects like the ones you mention, as well as those of other deployment partners, to be the key to reaching a broader audience. They therefore get priority because of the scaling effect they give us in terms of distribution.

———

What resources exist for people who want to know more?

We are a software project, so most of the activity is visible on our code forge (GitHub) at:

We also have chat channels on Freenode IRC #Kiwix (http://chat.kiwix.org) and on Slack #kiwixoffline (http://kiwixoffline.slack.com).

People can also always send us an email, if only to say hello, at contact[at]kiwix[dot]org

Anne Gomez, Senior Program Manager, Program Management
Wikimedia Foundation

Thanks to Melody Kramer for writing the introduction to this piece.

by Anne Gomez at October 02, 2017 06:22 PM

Wiki Education Foundation

Roundup: Hearing Conservation

October is National Protect Your Hearing Month and National Audiology Awareness Month, a month when audiologists and organizations like the National Institute on Deafness and Other Communication Disorders (NIDCD) would like people to take time out of their day to learn about audiology and how they can prevent noise-related hearing loss in themselves and others. According to the NIDCD, this type of hearing loss is completely preventable; prevention can be as simple as lowering the volume while using electronic devices and wearing hearing protectors while you are in a noisy area.

Earlier this year University of Nebraska students from Emily Wakefield’s spring class edited articles on Hearing Conservation, making their contributions an appropriate topic for this month. The class’s largest contribution was to the article for earmuffs. While the term likely brings to mind images of something warm and fluffy to wear on your head once it starts turning cold, this word is also used to refer to earmuffs intended to protect one’s hearing. Another article that the students edited was earplug — another item that can be used to help prevent hearing loss, albeit by placing the plugs inside your ears as opposed to the external protection provided by earmuffs. Students added information on musicians’ earplugs, earplugs that help maintain the ear’s natural frequency response and can be worn in the studio and during a concert. As these are designed to protect the user from overexposure to loud music, they will not properly protect users from very high noise levels. Students also made heavy edits to the article for hearing conservation programs, adding information on employee training and education. For example, educating employees on the risks of high noise levels is not enough; employers must also motivate their workers to use proper precautions.

Wikipedia has a wealth of knowledge, but the site cannot grow without users contributing information and correcting errors. Editing is a wonderful way to teach your students about technical writing, collaboration, and sourcing in a unique learning environment. If you are interested in using Wikipedia with your next class, please contact Wiki Education at contact@wikiedu.org to find out how you can gain access to tools, online trainings, and printed materials.

Image: 2017-03-22 40 Jahre Kulturzentrum Pavillon Hannover (188), by Bernd Schwabe, CC BY-SA 4.0, via Wikimedia Commons.

by Shalor Toncray at October 02, 2017 03:28 PM

Gerard Meijssen

#Wikipedia - A user story for WikipediaXL: an end to the Cebuano issue

The user story for #Wikimedia is something like: As a Wikimedia community we share the sum of all knowledge so that all people have this available to them. 

As an achievable objective it sucks. The sum of all knowledge is not available to us either. To reflect this, the following is more realistic: As a Wikimedia community we share the sum of all knowledge available to us so that all people have this available to them.

When all people are to be served with the sum of all knowledge that is available to us, it is obvious that what we do serve depends very much on the language people are seeking knowledge in. What we offer is whatever a Wikipedia holds and this is often not nearly enough.

To counter the lack of information, bots add articles on subjects like "all the lakes in Finland". This information is not really helpful for people living in the Philippines, but it does add to the sum of available information in Cebuano.

The process is as follows: an external database is selected. A script is created to build text and an infobox for each item in the database. This text is saved as an article in the Wikipedia. From the article, information is harvested and included in Wikidata. One issue is that when the data is not "good enough", subsequent changes in Wikidata are not reflected in the Wikipedia article.

Turning the process around makes a key difference. An external database is selected. Selected data is merged into Wikidata. This data is used to generate article texts that are only cached, in all languages that have an applicable script. As the quality of the data in Wikidata improves, the cached articles improve.
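To make the turned-around process a little more concrete, here is a hypothetical Python sketch that fetches an item from the public Wikidata API (wbgetentities) and generates a one-sentence stub from its label and description in the requested language, falling back to English. A real implementation would render full text and an infobox from the item's statements, per language, and cache the result; the wording and the choice of Q42 are just examples.

    import json
    import urllib.parse
    import urllib.request

    def generate_stub(item_id: str, lang: str) -> str:
        """Build a one-sentence stub for a Wikidata item in the given language."""
        params = urllib.parse.urlencode({
            "action": "wbgetentities",
            "ids": item_id,
            "props": "labels|descriptions",
            "languages": f"{lang}|en",
            "format": "json",
        })
        request = urllib.request.Request(
            "https://www.wikidata.org/w/api.php?" + params,
            headers={"User-Agent": "wikipediaxl-sketch/0.1 (example)"},
        )
        with urllib.request.urlopen(request) as response:
            entity = json.load(response)["entities"][item_id]

        labels = entity.get("labels", {})
        descriptions = entity.get("descriptions", {})
        label = (labels.get(lang) or labels.get("en") or {}).get("value", item_id)
        description = (descriptions.get(lang) or descriptions.get("en") or {}).get("value", "")
        # A real generator would build sentences from statements with a
        # per-language grammar, not from a single template.
        return f"{label} is {description}." if description else label

    print(generate_stub("Q42", "en"))

The important property is that nothing is saved as wikitext: when the data in Wikidata changes, the cached text can simply be regenerated.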

With Wikipedia extended in this way, WikipediaXL, we become more adept at sharing the sum of our available knowledge. With caching enabled in this way, any language may benefit from all the data in Wikidata. It remains important to consider the quality of new data. Data may come from a reputable source or from a source we collaborate with on the maintenance of the data. Which is to be preferred is a topic for another blog post.

by Gerard Meijssen (noreply@blogger.com) at October 02, 2017 12:14 PM

Amir E. Aharoni

Amir Aharoni’s Quasi-Pro Tips for Translating the Software That Powers Wikipedia

As you probably already know, Wikipedia is a website. A website has content—the articles; and it has a user interface—the menus around the articles and the various screens that let editors edit the articles and communicate with each other.

Another thing that you probably already know is that Wikipedia is massively multilingual, so both the content and the user interface must be translated.

Translation of articles is a topic for another post. This post is about getting all of the user interface translated to your language, as quickly and efficiently as possible.

The most important piece of software that powers Wikipedia and its sister projects is called MediaWiki. As of today, there are 3,335 messages to translate in MediaWiki, and the number grows frequently. “Messages” in the MediaWiki jargon are strings that are shown in the user interface, and that can be translated. In addition to core MediaWiki, Wikipedia also has dozens of MediaWiki extensions installed, some of them very important—extensions for displaying citations and mathematical formulas, uploading files, receiving notifications, mobile browsing, different editing environments, etc. There are around 3,500 messages to translate in the main extensions, and over 10,000 messages to translate if you want to have all the extensions translated. There are also the Wikipedia mobile apps and additional tools for making automated edits (bots) and monitoring vandalism, with several hundred messages each.

Translating all of it probably sounds like an enormous job, and yes, it takes time, but it’s doable.

In February 2011 or so—sorry, I don’t remember the exact date—I completed the translation into Hebrew of all of the messages that are needed for Wikipedia and projects related to it. All. The total, complete, no-excuses, premium Wikipedia experience, in Hebrew. Every single part of the MediaWiki software, extensions and additional tools was translated to Hebrew, and if you were a Hebrew speaker, you didn’t need to know a single English word to use it.

I wasn’t the only one who did this of course. There were plenty of other people who did this before I joined the effort, and plenty of others who helped along the way: Rotem Dan, Ofra Hod, Yaron Shahrabani, Rotem Liss, Or Shapiro, Shani Evenshtein, Inkbug (whose real name I don’t know), and many others. But back then in 2011 it was I who made a conscious effort to get to 100%. It took me quite a few weeks, but I made it.

Of course, the software that powers Wikipedia changes every single day. So the day after the translations statistics got to 100%, they went down to 99%, because new messages to translate were added. But there were just a few of them, and it took me a few minutes to translate them and get back to 100%.

I’ve been doing this almost every day since then, keeping Hebrew at 100%. Sometimes it slips because I am traveling or I am ill. It slipped for quite a few months because in late 2014 I became a father, and a lot of new messages happened to be added at the same time, but Hebrew is back at 100% now. And I keep doing this.

With the sincere hope that this will be useful for translating the software behind Wikipedia to your language, let me tell you how.

Preparation

First, let’s do some work to set you up.

  • Get a translatewiki.net account if you haven’t already.
  • Make sure you know your language code.
  • Go to your preferences, to the Editing tab, and add languages that you know to Assistant languages. For example, if you speak one of the native languages of South America like Aymara (ay) or Quechua (qu), then you probably also know Spanish (es) or Portuguese (pt), and if you speak one of the languages of the former Soviet Union like Tatar (tt) or Azerbaijani (az), then you probably also know Russian (ru). When available, translations to these languages will be shown in addition to English.
  • Familiarize yourself with the Support page and with the general localization guidelines for MediaWiki.
  • Add yourself to the portal for your language. The page name is Portal:Xyz, where Xyz is your language code.

Priorities, part 1

The translatewiki.net website hosts many projects to translate beyond stuff related to Wikipedia. It hosts such respectable Free Software projects as OpenStreetMap, Etherpad, MathJax, Blockly, and others. Also, not all the MediaWiki extensions are used on Wikimedia projects; there are plenty of extensions, with thousands of translatable messages, that are not used by Wikimedia but only on other sites, and they too use translatewiki.net as the platform for translating their user interface.

It would be nice to translate all of it, but because I don’t have time for that, I have to prioritize.

On my translatewiki.net user page I have a list of direct links to the translation interface of the projects that are the most important:

  • Core MediaWiki: the heart of it all
  • Extensions used by Wikimedia: the extensions on Wikipedia and related sites
  • MediaWiki Action API: the documentation of the API functions, mostly interesting to developers who build tools around Wikimedia projects
  • Wikipedia Android app
  • Wikipedia iOS app
  • Installer: MediaWiki’s installer, not used in Wikipedia because MediaWiki is already installed there, but useful for people who install their own instances of MediaWiki, in particular new developers
  • Intuition: a set of different tools, like edit counters, statistics collectors, etc.
  • Pywikibot: a library for writing bots—scripts that make useful automatic edits to MediaWiki sites.

I usually don’t work on translating other projects unless all of the above projects are 100% translated to Hebrew. I occasionally make an exception for OpenStreetMap or Etherpad, but only if there’s little to translate there and the untranslated MediaWiki-related projects are not very important.

Priorities, part 2

So how can you know what is important among more than 15,000 messages from the Wikimedia universe?

Start from MediaWiki most important messages. If your language is not at 100% in this list, it absolutely must be. This list is automatically created periodically by counting which 600 or so messages are actually shown most frequently to Wikipedia users. This list includes messages from MediaWiki core and a bunch of extensions, so when you’re done with it, you’ll see that the statistics for several groups improved by themselves.

Now, if the translation of MediaWiki core to your language is not yet at 18%, get it there. Why 18%? Because that’s the threshold for exporting your language to the source code. This is essential for making it possible to use your language in your Wikipedia (or Incubator). It will be quite easy to find short and simple messages to translate (of course, you still have to do it carefully and correctly).

Getting Things Done, One by One

Once you have the most important MediaWiki messages at 100% and at least 18% of MediaWiki core translated to your language, where do you go next?

I have surprising advice.

You need to get everything to 100% eventually. There are several ways to get there. Your mileage may vary, but I’m going to suggest the way that worked for me: Complete the easiest piece that will get your language closer to 100%! For me this is an easy way to strike an item off my list and feel that I accomplished something.

But still, there are so many items at which you could start looking! So here’s my selection of components that are more user-visible and less technical, sorted not by importance, but by the number of messages to translate:

  • Cite: the extension that displays footnotes on Wikipedia
  • Babel: the extension that displays boxes on userpages with information about the languages that the user knows
  • Math: the extension that displays math formulas in articles
  • Thanks: the extension for sending “thank you” messages to other editors
  • Universal Language Selector: the extension that lets people select the language they need from a long list of languages (disclaimer: I am one of its developers)
    • jquery.uls: an internal component of Universal Language Selector that has to be translated separately for technical reasons
  • Wikibase Client: the part of Wikidata that appears on Wikipedia, mostly for handling interlanguage links
  • VisualEditor: the extension that allows Wikipedia articles to be edited in a WYSIWYG style
  • ProofreadPage: the extension that makes it easy to digitize PDF and DjVu files on Wikisource
  • Wikibase Lib: additional messages for Wikidata
  • Echo: the extension that shows notifications about messages and events (the red numbers at the top of Wikipedia)
  • MobileFrontend: the extension that adapts MediaWiki to mobile phones
  • WikiEditor: the toolbar for the classic wiki syntax editor
  • ContentTranslation: the extension that helps translate articles between languages (disclaimer: I am one of its developers)
  • Wikipedia Android mobile app
  • Wikipedia iOS mobile app
  • UploadWizard: the extension that helps people upload files to Wikimedia Commons comfortably
  • Flow: the extension that is starting to make talk pages more comfortable to use
  • Wikibase Repo: the extension that powers the Wikidata website
  • Translate: the extension that powers translatewiki.net itself (disclaimer: I am one of its developers)
  • MediaWiki core: the base MediaWiki software itself!

I put MediaWiki core last intentionally. It’s a very large message group, with over 3000 messages. It’s hard to get it completed quickly, and to be honest, some of its features are not seen very frequently by users who aren’t site administrators or very advanced editors. By all means, do complete it, try to do it as early as possible, and get your friends to help you, but it’s also OK if it takes some time.

Getting All Things Done

OK, so if you translate all the items above, you’ll make Wikipedia in your language mostly usable for most readers and editors.

But let’s go further.

Let’s go further not just for the sake of seeing pure 100% in the statistics everywhere. There’s more.

As I wrote above, the software changes every single day. So do the translatable messages. You need to get your language to 100% not just once; you need to keep doing it continuously.

Once you make the effort of getting to 100%, it will be much easier to keep it there. This means translating some things that are used rarely (but used nevertheless; otherwise they’d be removed). This means investing a few more days or weeks into translating-translating-translating.

You’ll be able to congratulate yourself not only upon the big accomplishment of getting everything to 100%, but also upon the accomplishments along the way.

One strategy to accomplish this is translating extension by extension. This means, going to your translatewiki.net language statistics: here’s an example with Albanian, but choose your own language. Click “expand” on MediaWiki, then again “expand” on “MediaWiki Extensions”, then on “Extensions used by Wikimedia” and finally, on “Extensions used by Wikimedia – Main”. Similarly to what I described above, find the smaller extensions first and translate them. Once you’re done with all the Main extensions, do all the extensions used by Wikimedia. (Going to all extensions, beyond Extensions used by Wikimedia, helps users of these extensions, but doesn’t help Wikipedia very much.) This strategy can work well if you have several people translating to your language, because it’s easy to divide work by topic.

Another strategy is quiet and friendly competition with other languages. Open the statistics for Extensions Used by Wikimedia – Main and sort the table by the “Completion” column. Find your language. Now translate as many messages as needed to pass the language above you in the list. Then translate as many messages as needed to pass the next language above you in the list. Repeat until you get to 100%.

For example, here’s an excerpt from the statistics for today:

MediaWiki translation stats example

Let’s say that you are translating to Malay. You only need to translate eight messages to go up a notch (901 – 894 + 1). Then six messages more to go up another notch (894 – 888). And so on.

Once you’re done, you will have translated over 3,400 messages, but it’s much easier to do it in small steps.

Once you get to 100% in the main extensions, do the same with all the Extensions Used by Wikimedia. It’s over 10,000 messages, but the same strategies work.

Good Stuff to Do Along the Way

Never assume that the English message is perfect. Never. Do what you can to improve the English messages.

Developers are people just like you are. They may know their code very well, but they may not be the most brilliant writers. And though some messages are written by professional user experience designers, many are written by the developers themselves. Developers are developers; they are not necessarily very good writers or designers, and the messages that they write in English may not be perfect. Keep in mind that many, many MediaWiki developers are not native English speakers; a lot of them are from Russia, the Netherlands, India, Spain, Germany, Norway, China, France, and many other countries, and English is foreign to them, and they may make mistakes.

So report problems with the English messages to the translatewiki Support page. (Use the opportunity to help other translators who are asking questions there, if you can.)

Another good thing is to do your best to try running the software that you are translating. If there are thousands of messages that are not translated to your language, then chances are that it’s already deployed in Wikipedia and you can try it. Actually trying to use it will help you translate it better.

Whenever relevant, fix the documentation displayed near the translation area. Strange as it may sound, it is possible that you understand the message better than the developer who wrote it!

Before translating a component, review the messages that were already translated. To do this, click the “All” tab at the top of the translation area. It’s useful for learning the current terminology, and you can also improve them and make them more consistent.

After you gain some experience, create a localization guide in your language. There are very few of them at the moment, and there should be more. Here’s the localization guide for French, for example. Create your own with the title “Localisation guidelines/xyz” where “xyz” is your language code.

As in Wikipedia, Be Bold.

OK, So I Got to 100%, What Now?

Well done and congratulations.

Now check the statistics for your language every day. I can’t emphasize enough how important it is to do this every day.

The way I do this is by having a list of links on my translatewiki.net user page. I click them every day, and if there’s anything new to translate, I immediately translate it. Usually there is just a small number of new messages to translate; I didn’t measure precisely, but usually it’s fewer than 20. Quite often you won’t have to translate from scratch, but only to update the translation of a message that changed in English, which is usually even faster.

But what if you suddenly see 200 new messages to translate? It happens occasionally. Maybe several times a year, when a major new feature is added or an existing feature is changed.

Basically, handle it the same way you got to 100% before: step by step, part by part, day by day, week by week, notch by notch, and get back to 100%.

But you can also try to anticipate it. Follow the discussions about new features, check out new extensions that appear before they are added to the Extensions Used by Wikimedia group, and consider translating them when you have a few spare minutes. In the worst case, they will never be used by Wikimedia, but they may be used by somebody else who speaks your language, and your translations will definitely feed the translation memory database that helps you and other people translate more efficiently and easily.

Consider also translating other useful projects: OpenStreetMap, Etherpad, Blockly, Encyclopedia of Life, etc. Up to you. The same techniques apply everywhere.

What Do I Get for Doing All This Work?

The knowledge that, thanks to you, people who read in your language can use Wikipedia without having to learn English. Awesome, isn’t it? Some people call it “Good karma”.

Oh, and enormous experience with software localization, which is a rather useful job skill these days.

Is There Any Other Way in Which I Can Help?

Yes!

If you find this post useful, please translate it to other languages and publish it in your blog. No copyright restrictions, public domain (but it would be nice if you credit me and send me a link to your translation). Make any adaptations you need for your language. It took me years of experience to learn all of this, and it took me about four hours to write it. Translating it will take you much less than four hours, and it will help people be more efficient translators.

Versions of this post were already published in the following languages:

I’m deeply grateful to all the people who made these translations; keep them coming!


Filed under: Free Software, localization, Wikipedia

by aharoni at October 02, 2017 09:47 AM

Tech News

Tech News issue #40, 2017 (October 2, 2017)

2017, week 40 (Monday 02 October 2017)
Other languages:
العربية • ‎čeština • ‎English • ‎español • ‎suomi • ‎français • ‎עברית • ‎हिन्दी • ‎italiano • ‎日本語 • ‎ಕನ್ನಡ • ‎polski • ‎português do Brasil • ‎русский • ‎svenska • ‎українська • ‎中文

October 02, 2017 12:00 AM

September 30, 2017

Gerard Meijssen

#Wikipedia - #Wikidata user stories

User stories are important. They indicate why a certain functionality exists or the purpose of a project. A "user story" has a fixed format:
As a <insert a role> I would like to <insert an activity> so that I <insert a purpose>.
One user story is: As a Wikipedia editor, I can link an article to articles in other language(s) so that a Wikipedia reader can find an article in a language he or she can read.

Another user story:  As a Wikidata editor, I can maintain statements on Wikidata items so that Wikipedia readers always have the latest information available to them.

The first user story has been a resounding success. It is why Wikidata was relevant from the start. The second is very much a work in progress, and its success depends very much on how the current state of affairs is evaluated. There are dependencies for the efforts of so many to have an effect:
  • Readers of a Wikipedia can only see the result when the information has been included in Wikidata
  • Wikipedia readers will only see the result when the editors of their Wikipedia allow them to see it
The first dependency is with Wikidata editors, but the second is outside of their influence. For this reason it makes sense to formulate a different user story: As a Wikidata editor I can maintain statements on Wikidata items so that Wikipedia editors can take the responsibility to inform their public.

To help these Wikipedia gatekeepers, there is a need for tools that make them aware of the information they do not provide.
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at September 30, 2017 03:27 PM

Jeroen De Dauw

Why Every Single Argument of Dan North is Wrong

Alternative title: Dan North, the Straw Man That Put His Head in His Ass.

This blog post is a reply to Dan’s presentation Why Every Element of SOLID is Wrong. It is crammed full of straw man argumentation in which he misinterprets what the SOLID principles are about. After refuting each principle he proposes an alternative, typically a well-accepted non-SOLID principle that does not contradict SOLID. If you are not that familiar with the SOLID principles and cannot spot the bullshit in his presentation, this blog post is for you. The same goes if you enjoy bullshit being pointed out and broken down.

What follows are screenshots of select slides with comments on them underneath.

Dan starts by asking “What is a single responsibility anyway?”. Perhaps he should have figured that out before giving a presentation about how it is wrong.

A short (non-comprehensive) description of the principle: systems change for various different reasons. Perhaps a database expert changes the database schema for performance reasons, perhaps a User Interface person is reorganizing the layout of a web page, perhaps a developer changes business logic. What the Single Responsibility Principle says is that ideally, changes for such disparate reasons do not affect the same code. If they did, different people would get in each other’s way. Possibly worse still, if the concerns are mixed together and you want to change some UI code, you suddenly need to deal with, and thus understand, the business logic and database code.
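To illustrate the kind of separation being described, here is a small hypothetical Python sketch (not from Dan's slides or from any particular codebase) in which storage, business logic, and presentation live in separate classes, so a schema change, a tax-rule change, and a layout change each touch different code.

    class InvoiceRepository:
        """Persistence concern: only this class knows how invoices are stored."""

        def __init__(self):
            self._rows = {}

        def save(self, invoice_id, amount):
            self._rows[invoice_id] = amount

        def load(self, invoice_id):
            return self._rows[invoice_id]


    class InvoiceCalculator:
        """Business concern: only this class knows the tax rule."""

        TAX_RATE = 0.21

        def total_with_tax(self, amount):
            return round(amount * (1 + self.TAX_RATE), 2)


    class InvoiceView:
        """Presentation concern: only this class knows how to format output."""

        def render(self, invoice_id, total):
            return f"Invoice {invoice_id}: {total:.2f} EUR"


    repo = InvoiceRepository()
    repo.save("2017-001", 100.0)
    total = InvoiceCalculator().total_with_tax(repo.load("2017-001"))
    print(InvoiceView().render("2017-001", total))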

How can we predict what is going to change? Clearly you can’t, and you don’t need to: prediction is simply not required to follow the Single Responsibility Principle or to get value out of it.

Write simple code… no shit. One of the best ways to write simple code is to separate concerns. You can be needlessly vague about it and simply state “write simple code”. I’m going to label this Dan North’s Pointlessly Vague Principle. Congratulations sir.

The idea behind the Open Closed Principle is not that complicated. To partially quote the first line on the Wikipedia Page (my emphasis):

… such an entity can allow its behaviour to be extended without modifying its source code.

In other words, when you ADD behavior, you should not have to change existing code. This is very nice, since you can add new functionality without having to rewrite old code. Contrast this to shotgun surgery, where to make an addition, you need to modify existing code at various places in the codebase.
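As an illustration only (the exporter and formatter names are my own invention, not anything from the presentation), here is a small Python sketch of what being open for extension looks like: a new output format is added by writing a new class, and the existing export code never changes.

from abc import ABC, abstractmethod
import json

class Formatter(ABC):
    @abstractmethod
    def format(self, rows):
        ...

class CsvFormatter(Formatter):
    def format(self, rows):
        return "\n".join(",".join(str(value) for value in row) for row in rows)

class JsonFormatter(Formatter):           # added later; no existing code was touched
    def format(self, rows):
        return json.dumps(rows)

def export(rows, formatter):
    # Closed for modification: this function stays the same no matter how many
    # formats are added.
    return formatter.format(rows)

print(export([[1, 2], [3, 4]], CsvFormatter()))
print(export([[1, 2], [3, 4]], JsonFormatter()))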

In practice, you cannot achieve full adherence to this principle, and you will have places where you need to modify existing code. Full adherence is not the point. Like all engineering principles, it is a guideline that lives in a complex world of trade-offs. Knowing these guidelines is very useful.

Clearly it’s a bad idea to leave code in place that is wrong after a requirement change. That’s not what this principle is about.

Another very informative “simple code is a good thing” slide.

To be honest, I’m not entirely sure what Dan is getting at with his “is-a, has-a” vs “acts-like-a, can-be-used-as-a”. It does make me think of the Interface Segregation Principle, which, coincidentally, is the next principle he misinterprets.

The remainder of this slide is about the “favor composition over inheritance” principle. This is really good advice, which has been well accepted in professional circles for a long time. This principle is about code sharing, which is generally better done via composition than inheritance (the latter creates very strong coupling). In the last big application I wrote there are several hundred classes, and less than a handful inherit concrete code. Inheritance has a use which is completely different from code reuse: sub-typing and polymorphism. I won’t go into detail about those here, and will just say that this is at the core of what Object Orientation is about, and that even in the application I mentioned, this is used all over, making the Liskov Substitution Principle very relevant.
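A small, purely illustrative Python sketch of that distinction (none of these classes come from the application I mentioned): behaviour is reused through composition, while the abstract type exists only so that callers can substitute one implementation for another.

from abc import ABC, abstractmethod

class Retryer:                            # reusable behaviour, shared via composition
    def run(self, action, attempts=3):
        for attempt in range(attempts):
            try:
                return action()
            except OSError:
                if attempt == attempts - 1:
                    raise

class Notifier(ABC):                      # abstract type, exists for substitutability
    @abstractmethod
    def send(self, message):
        ...

class EmailNotifier(Notifier):
    def __init__(self):
        self._retryer = Retryer()         # has-a, not is-a
    def send(self, message):
        self._retryer.run(lambda: print("email: " + message))

def alert(notifier, message):
    # Liskov substitution: any Notifier implementation can stand in here.
    notifier.send(message)

alert(EmailNotifier(), "disk almost full")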

Here Dan is slamming the principle for being too obvious? Really?

“Design small, role-based classes”. Here Dan changed “interfaces” into “classes”, which results in a line that makes me think of the Single Responsibility Principle. More importantly, there is a misunderstanding about the meaning of the word “interface” here. This principle is about the abstract concept of an interface, not the language construct that you find in some programming languages such as Java and PHP. A class forms an interface. This principle applies to OO languages that do not have an interface keyword, such as Python, and even to those that do not have a class keyword, such as Lua.

If you follow the Interface Segregation Principle and create interfaces designed for specific clients, it becomes much easier to construct or invoke those clients. You won’t have to provide additional dependencies that your client does not actually care about. In addition, if you change something in those extra dependencies, you know this client will not be affected.
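To sketch what that looks like in a language without an interface keyword, here is a hypothetical Python example (the wiki-flavoured names are mine, chosen only for illustration): each client declares only the role it actually needs, and one class can happen to satisfy both roles.

from typing import Protocol

class RevisionReader(Protocol):           # the role the rendering code needs
    def latest_text(self, page: str) -> str: ...

class RevisionWriter(Protocol):           # the role the editing code needs
    def save(self, page: str, text: str) -> None: ...

class WikiStore:                          # one class playing both roles
    def __init__(self):
        self._pages = {}
    def latest_text(self, page):
        return self._pages.get(page, "")
    def save(self, page, text):
        self._pages[page] = text

def render(reader: RevisionReader, page: str) -> str:
    # This client never sees save(), so changes to the writing side cannot affect it.
    return "<article>" + reader.latest_text(page) + "</article>"

store = WikiStore()
store.save("Main", "Hello")
print(render(store, "Main"))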

This is a bit bizarre. The definition Dan provides is good enough, even though it is incomplete, which can be excused by it being a slide. From the slide it’s clear that the Dependency Inversion Principle is about dependencies (who would have guessed) and coupling. The next slide is about how reuse is overrated. As we’ve already established, this is not what the principle is about.

As to the Dependency Inversion Principle leading to DI frameworks that you then depend on… this is like saying that if you eat food, you might eat non-nutritious food such as sand, which is not healthy. The fix is not to reject food altogether, it is to not eat food that is non-nutritious. Remember the application I mentioned? It uses dependency injection all the way, without using any framework or magic. In fact, 95% of the code does not bind to the web-framework used due to adherence to the Dependency Inversion Principle. (Read more about this application)
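For the avoidance of doubt, dependency injection without any framework can be as plain as the following hypothetical Python sketch (it is not code from the application I mentioned): the high-level class depends on an abstraction, and the concrete implementation is handed to it in one place, by hand.

from abc import ABC, abstractmethod

class MessageSender(ABC):                 # abstraction owned by the high-level code
    @abstractmethod
    def send(self, recipient, body):
        ...

class SmtpSender(MessageSender):          # low-level detail
    def send(self, recipient, body):
        print("SMTP to %s: %s" % (recipient, body))

class WelcomeService:                     # high-level policy
    def __init__(self, sender):           # depends only on the abstraction
        self._sender = sender
    def welcome(self, user):
        self._sender.send(user, "Welcome aboard!")

# The wiring happens in one place, with no container or magic involved.
WelcomeService(SmtpSender()).welcome("alice@example.org")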

That attitude explains a lot about the preceding slides.

Yeah, please do write simple code. The SOLID principles and many others can help you with this difficult task. There is a lot of hard-won knowledge in our industry and many problems are well understood. Frivolously rejecting that knowledge with “I know better” is an act of supreme arrogance and ignorance.

I do hope this is the category Dan falls into, because the alternative of purposefully misleading people for personal profit (attention via controversy) rustles my jimmies.

If you’re not familiar with the SOLID principles, I recommend you start by reading their associated Wikipedia pages. If you are like me, it will take you practice to truly understand the principles and their implications and to find out where they break down or should be superseded. Knowing about them and keeping an open mind is already a good start, which will likely lead you to many other interesting principles and practices.

by Jeroen at September 30, 2017 09:47 AM

September 29, 2017

Wikimedia Cloud Services

Automated OpenStack Testing, now with charts and graphs

One of our quarterly goals was "Define a metric to track OpenStack system availability". Despite the weak phrasing, we elected to not only pick something to measure but also to actually measure it.

I originally proposed this goal based on the notion that VPS creation seems to break pretty often, but that I have no idea how often, or for how long. The good news is that several months ago Chase wrote a 'fullstack' testing tool that creates a VM, checks to see if it comes up, makes sure that DNS and puppet work, and finally deletes the new VM. That tool is now running in an (ideally) uninterrupted loop, reporting successes and failures to graphite so that we can gather long-term statistics about when things are working.
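For anyone curious what that reporting boils down to, here is a rough, hypothetical Python sketch of just the loop-and-report part (the metric name, host, and interval are placeholders, and the real tool does much more than this): each pass of the test pushes a 1 or a 0 to graphite's plaintext protocol.

import socket
import time

GRAPHITE_HOST = "graphite.example.org"    # placeholder, not the real host
GRAPHITE_PORT = 2003                      # graphite's plaintext protocol port

def report(metric, value):
    line = "%s %s %d\n" % (metric, value, int(time.time()))
    with socket.create_connection((GRAPHITE_HOST, GRAPHITE_PORT), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

def run_fullstack_test():
    # Placeholder for the real work: create a VM, wait for it to come up,
    # check DNS and puppet, then delete the VM again.
    return True

while True:
    report("cloudvps.fullstack.success", 1 if run_fullstack_test() else 0)
    time.sleep(300)                       # the real loop interval is an assumption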

In addition to the fullstack test, I wrote some Prometheus tests that check whether or not individual public OpenStack APIs are responding to requests. When these services go down the fullstack test is also likely to break, but other things are affected as well: Horizon, the openstack-browser, and potentially various internal Cloud Services things like DNS updates.
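In the same spirit, here is a minimal and entirely hypothetical sketch of such an API check in Python (the endpoints, port, and metric name are made up for illustration; these are not the actual WMCS checks): probe each public API and expose an up/down gauge for Prometheus to scrape.

import time
import requests
from prometheus_client import Gauge, start_http_server

API_UP = Gauge("openstack_api_up", "1 if the API answered, 0 otherwise", ["service"])

ENDPOINTS = {                             # made-up URLs, for illustration only
    "nova": "https://openstack.example.org:8774/",
    "glance": "https://openstack.example.org:9292/",
}

def probe():
    for service, url in ENDPOINTS.items():
        try:
            up = requests.get(url, timeout=5).status_code < 500
        except requests.RequestException:
            up = False
        API_UP.labels(service=service).set(1 if up else 0)

start_http_server(9101)                   # scrape target for Prometheus
while True:
    probe()
    time.sleep(60)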

All of these new stats can now be viewed on the WMCS API uptimes dashboard. The information there isn't very detailed but should be useful to the WMCS staff as we work to improve stability, and should be useful to our users when they want to answer the question "Is this broken for everyone or just for me?"

by Andrew (Andrew Bogott) at September 29, 2017 11:04 PM