What are WikiJournals?

17:53, Saturday, 09 March 2019 UTC

This article was jointly authored by Thomas Shafee and Jack Nunn from the WikiJournals board, and edited by John Lubbock of Wikimedia UK.

The WikiJournals are a new group of peer-reviewed, open-access academic journals which are free to publish in. The twist is that articles published in them are integrated into Wikipedia. At the moment, there are three: the WikiJournal of Medicine, the WikiJournal of Science, and the WikiJournal of Humanities.

WikiJournals are also highly unusual for academic journals, as they’re free for both readers and authors!

What WikiJournals hope to achieve:

The aim of these journals is to generate new, high-quality peer-reviewed articles which can form part of Wikipedia. As well as new articles, submissions can include existing Wikipedia pages, which are then subjected to the same rigour as any other submission.

The hope is that this new way of publishing peer-reviewed content will encourage academics, researchers, students and other experts to get involved in the process of creating and reviewing high-quality content for the Wikimedia projects. It also gives participants a way of putting their contributions on their CV as an easily definable output (including DOI links and listing in indexes like Google Scholar).

When an article gets through the peer review process, there are two copies. The Journal copy can now be reliably cited and stays the same as a ‘version of record’ alongside the public reviewer comments. The Wikipedia version is free to evolve in the normal Wikipedia way as people update it over time, and is linked to the Journal article.

Since 2014, articles have been published on massive topics like radiocarbon dating and niche topics like Æthelflæd. The journals have also published meta-analyses, original research, case studies, teaching material, diagrams and galleries!

 

Some submissions are written from scratch. Others are adapted from existing Wikipedia material. The journal editors invite academic peer reviewers to publicly comment. If published, suitable material is integrated back into Wikipedia to improve the encyclopedia.

How to get involved!

If this sounds like the sort of thing that you’d like to get involved in, support, or just spread the word on, there are plenty of ways to contribute!

School projects

So here’s an example for a teacher. You have a class of 30 keen students who would normally each write an essay on a subject, have it read once, and then never look at it again. An alternative is to have the students, in groups of five, each choose a section of a neglected Wikipedia article to update and overhaul (there are millions of stub- and start-class articles to choose from). Each group writes a section of the article, then proofreads the other groups’ sections (WikiEdu has a great dashboard for this). Once the article is up to scratch, it’s submitted to the relevant WikiJournal, which reaches out to experts in the topic to give in-depth feedback on what can be improved. If you and your students are able to fully address those comments, the article can be published, and your students will have generated both a new Wikipedia article read by thousands and an academic article to put on their CVs!

Teachers who would consider using this method as an assessed class exercise can ask for advice from Wikimedia UK. We think that this workflow offers a useful alternative to simply having students write parts of Wikipedia articles in class, which may be harder to assess, and doesn’t provide a final product as tangible as a published journal article.

Academic outreach

The current priorities for the WikiJournals are to expand and improve representation on their editorial boards, and to invite article submissions. If you would like to volunteer in these roles, we encourage you to talk to the WikiJournal organisers.

If you are based at a UK academic institution on a course that strongly overlaps with Wikimedia UK’s strategic priorities, you can also email education@wikimedia.org.uk to talk to us about providing advice on using WikiJournals as part of your course.

Individuals

The journals always welcome new submissions. Whether they’re written by a professor or a student, all go through the same process. You could get a team together to submit a brand new article. Or maybe you could overhaul and submit an existing Wikipedia page. You could even help translate an existing article.

They have a public discussion forum (typical for a wiki, unusual for a journal!) where you can share ideas for improvements, suggest other projects they could reach out to, or point out gaps in Wikipedia’s content where they could invite researchers to write an article.

Each journal has a Twitter and Facebook account (@WikiJMed, @WikiJSci and @WikiJHum), so feel free to chat with them there. You can even suggest social media posts or accounts to follow. Not into social media? Maybe put a poster up in your university tearoom.

Monthly Report, January 2019

22:23, Friday, 08 March 2019 UTC

Highlights

  • Wiki Education hosted 2019’s first in-person board meeting in the Presidio of San Francisco in late January. On this occasion, board and staff celebrated Wiki Education’s fifth anniversary.
  • We started the third round of our Wiki Scholars professional development course with the National Archives and Records Administration (NARA). Ten people from a wide range of backgrounds have come together to learn to share their knowledge with the public through Wikipedia.
  • In January, Wes Reid joined the Technology department as Wiki Education’s first Software Developer.

Programs

Wikipedia Student Program

Status of the Wikipedia Student Program for Spring 2019 in numbers, as of January 31:

  • 290 Wiki Education-supported courses were in progress (176, or 61%, were led by returning instructors)
  • 5,083 student editors were enrolled
  • 70% of students were up-to-date with their assigned training modules
  • Students edited 715 articles, created 35 new entries, and added more than 209,000 words to Wikipedia.

As always, January saw a flurry of course pages coming through on the Dashboard. Wikipedia Student Program Manager Helaine Blumenthal spent the majority of her time ensuring that all of Wiki Education’s Spring 2019 courses are set up for success. This meant welcoming back returning instructors and providing that bit of extra support to first-time participants in the Student Program.

Students are enrolling on the Dashboard and getting their feet wet as they learn to navigate what’s likely the most familiar, unfamiliar site on the web.

Student work highlights:

There are always people driving new innovations in the world of science, as University of Michigan students in Kush Patel and Anne Cong-Huyen’s class Digital Pedagogy with U-M Library could tell you. One article that they have created so far is the one on Alison R.H. Narayan, the William R. Roush Assistant Professor in the Department of Chemistry in the College of Literature, Science, and the Arts at the University of Michigan. A Michigan native, Narayan engineered cytochrome P450 enzymes to perform C-H functionalization on non-native substrates during her postdoc, and for her doctoral thesis she wrote “New Reactions and Synthetic Strategies toward Indolizidine Alkaloids and Pallavicinia Diterpenes”. Her work has been recognized by major organizations such as the American Chemical Society, which named her one of its “Talented 12” in a 2016 issue of Chemical & Engineering News, and the Research Corporation, which made her one of its 2019 Cottrell Scholars.

The class also expanded the article on Melanie Sanford, a chemist who also teaches at the University of Michigan and holds the positions of Moses Gomberg Collegiate Professor of Chemistry and Arthur F. Thurnau Professor of Chemistry. She earned her BS and MS at Yale University and went on to gain her Ph.D. from the California Institute of Technology, where she worked with future Nobel Prize recipient Robert H. Grubbs. Sanford followed this up by performing her postdoctoral work at Princeton University. Her work has been recognized by numerous organizations, earning her awards and accolades such as the Royal Society of Chemistry Fluorine Prize and the prestigious MacArthur Fellowship!

While Wikipedia has a goal of holding the sum of human knowledge, it still isn’t there yet – which is why it’s so important for people to contribute their time and effort to expand articles not only in the very well-known topic areas, but also in those that have not yet reached common knowledge worldwide. Areas like Ongamira may not be as much of a household name as, say, Paris, but it holds just as much of a treasure trove of history and culture. Ongamira is a valley near the city of Córdoba, Argentina that contains caves and grottoes of immense archaeological and natural significance. The valley was formerly home to the Comechingones, who settled in the region. Many of them died as a result of battles over the land between the Comechingones and Spanish forces led by conquistadors such as Blas de Rosales, who was granted the lands by Jerónimo Luis de Cabrera. Thanks to efforts by a Paradise Valley Community College student in Kande Mickelsen and Sheila Afnan-Manns’s spring course, this historic valley now has an article on Wikipedia.

Scholars & Scientists Program

This month we started a new round of our Wiki Scholars professional development course with the National Archives and Records Administration (NARA). Ten people from a wide range of backgrounds have come together to learn to share their knowledge with the public through Wikipedia. In particular, we will focus on improving articles about women’s suffrage in the United States in advance of NARA’s upcoming exhibit, Rightfully Hers, which will be opening in Washington, DC in May 2019.

Participants are in the process of selecting their first of two articles to improve. Most will be expanding the articles on prominent (as well as less prominent) suffragists. The first step is exploring areas for opportunity, then participants conduct an evaluation of the articles while learning how to edit Wikipedia and becoming familiar with its policies and guidelines. Finally, the editing begins, improving the articles with resources from NARA and elsewhere.

Here are some of the articles current Wiki Scholars have chosen to improve:

  • Annie Smith Peck (1850–1935), a mountaineer, adventurer, suffragist, and lecturer. The northern peak of a Peruvian mountain chain was named in her honor.
  • Mary Birdsall (1828–1894), a journalist, suffragist, and temperance worker who served as president of the Indiana Women’s Suffrage Association.
  • Woman suffrage parade of 1913 (also known as the Woman Suffrage Procession), the first suffragist parade in Washington, D.C.
  • Women’s suffrage movement in Washington, focusing on events, people, publications, and activities that took place in Washington State.
  • Lillian Exum Clement (1894–1925), who served in the North Carolina General Assembly (the first woman to do so—and the first woman to serve in any state legislature in the Southern United States).
  • Mabel Ping-Hua Lee (1897–1966), a Chinese religious and women’s rights leader.
  • Amanda Way (1828–1914), a Civil War nurse, minister, and pioneer in the temperance and women’s rights movements.
Lillian Exum Clement, the first woman to serve in any state legislature in the Southern United States.
Image: File:Lillian Exum Clement (33361507400).jpg, open license, via Flickr.

Visiting Scholars Program

This month Visiting Scholars contributed several high-quality articles to Wikipedia, including both a Featured Article (FA) and a Good Article (GA). These designations are reserved for the highest quality articles and require peer review to ensure they meet strict criteria.

In 1971, West German stamp dealer Hermann Sieger secretly paid the Apollo 15 astronauts, David Scott, Alfred Worden, and James Irwin, to bring 400 unauthorized postal covers to the surface of the Moon, to be sold on their return. Worden had already received permission to take 144 other covers with him for a stamp collector friend, based on the understanding they would not be sold until after the Apollo program ended. The covers were postmarked the morning of the launch and again after splashdown. When NASA learned Worden’s friend was selling the covers, they warned the astronaut about commercializing their activities. When they learned about Sieger, the three were reprimanded, removed as backup crew members for Apollo 17, and required to give the money back. These events are known as the Apollo 15 postal covers incident, an article developed by George Mason University Visiting Scholar Gary Greenbaum, using resources from the GMU library. Gary brought the article to Good Article level last year, and continued his extensive work on it before it was finally promoted to Featured Article this month.

Hortensius (On Philosophy) is a lost dialogue written by Cicero in 45 BCE. One might think it would be difficult to write a Wikipedia article about a dialogue that has been lost since the 6th century. Paul Thomas, Visiting Scholar at the University of Pennsylvania, did just that, bringing it up to Good Article quality. The work was notable in its time, and scholars have developed a picture of its content and style based on other writings. It is said to have inspired such important historical figures as Seneca the Younger, Tacitus, Boethius, and even Augustine of Hippo. It’s through the writings of the latter, as well as Nonius Marcellus and others, that we have what pieces of the dialogue still remain.

Rosie Stephenson-Goodknight, Visiting Scholar at Northeastern University, wrote an impressive 14 articles about women writers this month, including the creation of articles about 12 women who previously were not represented on Wikipedia. For example, Emily Parmely Collins (1814–1909) was a suffragist, women’s rights activist and writer who established the first suffragists’ society in 1848: the Woman’s Equal Rights Union. Frances Manwaring Caulkins (1795–1869) was a historian and genealogist who wrote histories of towns in Connecticut. She was elected a member of the Massachusetts Historical Society in 1849, the first woman to join. Rosa Louise Woodberry (1869–1932) was a journalist, educator, and stenographer. Her philosophy and science writing appeared in journals around the country and she was on the staff of both The Augusta Chronicle and Savannah Press.

Emily Parmely Collins (1814–1909), a suffragist, women’s rights activist and writer. Collins established the first suffragists’ society in 1848.
Image: File:EMILY PARMELY COLLINS.jpg, public domain, via Wikimedia Commons.

Advancement

In January, the Advancement Team began implementing its newly established charter by setting up regular team meetings, enacting team norms, and identifying and documenting team policies and processes.

Fundraising

In January, we received our first installment, totaling $233,000, of the $400,000 Annual Planning Grant from the Wikimedia Foundation’s Funds Dissemination Committee. Chief Advancement Officer TJ Bliss had calls and dinner meetings with several potential new funders, all of whom asked for concept notes or other follow-up documents. TJ also developed a draft proposal for a funder briefing on Wikipedia, hosted by one of our major funders. This briefing will help build awareness of the importance of Wikipedia to furthering philanthropic efforts generally. TJ and Director of Partnerships Jami Mathewson visited the Stanton Foundation in Boston, gave an oral report on our current work, and requested funding for a Scholars & Scientists course related to public policy. Jami also visited with Program Officers at the Simons Foundation in New York City and described our Wiki Scholars and Scientists efforts and our interest in Wikidata. Customer Success Manager Samantha Weald continued her research and identification of new funders and began drafting outreach letters.

Scholars & Scientists partnerships and collaborations

Samantha worked closely with new participants in our NARA Scholars & Scientists course to ensure their needs were met as they began the course.

At the end of the month, TJ and Jami presented to faculty at the Massachusetts Institute of Technology about collaboration opportunities for sharing high-quality open knowledge via Wikipedia.

Communications

January 15th was Wikipedia Day, a day of reflection and celebration for the Wikipedia community across the globe. We published a year in review blog post about what we’ve accomplished since this day last year. We were also featured in The Washington Post in a piece by Stephen Harrison celebrating Wikipedia’s 18th birthday. Later in the month, the National Institute for Occupational Safety and Health recognized our Wikipedia Student Program (and specifically a course we support at the Harvard T.H. Chan School of Public Health) as an effective way to make occupational safety information available to the public.

Blog posts:

External media:

Technology

Wes Reid.
Image: File:Wes-reid-profile-image-wiki-education.jpg, Wes (Wiki Ed), CC BY-SA 4.0, via Wikimedia Commons.

In January, Wes Reid joined the Technology department as Wiki Education’s first Software Developer. Our focus was to bring Wes up to speed on the breadth of our codebase and the broader Wikimedia technology ecosystem, and to prepare for the major technical projects on the horizon. We also had a bevy of volunteer contributions this month, including a large set of improvements to our test suite and a final set of enhancements to enable complete translation of the Dashboard’s training modules.

In addition to fixing numerous bugs, Wes improved the course approval workflow so that we can more easily keep track of new instructors and how they found out about the Wikipedia Student Program. At the end of the month, Wes and Chief Technology Officer Sage Ross also began transferring Programs & Events Dashboard to a new server for the first time since it was set up in 2015; the operating system on the original server was deprecated for use on the Wikimedia Cloud platform, and would have become inaccessible soon. (The transfer was completed, with minimal downtime, in early February.)

Outreachy intern Cressence continued her work on the event creation workflow, which now lets users of the global Programs & Events Dashboard choose which type of program they are running, with detailed explanations of the differences. Now she’s turning her attention to the start and end date interface, which has been a frequent point of confusion for global Dashboard users.

Finance & Administration

The total expenses for January were $213,000, exactly on target for the budgeted amount of $213,000. The Board meeting occurred in January; however, the budget split its costs between January and February, so the overage we see for the Board in January will balance out in February, when an additional $9K is allocated that will not be used. General and Administrative costs were over budget, a combination of overages in Furniture and Equipment ($5K) and Indirect Expenses ($13K), partly offset by under-budget spending on Staff Meetings ($4K), Professional Services ($8K) and Rent ($1K). Programs were very close to budget, under by $2K. Technology was under budget by $10K: Payroll ($3K), Professional Services ($5K) and Occupancy Costs ($2K).

Year-to-date expenses are $1.2M, which is $240K under the budget of $1.44M. We expected Fundraising to be under by $160K due to a change in plan for professional services ($149K) and a decision not to hold a cultivation event ($11K). Programs were under by $56K due to a few changes in processes: professional services ($17K), Travel ($28K), Printing and Reproduction ($11K), Communication ($4K) and Indirect Expenses ($21K), offset by overages in Payroll ($21K) and Furniture and Equipment ($4K). General and Administrative costs are under by $23K due to reductions in payroll ($14K), professional fees mostly relating to audit and tax preparation ($13K) and administrative costs ($8K), offset by over-budget spending on Occupancy, combined direct and indirect ($11K), and Travel ($2K). As mentioned in the January report, the Board was over budget for January due to expense accrual and will be under budget come February. Technology is under budget by $15K, as there was a change in plans in utilizing the budgeted professional fees ($17K) and additional rent ($5K); instead the department increased its payroll ($3K) and Furniture and Equipment ($4K).

Office of the ED

  • Current priorities:
    • Coordinating and overseeing work on the Annual Plan & Budget for FY 2019/20
    • Improving the organization’s resilience to staffing changes
Board and senior leadership team members gather at the former Officers Club of the Presidio for the January in-person board meeting

In January, Executive Director Frank Schulenburg worked with our auditors from Hood & Strong on finalizing our audit for fiscal year 2017/18. This year’s audit is our fourth voluntary audit since 2015, and the board approved the audit report during its in-person board meeting on January 25. Once our work on Form 990 is done, we’ll publish both the report from Hood & Strong and Form 990 on our website.

Board chair PJ Tabit and Sage Ross (as the youngest member of the senior leadership team) jointly cut the anniversary cake, celebrating five years of Wiki Education

As in former years, January is the month for one of our two in-person board meetings. The meeting serves to look back at the organization’s performance during the first half of the current fiscal year and to provide the board with a high-level understanding of what to expect for the next fiscal year. The meeting kicked off with Frank looking back at five years of Wiki Education (Wiki Education officially started operations in mid-February 2014). TJ then walked the board through our current fundraising efforts and also provided an analysis of our work in opening up a second revenue stream through our fee-for-service based Scholars & Scientists Program. Chief Programs Officer LiAnna Davis provided the board with an update on programs while outlining how our current and future activities connect to the organizational strategy approved by the board in June last year. LiAnna also shared that Wiki Education now brings 19% of all new contributors to the English Wikipedia and 9% of all new Wikipedia editors globally. Sage walked the meeting’s attendees through a presentation that outlined how our digital infrastructure supports our own program participants as well as program leaders in other parts of the world. As this was Sage’s first presentation at a board meeting, he also walked the board through his general philosophy behind making our Dashboard adaptable and sustainable. On the evening of the first meeting day, board and staff joined the crowd at the Internet Archive to celebrate Public Domain Day 2019. The second day of the board meeting was dedicated to a longer discussion about new board member recruitment and a report from board member Richard Knipel about new developments in the U.S. Wikimedia landscape. This was the first in-person board meeting to take place in the Presidio, where Wiki Education’s office is based.

On the evening prior to the board meeting, board member Ted Yang led a staff education event at Wiki Education’s office around the topic of “Planning for retirement.” This event kicked off our new effort of providing staff with education around topics that are relevant to their employment and life planning.

Also in January, Frank met virtually with Jens Ohlig and Nicola Zeuner from Wikimedia Deutschland, the German Wikimedia chapter. With Wikimedia Deutschland being the organization responsible for the technical infrastructure of Wikidata, and Wiki Education planning to provide Wikidata-related training, the two organizations decided to collaborate more closely. As more and more people in the U.S. get information through Wikidata instead of through Wikipedia due to the rise of virtual digital assistants (which rely heavily on structured data), Wikidata will play a more prominent role in Wiki Education’s future programmatic offerings.

* * *

Make sure every woman in science has a Wikipedia bio

19:21, Friday, 08 March 2019 UTC

When young women see women in STEM succeed and thrive, they feel empowered to follow their passions into those careers as well. The gender imbalance of STEM fields is not only a daunting, self-reinforcing cycle, but also a barrier to new scientific discovery (diversity and inclusion make a difference!). It’s something that must be corrected from the inside. And academics all have the power to do so.

One such avenue is engaging the world’s #1 source of online information: Wikipedia. Academics can use their resources and expertise to make Wikipedia better, which has the power to inspire the next generation of women interested in STEM. Wikipedia matters for women in science. Wikipedia biographies of women in STEM model potential career paths for young people. Reading about women in a variety of fields can help alleviate the threat of negative stereotypes in STEM and beyond. And seeing women and people of color in a young person’s chosen profession minimizes the sense of responsibility they may feel to confront stereotypes alone.

There’s a lot of work to do on Wikipedia. Only 17% of biographies are about women. And women are more likely than men to be described in terms of their family life, rather than their careers. Let’s change that. As Eryk Salvaggio writes, the gap in Wikipedia’s coverage of women reflects worrisome stereotypes of women in science, but it’s also an unprecedented opportunity to challenge those stereotypes.

You may be familiar with physicist and Wikipedia rockstar Dr. Jess Wade and the hundreds of Wikipedia biographies about women in STEM that she has written.

“I kind of realized we can only really change things from the inside,” Dr. Wade told The Guardian. “Wikipedia is a really great way to engage people in this mission because the more you read about these sensational women, the more you get so motivated and inspired by their personal stories.” Making society’s favorite encyclopedia reflective of the diverse lives and achievements of women leads to equity in these fields.

Wikipedia is a touchstone of our digital information age and academics have told us for years that they want to get involved in helping shape it. But it isn’t obvious or intuitive where to look to learn the “rules” of editing. And it can be intimidating to enter a new community with its own sets of norms and expectations.

That’s where we come in. Wiki Education has a track record for guiding newcomers in the technical, cultural, and procedural practices of Wikipedia writing. Our virtual courses equip scholars of all levels to channel their expertise into a platform where people are most looking for information. Our course alumni are helping close Wikipedia’s gender content gap; they’re setting examples for future women in STEM; and they’re expanding their science communication tool belt to promote important research in their own fields.

One of many success stories to come out of our virtual Wiki Scientist courses is the biography of geologist and oceanographic cartographer Marie Tharp. Before a participant in our Communicating Science course started making changes, the article had information about her career and early life, but not much about her scientific contributions. It now includes more about the impact of her work. The course participant also noticed that the photograph used in the article was really a picture of a male colleague of hers, who appeared to be showing her something on a map, when in reality the map he was pointing to was of her own creation. Now the article includes a much better image of Tharp with her work.

Image 1: Marie Tharp & Bruce Heezen, copyrighted. Image 2: Marie Tharp in 2001, copyrighted.

Subtle inequalities like this are pervasive; and they are what draw a lot of scholars to our programs. Scholars increasingly want to reach the public with their efforts, and they understand that the most important public archive of our age should be representative of everyone.

“Lately, I have become increasingly frustrated by the way both women and science are discredited. How can I act as a counterbalancing force, I often wondered, while working as a full-time chemistry graduate student?” says Columbia University graduate student Karen Kwon, an alum of our course.

Ultimately, she says, “Learning how to edit and create Wikipedia pages and experiencing the culture of Wikipedia was such a joy, especially since all my efforts were put towards a cause that I deeply care about. … I will continue to edit Wikipedia so that all deserving women scientists have Wikipedia pages.”

Improving Wikipedia is a collaborative effort, one that we can all chip away at. “My guiding principle has become the idea of improvement rather than perfection,” reflects Samantha Kao, another Wiki Scientist from our course. “Any improvement leaves an article better than it was before.”

“As academics, it’s drilled into you that your time is precious and limited,” says Chelsea Sutcliffe, another course alum. “But our academic and civic duties extend beyond what is required or expected of us. Nothing will change unless we will it to. A bottom-up approach matters just as much as a top-down approach. There is nothing more satisfying than providing content for the world’s most accessible platform for anyone to see, respect, and admire.”


If you’re looking for how you can get involved right now, join us in our next Communicating Science virtual course! You have the power to make sure women are written into history. Register here.


Image: File:(Manuscript painting of Heezen-Tharp World ocean floor map by Berann).jpg, public domain, via Wikimedia Commons.

Filipina American artist Dorothy Santos knows what it feels like to be a minority in your field. She said, “I thought, ‘there’s no one who looks like me in this industry!’ ” She reached out to Jennifer Wofford, co-founder of the San Francisco Bay Area-based artist collaborative Mail Order Brides/M.O.B., and years later contributed to Wikipedia for the first time to write an article about the collaborative. She spoke about the experience at the UC Berkeley Art + Feminism Wikipedia Edit-a-thon on Tuesday, 5 March.

Dorothy Santos at the Berkeley edit-a-thon.

The daylong event, organized by a committee of UC Berkeley librarians, archivists, and faculty, added more than 2700 new words through 41 total edits. For some attendees, it was their first time editing Wikipedia. For others, it was a process that had begun long before the edit-a-thon.

Organizer Emily Vigor’s journey began several years ago. She’s an archivist on campus at the Environmental Design Archives, which holds collections relating to local architecture and landscape architecture. Emily processed the collection for a female architect named Alice Carey. Noticing that Carey didn’t have a Wikipedia page, Emily decided she would try to get an article about Carey published online. “It was rejected as soon as I put it up for not being notable enough,” she said. “I was pretty flummoxed. She wasn’t my first Wikipedia edit, actually. I’d created pages for male architects before with fewer verifiable resources and hadn’t run into issues.” Around the same time, Emily learned about Art + Feminism, the worldwide campaign of edit-a-thons to improve coverage of cis and transgender women, non-binary folks, feminism and the arts on Wikipedia. Later, she participated in her first edit-a-thon at Berkeley, and wrote a blog post for the Environmental Design Archives about the struggle to publish articles about female architects.

Chris Marino with edit-a-thon participants.

Her colleague Chris Marino found a similar impetus to participate in efforts around inclusion. “When I started to edit Wikipedia, a lot of my edits were flagged. I saw the importance of having informal get-togethers where basic skills can be taught.” Chris works with primary records of the designed environment and is particularly passionate about recognizing women in the design field. While creating a page about graphic designer, illustrator, and author of gardening books Maggie Baylis, she encountered roadblocks right away. She had written the finding aid about Baylis for the Environmental Design Archives collection, but couldn’t re-use that material for the Wikipedia page because of plagiarism concerns. To make matters worse, there weren’t enough secondary sources about Baylis to cite.

Encountering these obstacles was discouraging because of her deep connection to archival subjects. “When you work with primary sources and you’re processing collections, say, for women designers, you get to know them intimately,” Chris said. “You’re going through all of their papers and you can see all their correspondence. Including maybe the discrimination they encountered, all that behind the scenes information. You become invested, and that makes me feel really strongly about documenting their work on Wikipedia.”

Luckily, the Maggie Baylis story had a happy ending. “Her work was part of an exhibit called Serious Play at the Denver Art Museum. She’s written about there, so I want to go add the references,” Chris said. The article is now free of flags.

Organizer Stacy Reardon said, “Everyone says the same thing: the first time they make an edit they feel elated and they feel empowered. They feel like they’ve really changed something, and they’ve created access for people around the world. It might sound a little romantic, but that’s really how you feel.” Emily echoed that sentiment: “It’s nice when you do a Google search for someone’s name and there’s a Wikipedia link.”

Maybe it’s not only nice, but necessary. In some fields, having a Wikipedia page is equated to credibility, respect, and having “made it.” If the cultural cachet that comes from encyclopedia recognition is disproportionately meted out to men, that can have downstream effects on who gets profiled in the media and studied by academics, a vicious feedback loop. Editing to share information about figures like Maggie Baylis isn’t just about leveling the playing field; it’s about expanding what we know, making sure that someone who runs a search or goes down a Wikipedia wormhole isn’t inadvertently losing out on half of history.

Dorothy Santos recalls the objective of California College of the Arts professor Tirza True Latimer’s class, “Exhibitions and Ideology,” as to edit what is scant: to figure out which artists haven’t been included in Wikipedia and profile them. Latimer used Wikipedia as a pedagogical tool, and students in her class created the page for “The Perfect Moment,” a retrospective of works by the noted photographer Robert Mapplethorpe.

More university students should take advantage of their prime position—having access to institutional resources, like archives, databases, and experts—to edit Wikipedia, edit-a-thon organizers and participants agreed.

Two Berkeley undergraduates, Maddie and Ollie, presented to the edit-a-thon about their own experience in a freshman Human Biological Variation seminar where their professor asked students to edit Wikipedia “stub” articles (articles deemed too short to provide encyclopedic coverage). The class gave them firsthand experience with democratizing knowledge and the empowerment that came from editing.

“It’s important for students to look under the hood and see how this is constructed,” organizer Corliss Lee said. “Wikipedia is omnipresent and yet many students I’ve presented to didn’t know how it’s made. It’s important for them to realize. Students who edit feel very empowered seeing their stuff out there, as opposed to handing in another class paper that never sees the light of day again.” As Maddie said, “Most undergraduate students won’t have the chance to publish papers, but anyone can edit Wikipedia.” Ollie’s sense of impact came from looking at page history and statistics: “I look at the page views. Every single person there learned something. I feel blessed to put my expertise out there in the minds of people who I’m never going to meet.”

Dorothy Santos echoed that sentiment. “Page statistics and history may look boring to a lot of people but it’s good to look behind the curtain and see specific changes that have been made. When a new user sees those changes, the process of editing becomes tangible and feasible. When I saw my own changes highlighted in perpetuity, that was impactful for me. Something I did is now a part of the pantheon of knowledge about that subject. I can help tell that story.”

Adora Svitak, Communications Fellow
Wikimedia Foundation

All photos in this blog post were taken by Adora Svitak and are licensed under CC BY-SA 4.0.

“I call my senators, I vote, I donate to the ACLU, and now, I edit Wikipedia.”

Students lack the critical media skills they need to navigate our increasingly digital society. That’s what the Stanford Graduate School of Education determined in its 2016 study of media literacy in youth. Participating students were unable to identify credible sources online, to distinguish advertisements from news articles, or to understand where information came from. That’s a problem.

In Literacy Worldwide, Dr. Susan Luft asks teachers to prepare their students to be the “prosumers” (both producers and consumers) of information that they need to be in the digital age. “While putting our efforts into teaching students the craft, ethics, and responsibilities in producing media, we must also teach them to become skilled consumers of information, discerning fact from fiction at every turn or click of a hyperlink.”

Whether or not a foundation for students’ media literacy skills was set in high school, college-level instructors have the opportunity to further round out those skills. Instructors taking this opportunity not only better prepare students for future courses, but also for life as critically thinking employees and citizens.

How Wikipedia factors in

Wikipedia is a critical source of information on the internet. And what student hasn’t heard the old caveat that they shouldn’t trust the information they find there? It’s something instructors say a lot. But we all use Wikipedia (it’s the fifth most visited site in the world!). So instead of advising against it, why not teach students the skills they need to identify where its content is accurate and where it is not?

While Wikipedia has gained a much stronger reputation for reliability since its inception in 2001 (YouTube is now linking to it in videos that spread misinformation, for example), there are still content gaps that need remedying, especially for academic topics. That’s where a Wikipedia assignment is a great catch-all. Students learn how to comb the site for inaccurate or missing information related to their course topic, and they employ research and writing skills to remedy the issues themselves.

What students think of the assignment

After Dr. Jennifer Glass’s students at Georgia Institute of Technology created and expanded Wikipedia pages as an assignment, they self-reported numerous important learning outcomes – including new skills for identifying reputable information online.

“Wikipedia in general has a reputation for being unreliable,” reflected one student. “But when I actually read through the article, I found this to be true only for information that did not have citations.”

“Over the course of the semester they realized Wikipedia is more credible than they originally thought,” said Dr. Glass. “But they also know they have to be careful that the topic they’re reading has citations. Now they will look at Wikipedia articles they read, look for citations, and check what kind of citations.”

How Wikipedia inspires digital citizenship

Media literacy skills gathered through Wikipedia editing can equip students to be active consumers of information. Those skills help prepare students to identify fake news. Students may also be inspired to engage more actively in the information landscape. As Rice University student Katie Webber wrote about her experience of a Wikipedia assignment,

“To have some concrete thing that I feel like I can really do right now has made me really feel more confident that I can find other ways to create change going forward. I call my senators, I vote, I donate to the ACLU, and now, I edit Wikipedia.”

Technology is a powerful tool for students to engage with their digital landscape, their communities, and ultimately their future. Understanding how technological tools affect society is what makes an effective and responsible digital citizen.

Students who complete a Wikipedia assignment in their classroom feel like they have done something that matters. As we’ve seen before, when students see their work has an impact beyond the classroom, they are motivated to produce better work and are more likely to carry that work beyond their course. And according to a 2018 Strada and Gallup study, students are also more likely to report that their education has been worth the cost when they see that their schoolwork is relevant to their lives and chosen career paths.

Student learning objectives achieved

Critical media literacy is a learning outcome that is relevant across academia. We’ve supported instructors in all disciplines, including engineering, political science, rhetoric, earth science, biotech, history, law, media studies, psychology, gender and women’s studies, and more.

“Writing for Wikipedia allows students to develop critical skills for communication in the digital age. What platform is better for teaching that writing is a public activity with ethical consequences?” says Dr. Gerald Lucas of Middle Georgia State University.

“Working with Wiki Education opens up possibilities for how we teach, how that teaching engages the world, what our students accomplish in the classroom, and what kinds of conversations we can have about critical issues related to humanities and digital culture,” says Dr. Matthew Vetter of Indiana University of Pennsylvania. “I want students to be more than consumers of media. I want them to be active producers and critics of discourse and culture. I want them to understand that language shapes the world, that they need to understand that process, and participate in it. I try to do this by making critical learning and thinking come alive through innovative and consequential writing assignments. Working with Wikipedia is one of the best ways I’ve found to make this kind of pedagogy happen.”


Interested in adapting a Wikipedia assignment to fit your course? Visit teach.wikiedu.org for all you need to know to get started.

This Month in GLAM: February 2019

07:58, Friday, 08 March 2019 UTC

This article is by Wikipedia administrator User:TheSandDoctor

Early this past December, I was reviewing article submissions on Wikipedia and noticed that some included templates, pages created to be included in other pages, that were inappropriate for pages that are not yet articles. This got me thinking: how widespread is this misconception? A search led to the discovery that there were over 500 such drafts — roughly one percent of the 42,939 drafts. While this is indeed a small percentage, that is still over 500 drafts which may confuse new editors.

42,939 drafts present in the Draft namespace as of 30 December 2018. Generated using Quarry Beta (report link). Photo: TheSandDoctor.

The English Wikipedia consists of 32 namespaces, different sets of Wikipedia pages whose names begin with a particular reserved word recognized by the MediaWiki software. While it would be cumbersome to list them all here, it is worth mentioning the Draft, Main/Article (“mainspace”), and Wikipedia namespaces. The draft namespace is somewhat special as, unlike the others, it is not indexed by most search engines, including Google. This allows it to be a place where editors can develop article drafts that are not yet ready for indexing, may not yet have demonstrated adequate notability, or are notable works in progress. When a draft is deemed ready by an editor with sufficient user rights and experience, it can then be moved to the “main” article namespace, where pages most readers are familiar with are located. Another path by which a draft may make its way to the mainspace is through Articles for Creation (“AfC”), a peer review process in which experienced registered editors can either help create an article submitted by an anonymous editor or decline the article because it is unsuitable for Wikipedia.

An article may be unsuitable for Wikipedia for a number of reasons. AfC submissions can be declined for having insufficient content, consisting of vandalism or personal attacks, containing copyrighted material, not asserting notability, and, most often, for not being properly sourced. We also do not accept new articles where a page on the topic already exists, even if under another name.

Articles for Creation welcome page

Articles for Creation is consistently backlogged, with sometimes more than 2,000 drafts awaiting review. Any editor whose Wikipedia account is at least 90 days old, who has made 500 edits to articles, has a good understanding of the various notability guidelines, and agrees to review solely on a volunteer basis may apply to become a reviewer. It is important to note that despite the constant backlog, submissions must be carefully reviewed for whether or not they meet the criteria. However, in the case of those which meet any of the quick-fail criteria, the total time investment is much lower.

To increase the number of submissions reviewed while maintaining review quality, the number of active reviewers must grow. If you are interested in helping out and have had an account for more than 90 days with over 500 total edits, you can find out more, and how to apply, here.

An example of an unreferenced template placed on a draft.

By its very nature, the draft namespace is designed to keep its pages separate. Drafts are not permitted to be active members of categories, nor are they permitted to be linked to within existing articles, as doing so would defeat the purpose of the draft namespace — a workshop or incubator of sorts for new articles. For that same reason, having templates such as “Underlinked” or “Orphan” — both designed for articles, not drafts — present could confuse new contributors as to the namespace’s purpose, potentially giving incorrect information.

“Underlinked” refers to having too few incoming links from existing articles, while “Orphan”, in the Wikipedia context, refers to having no incoming links from articles at all — in essence, being orphaned from the rest of the encyclopedia. In articles, both of these are issues that could affect visibility and discoverability, but for drafts that is the entire point. Drafts are not ready for the spotlight that is indexing; if they were, they should not be in the namespace to begin with or should have been moved out of it already.

I have worked on three bots so far, each with vastly different purposes and two of the three performing multiple tasks. TheSandBot is the most multipurpose of them. As was previously written about for Wikimedia UK, the bot’s first task was a temporary one moving articles and other pages.

With this in mind, and armed with the statistics — one percent is still far too high — I got to work on the second task for TheSandBot. The sole purpose of this task is to look for templates which should not be in drafts and remove them if present. So far, the bot only looks for the following four templates and any of their aliases (too many to list), but more could be added with minimal effort. A rough sketch of what such a task might look like follows the list.

{{orphan}}
{{uncategorized}}
{{underlinked}}
{{unreferenced}}
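
For illustration only, here is a minimal sketch of how such a template-stripping task might look using Pywikibot and mwparserfromhell. This is not TheSandBot’s actual source; the template set and namespace handling are deliberately simplified, and a real task would also handle aliases, bot-exclusion templates, and error cases.

# Illustrative sketch only; not TheSandBot's actual code.
# Assumes Pywikibot is configured for the English Wikipedia and
# mwparserfromhell is installed.
import pywikibot
import mwparserfromhell

# Article-maintenance templates that shouldn't appear on drafts
# (aliases/redirects would also need handling in a real task).
UNWANTED = {"orphan", "uncategorized", "underlinked", "unreferenced"}

site = pywikibot.Site("en", "wikipedia")

def clean_draft(page):
    """Remove unwanted maintenance templates from a single draft page."""
    code = mwparserfromhell.parse(page.text)
    removed = []
    for template in code.filter_templates():
        name = str(template.name).strip().lower()
        if name in UNWANTED:
            code.remove(template)
            removed.append(name)
    if removed:
        page.text = str(code)
        page.save(summary="Removing article maintenance templates from draft: "
                          + ", ".join(removed))

# Namespace 118 is the Draft namespace on the English Wikipedia.
for draft in site.allpages(namespace=118):
    clean_draft(draft)

In practice the real task also runs on a schedule (the cron job described below) and goes through the bot approval process before being allowed to edit unattended.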

As of 17 December 2018, the task has been approved and is now set up as an automatic cron job, running daily at 03:00 UTC (3am UTC, 7pm Pacific, 10pm EST). This task is different from all the rest that I have worked on. Whereas my others, while lacking a fixed start or end date, were temporary, this is my first task without an end date at all. So long as there is a need and this task is not shut off, it will run at that time for the rest of time.

I am not too sure where my bot work will take me next, but I am definitely excited for the future possibilities. Maybe I will finally be able to take over the Good Article review clerking duties from Legobot, which the current operator wishes to partially retire. Either that or I might find something else that needs fixing. Something always needs fixing on a project the size of Wikipedia. One thing I do know for sure though is that, at over 270,000 combined edits, my bots are closing in on having performed 300,000. This will most likely happen later this year.

_______________________

If you’re a developer working on the technical side of Wikimedia projects, there is a community of developers in the UK you can get help and advice from. Wikimedia UK will be running Wikidata meetup events every couple of months in London, and the best way to find out how to get involved in improving the Wikimedia projects is by talking to other developers. We also encourage Wikimedians to write for our blog about their work to encourage others to get involved. So get in touch if you have ideas!

 

Wikimedians are a bunch of silent nobodies, tireless and restless in doing what they love. Silent nobodies who stand for their passion, silent nobodies who stand for their ethnicity. But is there equity and diversity amongst the nobodies? It’s no secret there isn’t. Endless discussions, deliberations, and debates to communicate, coordinate and command have not solved the problem of achieving a gender-neutral Wikimedia. There are undying efforts, though, to say the least; it’s a priority for most of us. Most of the nobodies, most of the affiliates, most of those who are leading the initiatives admit, accept, acknowledge and appreciate the need to articulate this as a principal priority.

We at Wikimedia India have been documenting the success stories of these silent nobodies every month: the stories of those silent editors who are lesser known to the larger Indian and global communities. Recording stories since 2011, Featured Wikimedian of the Month is a storytelling exercise in which, every month, a Wikimedian is identified and a short story is blogged. The idea is not only to felicitate the person for their efforts but to build the spirit and motivation for others to follow.

In this edition, Wikimedia India features three women Wikimedians.


Here is a small story on how each of them is doing today, since the time they were made featured Wikimedians.

Nitesh Gill during Train The Trainer 2016

Nitesh Gill: Nitesh is a Wikimedian on Punjabi projects who was featured in March 2018. Be it then or today, there has been no change in Nitesh’s dedication and commitment toward Wikimedia. She was featured last March after completing a 365-day challenge on 28 February, writing an article on a woman’s biography on each of those 365 days. Today that score has reached more than 700, and she is still unstoppable. She is currently aiming for 1,000 articles, but who knows, the score may keep rising, as her enthusiasm is boundless.

Nitesh was previously pursuing an MPhil in Punjabi literature and has now successfully completed it. She believes that, for equality and justice, women first need to know about their rights.

Ananya Mondal

Ananya Mondal: Popularly known as the Butterfly Wikimedian, the passionate Ananya Mondal (username Atudu) is a clinical nutritionist who works on Bengali Wikipedia, Bengali Wikisource and Wikidata, and uploads pictures of butterflies, as well as snakes and other fauna and flora. When Ananya was featured in September 2016, she had written 90 articles and had an edit count of over 9,000. Today, she has written more than 250 articles and has an edit count of more than 38,000. To date, over 300 butterfly species and subspecies have been photo-documented (with more than 100 images in the Valued Images category) and nearly 200 articles on butterflies have been contributed. The numbers keep growing, and that’s without counting her contributions to Wikidata and Wikisource.

Her Wiki Loves Butterfly project, which was earlier based in the state of West Bengal, has now reached the whole of North-East India. Her growing leadership skills led her to help host Wiki Loves Monuments in India 2018 and to serve on the international jury for the 2018 Wiki Loves Earth photo contest.

In some of her initiatives with the West Bengal Wikimedians User Group, she has included two female participants from Assam, made them wiki contributors, and mentored them to take a front-line role and even lead fieldwork operations alongside experienced and expert fieldworkers. As a result, those two Wikimedians are developing the necessary skills and motivation.

Pankajamala Sarangi

Pankajamala Sarangi: Ask anyone in the Indian community about Pankajamala (User:Pmsarangi) and they will tell you she is one of the most active contributors on Odia Wikisource. Yes, that’s correct, and for that reason she was the featured Wikimedian in January 2016. But that’s not everything. Her passion for Wikisource hasn’t been restricted to Odia; she is also one of the most active contributors on Hindi Wikisource (which is under incubation). She is helping Hindi Wikimedians today, training them on a one-to-one basis, and also participated in the Hindi Conference to make the case that Hindi needs a Wikisource. Hindi has now fulfilled the requirements of the Language Committee and could get its own domain at any time. User:Pmsarangi is one of those who deserve appreciation for that.

She dreams of having at least one woman from every family as an editor on Wikimedia projects.

Written by Abhinav Srivastava

Edited by Yohann Thomas

Choosing tools for continuous integration

00:36, Thursday, 07 March 2019 UTC

The Release Engineering team has started a working group to discuss and consider our future continuous integration tooling. Please help!

The RelEng team is working with SRE to build a continuous delivery and deployment pipeline, as well as changing production to run things in containers under Kubernetes. We aim to improve the process of making changes to the software behind our various sites by making it take less effort, happen faster, be less risky, and be as automated as possible. Developers will have a better development experience, be more empowered, and be more productive.

Wikimedia has had a CI system for many years now, but it is based on versions of tools that are reaching the end of their useful life. Those tools need to be upgraded, and this will probably require further changes due to how the new versions function. This is a good point to consider what tools and functionality we need and want.

The working group is tasked with considering the needs and wants, evaluating the available options, and making a recommendation of what to use in the future. The deadline is March 25. The work is being documented at https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/CI_Futures_WG and we're currently collecting requirements and candidates to evaluate.

We would welcome any feedback on those! Via IRC (#wikimedia-pipeline), on the talk page of the working group's wiki page above, or as a comment to this blog post.

Tech News issue #10, 2019 (March 4, 2019)

00:00, Monday, 04 March 2019 UTC

SIMD in WebAssembly – tales from the bleeding edge

21:54, Sunday, 03 March 2019 UTC

While benchmarking the AV1 and VP9 video decoders in ogv.js, I bemoaned the lack of SIMD vector operations in WebAssembly. Native builds of these decoders lean heavily on SIMD (AVX and SSE for x86, Neon for ARM, etc) to perform operations on 8 or 16 pixels at once… Turns out there has been movement on the WebAssembly SIMD proposal after all!

Chrome’s V8 engine has implemented it (warning: somewhat buggy still), and the upstream LLVM Wasm code generator will generate code for it using clang’s vector operations and some intrinsic functions.

emscripten setup

The first step in your SIMD journey is to set up your emscripten development environment for the upstream compiler backend. Install emsdk via git — or update it if you’ve got an old copy.

Be sure to update the tags as well:

./emsdk update-tags

If you’re on Linux you can download a binary installation, but there’s a bug in emsdk that will cause it not to update. (Update: this was fixed a few days ago, so make sure to update your emsdk!)

./emsdk install latest-upstream
./emsdk activate latest-upstream

On Mac or Windows, or to install the latest upstream source on purpose, you can have it build the various tools from source. There’s not a convenient “sdk” catch-all tag for this that I can see, so you may need to call out all the bits:

./emsdk install emscripten-incoming-64bit
./emsdk activate emscripten-incoming-64bit
./emsdk install upstream-clang-master-64bit
./emsdk activate upstream-clang-master-64bit
./emsdk install binaryen-master-64bit
./emsdk activate binaryen-master-64bit

The first build may take a couple of hours or so, depending on your machine.

Re-running the install steps will update the git checkouts and re-build, which usually doesn’t take as long as a fresh build but can still take some time.

Upstream backend differences

Be warned that with the upstream backend, emscripten can only build WebAssembly, not asm.js. If you’re making mixed JS & WebAssembly builds this may complicate your build process, because you’ll have to switch backends back and forth.

You can switch back to the current fastcomp backend at any time by swapping your sdk state back:

./emsdk install latest
./emsdk activate latest

Note that every time you switch, the cached libc will get rebuilt on your next emcc invocation.

Currently there are some code-gen issues where the upstream backend produces more local variables and such than the older fastcomp backend, which can cause a slight slowdown of a few percent (and, for me, a bigger slowdown in Safari, which does particularly well with the old compiler’s output). This is being actively worked on and is expected to improve significantly soon.

Starting Chrome

Now you’ve got a compiler; you’ll also need a browser to run your code in. Chrome’s V8 includes support behind an experimental runtime flag; currently it’s not exposed to the user interface so you must pass it on the command line.

I recommend using Chrome Canary on Mac or Windows, or a nightly Chromium build on Linux, to make sure you’ve got any fixes that may have come in recently.

On Mac, one can start it like so:

/Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary --js-flags="--experimental-wasm-simd"

If you forget the command-line flag or get it wrong, the module compilation won’t validate the SIMD instructions and will throw an exception, so you can’t run by mistake. (This is also apparently how you’re meant to test for SIMD support presence, AFAIK… compile a module and see if it works?)

Beware there are a few serious bugs in the V8 implementation which may trip you up. In particular, watch out for the broken splat operation, which can produce non-deterministic errors. For this reason I recommend disabling autovectorization for now, since you have no control over workarounds on the clang end. Non-constant bit shifts also fail to validate, requiring a scalar workaround.

Vector ops in clang

If you’re not working in C/C++ but are implementing your own Wasm compiler or hand-writing Wasm source, skip this section! ;) You’ll want to check out the SIMD proposal documentation for a list of available instructions.

First, forget anything you may have heard about emscripten including MMX compatibility headers (xmmintrin.h etc). They’ve been recently removed as they’re broken and misleading.

There’s also emscripten/vector.h, but it seems obsolete as well, with references to functions from the old SIMD.js implementation that no longer exist, and I recommend avoiding it for now.

The good news is that a bunch of vector stuff “just works” using standard clang syntax, and there are a few intrinsic functions for particular operations like the bitselect instruction and vector shuffling.

First, take some plain ol’ code and compile it with SIMD enabled:

emcc -o foo.html -O3 -s SIMD=1 -fno-vectorize foo.c

It’s important for now to disable autovectorization since it tends to break on the V8 splat bug for me. In the future, you’ll want to leave it on to squeeze out the occasional performance increase without manual intervention.

If you’re not using headers which predefine standard vector types for you, you can create some convenient aliases like so (these are just the types I’ve used so far):

#include <stdint.h> // for the fixed-width integer types used below

typedef int16_t int16x8 __attribute__((vector_size(16)));

typedef uint8_t uint8x16 __attribute__((vector_size(16)));
typedef uint16_t uint16x8 __attribute__((vector_size(16)));
typedef uint32_t uint32x4 __attribute__((vector_size(16)));
typedef uint64_t uint64x2 __attribute__((vector_size(16)));

The expected float and signed/unsigned integer interpretations for 128 bits are available, and you can freely cast between them to reinterpret bit sizes.

To work around bugs in the “splat” operation that expands a scalar to a vector, I made inline helper functions for myself:

static volatile int junk = 0;

static inline int16x8 splat_const(const int16_t val) {
    // Use this only on constants due to the v8 splat bug!
    return (int16x8){
        val, val, val, val,
        val, val, val, val
    };
}

static inline int16x8 splat_vec(const int16_t val) {
    // Try to work around issues with broken splat in V8
    // by forcing the input to be something that won't be reused.
    const int guarded = val + junk;
    return (int16x8){
        guarded, guarded, guarded, guarded,
        guarded, guarded, guarded, guarded
    };
}

Once the bug is fixed, it’ll be safe to remove the ‘junk’ and ‘guarded’ bits and use a single splat helper function. Though I’m sure there’s got to be a visibly clearer way to do a splat than manually writing out all the lanes and having the compiler coalesce them into a single splat operation? o_O

The bitselect operation is also frequently necessary, especially since convenient operations like min, max, and abs aren’t available on integer vectors. You might or might not be able to do this some cleaner way without the builtin, but this seems to work:

static inline int16x8 select_vec(const int16x8 cond,
                                 const int16x8 a,
                                 const int16x8 b) {
    return (int16x8)__builtin_wasm_bitselect(a, b, cond);
}

Note that the order of parameters on the bitselect instruction and the builtin has the condition last — I found this maddening, so my helper function takes the condition first, where my code likes it.

You can now make your own vector abs function:

static inline int16x8 abs_vec(const int16x8 v) {
    return select_vec(v < splat_const(0), -v, v);
}

Note that the < and - operators “just work” on the vectors; we only needed helper functions for the bitselect and the splat. And I’m still not confident I need those two?

You’ll also likely need the vector shuffle operation, for which there’s a standard builtin. For instance, here I’m deinterleaving 16-bit pixels into the 8-bit pixels and the extra high bytes:

static inline uint8x16 merge_pixels(const int16x8 work) {
    return (uint8x16)__builtin_shufflevector((uint8x16)work, (uint8x16)work,
        0, 2, 4, 6, 8, 10, 12, 14, // the 8 pixels we worked on
        1, 3, 5, 7, 9, 11, 13, 15  // zeroes we don't need
    );
}

Checking compiler output

To figure out what’s going on it helps a lot to disassemble the WebAssembly output to confirm what instructions are actually emitted — especially if you’re hoping to report a bug. Refer to the SIMD proposal for details of instructions and types used.

If you compile with -g your .wasm output will include function names, which make it much easier to read the disassembly!

Use wasm-dis like so:

wasm-dis foo.wasm > foo.wat

Load up the .wat in your code editor of choice (there are syntax highlighting plugins available for VS Code and I think Atom etc) and search for your function in the mountain of stuff.

Note in particular that bit-shift operations currently can produce big sequences of lane-shuffling and scalar bit-shifts. This is due to the LLVM compiler working around a V8 bug with bit-shifts, and will be fixed soon I hope.

If you wish, you can modify the Wasm source in the .wat file and re-assemble it to test subtle changes — use wasm-as for this.

Reporting bugs

You probably will encounter bugs — this is very bleeding-edge stuff! The folks working on it want your feedback if you’re working in this area, so please make the most of it by providing reproducible test cases for any bugs you encounter that can’t be chalked up to the existing splat argument corruption and non-constant shift bugs.

And beware that until the splat bug is fixed, non-deterministic problems can pop up really easily.

The various trackers:

weeklyOSM 449

09:51, Sunday, 03 2019 March UTC

19/02/2019-25/02/2019

Logo

Qwant launched an alpha version of an OSM based map 1 | © Qwant Maps © OpenMapTiles © OpenStreetMap contributors

About us

  • Are you reading something that belongs in weeklyOSM? Copy the link and write your message! Simply log in at https://osmbc.openstreetmap.de/login using your OSM user account, use the guest account to submit the link and then write your contribution. This way you can help to make weeklyOSM even better. Read more about how to write a post here.

Mapping

  • If the inventor of highway=unclassified had known how many discussions were going to be held about it, he or she would probably have named it differently. Florian Lohoff asked a question about the distinction between highway=unclassified and highway=residential and has received around 50 responses so far, with the number still trending upward. (Nabble)
  • Stefan Keller presented (de) (automatic translation) an improvement to his web tool, which converts freely formatted opening hours copied from web pages into opening_hours syntax. As previously noted on the Swiss list (de) (automatic translation), he is seeking feedback.
  • Mapillary features an article about the lesser-known OSM tool, Deriviste. Deriviste, introduced by Richard Fairhurst in October 2018, allows users to create OSM data while looking at Mapillary images. For example, a click on an image is translated into a data point. The comprehensive blog post contains background information and details how Deriviste should be used, including practical examples.
  • Yuu Hayashi announced (ja) (automatic translation) a new proposal to change the Japanese road types. This proposal includes cutbacks of redundant tags and content improvement that corrects the gap between global standards and Japanese tagging. He asks for community feedback and is thinking about starting the vote on 10 March.

Community

  • OpenStreetMap France announces (fr) (automatic translation) on Twitter that a donation campaign has raised 10,081 €. The first use of these funds was to buy extra server RAM.
  • On the Talk-GB mailing list Frederik Ramm asks if anyone knows of a non-profit organisation that has moved from the UK to elsewhere (prompted by the “lack of clarity” over Brexit). Simon Poole elaborates some more reasons – including that the UK doesn’t have a form of incorporation that is well suited to a non-profit like OSM.
  • The nominations closed on 20 February 2019 for the board election of OSM US and five people will run for five seats.
  • The “Mapper of the Month” February 2019 chosen by OpenStreetMap Belgium is Volker Schmidt (OSM voschix). He has contributed to OSM for nearly 9 years and recently has been mainly active in Padova, where he currently lives.

OpenStreetMap Foundation

  • The minutes of the Licence Working Group meeting on 14 February 2019 have been published. Topics included the handling of inquiries from law enforcement agencies and a letter from the Belgian Ministry of Defence.
  • “Can you help make OpenStreetMap.org faster in Brazil, or Australia/New Zealand?” asks the Operations Working Group as the demand on OSM services is increasing and for example reaching around 65 Mbps in Brazil.
  • The OSMF Membership Working Group (MWG) uncovered the unusual signups in advance of the recent OSMF board elections and concluded that the signup of 100 employees of the Indian subsidiary of GlobalLogic (GL) was an orchestrated and directed campaign and that the applicants did not sign up voluntarily, personally and individually. The findings were sent to the OSMF board at the end of December and later shared with the membership by the MWG. In February, GlobalLogic offered to withdraw the memberships following an online meeting between the OSMF board and GlobalLogic representatives at the end of January. During the regular board meeting on 20 February 2019 the board decided to accept GL’s offer to withdraw the memberships. The OSMF also made a summary of the events available to the public.

Humanitarian OSM

  • HOT has set up four mapping tasks after a 7.5-magnitude earthquake struck the Peru-Ecuador border region.
  • HOT is looking for a new logo for its HOT Summit 2019 in Heidelberg, Germany. The deadline for the participation in this contest is 6 March 2019.
  • HeiGIT reports on what’s going on with the MapSwipe app, a micro-tasking app for supporting mapping, for example in the humanitarian sector. A next step is, for example, the integration of the results of machine learning based building detection.
  • Melanie Eckle, from Heidelberg Institute for Geoinformation Technology (HeiGIT), has been invited to present HeiGIT as well as HOT during the Geo4SDGs at the Geospatial World Forum 2019 in Amsterdam. She will provide an overview of current HOT projects.
  • The minutes of the HOT board meeting on 7 February 2019 have been released.
  • The author Nicole Martinelli, who belongs to the team behind the Resiliency Maps website, published an article about the importance of open data and tools for disaster resilience. One interesting point is that she uses San Francisco as an example.
  • HOT announced the availability of an Instagram page, which you can follow to receive the latest updates.

Maps

  • [1] Qwant, “The search engine that respects your privacy”, based in Paris, has launched (fr) (automatic translation) “Qwant Maps”. The service is based on OSM data. Qwant follows the OSMF’s copyright guidelines and promotes our project with an additional link to LearnOSM.

switch2OSM

  • On his blog Krzysztof Grajek explains the use of prefabricated docker images for setting up one’s own OSM tile server as an alternative to the Google Maps service, which can become expensive if your website has higher traffic volumes.

Open Data

  • The winners of the Open Data Day mini-grants, which are provided by the Open Knowledge organisation, have been announced. Some OSM related projects are amongst the winners.

Software

  • OsmAnd Online GPS Tracker, not to be confused with the well-known OsmAnd navigation app, has been upgraded to version 0.4. The new version allows its users to share a track between specific times and to send their location with a timer, i.e. it stops sending the current location after a defined time.

Programming

  • In a tweet Julien Coupey announced his results of the recent Karlsruhe Hack Weekend, which look interesting even if you are not a Chinese postman.
  • The company Komoot, known for its OSM-based cycling and hiking app and the geocoder Photon, has an open vacancy for a Backend Engineer.
  • The German Federal Ministry of Transport and Digital Infrastructure is sponsoring the project TARDUR (de) (automatic translation) (Temporal Access Restrictions for Dynamic Ultra-Flexible Routing) of Heidelberg University and GraphHopper GmbH as part of its mFund programme. The project aims to improve the usage of access restrictions with temporal conditions in OSM.
  • mmd would like to introduce limits of 5000 tags per object and 32,000 members per relation for the OSM API. In the ensuing discussion, Simon Poole expresses his doubts about the introduction of “arbitrary” limits.

Releases

  • Quincy Morgan announced the release of the new version of iD. The highlights of version 2.14.0 are the introduction of live issue flagging and feature validation with the inclusion of recommended fixes, integration of Telenav’s ImproveOSM data detection tools to identify missing roads, as well as various other fixes and improvements. The full changelog can be found here.
  • QGIS has reached version 3.6. This version includes new features such as a new decorator for titles, map canvas copyright and title decorations that can now be centred at the top or bottom, and the requirement to select a feature before you can edit it. There are many more new and improved features that can be read about in the changelog.
  • The lightweight desktop GIS software Simple GIS (for Windows) has been upgraded. Version 11 allows you to build custom data forms, adds support for layer selection when converting PDF to GeoTIFF, can now deal with ArcGrid and File Geodb datasets, and adds many more new features.

Did you know …

  • … the possibility of recording the bus lines that serve a bus stop without having to record the complete route? The “route_ref” tag allows you to enter the line numbers at a bus stop and thus helps others to enter the complete route later.
  • … the swift LocalFocus-Batch-Geocoder? The geocoder uses open data from OSM, OpenAddresses, and Who’s On First via a copy-and-paste interface.
  • … the wiki page listing the maps generated from OSM?
  • … Joel Hansen’s quick guide to quickly drawing many buildings with JOSM and the buildings_tools plugin?

Other “geo” things

  • Mikel Maron, head of community at Mapbox and an OSMF board member, elaborated on the importance of mapping, open data and crowdsourcing efforts, such as HOT, in responding to natural disasters. Rapidly evolving technologies can help governments, communities, and aid agencies reduce the impact of natural disasters. He points out that on this basis the transport and logistics industry is in a position to support crisis initiatives proactively, promptly and precisely.

Upcoming Events

Where What When Country
Manila 【MapaTime!】 Open Data Day Celebration 2019-03-02 philippines
Amagasaki IODD:尼崎港線アーカイブダンジョン 2019-03-02 japan
Taipei Open Data Day Taiwan 2019 2019-03-02 taiwan
Wuppertal [Wuppertaler Opendata Day] 2019-03-02-2019-03-03 germany
London Missing Maps London Mapathon 2019-03-05 uk
Stuttgart Stuttgarter Stammtisch 2019-03-06 germany
Praha/Brno/Ostrava Kvartální pivo 2019-03-06 czech republic
Dresden Stammtisch Dresden 2019-03-07 germany
Nantes Réunion mensuelle 2019-03-07 france
Ivrea Incontro mensile 2019-03-09 italy
Oslo OSM-beer 2019-03-08 norway
Rennes Réunion mensuelle 2019-03-11 france
Zurich OSM Stammtisch Zurich 2019-03-11 switzerland
Lyon Rencontre mensuelle pour tous 2019-03-12 france
Salt Lake City SLC Mappy Hour 2019-03-12 united states
Arlon Espace public numérique d’Arlon – Formation Initiation 2019-03-12 belgium
Munich Münchner Stammtisch 2019-03-13 germany
Dresden FOSSGIS 2019 2019-03-13-2019-03-16 germany
Berlin 129. Berlin-Brandenburg Stammtisch 2019-03-14 germany
Kyoto 京都!街歩き!マッピングパーティ:第6回 善峯寺 2019-03-17 japan
Chemnitz Chemnitzer Linux-Tage 2019 2019-03-16-2019-03-17 germany
Taipei OSM x Wikidata #2 2019-03-18 taiwan
Cologne Bonn Airport Bonner Stammtisch 2019-03-19 germany
Nottingham East Midlands Pub meetup 2019-03-19 england
Scotland Edinburgh 2019-03-19 uk
Salt Lake City SLC Map Night 2019-03-19 united states
Lüneburg Lüneburger Mappertreffen 2019-03-19 germany
Toulouse Rencontre mensuelle 2019-03-20 france
Karlsruhe Stammtisch 2019-03-20 germany
Nagoya 図書で調べて編集するオープンデータワークショップ 2019-03-21 japan
Greater Vancouver area Metrotown mappy Hour 2019-03-22 canada
Tokyo ミャンマーに絵本と地図を届けよう~ミャンマーに届ける翻訳絵本作り&自由な世界地図作り~ 2019-03-23 japan
Portmarnock Erasmus+ EuYoutH_OSM Meeting 2019-03-25-2019-03-29 ireland
Montpellier State of the Map France 2019 2019-06-14-2019-06-16 france
Angra do Heroísmo Erasmus+ EuYoutH_OSM Meeting 2019-06-24-2019-06-29 portugal
Minneapolis State of the Map US 2019 2019-09-06-2019-09-08 united states
Edinburgh FOSS4GUK 2019 2019-09-18-2019-09-21 united kingdom
Heidelberg Erasmus+ EuYoutH_OSM Meeting 2019-09-18-2019-09-23 germany
Heidelberg HOT Summit 2019 2019-09-19-2019-09-20 germany
Heidelberg State of the Map 2019 (international conference) 2019-09-21-2019-09-23 germany
Grand-Bassam State of the Map Africa 2019 2019-11-22-2019-11-24 ivory coast

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Nakaner, Polyglot, Rainero, Rogehm, Sheeplieder, SunCobalt, TheSwavu, YoViajo, derFred, jinalfoflia, kartonage, keithonearth.

Readable Functions: Minimize State

15:32, Friday, 01 2019 March UTC

Several tricks and heuristics that I apply to write easy-to-understand functions keep coming up when I look at other people’s code. In this post I share the first of two key principles for writing readable functions. Follow-up posts will contain the second key principle and specific tricks, often building on these general principles.

What makes functional programming so powerful? Why do developers who have mastered it say it makes them so much more productive? What amazing features or capabilities does the functional paradigm provide to enable this enhanced productivity? The answer is not what you might expect if you have never looked into functional programming. The power of the functional paradigm does not come from new functionality; it comes from restricting something we are all familiar with: mutable state. By minimizing or altogether avoiding mutable state, functional programs skip a great source of complexity, thus becoming easier to understand and work with.

Minimize Mutability

If you are doing Object Oriented Programming you are hopefully aware of the drawbacks of having mutable objects. Similar drawbacks apply to mutable state within function scope, even if those functions are part of a procedural program. Consider the below PHP code snippet:

function getThing() {
    $thing = 'default';

    if (someCondition()) {
        $thing = 'special case';
    }

    return $thing;
}

This function is needlessly complex because of mutable state. The variable $thing is in scope in the entire function and it gets modified. Thus, to understand the function, you need to keep track of the value that was assigned and how that value might get modified or overridden later on. This mental overhead can easily be avoided by using what is called a Guard Clause:

function getThing() {
    if (someCondition()) {
        return 'special case';
    }
    
    return 'default';
}

This code snippet is easier to understand because there is no state. The less state, the fewer things you need to remember while simulating the function in your head. Even though the logic in these code snippets is trivial, you can already notice how the Accidental Complexity created by the mutable state makes understanding the code take more time and effort. It pays to write your functions in a functional manner even if you are not doing functional programming.

Minimize Scope

While mutable state is particularly harmful, non-mutable state also comes with a cost. What is the return value of this function?

function getThing() {
    $foo = 1;
    $bar = 2;
    $baz = 3;

    $meh = $foo + $baz * 2;
    $baz = square($meh);

    print($baz);
    return $bar;
}

It is a lot easier to tell what the return value is when refactored as follows:

function getThing() {
    $foo = 1;
    $baz = 3;

    $meh = $foo + $baz * 2;
    $baz = square($meh);
    print($baz);

    $bar = 2;
    return $bar;
}

To understand the return value you need to know where the last assignment to $bar happened. In the first snippet you need, for no reason at all, to scan all the way up to the first lines of the function. You can avoid this by minimizing the scope of $bar. This is especially important if, as in PHP, you cannot declare function-scope values as constants. In the first snippet you likely spotted that $bar = 2 before you went through the irrelevant details that follow, but you still had to check that it does not get reassigned. If instead the code had been const bar = 2, as you can do in JavaScript, you would not have needed to make that effort.

Conclusion

With this understanding we arrive at two guidelines for dealing with whatever state in your functions you cannot avoid altogether in the first place. Thou shalt:

  • Minimize mutability
  • Minimize scope

Indeed, these are two very general directives that you can apply in many other areas of software design. Keep in mind that these are just guidelines that serve as a starting point. Sometimes a little state or mutability can help readability.

To minimize scope, create state as close as possible to where it is needed. The worst thing you can do is declare all state at the start of a function, as this maximizes scope. Yes, I’m looking at you, JavaScript developers and university professors. If you find yourself in a team or community that follows the practice of declaring all variables at the start of a function, I recommend not going along with this custom, because its harmful nature outweighs the “consistency” and “tradition” benefits.

To minimize mutability, stop every time you are about to override a variable and ask yourself whether you can simplify the code. The answer is nearly always that you can, via tricks such as Guard Clauses, many of which I will share in follow-up posts. I myself rarely end up mutating variables, less than once per 1000 lines of code. Because each removal of harmful mutability makes your code easier to work with, you reap the benefits incrementally and can start applying this style right away. If you are lucky enough to work with a language that has constants in function scope, use them as the default instead of variables.
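As one small illustration of the kind of rewrite this mindset leads to (my own sketch, not one of the tricks from the follow-up posts), compare a mutating accumulator with an expression-based equivalent:

function getTotalCents(array $donations): int {
    // Mutable version: $total gets reassigned on every iteration.
    $total = 0;
    foreach ($donations as $donation) {
        $total += $donation['amountInCents'];
    }
    return $total;
}

function getTotalCentsWithoutMutation(array $donations): int {
    // No reassignment to track: the result is a single expression.
    return array_sum(array_column($donations, 'amountInCents'));
}

The second version leaves nothing to keep track of while reading, which is exactly the point of the guideline.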

See also

Thanks to Gabriel Birke for proofreading and making some suggestions.

The post Readable Functions: Minimize State appeared first on Entropy Wins.

Clean Architecture + Bounded Contexts

07:12, Friday, 01 2019 March UTC

In this follow-up to Implementing the Clean Architecture I introduce you to a combination of The Clean Architecture and the strategic DDD pattern known as Bounded Contexts.

At Wikimedia Deutschland we use this combination of The Clean Architecture and Bounded Contexts for our fundraising applications. In this post I describe the structure we have and the architectural rules we follow in the abstract. For the story on how we got to this point and a more concrete description, see my post Bounded Contexts in the Wikimedia Fundraising Software. In that post and at the end of this one I link you to a real-world codebase that follows the abstract rules described in this post.

If you are not yet familiar with The Clean Architecture, please first read Implementing the Clean Architecture.

Clean Architecture + Bounded Contexts

Diagram depicting Clean Architecture + Bounded Contexts, by Jeroen De Dauw, Charlie Kritschmar, Jan Dittrich and Hanna Petruschat.

In the top layer of the diagram we have applications. These can be web applications, they can be console applications, they can be monoliths, they can be microservices, etc. Each application has presentation code which in bigger applications tends to reside in a decoupled presentation layer using patterns such as presenters. All applications also somehow construct the dependency graph they need, perhaps using a Dependency Injection Container or set of factories. Often this involves reading configuration from somewhere. The applications contain ALL framework binding, hence they are the place where you will find the Controllers if you are using a typical web framework.

Since the applications are in the top layer, and dependencies can only go down, no code outside of the applications is allowed to depend on code in the applications. That means there is 0 binding to mechanisms such as frameworks and presentation code outside of the applications.

In the second layer we have the Bounded Contexts, ideally one Bounded Context per subdomain. At the core of each BC we have the Domain Model and Domain Services, containing the business logic of the subdomain. Dependencies can only point inwards, so the Domain Model, which is at the center, cannot depend on anything further out. Around the Domain Model are the Domain Services. These include interfaces for persistence services such as Repositories. The UseCases form the final ring. They can use both the Domain Model and the Domain Services. They also form a boundary around the two, meaning that no code outside of the Bounded Context is allowed to talk to the Domain Model or Domain Services.

The Bounded Contexts include their own Persistence Layer. The Persistence Layer can use a relational database, files on the file system, a remote web API, a combination of these, etc. It has implementations of domain services such as Repositories which are used by the UseCases. These implementations are the only thing that is allowed to talk to and know about the low-level aspects of the Persistence Layer. The only things that can use these service implementations are other Domain Services and the UseCases.

The UseCases, including their Request Models and Response Models, form the public interface of the Bounded Context. This means that there is 0 binding to the persistence mechanisms outside of the Bounded Context. It also means that the code responsible for the domain logic cannot be directly accessed elsewhere, such as in the presentation layer of an application.
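To make the shape of that public interface more concrete, here is a minimal sketch of a UseCase with its Request and Response models and a Repository interface. All names are hypothetical and invented for illustration; they are not taken from the actual Fundraising codebase:

class CancelMembershipRequest {
    private $membershipId;

    public function __construct(int $membershipId) {
        $this->membershipId = $membershipId;
    }

    public function getMembershipId(): int {
        return $this->membershipId;
    }
}

class CancelMembershipResponse {
    private $success;

    public function __construct(bool $success) {
        $this->success = $success;
    }

    public function isSuccess(): bool {
        return $this->success;
    }
}

// Part of the Domain Model, greatly simplified.
class Membership {
    private $isCancelled = false;

    public function cancel(): void {
        $this->isCancelled = true;
    }

    public function isCancelled(): bool {
        return $this->isCancelled;
    }
}

// Domain Service interface. Its implementation lives in the Bounded Context's
// persistence layer and is never visible outside of the context.
interface MembershipRepository {
    public function getMembershipById(int $membershipId): ?Membership;
    public function storeMembership(Membership $membership): void;
}

// The UseCase is the only entry point applications are allowed to call. It
// depends on the Request/Response models, the Domain Model and the Domain
// Service interfaces, never on frameworks or presentation code.
class CancelMembershipUseCase {
    private $repository;

    public function __construct(MembershipRepository $repository) {
        $this->repository = $repository;
    }

    public function cancelMembership(CancelMembershipRequest $request): CancelMembershipResponse {
        $membership = $this->repository->getMembershipById($request->getMembershipId());

        if ($membership === null) {
            return new CancelMembershipResponse(false);
        }

        $membership->cancel();
        $this->repository->storeMembership($membership);

        return new CancelMembershipResponse(true);
    }
}

An application's presentation layer (a controller or console command, for example) constructs the request model, invokes the UseCase, and renders the response model; it never touches the Membership entity or the repository directly.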

The applications and Bounded Contexts contain all the domain-specific code. This code can make use of libraries and, of course, the runtime (i.e. PHP) itself.

As examples of Bounded Contexts following this approach, see the Donation Context and Membership Context. For an application following this architecture, see the FundraisingFrontend, which uses both the Donation Context and the Membership Context. Both of these contexts are also used by another application, the code of which is sadly not currently public. You can also read the stories of how we rewrote the FundraisingFrontend to use the Clean Architecture and how we refactored towards Bounded Contexts.

Further reading


If you are not yet familiar with Bounded Contexts or how to design them well, I recommend reading Domain-Driven Design Distilled.

The post Clean Architecture + Bounded Contexts appeared first on Entropy Wins.

Debugging production with X-Wikimedia-Debug

03:29, Friday, 01 2019 March UTC

In February 2018, a user reported that some topics created by users on Flow discussion boards were not appearing in the Recent Changes feeds, including EventStreams and the IRC-RC feed. Various automated patrol systems rely on EventStreams, so the bug meant a number of edits bypassed those systems on Flow-enabled wikis.

When approaching a bug like this, there are typically three things I do:

  1. Determine the steps to reproduce the bug. That was already done by the task author (thank you @Rxy!) and then confirmed by other contributors to the task (h/t @Krinkle, @Etonkovidova)
  2. Attempt to reproduce the issue locally and set breakpoints in code to understand why the problem occurs
  3. Check the production logs to look for any messages related to the bug report

Unfortunately the problem was not reproducible in the MediaWiki Vagrant development environment. Nor were there any relevant messages in the logs. Since reproducing the issue locally wasn't possible, we merged some diagnostic code but still had nothing. Early on, @SBisson suggested a hypothesis about the code path involved in emitting the event:

if ( user is trusted ) 
  return true
else
  let's load the revision from a replica, return true based on the status of the revision
  oh it doesn't exist (yet), return false
But we could not reproduce this, nor could we identify exactly where this might occur since the code paths for this functionality had many points where execution could stop silently.

Enter X-Wikimedia-Debug

One of the useful tools in our stack is the X-Wikimedia-Debug header. I knew about this header (and its browser extensions) from verifying changes that were being SWAT'ed into production but I had not thought to use it for tracking down a production bug.

I was using the browser extension with the "Log" checkbox ticked (and still not finding anything useful in Logstash to help isolate this bug) when I realized that I could also profile the problematic request. When you check the box to profile a request, XHProf will profile the code that's executed and make the result available for viewing via XHGui.

Typically you do this to understand performance bottlenecks in your code, as you get a complete list of all functions executed during the request, along with the time and memory usage associated with each function.

I followed the steps to reproduce and then switched on the "Profile" option before posting a new topic on an empty Flow board. Now, I had a profiled request which provided me with information on all the methods called, including which method called another (click on a method call to see its parent and children method calls). From here I could follow the path traversed by Flow's event emitting code, and see exactly where the code execution halted.

Reproducing the bug locally

With this knowledge, I went back to my local environment, this time using MediaWiki-Docker-Dev, which has database replication set up as part of its stack (MediaWiki Vagrant does not). I set some breakpoints in the code I suspected was causing the problem, and then found that in RevisionActionPermissions.php#isBoardAllowed(), we had this code:

$allowed = $this->user->isAllowedAny( ...(array)$permissions );
if ( $allowed ) {
    return true;
}
return !$workflow->isDeleted();

For a new topic on a blank Flow board, $permissions is deletedtext, so the check returns true for privileged users. But for unprivileged users, Flow would fall through to !$workflow->isDeleted(), and this evaluated to false because the code was querying a database replica where the title did not exist yet.

The submitted solution was to patch isDeleted() to query the master DB when in the context of a POST request, since we know the title would exist in the master DB. With this patch in place, events were once again emitted properly and the bug was fixed.
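The general shape of such a replica/master fallback looks roughly like the sketch below. This is a simplified illustration rather than the actual Flow patch, and the WorkflowLookup class is invented for the example; only the MediaWiki facilities used (DB_REPLICA, DB_MASTER, ILoadBalancer, WebRequest::wasPosted()) are real:

class WorkflowLookup {
    private $loadBalancer;

    public function __construct( \Wikimedia\Rdbms\ILoadBalancer $loadBalancer ) {
        $this->loadBalancer = $loadBalancer;
    }

    public function isDeleted( int $pageId ): bool {
        // Normal reads go to a replica.
        $exists = $this->pageExists( $pageId, DB_REPLICA );

        // During a POST (e.g. right after the topic was created) the row may
        // not have replicated yet, so check the master before concluding
        // that the page does not exist.
        if ( !$exists && RequestContext::getMain()->getRequest()->wasPosted() ) {
            $exists = $this->pageExists( $pageId, DB_MASTER );
        }

        return !$exists;
    }

    private function pageExists( int $pageId, int $dbIndex ): bool {
        $db = $this->loadBalancer->getConnection( $dbIndex );
        return (bool)$db->selectField( 'page', '1', [ 'page_id' => $pageId ], __METHOD__ );
    }
}

With a fallback along these lines, the event-emitting code sees the freshly created title even before replication catches up, which matches the fix described above.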

Conclusion

A few of my conclusions from this experience:

  • If you're having difficulty tracking down the code path, consider using the profiler in the X-Wikimedia-Debug browser extension
  • Diagnostic code is helpful (even if it didn't pinpoint the problem here) and debug level logging should be considered instead of silent returns
  • Having database replication in your local development environment can help catch issues while developing and when attempting to reproduce a production issue. One can use the MediaWiki-Docker-Dev environment for this, and see also how to adjust its database replication lag.

Kosta Harlan
Senior Software Engineer
Growth Team


Learn more about the X-Wikimedia-Debug header and browser extension on Wikitech.

After a long legislative process, the final text of the EU Copyright Directive was cemented last week as trilogue negotiations between the EU Commission, Parliament, and Council came to a close. Now that the final text has been made available, with only a yes-no vote in Parliament standing in the way of its implementation, Wikimedia has determined that we cannot support the reform as is. Here’s why.

The evolution of the directive’s text

Over the past few years, we have spoken out against problematic parts of the proposed EU Copyright Directive. Initially, we were hopeful. Our community was supportive of reform, engaging with the European Commission before the directive was proposed, and with MEPs and Member States representatives early on about what they hoped to see in the directive. Among these asks were a broad freedom of panorama exception, so that photographers can freely take and share photographs of public art and buildings, and a greater harmonization of rules around the public domain, so that faithful reproductions of public domain works would not be subject to new rights.

Yet, the EU Commission presented a one-sided proposal and added worrying elements to the directive. As the community saw their suggestions set aside in favor of provisions benefiting large rightsholder industries and news publishers, criticism of the directive grew. The two most harmful provisions, Articles 11 and 13, have stuck around despite that criticism and are now a part of the final text that the EU Parliament and EU Council concluded last week. Although some good elements were included in the reform package, it is impossible for Wikimedia to support a reform that includes these two articles. In a final step, the EU Copyright Directive will return to the parliament this spring for its last vote.

Despite carve outs, a net loss for free knowledge

Article 11, which is aimed at news aggregators but has a much wider reach, will require licenses for all online uses of news content apart from a few exceptions. This means that websites which aggregate, organize, or make sense of the news will no longer be able to display snippets alongside those articles, making it much harder for users to find and use information online. We are thankful that Article 11 at least includes a few exceptions, for individual, non-profit uses or “individual words” or “very short extracts”. However, by making information harder to find online, Article 11 affects our volunteer contributors’ ability to improve Wikipedia, especially with European-specific sources.

Article 13 will impose liability on platforms for copyright-infringing content uploaded by users unless they meet a number of stringent requirements. The provision requires that websites make “best efforts” to obtain authorization for all content on their websites as well as quickly removing infringing content and preventing its re-upload. These are tall tasks for any platform that allows a large number of users to upload content, and only the most sophisticated and well-funded websites will be able to develop the technology to enforce these rules themselves. This will dramatically decrease the diversity of content available online if websites strictly comply with these requirements, and it sets up a system for private enforcement of copyright through upload filters that can lead to over-removal of content due to fear of liability and false positives. As content outside of Wikipedia shrinks, so will the depth, accuracy, and quality of Wikipedia’s content. We rely on the outside world to build our collaborative encyclopedia, and what affects the internet ecosystem as a whole affects Wikipedia, regardless of direct legal carve-outs.

Still, given the uphill battle we have faced, the free knowledge community can be proud of the impact that they had on the reform so far. The current text includes a broad exception for text and data mining, a digitization safeguard for public domain works, an out-of-commerce works provision that will make more cultural heritage content available online, and a carve-out in an attempt to limit the text’s detrimental effect on non-commercial projects.

These are good measures, and they aim to do what the reform originally set out to do: bring century-old legislation in line with the digital future we are facing. They also painfully remind us that the other parts of this reform are not future-oriented.

Bottom line: free knowledge impacts more than just Wikipedia

We will be asked why we are unhappy with this reform if certain non-commercial projects are mostly carved out and we can even point to some improvements for the public domain. Well, these measures don’t make this a good or balanced reform. Despite some good intentions, the wholly problematic inclusion of Articles 11 and 13 means that fundamental principles of knowledge sharing are overturned: in practice, users and projects will have to prove they are allowed to share knowledge before a platform permits an upload. The EU Copyright Directive envisions a technical and legal infrastructure that treats user-generated content with suspicion unless proven legal. We cannot support this—it is better to have no reform at all than to have one including these toxic provisions.

There will be one final yes-no vote on the directive in Parliament, to take place in late March. This vote is one last chance for the Wikimedia community in Europe to tell the European Parliament that they will not stand for a copyright reform that hands out exceptions to the open community without considering the internet ecosystem as a whole. At this point in the legislative process, it is past the time for amendments and negotiations. The European Parliament should reject the whole reform. With a text this controversial and many MEPs up for re-election in May, it would be prudent to reject the proposal as it stands and continue to work on a solution with a fresh mandate.

It’s not too late for Europe to have a positive copyright reform, but it will be soon. That is why our affiliates across Europe are organizing their communities to take action. Find out more about their progress and actions.

Allison Davenport, Technology Law and Policy Fellow, Legal
Wikimedia Foundation

Learn more about the EU Copyright Directive:

A powerful online resource for digital marketing

13:33, Thursday, 28 2019 February UTC

The Munich software company Ryte has been using a BlueSpice 3 wiki from Hallo Welt! GmbH for its reference work on digital marketing. The resources maintained in this way are regularly cited in academic theses and make up a large part of the sessions logged on the Ryte website.

“At the start, our developers simply built and maintained a wiki just for us. We had nothing like we have now,” explains Pauline Mitifiot. She is a marketing expert at Ryte GmbH in Munich, a SaaS (Software as a Service) company founded in 2012. Now, Ryte’s wiki has become an important treasury, a leading online resource of its type. In February 2018, the encyclopaedia was transferred to a BlueSpice MediaWiki wiki from Hallo Welt! GmbH. The reason: “the wiki had started to consume too much of our developers’ time and energy. And we needed them to concentrate fully on our product,” said Mitifiot.

 

Ryte – one of the quickest growing tech companies in Europe

Ryte’s success began in 2012 with the founding of OnPage.org and the provision of cloud-based software focussing on sustainable improvements and increased performance for websites. After rebranding in 2017, Ryte took a further step forward: the Ryte Suite with its three tools, “Website Success”, “Content Success” and “Search Success”, offers professional tools for you to comprehensively monitor, analyse and optimise your websites. “Ryte combines into one tool everything you need to simply and lastingly optimise websites,” explains Pauline Mitifiot.

Now more than 700,000 users are taking advantage of the award-winning software, including well-known companies like Allianz, Daimler, Sixt and Burda to mention just a few. With over 60 members of staff and, alongside their base in Munich, branches in Madrid and Ho Chi Minh City, Ryte is one of the quickest growing tech companies in Europe.

 

The office of Ryte GmbH in Munich
The office of Ryte GmbH in Munich

 

A reference work cited at universities

As soon as the company was founded in 2012, work began on building a reference work on everything related to digital marketing. Though at the beginning it was mainly the company’s own employees who used the encyclopaedia, over time more and more external users started to access it. “Our entries are often cited in master’s theses at universities.”

As the encyclopaedia became more and more successful, the decision was quickly made to transfer it to a wiki. “We needed a tool which was technically simple with a well-organised and easy to understand range of functions. It had to provide an easily operated search function and not only make our resources more attractive but also offer the external user the opportunity to work with them,” said Mitifiot. External expertise was needed for this and while searching for a competent wikimedia agency, Ryte hit upon Hallo Welt! GmbH in Regensburg.

 

Multilingual wiki with about 1,500 pages

Since February 2018, there has been the new Ryte wiki, based on BlueSpice 3 software and tailored to Ryte’s corporate design, which can so far be accessed in four languages: German, English, Spanish and French. Five employees from Ryte’s marketing team have admin permissions so they can update content and proofread the contributions of external users. A reliable point of contact at Hallo Welt! guarantees the technical support required.

The wiki contains about 1,500 pages where you can find terminology, exact definitions and further interesting aspects of digital marketing, classified into seven categories: online marketing, search engine optimization, social media, usability, mobile marketing, web analysis and development. “It is particularly important for us that our encyclopaedia, in accordance with the wiki code, does not contain any advertising,” emphasises Pauline Mitifiot. “This not only means that only credible and relevant data is included, but also that we have a very high positioning with search engines.” (sa – 02/2019)

Ryte GmbH
Paul-Heyse-Str. 27
80336 München
Germany
Phone: +49 (0) 89 416 1151 0
www.ryte.com

Your contact partner on the subject of success stories with Hallo Welt! GmbH:
David Schweiger
Telephone: +49 (0)941 660 800
www.bluespice.com

Interested in BlueSpice pro? For a 30 day trial click here:
bluespice.com/bluespice-pro-evaluation

The post A powerful online resource for digital marketing appeared first on BlueSpice Blog.

Well-organised knowledge management for about 2,000 users

12:46, Thursday, 28 2019 February UTC

Task breakdown, functionality, examples and training material: seven German states embrace a BlueSpice MediaWiki from Hallo Welt! GmbH, which makes it easy for members of staff to use a shared editing and information system on land consolidation and explains all the topics that arise.

It is a mammoth project that was undertaken about seven years ago by the German states Brandenburg, Hesse, Mecklenburg-Vorpommern, Lower Saxony, North Rhine-Westphalia, Rhineland-Palatinate and Saxony-Anhalt: the implementation of a joint editing and information system for all issues relating to land consolidation (Flurbereinigung), working with a uniform data set and the technical standards connected with it.

The states took delivery of the system, named with the German acronym LEFIS, in December 2015. Some states have been using it since 2018, and others are just about to start. In the final expansion phase, about 2,000 people will work with the system – administrators, technicians, project managers and departmental heads, planners and engineering firms who work with land consolidation.

“When, for example, a new street is being built, we need to list the individual pieces of land, match them with their owners and evaluate them so that the owners can be properly compensated and suitable replacement areas can be put together,” explains Ralf Meier. He is a surveyor at the Service Centre for Rural Development and Agricultural Support in the state of Lower Saxony, within the “Land Consolidation and Geoinformation” department.

 

The same knowledge at the same time for all

It has been four years now since Meier started the introduction of a MediaWiki in which the functionality of LEFIS is collected and explained with concrete examples. Meier lists the main arguments for choosing a wiki as its great flexibility and customisability. “With it, we can keep all the knowledge completely up-to-date for all. When we make a change, it is immediately available to everyone. By looking at the history, everyone can see clearly when what was changed.” During the search for a provider, Hallo Welt! GmbH with its BlueSpice MediaWiki came out ahead of the competition.

 

Screenshot of the BlueSpice MediaWiki: calling a function in the “LEFIS” application.

 

From installation to concrete use-cases

“This starts with information on installing the program, then includes details on the general functions, gives concrete examples, includes use-cases which are specific to individual states and, last but not least: training material,” explains Meier. “The principle is that our wiki offers knowledge management for the user, who can then find the information they need according to what they already understand.” The book navigation system is used to structure the contents in the wiki, which is offered in BlueSpice MediaWiki with the extension “Bookmaker”. This places the contents in a hierarchy of chapters and subchapters. Meier describes it in this way:

“At the start, there is a concrete definition of the task, then there is a link to a description of the software’s functionality. Those who want to know more about this can look at concrete examples, where necessary with photos of the LEFIS masks, and so be able to trace through how the project needs to be carried out in detail.” Here, great emphasis is placed on user-friendliness. One of the ways this is achieved is with a “create function”. Using this, the user can create a new page without having to understand the deeper wiki structure. The structure is defined by the templates and categories provided.

 

Topic hierarchy instead of long running texts

The whole system functions very clearly with expandable drop-down menus and clickable info buttons. “In the PDF handbook, you had to tortuously hunt through long texts, and sometimes search in hundreds of documents to find what you wanted to know. In the wiki, you only read the content you need and want. You can progress from brief descriptions down into the detail, for example concrete use-cases, when one needs such details.”

Put simply, for the task “import data from the land register management”, some users only need to see the heading of the task description, while others want to know which entries need to be made in the function dialogue. Depending on what they already understand, the task description may be enough for them, or they may need to dive deeper into the functional details and perhaps even take into account state-specific cases. If that is not enough, then the member of staff can see an example for each step of the work. The structures are defined. The contents are being developed.

 

“The application continues to develop and grow and so does our wiki.”

As a basic principle, all users are authors in the wiki: everyone has the permissions to add new examples. Additionally there are supervisors, specialist admins who have more permissions for describing the software and setting out state-specific rules, and three wiki admins who take care of the networking, copy-editing and the structure of contents. They are supported with training courses and workshops from Hallo Welt! GmbH. There is a four-person group “AG help” who the users can turn to in case of doubt. “Just as the application continues to develop and grow, so does our wiki,” says Meier. Every time the software is updated, the functional descriptions need to be adapted, and new practical examples need to be added all the time. “The work never really ends. This is really in the nature of things with such a complex application as LEFIS. But in the end, whatever questions a user has, they should be answered in the wiki.” (sa – 03/2019)

Servicezentrum Landentwicklung und Agrarförderung
Wiesenstraße 1
30169 Hannover
Germany
Telephone: +49 (0) 511 30245 704
www.sla.niedersachsen.de

Your contact partner on the subject of success stories with Hallo Welt! GmbH:

David Schweiger
Telephone: +49 (0)941 660 800
www.bluespice.com

Interested in BlueSpice pro? For a 30 day trial click here:
bluespice.com/bluespice-pro-evaluation

The post Well-organised knowledge management for about 2,000 users appeared first on BlueSpice Blog.

Wikimania 2019 will take place on 14–18 August at Stockholm University. Hosted by Wikimedia Sweden, the conference will bring together leaders within free knowledge spaces across the world for five days of pioneering discussions on the role of Wikimedia and free knowledge in fulfilling the Sustainable Development Goals (SDGs) outlined by the United Nations in its 2030 Agenda for Sustainable Development.

Why we chose this theme

In September 2015, all United Nations member states in the world adopted a “universal call to action to end poverty, protect the planet, and ensure that all people enjoy peace and prosperity.” Seventeen SDGs were announced, all of them aiming toward a sustainable world.

These SDGs have given people, organizations, and governments across the world a shared framework for their work, efforts, and investments. The goals include the right to quality education, a reduction of gender inequalities, ensuring innovations, empowering people regardless of their background, supporting the development of global partnerships and achieving sustainable societies.

They correspond well to Wikimedia’s vision of creating a world “in which every single human being can freely share in the sum of all knowledge.” By inviting the SDGs as a thematic framework for Wikimania 2019, we hope to encourage attendees to reflect on the broader implications of Wikimedia’s work within the global context and discuss how our movement can contribute to achieving the SDGs.

This year’s theme will be seen throughout the program, as session leaders will be asked to discuss their projects’ impact on achieving the global goals.

“The Global Goals can only be met if we work together. To build a better world, we need to be supportive, empathetic, inventive, passionate, and, above all, cooperative.”

Goal 17 of the SDGs.

Michael Peter Edson, the co-founder and Associate Director of The Museum for the United Nations — UN Live, will deliver a keynote address at the conference.

“The thing I love about the SDGs is that they are shared goals,” he says. “They don’t belong to me. They don’t belong to you. They belong to everyone. And they require effort from everyone if we are going to succeed. The SDGs are there exactly to help groups such as Wikimedia bring stakeholders together to find new ways to take action to create a sustainable future.”

We look forward to discussing the goals, and to discussing ways to work together to create a sustainable future. See you in Sweden!

Eric Luth, Conference Manager
Wikimedia Sweden (Sverige)

For more information on this year’s theme, see Wikimania’s website, a FAQ, and the UN’s home for the sustainable development goals.



datarep:

Languages Analysis: Number of Wikipedia Articles -VS- Native Speakers Population

A lot of those large green dots showing languages with a high ratio of Wikipedia articles per speaker are languages where people have been using automated tools to create Wikipedia articles (even though those articles are often very short). It might be interesting to combine the total number of articles with average article length to provide a more nuanced look at how well-represented various languages are on Wikipedia, but this is still an interesting visualization! (And one I definitely wish I’d had when writing this Wired article about underrepresented languages online.) 

More from the graph’s creator about what the visuals mean: 

Bubble size depends on the ratio “number of Wikipedia articles” / “number of native speakers”. For example, Swedish has more Wikipedia articles per native speaker than English does. The color scaling on the bubbles is there to help viewers distinguish between bubble sizes, but it doesn’t carry any extra information.

Languages from poor countries, like Tigrigna from Ethiopia and Eritrea, are underrepresented on Wikipedia. Interestingly, languages from small European countries and regions like Sweden, the Netherlands, Scotland, Catalonia, and the Basque Country are among the highest in terms of Wikipedia activity.

ogv.js 1.6.0 released with experimental AV1 decoding

23:29, Tuesday, 26 2019 February UTC

After some additional fixes and experiments I’ve tagged ogv.js 1.6.0 and released it. As usual you can use the ‘ogv’ package on npm or fetch the zip manually. This release includes various fixes (including for some weird bugs!) and performance improvements on lower-end machines. Internals have been partially refactored to aid future maintenance, and experimental AV1 decoding has been added using VideoLAN’s dav1d decoder.

dav1d and SIMD

The dav1d AV1 decoder is now working pretty solidly, but slowly. I found that my test files were encoded at too high a quality; dialing them back to my actual target bitrate improved performance as a consequence, so hey! Not bad. ;)

I’ve worked around a minor compiler issue in emscripten’s old “fastcomp” asm.js->wasm backend where an inner loop didn’t get unrolled, which improves decode performance by a couple percent. Upstream prefers to let the unroll be implicit, so I’m keeping this patch in a local fork for now.

Some folks working on the WebAssembly SIMD proposal have also reached out to me; SIMD should allow speeding up some of the slow filtering operations with optimized vector code! The only browser implementation of the proposal (which remains a bit controversial) is currently Chrome, behind an experimental command-line flag, and the updated vectorization code lives in the new WebAssembly compiler backend that’s integrated with upstream LLVM.

So I spent some time getting up and running on the new LLVM backend for emscripten, found a few issues:

  • emsdk doesn’t update the LLVM download properly so you can get stuck on an old version and be very confused — this is being fixed shortly!
  • currently it’s hard to use a single SDK installation for both backends at once, and asm.js compilation requires the old backend, so I’ve temporarily disabled the asm.js builds on my simd2 work branch.
  • multithreaded builds are currently broken (at least modularized ones, which only just got fixed in the main compiler, so the LLVM backend might need similar fixes)
  • use of atomics intrinsics in a non-multithreaded build results in a validation error, whereas it had been silently turned into something safe in the old backend. I had to patch dav1d with a “fake atomics” option to #define them away.
  • Non-SIMD builds were coming out with data corruption, which I tracked down to an optimizer bug which had just been fixed upstream the day before I reported it. ;)
  • I haven’t gotten as far as working with any of the SIMD intrinsics, because I’m getting a memory access out of bounds issue when engaging the autovectorizer. I narrowed down a test case with the first error and have reported it; not sure yet whether the problem is in the compiler or in Chrome/V8.

In theory autovectorization is unlikely to do much on its own; significant gains could be made using intrinsics, but only up to a point, as the available operations are limited and it’s hard to tell in advance what will be efficient.

Intermittent file breakages

Very rarely, some files would just break at a certain position in the file for reasons I couldn’t explain. I had one turn up during AV1 testing where a particular video packet that contained two frame OBUs had one OBU header appear ok and the second obviously corrupted. I tracked the corruption back from the codec to the demuxer to the demuxer’s input buffer to my StreamFile abstraction used for loading data from a seekable URL.

It turned out that the offending packet straddled a boundary between HTTP requests: between the second and third megabytes of the file, each requested as a separate Range-based XMLHttpRequest and downloaded as a binary string so the data can be accessed during progress events. According to the network panel, the second and third megabytes looked fine… but the *following* request turned up as 512 KiB. What?

Dumping the binary strings of the second and third megabytes, I immediately realized what was wrong:

Enjoy some tasty binary strings!

The first requests showed 8-bit characters (ASCII, control chars, etc.) as expected. The request with the broken packet showed CJK characters, indicating the string had probably been misinterpreted as UTF-16.

It didn’t take much longer to confirm that the first two bytes of the broken request were 0xFE 0xFF, a UTF-16 Byte Order Mark. This apparently overrides the “overrideMimeType” method’s x-user-defined charset, and there’s no way to override it back. Hypothetically you could probably detect the case and swap bytes back but I think it’s not actually worth it to do full streaming downloads within chunks for the player — it’s better to buffer ahead so you can play reliably.

For now I’ve switched it to use ArrayBuffer XHRs instead of binary strings, which avoids the encoding problem but means data can’t be accessed until each chunk has finished downloading.

Minimal MediaWiki for frontend engineers

21:28, Tuesday, 26 2019 February UTC

I use OSX. Vagrant has not been kind to me, but I'm hopeful that Docker will make development a lot easier for me in future.
Until then, I use MAMP, which provides a pretty easy LAMP setup. I wanted to share it with other frontend engineers, as this minimal setup works well for me: it's fast, it minimises the extensions I need to update, and most importantly it brings me closer to the problems that end users are experiencing with the frontend.

MAMP replicating Wikimedia paths

In MAMP I have the web server set to apache and use a symlink to point the wiki folder to a git clone of mediawiki/core.

I set up the wiki via the web installer which, if you've never tried it yourself, I urge you to give a go! It's not that complicated really, and that's quite impressive!

I have the following defined in LocalSettings.php to match production wikis.

# This is important, as otherwise things like page previews will not work correctly when using the proxying tips below
# With this setup articles can be viewed on the path `/wiki/Main_Page`
$wgArticlePath = "/wiki/$1";

Extensions and proxying content

I am lucky to work with extensions that require minimal setup - most are simply git clone, configure and play, e.g.

wfLoadExtension( 'Popups' );
wfLoadExtension( 'MobileFrontend' );
wfLoadSkin( 'MinervaNeue' );

I have no working instance of Wikidata or VisualEditor - these are black boxes to me. As a general rule I only install what I need. When I do need to test integrations with them, I seek configurations that can point to production.

For instance, the following config allows me to load production content into VisualEditor without setting up a RESTBase server:

wfLoadExtension( 'VisualEditor' );
// Enable by default for everybody
$wgDefaultUserOptions['visualeditor-enable'] = 1;
$wgVirtualRestConfig['modules']['parsoid'] = array(
    // URL to the Parsoid instance
    // Use port 8142 if you use the Debian package
    'url' => 'https://en.wikipedia.org',
    // Parsoid "domain", see below (optional)
    'domain' => 'en.wikipedia.org',
);
$wgVisualEditorFullRestbaseURL = 'https://en.wikipedia.org/api/rest_';

Note that this will trigger CORS problems on your localhost, but MediaWiki's API supports CORS for read-only requests provided you pass the query string parameter "origin=*".

The following code, when placed in the right place (I usually throw this line into core or above the API request I want to proxy), gives me content to feed from production into VisualEditor:

$.ajaxSetup({ data: {  origin: '*'  }  } );

Whenever my team needs to work with any kind of service or API, I've always found it much more useful to proxy content from production, as otherwise I miss bugs and find replication difficult. Using Special:Import is slow and broken. In particular, if you import pages linked to Wikidata, you also need to clone the Wikidata page for that article to be replicated locally.

User generated content is particularly important when working with content on mobile. We added support to MobileFrontend to proxy content and it can easily be enabled with the following config in LocalSettings.php:

$wgMFContentProviderClass = 'MobileFrontend\ContentProviders\MwApiContentProvider';
$wgMFMwApiContentProviderBaseUri = "https://en.wikipedia.org/w/api.php";

With these two changes, any pages I view in mobile will be proxied from production. This is currently available in the beta cluster - check out https://en.m.wikipedia.beta.wmflabs.org/wiki/Singapore for example. The pages 404 to avoid indexing, but will be live copies of the production equivalent.

Proxying content for desktop too!

For Popups (the page previews feature), this is pretty easy: to avoid installing Popups' dependencies PageImages and TextExtracts, I use two lines of config:

$wgPopupsGateway = "restbaseHTML";
$wgPopupsRestGatewayEndpoint = 'https://en.m.wikipedia.org/api/rest_v1/page/summary/';

This configures my localhost to make use of the production REST endpoint to source summaries. If I create a page "Popups test page" with a red link to "Dog", and then create a page "Dog" with two lines of text, previewing the Dog link on "Popups test page" will show me the production preview for the Dog article. However, this doesn't fully work on its own, as I still need local links to those pages for page previews to trigger. Since page previews runs on desktop, I can use the same proxying I use for mobile…

// Enable MobileFrontend's "content provider" for desktop too!
$wgMFAlwaysUseContentProvider = true;

Sometimes I need to make articles locally, so I provide an override that allows me to edit and view local pages that don't exist on the production wiki:

// This will ensure that any local pages are served instead of production copies where available.
$wgMFContentProviderTryLocalContentFirst = true;

Proxying read-only APIs in mobile

Sometimes I want to proxy JavaScript as well, which MobileFrontend also allows me to do. If I'm testing read-only workflows (i.e. no POSTs or authentication), I can proxy APIs. This is useful for testing pages like Special:Nearby without having to create lots of articles with coordinates near your current location.

// Redirect API requests to English Wikipedia
$wgMFContentProviderScriptPath = "https://en.wikipedia.org/w";

Cache!

I use memcached to cache. Without this your wiki might be a little on the slow side:

$wgMainCacheType = CACHE_MEMCACHED;
$wgMemCachedServers = array( '127.0.0.1:11211' );
$wgParserCacheType = CACHE_MEMCACHED; # optional
$wgMessageCacheType = CACHE_MEMCACHED; # optional

Conclusions

That's it… that's my setup. All the extensions I work on tend to mirror production config as closely as they possibly can, so they shouldn't require any additional setup. If you haven't already, I urge you to try setting up a minimal MediaWiki and see what's the least you can get away with and still be productive.

Let me know in the comments if you run into any problems, or if I've converted you into a more effective engineer :-).

Gerrit now automatically adds reviewers

18:55, Tuesday, 26 2019 February UTC

Finding reviewers for a change is often a challenge, especially for a newcomer or folks proposing changes to projects they are not familiar with. Since January 16th, 2019, Gerrit automatically adds reviewers on your behalf based on who last changed the code you are affecting.

Antoine "@hashar" Musso explains what led us to enable that feature and how to configure it to fit your project. He also offers tips on how to seek more reviewers, based on years of experience.


Automatically adding reviewers when a new patch is uploaded is the subject of task T91190, opened almost four years ago (March 2015). I had declined the task since we already have the Reviewer bot (see the section below), but @Tgr found a plugin for Gerrit which analyzes the code history with git blame and uses that to determine potential reviewers for a change. It took us a while to add that particular Gerrit plugin, and the first version we installed was not compatible with our Gerrit version. The plugin was upgraded yesterday (Jan 16th) and is working fine (T101131).

Let's have a look at the functionality the plugin provides, and how it can be configured per repository. I will then offer a refresher on how one can find reviewers based on the git history.

Reviewers by blame plugin

The Gerrit plugin looks at the affected code using git blame and extracts the top three past authors, who are then added as reviewers to the change on your behalf. Added reviewers will thus receive a notification showing that you have asked them for code review.

The configuration is done on a per-project basis and inherits from the parent project. Without any tweaks, your project inherits the configuration from All-Projects. If you are a project owner, you can adjust the configuration. As an example, the configuration for operations/mediawiki-config shows inherited values and an exception to not process a file named InitialiseSettings.php.

The three settings are described in the documentation for the plugin:

plugin.reviewers-by-blame.maxReviewers
The maximum number of reviewers that should be added to a change by this plugin.
By default 3.

plugin.reviewers-by-blame.ignoreFileRegEx
Ignore files where the filename matches the given regular expression when computing the reviewers. If empty or not set, no files are ignored.
By default not set.

plugin.reviewers-by-blame.ignoreSubjectRegEx
Ignore commits where the subject of the commit messages matches the given regular expression. If empty or not set, no commits are ignored.
By default not set.

By making past authors aware of a change to code they previously altered, I believe you will get more reviews and hopefully get your changes approved faster.

Previously we had other methods to add reviewers, one opt-in based and the others being cumbersome manual steps. They should be used to complement the Gerrit reviewers by blame plugin, and I give an overview of each of them in the following sections.

Gerrit watchlist

The original system from Gerrit lets you watch projects, similar to a user watch list on MediaWiki. In Gerrit preferences, one can get notified for new changes, patchsets, comments... Simply indicate a repository, optionally a search query and you will receive email notifications for matching events.

The attached image shows my watched projects configuration; I thus receive notifications for any changes made to the integration/config repository, as well as for changes in mediawiki/core which affect either composer.json or one of the Wikimedia deployment branches for that repo.

One drawback is that we can not watch a whole hierarchy of projects such as mediawiki and all its descendants, which would be helpful to watch our deployment branch. It is still useful when you are the primary maintainer of a repository since you can keep track of all activity for the repository.

Reviewer bot

The reviewer bot was written by Merlijn van Deen (@valhallasw). It is similar to the Gerrit watched projects feature, with some major benefits:

  • the watcher is added as a reviewer, so the author knows you were notified
  • it supports watching a hierarchy of projects (eg: mediawiki/*)
  • the file/branch filtering might be easier to grasp compared to Gerrit search queries
  • the watchers are stored at a central place which is public to anyone, making it easy to add others as reviewers.

One registers reviewers on a single wiki page: https://www.mediawiki.org/wiki/Git/Reviewers.

Each repository filter is a wikitext section (eg: === mediawiki/core ===) followed by a wikitext template and a file filter using python fnmatch. Some examples:

Listen to any changes that touch i18n:

== Listen to repository groups ==
=== * ===
* {{Gerrit-reviewer|JohnDoe|file_regexp=<nowiki>i18n</nowiki>}}

Listen to MediaWiki core search related code:

=== mediawiki/core ===
* {{Gerrit-reviewer|JaneDoe|file_regexp=<nowiki>^includes/search/</nowiki>}}

The system works great, provided maintainers remember to register on the page and the files are not moved around. The bot is not that well known, though, and most repositories do not have any reviewers listed.

Inspecting git history

Another source of reviewers is the git history: one can easily retrieve a list of past authors, who should be good candidates to review code. I typically use git shortlog --summary --no-merges for that (--no-merges filters out the merge commits crafted by Gerrit when a change is submitted). Example for the MediaWiki job queue system:

$ git shortlog --no-merges --summary --since "one year ago" includes/jobqueue/|sort -n|tail -n4
     3 Petr Pchelko
     4 Brad Jorsch
     4 Umherirrender
    16 Aaron Schulz

That gives me four candidates who have worked on that directory over the past year.

Past reviewers from git notes

When a patch is merged, Gerrit records in git the votes and the canonical URL of the change. They are available in git notes under refs/notes/review. Once the notes are fetched, they can be shown in git show or git log by passing --show-notes=review: for each commit, after the commit message, the notes are displayed and show the votes among other metadata:

$ git fetch refs/notes/review:refs/notes/review
$ git log --no-merges --show-notes=review -n1
commit e1d2c92ac69b6537866c742d8e9006f98d0e82e8
Author: Gergő Tisza <tgr.huwiki@gmail.com>
Date:   Wed Jan 16 18:14:52 2019 -0800

    Fix error reporting in MovePage
    
    Bug: T210739
    Change-Id: I8f6c9647ee949b33fd4daeae6aed6b94bb1988aa

Notes (review):
    Code-Review+2: Jforrester <jforrester@wikimedia.org>
    Verified+2: jenkins-bot
    Submitted-by: jenkins-bot
    Submitted-at: Thu, 17 Jan 2019 05:02:23 +0000
    Reviewed-on: https://gerrit.wikimedia.org/r/484825
    Project: mediawiki/core
    Branch: refs/heads/master

And I can then get the list of authors that previously voted Code-Review +2 for a given path. Using the previous example of includes/jobqueue/ over a year, the list is slightly different:

$ git log --show-notes=review --since "1 year ago" includes/jobqueue/|grep 'Code-Review+2:'|sort|uniq -c|sort -n|tail -n5
      2     Code-Review+2: Umherirrender <umherirrender_de.wp@web.de>
      3     Code-Review+2: Jforrester <jforrester@wikimedia.org>
      3     Code-Review+2: Mobrovac <mobrovac@wikimedia.org>
      9     Code-Review+2: Aaron Schulz <aschulz@wikimedia.org>
     18     Code-Review+2: Krinkle <krinklemail@gmail.com>

User Krinkle has approved a lot of patches, even though he doesn’t show up in the list of authors obtained by the previous method (inspecting the git history).

Conclusion

The Gerrit reviewers by blame plugin acts automatically, which offers a good chance your newly uploaded patch will get reviewers added out of the box. For finer tweaking, one should register as a reviewer on https://www.mediawiki.org/wiki/Git/Reviewers, which benefits everyone. As a last course of action, one can manually inspect the git history and review notes to find candidates.

For any remarks, support, or concerns, reach out on the freenode IRC channel #wikimedia-releng or file a task in Phabricator.

Thank you @thcipriani for the proofreading and English fixes.

Leading with Wikipedia: A brand proposal for 2030

03:00, Tuesday, 26 2019 February UTC

A few months ago, people across the world were asked “What is Wikimedia?”

Almost no one answered correctly.

But while “Wikimedia” may not be widely recognized outside our movement, there is a clear way to use our existing brands to better bring in the billions of people who have yet to join us in our vision. We can center our brand system around Wikipedia, one of the world’s best-known brands.

This is the suggestion at the heart of a brand research and strategy project, conducted to examine how Wikimedia’s brands could be refined to support our ambitious 2030 strategic goals. The project was led by Wolff Olins, a London-based brand consultancy, in consultation with Wikimedia Foundation staff, community leaders, and the Board of Trustees.

The proposed system change suggests elevating Wikipedia into a high-visibility entry point that can be used to better introduce the world to our range of projects and their shared mission. The proposal also recommends retaining project names as they are, while shortening “Wikimedia Commons” to its nickname “Wikicommons” to fit the “wiki + project” name convention.

Wolff Olins/Wikimedia Foundation, CC BY-SA 4.0.
Wolff Olins/Wikimedia Foundation, CC BY-SA 4.0.

This proposal is not a new idea. Since the name “Wiki-media” was proposed in 2003 as the movement’s collective identity, various discussions have suggested a Wikipedia-led brand system.

In 2007, Erik Möller suggested the movement “mak[e] use of the strongest brand (Wikipedia) to identify all activities.” This call was echoed by Guillaume Paumier and Elisabeth Bauer at Wikimania Taipei that year. In contrast to those proposals, the current brand strategy suggests retaining project names as they are, reflecting the strength of these identities as community knowledge projects.

Wikimedia Foundation, CC BY-SA 4.0.

Wikipedia, now 18 years old, remains one of the world’s best-known brands. More than 80% of internet users in North America and Europe have heard of it, and in nations where internet access is expanding rapidly, awareness of Wikipedia is growing quickly. Since 2016, for example, Wikipedia awareness in Nigeria has nearly doubled from 27% to 48% of internet users.

Of course, there have also been arguments against using Wikipedia as the brand center point. Wikipedia is just one of the movement’s 13 online projects, and many volunteers worry that focusing on a single project would lead to smaller project communities receiving less attention. Naming movement organizations after Wikipedia might lead people to mistakenly believe that they are responsible for content authorship.

However, Wikipedia is already how the movement connects to the world. From visa applications to press to partnerships, Wikimedians today commonly define the movement as “behind” or “in support of” Wikipedia, pragmatically connecting to our most visible project. Donors make contributions explicitly in aid of Wikipedia, and they are sometimes confused or alarmed when they encounter the name “Wikimedia” on their donation receipts.

In the proposed brand strategy, Wikipedia would be redefined to encompass the entire movement’s identity. Just as the Google name (originally just for the search engine, then “the company behind the search engine”) was broadened to connect many projects like Maps, Drive, Slides, and Docs, this Wikipedia centering would offer a clear common brand to the movement while retaining project areas. This shift would create an opportunity for renaming the movement, affiliates, and the Foundation with “Wikipedia” names.

The Wikipedia interface could also be improved to show clearer connection between projects, driving more visibility, usage, and hopefully contributions, to smaller projects. Naming conventions would also be developed in order to show how projects connect back to Wikipedia. We could imagine expository taglines, for example, that would describe “Wiktionary” as “a Wikipedia project.” A new visual identity, linking projects together with a unified “Wikipedia” movement mark and style, would also be considered.

• • •

But let’s pause here.

By definition, Wikimedia brands are shared among the communities who give them meaning. So in considering this change, the Wikimedia Foundation is collecting feedback from across our communities. Our goal is to speak with more than 80% of affiliates and as many individual contributors as possible before May 2019, when we will offer the Board of Trustees a summary of the community’s response.

Today, we want to share the full brand project materials for your review and response. We invite you to look at a project summary, the brand research, and the brand strategy suggestion Wolff Olins prepared working with many in the movement.

This moment, like so much made possible by the strategic direction and the movement strategy process, invites us to consider how we show up in the world, and how we can be more inviting to the billions of people we have yet to reach. It is an opportunity for us to consider change while assessing if this suggestion really works across the world. In the words of the brand summary, together we can set knowledge free.

• • •

Would you like to share thoughts?

Please email us at brandproject@wikimedia.org or add a note to the project talk page on Meta-Wiki.

We are also ready for comments in person (at conferences or via virtual meetings) and in various languages. Email brandproject@wikimedia.org to set up a discussion.

Zack McCune, Senior Global Brand Manager
Wikimedia Foundation

You may remember a time before password standards, when passwords like “password” were used. As countless news stories have shown us since, those passwords were not ideal—but the recommended solution of creating complex passwords for each website has created problems of its own.[1]

Having one of your online accounts hacked can be a disruptive and disturbing experience. That’s why the Wikimedia Foundation’s Security team wants to make preventing that a little easier by updating our password policies (more on that at the bottom) and has put together six rules for selecting a good password.[2] We strongly encourage all current Wikimedia users to review the updated policy and their current passwords to ensure that their account remains secure.

Rule #1: Favor length over complexity

When creating a password, pick something that is easy to remember but has a lot of characters and is made up from multiple words. I like to use a collection of thoughts and things to create a statement or phrase. This phrase could be nonsensical or something real.

Photo by John Bennett, CC BY-SA 4.0.

For example, here’s a picture of a dog. If I were to create a password based on this image, it would be “That dog is standing in the violets and needs a shave!” This is a great password for these three reasons: it’s long, difficult to guess or crack, and easy to remember (because it’s true).

A more complex password with fewer characters, like D0gg@sRul3!, is tough to crack but much harder to create and to remember. Because it is hard to remember, it is also more likely to be recycled for use in other places, which is a bad idea and something we will cover in rule #5.

Rule #2: D0nt M@k3 1t h/\rd3r t#aN 1T hA5 t() %e! (Don’t make it harder than it has to be!)

Complexity is the enemy of security. From a credentialing standpoint, it encourages very bad habits. When we add more complexity to credentials, it makes it harder to remember passwords and strengthens the temptation to reuse the same credentials on multiple sites, which is a very bad idea (see rule #5). You can create a great password without making it super complicated.

Rule #3: Don’t change passwords just for the sake of changing them

Changing passwords for the sake of changing them enforces a couple of bad habits. Primarily, it encourages the selection of bad passwords (such as passwords that follow the seasons, like Summer2018 or Winter2018). It also encourages credential reuse—when users get prompted to change their password, it’s easier to just use something you are already using somewhere else. This is a bad idea (see rule #5).

You should change your password if you know or suspect that the account has been compromised. There are a couple places on the internet that can help you find that information, such as the website have i been pwned?.

Rule #4: Don’t use the name of the site, application or thing as part of the password

While incorporating the name of the site or application into your password creation process might be tempting, it’s not a great idea. This also extends to products or services that the site or application provides. When you create credentials, they should be unique and separate from the activity you are participating in. For example, if your password on Wikipedia is ‘i edit wikipedia’, please change it immediately.

Rule #5: Don’t reuse passwords

This rule has been mentioned in just about every other rule because it’s extremely important. Many of us go to lots of places on the internet, and that results in lots of credentials. That also means that it’s not super odd to create common credentials, reused across social media or banking or other sites. Often we’ve created a “good” strong password that we use for sensitive sites, and an “ok” password that is used for less critical things.

Unfortunately, recycling passwords is pretty dangerous. Here’s a very common and oft-heard scenario:

  • One of your favorite sites gets compromised. It’s one where you used your “good” password.
  • A dump of user IDs and passwords from that compromise is posted someplace on the internet.
  • Attackers use the list’s information, including your username and password, to try to break into other accounts on other sites.
  • Suddenly, it’s not just the one account that’s compromised—it’s your banking and any number of other sensitive sites where you used those credentials.

 
It’s totally fair to say in response that you can’t remember that many passwords. I certainly can’t. This is why I encourage you to use a password manager, which securely stores all of your passwords. There are many options out there, both free and paid. Some examples are LastPass, KeePass, and Sticky Password.[3]

Of course, please follow these rules when creating your password manager’s password—only use a strong, unique, and lengthy password. This is the only password you’ll have to remember!

Rule #6: Passwords are “ok”. A second factor is better!

Two-factor authentication, often shortened to 2FA, is a way of securing your accounts such that a user has to present two pieces of evidence before logging in. Most frequently, this is a password and a temporary code.

At this time, the Wikimedia Foundation offers two-factor authentication (2FA) only to accounts with certain privileged roles, though we are exploring 2FA options for all users.

That said, this rule is still good to keep in mind as you negotiate your way around the internet. Some examples of 2FA services you can use are Google Authenticator, YubiKey, or Authy.[3]
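For the curious, here is a small sketch of how a time-based one-time password (RFC 6238), the kind of temporary code an authenticator app displays, is computed from a shared secret and the current time. This is the generic algorithm, not Wikimedia’s 2FA code, and $secret is assumed to be the raw shared secret bytes (apps usually exchange it base32-encoded).

// Minimal sketch of a time-based one-time password (RFC 6238).
function totpCode( string $secret, int $timestamp, int $digits = 6, int $period = 30 ): string {
    // 8-byte big-endian counter: number of elapsed periods since the Unix epoch.
    $counter = pack( 'N2', 0, intdiv( $timestamp, $period ) );

    // HMAC-SHA1 of the counter, keyed with the shared secret (raw output, 20 bytes).
    $hash = hash_hmac( 'sha1', $counter, $secret, true );

    // Dynamic truncation (RFC 4226): take 4 bytes at an offset given by the last nibble.
    $offset = ord( $hash[19] ) & 0x0F;
    $binary = ( ( ord( $hash[$offset] ) & 0x7F ) << 24 )
        | ( ord( $hash[$offset + 1] ) << 16 )
        | ( ord( $hash[$offset + 2] ) << 8 )
        | ord( $hash[$offset + 3] );

    // Keep the last $digits decimal digits, left-padded with zeros.
    return str_pad( (string)( $binary % ( 10 ** $digits ) ), $digits, '0', STR_PAD_LEFT );
}

Both the server and the app derive the same code from the shared secret and the current time window, so the code changes every 30 seconds and is useless to an attacker shortly after it is shown.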

What about that new password policy?

Wikipedia is not immune to being targeted by password attacks. That’s why we’re implementing a new password policy, which will go into effect in early 2019 for newly created accounts. While existing users won’t be affected by this change, we strongly encourage everyone to review and follow the rules above to keep your account secure. If your password isn’t up to snuff, please come up with something new.

The new password policy will evaluate new credentials against a list of known compromised, weak, or otherwise poor passwords, and will enforce a minimum eight-character password for any newly created account. The same checks apply to privileged accounts (administrators, bot admins, bureaucrats, check users, and others), but with a minimum of ten characters.
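As a rough illustration only (not the actual MediaWiki implementation), a check along those lines boils down to something like the following, where the minimum length depends on whether the account is privileged and the candidate password is compared against a list of known-bad passwords:

// Illustrative sketch only; MediaWiki's real password policy code works differently.
function isAcceptablePassword( string $password, bool $isPrivileged, array $knownBadPasswords ): bool {
    // Privileged accounts need at least ten characters, everyone else eight.
    $minLength = $isPrivileged ? 10 : 8;

    if ( mb_strlen( $password ) < $minLength ) {
        return false;
    }

    // Reject anything on the known compromised/weak list (list assumed to be lowercase).
    return !in_array( mb_strtolower( $password ), $knownBadPasswords, true );
}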

You can find more information about these changes on MediaWiki.org.

Related, but separately, the Wikimedia Foundation’s Security team will also begin running regular password tests. These tests will look for existing weak passwords, and we will encourage everyone to protect their account by using a strong credential.

The Security team is committed to regular security awareness, so you’ll be seeing more content like this coming soon. Thank you for being an advocate for account security.

John Bennett, Director of Security
Wikimedia Foundation

• • •

Footnotes

  1. The National Institute of Standards and Technology believed that the solution was to create complex passwords for each website—but the more complex things became, the harder it was to meet the requirements and remember passwords. Inadvertently, the institute’s recommendations encouraged poor credential habits like passwords on post-it notes, or having a single ‘strong’ password and one that gets used for everything else.
  2. For a litany of reasons that I won’t get into here, the word ‘password’ is a bit dated. Going forward, ‘passphrase’ is really a better way to think about all of this. I’d like to keep things simple, though, so we’ve used ‘password’ for this post.
  3. These are examples and not endorsed by the Wikimedia Foundation.

weeklyOSM 448

09:34, Sunday, 24 2019 February UTC

12/02/2019-18/02/2019

Logo

16 mappers doing field mapping in Marikina, Philippines 1 | © GOwin

[Actual Category]

  • On March 23, 2019, an Action Day called Save Your Internet against the EU Copyright Directive is planned with demonstrations in many cities. On Saturday, February 16, the first, spontaneously organised demonstration, with about 1000 participants, took place in Cologne, as Heise Online reports (automatic translation).
  • Julia Reda, a member of the German Piratenpartei in the European Parliament, has published on her website the current drafts of Article 11 (the “link tax” and ancillary copyright for press publishers) and Article 13 (upload filters) of the new EU Copyright Directive. She expects the vote to take place in the plenary sessions of Parliament between 25 and 28 March, on 4 April, or between 15 and 18 April.
  • In a comment on netzpolitik.org, Markus Reuter calls (automatic translation) for action against the EU copyright directive in real life.

Mapping

  • Nuno Caldeira announced on Twitter the excellent work done by the newly created Portuguese OpenStreetMap Telegram group mapping buildings on Porto Santo island, in the Madeira archipelago. They will do monthly mapping challenges, such as this one, to improve OpenStreetMap in Portugal.
  • The decision of the StreetComplete maker to create challenges for adding default values like foot=yes on highway=residential or access=yes on amenity=playground caused lengthy discussions on the tagging mailing list and the forum (de) (automatic translation).
  • Tshedy reported on the MapLesotho blog about an exhibition at Avani Maseru which was held with the help of Fingal County Council and Action Ireland Trust. The “tremendous progress from 2013 to 2018” in Lesotho was featured as image of the week on the main wiki page.
  • [1] GOwin wrote a blog post about the first mapping party organised by MapaTime in Marikina, Philippines. Sixteen mappers showed up at the first field mapping activity since 2016, which started as early as 6:30 am. The picture-rich post also details the organisation of the event.
  • The vote for the key departures=, which is designed to indicate the departure times for any given stop, just started. The proposal follows the recent approval of the new key interval= for tagging the time between departures.
  • Voting for natural=isthmus, for tagging a strip of land, bordered by water on both sides and connecting two larger land masses, and for natural=peninsula has recently started.

Community

  • Stefan Keller criticises Mapzen and the Linux Foundation, which recently took over Mapzen’s projects, for concealing their main data source, the OpenStreetMap project, in their public announcements.
  • Christian created an online map that displays the use of animals. In his blog he explains (de) (automatic translation) that he is interested in animal protection-related open data.
  • Chris Beddow and Daniela Waltersdorfer J wrote, in the Mapillary Blog, about the need for more detailed mapping of kerbs.
  • The election for the OpenStreetMap US Board, which the weeklyOSM recently mistakenly called an OSMF chapter, will start on 25 February and end 28 February. Nominations for the OpenStreetMap US Board are open until Sunday 24 February.

Events

  • Joost Schouppe announced this year’s Belgian National Mapathon, which is scheduled on:
    • March 27 in Brussels (FR), Leuven, Louvain-la-Neuve and Mons
    • March 28 in Brussels (NL), Liège, Namur and Ghent
    • March 30 in Louvain-la-Neuve and Liège.
  • SotM France 2019 will take place from 14 to 16 June in Montpellier.

Humanitarian OSM

  • HOT has recently held a Tasking Manager Stakeholders’ Workshop in Washington, DC at which development goals for 2019 were discussed. The focus will be on the functionality to group users into suitable teams based on the users’ backgrounds, statistical features to improve mapping projects and their data quality, the introduction of machine learning workflows into mapping and the observation of usability and user flows.
  • HOT features an article by Peter Ward from the GAL School in Cusco, Peru, about collecting geodata to test their hypothesis that closer rubbish bins mean less rubbish on the ground.

Maps

  • Lukas Loos, from HeiGIT, introduced a first beta version of the OpenStreetMap History eXplorer (ohsomeHeX). It uses ohsome (OpenStreetMap History Data Analytics Platform) to aggregate OSM data of a first set of selected features into a set of world-spanning hexagonal grids in a configurable temporal resolution. The goal is to allow the exploration of the history of OpenStreetMap data in time and space at varying scales. Further improvements are in the pipeline.
  • Utagawa Hiroshige is one of the great masters of Japanese printmaking. George has created an OSM-based map that geolocalises “impressive images of Japan’s great landscape” made by Utagawa Hiroshige. Antoine Oury has written an article about this on ActuaLitté. (fr) (automatic translation)

Open Data

  • On 13 February 2019, the Brandenburg state government, Germany, passed a bill amending the Surveying Act and providing for the free use of the surveying authorities’ geodata in future. With the amendment to the law, digital geodata will in future be made available by the surveying authority free of charge. The offer covers nearly 330 digital data records and data services altogether. These include, for example, high-resolution aerial photographs, maps of roads and properties, and representations of entire landscapes and elevation models. Brandenburg publishes its open data on the Internet portal DatenAdler.de under the licence dl-zero-de/2.0.
  • The European Union has launched the third EU Datathon. Participants, who are asked to develop apps and visualisations using open data made available by the European Union, can win prizes of up to €15,000.
  • Ellen Tannam published an interview on siliconrepublic.com with Séan Lynch, the man behind the project openlittermap.com. In the interview he explained his motivation, the benefits of OSM’s open data model and the hopes he has for his project in light of the global litter crisis.

Licences

  • The OSMF Licence Working Group has written a Licence Compatibility guide for our current licence ODbL 1.0 and made the evaluation results of the Linux Foundation’s “CDLA (Community Data License Agreement) Permissive” licence available.

Software

  • On Twitter, Quincy Morgan presented a preview of an upcoming feature for the iD-Editor, a tool that shows validation issues to the user while editing. In response to an enquiry from Pierre Béland, Quincy explains that, for example, the upload of untagged objects with area=yes will be blocked. A test version is available here.
  • The WordPress plugin OSM from MiKa is now compatible with WordPress 5.0. This version of the plugin supports the “Classic Editor” and the new “Gutenberg Editor”. MiKa has also implemented 200 new icons and support for the humanitarian OSM style.

Releases

  • OSM’s main map style, Carto, has received another update. As the maintainer, Daniel Koć, wrote in his OSM diary, version 4.20.0 has added the rendering of boundary=aboriginal_area and boundary=protected_area with protect_class=24. The new version also dropped support for leisure=common, lightened major buildings and added support for oneway arrows for footways, cycleways and paths as well as a variety of other improvements.
  • OsmAnd was released in version 2.5 for Apple’s iOS operating system. The most notable new feature is access to their OsmAnd Live subscription. The app can be downloaded from the iTunes Store.

Did you know …

  • You might have stumbled across fictional data in OSM. Now you can point the authors of such unwanted data to opengeofiction.net. Martijn van Exel asks for help to grow his fictional town and connect it to the rest of the world.
  • Following her tweet that kerbs are becoming increasingly important nowadays, Daniela Waltersdorfer published an article about kerbs: why she thinks their importance is growing, open questions with regard to mapping them in OSM, and how “curbs”–or, as they should correctly be named, “kerbs”–should be added in OSM. She draws special attention to transition points, i.e. where kerbs are lowered, as these are important features for pedestrians, cyclists, and especially those with reduced mobility. On Twitter, Tobias Jordans points to the parking lane viewer/editor on zlant.github.io as an easy way to add parking information to OSM.
  • … about the supplementary descriptions of the tags used for representing roads and paths as areas, resulting from the original proposal by Marek_kleciak? In 2018 Tomasz_W wrote wiki pages for area:highway= pedestrian, footway, path and cycleway, which are, however, currently at odds with the rules defined in the original proposal.

OSM in the media

  • The Berliner Morgenpost used Hans Hack’s Figuregrounder tool to create a quiz that challenges its readers to identify Berlin locations on the basis of the building outlines from OSM data.

Other “geo” things

  • Mapbox published a community impact summary for 2018, which highlights Mapbox’s social engagement and details some of the 167 non-profit projects they support.
  • Listverse.com shows ten areas on Google maps where the borders between states are not clear.
  • Users of ESRI’s ArcGIS Online have already contributed vector data and aerial images for background maps (“Community Maps”). Now ESRI also allows users to edit these data. The edit review process takes one week and inclusion in the main database takes up to three weeks. Needless to say, OSM is a bit faster.
  • The New York Times used the Openrouteservice API to derive walking distances from major subway stations in New York City for their analytical piece “Where the Subway Limits New Yorkers With Disabilities”.
  • Dmitry Filippov, a student of “Information technology and applied mathematics” at Moscow Aviation Institute (National Research University), created a startup, “Heavy Geeks”, where he transferred the map of the entire real world into the game “Parallel 42”.

Upcoming Events

Where What When Country
Padua FOSS4G-IT 2019 (OSMit2019) 2019-02-20-2019-02-24 italy
Greater Vancouver area Metrotown mappy Hour 2019-02-22 canada
Biella Incontro mensile 2019-02-23 italy
Manila 【MapaTime!】 @ co.lab 2019-02-23 philippines
Karlsruhe Karlsruhe Hack Weekend February 2019 2019-02-23-2019-02-24 germany
Rennes Créer ses propres cartes avec uMap 2019-02-24 france
Bremen Bremer Mappertreffen 2019-02-25 germany
Digne-les-Bains Conférence « Communs numériques – Cartes sensibles » 2019-02-26 france
Viersen OSM Stammtisch Viersen 2019-02-26 germany
Düsseldorf Stammtisch 2019-02-27 germany
Ludwigshafen am Rhein Mannheimer Mapathons – Stadtbibliothek LU 2019-02-27 germany
Zurich Missing Maps Mapathon Zurich 2019-02-27 switzerland
Lübeck Lübecker Mappertreffen 2019-02-28 germany
Leobersdorf Leobersdorfer Stammtisch 2019-02-28 austria
Montrouge Rencontre des contributeurs de Montrouge et alentours 2019-02-28 france
Minsk byGIS Meetup 2019-03-01 belarus
Amagasaki IODD:尼崎港線アーカイブダンジョン 2019-03-02 japan
Wuppertal [Wuppertaler Opendata Day] 2019-03-02-2019-03-03 germany
London Missing Maps London Mapathon 2019-03-05 uk
Stuttgart Stuttgarter Stammtisch 2019-03-06 germany
Praha/Brno/Ostrava Kvartální pivo 2019-03-06 czech republic
Dresden Stammtisch Dresden 2019-03-07 germany
Nantes Réunion mensuelle 2019-03-07 france
Ivrea Incontro mensile 2019-03-09 italy
Oslo OSM-beer 2019-03-08 norway
Rennes Réunion mensuelle 2019-03-11 france
Zurich OSM Stammtisch Zurich 2019-03-11 switzerland
Taipei OSM x Wikidata #2 2019-03-11 taiwan
Lyon Rencontre mensuelle pour tous 2019-03-12 france
Salt Lake City SLC Mappy Hour 2019-03-12 united states
Arlon Espace public numérique d’Arlon – Formation Initiation 2019-03-12 belgium
Munich Münchner Stammtisch 2019-03-13 germany
Dresden FOSSGIS 2019 2019-03-13-2019-03-16 germany
Berlin 129. Berlin-Brandenburg Stammtisch 2019-03-14 germany
Scotland Edinburgh 2019-03-19 uk
Portmarnock Erasmus+ EuYoutH_OSM Meeting 2019-03-25-2019-03-29 ireland
Montpellier State of the Map France 2019 2019-06-14-2019-06-16 france
Angra do Heroísmo Erasmus+ EuYoutH_OSM Meeting 2019-06-24-2019-06-29 portugal
Minneapolis State of the Map US 2019 2019-09-06-2019-09-08 united states
Edinburgh FOSS4GUK 2019 2019-09-18-2019-09-21 united kingdom
Heidelberg Erasmus+ EuYoutH_OSM Meeting 2019-09-18-2019-09-23 germany
Heidelberg HOT Summit 2019 2019-09-19-2019-09-20 germany
Heidelberg State of the Map 2019 (international conference) 2019-09-21-2019-09-23 germany
Grand-Bassam State of the Map Africa 2019 2019-11-22-2019-11-24 ivory coast

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Nakaner, Polyglot, Rogehm, SunCobalt, TheSwavu, YoViajo, derFred, geologist, jinalfoflia, kartonage.

Clean Architecture: UseCase tests

05:18, Saturday, 23 2019 February UTC

When creating an application that follows The Clean Architecture you end up with a number of UseCases that hold your application logic. In this blog post I outline a testing pattern for effectively testing these UseCases and avoiding common pitfalls.

Testing UseCases

A UseCase contains the application logic for a single “action” that your system supports. For instance “cancel a membership”. This application logic interacts with the domain and various services. These services and the domain should have their own unit and integration tests. Each UseCase gets used in one or more applications, where it gets invoked from inside the presentation layer. Typically you want to have a few integration or edge-to-edge tests that cover this invocation. In this post I look at how to test the application logic of the UseCase itself.

UseCases tend to have “many” collaborators. I can’t recall any that had fewer than 3. For the typical UseCase the number is likely closer to 6 or 7, with more collaborators being possible even when the design is good. That means constructing a UseCase takes some work: you need to provide working instances of all the collaborators.

Integration Testing

One way to deal with this is to write integration tests for your UseCases. Simply get an instance of the UseCase from your Top Level Factory or Dependency Injection Container.

This approach often requires you to mutate the factory or DIC. Want to test that an exception from the persistence service gets handled properly? You’ll need to use some test double instead of the real service, or perhaps mutate the real service in some way. Want to verify a mail got sent? You definitely want to use a Spy here instead of the real service. Mutability comes with a cost, so it is better avoided.

A second issue with using real collaborators is that your tests get slow due to real persistence usage. Even using an in-memory SQLite database (that needs initialization) instead of a simple in-memory fake repository makes for a speed difference of easily two orders of magnitude.

Unit Testing

While there might be some cases where integration tests make sense, normally it is better to write unit tests for UseCases. This means having test doubles for all collaborators. Which leads us to the question of how to best inject these test doubles into our UseCases.

As an example I will use the CancelMembershipApplicationUseCase of the Wikimedia Deutschland fundraising application.

function __construct(ApplicationAuthorizer $authorizer, ApplicationRepository $repository, TemplateMailerInterface $mailer) {
    $this->authorizer = $authorizer;
    $this->repository = $repository;
    $this->mailer = $mailer;
}

This UseCase uses 3 collaborators. An authorization service, a repository (persistence service) and a mailing service. First it checks if the operation is allowed with the authorizer, then it interacts with the persistence and finally, if all went well, it uses the mailing service to send a confirmation email. Our unit test should test all this behavior and needs to inject test doubles for the 3 collaborators.
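To make the test-double idea concrete, here is a minimal sketch of what a Spy for the mailing collaborator could look like. The class name matches the MailerSpy used in the tests below, but the sendMail signature is an illustrative assumption rather than the real TemplateMailerInterface; the point is simply that a Spy records calls so the test can assert on them afterwards.

// Hypothetical sketch of a mailer Spy; the real TemplateMailerInterface
// in the fundraising codebase may have a different method signature.
class MailerSpy {

    private $sentMails = [];

    public function sendMail( string $recipient, array $templateArguments = [] ): void {
        // Record the call instead of actually sending anything.
        $this->sentMails[] = [ $recipient, $templateArguments ];
    }

    public function getSentMails(): array {
        return $this->sentMails;
    }
}

A test that needs to verify a confirmation email was (or was not) triggered can then assert against getSentMails(), without any real mail infrastructure being involved.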

The most obvious approach is to construct the UseCase in each test method.

public function testGivenIdOfUnknownDonation_cancellationIsNotSuccessful(): void {
    $useCase = new CancelMembershipApplicationUseCase(
        new SucceedingAuthorizer(),
        $this->newRepositoryWithCancellableDonation(),
        new MailerSpy()
    );

    $response = $useCase->cancelApplication(
        new CancellationRequest( self::ID_OF_NON_EXISTING_APPLICATION )
    );

    $this->assertFalse( $response->cancellationWasSuccessful() );
}

public function testGivenIdOfCancellableApplication_cancellationIsSuccessful(): void {
    $useCase = new CancelMembershipApplicationUseCase(
        new SucceedingAuthorizer(),
        $this->newRepositoryWithCancellableDonation(),
        new MailerSpy()
    );
    
    $response = $useCase->cancelApplication(
        new CancellationRequest( $this->cancelableApplication->getId() )
    );

    $this->assertTrue( $response->cancellationWasSuccessful() );
}

Note how both these test methods use the same test doubles. This is not always the case, for instance when testing authorization failure, the test double for the authorizer service will differ, and when testing persistence failure, the test double for the persistence service will differ.

public function testWhenAuthorizationFails_cancellationFails(): void {
    $useCase = new CancelMembershipApplicationUseCase(
        new FailingAuthorizer(),
        $this->newRepositoryWithCancellableDonation(),
        new MailerSpy()
    );

    $response = $useCase->cancelApplication(
        new CancellationRequest( $this->cancelableApplication->getId() )
    );

    $this->assertFalse( $response->cancellationWasSuccessful() );
}

Normally a test function will only change a single test double.

UseCases tend to have, on average, two or more behaviors (and thus tests) per collaborator. That means for most UseCases you will be repeating the construction of the UseCase in a dozen or more test functions. That is a problem. Ask yourself why.

If the answer you came up with was DRY then think again and read my blog post on DRY 😉 The primary issue is that you couple each of those test methods to the list of collaborators. So when the constructor signature of your UseCase changes, you will need to do Shotgun Surgery and update all test functions. Even if those tests have nothing to do with the changed collaborator. A second issue is that you pollute the test methods with irrelevant details, making them harder to read.

Default Test Doubles Pattern

The pattern is demonstrated using PHP + PHPUnit and will need some adaptation when using a testing framework that does not work with a class based model like that of PHPUnit.

The coupling to the constructor signature, and the resulting Shotgun Surgery, can be avoided by having a default instance of the UseCase filled with the right test doubles. This can be done with a newUseCase method that constructs the UseCase and returns it. A way to swap in specific collaborators is still needed (e.g. a FailingAuthorizer to test the handling of failing authorization).

private function newUseCase() {
    return new CancelMembershipApplicationUseCase(
        new SucceedingAuthorizer(),
        new InMemoryApplicationRepository(),
        new MailerSpy()
    );
}

Making the UseCase itself mutable is a big no-no. Adding optional parameters to the newUseCase method works in languages that have named parameters. Since PHP does not have named parameters, another solution is needed.

An alternative approach to getting modified collaborators into the newUseCase method is using fields. This is less nice than named parameters, as it introduces mutable state on the level of the test class. Since in PHP this approach gives us named fields and is understandable by tools, it is better than either using a positional list of optional arguments or emulating named arguments with an associative array (key-value map).

The fields can be set in the setUp method, which gets called by PHPUnit before the test methods. For each test method PHPUnit instantiates the test class, then calls setUp, and then calls the test method.

public function setUp() {
    $this->authorizer = new SucceedingAuthorizer();
    $this->repository = new InMemoryApplicationRepository();
    $this->mailer = new MailerSpy();

    $this->cancelableApplication = ValidMembershipApplication::newDomainEntity();
    $this->repository->storeApplication( $this->cancelableApplication );
}
private function newUseCase(): CancelMembershipApplicationUseCase {
    return new CancelMembershipApplicationUseCase(
        $this->authorizer,
        $this->repository,
        $this->mailer
    );
}

With this field-based approach individual test methods can modify a specific collaborator by writing to the field before calling newUseCase.

public function testWhenAuthorizationFails_cancellationFails(): void {
    $this->authorizer = new FailingAuthorizer();

    $response = $this->newUseCase()->cancelApplication(
        new CancellationRequest( $this->cancelableApplication->getId() )
    );

    $this->assertFalse( $response->cancellationWasSuccessful() );
}

public function testWhenSaveFails_cancellationFails() {
    $this->repository->throwOnWrite();

    $response = $this->newUseCase()->cancelApplication(
        new CancellationRequest( $this->cancelableApplication->getId() )
    );

    $this->assertFalse( $response->cancellationWasSuccessful() );
}

The choice of default collaborators is important. To minimize binding in the test functions, the default collaborators should not cause any failures. This is the case both when using the field-based approach and when using optional named parameters.

If the authorization service failed by default, most test methods would need to modify it, even if they have nothing to do with authorization. And it is not always self-evident they need to modify the unrelated collaborator. Imagine the default authorization service indeed fails and that the testWhenSaveFails_cancellationFails test method forgets to modify it. This test method would end up passing even if the behavior it tests is broken, since the UseCase will return the expected failure result even before getting to the point where it saves something.

This is why inside of the setUp function the example creates a “cancellable application” and puts it inside an in-memory test double of the repository.

I chose the CancelMembershipApplication UseCase as an example because it is short and easy to understand. For most UseCases it is even more important to avoid the constructor signature coupling as this issue becomes more severe with size. And no matter how big or small the UseCase is, you benefit from not polluting your tests with unrelated setup details.

You can view the whole CancelMembershipApplicationUseCase and CancelMembershipApplicationUseCaseTest.


The post Clean Architecture: UseCase tests appeared first on Entropy Wins.

Clean Architecture + Bounded Contexts diagram

05:12, Saturday, 23 2019 February UTC

I’m happy to release two Clean Architecture + Bounded Contexts diagrams into the public domain (CC0 1.0).

I created these diagrams for Wikimedia Deutschland with the help of Jan Dittrich, Charlie Kritschmar and Hanna Petruschat. They represent the architecture of our fundraising codebase. I explain the rules of this architecture in my post Clean Architecture + Bounded Contexts. The new diagrams are based on the ones I published two years ago in my Clean Architecture Diagram post.

Diagram 1: Clean Architecture + DDD, generic version. Click to enlarge. Link: SVG version

Diagram 2: Clean Architecture + DDD, fundraising version. Click to enlarge. Link: SVG version


The post Clean Architecture + Bounded Contexts diagram appeared first on Entropy Wins.

To be a more effective data scientist, think in experiments

05:00, Saturday, 23 2019 February UTC

Perf Matters at Wikipedia 2015

14:43, Thursday, 21 2019 February UTC

Hello, WANObjectCache

This year we achieved another milestone in our multi-year effort to prepare Wikipedia for serving traffic from multiple data centres.

The MediaWiki application that powers Wikipedia relies heavily on object caching. We use Memcached as a horizontally scaled key-value store, and we’d like to keep the cache local to each data centre. This minimises dependencies between data centres and makes better use of storage capacity (based on local needs).

Aaron Schulz devised a strategy that makes MediaWiki caching compatible with the requirements of a multi-DC architecture. Previously, when source data changed, MediaWiki would recompute and replace the cache value. Now, MediaWiki broadcasts “purge” events for cache keys. Each data centre receives these and sets a “tombstone”, a marker lasting a few seconds that limits any set-value operations for that key to a minuscule time-to-live. This makes it tolerable for recache-on-miss logic to recompute the cache value using local replica databases, even though they might have several seconds of replication lag. Heartbeats are used to detect the replication lag of the databases involved during any re-computation of a cache value. When that lag is more than a few seconds (a large portion of the tombstone period), the corresponding cache set-value operation automatically uses a low time-to-live. As a result, even large amounts of replication lag are tolerated.

This and other aspects of WANObjectCache’s design allow MediaWiki to trust that cached values are not substantially more stale than a local replica database, provided that cross-DC broadcasting of tiny in-memory tombstones is not disrupted.
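
As a rough sketch of the recache-on-miss pattern (the cache key and the query are invented for this example, and some parameters of the real interface are omitted):

use MediaWiki\MediaWikiServices;
use Wikimedia\Rdbms\Database;

$cache = MediaWikiServices::getInstance()->getMainWANObjectCache();

$value = $cache->getWithSetCallback(
    $cache->makeKey( 'example-page-summary', $pageId ), // made-up key for illustration
    $cache::TTL_HOUR,
    function ( $oldValue, &$ttl, array &$setOpts ) use ( $pageId ) {
        $dbr = wfGetDB( DB_REPLICA );

        // Record how lagged this replica is; when the lag is too high, the
        // resulting set operation automatically gets a low time-to-live
        // (and a recent purge tombstone has the same effect).
        $setOpts += Database::getCacheSetOptions( $dbr );

        // Recompute the value from the local replica database (illustrative query)
        return $dbr->selectField( 'page', 'page_touched', [ 'page_id' => $pageId ], __METHOD__ );
    }
);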


First paint time now under 900ms

In July we set out a goal: improve page load performance so our median first paint time would go down from approximately 1.5 seconds to under a second – and stay under it!

I identified synchronous scripts as the single biggest task blocking the browser between the start of a page navigation and the first visual change seen by Wikipedia readers. We had used async scripts before, but converting these last two scripts to be asynchronous was easier said than done.

There were several blockers to this change, including the use of embedded scripts by interactive features. These were partly migrated to CSS-only solutions. For the remaining features, we introduced the notion of “delayed inline scripts”: embedded scripts now wrap their code in a closure and add it to an array. Once the module loader arrives, it processes the closures from the array and executes the code within.

Another major blocker was the subset of community-developed gadgets that didn’t yet use the module loader (introduced in 2011). These legacy scripts assumed a global scope for variables, and depended on browser behaviour specific to serially loaded, synchronous scripts. Between July 2015 and August 2015, I worked with the community to develop a migration guide. And, after a short deprecation period, the legacy loader was removed.

Line graph that plots the firstPaint metric for August 2015. The line drops from approximately one and a half seconds to 890 milliseconds.

Hello, WebPageTest

Previously, we only collected performance metrics for Wikipedia from sampled real-user page loads. This is great for detecting trends, regressions, and other changes at large. But to truly understand what made a page load a certain way, we need synthetic testing as well.

Synthetic testing offers frame-by-frame video captures, waterfall graphs, performance timelines, and above-the-fold visual progression. We can run these tests automatically (e.g. every hour) for many URLs, on many different browsers and devices, and from different geographic locations. They allow us to understand and analyse page load performance in detail, compare runs over any period of time and across different factors, and capture snapshots of how pages were built at a certain point in time.

The results are automatically recorded into a database every hour, and we use Grafana to visualise the data.

In 2015 Peter built out the synthetic testing infrastructure for Wikimedia, from scratch. We use the open-source WebPageTest software. To read more about its operation, check Wikitech.


The journey to Thumbor begins

Gilles evaluated various thumbnailing services for MediaWiki. The open-source Thumbor software came out as the most promising candidate.

Gilles implemented support for Thumbor in the MediaWiki-Vagrant development environment.

To read more about our journey to Thumbor, read The Journey to Thumbor (part 1).


Save timing reduced by 50%

Save timing is one of the key performance metrics for Wikipedia. It measures the time from when a user presses “Publish changes” when editing until the user’s browser starts to receive a response. During this time, many things happen. MediaWiki parses the wiki-markup into HTML, which can involve page macros, sub-queries, templates, and other parser extensions. These inputs must be saved to the database. There may also be cascading updates, such as updating the page’s membership in a category. And last but not least, there is the network latency between the user’s device and our data centres.

This year saw a 50% reduction in save timing. At the beginning of the year, median save timing was 2.0 seconds (quarterly report). By June, it was down to 1.6 seconds (report), and in September 2015, we reached 1.0 seconds! (report)

Line graph of the median save timing metric, over 2015. Showing a drop from two seconds to one and a half in May, and another drop in June, gradually going further down to one second.

The effort to reduce save timing was led by Aaron Schulz. The impact that followed was the result of hundreds of changes to MediaWiki core and to extensions.

Deferring tasks to post-send

Many of these changes involved deferring work to happen post-send. That is, after the server sends the HTTP response to the user and closes the main database transaction. Examples of tasks that now happen post-send are: cascading updates, emitting “recent changes” objects to the database and to pub-sub feeds, and doing automatic user rights promotions for the editing user based on their current age and total edit count.

Aaron also implemented the “async write” feature in the multi-backend object cache interface. MediaWiki uses this for storing the parser cache HTML in both Memcached (tier 1) and MySQL (tier 2). The second write now happens post-send.

By re-ordering these tasks to occur post-send, the server can send a response back to the user sooner.
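
As a hedged sketch of what deferring work looks like in MediaWiki code (the task inside the callback is invented for this example):

// Queue work that does not need to block the HTTP response. With the
// POSTSEND stage, the callback runs after the response has been flushed
// to the client and the main transaction round has been committed.
DeferredUpdates::addCallableUpdate(
    function () use ( $title ) {
        // Made-up example task: refresh derived data for this page
        refreshDerivedDataFor( $title ); // hypothetical helper
    },
    DeferredUpdates::POSTSEND
);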

Working with the database, instead of against it

A major category of changes were improvements to database queries. For example, reducing lock contention in SQL, refactoring code in a way that reduces the amount of work done between two write queries in the same transaction, splitting large queries into smaller ones, and avoiding use of database master connections whenever possible.

These optimisations reduced the chances of queries being stalled, and allowed them to complete more quickly.
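
For example, the “avoid master connections” part largely boils down to sending reads to a replica connection unless they truly need the very latest data (the specific query below is only illustrative):

// Read from a local replica; reserve the master connection for writes and
// for reads that must not be affected by replication lag.
$dbr = wfGetDB( DB_REPLICA );
$count = $dbr->selectField(
    'category',
    'cat_pages',
    [ 'cat_title' => $categoryName ],
    __METHOD__
);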

Avoid synchronous cache re-computations

The aforementioned work on WANObjectCache also helped a lot. Whenever we converted a feature to use this interface, we reduced the amount of blocking cache computation that happened mid-request. WANObjectCache also performs probabilistic preemptive refreshes of near-expiring values, which can prevent cache stampedes.

Profiling can be expensive

We disabled the performance profiler of the AbuseFilter extension in production. AbuseFilter allows privileged users to write rules that may prevent edits based on certain heuristics. Its profiler would record how long the rules took to inspect an edit, allowing users to optimise them. The way the profiler worked, though, added a significant slowdown to the editing process. Work began later in 2016 to create a new profiler, which has since been completed.

And more

Lots of small things, including fixing the User object cache, which existed but wasn’t working, and no longer caching values in Memcached when computing them is faster than the Memcached round trip required to fetch them!

We also improved the latency of file operations by switching more LBYL-style (“look before you leap”) coding patterns to EAFP-style (“easier to ask forgiveness than permission”) code. Rather than checking whether a file exists and is readable and then checking when it was last modified, we now do only the latter and handle any errors. This is both faster and more correct, since LBYL checks are subject to race conditions.
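
A minimal sketch of the difference, with an invented file path:

$path = '/srv/thumbs/example.jpg'; // made-up path for illustration

// LBYL: multiple filesystem round trips, and racy, since the file can
// disappear between the checks and the final call.
if ( file_exists( $path ) && is_readable( $path ) ) {
    $mtime = filemtime( $path );
}

// EAFP: one call; treat failure as the "missing or unreadable file" case.
$mtime = @filemtime( $path );
if ( $mtime === false ) {
    // Handle the missing or unreadable file here
}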


So long, Sajax!

Sajax was a library for invoking a subroutine on the server, and receiving its return value as JSON from client-side JavaScript. In March 2006, it was adopted in MediaWiki to power the autocomplete feature of the search input field.

The Sajax library had a utility for creating an XMLHttpRequest object in a cross-browser-compatible way. MediaWiki deprecated Sajax in favour of jQuery.ajax and the MediaWiki API. Yet, years later in 2015, this tiny part of Sajax remained popular in Wikimedia's ecosystem of community-developed gadgets.

The legacy library was loaded by default on all Wikipedia page views for nearly a decade. During a performance inspection this year, Ori Livneh decided it was high time to finish this migration. Goodbye Sajax!


Further reading

This year also saw the switch to encrypt all Wikimedia traffic with TLS by default. More about that on the Wikimedia blog.

— Timo Tijhof

Older blog entries