Shadow librarians

The dream of an information universe

The pirate community is creative. Not only does it acquire information, it also modifies, catalogues, manages, and selects it. In recent years, scientists and others who need access to specialized texts have found a common language with pirates.

The web, i.e. the worldwide network, originated at CERN in 1990 from the dream of an information universe that would give scientists access to specialized information. For example, the system was supposed to let a reader step from a citation to the cited source. “Pick up your pen, mouse, or favourite pointing device and press it on a reference in this document,” wrote the web's founder, Tim Berners-Lee, two years after his invention. “Suppose that you are then directly presented with the background material – other papers, the author's coordinates, the organization's address, and its entire telephone directory.” The web soon interconnected the whole world and became a crucial social and commercial driver. But scientific knowledge remains hidden behind a paywall, and the wall is often a high one.

Scientists are trapped in a strange paradox. Their achievements are evaluated with the help of “scientometrics”: the number of scientific and academic articles they have published and the citations those articles receive. A citation index is something like a score in a computer game that unlocks new career levels. It makes some sense: journal submissions undergo an often demanding peer-review process, and the number of citations can indicate how important the researcher's discovery is for the world. But the majority of scientific and academic texts are published by the five largest publishers: Reed Elsevier, Springer Nature, Wiley-Blackwell, Taylor & Francis, and Sage. For a long time now, neither authors nor reviewers have usually been paid, and the publishers mostly demand full rights, so authors cannot publish their texts themselves. The articles are then published predominantly in electronic form, available for a high fee. Scientific institutions, universities, and libraries subscribe only to the handful of journals they can afford. This results in paradoxical situations. For example, the taxpayer pays several times for a recently published article by a laser physicist: once to fund the research, a second time when some scientists read it, and a third time when colleagues from other institutions want to read it. Library costs weigh ever more heavily on the budgets of research institutes. Students and scientists from first world countries are in a relatively good situation, since they get access to at least some journals. But what about scientists from third world countries, who can afford only essential tools and for whom everything else is too expensive? And that does not even take into account access for the lay reader, teacher, journalist, or simply curious citizen whose tax money pays for the research. A single article costs between USD 30 and 40. So a text of a few dozen pages at most, which its author had to hand over without remuneration, costs as much as a coffee-table book.


Martyr of the internet 

Aaron Swartz stole fire from the gods and became the internet's first martyr. This talented hacker was present at the birth of systems such as Reddit, RSS, and tor2web. Led by the conviction that information should be available to everyone, he had previously “scraped” public documents of U.S. federal courts from the public database PACER. The script he wrote, a scraper, methodically listed and downloaded court documents one by one, but much faster and more efficiently than a person could by clicking. Swartz was investigated for downloading 2.7 million documents, but the case was dropped, since court decisions are not protected by copyright and what is accessible to all cannot be stolen.

In fall 2010, Swartz gave a lecture at a Budapest conference on internet freedom. There he met activists who wanted to use funds provided by the philanthropist George Soros to give the public access to JSTOR, the entire paid database of scientific and academic journals. But they found that the fees would reach hundreds of millions of USD. Swartz began to devise a plan.

He was living in the vicinity of the campus of the U.S. technical university MIT, which is, by the way, a cradle of hackers. This wealthy school had access to the complete JSTOR database. Although Swartz did not study at the school, visitors could freely access the computer network and, officially, the library services as well. JSTOR cut off his first attempts with the scraper because it overloaded the servers and automated downloading violated the terms of use. Swartz therefore adjusted the scraper to download articles more cautiously. In the freely accessible basement of building no. 16, he used an unlocked closet housing network equipment. He connected a laptop to a switch, hid it on a shelf under a cardboard box, and let the machine work. The program gradually downloaded millions of articles. All that was needed was to swap the disks once in a while. But JSTOR noticed the downloading and asked for an investigation. The Secret Service joined in, and a camera hidden in the closet recorded Swartz swapping disks. Following a short chase, Swartz was captured.

Although campus visitors could freely access the articles and the hacker had de facto only broken JSTOR's terms of service, Swartz was eventually charged with wire fraud and computer fraud. The pressure of the long investigation led the 26-year-old Swartz to hang himself on January 11, 2013. The internet got its first martyr.


Shadow Libraries

In those days, books were shared in much the same way as other “warez”, i.e. pirated software, music, and films. Someone posted a link to a file on a file-sharing server in an anonymous, password-protected forum. It is a disorganized method: files appear chaotically, from various fields and by various authors. The links are full of ads and expire after a certain time. Frustrated users beg: “please reup!” Such a situation encourages people to create home, semi-public, or even completely open libraries. Some grew so large that they attracted thousands of excited users, but also the attention of the authorities. A year after Aaron Swartz’s death, a Munich court upheld a complaint by 17 large publishers and ordered the web library to be shut down. At the time of the closure, the library contained about 400 thousand books, which puts it on a par with the Library of Alexandria.

Academics and researchers from all over the world mourned: “textbooks, secondary treatises, obscure monographs, biographical analyses, technical manuals, collections of cutting edge research in engineering, mathematics, biology, social science and humanities. (...) Even the pornography was scholarly: guidebooks and scholarly books about the pornography industry,” commented Professor Kelty from UCLA. “For a criminal underground site to be mercifully free of pornography must alone count as a triumph of civilisation.”

Meanwhile, the dream of a universal library materialized in a project called Library Genesis. The Cyrillic script on the site suggested the server's origin. Not only does the library contain millions of specialized books, it also holds their bibliographic records along with good search options. All files are available via torrents, so downloaded copies can make their way from one user to another even if the original source is taken down. And the server goes even further in its quest for openness: it also publishes database dumps, letting anyone set up their own copy of the library anywhere. That is, anyone with 40 TB of disk space. Clones of the library have already appeared at various addresses. They do not compete but cooperate, each offering downloads from several possible mirrors. This is particularly useful given the main server’s frequent outages.


Battle of Sci-Hub

Alexandra Elbakyan, a neuroscience student from Kazakhstan, went to study in Freiburg, Germany, and in the USA. She grew accustomed to the way science was done there and to the access to information; after returning to her country, however, she found herself cut off from all the sources of information she was used to. When her local colleagues needed a specialized article, they asked for it in semi-secret discussion forums, and with luck a colleague from an institution that subscribed to the journal downloaded it and sent it to them. The Twitter hashtag #ICANHAZPDF started serving a similar function. As a skillful hacker, Elbakyan started thinking about how to automate the system and wrote Sci-Hub. “There was no big idea behind the project, like ‘make all information free’ or something like that. We just needed to read all these papers to do our research,” Elbakyan says in an interview for the TorrentFreak website. You put the title of the article into the search box. If Sci-Hub already holds it, it lets you download it. If not, it sends a signal to a network of secret bots somewhere in the background. These can access various publisher databases behind the paywalls, download the article for you, and save a copy for those who search for it next time. Access credentials were allegedly donated by scientists, either voluntarily or without their knowledge.
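The lookup described above (serve a stored copy if there is one, otherwise fetch the item once and keep it for the next reader) follows a general pattern often called a read-through cache. The short Python sketch below illustrates only that pattern and is not Sci-Hub's actual code. The names ReadThroughCache and fake_fetch are hypothetical, and the upstream source is a stand-in function that returns placeholder bytes.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Serve a stored copy when one exists; otherwise fetch, store, and serve."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        # `fetch` is a hypothetical upstream lookup supplied by the caller.
        self._fetch = fetch
        self._store: Dict[str, bytes] = {}
        self.upstream_calls = 0  # counts how often the upstream was consulted

    def get(self, key: str) -> bytes:
        # Cache hit: return the stored copy without touching the upstream.
        if key in self._store:
            return self._store[key]
        # Cache miss: fetch once, then keep a copy for later requests.
        self.upstream_calls += 1
        document = self._fetch(key)
        self._store[key] = document
        return document


def fake_fetch(title: str) -> bytes:
    # Stand-in for a slow or restricted source; returns placeholder content.
    return f"contents of {title}".encode("utf-8")


if __name__ == "__main__":
    cache = ReadThroughCache(fake_fetch)
    cache.get("Example Paper")   # miss: goes to the upstream
    cache.get("Example Paper")   # hit: served from the local store
    print(cache.upstream_calls)
```

The point of the design is that the expensive step happens at most once per item: every later request for the same title is answered from the local store.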

Although courts have already ordered the seizure of both sites' domains, the US justice system only has jurisdiction over US domains, so both sites soon reappeared at new addresses. When those were banned, they continued elsewhere. The courts have been playing the same cat-and-mouse game over domains with The Pirate Bay for years, and The Bay still exists.

The now 28-year-old Elbakyan is not particularly hiding; she only conceals her place of residence. It is said she lives “somewhere in Russia,” outside the reach of the US courts. She gave an interview to the renowned journal Science and willingly provided anonymized data on the use of her site. The journal's analysis made clear that, besides the third world seeking access to information about medicine and technology, Western scientists and students also make heavy use of the site. It is easier, it is free, and they find what they need, regardless of cuts in institutional libraries. Either the publishers come to terms with this, the way the music industry came to terms with Napster, and substantially change their strategy, or the shadow library, which grows more robust the harder they try to eliminate it, will win.

The author is a documentary filmmaker.


Translated by Dagmar Frančíková.