Digital libraries and archives have a number of advantages over their analog counterparts: in theory, any person with the proper permissions can access any content, anywhere, at any time, instantaneously, and without having to compete for a limited number of physical copies. Digital storage and distribution are also considerably cheaper than building networks of physical buildings, not to mention staffing and maintaining them. Digitized texts allow more flexible indexing as well, since algorithms can scan them and extract useful metadata that would be too time-consuming for a person to generate manually for each book. However, digital libraries and digital archives both have major drawbacks, two of which are discussed below.
First, while digitization makes it possible to automate metadata extraction, there is no guarantee that the resulting data will be of any use. For instance, I noticed that the "common terms and phrases" section Google Books compiled for Peter Linebaugh's The London Hanged included a large number of generic words like "became," "lived," and "women." Another page, this one for Liaquat Ahamed's Lords of Finance, included the mysterious word "tion." When I clicked on the word to see it in context, I saw that the algorithm could not handle words split across line breaks, so it simply treated each half as a distinct word. In this case, it apparently picked up several split words ending in "tion" and listed the orphaned fragment alongside "John Maynard Keynes" and "Hjalmar Schacht" in the tag cloud. The algorithm had even more difficulty generating the contents section for The London Hanged. That section contained random capitalization, dates displayed as strings of numbers like "1750076" and "1780 333," and truncated chapter titles like "The sociology of," "The London," and my personal favorite, "CHAPTER TPN." (Although it is separated from "CHAPTER NINE If You Plead for Your Life Plead" by another section called "Tire Crisis of Thanatocmcmy in the Era," one can assume that this was, in fact, chapter ten.) Errors like these complicate both cataloguing and searching digital archives.
Second, although digital distribution might seem inherently open and democratic, disagreements over copyright law and access rights complicate that picture. Organizations like Project Gutenberg and the Open Content Alliance offer free, full-length texts from the public domain. Google Books, by contrast, also offers "previews" and "snippets" of copyrighted material, often without the copyright owner's permission. Furthermore, while regular libraries lend out copies of copyrighted books, Google only sells them. In effect, Google Books is an attempt to monopolize the production, organization, and distribution of digitized texts, disguised as a good-faith effort to create an open platform.