Digital Libraries & Archives

Digital libraries and archives have a number of advantages over their analog counterparts: theoretically, any person with the proper permissions can access any content, anywhere, at any time, instantaneously, and without having to compete for a limited number of physical copies. Digital storage and distribution are also considerably cheaper than building networks of physical buildings (not to mention staffing and maintaining them). Digitized texts also allow more flexible indexing, as algorithms can scan and extract useful metadata that would be too time-consuming for a person to generate manually for each book. However, digital libraries and digital archives both have major drawbacks, two of which are discussed below.

First, while digitization makes it possible to automate metadata extraction, there is no guarantee that the compiled data will be of any use. For instance, I noticed that the “common terms and phrases” section on the Google Books page for Peter Linebaugh’s The London Hanged included a large number of generic words like “became,” “lived,” and “women.” Another page, this one for Liaquat Ahamed’s Lords of Finance, included the mysterious word “tion.” When I clicked on the word to see it in context, I saw that the algorithm couldn’t parse words that were split across line breaks, so it treated each half as a distinct word. In this case, apparently, it picked up several split words ending in “tion” and put them next to “John Maynard Keynes” and “Hjalmar Schacht” in the tag cloud. It had even more difficulty creating the contents section for The London Hanged, which contained random capitalization, dates displayed as strings of numbers like “1750076” and “1780 333,” and truncated chapter titles like “The sociology of,” “The London,” and my personal favorite, “CHAPTER TPN.” (One can assume that, although separated from “CHAPTER NINE If You Plead for Your Life Plead” by another section called “Tire Crisis of Thanatocmcmy in the Era,” this was, in fact, chapter ten.) These errors complicate both cataloguing and searching through digital archives.
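To make the line-break problem concrete, here is a minimal Python sketch of how a term extractor could rejoin words hyphenated across line breaks and skip generic words before counting. It is only an illustration under my own assumptions, not a description of how Google Books actually works: the function names `rejoin_hyphenated` and `common_terms`, the tiny `STOPWORDS` set, and the sample text are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "became", "lived"}


def rejoin_hyphenated(text: str) -> str:
    """Join word halves that were split by a hyphen at the end of a line."""
    return re.sub(r"(\w+)-\s*\n\s*(\w+)", r"\1\2", text)


def common_terms(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Count the most frequent non-generic words after repairing line breaks."""
    words = re.findall(r"[a-z]+", rejoin_hyphenated(text).lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)


# Invented sample text with three words split across line breaks.
sample = (
    "The infla-\ntion crisis and the reparations ques-\n"
    "tion became the central ques-\ntion of the era."
)
print(common_terms(sample, top_n=1))
```

Without the rejoining step, the fragment “tion” would be counted three times in this sample and would top the list. The repair is crude, though: it would also merge genuinely hyphenated compounds that happen to fall at a line break, so a more careful version would check the joined word against a dictionary.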

Second, while digital distribution would seem inherently open and democratic, this is complicated by disagreement over copyright law and access rights. While organizations like Project Gutenberg and the Open Content Alliance offer free, full-length texts from the public domain, Google Books also offers “previews” and “snippets” of copyrighted material, often without the copyright owner’s permission. Furthermore, while regular libraries lend out copies of copyrighted books, Google only sells them; in effect, Google Books is an attempt to monopolize the production, organization, and distribution of digitized texts, disguised as a good-faith effort to create an open platform.

4 Replies to “Digital Libraries & Archives”

  1. Interesting point that this is an attempt by Google to monopolize the digital text market. Perhaps they would like to do with this what Apple did with music on the internet. I’ve been boycotting Google (by myself) ever since I found out about some of their projects with the Pentagon. That said, sometimes if I’m trying to find the source of a quote, Google Books is the only tool I find helpful. If I plug the quote into a Google search, sometimes it pops up in the original book as a Google Book. Then I go find the complete book at the library. Why doesn’t this happen with Project Gutenberg or the Open Content Alliance?

    1. I don’t really think that Google is trying to monopolize the production, organization, or distribution of digitized texts. If you look at the design of Google Books, you’ll see that the platform has been all but abandoned by Google developers for years. You can’t even buy books on Google Books; you have to buy them on Google Play. I know it seems like a silly distinction, but they really are two different things. What is interesting is how you point out that when books are digitized, snippets are usually shown. This is a new way of making books available, and it reminds me of the dilemma that booksellers in the Enlightenment faced. Just like in the early 1700s in Britain, the way books are made and sold and the way people interact with books are changing. Copyright law and standard author contracts are morphing into something completely different, just like they did then.

  2. I think it’s interesting that you focus on the precision with which Google Books can identify relevant and common words within the text, as this is a bigger deal than people let on. Whether or not you agree with it, historians and especially history students are relying more and more often on the search function to find what they need, so fine-tuning these services to lead people more accurately to what they are looking for while minimizing distractions is actually essential.

  3. I was happy to see The London Hanged in your post, and then saddened to see the context. Google Books is an incredible tool when libraries are very much lacking. As this is the case in many places in the world, I have always appreciated what Google Books has to offer. For searching quickly within books, a Google search can get you relatively quickly from point A to point B. However, personally, I have never really enjoyed the interface on Google Books; the presentation of Gallica or the Open Content Alliance is preferable in my opinion. Just by looks, not necessarily functionality.
