Mass digitization

From HandWiki
Mass digitization is a term used to describe "large-scale digitization projects of varying scopes." Such projects include efforts to digitize physical books on a mass scale in order to make knowledge openly and publicly accessible. They are made possible by selecting cultural objects, preparing them, scanning them, and constructing the necessary digital infrastructure, including digital libraries. These projects are often piloted by cultural institutions and private bodies, although individuals may attempt mass digitization as well. Such efforts are ongoing: millions of files (books, photos, color swatches, etc.) are uploaded to large-scale public or private online archives every day. This practice of moving the physical into the digital on a mass scale changes the way we interact with knowledge. The history of mass digitization can be traced back as early as the mid-1800s, with the advent of microfilm; today, technical infrastructures such as the internet, data farms, and computer data storage make these efforts technologically possible. This seemingly simple process of digitizing physical knowledge, or even products, has far-reaching implications.

History of Mass Digitization Initiatives

Fictional Considerations

Perhaps one of the most notable fictional treatments of mass digitization is Jorge Luis Borges's speculation on the Library of Babel. In this account, Borges describes a vision of a library in which every possible permutation of books was available.[1] Although Borges describes the preservation and archiving of all knowledge in a physical space (a library), his fictional vision has, in a sense, been partly realized digitally: countless copies of online books are freely available to the public through internet archives and library databases. Accounts like this were quite common, and they convey the idea that "the dream and practice of mass digitization cultural works have been around for decades."[2]

Non-fictional Considerations

Some of the earliest digitization programs started before the age of the internet and include the adoption of technologies such as microfilm in the 19th century. The technical affordances of microfilm made it a significant medium in efforts to preserve and extend library materials, and it was also notable for "graphically dramatizing questions of scale." Microfilm, also known as microphotography, was developed in 1839, and its capabilities demonstrated (perhaps for the first time) the ability to store large amounts of information, in this case photographs, in a physically small space. Discussing the affordances of microfilm, one observer noted that "the whole archives of the nation might be packed away in a snuffbox." Such remarks illustrate how the technical infrastructure of microfilm could be leveraged to archive and preserve on a mass scale. Paul Otlet, a Belgian author often considered one of the founders of information science, "outlined the benefits of microfilm as a stable and long-term remediation format that could be used to extend the reach of literature" in his 1906 work "Sur une forme nouvelle du livre : le livre microphotographique". His claim was borne out: in the early 20th century, the Library of Congress and other bodies used microfilm to "digitize" cultural objects such as manuscripts, books, images, and newspapers.

Technical Infrastructures

Microfilm

Microfilm represents a shift in the infrastructure of data storage: an immense number of images could be stored in a physically small space and then enlarged for viewing with the help of a microfilm machine. Microfilm, in combination with the microfilm viewer, was leveraged to allow objects to be reproduced, preserved, and viewed on a mass scale. It is interesting to note that students needed the help of staff before using the machine, whereas accessing digital materials today is a quick, easy process that one can carry out independently. More information on microfilm can be found in the section on non-fictional considerations above.

Server Farms

Another large shift in the infrastructure of data storage was the advent of server farms. Websites rely on server farms for “scalability, reliability, and low-latency access to Internet content”. According to Burns, these technologies are essential when building a high-performance infrastructure for content delivery. The move from microfilm to complex server farms with their own schemas demonstrates how the infrastructural demands of mass digitization have grown over time. Server farms both facilitate mass digitization and serve as the place where digitized material resides. Without server farms, data could not be stored or accessed on the scale that mass digitization projects require. However, it is important to note that server farms do not act alone in storing data. Other storage infrastructure, such as hard drives on personal computers, also aids greatly in the storage of data. Encryption tools and services also work to protect and secure data in sensitive or internal-use mass digitization projects.

Databases

Databases are often seen as the "home" of a variety of mass digitization efforts. Databases such as Google Books allow one to view an entire collection of digitized objects. In the case of Google Books, the database allows a user to search, research, and preview an estimated 40 million titles that the Google team has scanned and uploaded, corresponding to roughly 30% of the estimated number of all books ever published. However, faults do exist within such databases; for example, the hands of a scanning operator can accidentally be scanned and posted in place of the page of the book itself. Errors such as these in public, and often permanent, databases call into question the reliability of human efforts in mass digitization projects.

Other databases allow researchers from all over the world to upload or view data for scientific inquiry. In this case, raw data from scientific experiments, anonymized for participant privacy, is uploaded and stored on a mass scale. A prime example of such a research database is the Child Language Data Exchange System (CHILDES). This database houses raw data on language acquisition and includes videos, audio, transcripts, and de-identified participant information. Databases that store published research articles also exist and include sites such as PubMed, ScienceDirect, JSTOR, and EBSCO.

Databases, in conjunction with server farms and other web-based infrastructures, allow for crucial collaboration in the scientific realm. Here, mass digitization has expanded from the digitization of physical objects (such as books) to the digitization of interactions for scientific inquiry.

Implications

References

  1. Borges, Jorge Luis (2001). Prólogos de La biblioteca de Babel. Madrid: Alianza Editorial. ISBN 84-206-3875-7. OCLC 57893246.
  2. Thylstrup, Nanna Bonde (2019). The Politics of Mass Digitization. Cambridge, MA: MIT Press. ISBN 978-0-262-35005-1. OCLC 1078691226.
  • Auerbach, J.; Gitelman, L. (2007-06-13). "Microfilm, Containment, and the Cold War". American Literary History. 19 (3): 745–768. doi:10.1093/alh/ajm022. ISSN 0896-7148.
  • Luther, Frederic. Microfilm: A History, 1839–1900. Annapolis, MD: The National Microfilm Association, 1959.
  • Goldschmidt, & Otlet, P. (1906). Sur une forme nouvelle du livre : le livre microphotographique. [Institut international de bibliographie].
  • La Hood, Charles G. "Microfilm for the Library of Congress." College & Research Libraries 34.4 (1973): 291–294.
  • Duncan, Virginia L., and Frances E. Parsons. "Use of Microfilm in an Industrial Research Library." Special Libraries 61.6 (1970): 288–290.