Timeline of digital preservation

This page is a timeline of digital preservation and Web archiving. It covers various aspects of saving and preserving digital data, whether that data is born-digital or digitized from analog sources.

Trends

Digital preservation encompasses a variety of efforts and technologies, so its history can be viewed through various trends in these separate efforts:

  • File systems with built-in fault-tolerance
  • Various changes in the physical storage used
  • On-demand archiving services
  • URL shortening services
  • Various episodes of major archival work, sometimes as a result of services shutting down
  • Efforts at converting physical/analog information to more modern digital media, file formats, and storage

Timeline

Year | Month and date | Topic | Details
1972 | | Versioning | Marc Rochkind develops the Source Code Control System at Bell Labs.
1982 | October | Physical storage | The compact disc (CD) and the CD player first become commercially available in Japan.[1][2]
1989 | November 13 | Versioning | A patent application for continuous data protection, the technique of backing up computer data by automatically saving a copy of every change made to that data, is filed by British entrepreneur Pete Malcolm.[3]
1990 | | | Possibly the earliest reference to the term "digital preservation" (to mean converting analog media to digital and preserving in digital form) dates from this year.[4]:124
1996 | January | Web archiving | The initial version of the command-line downloading program Wget, then known as Geturl, is released.[5][6]
1996 | | Web archiving | The Internet Archive is founded by Brewster Kahle.[7][8]
1996 | April | Web archiving | Alexa Internet is founded by Brewster Kahle.[9] Since this year, Alexa Internet has donated its crawl data to the Internet Archive.[8][10]
1996 | | | Preserving Digital Information: Report of the Task Force on Archiving of Digital Information (Donald Waters, John Garrett, eds.) is published.[11] It would become a foundational document in the field of digital preservation, helping to set out key concepts, requirements, and challenges.[12][13]
1997 | April 8 | Web archiving | cURL, a computer software project providing a library and command-line tool for transferring data using various protocols, releases the initial version of the tool. It is known at this point as HttpGet, would briefly be renamed urlget, and would finally be renamed cURL in March 1998. cURL can be used to download files over a network.[14][15]
1998 | May | Web archiving | The first version of HTTrack, a free and open source Web crawler and offline browser, is released.[16]
2000 | | | The National Digital Information Infrastructure and Preservation Program (NDIIPP) launches.[17]
2001 | October | Web archiving | The Wayback Machine is launched.[18]
2001 | October 14 | | Version 1.0 of the Parity Volume Set specification, used in Par1, is published.[19]
2002 | January | Web archiving | TinyURL, the first notable URL shortening service, is launched.[20]
2003 | July | | The International Internet Preservation Consortium is founded.[21]
2005 | | Cloud storage | Box is launched as Box.net.[22][23]
2005 | April 29 | Web archiving | Safari version 2.0 introduces the ability to save complete websites using the proprietary WebArchive format (see Safari version history for details).[24]
2005 | August 1 | Physical storage | The article "Kryder's Law" is published. The law observes that magnetic disk areal storage density has been increasing very quickly.[25]
2005 | August | Versioning | Writely, a web-based word processor created by the software company Upstartle, launches.[26][27] By January 2006, Writely would have support for revision history.[28] Upstartle would later be acquired by Google and Writely would be integrated into Google Docs.
2005 | October 31 | File system | The first implementation of ZFS, a file system that includes protection against data corruption, is integrated into Solaris.[29]
2006 | March 14 | Cloud storage | Amazon Web Services launches by releasing the Simple Storage Service (S3), intended for storing individual files (called objects) in a highly redundant and available fashion.[30][31] S3 is designed for at least 99.999999999% durability (i.e., that percentage of objects is expected to survive over a given year) and 99.99% availability (i.e., that percentage of objects is accessible at any given time).[32] The cost of S3 storage dropped over the next decade, reaching 2.3 cents per GB per month effective December 1, 2016.[33] S3 has been widely used by corporations, libraries, and governments for long-term storage of digital data.[34][35][36]
2007 | January 30 | Versioning | Microsoft Office 2007 is released. Word 2007 includes the ability to track changes in documents.[37]
2007 | June | Cloud storage | Dropbox is founded by MIT students Drew Houston and Arash Ferdowsi, as a startup company from the American seed accelerator Y Combinator.[38]
2007 | September 21 | Physical storage | The initial version of Paperkey is released. Paperkey is a free software implementation of a paper key. It extracts the essential secret bytes from an OpenPGP private key, which can then be printed to paper.[39]
2007 | October 26 | Versioning | Apple releases the initial version of Time Machine.
2007 | | Physical storage | Two programs for densely storing information on paper are released: PaperBack[40][41] and Twibright Labs' Optar.[42][43][44]
2007 | | Federal Agencies Digital Guidelines Initiative (FADGI) | FADGI is a collaborative effort of 20 federal agencies to articulate common sustainable practices and guidelines for digitized and born-digital historical, archival, and cultural content. Two working groups study issues specific to two major areas, Still Image and Audio-Visual.[45]
2008 | | Web archiving | The URL shortening service Bitly is launched.[46]
2008 | April 10 | Versioning | GitHub, a web-based Git repository hosting service, is launched. GitHub would popularize version control and Git. GitHub would also play an important role in encouraging people to make their source code freely available for posterity, allowing others to fork the code and acting as a de facto archive. In addition to software projects, GitHub would also be used to host code repositories for scientific research[47][48] as well as for hosting and backing up websites and content.
2008 | November 20 | Digitizing | The prototype for Europeana launches.[49]
2009 | January 6 | Web archiving | The Archive Team begins operating.[50][51] Its first big effort, for which it receives press coverage, is to download GeoCities data before the service shuts down.[52]
2009 | | Web archiving | SocialSafe Ltd, the company responsible for developing SocialSafe, is founded.[53]
2009 | March 23 | File system | The initial version of Btrfs, a file system that supports checksums, incremental backups, and the ability to repair errors,[54] is released as part of the Linux kernel version 2.6.29.[55][56]
2009 | May 15 | Web archiving | The WARC file format is published as the standard ISO 28500:2009 1st edition.[57]
2009 | October 26 | Web archiving | Yahoo! GeoCities, a web hosting service founded in 1994, closes its United States branch.[58] Various attempts at archiving GeoCities are made. The site would continue to be available only in Japan.
2010 | April 14 | Web archiving | Twitter announces that it will donate its archive of public Tweets to the Library of Congress.[21][59]
2010 | December 1 | Web archiving | The Memento Project provides a standard for interoperability between web archives and the live web. Memento wins the Digital Preservation Award 2010 because "Memento offers an elegant and easily deployed method that reunites web archives with their home on the live web. It opens web archives to tens of millions of new users and signals a dramatic change in the way we use and perceive digital archives."[60]
2011 | June 28 | Web archiving | Google Takeout is launched by the Google Data Liberation Front.[61]
2012 | August 1 | File system | Microsoft introduces ReFS.[62] ReFS has a number of features related to digital preservation, including integrity checking and data scrubbing, protection against data degradation, built-in handling of hard disk drive failure and redundancy, and integration of RAID functionality.
2012 | August 21 | Cloud storage | Amazon Web Services launches Amazon Glacier, an addition to its S3 offerings with lower storage costs than S3 (initially 1 cent per GB per month). Glacier is intended for long-term archival in cases where retrieval is rare; therefore retrieval is costly and slow. Glacier offers the same durability as the standard S3 offering.[63][64] In December 2016, the price of Glacier is reduced to 0.4 cents per GB per month.[33] Glacier has been used by governments, corporations, and libraries for low-cost long-term archival.[36] It has also been recommended for personal backups when frequent access is not needed.[65][66]
2013 | April 6 | Web archiving | In the United Kingdom, the Legal Deposit Libraries (Non-Print Works) Regulations come into force, bringing digital and online material under the scope of the UK's legal deposit. Previously, the Legal Deposit Libraries Act 2003 had given the Secretary of State the power to make regulations governing the deposit of non-print publications, but no such regulations had been made.[21][67][68]:5
2013 | April 18 | Digitizing | The Digital Public Library of America launches.[69]
2013 | July 1 | Web archiving | Google Reader, an RSS/Atom feed aggregator operated by Google, shuts down after having launched in 2005.[70] The shutdown prompts an effort to archive the feed data from the service.[71][72]
2013 | December | Web archiving | The Memento protocol is published as RFC 7089.[73]
2017 | August | Web archiving | The WARC file format is published as the standard ISO 28500:2017 2nd edition.[74]

References

  1. Dorian Lynskey (May 28, 2015). "How the compact disc lost its shine". The Guardian. https://www.theguardian.com/music/2015/may/28/how-the-compact-disc-lost-its-shine. "CBS released the world’s first commercially available CD, a reissue of Billy Joel’s 52nd Street, in Japan in October 1982. Philips missed the production deadline so the international release was put back to March 1983." 
  2. Benj Edwards (October 1, 2012). "The CD player turns 30". PCWorld. http://www.pcworld.com/article/2010810/the-cd-player-turns-30.html. Retrieved November 9, 2016. "On October 1, 1982, Sony ignited a digital audio revolution with the release of the world’s first commercial compact disc player, the CDP-101 (above), in Japan.". 
  3. Peter B. Malcolm (November 13, 1989). "US Patent 5086502: Method of operating a data processing system". Google Patents. https://www.google.com/patents/US5086502. "Filing date Nov 13, 1989" 
  4. Hirtle, Peter B. (c. 2003). "The History and Current State of Digital Preservation in the United States". https://cip.cornell.edu/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=cul.pub/1238609304. "The earliest reference that I could find in English to the "digital preservation" of data occurs in the context of the research that Anne Kenney and Lynne Personnius undertook in 1990 at the Cornell University Library in conjunction with the Xerox Corporation." 
  5. "GNU Wget NEWS – history of user-visible changes". Svn.dotsrc.org. 2005-03-20. http://svn.dotsrc.org/repo/wget/tags/WGET_1_10/NEWS. "Wget 1.4.0 [formerly known as Geturl] is an extensive rewrite of Geturl."  This NEWS file is included in source distributions of Wget.
  6. Niksic, Hrvoje (June 24, 1996). "Geturl: Software for non-interactive downloading". comp.infosystems.www.announce. http://groups-beta.google.com/group/comp.infosystems.www.announce/msg/4268334d269d42ce?hl=en. 
  7. "Internet Archive: About IA". https://archive.org/about/. 
  8. "Internet Archive: Bios". Internet Archive. https://archive.org/about/bios.php. 
  9. "Alexa Internet - Company Overview". http://www.alexa.com/company. 
  10. "Alexa Crawls". Internet Archive. https://archive.org/details/alexacrawls&tab=about. 
  11. Donald Waters; John Garrett (1996). Preserving digital information: Report of the task force on archiving of digital information. CLIR. ISBN 1-88733450-5. http://www.clir.org/pubs/reports/pub63. Retrieved November 15, 2012. 
  12. Tibbo, Helen R. (2003). On the Nature and Importance of Archiving in the Digital Age. Advances in Computers. 57. p. 26. doi:10.1016/S0065-2458(03)57001-2. ISBN 9780120121571. 
  13. "Principles and Good Practice for Preserving Data. IHSN Working Paper No 003". International Household Survey Network. December 2009. http://www.surveynetwork.org/home/sites/default/files/resources/IHSN-WP003.pdf. 
  14. "History of curl - How curl Became Like This". curl. https://curl.haxx.se/docs/history.html. "Daniel simply adopted an existing command-line open-source tool, httpget, that Brazilian Rafael Sagula had written and recently release version 0.1 of. After a few minor adjustments, it did just what he needed. [...] HttpGet 1.0 was released on April 8th 1997 with brand new HTTP proxy support." 
  15. Stenberg, Daniel (20 March 2015). "curl, 17 years old today". https://daniel.haxx.se/blog/2015/03/20/curl-17-years-old-today/. 
  16. Roche, Xavier (February 8, 2014). "Re: Full History of HTTrack". HTTrack Forum. http://forum.httrack.com/readmsg/32457/32456/index.html. "The first release was in May 1998, but only as binaries." 
  17. "A Brief History of NDIIPP". The Library of Congress - Digital Preservation. http://www.digitalpreservation.gov/meetings/documents/ndiipp08/NDIIPPtoNOW_post.ppt. "2000 - NDIIP legislation is passed" 
  18. "Internet Archive launches WayBack Machine". Online Burma Library. 2001-10-25. http://www.burmalibrary.org/reg.burma/archives/200110/msg00079.html. 
  19. Nahas, Michael (2001-10-14). "Parchive: Parity Volume Set specification 1.0". http://sourceforge.net/docman/display_doc.php?docid=7273&group_id=30568. 
  20. Katie Dean (March 16, 2004). "Honey, I Shrunk the URL". Wired. https://www.wired.com/2004/03/honey-i-shrunk-the-url/. Retrieved November 17, 2016. "So the 24-year-old Web developer from Blaine, Minnesota, launched TinyURL.com in January 2002, a free site that converts huge strings of characters into more manageable snippets.". 
  21. Lepore, Jill (January 26, 2015). "The Cobweb: Can the Internet be archived?". The New Yorker. https://www.newyorker.com/magazine/2015/01/26/cobweb. Retrieved December 6, 2016. "Twitter is a rare case: it has arranged to archive all of its tweets at the Library of Congress. [...] The U.K. has what's known as a legal-deposit law; it requires copies of everything published in Britain to be deposited in the British Library. In 2013, that law was revised to include everything published on the U.K. Web.". 
  22. Rachel King (March 6, 2014). "How Aaron Levie and his childhood friends built Box into a $2 billion business, without stabbing each other in the back". TechRepublic. http://www.techrepublic.com/article/how-aaron-levie-and-his-childhood-friends-built-box-into-a-2-billion-business-without-stabbing-each-other-in-the-back/. Retrieved December 1, 2016. "Development for Box, then Box.net, started at the end of 2004, but really got off the ground and went online in 2005 during their sophomore years of college.". 
  23. Aaron Levie (September 14, 2011). "Commentary: Why we had to leave Seattle to build Box.net". GeekWire. http://www.geekwire.com/2011/leave-seattle-build-boxnet/. "Box – which now competes with Redmond's very own Microsoft SharePoint – had been started in early '05 from college dorm rooms in California and North Carolina." 
  24. Frakes, Dan; Griffiths, Rob (October 20, 2005). "The Secrets of Safari". Macworld. https://www.macworld.com/article/1047531/safarisecrets.html. Retrieved November 22, 2016. "In older versions of Safari, “saving” a Web page saved only its HTML source code; images and other embedded content were lost. Fortunately, Apple fixed this in Safari 2.0: the Save As command includes a Web Archive option, which saves nearly everything on the page, including images.". 
  25. Walter, Chip (August 2005). "Kryder's Law". Scientific American 293 (2): 32–33. doi:10.1038/scientificamerican0805-32. PMID 16053134. Bibcode: 2005SciAm.293b..32W. http://www.scientificamerican.com/article.cfm?id=kryders-law. 
  26. Sawers, Paul (September 2, 2011). "15 tips to get the most out of Google Docs". The Next Web. https://thenextweb.com/google/2011/09/02/15-tips-to-get-the-most-out-of-google-docs/. 
  27. Chang, Emily (October 5, 2005). "eHub Interviews Writely". eHub. http://emilychang.com/ehub/app/ehub-interviews-writely/. 
  28. Dennis Tsang (January 26, 2006). "Writely - The Web Word Processor". http://dennistt.net/2006/01/26/writely-the-web-word-processor/. "Writely saves all the revisions each time you edit, so that you can go back and see what has been edited at each revision." 
  29. Jeff Bonwick (October 31, 2005). "ZFS: The Last Word in Filesystems". Jeff Bonwick's Blog. https://blogs.oracle.com/bonwick/en_US/entry/zfs_the_last_word_in. "And today, 10/31/2005, we integrated into Solaris." 
  30. "Amazon Web Services Launches "Amazon S3"" (Press release). 2006-03-14. Retrieved 2015-09-22.
  31. "A Decade of Innovation". http://perspectives.mvdirona.com/2016/03/a-decade-of-innovation/. 
  32. "Amazon Simple Storage Service (S3) FAQs". Amazon Web Services. https://aws.amazon.com/s3/faqs/. 
  33. Barr, Jeff (November 21, 2016). "AWS Storage Update – S3 & Glacier Price Reductions + Additional Retrieval Options for Glacier". Amazon Web Services. https://aws.amazon.com/blogs/aws/aws-storage-update-s3-glacier-price-reductions/. 
  34. Iglesias, Edward (December 21, 2010). "Using Amazon S3 in Digital Preservation in a mid sized academic library: A case study of CCSU ERIS digital archive system". The Code4Lib Journal (12). http://journal.code4lib.org/articles/4468. Retrieved January 4, 2017. 
  35. "Using Cloud Services: Three Case Studies". The Texas Record. May 13, 2013. https://www.tsl.texas.gov/slrm/blog/2013/05/using-cloud-services-three-case-studies/. 
  36. Han, Yan (2015). "Cloud storage for digital preservation: optimal uses of Amazon S3 and Glacier". Library Hi Tech 33 (2): 261–271. doi:10.1108/LHT-12-2014-0118. ISSN 0737-8831. 
  37. "Track changes while you edit - Office Support". Microsoft. https://support.office.microsoft.com/en-us/article/Track-changes-while-you-edit-024158a3-7e62-4f05-8bb7-dc3ecf0295c4. 
  38. "About Dropbox". Dropbox, Inc. https://www.dropbox.com/about. "Dropbox was founded by Drew Houston and Arash Ferdowsi in 2007, and received seed funding from Y Combinator." 
  39. David Shaw. "Paperkey - an OpenPGP key archiver". http://www.jabberwocky.com/software/paperkey/. "Paperkey extracts just those secret bytes and prints them."  From the NEWS file of the http://www.jabberwocky.com/software/paperkey/paperkey-1.4.tar.gz source: "Noteworthy changes in version 0.5 (2007-09-21) [...] Initial release."
  40. Bruce Sterling (August 16, 2012). "PaperBack paper backup". WIRED. https://www.wired.com/2012/08/paperback-paper-backup/. 
  41. Oleh Yuschuk (2007). "PaperBack". http://ollydbg.de/Paperbak/. 
  42. Karel 'Clock' Kulhavy (2007). "Twibright Optar". http://ronja.twibright.com/optar/. 
  43. cook (July 24, 2007). "Store data on paper with Twibright Optar". LWN.net. https://lwn.net/Articles/242735/. 
  44. "Optar - Just Solve the File Format Problem". Archive Team. http://fileformats.archiveteam.org/wiki/Optar. 
  45. "Digital Preservation at the Library of Congress". Library of Congress. https://www.loc.gov/preservation/digital/. 
  46. Newman, Andrew Adam (1 December 2014). "Bitly Helps the Red Cross Get to Hope.ly". The New York Times. https://www.nytimes.com/2014/12/02/business/media/bitly-helps-the-red-cross-get-to-hopely.html. "Introduced in 2008, Bitly has grown rapidly because, along with shortening URLs for character-limited social media like Twitter, it helps users monitor how others subsequently share the links that they share." 
  47. Bergman, Casey (November 8, 2012). "On the Preservation of Published Bioinformatics Code on Github". https://caseybergman.wordpress.com/2012/11/08/on-the-preservation-of-published-bioinformatics-code-on-github/. 
  48. Rios, Fernando (April 28, 2016). "Beyond Data: Reproducibility in Scientific Software and the Role of Digital Preservation". Council on Library and Information Resources. http://connect.clir.org/blogs/fernando-rios/2016/04/28/beyond-data. 
  49. "Background". Europeana.eu. https://www.europeana.eu/portal/aboutus_background.html. "2008 Europeana's prototype is launched on November 20th by Viviane Reding, European Commissioner for Information Society and Media, and the President of the Commission, José Manuel Barroso." 
  50. Scott, Jason (January 6, 2009). "Team Archive is GO". ASCII by Jason Scott. http://ascii.textfiles.com/archives/1664. 
  51. "Revision history of "Main Page"". Archive Team. http://www.archiveteam.org/index.php?title=Main_Page&dir=prev&action=history. 
  52. Modine, Austin (April 28, 2009). "Web 0.2 archivists save Geocities from deletion. Preserving history one hideous webpage at a time". https://www.theregister.co.uk/2009/04/28/geocities_preservation/. 
  53. Steve O'Hear (October 1, 2013). "SocialSafe Raises Further $1M, Microsoft 'Life-Log' Researcher Gordon Bell Becomes Investor And Advisor". TechCrunch. https://techcrunch.com/2013/10/01/total-social-media-recall/. 
  54. "btrfs Wiki § Features". https://btrfs.wiki.kernel.org/index.php/Main_Page#Features. 
  55. Wuelfing, Britta (12 January 2009). "Kernel 2.6.29: Corbet Says Btrfs Next Generation Filesystem". Linux Magazine. http://www.linux-magazine.com/Online/News/Kernel-2.6.29-Corbet-Says-Btrfs-Next-Generation-Filesystem. 
  56. "Linux 2 6 29". Linux Kernel Newbies. https://kernelnewbies.org/Linux_2_6_29#head-c33d4f6e374829e789c45d89bdcfea93b306bf02. "Linux 2.6.29 kernel released on 23 March 2009. [...] Btrfs is a new filesystem developed from scratch following the design principles of filesystems like ZFS, WAFL, etc." 
  57. "ISO 28500:2009 - Information and documentation -- WARC file format". International Organization for Standardization. http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717. 
  58. Fox, Geoff (2009-07-10). "Yahoo Sets the Date of GeoCities' Death". PCMag.com. https://www.pcmag.com/article2/0,2817,2350024,00.asp. Retrieved 2010-11-05. 
  59. Stone, Biz (April 14, 2010). "Tweet Preservation". Twitter Blog. https://blog.twitter.com/2010/tweet-preservation. "It is our pleasure to donate access to the entire archive of public Tweets to the Library of Congress for preservation and research. [...] there are some specifics regarding this arrangement. Only after a six-month delay can the Tweets be used for internal library use, for non-commercial research, public display by the library itself, and preservation." 
  60. "Memento Project wins Digital Preservation Award 2010". December 1, 2010. https://www.dpconline.org/news/memento-project-wins-digital-preservation-award-2010. 
  61. "The Data Liberation Front Delivers Google Takeout". June 28, 2011. http://dataliberation.blogspot.com/2011/06/data-liberation-front-delivers-google.html. 
  62. Snover, Jeffrey (1 August 2012). "Windows Server 2012 released to manufacturing!". Microsoft Corporation. http://blogs.technet.com/b/windowsserver/archive/2012/08/01/windows-server-2012-released-to-manufacturing.aspx. 
  63. Mlot, Stephanie (August 21, 2012). "Amazon Launches Glacier Cloud Storage Service". PCMag.com (Ziff Davis, Inc.). https://www.pcmag.com/article2/0,2817,2408707,00.asp. 
  64. Jeff Barr (August 21, 2012). "Amazon Glacier: Archival Storage for One Penny Per GB Per Month". AWS Blog. https://aws.amazon.com/blogs/aws/amazon-glacier-offsite-archival-storage-for-one-penny-per-gb-per-month/. 
  65. Pinola, Melanie (November 8, 2013). "How to Use Amazon Glacier as a Dirt Cheap Backup Solution". LifeHacker. http://lifehacker.com/how-to-use-amazon-glacier-as-a-dirt-cheap-backup-solut-1460814873. 
  66. Fisher, John (February 15, 2015). "Super Cheap Data Backups with Amazon Glacier Storage". https://spin.atomicobject.com/2015/02/15/cheap-long-term-backup-amazon-glacier-storage/. 
  67. "Non-print legal deposit: FAQs". British Library. http://www.bl.uk/catalogues/search/non-print_legal_deposit.html. "As of 6 April 2013, legal deposit also covers material published digitally and online, so that the Legal Deposit Libraries can provide a national archive of the UK's non-print published material, such as websites, blogs, e-journals and CD-ROMs." 
  68. "Guidance on the Legal Deposit Libraries (Non-Print Works) Regulations 2013". UK Department for Culture, Media & Sport. April 5, 2013. https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/182339/NPLD_Guidance_April_2013.pdf. 
  69. O'Leary, M. (2013). The digital public library of America opens its doors. Information Today, 30(7), 20-21.
  70. Hölzle, Urs. "A second spring of cleaning". googleblog.blogspot.com. http://googleblog.blogspot.com/2013/03/a-second-spring-of-cleaning.html. 
  71. "Google Reader". Archive Team. http://archiveteam.org/index.php?title=Google_Reader. 
  72. "Google Reader/War room". Archive Team. http://archiveteam.org/index.php?title=Google_Reader/War_room. 
  73. "RFC 7089 - HTTP Framework for Time-Based Access to Resource States -- Memento". https://datatracker.ietf.org/doc/html/rfc7089. 
  74. "ISO 28500:2017". https://www.iso.org/standard/68004.html.