This lecture was presented at the 3D Digital Documentation Summit held July 10-12, 2012 at the Presidio, San Francisco, CA.
Archive of Digital Data for HABS, HAER, and HALS
The NPS creates a variety of documents and records, such as inventory and monitoring plans, drawings, photographs, and conservation treatment records, to assist in the planning, management and preservation of cultural resources. Most of these, including many 3D digital documentation products, are permanent records under the NPS Records Retention schedule, requiring the NPS to preserve them in some form. In addition, under NPS Director’s Order 19, cultural resource management records are mission critical, required for the management of the cultural resources within our parks, and must be permanently preserved. Programs such as HABS/HAER/HALS create large amounts of electronic data, such as point clouds, CAD files, and digital field photographs, that constitute valuable field data permitting the verifiability of the final documentation.
Electronic records, particularly those produced by laser scanning and imaging technologies, present long-term preservation and storage challenges. Even technologies that allow for a file format with an open standard, such as a point cloud conversion to ASCII, are still problematic because of inadequate IT infrastructure within the NPS that does not facilitate storage, migration and retrieval of digital data. Moreover, the Library of Congress (LOC), which houses traditional print HABS/HAER/HALS documentation and is the sole repository designated in the National Historic Preservation Act for engineering and architectural documentation produced for Sections 106 and 110 compliance, has collections policies prohibiting proprietary software and the storing of data directed at a limited audience. These policies would prevent the inclusion of many of the products being discussed at this summit.
Despite on-going efforts for several years, resolving these issues has proven problematic. The LOC and HABS/HAER/HALS are jointly exploring born digital equivalents to the large-format film photography that is currently required to meet Secretary of Interior Standards, but the lack of standards within the commercial photography community, as well as the high cost of large-format digital capture and storage, makes writing standards difficult. Likewise, the lack of industry standards for technologies such as laser scanning, and the reliance on proprietary software and file formats, discourages the LOC from accepting these files into its collections. Because of this, HABS/HAER/HALS uses laser scanning as a tool to create traditional print drawings on vellum or mylar that can be permanently preserved at LOC, rather than producing laser scans as an end product in and of itself. HABS/HAER/HALS also is consulting with the National Archives and Records Administration (NARA) to determine if some of these file types can be preaccessioned into the Electronic Records Archives (ERA). Currently the file formats that can be preaccessioned are extremely limited, but we hope that NARA can accommodate more in the future. With no other public repository for these files, NPS has few alternatives but to maintain its own digital records and confront the technological and financial challenges this presents. NPS has no IT preservation system in place to prevent the gradual decay of storage media over time and the corruption of electronic files, also known as bit rot. Creating a digital storage system modeled on the Open Archival Information System (OAIS), which runs file integrity checks to guard against data loss, would require a significant investment in infrastructure and money at a time when the NPS is facing a multi-billion dollar maintenance backlog for its historic structures.
In sum, 3D digital documentation can produce some exciting products that were not previously possible, but we must recognize that the challenges associated with digital preservation put all of these products at risk unless we find solutions that permit their responsible and economical curation and preservation.
Cordell: Thank you guys. Our next speaker is Anne Mason. She’s the Collections Manager for HABS/HAER/HALS in Washington DC. She began working for the Park Service in 2001 at the National Register of Historic Places and the National Historic Landmark programs as their Digital Library Production Manager, and she oversaw the digitization of the National Register archives in that position. She has served as Collections Manager for HABS/HAER/HALS since 2006, working closely with the Library of Congress to preserve and manage the collection. She’s been an advocate for thoughtful digital preservation within the National Park Service. Anne, welcome.
Mason: Thank you. Good morning. So HDP, like any other federal agency, is governed by a number of federal laws and regulations, and Paul and Dana went over a few of these. I’m not going to rehash these here, but I did want to make the point that these laws emphasize our mission to produce documentation that’s preserved for future use and reference, and that these laws also affirm our relationship with the Library of Congress as the sole repository for the documentation that we produce.
I think that Dana and Paul did a really great job of explaining the Secretary of Interior Standards, so I won’t go over those again but because our collection is deposited at the Library, we have to abide by their collection policies. The record copy is the physical analogue copy and this is what the Library strives to preserve long term. We transmit digital files as well, each quarter along with the physical records but the digital files are representations of the physical copies and their primary purpose is for access.
The HDP online collection gets approximately 50,000 visitors a month viewing approximately 800,000 pages. I think this speaks to the power of digital access when you compare it to the period prior to digitization, when there were about 3,000 visitors a year to the physical collection at the Library. The Library, like many archival institutions, places restrictions on the type of digital files they will accept. I have to make a quick disclaimer. I don’t work for the Library; I am our liaison to the Library, so these are kind of my opinions of their collection policies that they’ve conveyed to us.
The Library has a lot of hesitancy about proprietary file formats. Back in the 1980s and 1990s, they, along with many other archival institutions, made significant investments in DjVu and got burnt pretty badly and ended up wasting a lot of money on that file format. Also, the Library has a policy that everything in the collection must serve a wide audience, and so highly specialized data that’s useful or understandable to only a small number of people is not appropriate for their collections.
HDP also has a record group at the National Archives where all the other permanent records that we produce are stored and deposited. The Federal Records Act gives the National Archives authority to establish record retention schedules that all federal agencies are required to follow. The Electronic Records Archives, or ERA, launched in 2011, and NARA has some very specific file format requirements for materials that can be accessioned into the ERA. As you can see, it’s actually a pretty small set of non-proprietary file formats. They do accept ASCII, so we are working with them to find out if we can transmit our point clouds to them as ASCII.
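As a rough illustration of what converting a point cloud to ASCII involves, the sketch below writes each point out as a plain-text line of x, y, z coordinates. The binary layout (three little-endian 32-bit floats per point) and the function name `binary_points_to_ascii` are assumptions made for this example only; real scanner formats vary and are often proprietary, and nothing here reflects NARA's actual transfer requirements.

```python
import struct

# Hypothetical layout for this example: x, y, z as little-endian float32.
POINT = struct.Struct("<fff")


def binary_points_to_ascii(data: bytes, precision: int = 3) -> str:
    """Convert packed XYZ points into one 'x y z' text line per point."""
    if len(data) % POINT.size != 0:
        raise ValueError("data length is not a whole number of points")
    lines = []
    for x, y, z in POINT.iter_unpack(data):
        lines.append(f"{x:.{precision}f} {y:.{precision}f} {z:.{precision}f}\n")
    return "".join(lines)


if __name__ == "__main__":
    # Two sample points packed in the assumed binary layout.
    sample = POINT.pack(1.0, 2.5, -3.25) + POINT.pack(0.0, 0.125, 10.0)
    print(binary_points_to_ascii(sample), end="")
```

The text output can be read by any software that handles plain text, which is the appeal of a non-proprietary format, though it is typically much larger than the binary original and drops anything the conversion does not carry across, such as intensity or color values.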
Director’s Order Number 19 reaffirms these laws. In 2011, the National Park Service worked with NARA to rewrite the records schedule, which changed the schedule for the entire NPS. HDP, as I said, has its own record group at NARA, and I am in the process of working with NARA to rewrite that records schedule. The Director’s Order states that resource management records are mission critical and all are permanent, so we have to follow NARA regulations with regard to electronic records, and the permanent records should all eventually end up in the National Archives. Any electronic records should be converted into an acceptable format for NARA, whether that’s an analog conversion, which is what we generally do when we convert them to drawings, or whether you take an electronic format and convert it into a non-proprietary electronic format.
So technological obsolescence cycles are very short, oftentimes three to five years, and if you don’t take care of your files, you end up with error messages such as this. HDP has been examining digital photography. We still require film negatives and this is one of the reasons why. This film negative on the right is from our collection and was taken in the 1930s. This was scanned recently and you can see it’s still in really great condition seventy-eight years after capture. All the storage media on the left, you know, they’re machine dependent, and the file formats of the image data that’s stored on them are all obsolete, so good luck getting your data off of that. The Secretary of Interior Standards have a 500-year permanency standard. The analog materials that we produce meet that 500-year permanency standard. I think digital is still a big unknown. One of the studies by Stanford University Library concluded that digital storage reliability would have to increase by a factor of one billion to have a 50% chance of files being usable after just 100 years, and when our permanency standard is 500 years, digital still has a long way to go to meet that standard.
Lack of standards for creation and storage of digital objects is one of the reasons for this unreliability. This conference is primarily about 3-D data, but I did want to just talk briefly about our efforts looking at born digital photography. HDP has been examining digital photography for the last several years to figure out if it could meet Secretary of Interior Standards. One issue is the lack of durability that I kind of already addressed. Our standard is large format photography, and it’s really only recently that digital photography has been able to come close to meeting the kind of clarity, quality, and amount of information that a large format camera can capture. The camera systems that can do this are still quite expensive. The Phase One camera that’s pictured here retails for about $60,000, so for a lot of photographers out in the field producing our documentation, these cameras would be out of reach because they are so expensive.
So why do we require large format? The level of detail achieved in large format photography can reveal construction details and, in this case, the degradation of a structure that is not captured in smaller formats. Perspective correction using shift lenses is also an essential part of our large format photography practice. You can do perspective correction of a sort after capture in Photoshop, but this ends up distorting the image. Here’s kind of what a normal camera sees looking up at a building; it distorts the building. This is what the perspective correction in a view camera can accomplish. The building is no longer distorted. And this is perspective correction after capture in Photoshop, and you can kind of see it distorts the building. The lines are straight but it’s not an accurate representation of the building.
So HDP is not the only archival institution that is concerned about storage of digital data. The Internet Archive is doing a huge book digitization project, and they are keeping the original books because they believe that only paper is an authoritative and safe copy for the future. The Academy of Motion Picture Arts and Sciences did two studies examining digital preservation, the first in 2007 and the other released just recently in 2012, and their studies make clear the economic burdens of digital preservation. This is the cost to preserve one movie for one year: on film it’s $1,000 a year. That same film stored digitally is about $12,000, and this is pretty much across the board with every economic study I’ve seen. Digital preservation is generally about ten to twelve times more expensive than storing an equivalent analog product. All the major film studios, which have an ongoing economic interest in making money from their films, are converting their films, which are generally shot digitally, onto film, and film is what they’re preserving long term. The 2012 study made a couple of, I think, remarkable observations: that “the archival system for digital materials that meets or exceeds the performance characteristics of traditional archives does not yet exist” and that analog materials made one hundred years ago were more likely to survive than the digital products produced today.
I’ve thrown around the term “digital preservation.” What does that really mean? I think this is a really great definition and I’ll read it real quick. “Digital preservation combines policies, strategies, and actions to ensure access to reformatted and born digital content regardless of the challenges of media failure and technological change,” and the goal of preservation is an accurate rendering of authenticated content over time, and I think that authenticated content is really key. That’s really difficult for an organization like us. Where we are accepting documentation as produced by other people, authenticating that content is very difficult.
Digital objects need to be sustainable to be preserved, and media degradation happens very quickly regardless of the media. Even things that were once touted as being archival, like CDs and DVDs: when they first came out, it was said that these were going to last one hundred years. Well, now that they’ve been out for a while and people are testing them, ten years is probably really the max that those will last. Bit rot or file errors occur regardless of storage media and at alarmingly fast rates.
Digital data has created an economic problem since digital archives require constant maintenance to be preserved through time, driving up labor costs for archival storage at a time when federal agencies and just about everyone else are facing economic challenges and budget cuts. The kind of old archival paradigm of “store and ignore” is not possible for digital data. So on a very basic level, there are three essential components of digital preservation: backing up data, which means having multiple copies on multiple media on multiple occasions; verification of data to protect against bit rot and human error through file validation and checksums; and migration to combat technical obsolescence. And just kind of a quick review of how NPS is doing this. We do a pretty good job backing up generally. NPS stores the majority of our data on LANs and SANs, which are backed up, but I was shocked recently. Last fall I went to a conference specifically for NPS archivists and museum professionals, and I was shocked at how many of those programs and parks are using hard drives to store data because they don’t have adequate storage space or IT infrastructure to store their data, and it was over ninety percent of the people in that room, including HABS and HALS actually. So when you don’t have adequate IT storage space to store your basic data, storing really large 3-D files will be impossible for a lot of Parks.
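The verification step described above can be sketched in a few lines. The example below records a SHA-256 checksum for every file in a folder and later reports any file that has gone missing or whose contents have changed. It is a minimal illustration under stated assumptions, not a description of any NPS system: the function names `sha256_of`, `build_manifest`, and `verify` are hypothetical, and SHA-256 is simply one commonly used checksum algorithm.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Return {relative path: problem} for files that are missing or changed."""
    problems = {}
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            problems[name] = "missing"
        elif sha256_of(target) != expected:
            problems[name] = "checksum mismatch"
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "scan.txt").write_bytes(b"1.0 2.0 3.0\n")
        manifest = build_manifest(root)
        # Simulate silent corruption by changing one character.
        (root / "scan.txt").write_bytes(b"1.0 2.0 3.1\n")
        print(verify(root, manifest))
```

A check like this only detects damage. Repairing it still depends on the first component, having another good copy to restore from, which is why backup and verification are listed together.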
The Department of Interior is in the midst of an IT transformation, and one of the things that originally was supposed to be part of that transformation was a scalable cloud environment, and that would certainly help us with our storage space and size, but right now that transformation is taking a strategic pause and it’s kind of unclear what we’ll end up with anytime in the near future. NPS does have a number of storage systems. NPS Focus is one. This is primarily a digital library, so it’s a lot of images and documents, and really, 3-D data can’t be deposited here. They do have a really great backup system but they do not do file verification and they have no plans for migration of data through time. IRMA is another storage system within NPS, but again, IRMA is basically a storage system. They do a really fabulous job backing up their data. I’ve had some conversations with them regarding migration through time. They don’t have any plans in place, but they said they probably could make a commitment to migrating certain file formats like TIFF and PDF/A through time. They also do not do any verification. The good thing about IRMA is that any NPS employee can load data, and almost any format can be accommodated. They do have a five gigabyte file size limit, and I think we are one of the groups that will really test that size limit; they really didn’t anticipate that anybody would have files as large as that.
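To make the two constraints mentioned for IRMA concrete, here is a small sketch of a check someone might run before depositing a file. Everything in it is an assumption for illustration: the function `deposit_issues` is hypothetical and is not part of IRMA, "five gigabyte" is read as 5 x 10^9 bytes, and file extensions are used as a crude stand-in for the TIFF and PDF/A formats (a `.pdf` extension does not by itself show that a file conforms to PDF/A).

```python
from pathlib import PurePath

# Assumption: the stated five gigabyte limit read as decimal gigabytes.
SIZE_LIMIT_BYTES = 5 * 10**9
# Assumption: extensions standing in for the formats mentioned (TIFF, PDF/A).
MIGRATION_CANDIDATES = {".tif", ".tiff", ".pdf"}


def deposit_issues(filename: str, size_bytes: int) -> list[str]:
    """List reasons a file may be a poor fit for deposit; empty if none found."""
    issues = []
    if size_bytes > SIZE_LIMIT_BYTES:
        issues.append(f"{size_bytes} bytes exceeds the {SIZE_LIMIT_BYTES}-byte limit")
    suffix = PurePath(filename).suffix.lower()
    if suffix not in MIGRATION_CANDIDATES:
        issues.append(f"no migration commitment mentioned for '{suffix}' files")
    return issues


if __name__ == "__main__":
    # A hypothetical 6 GB point cloud fails both checks.
    for problem in deposit_issues("site_scan.pts", 6 * 10**9):
        print(problem)
```

Run on a typical large point cloud, a check like this flags both problems at once, which matches the concern raised here: the files are too big, and they are not in a format anyone has offered to migrate.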
So that is, greatly simplified, digital preservation and kind of the aspects of this preservation. In reality it’s much more complex. The OAIS is an ISO standard that came out in 2003 and it forms the backbone of most good digital preservation systems. I won’t go into too many details here. If you’re curious you can read the standard, but in a nutshell, it ingests files, it validates content, it stores and creates technical, administrative, and descriptive metadata to help preserve those files through time, it performs fixity checks to guard against bit rot and human error, and it finally has a delivery system that delivers content to users via search and retrieval systems. Just another thing to be aware of if you are storing your digital data: I think someone mentioned yesterday the idea of a trustworthy repository. There’s a whole audit and certification process, and even if you don’t get certified as being trustworthy, it’s a really good process to go through to examine your risk and look at the strengths and weaknesses of your storage systems.
So what does all this mean for 3-D documentation? 3-D documentation is very exciting. It’s, I think, made our work flow better for HDP. It’s allowed HDP to create products that we never have before, such as interactive fly throughs that can provide some really great educational content for Parks. But the reality is that digital data is, for right now, ephemeral in nature. NPS does not have a digital preservation system, and building one modeled on OAIS would really be a multibillion dollar investment on a price level, and at a time when NPS is facing a multibillion dollar backlog on our historic structures, building a preservation system like that is not feasible for the Park Service.
So this means we have to look to our partners. The Library of Congress and NARA are currently the partners we work with. The Library of Congress has said they cannot accept proprietary data. They are not comfortable with the 3-D digital data that we’re producing, so that really leaves NARA. As I said, we’re working with NARA on rewriting our HABS/HAER/HALS records schedule. I’ve brought a lot of the 3-D data to their attention. They do accept ASCII files, and while they haven’t explicitly stated to me that they’re going to accept those, they also haven’t stated a reason why they won’t.
HDP has a very long history of collaboration with other partners and this is an option that we may need to look at when we’re archiving our digital data. I’m not sure where that would go or what those partnerships would look like, but we’ve partnered with people a lot before. So really, in order to assure the long term preservation of our data, converting our digital data into analog drawings is our only viable option at this point, and we will continue to do that moving forward in order to meet our obligations required under law.
Oh, and just as a side note for anybody who uses our collection, the Library of Congress redesigned the online website in 2011. The “Built in America” site is still up on the Library of Congress’s website. It is no longer being maintained or updated. This is our current collection, and if you want to view the latest materials, it has a lot more interactive function than the “Built in America” site, so that’s the site you need to go to.
Anne Mason began working for the National Park Service in 2001 at the National Register of Historic Places and National Historic Landmark programs as a Digital Library Production Manager, overseeing the digitization of the National Register archives. She has served as the Collections Manager for HABS/HAER/HALS since 2006, working closely with the Library of Congress to preserve and manage the collection. She has been an advocate for thoughtful digital preservation within NPS.