Last session. Let’s talk money!
Jeff Ubois (PrestoCentre)
Issues: Predictability, Boundaries (commercial and non-commercial), Institutions/Individuals
We will always produce more than we have the resources to save.
Numeric was a study in Europe looking at scanning costs. Film is much, much more expensive to ingest than text pages. (Need to get study) Local contractor (price to store a box): $200 to $700. Storage costs vary widely.
NIH collects gene sequencing information and is going to deaccession data because the price of sequencing is dropping much more quickly than the cost of storage space.
Millions of dollars have been spent creating complex cost models for archives. There is a gap between project-based funding and the need to keep data in perpetuity. Real estimates: Princeton: $5,000/TB (100x media cost), PrestoPRIME: 40x the cost of raw media. Basis for "buy a brick": endow a TB (interesting idea).
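The multipliers quoted above can be turned into rough per-terabyte figures. The sketch below is only an illustration of the arithmetic: the function names are hypothetical, and applying the media price implied by the Princeton figure to the PrestoPRIME multiplier is an assumption made here for comparison, not something stated in the session.

```python
def implied_media_cost(endowment_per_tb, multiplier):
    """Raw media cost per TB implied by an endowment figure and its multiplier."""
    return endowment_per_tb / multiplier


def endowment_per_tb(media_cost_per_tb, multiplier):
    """Endowment per TB when preservation costs `multiplier` times the raw media."""
    return media_cost_per_tb * multiplier


# Princeton: $5,000/TB at 100x media cost implies the raw media price.
media_cost = implied_media_cost(5000, 100)

# Assumption: reuse that media price with the PrestoPRIME 40x multiplier.
presto_estimate = endowment_per_tb(media_cost, 40)

print(f"Implied media cost: ${media_cost:,.0f}/TB")
print(f"40x estimate at that media cost: ${presto_estimate:,.0f}/TB")
```

Even with the same raw media price, the two multipliers give per-terabyte figures that differ by a factor of 2.5, which shows how sensitive a "buy a brick" price is to the cost model behind it.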
Roles for Commerce: Ingest scales well, cataloging & indexing, but what are the long term promises? Partnerships: huge commercial uncertainties can mean harsh terms.
Digitizing archives is a way to engage the public and bridge individuals and institutions. Lots of room for collaborations.
Paying for long-term storage
David S. H. Rosenthal (LOCKSS Program at Stanford)
Rent (Amazon), Monetize the content (Gmail selling adverts on accesses to it), Endow the data (sufficient capital up-front to pay for preservation)
Digital preservation is vulnerable to interruption of the revenue stream. An endowment provides a relatively predictable return, but you need to figure out how much will be needed in the future. The endowment model also rests on two assumptions: that storage is the major cost of preservation, and that Kryder’s Law (storage costs go down exponentially) will continue for at least another decade. If either fails, the endowment business model may not work. Storage costs go down, but associated costs are not going down as much (e.g., cooling, space, power).
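To see why the endowment estimate depends so heavily on Kryder’s Law, here is a minimal sketch that sums discounted yearly storage costs. It is not the model presented in the session: the function name, the rates, and the 100-year horizon are all hypothetical values chosen for illustration, and it deliberately ignores the associated costs (cooling, space, power, staff) mentioned above.

```python
def endowment_needed(initial_annual_cost, kryder_rate, interest_rate, years):
    """Present value of a stream of yearly storage costs.

    initial_annual_cost: cost of storing the data in the first year
    kryder_rate: fractional yearly drop in storage cost (0.30 = 30% cheaper each year)
    interest_rate: yearly return earned on the endowment
    years: planning horizon in years
    """
    total = 0.0
    for year in range(years):
        # Storage cost shrinks each year by the Kryder rate.
        cost = initial_annual_cost * (1 - kryder_rate) ** year
        # Discount that future cost back to today's money.
        total += cost / (1 + interest_rate) ** year
    return total


# Hypothetical inputs: $100 in year one, 3% return, 100-year horizon.
fast_decline = endowment_needed(100, 0.30, 0.03, 100)
slow_decline = endowment_needed(100, 0.10, 0.03, 100)

print(f"Endowment if costs fall 30%/year: ${fast_decline:,.0f}")
print(f"Endowment if costs fall 10%/year: ${slow_decline:,.0f}")
```

Under these assumed numbers, slowing the yearly cost decline from 30% to 10% more than doubles the endowment required, which is the sensitivity the talk warns about.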
Why Kryder’s Law Might Not Hold:
Desktop PC market is going away, next drive technology transition problematic, solid state disks.
You will get what you paid for: pay now, get service later (no leverage if service not delivered), need escrow service, if service fails, transfer data to successor.
All this means that estimating the endowment needed is very difficult. There is also a marketing problem if you have to tell people that perpetual preservation of digital data requires 70x the cost of storing the raw data. So, once again, we have the issue of figuring out how to get a reliable revenue stream for archives.
Cost of hardware = 20% of the total cost. A lot of the cost goes to people’s salaries. Luckily, increasing storage does not mean the same increase in the number of people at the Internet Archive. If costs go down, expectations go up.
What helps us is that a petabyte is a lot of storage space. “So people may be running out of stuff.” And Kahle believes that preservation must be done in a non-commercial way. Non-profits last a lot longer than many corporations.
“Love the Data”
Preserve the data in a way that people care about and make it so people can get to the data easily. “Access drives preservation.” Dark archives are not a good idea. (Out of sight, out of mind.)
Three issues at the Internet Archive
Perception of Rights Issue
Therapy (ego stuff)
How much does it cost to digitize a box of stuff? $100-$750 per box, because there is a lot of variation in the type of stuff in boxes. You find a lot of random things in boxes. Video costs about $15 per hour to digitize and film about $300 per program hour. Books and microfilm = $0.10/page to digitize.
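As a rough illustration of how these rates combine, the sketch below totals a digitization estimate for a mixed collection. The function name and the sample collection are hypothetical; the rates are the ones quoted above, and the per-box range is carried through as a low and high bound.

```python
# Rates quoted in the session (US dollars).
VIDEO_PER_HOUR = 15.0
FILM_PER_PROGRAM_HOUR = 300.0
PER_PAGE = 0.10            # books and microfilm
BOX_LOW, BOX_HIGH = 100.0, 750.0


def digitization_estimate(video_hours=0, film_hours=0, pages=0, boxes=0):
    """Return a (low, high) cost range for digitizing a mixed collection."""
    fixed = (
        video_hours * VIDEO_PER_HOUR
        + film_hours * FILM_PER_PROGRAM_HOUR
        + pages * PER_PAGE
    )
    # Boxes vary widely in content, so they contribute a range, not a point.
    return fixed + boxes * BOX_LOW, fixed + boxes * BOX_HIGH


# Hypothetical collection: 10 video hours, 2 film hours, 1,000 pages, 3 boxes.
low, high = digitization_estimate(video_hours=10, film_hours=2, pages=1000, boxes=3)
print(f"Estimated cost: ${low:,.0f} to ${high:,.0f}")
```

In this made-up example the unsorted boxes account for most of the spread in the estimate, which matches the point that box contents are the hardest thing to price.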
Born digital: Have upload button on Internet Archive website, then they back-up and add metadata.
Costs $1-$2 million to start up scanning/ingesting a new type of media in order to build relationships, get hardware, adapt software, etc.
Really want to start digital archives project for individuals, working with personal archives. New avenue for the Internet Archive.
Perspectives on Funding
Steve Griffin (Library of Congress/National Science Foundation)
Need to ask whether research funding is keeping up with the way research is now happening. May need to change funding models. Need effort by scholarly researchers to get federal funding agencies to change models so they work for today’s scholars.
Takeaway: It is very difficult to estimate the costs of long-term digital preservation, and it costs a lot, so we have a marketing problem when soliciting funding. But we need funding, so we’ve got to figure this out. Also, economies of scale are very important, and if you give people an easy way to upload their data (à la Internet Archive), people will upload a ton of stuff. So let’s stay positive and make the changes in funding structures that will allow us to preserve our digital data for the long term.