SCA: Going Digital: Less Process, More Content

Happy Saturday! Time for session notes from the first talk of the day. Let’s talk about digital materials and processing. Allons-y!

Moderator: Leilani Marshall (Sourisseau Academy for State and Local History, SJSU)

Paula Jabloner (Computer History Museum)
Russell Rader (Hoover Institution Archives, Stanford University)
Lisa Miller (Hoover Institution Archives, Stanford University)

Topic: Ways in which we can apply “More product, less process” in digital realm. Speakers will be sharing two case studies.

Lisa Miller (Hoover Institution Archives)
Hoover is well-funded compared to most, but still lacks some tools for digital processing and preservation. Showed a wish list: digital repository system, dedicated IT staff, computer programmer, DAM system, tools for METS, PREMIS, etc.

Available resources: PC and Mac computers, floppy disks, server space, some staff time, and an eager researcher for the Katayev Collection (2007). This spurred the archives to create basic processing procedures. Mostly had Web 1.0 files (Word files, non-interactive media, etc.). Researchers just wanted content and were not as concerned with authenticity issues (diplomatics) as archivists are.

Basic steps:

  1. Find computer media: looking for media in collections via finding aids and catalog. No standard way of indexing media, so serendipity plays a role in finding them.
  2. Get files off the disks. Scan for viruses.
  3. Use checksums for file integrity (MD5 checksums). Can also be used to de-dup the collection. Verify checksums when moving or duplicating the files.
  4. Preserve files with unaltered bits and with author’s filenames intact. Sometimes convert to target formats (.txt, PDF, PDF/A, delimited text for spreadsheets and databases). Try to do as much batch processing as possible. Add a prefix to the filename to mark converted files.
  5. Centralize files in one place on a server. Verify checksums regularly and do backups on tape. Manually initiate checksum verification each month.
  6. Document work with a “Read Me” text file. (Nice idea.) It explains the processing steps as unstructured metadata.
  7. Use creator’s semantic folder system. Researchers can use the files at the Hoover Institution. The files are not available online because of copyright issues, etc.
  8. Describe the aggregate in the finding aid. Put information on finding aid on OAC, even if just a stump record without the rest of the collection having been processed. Description based on creator’s file structure and naming convention. Still trying to figure out meaningful ways to describe the content and extent.
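
The checksum work in steps 3 and 5 (integrity, de-duplication, and periodic verification) could be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not the tooling Hoover actually used; the function names and the in-memory manifest layout are assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_duplicates(manifest):
    """Group paths that share a checksum (candidates for de-duplication)."""
    by_digest = defaultdict(list)
    for rel_path, digest in manifest.items():
        by_digest[digest].append(rel_path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


def verify(root, manifest):
    """Return paths that are missing or whose checksum no longer matches."""
    root = Path(root)
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or md5_of(target) != expected:
            problems.append(rel_path)
    return problems
```

In this sketch the manifest would be built once when the files come off the disks, saved alongside the “Read Me” file, and passed to `verify` after every move or copy and at each monthly check. An empty list from `verify` means nothing has changed.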


Challenges:

  • Viruses: stop doing anything with the file.
  • Unformatted disks
  • File extensions are lacking
  • Filenames don’t have any meaning. A particular problem with digital camera photos.
  • Corrupted files.
  • Character and coding problems, especially with data from other countries.
  • Scalability of: unstructured metadata in read me files, workflows for hundreds of media items, Web 2.0 formats (complex formats)
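
Two of the problems above, missing file extensions and character encoding trouble, lend themselves to a small illustration. The sketch below is a hedged example rather than anything described in the talk: it guesses a format from a handful of well-known leading byte signatures, and it lists every candidate decoding of a text file so a person can pick the readable one. The signature table is deliberately tiny, and the encoding list (UTF-8, Windows-1251, KOI8-R) is an assumption chosen because the collection discussed is Russian.

```python
# A few well-known leading byte signatures; real tools know many more.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy Word/Excel)"),
    (b"PK\x03\x04", "ZIP container"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"{\\rtf", "Rich Text Format"),
]


def sniff_format(header):
    """Guess a format label from the first bytes of a file, or return None."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None


def candidate_decodings(data, encodings=("utf-8", "cp1251", "koi8_r")):
    """Return {encoding: text} for every encoding that decodes without error.

    Single-byte encodings accept almost any input, so a successful decode
    is not proof of correctness; a human still has to read the results.
    """
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            continue
    return results
```

For a file with no extension, `sniff_format(open(path, "rb").read(16))` gives a first guess. For a text file that displays as gibberish, comparing the strings returned by `candidate_decodings` usually makes the right encoding obvious.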

Ending thoughts: not an ideal process, but files are recovered and can be used by researchers. “Preservation is for five years or forever, whichever comes first.” In the future, they want to make this part of the regular collection processing workflow, create truly compliant PDF/A files, establish a quarantine station, find digital tools to facilitate/expand the workflow, and optimize file delivery for researchers.

Russell Rader (Hoover Institution Archives)
Digital projects exceed our reach and Rader posits that we stopped asking the right questions. (What are the right questions?) Also believes that archivists are still afraid of “the digital.”

Talked about keeping workflows simple, which is a good idea. Using open source and free programs and tools is also a good idea. Archivists need to learn more technology.

Paula Jabloner: Welcome to Nerdvana
Started at Computer History Museum in 2004 and needed to get stuff online. Over 80,000 records online. Museum has a “get it done” attitude. Everyone was for online access because if it isn’t online it’s a “black hole.” Concentrated on the doable, not the perfect: one catalog for all artifact types (physical objects, software, A/V, and digital files) and a simple and seamless online experience (so you get an easy search process, but it may not be exhaustive or authoritative). The result is broad-based access, not an interpretive catalog.

Idea behind quick and dirty processing is to make it available ASAP. Put a lot of trust in the audience, because the audience is highly technical. Expect the audience to understand the content of the records. Also, used a lot of volunteers and interns for creating the catalog.

Implementing MPLP: two-year processing experiment. One full-time processing archivist supervising interns and volunteers. 12,500 folder-level records created by the end. Stripped-down metadata entry: set it up so almost everything could be entered automatically. Could duplicate records to speed up processing too.

Finding aids available on website and OAC. No open hours at museum; everything is by appointment for research use of materials. Finding aids are very stripped down. Not a lot of context given in the finding aids and you get minimal access. Always a trade-off between the speed of processing and description on one hand and a contextual finding aid with many access points on the other. 70% of collection now available online via catalog records.

Success: 16 finding aids online (entire archival collection in catalog), 32,000 searchable catalog records, 575,000 page views for a year, and 450,000 catalog page views in a year. However, records can be confusing, searching could be more user-friendly, too many databases to manage, etc.

Take Home Message
Processing and preservation of digital materials is difficult. You can speed up processing, but will lose extensive metadata creation and some ability to scale the process (for example, scaling text “Read Me” files). I’m conflicted about MPLP: I want more stuff online and available, but don’t think that there will be time to go back and reprocess, so will this minimal processing and metadata creation be a detriment in the future? Or does it not really matter as “digital preservation is for five years or forever, whichever comes first”?

SCA: Virtual Worlds in Archival Settings

Time for the session notes from the afternoon session on Virtual Worlds in Archival Settings, moderated by Mattie Taormina (Stanford University). Allons-y!

Speaker list:
Henry Lowood (Stanford University)
Bob Ketner (Manager of The Tech Virtual, Tech Museum of Innovation)
Pamela Jackson (Information Literacy Librarian, San Diego State University)

Most of panelists will be talking about projects done in Second Life. Examples: using Second Life for exhibits after the exhibit is closed in physical realm. Using Second Life for reference and integrating all Web 2.0/social media feeds in Second Life.

Pamela Jackson: San Diego State University in Second Life
Public services perspective: outreach to students and teaching instructors to teach in Second Life. Ran from 2007 through 2009, focusing on faculty. Faculty thought it was a lot of work and didn’t embrace the technology. Received an Information Literacy Grant Project for 2008-2009 to create a library. Created online tutorials and linked to help from reference librarians in the physical library, but there were not enough students to justify having a librarian in Second Life.

In 2009, bought an island (Aztlan Island) and shifted focus to students so students could explore 3D environments. Created a few landmarks that map to buildings and landscape features of campus. Worked with a 3D modeling class and imported models into Second Life. Senior students in Art and Design create virtual exhibits for the University Art Gallery. One student created a studio for machinima film. Also used by educational technology students for a summer class.


  • If you build it, will they come? Mostly middle-aged women in Second Life instead of college students. May be able to get students to come in with cool stuff.
  • Staff time and expertise: someone needs to be supported to manage the Second Life stuff
  • Technology Requirements: need higher-end computers, admin rights, etc.
  • Digital “ownership”: need to own your stuff so that it does not disappear, given the ephemeral nature of virtual worlds
  • Transferability: need to be able to transfer your content to other virtual worlds; can also transfer skills between platforms

Bob Ketner: Tech Museum of Innovation: Virtual Worlds: Archive of the Imagination
Tech Virtual: virtual prototyping space for museum exhibits, use it for a collaborative space, test interactive exhibits
Used because: great tools, rapid speed of visualization, diversity of input from experts

Roots of virtual worlds in “Augmenting Human Intellect” by Doug Engelbart. So longer history than most people think.

Teens transformed an entire gallery space (not part of a formal class). Created exhibits about microchips and technology, also created interactive exhibit. Have weekly design meetings/sessions.

Questions: Can you archive a virtual “place”? What to do with models if move away from Second Life? Can you archive a zeitgeist (spirit of a time)? (thinking points for the audience)

Bruce Damer is working on archiving virtual worlds.

Henry Lowood (Stanford University) Life Squared: Archiving the Virtual Archive
Dante Hotel (now the Hotel Europa): first example of a site-based art installation, now recreated in Second Life. Lynn Hershman created the art installation. Archives has documents and photographs from Hershman. Integrated documents in Second Life model.

Used actual floorplan of hotel for Second Life hotel and created hotel, incorporated documents and photographs to create an immersive experience. Created “meta-archive.” Lowood showed a video of the hotel tour in Second Life. Also use space to show films and other art exhibits.

Worked on a project, Preserving Virtual Worlds, on issues of preservation metadata, encoding standards, selection, etc. Second Life was probably the most negative aspect of the project for preservation. Linden Lab does not assert copyright over what users create in Second Life, which is very progressive, but makes preservation difficult because you need to obtain permission from each user to archive stuff and many users are anonymous (you only know the individual’s avatar).

Lowood teaches archival courses at SJSU SLIS. Over half of his students in a class said we shouldn’t archive virtual worlds/Twitter. (Interesting) Some are resistant to having their creations move into an archives.

My question: Can you get usage statistics off of Second Life?
Can get some usage statistics of exhibits and galleries in Second Life, but it’s not automatic (except for number of avatars landing on the island). But can create counters, see number of unique visitors, figure out what they touch, can also figure out how much time avatars are spending in the archives, etc. (pretty cool metrics)

Take Home Message
It is a ton of work to create stuff in Second Life and it is very time-consuming and difficult to preserve the created virtual world. I’m still not sold on investing in creating archives in Second Life. I am glad to hear you can get metrics out of Second Life, though. Let’s hear it for using evidence-based practice for evaluation and assessment of all projects through using metrics.

SCA Session: Taking Our Pulse: The OCLC Research Survey of Special Collections and Archives

Next up: Taking Our Pulse: The OCLC Research Survey of Special Collections and Archives. Allons-y!

Talks by David Zeidberg (Huntington Library), Tom Hyry (UCLA) and Mary Morganti (California Historical Society)

Overview of Survey Results (the full report is available as a PDF)

  • Collections size is growing
  • Use is increasing
  • Backlogs continue to grow
  • Staffing is stable
  • 75% of libraries have had budget cuts

275 Libraries surveyed, 61% response rate
Wanted diversity of special collections and archives represented, but academic archives were most heavily represented in respondents.

ARL collection growth since 1998: Archives/manuscripts: 50% growth (average)
Special collections in remote storage: 67% of respondents use remote storage

Use of archival materials is increasing, which is cool. Many archives provide access to uncataloged/unprocessed materials (we do or we wouldn’t be able to let people see anything!). In 87% of the special collections reading rooms, you can use digital cameras.

So access is increasing and archivists and special collections librarians are getting better about being flexible for giving access to collections.

50% of archival materials are available via online catalogs
Backlog is decreasing with implementation of “More product, less process”
Need cataloging and metadata processes that are scalable

Archival management
40% of archival finding aids are online
34% of respondents are using Archivists’ Toolkit

One of the great challenges for archives: we can never do enough.
52% have an active program of digitization
38% have completed large-scale digitization of special collections (systematic reproduction of entire collections using streamlined production methods that account for special needs)

Born-digital Materials
Undercollected, undercounted, undermanaged, unpreserved, and inaccessible.
Need to do more with the born-digital materials; most people need more training
Funding named as biggest challenge of managing born-digital materials

Mary Morganti (CHS)
Small staff and lots of different materials (museum materials and archival materials)
Can solve everything with creativity, time and money! (very true)
Space is a huge issue for many organizations. Talking about lack of space for storing collections (also environmentally controlled storage)
CHS are looking at “right sizing” the collection storage in the correct boxes. (We’re doing this with our collections, too! It’s amazing the kind of shelf space you can regain)
Uses Archivists’ Toolkit (very cool) and contributes to the OAC (Online Archive of California)
Her concerns: metadata discovery, access, decreasing backlogs, funding

David Zeidberg (Huntington Library)
Thinking about the issues philosophically. We all continue to collect faster than we can catalog. Collection development and access to collections (decreasing the backlogs) should be the top priorities (they are at the Huntington). Two schools of thought of collection development: take everything lest it be lost; take only those collections that can be processed in a reasonable period of time to put in hands of researchers. Need to remember ethical responsibility to donor to process the collection. Take material that can be used= need to be more selective in acquisition. Need to do field appraisal before saying you will take the collection.

Reaction to low level of formalized collection development reported in OCLC survey: haven’t seemed to work or be sustainable. Practical alternative: update and share collection development policies with one another. Then we can see who is collecting in particular areas. Need to behave ethically, always.

Tom Hyry (UCLA Special Collections)
Despair over increasing A/V materials, ’cause we weren’t that good at these before, backlogs are growing, and budgets have been cut.
Hope over using streamlined processes and getting more materials online.
At UCLA, reading room is too small as usage has gone up. UCLA is collecting aggressively.

Trends in research libraries: selection is changing, budgets have shrunk, approval processes for purchasing and cataloging departments have changed, and there are questions about how to support emerging fields (e.g. digital humanities).

Growth areas in research libraries: digital libraries; teaching and outreach; growth of special collections and prominence of special collections. Opportunities for special collections to capitalize on interest in special collections: example, using catalogers with language skills and training them in archival cataloging.

Sees born-digital materials as an opportunity, as they may allow us to serve our users better. Can serve the materials over networks (don’t have to digitize them). Argues that appraisal is more important now than ever.

Take Home Message
Interesting data and results. Tip for presenters: if you are going to go over a lot of statistics, either go slower so people can take notes and process the information (and give less of it) or make sure to tell people (up front) where you will make your slides available online. Acquisitions and backlogs are important issues facing the profession. “Always behave ethically” is a motto to archive by, and if you remember this point, you’ll do well in your archival work.

SCA Friday Plenary: David E. Hoffman

Happy Friday! First up at Society of California Archivists’ Annual General Meeting: David E. Hoffman, “Inside the Kremlin: Unraveling the papers of Vitaly Katayev and Soviet thinking during the latter stages of the Cold War.”

Talked about how he used an archives for the research for his book, The Dead Hand: The Untold Story of the Cold War Arms Race and Its Dangerous Legacy, which won the 2010 Pulitzer Prize for General Non-fiction. Tried to write a book from his archival research and his own experiences.

Parallels he saw: Cold War Symmetry with shape of bombs, space shuttles, etc. in the thinking and engineering.

Asymmetry after the Cold War was in understanding and making sense of the Cold War. In the United States= triumphant versus in Russia= introspective and reflective, not triumphant

Challenge: how to write a history that reflects both sides, and also getting access to archives (lots of stuff still not open). Very difficult to get into archives in Russia.

Goal: To tell the Cold War story from both sides and try to figure out what was going on in the Soviet system

Discovered papers of Vitaly Katayev (former professional staff member, Defense Department, Central Committee, 1974-1991). 10 boxes of papers acquired by Hoover Institution Library and Archives prior to 2001. These papers are very important as most information from the Kremlin is not available. (Katayev died in 2001) In November 2004, Hoffman received a Hoover media fellowship and used time to search collection of Katayev papers. No finding aid, not processed collection. Hoffman was a third of the way through his book research when he began with Katayev’s papers.

Found two inventories: one in Russian and one in English. The one in English referenced 79 floppy disks, but only a few floppy disks were found in the collection. The collection was quite raw: not processed and no finding aid.

Found insights into Soviet thinking never seen before through the records of Katayev (many bound volumes). Katayev wrote a manuscript on the reactions of the Kremlin to Reagan’s announcement of SDI (aka Star Wars). Ideas for tons of missiles, a Soviet Star Wars program, etc. Very detailed notes and technical details on spreadsheets. A treasure trove of information, not on Kremlin gossip, but on important technical tests and meeting decisions. This collection allowed Hoffman to see into Soviet thinking during the Cold War in a way that wasn’t possible before Katayev’s papers.

Hoffman did an index of nine boxes in 2005 and did a survey with Pavel Podvig in 2006. (Lucky archivists to get the help to index the collection!)

The papers have allowed Hoffman, with the help of scientists, to piece together insight about the actual capacity of the Soviet Union in terms of accuracy of missiles and other technologies. A big mystery was how much Gorbachev knew about the Soviet biological weapons program. United States discovered that the Soviets were not following the agreement to not create chemical and biological weapons.

In 2007, when the box in Katayev’s collection could finally be opened, Hoffman found documents chronicling the Kremlin’s decisions on biological weapons. They show that decisions under Gorbachev began in 1986. Up until this was revealed, we didn’t know who knew in the Soviet Union and when. The Central Committee resolutions mean, according to Hoffman, that Gorbachev must have known about the biological weapons program.

Katayev took very good notes (was the official note-taker at many meetings). Because of his great notes, Hoffman has been able to piece together a lot of new insights on the Soviet weapons programs.

In August 2007, Hoffman met Ksenia Kostrova (26), the granddaughter of Katayev. She was very close to Katayev and became the custodian of his records after his death. Discovered in the apartment: family photos, additional documents, 79 floppy disks, and a memoir. From August to December, Kostrova made a master index and Hoffman photographed all of the paper documents. Copied all the disks and sent the entire collection via FTP to Washington immediately. Was able to read 40 of the disks. (Talk about a find for a historian: it’s amazing!) But had problems reading the files. Kostrova recalled the procedure she had used to open the files when she was 11 years old, which helped decode them. 19 more of the floppy disks had recoverable data, recovered by a UK specialist.

Didn’t have to use Russian official archives. Much of the collection is still raw with many documents to be examined. No official Russian government reaction to Hoffman’s book. Hopefully Katayev’s memoir will be published sometime next year.

Take Home Message: Archives are exciting and you can find information that is unique and incredibly important for the understanding of past events and reactions to these events. Many times, you have to do a lot of work and sort through a lot of documents, but it is worth it in the end. Nice work, Mr. Hoffman.