Data Management, Preservation, Curation, & Repositories: IL 2011 Session Summary

First session after lunch! Very excited for this session. It is by Christie Peters and Anita R. Dryden from the University of Houston; Susan Chesley Perry from the University of California, Santa Cruz; and William Gunn from Mendeley. This session was about data curation and management (and we finally heard the term archivist!). Allons-y!

Assessing Data Management Needs at the University of Houston
by Christie Peters and Anita R. Dryden

For more information, they have a forthcoming article in Science & Technology Libraries.

Started a needs assessment of data management needs in summer 2010 at the University of Houston. Lots of librarians feel that "data" is a four-letter word. Many librarians feel that there is a lack of support, qualified personnel, and infrastructure, as well as a lack of trust from faculty. You need good relationships in order to get buy-in from people.

They got buy-in due to the NSF (National Science Foundation) Data Management Plan requirements for data storage that were implemented in 2011. Also, the University of Houston wants to attain Tier One status. These two initiatives/requirements helped get buy-in for the data management needs assessment project.

Worked with other units on campus. Worried about infringing on other units’ territories. Contacted the Division of Research and found out they were overworked and understaffed and eager to work with the library on the assessment pilot study. Got a list of NSF and NIH (National Institutes of Health) grant-funded projects for the pilot study. Criteria for inclusion in the pilot study: large grant, individual or group projects, and a cross-section of disciplines. Targeted 9 NSF and 5 NIH grants and interviewed 7 NSF PIs (Principal Investigators) and 3 NIH PIs. Fairly good response. Interviewed others as well, because often the PI does not manage the data.

There are many toolkits for assessing your digital curation needs, including the DCC/JISC Data Asset Framework and the Purdue Data Curation Profile.

For the University of Houston’s pilot study, they did face-to-face interviews: met with subjects, provided a paper copy of the interview instrument, did not record interviews, and compiled responses for analysis. Asked questions about project information, data lifecycle/workflows, data characteristics, data management, data organization, and data use. Graduate students were most often responsible for the data management.

Results: researchers were not looking for server space or data storage. They did need assistance with funding agency DMP requirements, the grant proposal process, finding data-related services on campus, publication support, and targeted research assistance in the area of data management.

Next steps: plan to expand the project by establishing a data working group and expanding the assessment. Also, try to get everyone together to see who is providing what support services.

The Great Wave: Extending Current Curation Practices to Data
by Susan Chesley Perry from UCSC

Susan Chesley Perry also works with the University Archives. [Yay! Finally someone is talking with the archivists]

Developing strategies to preserve data sets, both small and large. The digital humanities lack the funding that the sciences have. Many faculty are worried only about their grant projects, not about the future preservation and use of their data sets.

It would be great to hire a data librarian, but that is not possible with the current budget. So, UCSC must leverage existing staff and services. Luckily, the UC campuses have the California Digital Library (CDL) and the Online Archive of California (OAC). The DMP Tool helps PIs curate their data. Another of CDL’s services is Merritt, which helps with ingesting data and digital objects for archiving.

Looking to adapt online archiving policies for faculty to use. Need to get faculty to use standards for metadata and file formats, or at least use naming conventions for their filenames.

Looking at crowdsourcing the cataloging and transcription for the collections. UCSC will be doing this for the Grateful Dead Archives.

Embedding Institutional Deposit into the Scholarly Workflow
by William Gunn of Mendeley

Gunn gave an overview of Mendeley. Mendeley has had around 120 million documents deposited in less than three years. Interface design is important to the success of depositing materials. Mendeley has a freemium model. Currently working on a pilot project.

Takeaway
Make it easy for faculty to curate and archive their data sets. Don’t forget to include archivists in this conversation–they are doing a lot of data curation and preservation work, too.