Fun Stuff to Share on a Friday

Happy Friday! I’m so excited that it is almost the weekend, even if the weekend is supposed to be gloomy and rainy. So today, dear readers, we are going to throw away any pretense of this being a blog post of great substance on the issues of the day in library and archives land. Instead, this post is to get you ready for the weekend, armed with fun (and helpful) tidbits and goodies to share with whomever you cross paths. Yes, it is a classically random Friday post. Did you expect anything less?

First, something very important and good to share with your family, friends, and library patrons: Give them Lifehacker’s article on how to give to Japanese recovery efforts without getting scammed. Good cause, better if your money actually makes it to those who need it.

If somehow you missed it on Twitter this morning, the videos from Personal Digital Archiving Conference 2011 are now available on the Internet Archive. There are many interesting talks and I highly suggest listening to them, especially if anything I wrote while summarizing them last month didn’t make sense.

In exciting library publishing news, College & Research Libraries is going fully open access. To this I say, yay and it’s about time. Also, could we please make the UI better? It looks a bit wonky in Chrome.

Speaking of open access journals, I have to give props to Evidence Based Library and Information Practice which has always been (so far as I know) open access. Plus it publishes some great research articles, although I may be a bit biased as this journal published my first article and I am on their evidence summaries team. But really, it’s a great journal and the latest issue just came out, so go take a look.

This is something fun to share with your friends (and get a wee bit competitive about who does better on the test, if that’s your thing): The Cambridge Face Memory Test. Take the test to see how good you really are at facial recognition. (This one’s for the people I know who say they are excellent at faces, but quite bad with names. Let me know how you do.)

Lifehacker Night School has done it again. Check out the latest course on Digital Painting 101.

And finally, if you have people coming over in the near-ish future and it is as gloomy where you live as it is in the Bay Area, consider making Chipotle Black Bean Pizza. It sounds yummy and warming, plus who doesn’t like pizza?

Have a lovely weekend, filled with fun, relaxation, and reading. I plan on it. I’ll be back next week with actual thoughts on libraries, archives, tech, and other randomness. Allons-y!

PDA 2011 Closing Keynote

Closing Keynote by Rudy Rucker, Sr. Let’s get to the summary!

Lifebox Immortality

Science fiction dream of achieving immortality via personal digital archiving. But, we don’t understand how brains store information. It’s not practical to tag everything yourself; you need ways of automating tagging and metadata creation.

Wrote a book called The Lifebox, the Seashell, and the Soul. His day job was as a computer science instructor at San Jose State (he is retired now).

Lifebox is an idea of a personal digital archive that is “really good” and that you can search easily. It’s not hard to search your lifebox if you are a writer like Rucker and uploaded a lot of information on your own website and created a custom Google search engine for your site.

In human conversation, people do not answer your questions directly. There is an actual conversation. But you could create a chat bot copy of yourself in a lifebox. What is missing is the creativity of the person in these stand-ins. So you don’t really achieve immortality.

Most people aren’t writers. Rucker says you should write like you talk. You could also tell a story instead of writing the story of your life. This is already reality via speech recognition software. Still missing “the spark.”

Suggested reading: On Intelligence by Jeff Hawkins (about neural networks).

Not hard to get chat bots as long as you get people to upload enough data. (That’s always the problem, isn’t it? People have to exert effort which is a hard sell.)

Take away: Easy enough to create a chat bot, but much more difficult to recreate “the spark” or the creativity of humanity. There are many approaches to personal archiving, and there may never be a standardized way of archiving when making “a copy of yourself.”

Forensics, Privacy, Security

Last session before closing keynote. On to the summaries of forensics, privacy, and security!

Questions posed: What is the proper boundary between public and private data? How far should archivists go in collecting what might be private data?

Session moderated by Elizabeth Churchill (Yahoo! Research)

Archival Applications of Digital Forensics Tools and Techniques or Why I started reading Forensic Cop Journal
Kam Woods (University of North Carolina)

Parallels between forensics and archiving: case files and archival packaging; exploit private data to support criminal prosecution and identify private and legally-encumbered data to redact/protect, etc.

Acquisition
Archives increasingly have to deal with ingesting heterogeneous fixed and removable media. Need to ensure reliable data extraction and reduce hidden risk. Need to know what you’re given. Want to establish “ground truth” about what is on the media. Looking at residual and system data.

Handle issues of privacy via forensics formats such as cryptographic hashing and unique identifiers. Working with bulk_extractor tool to process data with proactive detection and decompression, stream processing during disk imaging, and parallelized processing. Also using fiwalk: creates DFXML from disk images using SleuthKit and creates Dublin Core metadata for files, and file level hashing. Tools, APIs, etc. available at afflib.org.
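To make the file-level hashing idea a bit more concrete, here is a minimal Python sketch using only the standard library. It is not how bulk_extractor or fiwalk work internally, and it reads an ordinary folder rather than a disk image; the function names `hash_file` and `hash_tree` are made up for illustration.

```python
import hashlib
from pathlib import Path


def hash_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each file's path (relative to root) to its SHA-256 digest.

    Hypothetical helper: a stand-in for the kind of per-file hash
    manifest a forensic tool might record for an accession.
    """
    root = Path(root)
    return {
        str(p.relative_to(root)): hash_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Two files with the same digest almost certainly have identical contents, and re-running `hash_tree` later and comparing the results is one simple way to check whether anything changed after acquisition.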

Developing other projects mentioned in earlier session: BitCurator and Realistic corpora for archival education and training.

The Personal in the Organizational: Value and Ethics
Sam Meister (University of Maryland)

Discussion of the issues and implications of personal data embedded in the records of failed businesses. Framing the talk as privacy as an ethical matter. Part of larger research endeavor.

Sherwood is a restructuring firm and offers private bankruptcy option. Sherwood becomes new owners of the company and winds the company down (managed liquidation). Therefore it also has a lot of private data and records.

Records of start-ups are messy. No records management. “These small organizations are like big people”: both are a bit messy and disorganized. Lots of personal, private data goes to Sherwood when Sherwood is given company.

No comprehensive legislation when it comes to preserving personal data. Therefore it is an ethical issue. Can look at Codes of Ethics: Privacy statements from Society of American Archivists and International Council on Archives.

Strategies/Questions
Looking at selection and appraisal: How can we collect these records? Difficult to know how much personal information is located in the records and where it is in the files. Options: redaction or not collecting employee records; each method has downsides.
Access: How do we give access to the records? How do we keep the private data private?

Transition of private records to public. There is complexity of rights and ethical questions about access and privacy (or the tension between the two). All about maintaining trust.

Take away: Many issues to think about in regards to forensics, privacy, and access. More questions than answers at this point, but definitely learned more about valuable projects furthering our understanding and practical ability to deal with the records coming to our archives.

Personal Health Data Panel

Notes on Personal Health Data Panel. Allons-y!

Panelists: Dave Marvit (Fujitsu Laboratories of America), Gordon Bell (Microsoft Research), Linda Branagan (Telemedicine Products, Medweb), and Khaled Hassounah (MedHelp)

The Quantitative Self aka Quantified Self (QS)
Gordon Bell

Started MyLifeBits over 10 years ago. One use of SenseCam: capturing health data. Challenges: privacy and entrenched, structured growth industry. Scanned everything he had in regards to health documents. You can record and keep lots of personal health data digitally now. Need to do wellness monitoring. Recommends a pedometer.

Bringing Personal Health Archiving to the Masses
Khaled Hassounah

MedHelp is the largest online health community (12 million users monthly), leading provider of PHRs and health applications, over 300 active condition specific communities and forums, partnerships with leading medical institutions, over 200 experts responding to users’ questions, and live chats with medical experts.

Community is very important to MedHelp. Three years ago decided to do Personal Health Records (PHR), have patients involved. Built it and no one came.

Why didn’t they come?
People are more interested in managing their health or a medical condition, records and archives are not relevant to most of the population most of the time, and users wanted to share, but privacy is selectively necessary. People want to decide what they want to share and what they want to keep private. To build community, you need to be able to share.

Need to give people something that is relevant to them, right now. Give them a tool they can use now. For example, Birth of a Tracker: Ovulation/Fertility Tracker. Very popular tool and other communities wanted trackers too. Key: need to have instant relevancy and benefits for the people in order for it to be popular. Other trackers created: Mood Tracker, Sleep Tracker, Pain Tracker, etc. MedHelp has over 50 trackers now.

Make it really easy for people to decide whether they want to make the tracker information public or private. Make it obvious for people instead of hiding options (yay!). 85% of the people choose to make their trackers public.

What We Learned:
Have to focus on the activity, records and archives are foreign concepts, sharing is important and privacy should be an option, and the question that is most important to people is “Am I normal?”

Health Data
Linda Branagan

Electronic Medical Records (EMR) are maintained by your doctors (aka your chart). Governed by HIPAA and stored by healthcare provider.

Personal Health Records (PHR)

  1. Type 1: Patient owned and operated. Online record of interactions with all your healthcare providers. Not covered by HIPAA. Example is Google Health.
  2. Type 2: Tethered PHRs (aka “patient window”). Probably not details of your physician encounter, still stored and maintained by healthcare providers, you may have a separate one for each provider, and may be provided by your insurance company. HIPAA applies.
  3. Type 3: PHRs. Self-collected data store, created by you, often stored on vendor websites, might incorporate or access via a Type 1 PHR, and some home health monitoring devices will deliver data to a physician’s EMR.

PHRs are not universally embraced by healthcare providers. Worry about correctness, liability, and usefulness. Not universally rejected either because: PHR-using patients are more likely to be participative, engaged, and compliant to treatment, may help avoid duplicate diagnostic tests, can help coordination during a complex episode of care (ex. having difficult diagnosis or if you have a chronic condition and an acute disease/illness), and can assist family/friend advocates.

Take away: First, this was not really a panel at all. This session consisted of three separate presentations and no interactions among the panelists. Interesting information about how people create and use personal health data online. I would have liked to hear discussion among the panelists and have a dialogue with the audience.

Teaching, Professional Development & Theory

First session after lunch. Time to talk about teaching and how it relates to personal digital archiving. Let’s get into the nitty gritty.

Digital Forensic Training
Cal Lee (University of North Carolina)
Forensication: The incorporation of digital forensics methods, tools, and concepts in contexts other than criminal investigations.

Forensication of Archives: recover data when technology fails, capturing evidence from places that are not always immediately visible, ensuring that actions don’t make irreversible changes, attending to order of volatility, documenting what we do, so others will know what we might have changed, taking advantage of the information associated with files to ensure that users of the files understand their context of creation.

Collecting institutions are getting removable media and want to collect the online traces of individuals. The digital forensics field provides training and tools, though primarily focused on law enforcement.

Example: School of Information at UNC Chapel Hill
Created lab for learning about the application of digital forensics to the acquisition of digital materials. Check out digitalcorpora.org

Want to build the capacity at UNC and translate industry models and techniques to the archival world. Lots of questions to answer about: how to apply the tools, how much adaptation is required, and what software is most useful. Also looking at ethics of access: can’t be avoided because users can exploit forensic methods, even if we don’t.

Vision: widespread incorporation of forensic methods into routine processing of archival materials. BitCurator–a modular software environment that implements various batch processes on bitstreams to support two contexts: established forensic programs at institutions and those institutions/individuals getting started with digital forensics.

Personal Digital Archiving, the Diminishing Information Age, and the Archival Paradigm
Richard Cox (University of Pittsburgh)

Big picture context type of discussion. We are so immersed in technology that we are not listening to each other. Need to see how projects connect to each other and what are the practical applications.

People are losing confidence about being able to access their content: information is diminishing. Also, this is why people are interested in personal digital archiving. Worried about losing information with transition to online/digital way of doing things. Cox’s example: ebooks as “ghost books” versus the physical book.

Problems: libraries closing, losing browsability, end of slow reading, students don’t know how to read and think critically, disappearing bookstores, declining newspaper sales and end of journals, worried about authority of news online, and library and information schools changing/transitioning to iSchools.

Archival paradigm needs to change to have archivists become enablers of others to be able to curate/archive their own data (personal archiving). People are worried about losing their data.

We need to think more deeply and broadly about digital archives and collaborate with each other.

Archival Sense-making: Personal Digital Archiving as an Iteration
Mark Matienzo (Yale University Library) and Amelia Abreu (University of Washington)

Frame personal digital archiving within the context of appraisal and archivalization, examine the contexts of archival sensemaking and identity creation.

Archival sensemaking is a situated action, and archivalization is a conscious or unconscious decision process about whether something is worth archiving. Sensemaking is a theoretical guideline for the analyses of this study. They have taken sensemaking from other disciplines, drawing heavily on Brenda Dervin’s work.

How does sensemaking take place in personal digital archiving?
Collecting as meaningful negotiation. Also looking at context. Looking at archival genres (influenced by Derrida): collections and spaces where you can dwell on text and create new materials.

While sensemaking may be a promising framework in archival research, there are limits to using sensemaking as a theoretical framework. (This is true of most imported theories, but it is great that these researchers are explicitly documenting the limitations.)

Take away: Library and information science education is changing and should change. We need to collaborate more and break down the silos among our projects. Theories from allied fields may be imported for archival research successfully, but we must be aware of the limitations. Final thoughts? Interesting things happening at graduate schools and we need to figure out how to share information in a more efficient and meaningful way.

Perspectives from Computer Industry Founders

Session on perspectives from computer industry founders. Fingers getting tired from typing, but we will carry on. To the notes!

Ted Nelson (Xanadu)

Considers himself the only dissenter in the computer industry. Started Xanadu in 1960. It was easy to create your own computer world in 1960 because no constraints. Worst problem now is the myth of technology. Most of what people consider technology are constructs and conventions.

Talking about lack of marginalia in digital documents. (I find this personally hilarious because Collin and I were talking about this issue on the way to the conference this morning. And we talked about how you can still do marginalia digitally and will hopefully be able to do more when things such as NoteSlate come out.) Nelson is talking about his idea for creating documents that have connections to show marginalia.

Need to represent connections. (Totally agree. Life is about connections because we are social creatures.)

Scholars Building a Personal Archive for Scholarly Use
Ed Feigenbaum (Stanford University)

Talking about SALT: Self Archiving Legacy Toolkit
Self= Probably means Professors Emeriti, especially those with archives worth preserving for scholarly use + DIY with only a little help from professional librarians
Toolkit=webpage formats and software to facilitate DIY

SALT’s JANUS Approach: two faces, looks outward to give access to researchers and students of today and years from now and other face uses Zotero to facilitate the work of scholars doing their archive building and enrichment

The two faces talk to each other on a regular basis. Need to sync between Stanford Digital Repository and Zotero cloud servers.

SALTworks is the name of the experimental system at Stanford. It supports full text search over the entire Feigenbaum digital archive. It contains 15,000 documents. It has users already, even though it is still experimental.

Learnings from a life’s work: The Doug Engelbart Archives
Christina Engelbart (Doug Engelbart Institute)

About her father’s archives. Doug Engelbart started research lab and created computer software, the computer mouse, and more. Definitely a computer pioneer. Came up with lots of innovations and terminology.

Saved a lot of materials for the archives. Lots of archiving happened in real-time because archiving function was built into their computer programs.

Then, re-archiving by placing the information on the web. First website was created in 1995. Had already given a lot of documents to Stanford beginning in the 1980s. The material is housed in many different places online: Stanford, Computer History Museum, and the Internet Archive.

Lots of work always to do. Connecting technology to the vision is very important.

Take away: I must be distracted because I’m hungry for lunch as I don’t have an overarching take away from this session. Basically, think about what you are doing and how you might archive it…eventually. Back with more after lunch.

User Studies and User Behavior

Next up: User studies: careful observation of archival practices reveals some surprising things about user behavior. To the session notes: Allons-y!

CTRL-S is Poor Archival Practice
Devin Becker (University of Idaho) and Collier Nogues (University of California, Irvine)

Did a study of writers via an online, open-ended survey; about 100 people responded. Writers serve as a sort of focusing agent for the field: increased value assigned to digital files by writers themselves and by archival community. 75% of the respondents were poets, 77% had published one or more books.

Why is this an important issue? Because people don’t save earlier drafts of their works. So you can’t see earlier drafts/versions like you can in, say, the Ernest Hemingway Collection at the JFK Library.

53% claimed to save over their files primarily, but only about 20% always did this. 35% saved drafts all in one file. 9% only saved drafts as printouts and have only one digital file.

Only 8% work exclusively digitally; most work in both paper and digital formats. Many have very strong views about what points of their workflow they use digital versus physical to do their work. “There is really no feeling of management whatsoever when it comes down to it.” People save things everywhere (not surprising to archivists).

Only 7% admitted to never backing up their files. Over 70% said they back up at least monthly. However, this backup is not always done really well. Most back up on external hard drives.

Implications
Benign neglect does seem to be these writers’ basic curatorial mode. People have a fear that electronic files all look alike unlike manuscript drafts. Anxiety about confusing files because they look the same. Writers are more anxious about the management of their files than they are about losing their files.

Recommendations for Archivists
Don’t meddle too much with writers’ files
Meddle a little: 80% would be interested in receiving information about recommended digital archiving practices
Propose: Writers’ Digital Preservation Awareness Week (Why don’t writers just participate in ALA’s Preservation Week? It’s coming up–April 24-30)

File Folders on Computers in Personal Digital Archiving
Hong Zhang (University of Illinois)

Talking about filing systems people use on their computers, can be seen as organic archives created by people. More hard drives coming to archives with lots of digital files. How do we decipher these files?

Methodology: multiple case studies with 12 participants, two rounds of interviews, disk scan, re-finding tasks observations (part of Zhang’s dissertation work)

How do people archive their files?
Explicitly indicate archives folders via folder names, for example: “archive”
Implicitly indicate archives folders, for example by dating folders
Keeping the original structure when archiving because used to the structure and no motivation to change it when archiving because won’t be using the information again
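As a rough illustration of the explicit versus implicit distinction, here is a small Python sketch that sorts folder names into those two groups. The rules are my own assumptions, not Zhang’s method: a name containing “archive” counts as explicit, and a name containing a four-digit year counts as implicit. The function name `classify_folder` is hypothetical.

```python
import re

# Assumed heuristic: a standalone four-digit year (19xx or 20xx)
# in a folder name hints that the folder was dated when set aside.
YEAR = re.compile(r"(?<!\d)(19|20)\d{2}(?!\d)")


def classify_folder(name):
    """Label a folder name as 'explicit', 'implicit', or 'unmarked'."""
    if "archive" in name.lower():
        return "explicit"  # the name itself says it is an archive
    if YEAR.search(name):
        return "implicit"  # dated folder, no archive label
    return "unmarked"      # nothing in the name suggests archiving
```

For example, `classify_folder("Archive")` gives `"explicit"` and `classify_folder("2009-taxes")` gives `"implicit"`. Folders that keep their original structure would show up as `"unmarked"`, which is exactly why names alone are not enough when appraising a hard drive.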

Relationships among files may be complicated and important or almost non-existent. This is an important idea to remember when trying to appraise, process, and archive personal digital collections.

Gmail is a Storyworld
Jason Zalinger (Rensselaer Polytechnic Institute)

We are all digital storytellers, historians, curators, etc. of our own lives. We are very good at capturing personal data, but we are not good at helping people make sense of it all. We are not good at encouraging people to explore their archives for self-reflection. When Gmail changes the interface, it changes your storyworld. Thousands of clues to our life stories are sitting in our archives.

User study: conducted six interviews, 3 male, 3 female, highly educated, ages 27-39, 3 via audio recording, 3 via IM, asked about their archives and about stories.

Findings and Design Recommendations

  • A Label Named “Forget”: everything a person wants to forget, but wants to archive. Design Recommendation: Forget & Remember labels built into Gmail. Pop-up message years later to read message and see if want to delete
  • Digital Regret: send emails that you regret later. Design Recommendation: Gmail has the “Undo” send button already. Gmail’s Mail Goggles makes you solve math problems in order to send emails (aka friends don’t let friends email drunk). Wants “Sleep On It”: sends email to your archive and then pop-up lets you re-read your email before sending it the next morning.
  • Characters: Conversation View (email threaded conversations). Design Recommendations: Storyfox would format your conversation thread to a Google Doc formatted to look like a screenplay or as a comic strip (Geomic)
  • How do you know what is meaningful? Design Recommendation: Gmail Meaning Labeler (crowdsourcing)
  • Design Recommendation from Interviewees: word clouds for email

Note: design recommendations are at the conceptual stage and Zalinger hasn’t created them.
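Since the recommendations are only concepts, here is a small Python sketch of the counting step that would sit underneath the interviewees’ word cloud idea. It is my own illustration and has nothing to do with Gmail’s actual features or API: the `messages` input is assumed to be a plain list of email body strings, and the stopword list and the `word_frequencies` name are made up for this example.

```python
import re
from collections import Counter

# Tiny, assumed stopword list; a real tool would need a much longer one.
STOPWORDS = {"the", "and", "for", "you", "are", "was", "that", "this", "with"}


def word_frequencies(messages, top_n=10):
    """Return the top_n (word, count) pairs across all message bodies."""
    counts = Counter()
    for body in messages:
        words = re.findall(r"[a-z']+", body.lower())
        # Skip very short words and stopwords so the cloud shows content words.
        counts.update(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return counts.most_common(top_n)
```

A word cloud would then draw each word at a size proportional to its count. Drawing is left out here because it needs a graphics library.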

Cognitively Motivated Lifelog Software
Aiden Doherty, Cathal Gurrin, Alan F. Smeaton (CLARITY: Centre for Sensor Web Technologies, Dublin City University)

People have talked about personal life archives for years. People have taken this further and created weird technologies to capture their life. However, the researchers use wearable sensors: SenseCam is a Microsoft Research Prototype, now the Vicon Revue: contains a camera and various sensors, GPS, Bluetooth and takes about 5,500 photos per day. Researchers have their own smartphone App: integrates all sensors, can connect to external capture devices, and uploads to a server in real-time.

What is an e-memory archive?
“We use sensors to capture and understand life activities.” The sensors capture lots of information. (That’s a lot of data to mine.) Don’t record audio because people stopped talking to the researchers. 4.5 years = around 7 million photos.

In one year: 12,500 events or moments, 20 million accelerometer and temperature and compass readings, 2.3 million GPS points, 25,000 unique Bluetooth encounters (wow!)

Want to build search engines for these e-memory archives because visuals are powerful memory clues. Great for remembering different parts of one’s life. Make search engines based on cognitive science. Biomimicry of how human mind stores and organizes memory to model for the search engines. (wow, again) Can determine unique events and moments out of the mundane and then finally display in the browser.

Applying 12 years of video/image search experiences showed many different axes of retrieval for information. Designed initial browser 4 years ago, larger images are more important, and some search functionalities. Then designed a new browser with more flexible search options. Newer browser is much better at finding events, but still at 2 minutes for retrieval. Need to think of new ways to tackle challenge of efficient and fast searching.

Take away: Users are idiosyncratic in their use and creation of digital files. This is not surprising, but kind of sad, for archivists–it means a lot of work to decipher the information when it comes to the archives. (Yay for job security, though.) Lots of data being created and need ways to search and display it visually. Very interesting session, especially the information about lifelog search engines.

Images: Capture and Collection

First morning session: “Images: Capture and Collection.” On to the session notes!

What is everyone doing with all these cheap cameras?
Daniel Reetz (DIY Book Scanner)

Created own book scanner and shared instructions online: diybookscanner.org. Lots of people are using these scanners for great projects all over the world. Cheap cameras can change the world, can liberate information and help others share information with others.

Cheap cameras are very cheap. The cheaper a camera is, the harder it is to control. Cameras define our aesthetics. Photographs have become the basis of our memories. Aging of photographs as aging of memory.

Cheap cameras are everywhere. The most common camera in the world is in your computer mouse. Color in digital images is calibrated to what is most liked by people, not by math. It is what sells that defines the color settings in the camera. People like saturated photos and sharper images. (Interesting and scary at the same time) Lots of processing done within the camera before you ever get the image into Photoshop.

We can’t trust photos like we might like (this is not a new idea). “Consumer preference undermines control.” Technology is affected by desire and fantasy. People don’t want to show reality–they want to show idealized world.

Need to construct tools that help us determine how reliable the images are that we have in the archives. Lots of potential for use of cheap cameras and digital photos. Need to show people how to do more with their cameras.

The Center for Home Movies Digitization and Access Summit
Dwight Swanson (Center for Home Movies)

Talking about Summit at Library of Congress Packard Campus in September of 2010 (46 attendees). Problem addressing: limitations on access to home movies have resulted in limitations to our understanding and use of them. Want a way to easily find home movies online and way to upload/access home movies online.

Where are home movies online now?
YouTube, Internet Archive, regional film archives, and film transfer companies. Center for Home Movies has an arrangement with Internet Archive for their home movies. Regional film archives have historically been the most active in collecting and providing access to home movies, but have been restricted by budget.

Challenges
What would we need to do and spend in order to implement a mass digitization and web portal project involving home movies and video from both public and private collections–getting them online for free public access?
What impact would the availability of these collections have on their use and analyses?

Summit Topics (can go here to download final report)

  1. Taxonomy of home movies. LoC wanted a taxonomy: definitions, genres and tropes
  2. Cataloging and description: metadata structures and management as well as crowdsourced tagging. Coming up with list of terms and fields needed to be included when describing home movies
  3. Legal issues: documents created for terms of use, privacy, takedown policies and had discussion of rights issues of orphan films
  4. Technical issues: comparison of film digitization systems, recommended technical standards, different workflow scenarios
  5. Use and users: scholarly users (why do home movies matter?) and commercial users (who are the people using home movies and what are they looking for?)
  6. The Film Collectors’ Community: perceptions of value of home movies due to companies such as eBay and engagement with collectors

Lingering questions from the Summit:
Who would be the primary users of a home movie portal?
What could it do that YouTube and the Internet Archive can’t already do?
What types of media do we want to deal with?
What is the relationship between preservation and access?
What form should the project take?

Archiving Space: Capturing personal and shared spaces with explorable gigapixel imagery
Rich Gibson (Gigapan Project)

“The world is the set where we live our lives.”

Gigapan allows users to upload photos and pan/zoom throughout the panoramic images. Software stitches the images together.

Spaces are changed and images allow us to see these changes. Many programs allow us to explore these changes and spaces online.

“Explorable gigapixel images change the way we see.” (I think that photographs in general change the way we see. Taking photographs definitely changes the way we see, the way we compose our lives, and the way we constrain our world through the viewfinder.)

Gigapixel allows for different ways of curating art exhibits and displaying art. It can also be seen as a way of “archiving” transient, ephemeral activities and exhibits.

(You should check out the website–lots of very cool images to explore. I could see spending a lot of time on the site.)

Take away: Images are important to our memories, our lives, and our identities. We need to think critically about how we interact with these visual images and the people that care about the images. We should also empower people to capture images and to think critically about their own visual record of their lives. (I love photography so this was a very interesting talk to me, personally. Also, if you want a wonderful book that will have you thinking about many of the issues brought up in this session in greater depth, check out Susan Sontag’s Regarding the Pain of Others.)

Day 2 Keynote: Clifford Lynch

Happy Friday! It’s Day 2 of the Personal Digital Archiving Conference 2011. Time for the morning keynote by Clifford Lynch from the Coalition for Networked Information (CNI).

Talking about some of the key issues on Lynch’s mind. We are moving into a second-generation understanding of personal archives, and we can see tensions around this evolution. By the mid-90s, we had realized there was a revolution in personal archiving. We were taking ideas from personal archiving in the physical space into the digital. We saw problems with saving digital files on bad media, were concerned with loss of information, esp. via drafting documents online (who keeps different versions of drafts?), and worried about ephemeral correspondence. But everything was extrapolated from the ideas of personal papers.

Now we are seeing a problem because there is a shared space online: material that is shared by groups and made public in a limited sense, such as within contained social media networks. How do we relate this to personal archives in the earlier sense?

Everything is being shared online and we find that the shared versions have more value because of added comments. We also face a problem of ownership. Example: family archives. These need a collective decision-making process. They are not “pure” individual archives. This can lead to confusion.

Lots of emphasis on what happens to your stuff after you die and on honoring the interests of the individuals. But the passage of digital objects becomes much less clear when they are in shared spaces. It is a collective issue.

Implications:
Changes in decision making: collective.
Shared spaces are a vulnerable platform. We’ll see more abrupt shutdowns of online spaces in the coming years.
Digital records are very vulnerable when individuals change jobs.
Platform migrations of all kinds in social settings are periods of peril/vulnerability for continuity of material. We need to think carefully about this issue.
Need to think about the length of relationships individuals have with a social platform. (very interesting point with emerging technologies)
How do these relationships with social platforms relate to length of relationships individuals have with memory institutions and archives? Need to figure this out.

Large scale of social media systems: LoC archiving Twitter. Need to have arrangements to preserve this massive amount of public information. We don’t understand this relationship in any complex way. Need to be thoughtful and understand these relationships and how to create these relationships to save this digital information.

There is a notion of public lives, and a sense that some minimum record of information about an individual is held by many. We’ve built many systems to record and manage this type of information over the years. These are becoming much more open, connected, and extensive now. For example, look at the scale of online genealogy. There is a lot of movement to make information more transparent and more public. Need to think about how public, online social spaces interconnect with ideas of identity and societal relationships.

Question: What is a public part of a life? Do we have consensus? Not really.

What are actions that people can take that can become permanently public? How does this connect with public social spaces?

Many questions about how the individual and his/her information relate to the social setting and issues of public and private.

If we simply extrapolate the challenge from personal papers and shoehorn the development of the shared social spaces into this historic view, we will miss a tremendous amount of the complexity and issues (and potential solutions).

Take away: Personal digital archiving must be seen in a socially connected manner and we need to ask the difficult questions of how the individual relates to the social public spaces and their wishes about how their data is connected and viewed. Wonderful speaker, great ideas, fabulous talk, and a great start to Day 2!

PDA 2011: Day 1 Closing Keynote

Brian W. Fitzpatrick (The Data Liberation Front)

The Data Liberation Front wants to get people to think about how to get their data out of the cloud. The Data Liberation Front started in 2007. It’s very important that people have control over their data and that it is easy for them to take their data with them.

There are business benefits to making it easy for users to take out their data. You get users’ trust by doing something good for the users. It’s not altruism, but a long-term strategy.

Choice: It is easier than ever for users to choose your product and to leave your product.

Trust: You need to get the users’ trust in order to get their business for the long-term.

“Lock-in is not a business model.” It’s not good for users to not have control over their data. The Internet breaks all the distribution rules: distribution costs almost nothing. Now you get lock-in through innovation. Need to make the product so good that the user doesn’t want (or need) to go anywhere else.

Most users don’t think about data liberation until the moment they want to leave.

Three questions to ask:
Can I get my data out?
How much is it going to cost to get my data out?
How much of my time is it going to take to get my data out?

Need a big download button to batch download your data. But there are issues: conversion issues, huge downloads, proprietary formats, and the largest issue: businesses that still try to lock people in.

APIs are only the first step. Many users can’t use APIs. There is still a lot of work to do with data liberation. Want to make it even easier.

Take away: Data liberation for the win! Spread the word, and share with your family, friends, and library/archives users the three questions to ask before giving a company your data.