HathiTrust Research Center Data Capsule v1.0
Non-Consumptive Analysis of the HathiTrust Repository Using the Data Capsule VM: Overview of Functionality

September 10th 2014 @ 12:00pm
Room E174

Beth Plale, Professor at the School of Informatics and Computing and Director of the Data to Insight Center
Data to Insight Center

Miao Chen, Research Associate
Data to Insight Center

Robert McDonald, Associate Dean for Library Technologies
Library Technologies

The first mode of access that the community of digital humanities and informatics researchers and educators will have to the copyrighted content of the HathiTrust digital repository is through extracted statistical and aggregated information about the copyrighted texts. But can the HathiTrust Research Center also support scientific research in which researchers carry out their own analysis and extract their own information?


This question is the focus of a 3-year, $606,000 grant from the Alfred P. Sloan Foundation (Plale, Prakash 2011-2014), which has resulted in a novel experimental framework that permits analytical investigation of a corpus but prohibits data from leaving the capsule. The HTRC Data Capsule is both a system architecture and a set of policies that enable computational investigation over the protected content of the HT digital repository, carried out and controlled directly by a researcher. It leverages the foundational security principles of the Data Capsules of A. Prakash of the University of Michigan, which allow privileged access to sensitive data while restricting the channels through which that data can be released.


Ongoing work extends the HTRC Data Capsule to give researchers more compute power at their fingertips. The new thrust, HT-DC Cloud, extends existing security guarantees and features to allow researchers to carry out compute-heavy tasks, like LDA topic modeling, on large-scale compute resources.


HTRC Data Capsule works by giving a researcher their own virtual machine that runs within the HTRC domain. The researcher can configure the VM as they would their own desktop with their own tools. After they are done, the VM switches into a "secure" mode, where network and other data channels are restricted in exchange for access to the data being protected. Results are emailed to the user.
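The two-mode trade-off described above can be illustrated with a small Python sketch. This is a toy model only, not HTRC's implementation: the class, the method names, and the "setup" label for the configuration phase are all hypothetical, and the email channel is represented by a simple list.

```python
from enum import Enum


class Mode(Enum):
    SETUP = "setup"    # hypothetical name for the configuration phase
    SECURE = "secure"  # the "secure" mode named in the abstract


class CapsuleError(Exception):
    """Raised when an operation is not permitted in the current mode."""


class DataCapsule:
    """Toy model of the two-mode trade-off (all names are illustrative)."""

    def __init__(self):
        self.mode = Mode.SETUP
        self.outbox = []  # stands in for results emailed to the researcher

    def switch_to_secure(self):
        self.mode = Mode.SECURE

    def use_network(self):
        # Network access is available only while configuring the VM.
        if self.mode is Mode.SECURE:
            raise CapsuleError("network is restricted in secure mode")
        return "network ok"

    def read_protected_text(self, volume_id):
        # Protected content is readable only once channels are restricted.
        if self.mode is not Mode.SECURE:
            raise CapsuleError("protected data is only readable in secure mode")
        return f"<text of {volume_id}>"

    def release_results(self, results):
        # The only exit channel in this sketch: queue results for email.
        self.outbox.append(results)


capsule = DataCapsule()
capsule.use_network()                 # allowed: install tools, configure
capsule.switch_to_secure()
text = capsule.read_protected_text("vol-001")
capsule.release_results({"word_count": len(text.split())})
```

The point of the sketch is that no single mode offers both open network access and access to the protected texts; results leave only through the designated release channel.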


In this talk we discuss the motivations for the HTRC Data Capsule and its successes and challenges. HTRC Data Capsule runs at Indiana University.

See more at http://d2i.indiana.edu/non-consumptive-research


This lecture is part of the ongoing Digital Library Brown Bag Series. Follow and contribute to the presentations and discussions on Twitter: #dlbb.
Fall 2014 Digital Library Brown Bag Schedule
Programs will be held from 12:00 pm to 1:00 pm Eastern Time in the Herman B Wells Library in Rooms E174 and E159 (Hazelbaker Hall in the Scholars' Commons).
Remote Access to the Brown Bag
This semester's Digital Library Brown Bag series will be available for remote access via the Web, unless otherwise specified. Anyone may log in; you do not need to be an IU affiliate.
Presentation slides and audio will be available via the Adobe Connect Meeting Service. Go to http://connect.iu.edu/diglib to view and listen to the presentation. If you are not a registered user for Connect Meeting/Breeze, select the "Enter as a Guest" option.
Sign up for email reminders! Send an email to iulist@iulist.indiana.edu with the message body: sub dl-brownbag-l Your Full Name

Contact Info

Wells Library W501
1320 East Tenth Street
Indiana University
Bloomington, IN 47405
Michelle Dalmau - Head, Digital Collections Services, Associate Librarian
(812) 855-1261