Topic Exploration with the HTRC Data Capsule for Non-Consumptive
Joint Conference on Digital Libraries 2015 | Knoxville, TN | 06.21.15
Robert H. McDonald | Jiaan Zeng - Data To Insight Center
Jaimie Murdock - InPho Project
Indiana University
HATHI TRUST RESEARCH CENTER
Tweet us - @HathiTrust #HTRC
Tweet us - @InPhoproject
#HTRC @HathiTrust
Tutorial Agenda
• 9:00-9:15 - An overview of the HTRC (Robert McDonald)
• 9:15-9:30 - HTRC Data Capsule Intro (Jiaan Zeng)
• 9:30-9:45 - Intro to Topic Models and the InPho Explorer (Jaimie Murdock)
• 9:45-10:30 - Hands-On Parts 1&2
• 10:30-10:45 - Break
• 10:45-11:30 - Hands-On Parts 3&4
• 11:30-11:45 - Advanced Notebooks (Jaimie Murdock)
• 11:45-12:00 - HTRC Advanced Collaborative Support (Robert McDonald)
HTRC@Events
• HTRC UnCamp 2015 - March 30-31, 2015, Ann Arbor, MI
• Stephen Downie Keynote at JCDL 2015
• Digital Humanities 2015 - June 29-July 3, 2015, Sydney, Australia
• (LSA)'s Biennial Linguistic Institute, July 13, 2015, Chicago, IL
• HILT 2015 - July 28-29, 2015, Indianapolis, IN
Many thanks …
HTRC IU Team
• Beth Plale (PI)
• Robert H. McDonald
• Miao Chen
• Guangchen Ruan
• Zong Peng
• Milinda Pathirage
• Samitha Liyanage
• Jiaan Zeng
• Leena Unnikrishnan
• Nicholae Cline
HTRC UIUC Team
• J. Stephen Downie (PI)
• Beth Namachchivaya
• Megan Senseney
• Sayan Bhattacharyya
• Loretta Auvil
• Boris Capitanu
• Harriet Green
• Eleanor Dickson
Outline
• What is the HTRC?
• Non-Consumptive Research Paradigm
• Current Architecture
• Future Architecture
• Advanced Collaborative Support (RFP)
HathiTrust Digital Library
• HathiTrust is a partnership of 90+ academic & research institutions, offering a collection of millions of digitized titles.
• http://hathitrust.org
– IU is a founding member of the HathiTrust along with University of Michigan, University of California, and the University of Virginia
HTRC Current Users (ca Now)
• Digital Humanities (60)
• Education (60)
• Informatics (60)
• Observers (20)
194 existing user accounts
Lots of user accounts; good starting point.
Improve:
• Increase amount of real work being accomplished, as measured by usage on HTRC's compute resources Quarry and Big Red II at IU
• Develop educational uses
• Develop informatics uses
• Decrease number of observers to 10%
Project 200 users at any one time, of which 90% are doing relevant education/scholarship
Non-Consumptive Research Paradigm
• No action or set of actions on the part of users, whether acting alone or in cooperation with other users, over the duration of one or multiple sessions, can result in sufficient information being gathered from a collection of copyrighted works to reassemble pages from the collection.
• The definition disallows collusion between users and accumulation of material over time. It differentiates a human researcher from a proxy, which is not a user. Users are human beings.
HTRC Version 2.0
[Diagram labels: Request; Complexity hiding interface; HTRC (All the complexity); Tabular info; Statistical plots; Spatial plots]
HTRC Goals
• Provide a persistent and sustainable structure to enable original and cutting-edge research.
– Leverage data storage and computational infrastructure at Indiana & Illinois
– Stimulate community development of new functionality and tools
– Use tools to enable discoveries that would not be possible without the HTRC
• Enable scholars to fully utilize content of HathiTrust Library while preventing intellectual property misuse within U.S. copyright law.
– Provision secure computational and data environment for scholars to perform research using HathiTrust Digital Library.
HTRC Organization 2014-18
• HTRC Executive Mgmt
• Administrative Support
• Core Development
• Advanced Research
• Advanced Collaborative Support
• Scholarly Commons
HTRC Data Capsule
Special Thanks to
HTRC Data Capsule@IU Team
• Beth Plale (PI)
• Jiaan Zeng
• Guangchen Ruan
HTRC Data Capsule@Michigan Team
• Atul Prakash (PI)
• Alexander Crowell
Jiaan Zeng, Guangchen Ruan, Alexander Crowell, Atul Prakash, and Beth Plale. 2014. Cloud computing data capsules for non-consumptive use of texts. In Proceedings of the 5th ACM workshop on Scientific cloud computing (ScienceCloud '14). ACM, New York, NY, USA, 9-16. DOI=10.1145/2608029.2608031 http://doi.acm.org/10.1145/2608029.2608031
Scholarly Commons User Support Service
• Develop training materials
• Educational workshops
• Tool and workset creation
• Collaborate with librarians and DH centers at HT institutions
• Assist researchers in HTRC text data mining research projects
• Led out of University of Illinois Library; smaller group at IU
• Resourced at 2.7 FTE.
HTRC Future Work
• Copyrighted content in progress
• Advanced Collaborative Support
– The award model
– Award content is HTRC ACS staff time
– Collaborate with scholars on addressing their research needs related to HTRC
– E.g. prototyping, running text analysis
– Advocate open source; encourage extending the work to a grant submission
• Scholars Commons
– Interaction with scholars to help them use HTRC tools and services
– An interface to interact with HTRC users via the channel of scholars commons
– Series of workshops at IU and other places
– Weekly consulting time
– Every Wed 2:30-4:30pm, IU library, Scholars Commons 157R
– Contact: Miao Chen, Nicholae Cline
• For details: http://www.hathitrust.org/htrc/faq
• General contact info