SAN DIEGO SUPERCOMPUTER CENTER
Emerging HIPAA and Protected Data Requirements for Research Computing at SDSC
Ron Hawkins, Director of Industry Relations / TSCC Program Manager
April 23, 2014
Objectives (for participation)
• Understand requirements for protected data processing on HPC systems
• Develop a roadmap for implementation at UCSD
• Focus on “services,” not “projects”
• Understand how technology can be used to implement protected data environments
• Contribute to understanding, solutions, and best practices across the community
What we are being asked for…
• dbGaP
  • Database of Genotypes and Phenotypes
  • Human genomic studies data administered by NIH
  • Must apply, and must comply with the dbGaP Code of Conduct and “Security Best Practices” document
  • Bottom line: don’t put the data on the Internet
• HIPAA
  • If you have to ask…
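The dbGaP “don’t put the data on the Internet” bottom line translates, in practice, into keeping controlled-access files off world-readable paths. A minimal audit sketch (not part of the dbGaP guidance itself; the directory layout and policy are assumptions) that walks a shared HPC filesystem and flags anything the “other” permission class can touch:

```python
import os
import stat

def world_accessible(path):
    """Return True if `path` grants any read/write/execute bit to 'other'."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IROTH | stat.S_IWOTH | stat.S_IXOTH))

def audit_tree(root):
    """Walk a directory tree and list entries accessible to all users."""
    flagged = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if world_accessible(full):
                flagged.append(full)
    return flagged
```

Run against a hypothetical project directory (e.g., `audit_tree("/projects/dbgap_study")`), anything flagged is a candidate for `chmod o-rwx`.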
SDSC Roles/Functions/Services
• Operate national HPC systems under the XSEDE program (Trestles, Gordon, Comet)
• Operate a hybrid “hotel/condo” computing cluster (TSCC) for UC researchers
• Operate a co-location facility for UC campuses
• Operate several storage chargeback facilities (“Project”, “Cloud”, Commvault)
• Conduct sponsored research and operate various individual projects
• Work with biotech industry & external research institutes
Campus Overview
[Diagram: SDSC connected to Moores Cancer Center, UCSD School of Medicine, Salk Institute, Scripps Translational Science Institute, and J. Craig Venter Institute over the 10GbE campus network, with 40GbE to CENIC (100GbE late 2014)]
SDSC Data Center
[Diagram: data center floor plan (12,000 sq. ft. and 5,000 sq. ft. rooms) with a 10GbE network fabric connecting TRESTLES, GORDON, TSCC, COMET (late 2014), multiple storage systems, general and caged co-location space, the CGHub cage, MSKCC, and ANNAI]
TSCC & Project Storage Use Case
[Diagram: within the SDSC data center, TSCC and “Project” storage connected over NFS; a campus lab server and lab users mount the same NFS shares, along with other shares on and off campus]
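In a use case like this, the export policy on the “Project” storage side is what keeps protected data off arbitrary hosts. A hedged `/etc/exports` sketch (the path is hypothetical and the subnet is an RFC 5737 documentation range standing in for the campus lab network, not SDSC’s actual configuration):

```
# Hypothetical export for "Project" storage: read-write only to the lab
# subnet, never to the world, with root squashed so remote root cannot
# read other users' protected data.
/projects/lab01  192.0.2.0/24(rw,sync,root_squash,no_subtree_check)
```

The key design choice is enumerating client subnets explicitly rather than exporting to `*`, and relying on `root_squash` (the NFS default) so a compromised lab machine’s root account maps to an unprivileged user.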
Medicaid Integrity Group Data Engine
• The Center for Program Integrity’s Medicaid CI Platform
• FISMA-certified, HIPAA-compliant CMS System of Record
• Built in 2008/2009; Operations & Maintenance 2009–2016
• 10+ years of Medicaid claims and reference data (~100 TB)
• 26 families of security controls; over 200 controls and sub-controls
• Implements NIST SP 800-53 and CMS ARS requirements
• Data warehouse, analysis, BI, and case management tools
• 350+ users (CPI, CMS contractors, CMCS, OIG, DOJ, and others)
• 100+ concurrent users, 500+ algorithms, 4,000+ daily queries
• Connections to CMS networks and data transfer capabilities
Sherlock Cloud
• Infrastructure as a service (IaaS), including compliance of the entire software architecture and management processes
• Meets federal “Cloud First” requirements and flexibility goals
• Maintains the security and oversight aspects of a traditional managed services model
• Common standards, reliability, and compliance methods provide economies of scale and a shared management knowledge base
• FISMA-certified, HIPAA-compliant, and more open (Agile) environments separate projects and enforce appropriate compliance
• Undertaking FedRAMP Cloud Service Provider (CSP) certification, which is becoming a requirement in many government contracts and grants
Sherlock Cloud
• Suite of component cloud services:
  – Storage: file, block, database
  – Compute: full virtualization; support for Windows, Linux, and AIX
  – Shared services: backups, authentication, configuration management, ticketing, logging, high-speed file transfer, remote access, DNS, etc.
  – Security: project-customized firewalling, IDS, and monitoring
  – Networking: non-blocking 10Gb networking end to end
  – Disaster recovery: multi-site backup and failover capabilities
• Used by CMS, NIH, CalIT2, UCSF, UCOP, and UCSD
• We evaluate potential clients and only accept partners with a commitment to securely operating their environments
Protected Data on HPC
• Researchers value the HPC and storage services provided by SDSC
• Startup costs of dbGaP- or HIPAA-compliant “silos” are too high for most projects
• There are some workarounds, but they have limits:
  • De-identified data
  • Obtain consent and IRB approval for research use of human subject data (but not PII)
• “Projects” lack economies of scale, on-demand service, and elasticity
What we are doing at present…
• Continuing to work with researchers on a project basis
• Continuing to evaluate and understand use cases
• Examining feasibility of one or more pilot projects in FY 2014 (7/1/14-6/30/15) – under auspices of UCSD’s “Research Cyberinfrastructure” program
How do we?
• Understand requirements and best practices for protected data processing in HPC?
• Develop a roadmap for implementation on our campus?
• Develop “services,” not “projects”?
• Deploy technology to implement protected data environments on shared infrastructure?
• Contribute to understanding, solutions, and best practices across the community?