SEASR Introduction
High Performance Computing in the Humanities, Arts, and Social Science Workshop
UIUC/NCSA, July 28, 2008
Loretta Auvil
National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
SEASR Goals
This project will focus on developing, integrating, deploying, and sustaining a set of reusable and expandable software components and a supporting framework that will benefit a broad set of data mining applications for scholars in the humanities.
The key goals established for this effort are a set of software-centric directives:
– Support the development of a state-of-the-art software environment for data management and analysis of digital libraries, repositories and archives, as well as educational platforms that are expected to contribute to many of the humanities breakthroughs of the 21st century.
– Support the continued development, expansion, and maintenance of an end-to-end software system (user interfaces, workflow engines, data management, analysis and visualization tools, collaborative tools, and other software integrated into a complete environment) to bring the full power of data analytics to the scholars.
– Support education and training for use of this software environment for analysis through workshops to promote its usage among scholars.
Project Highlights
• SEASR will employ a comprehensive environment that integrates two complementary and revolutionary technical advances, Service Oriented Architecture and Semantic Web, into a single computing architecture: Semantic Enabled Service Oriented Architecture
• SEASR addresses the challenges of transforming information into knowledge by constructing the software bridges that are required to move from the unstructured and semi-structured data world to the structured data world
What does this mean for the DH community?
SEASR will:
• help scholars access existing large data stores more readily
• provide scholars with enhanced data synthesis and query analysis
– from focused data retrieval and data integration
– to intelligent human-computer interactions for knowledge access
– to semantic data enrichment
– to entity and relationship discovery
– to knowledge discovery and hypothesis generation
• empower collaboration among scholars by enhancing and innovating virtual research environments
Semantically Enabled SOA
Semantically Enabled SOA 2
Technical Components
• High-Level Component Requirements
– Hardware abstraction (virtualization)
– Assets storage and curation
– Task creation and definition (components)
– Process description (flows)
– Open services and standardized metadata exchange
– Easy reach to a non-technical community (visual programming and interaction UIs)
– Social interaction platform for researchers
– NLP, machine learning, and understandable visualizations
Technical Components
Technical architecture that emphasizes flexibility, scalability, and modularity, provides a community hub to heterogeneous systems, and reduces path dependence
• Semantic-web driven architecture to standardize interoperability
• Design for community building and to encourage sharing and participation
• Data-intensive flows to move from a simple desktop to a large cluster transparently
• Movable computation. Computation can be transparently shipped to the assets (complying with privacy issues)
• Quick re-configurability (flows can be adapted and reused in seconds)
• Built for reuse and cross-fertilization across domains
SEASR Components
[Figure: SEASR architecture diagram. Labels recovered from the figure: Virtualization Infrastructure; Hadoop FS; Shared Stores; SOA Gateways; Meandre Infrastructure; Visualization; Metadata Stores; Component Repository; Component Discovery; Meandre Data-Intensive Flows; SEASR Apps; SEASR Services; SEASR Plugins; SEASR Web Apps; Analytics; Data; Gateway Connections; Data Persistence; Data Transformation; Predictive Modeling; Discovery; Natural Lang Processing; Charting; Modeling Vis; Info Vis; Developer Tools]
SEASR Apps: Community Hub
More Community Hub
Community Hub Implementation
Implementing Community Hub functionality as WordPress plugins
More Community Hub
Meandre Workbench Design
Meandre Workbench
SEASR Apps: Web App
• Administration tool
– Future: Add security levels
• Job management control
• User management/profile
• Repository exploration
Meandre Infrastructure
• Component and Flow API
• Repository
– Future: Versioning of Components and Flows
• Execution Engine
– Future: Parallelism, checkpointing, fault tolerance, extend firing policy
• Debugger/Monitor for flow execution
• ZigZag
– High level language for describing flows
– Interpreter/compiler for executing the flows
– Automatic parallelization at component level
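The idea of components wired into flows and run by an execution engine with a firing policy can be illustrated with a small sketch. This is not the Meandre API: the `Component` and `Flow` classes, the port names, and the "fire when every input port has a value" policy are all assumptions made for illustration.

```python
from collections import deque


class Component:
    """Hypothetical dataflow component: named input ports plus one function."""

    def __init__(self, name, inputs, func):
        self.name = name
        self.inputs = list(inputs)   # names of the input ports
        self.func = func             # called with one value per input port
        self.pending = {}            # port name -> value waiting to be consumed

    def ready(self):
        # Assumed firing policy: fire once every input port holds a value.
        return all(port in self.pending for port in self.inputs)

    def fire(self):
        args = [self.pending.pop(port) for port in self.inputs]
        return self.func(*args)


class Flow:
    """Hypothetical flow: wires component outputs to downstream input ports."""

    def __init__(self):
        self.components = {}
        self.wires = {}              # source name -> list of (target name, port)

    def add(self, component):
        self.components[component.name] = component
        self.wires.setdefault(component.name, [])

    def connect(self, source, target, port):
        self.wires[source].append((target, port))

    def push(self, target, port, value):
        self.components[target].pending[port] = value

    def run(self):
        results = {}
        queue = deque(c for c in self.components.values() if c.ready())
        while queue:
            comp = queue.popleft()
            if not comp.ready():
                continue
            output = comp.fire()
            results[comp.name] = output
            # Forward the output along every wire and schedule ready targets.
            for target, port in self.wires[comp.name]:
                self.push(target, port, output)
                if self.components[target].ready():
                    queue.append(self.components[target])
        return results


# Usage: a two-step flow that tokenizes text and then counts the tokens.
flow = Flow()
flow.add(Component("tokenize", ["text"], lambda text: text.split()))
flow.add(Component("count", ["tokens"], lambda tokens: len(tokens)))
flow.connect("tokenize", "count", "tokens")
flow.push("tokenize", "text", "to be or not to be")
results = flow.run()
```

Because each component only sees its own ports, a flow like this can be rewired or extended without touching the component code, which is the property the slides describe as quick re-configurability.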
Meandre Infrastructure
• Web Service Operations
– Calls to the repository for flows and components
– Current: REST
– Future: SOAP enabled
• Web UI
– Current: Components use web UI fragments (which pass HTML)
– Future: Enable more complex visual components for landscape construction
Component Repository
• Meandre Repository
– RDF descriptions for components and flows
– Support for RDF on a local file, web-accessible files, a JDBC-enabled relational database (Derby), or a triple store
– Support for rdf, ttl, and nt formats
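To make "RDF descriptions for components" concrete, here is a minimal sketch that writes a component description in the nt (N-Triples) format using only the Python standard library. The `example.org` identifiers and the choice of Dublin Core properties are assumptions for illustration; the actual vocabulary used by the Meandre repository is not given in these slides.

```python
def iri(value):
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Format a plain string literal with basic N-Triples escaping."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


# Standard RDF and Dublin Core property IRIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_DESCRIPTION = "http://purl.org/dc/elements/1.1/description"

# Hypothetical identifiers, not taken from the Meandre vocabulary.
component = iri("http://example.org/components/tokenizer")
component_class = iri("http://example.org/vocab#Component")

doc = to_ntriples([
    (component, iri(RDF_TYPE), component_class),
    (component, iri(DC_TITLE), literal("Tokenizer")),
    (component, iri(DC_DESCRIPTION), literal('Splits text into "tokens"')),
])
print(doc, end="")
```

Each output line is a complete statement, so a description in this format can be stored in a local file, served over the web, or loaded into a triple store without further parsing context.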
SEASR Components: NLP
• Syntactic analysis
– Tokenization
– POS tagging
– Shallow parsing
– Custom literary tagging
• Semantic analysis
– Named Entity tagging
– Semantic Category (unnamed entity) tagging
– Co-reference resolution
– Ontological association (WordNet, VerbNet)
– Semantic Role analysis
– Concept-Relation extraction
– Logical analysis
– Event sequence inference
– Event causal inference
• Topic Filtering
– by topic
– by time period
– by location
– etc.
• Semantic network
– Extract predicate-argument & other triples
– Convert to RDF triples
– Add triples to RDF store
– Pose structured queries
– Graph-based inference
• Exploration, Discovery and Knowledge Extraction
– Query-based – question answering
– Visual – navigation
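The semantic network steps (extract triples, add them to a store, pose structured queries) can be sketched with a tiny in-memory triple store. The `TripleStore` class and the sample frames are hypothetical stand-ins: a real pipeline would take its triples from the semantic analysis components and would use an actual RDF store with a query language.

```python
class TripleStore:
    """Hypothetical in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple for triple in self.triples
            if all(want is None or want == got
                   for want, got in zip(pattern, triple))
        )


# Assumed output of an upstream predicate-argument extraction step:
# (predicate, first argument, second argument).
frames = [
    ("wrote", "Austen", "Emma"),
    ("wrote", "Austen", "Persuasion"),
    ("wrote", "Dickens", "Bleak House"),
]

store = TripleStore()
for predicate, agent, patient in frames:
    # Convert each frame into a subject-predicate-object triple.
    store.add(agent, predicate, patient)

# Structured query: which works did Austen write?
works = [obj for _, _, obj in store.match(subject="Austen", predicate="wrote")]
print(works)
```

Pattern matching with wildcards is the simplest form of structured query; graph-based inference would add rules that derive new triples from the stored ones.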
SEASR Component: Machine Learning
• Data Transformation
– Feature extraction and construction
– Boosting and Bagging
• Unsupervised Learning
– Clustering, SOMs
– Hypothesis Generation
• Supervised Learning
– Traditional Statistical Linear Methods
– Bayesian, Support Vector Machines, Decision Trees
– Ensemble Models
• Optimization Approaches
– GAs
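As one concrete instance of the clustering item above, the following is a minimal sketch of k-means (Lloyd's algorithm) on one-dimensional data. It is a generic textbook version rather than a SEASR component; the function names and the deterministic initialization (starting centroids spread across the sorted data) are assumptions chosen to keep the example reproducible.

```python
def nearest(x, centroids):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))


def kmeans_1d(points, k, max_iter=100):
    """Cluster 1-D points into k groups; returns (centroids, labels)."""
    data = sorted(points)
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of points")

    # Assumed initialization: pick k values spread evenly over the sorted data.
    if k == 1:
        centroids = [data[0]]
    else:
        centroids = [data[i * (len(data) - 1) // (k - 1)] for i in range(k)]

    for _ in range(max_iter):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in data:
            clusters[nearest(x, centroids)].append(x)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated

    # Labels follow the order of the original input.
    labels = [nearest(x, centroids) for x in points]
    return centroids, labels


# Usage: two well-separated groups of values.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
centroids, labels = kmeans_1d(points, k=2)
print(labels)
```

The same assign-then-update loop generalizes to feature vectors by replacing the absolute difference with a vector distance.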
Developers: Eclipse Plugin
SEASR @ Work – MONK
SEASR @ Work – NEMA