Page 1
Whole Tale: The Experience of Research … through reproducible, computational narratives
YesWorkflow: Revealing workflow, provenance from scripts
Kurator: Automating data cleaning workflows
EulerX: Agreeing to disagree about variant taxonomies
Bertram Ludäscher [email protected]
BCoN Workshop, 2018-02-13..14, U Kansas
Director, Center for Informatics Research in Science & Scholarship (CIRSS), School of Information Sciences (iSchool@Illinois)
& National Center for Supercomputing Applications (NCSA) & Department of Computer Science (CS@Illinois)
Page 2
Whole Tale: The next step in the evolution of the scholarly article: The “Living” Paper
• 1st Generation:
  – narrative (prose)
• 2nd Generation: plus …
  – name .. identify .. include (access to) data
• 3rd Generation: plus …
  – name .. reference .. include code (software) ..
  – and provenance … and exec environment (containers)
Ludäscher: Whole-Tale++ 2
Whole Tale
Whole Tale Dashboard
Page 3
Whole Tale: What’s in a name?
(1) Whole Tale ⇔ Whole Story:
  ◦ Support (computational / data) scientists
  ◦ … along the complete research lifecycle
  ◦ ... from experiment to (new kind of) publication
  ◦ ... and back!
(2) Whole Tale ⇔ for the Long Tail of Science
  – Easy sharing of your computational narratives, data, and exec-env since 2017!
  – Power applications for everyone!
Page 4
The Whole Tale: Merging Science and Cyberinfrastructure Pathways
NSF-DIBBS award (5 years, 5 institutions)
• Illinois (NCSA & iSchool)
  • Bertram Ludäscher (PI), MT Campbell (PM) [Kandace Turner], Victoria Stodden (coPI), Matt Turk (coPI), Kacper Kowalik (sw-architect), Craig Willis (dev)
• U of Chicago
  • Kyle Chard (coPI), Mihael Hategan (dev)
• UT Austin / TACC
  • Niall Gaffney (coPI), Siva Kulasekaran (dev)
• U Notre Dame
  • Jarek Nabrzyski (coPI), Ian Taylor (dev), Adam Brinckman (dev)
• UCSB / NCEAS
  • Matt Jones (coPI), Bryce Mecum (dev)
Page 5
Whole Tale Motivation
• Can't reproduce result because:
  • Don't know how to run analysis
  • Can't get the software running
  • Can't pay for the computer or compute power the result was computed on
Source: Bryce Mecum, WT team @ NCEAS
Page 6
Whole Tale Vision
Addressing reproducibility
Data Code
Execution Environment
Article
Page 7
Whole Tale Vision
• Living publication (data + code + environment)
• Facilitate reproducibility
• Encourage investigation of results by making it easy to recreate the environment the result was created in
Article
Page 8
Whole Tale Vision
Addressing reproducibility
Article
Tale
+
8
Page 9
Whole Tale Vision
Tale
Data
{Code
D1PROV
9
Page 10
WT Architecture
https://dashboard.wholetale.org
Page 11
DEMO: (re-)use existing tale or …
Page 12
… Create a New Tale!
Page 13
Page 14
Page 15
Page 16
Page 17
Page 18
Running with RStudio: Locally or on WT …
You’re up and running quickly on Whole Tale!!
Page 19
Maybe you just want to use WT to learn R for Data Science ...
Page 20
Another example Tale: LIGO gravitational wave detection
(tutorial Jupyter notebook)
Page 21
Page 22
Page 23
Page 24
Page 25
Page 26
Page 27
Page 28
New & Upcoming Features in WT ...
• Add your own Frontends (e.g. OpenRefine, ..)
• Persistent, shared or personal files:
  – /data/ (registered/external data, read-only, associated with a tale)
  – /home/ (your own data, r/w, associated with all your tales)
  – /workspace/ (shared r/w data, associated with a tale, across all users)
• WT “Derived Tales”:
  – take a tale; modify it to your liking; and publish as a derived work
• WT “Take-Out”:
  – Want to run your tales elsewhere?
  – Take-out your tale and run on your own (or cloud) platform
• WT “Scale-Out”:
  – If the WT-dashboard isn’t enough → run your own WT system!
• WT Provenance support:
  – … via DataONE provenance tools, ProvONE model (W3C PROV extension)
  – … via YesWorkflow
• Interest in joining a WT Biodiversity Informatics Working Group!?
  – We already have: archaeology & ecology, astronomy, materials science
  – Your input wanted! (is WT developing something useful for you?)
  – Try out WT, create some examples (in R, Python, ...) and provide feedback!
  – => possibility to fund a summer intern!
Page 29
Additional Material …
… teasers ahead!
Page 30
Provenance is: keeping records …
• Grand Canyon’s rock layers are a record of the early geologic history of North America. The ancestral puebloan granaries at Nankoweap Creek tell archaeologists about more recent human history. (By Drenaline, licensed under CC BY-SA 3.0)
• Not shown: computational archaeologists reconstructing past climate from multiple tree-ring databases → computational provenance is key for transparency & reproducibility
Ludäscher: Workflows & Provenance => Understanding 30
Page 31
... and provenance is: Understanding what happened!
Zrzavý, Jan, David Storch, and Stanislav Mihulka. Evolution: Ein Lese-Lehrbuch. Springer-Verlag, 2009.
Author: Jkwchui (Based on drawing by Truth-seeker2004)
Page 32
Computational Provenance …
• Origin, processing history of artifacts
  – data products, figures, ...
  – also: underlying workflow → understand methods, dataflow, and dependencies
Climate Change Impacts in the United States
U.S. National Climate Assessment, U.S. Global Change Research Program
Page 33
YesWorkflow: How does the LIGO script produce its results??
Page 34
YesWorkflow: Prospective & Retrospective Provenance … (almost) for free!
• YW annotations in a (Python, R, …) script recreate a workflow view from the script …
[Figure: YW workflow view of the script; node labels follow]
cassette_id
sample_score_cutoff
sample_spreadsheet  file:cassette_{cassette_id}_spreadsheet.csv
calibration_image  file:calibration.img
initialize_run
run_log  file:run/run_log.txt
load_screening_results
sample_name  sample_quality
calculate_strategy
rejected_sample  accepted_sample  num_images  energies
log_rejected_sample
rejection_log  file:/run/rejected_samples.txt
collect_data_set
sample_id  energy  frame_number  raw_image
file:run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw
transform_images
corrected_image  file:data/{sample_id}/{sample_id}_{energy}eV_{frame_number}.img
total_intensity  pixel_count  corrected_image_path
log_average_image_intensity
collection_log  file:run/collected_images.csv
YW!
@BEGIN .. @END .. @IN .. @OUT .. @URI .. @LOG ..
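URI templates such as file:run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw hint at how retrospective provenance can be read back from file names. The following is a minimal sketch of that idea, not YesWorkflow's own implementation: the helper names template_to_regex and extract_bindings and the sample path values are assumptions made for illustration.

```python
import re


def template_to_regex(template):
    """Compile a URI template with {name} placeholders into a regex.

    Hypothetical helper for illustration; not part of the YesWorkflow tool.
    """
    # Drop the optional "file:" scheme prefix used on the slide.
    path = template[len("file:"):] if template.startswith("file:") else template
    # With one capture group, re.split alternates literal text and names.
    parts = re.split(r"\{(\w+)\}", path)
    seen = set()
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)            # literal path text
        elif part in seen:
            pattern += f"(?P={part})"             # repeated variable: same value
        else:
            seen.add(part)
            pattern += f"(?P<{part}>[^/]+?)"      # first use: capture one segment
    return re.compile(f"^{pattern}$")


def extract_bindings(template, path):
    """Return the variable bindings a concrete path implies, or None."""
    match = template_to_regex(template).match(path)
    return match.groupdict() if match else None


RAW_IMAGE = "file:run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw"
# The path below uses made-up values.
print(extract_bindings(RAW_IMAGE, "run/raw/q55/DRT240/e10000/image_001.raw"))
```

Each file that matches a template yields one set of variable bindings, which is the kind of record a retrospective provenance query can then work with.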
Page 35
GetModernClimate
PRISM_annual_growing_season_precipitation
SubsetAllData
dendro_series_for_calibration
dendro_series_for_reconstruction CAR_Analysis_unique
cellwise_unique_selected_linear_models
CAR_Analysis_union
cellwise_union_selected_linear_models
CAR_Reconstruction_union
raster_brick_spatial_reconstruction raster_brick_spatial_reconstruction_errors
CAR_Reconstruction_union_output
ZuniCibola_PRISM_grow_prcp_ols_loocv_union_recons.tif ZuniCibola_PRISM_grow_prcp_ols_loocv_union_errors.tif
master_data_directory prism_directory
tree_ring_data calibration_years retrodiction_years
Paleoclimate Reconstruction (openSKOPE.org)
• … explained using YesWorkflow!
Kyle B., (computational) archaeologist: "It took me about 20 minutes to comment. Less than an hour to learn and YW-annotate, all-told."
Page 36
Data Curation Workflows (Filtered-Push … Kepler … Kurator projects)
Page 37
http://kurator.acis.ufl.edu/kurator-web/
Page 38
Page 39
DwCA Taxon Lookup Workflow
• Declare inputs, outputs, and steps of a script (or wf) with YW annotations to ...
  – communicate provenance graphically (via graphviz)
  – combine different forms of provenance
  – query provenance
• Simple YW annotations in comments:
  – @BEGIN Step, @END Step
  – @IN Data, @OUT Data
  – @URI Template, @LOG Pattern
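As a rough illustration of annotations in comments, the sketch below marks up a tiny Python script with @BEGIN, @END, @IN and @OUT, using the uppercase spelling shown on the slide. The step names, variable names and sample data are hypothetical and are not taken from the taxon lookup workflow itself.

```python
from collections import Counter

# @BEGIN taxon_name_cleanup
# @IN raw_names
# @OUT name_counts


# @BEGIN normalize_names
# @IN raw_names
# @OUT clean_names
def normalize_names(raw_names):
    """Trim and collapse whitespace; drop empty or missing entries."""
    return [" ".join(name.split()) for name in raw_names if name and name.strip()]
# @END normalize_names


# @BEGIN count_names
# @IN clean_names
# @OUT name_counts
def count_names(clean_names):
    """Count how often each cleaned name occurs."""
    return dict(Counter(clean_names))
# @END count_names


# @END taxon_name_cleanup

if __name__ == "__main__":
    # Made-up input records for illustration.
    raw = ["Puma  concolor", " Puma concolor ", "", "Lynx rufus"]
    print(count_names(normalize_names(raw)))
```

The annotations live only in comments, so the script runs unchanged. The data name clean_names appears as an @OUT of one step and an @IN of the next, which is what lets a tool connect the two steps in a workflow view.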
Page 40
Taxon Lookup Workflow: Data View and Process View
Page 41
The story of two individual records
• One took the GBIF route, while …
• … the other went all WORMS!
Non-Marine? → GBIF
Marine? → WORMS
Page 42
The aggregate story ..
• How many records were observed as inputs or outputs of workflow steps?
• Were there any NULL values? How many?
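The two questions above can be read as simple aggregate queries over a retrospective provenance log. The sketch below assumes a made-up log format (one entry per record observed at a step's input or output); the step names, field names and values are all hypothetical, and this is not the query interface of YesWorkflow or Kurator.

```python
from collections import Counter

# Hypothetical retrospective provenance log; all values are made up.
PROV_LOG = [
    {"step": "read_input", "direction": "out", "record_id": 1, "value": "Puma concolor"},
    {"step": "read_input", "direction": "out", "record_id": 2, "value": None},
    {"step": "lookup_name", "direction": "in", "record_id": 1, "value": "Puma concolor"},
    {"step": "lookup_name", "direction": "in", "record_id": 2, "value": None},
    {"step": "lookup_name", "direction": "out", "record_id": 1, "value": "Puma concolor"},
]


def count_records(log):
    """Count observed records per (step, direction) pair."""
    return Counter((entry["step"], entry["direction"]) for entry in log)


def count_nulls(log):
    """Count log entries whose observed value is NULL (None)."""
    return sum(1 for entry in log if entry["value"] is None)


if __name__ == "__main__":
    for (step, direction), n in count_records(PROV_LOG).items():
        print(f"{step} [{direction}]: {n} record(s)")
    print("NULL values:", count_nulls(PROV_LOG))
```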
Page 43
YesWorkflow Summary
• Lightweight YW annotations can be added easily to your scripts to reap workflow benefits
  – Documentation of what’s important
  – Visualization of dependencies
  – Querying provenance (prospective, retrospective, and hybrid)
  – Independent of system or language used (R, Python, MATLAB, workflow tools, …)
→ make provenance actionable → provenance for self!
=> github.com/yesworkflow-org/yw
=> try.yesworkflow.org
Page 44
Demo Time
(Disclaimer)
https://github.com/idaks/dataone-ahm-2016-poster
https://github.com/idaks/wt-prov-summer-2017
https://github.com/yesworkflow-org/yw-idcc-17
Page 45
DataONE: Search and Provenance Display
Page 46
DataONE: Search and Provenance Display
Page 47
Adding YesWorkflow to DataONE
Yaxing’s script with inputs & output products
Christopher’s YesWorkflow model
Christopher using Yaxing’s outputs as inputs for his script
Christopher’s results can be traced back all the way to Yaxing’s input
Page 48
Yi-Yun Cheng (1), Nico Franz (2), Jodi Schneider (1), Shizhuo Yu (3), Thomas Rodenhausen (4), Bertram Ludäscher (1)
(1) School of Information Sciences, University of Illinois at Urbana-Champaign; (2) School of Life Sciences, Arizona State University; (3) Department of Computer Science, University of California at Davis; (4) School of Information, University of Arizona
Agreeing to Disagree: Reconciling Conflicting Taxonomic Views using a Logic-based Approach
Acknowledgments
Support of the authors’ research through the National Science Foundation is kindly acknowledged (DEB-1155984, DBI-1342595, and DBI-1643002). The authors thank Professor Kathryn LaBarre for her comments and suggestions. We would also like to thank Dr. Laetitia Navarro and Jeff Terstriep for help with creating map overlays in QGIS.
CONCLUSION
• Our logic-based taxonomy alignment approach can be used to solve crosswalking issues. We will be able to mitigate the membership condition problems that occur in equivalent crosswalking.
• The RCC-5 approach preserves the original taxonomies while providing an alignment view. We can solve data integration problems that happen in the more coarse-grained relative crosswalking, which otherwise is subject to information loss.
• Our study also underscores the benefits of designing different alignment workflows (Bottom-up vs. Top-down) to match the needs of specific taxonomy alignment problems.
  – Bottom-up approach: seems to work well whenever we have non-overlapping relationships at the leaf-level (lowest-level) articulations, and we are not sure how the higher-level concepts should be aligned.
  – Top-down approach: seems favorable when there is an expectation of certain higher-level articulations in conjunction with under-specified, complex, and often overlapping leaf-level relations.
RELATED WORK
• Taxonomy Alignment Problems (TAP): Taxonomies T1, T2 are inter-linked via a set of input articulations A, defined as RCC-5 relations, to yield a “merged” taxonomy T3.
• Euler/X Articulations: a constraint or rule that defines a relationship (a set constraint) between two concepts from different taxonomies.
• Region Connection Calculus (RCC-5)
• Possible Worlds: When encoding and solving TAPs via ASP, the different answer sets represent alternative taxonomy merge solutions or possible worlds (PWs).
INTRODUCTION
Tina: Hey Amy, can you recommend a signature dish from where you live?
Amy: Oh, definitely the half-smokes from the Northeast! They are these tasty half-pork and half-beef sausages.
Tina: What a coincidence! We have half-smokes in the South, too! Where do you live in the Northeast? New York? Boston?
Amy: Wrong guesses! Where do you live in the South?
Tina and Amy together: Washington, D.C.
[The two of them look at each other, confused.]
“In the face of incompatible information or data structures among users or among those specifying the system, attempts to create unitary knowledge categories are futile. Rather, parallel or multiple representational forms are required …” (Bowker & Star, 2000).
CASE 1 RESULTS: CEN vs. NDC
• State-level alignments are all congruent (Bottom-up)
• Inferred new articulations for regional-level alignments
CASE 2 RESULTS: CEN vs. TZ
Figure 3. (Left) CEN-NDC taxonomy alignment problem with 49 input articulations between TCEN and TNDC
Figure 4. (Right) The unique possible world (PW) T3 reconciling TCEN and TNDC via inferred relationships
Figure 1. National Diversity Council map (NDC) vs. Census Bureau map (CEN)
• Github link: https://github.com/EulerProject/ASIST17
• Email: [email protected]
[Map labels. NDC: West, Southwest, Southeast, Midwest, Northeast. CEN: West, South, Midwest, Northeast. Time zones (TZ): Pacific, Mountain, Central, Eastern.]
RESEARCH DESIGN
Step 1. Supply input taxonomies T1 and T2
Step 2. Formulate RCC-5 articulations between T1 and T2
Step 3. Iteratively edit articulations in Euler/X
RCC-5 relations between regions X and Y:
– Congruence: X == Y
– Inclusion: X > Y
– Inverse Inclusion: X < Y
– Overlap: X >< Y
– Disjointness: X ! Y
[Diagram: T1, T2 and articulations A are passed to Euler/X, which computes N possible worlds. N=1 yields T3; N=0 (inconsistent) or N>1 (ambiguous) leads back to add/edit articulations A.]
[Graph legends from the poster figures:]
Nodes: CEN 4, newComb 18, comb 1, TZ 4. Edges: input 6, inferred 37.
Nodes: CEN 54, NDC 55. Edges: isa_CEN 53, isa_NDC 54, Art. 49.
Nodes: CEN 3, NDC 4, comb 51. Edges: input 61, inferred 3, overlaps inferred 3.
Nodes: CEN 5, TZ 5. Edges: isa_CEN 4, isa_TZ 4, Art. 12.
Nodes: CEN 4, comb 1, TZ 4. Edges: input 7, overlaps input 6, overlaps inferred 1, inferred 1.
Figure 2. The process of aligning taxonomies T1 and T2 with Euler/X
Figure 5. Top-down input alignments between TCEN and TTZ
Figure 6. The unique PW for the TCEN with TTZ alignment
Figure 10. Combined concepts solution for TCEN and TTZ
taxonomy CEN Census_Regions
(USA Northeast Midwest South West)
(Northeast CT MA ME NH NJ NY PA RI VT)
(Midwest IL IN IA KS MI MN MO NE ND OH SD WI)
(South AL AR DE DC FL GA KY LA MD MS NC OK SC TN TX VA WV)
(West AZ CA CO ID MT NV NM OR UT WA WY)
taxonomy NDC National_Diversity_Council
(USA Midwest Northeast Southeast Southwest West)
(Northeast CT DC DE MD MA ME NH NJ NY PA RI VT)
(Midwest IA IL IN KS MI MN MO ND NE OH SD WI)
(Southeast AL AR FL GA KY LA MS NC SC TN VA WV)
(Southwest AZ NM OK TX)
(West CA CO ID MT NV OR WA WY UT)
articulations CEN NDC
[CEN.AL equals NDC.AL] [CEN.AR equals NDC.AR] [CEN.AZ equals NDC.AZ] [CEN.CA equals NDC.CA]
[CEN.CO equals NDC.CO] [CEN.CT equals NDC.CT] [CEN.DC equals NDC.DC] [CEN.DE equals NDC.DE]
[CEN.FL equals NDC.FL] [CEN.GA equals NDC.GA] [CEN.IA equals NDC.IA] [CEN.ID equals NDC.ID]
[CEN.IL equals NDC.IL] [CEN.IN equals NDC.IN] [CEN.KS equals NDC.KS] [CEN.KY equals NDC.KY]
[CEN.LA equals NDC.LA] [CEN.MA equals NDC.MA] [CEN.MD equals NDC.MD] [CEN.ME equals NDC.ME]
[CEN.MI equals NDC.MI] [CEN.MN equals NDC.MN] ...
Quick Scan!
taxonomy CEN Census_Regions
(USA Midwest South West Northeast)
taxonomy TZ Time_Zone
(USA Pacific Mountain Central Eastern)
articulations CEN TZ
[CEN.Midwest disjoint TZ.Pacific] [CEN.Midwest overlaps TZ.Eastern] [CEN.Midwest overlaps TZ.Mountain]
[CEN.Northeast is_included_in TZ.Eastern] [CEN.South disjoint TZ.Pacific] [CEN.South overlaps TZ.Central]
[CEN.South overlaps TZ.Eastern] [CEN.South overlaps TZ.Mountain] [CEN.USA equals TZ.USA]
[CEN.West disjoint TZ.Central] [CEN.West disjoint TZ.Eastern] [CEN.West overlaps TZ.Mountain]
Page 49
For another time?
Non-unitary syntheses of systematic knowledge
Nico Franz
School of Life Sciences, Arizona State University
CIRSS Seminar – Center for Informatics Research in Science and Scholarship
February 17, 2017 – iSchool, University of Illinois Urbana-Champaign
@ http://www.slideshare.net/taxonbytes/franz-2017-uiuc-cirss-non-unitary-syntheses-of-systematic-knowledge
Tracing taxonomic names (concepts!) over time …
Page 50
Taxonomic concept alignment, Andropogon glomeratus-virginicus complex, spanning across 11 classifications authored 1889-2015
• 36 unique taxonomic names
• 88 taxonomic concept labels ⇒ name sec. author strings
• Alignment by A.S. Weakley ⇒ row position = congruence
• 1/36 names with unique 1 : 1 name : meaning cardinality across all classifications
• Andropogon virginicus
• Source: Franz et al. 2016 (1)
(1) Franz et al. 2016. Names are not good enough: reasoning over taxonomic change in the Andropogon complex. Semantic Web Journal (IOS). doi:10.3233/SW-160220
Page 51
http://taxonbytes.org/wp-content/uploads/2014/10/Peet-BIGCB-2014-Changing-Perspectives-on-Plant-Distributions.pdf
Page 52
Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)
"Taxonomic concept labels" identify input concept regions
RCC–5 articulations provided for each species-level concept
• Input visualization: MSW3 (2005) versus MSW2 (1993)
Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023
Page 53
• Alignment visualization: "grey means taxonomically congruent"
Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)
Page 54
One name & congruent region
Many names & congruent region
One name & non-congruent regions
Many names & non-congruent regions
New names & exclusive regions
• Application of coverage constraint: parent-to-parent articulations (><) are fully defined by alignment signal propagated from their respective children.
→ Sensible when complete sampling of children is intended.
Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)
• Alignment visualization: "grey means taxonomically congruent"
Page 55
1 in 3 names is unreliable across MSW2/MSW3 classifications
Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023
Page 56
The 'consensus'
The 'bible'
The (formerly) federal 'standard'
The 'best', latest regional flora
"Controlling the taxonomic variable"
Expert views are in conflict
"Just bad"
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
Page 57
The 'consensus'
The 'bible'
The (formerly) federal 'standard'
The 'best', latest regional flora
Impact: Name-based aggregation has created a novel synthesis that nobody believes in
"Controlling the taxonomic variable"
"Just bad"
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
Page 58
The 'consensus'
The 'bible'
The (formerly) federal 'standard'
The 'best', latest regional flora
"Controlling the taxonomic variable"
"Just bad"
Expert views are reconciled
Solution: Instead of aggregating an artificial 'consensus', build translation services
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
Page 59
Leaving taxon and species headaches …
• To illustrate Euler think of a simpler use case:
• Agreeing to disagree!
• … when there are multiple, legitimate perspectives
• Sorting things out!
  – Euler as a taxon concept (& name) “microscope” ...
  – .. or “time machine”?
Page 60
Two Taxonomies: NDC vs CEN
“… in the face of incompatible information or data structures among users or among those specifying the system, attempts to create unitary knowledge categories are futile. Rather, parallel or multiple representational forms are required” [Bowker & Star, 2000, p.159]
[Figure: National Diversity Council map (NDC) with regions West, Southwest, Southeast, Midwest, Northeast; US Census Bureau map (CEN) with regions West, South, Midwest, Northeast]
Source: Yi-Yun (Jessica) Cheng (PhD student, iSchool @ Illinois)
Page 61
The taxonomies
• The Census Regions Map (CEN) consists of four regions: West, Midwest, Northeast, and South, i.e., the contiguous 48 states and Washington D.C.
[Figure: CEN map with regions West, South, Midwest, Northeast]
Page 62
The taxonomies
• The National Diversity Council Map (NDC) consists of five regions: West, Southwest, Midwest, Northeast, Southeast, the 48 states and Washington D.C.
[Figure: NDC map (with states) with regions West, Southwest, Southeast, Midwest, Northeast]
• NDC splits South into SW and SE
• Do NDC and CEN agree on “West”? “Midwest”? …
• How can we sort this out?
Page 63
Sorting things out …
[Figure: input taxonomies T1 (CEN.USA with children Midwest, South, West, Northeast) and T2 (NDC.USA with children Northeast, Southeast, Midwest, Southwest, West), shown separately and linked by articulations. Nodes: CEN 5, NDC 6. Edges: is_a (CEN) 4, is_a (NDC) 5, articulations 9.]
• Given:
  – taxonomies T1, T2
  – and relations T1 ~ T2 (articulations, alignment)
• Find:
  – merged taxonomy T3
• Such that:
  – T1, T2 are preserved
  – all pairwise relations are explicit
Page 64
5 ways to relate concepts (regions)
• Idea: relate concepts X and Y with articulations
• Articulation Language: Region Connection Calculus (RCC5): congruence, inclusion, inverse inclusion, overlap, disjointness
  – Congruence: X == Y
  – Inclusion: X > Y
  – Inverse Inclusion: X < Y
  – Overlap: X >< Y
  – Disjointness: X ! Y
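When each region is treated as a set of member states, the five RCC-5 relations reduce to ordinary set comparisons. A minimal sketch, assuming regions are non-empty Python sets and that X > Y means X properly includes Y; the function name rcc5 is made up for illustration and is not Euler/X code. The state lists are the ones given in the taxonomy listings on the poster.

```python
def rcc5(x, y):
    """Return the RCC-5 symbol relating two non-empty sets x and y."""
    if not x or not y:
        raise ValueError("RCC-5 regions must be non-empty")
    if x == y:
        return "=="   # congruence
    if x > y:
        return ">"    # inclusion: x properly includes y
    if x < y:
        return "<"    # inverse inclusion: x is properly included in y
    if x & y:
        return "><"   # overlap: shared members, neither includes the other
    return "!"        # disjointness


CEN_SOUTH = set("AL AR DE DC FL GA KY LA MD MS NC OK SC TN TX VA WV".split())
NDC_SOUTHEAST = set("AL AR FL GA KY LA MS NC SC TN VA WV".split())
NDC_SOUTHWEST = set("AZ NM OK TX".split())
NDC_WEST = set("CA CO ID MT NV OR WA WY UT".split())

print("CEN.South", rcc5(CEN_SOUTH, NDC_SOUTHEAST), "NDC.Southeast")
print("CEN.South", rcc5(CEN_SOUTH, NDC_SOUTHWEST), "NDC.Southwest")
print("CEN.South", rcc5(CEN_SOUTH, NDC_WEST), "NDC.West")
```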
[Figure: CEN and NDC region taxonomies linked by RCC-5 articulations. Nodes: CEN 5, NDC 6. Edges: is_a (CEN) 4, is_a (NDC) 5, articulations 9.]
Page 65
Merged taxonomy T3
[Figure: T1 and T2 are combined via the articulations T1 ~ T2 into the merged taxonomy T3. Inputs: Nodes: CEN 5, NDC 6; Edges: is_a (CEN) 4, is_a (NDC) 5, articulations 9. T3: Nodes: CEN 3, NDC 4, congruent 2; Edges: is_a (input) 8, overlaps (input) 3.]
Page 66
How we align two taxonomies T1 and T2
• Step 1. Supply input taxonomies T1 and T2
• Step 2. Describe the relationships between T1 and T2
• Step 3. Iteratively edit articulations in Euler/X
[Figure: T1, T2 and articulations A are passed to Euler/X, which computes N possible worlds. N=1 yields T3; N=0 (inconsistent) or N>1 (ambiguous) leads back to add/edit articulations A.]
• … but where do the articulations come from??
  – expert opinion
  – automatically derived from data
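The iterative Step 3 can be summarized as a decision on the number N of possible worlds that a reasoner reports. The sketch below only encodes that decision; the function name and return strings are hypothetical, and computing N itself (done by Euler/X) is out of scope here.

```python
def next_action(num_possible_worlds):
    """Map the number of possible worlds N to the next step of the loop."""
    if num_possible_worlds < 0:
        raise ValueError("N cannot be negative")
    if num_possible_worlds == 0:
        return "inconsistent: add/edit articulations"
    if num_possible_worlds == 1:
        return "unique merge: accept T3"
    return "ambiguous: add/edit articulations"


for n in (0, 1, 3):
    print(n, "->", next_action(n))
```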
Page 67
Case 1: Census Region vs. National Diversity Council
[Figure: CEN map (West, South, Midwest, Northeast) and NDC map with states (West, Southwest, Southeast, Midwest, Northeast)]
• … but where do the articulations come from??
  – automatically derived from data
  – expert input
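One way to read "automatically derived from data" for this case: if state-level concepts with the same abbreviation are taken as congruent, region-level relations follow from comparing the regions' state sets. The sketch below is a bottom-up illustration under that assumption, not the Euler/X reasoner; the function names are hypothetical, and the state lists are copied from the taxonomy listings on the poster.

```python
from itertools import product

CEN = {
    "Northeast": "CT MA ME NH NJ NY PA RI VT",
    "Midwest": "IL IN IA KS MI MN MO NE ND OH SD WI",
    "South": "AL AR DE DC FL GA KY LA MD MS NC OK SC TN TX VA WV",
    "West": "AZ CA CO ID MT NV NM OR UT WA WY",
}
NDC = {
    "Midwest": "IA IL IN KS MI MN MO ND NE OH SD WI",
    "Northeast": "CT DC DE MD MA ME NH NJ NY PA RI VT",
    "Southeast": "AL AR FL GA KY LA MS NC SC TN VA WV",
    "Southwest": "AZ NM OK TX",
    "West": "CA CO ID MT NV OR WA WY UT",
}


def rcc5(x, y):
    """RCC-5 symbol for two non-empty sets (hypothetical helper)."""
    if x == y:
        return "=="
    if x > y:
        return ">"
    if x < y:
        return "<"
    return "><" if x & y else "!"


def infer_region_articulations(t1, t2, name1="CEN", name2="NDC"):
    """List all non-disjoint region-level relations between t1 and t2."""
    found = []
    for (r1, s1), (r2, s2) in product(t1.items(), t2.items()):
        relation = rcc5(set(s1.split()), set(s2.split()))
        if relation != "!":
            found.append((f"{name1}.{r1}", relation, f"{name2}.{r2}"))
    return found


for left, relation, right in infer_region_articulations(CEN, NDC):
    print(left, relation, right)
```

On these lists, three of the non-disjoint region pairs come out as overlaps, which matches the "overlaps inferred 3" entry in the poster's graph legend.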
Page 68
[Figure: CEN-NDC taxonomy alignment problem with state-level congruence (==) articulations. Nodes: CEN 54, NDC 55. Edges: isa_CEN 53, isa_NDC 54, Art. 49.]
Page 69
[Figure: merged CEN-NDC taxonomy. Nodes: CEN 3, NDC 4, comb 51. Edges: input 61, inferred 3, overlaps inferred 3.]
USA, Midwest and State-level alignments are all congruent
Page 70
[Figure: merged CEN/NDC alignment graph. Nodes: CEN 3, NDC 4, comb 51. Edges: input 61, inferred 3, overlaps inferred 3.]
The overlapping relations are automatically derived from data
Page 71
[Figure: merged CEN/NDC alignment graph. Nodes: CEN 3, NDC 4, comb 51. Edges: input 61, inferred 3, overlaps inferred 3.]
DC is in both the South and the Northeast
Page 72
Case 2: Census Region vs Time Zone
[Figure: the CEN regions (West, South, Midwest, Northeast) and the TZ zones (Pacific, Mountain, Central, Eastern)]
• … but where do the articulations come from??
– automatically derived from data
– expert input
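To make "automatically derived from data" concrete, here is a minimal sketch of how an RCC-5 articulation could be computed when each concept is described by its set of members. The symbols ==, <, ><, and ! follow the figure labels; > is assumed by symmetry. The region names and memberships below are hypothetical toy data, not the real CEN or TZ data, and the function is an illustration rather than EulerX's own implementation.

```python
from enum import Enum


class RCC5(Enum):
    """The five RCC-5 base relations, keyed to the symbols used in the figures."""
    EQUALS = "=="
    INCLUDED_IN = "<"
    INCLUDES = ">"
    OVERLAPS = "><"
    DISJOINT = "!"


def rcc5(a: frozenset, b: frozenset) -> RCC5:
    """Return the RCC-5 relation between two non-empty member sets."""
    if not a or not b:
        raise ValueError("RCC-5 is only defined here for non-empty sets")
    if a == b:
        return RCC5.EQUALS
    if a < b:  # proper subset
        return RCC5.INCLUDED_IN
    if a > b:  # proper superset
        return RCC5.INCLUDES
    if a & b:  # shared members, but neither set contains the other
        return RCC5.OVERLAPS
    return RCC5.DISJOINT


# Hypothetical toy memberships (assumption, for illustration only).
regions_a = {
    "A.North": frozenset({"s1", "s2"}),
    "A.South": frozenset({"s3", "s4", "s5"}),
}
regions_b = {
    "B.East": frozenset({"s1", "s2", "s3"}),
    "B.West": frozenset({"s4", "s5"}),
}

# One articulation per pair of concepts drawn from the two taxonomies.
articulations = {
    (a_name, b_name): rcc5(a_set, b_set)
    for a_name, a_set in regions_a.items()
    for b_name, b_set in regions_b.items()
}

for (a_name, b_name), rel in articulations.items():
    print(f"{a_name} {rel.value} {b_name}")
```

Deriving articulations this way needs complete membership data for both taxonomies; where that is missing, expert input has to supply the relation instead.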
Page 73
[Figure, Input: CEN taxonomy (CEN.USA, CEN.Northeast, CEN.Midwest, CEN.South, CEN.West) and TZ taxonomy (TZ.USA, TZ.Eastern, TZ.Central, TZ.Mountain, TZ.Pacific), linked by RCC-5 articulations labeled ==, <, ><, and !. Nodes: CEN 5, TZ 5. Edges: isa_CEN 4, isa_TZ 4, Art. 12.]
[Figure, Output (Possible World): merged graph in which CEN.USA and TZ.USA form one combined node. Nodes: CEN 4, comb 1, TZ 4. Edges: input 7, overlaps input 6, overlaps inferred 1, inferred 1.]
Top-down regional alignment
Page 74
How do we know if our ‘expert articulations’ are correct?
[Figure: regions R1 to R9]
GIS solution as the Ground Truth
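One way to picture the check against a ground truth is sketched below: each concept is described as a set of atomic regions, the actual relation is computed from those sets, and any expert articulation that disagrees is reported. The assignment of R1..R9 to concepts is an assumed numbering chosen for illustration, the expert list is hypothetical (its last entry is deliberately wrong), and `relation` and `check` are made-up helper names.

```python
# Assumed decomposition into ground-truth cells R1..R9 (hypothetical numbering).
CELLS = {
    "CEN.West": {"R1", "R2"},
    "CEN.Midwest": {"R3", "R4", "R5"},
    "CEN.South": {"R6", "R7", "R8"},
    "CEN.Northeast": {"R9"},
    "TZ.Pacific": {"R1"},
    "TZ.Mountain": {"R2", "R3", "R6"},
    "TZ.Central": {"R4", "R7"},
    "TZ.Eastern": {"R5", "R8", "R9"},
}


def relation(a: set, b: set) -> str:
    """Return the RCC-5 symbol that holds between two non-empty cell sets."""
    if a == b:
        return "=="
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "><" if a & b else "!"


def check(expert, cells=CELLS):
    """List (left, claimed, right, actual) for every articulation that disagrees."""
    mismatches = []
    for left, claimed, right in expert:
        actual = relation(cells[left], cells[right])
        if actual != claimed:
            mismatches.append((left, claimed, right, actual))
    return mismatches


# Hypothetical expert articulations; the last one is wrong on purpose.
expert_articulations = [
    ("CEN.Northeast", "<", "TZ.Eastern"),
    ("CEN.Midwest", "><", "TZ.Central"),
    ("CEN.South", "!", "TZ.Mountain"),
]

mismatches = check(expert_articulations)
for left, claimed, right, actual in mismatches:
    print(f"{left} {claimed} {right}: ground truth says {actual}")
```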
Page 75
[Figure: regions R1 to R9 and the merged graph of CEN concepts, TZ concepts, and combined concepts such as CEN.Midwest*TZ.Central, CEN.West\TZ.Mountain, and TZ.Mountain\CEN.West; CEN.USA and TZ.USA form one combined node. Nodes: CEN 4, newComb 18, comb 1, TZ 4. Edges: input 6, inferred 37.]
Combined concepts solution for regional-level alignments
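The combined-concept labels in the figure can be illustrated with a short sketch that splits an overlapping pair into three disjoint parts. It reads A*B as the intersection and A\B as the set difference, which is an assumption about the figure's labels. The cell sets and the helper name `combined_concepts` are hypothetical.

```python
def combined_concepts(name_a, a, name_b, b):
    """Split an overlapping (><) pair into A*B, A\\B, and B\\A."""
    if not (a & b) or a <= b or b <= a:
        raise ValueError("expected a properly overlapping (><) pair")
    return {
        f"{name_a}*{name_b}": a & b,    # part shared by both concepts
        f"{name_a}\\{name_b}": a - b,   # part of A outside B
        f"{name_b}\\{name_a}": b - a,   # part of B outside A
    }


# Hypothetical cell sets for one overlapping pair (assumed, for illustration).
cen_west = frozenset({"R1", "R2"})
tz_mountain = frozenset({"R2", "R3", "R6"})

parts = combined_concepts("CEN.West", cen_west, "TZ.Mountain", tz_mountain)
for label, cells in parts.items():
    print(label, sorted(cells))
```

Because the three parts are pairwise disjoint and together cover both inputs, every original concept can be expressed as a union of combined concepts.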
Page 76
Do the taxonomies have to be spatial in order to use RCC-5?
• No! The more typical cases for taxonomy alignment are between non-spatial taxonomies
– for which no “GIS route” or direct visual cues about regional extensions are available
– the use of RCC-5 as an alignment vocabulary is a suitable approach for a wide range of multi-hierarchy reconciliations
Page 77
Conclusion & Discussion
• Underscores the benefits of designing different alignment workflows (bottom-up vs. top-down)
– Bottom-up: when the lowest-level articulations are non-overlapping and it is unclear how to align the higher-level concepts
– Top-down: when leaf-level relations often overlap. Expert input will frequently be needed to establish such expectations under the top-down approach
https://github.com/EulerProject/[email protected]
Page 78
Implications
• Logic-based taxonomy alignment approach
– Disambiguates name-based taxonomy alignment over time
  • 40% of the concepts in biology taxonomies undergo name changes over time (Franz et al., 2016)
– May mitigate problems in equivalent crosswalking
  • The membership condition problem that is often criticized in crosswalking
– Preserves the original taxonomies while providing an alignment view
• Solves data integration problems that arise in the more coarse-grained relative crosswalking
https://github.com/EulerProject/[email protected]
Page 79
Some EulerX History
• … Aristotle …
• … Euler …
• …
• … Greg Whitbread …
• [BPB93] J. H. Beach, S. Pramanik, and J. H. Beaman. Hierarchic taxonomic databases. Advances in Computer Methods for Systematic Biology: Artificial Intelligence, Databases, Computer Vision, 1993.
• [Ber95] Walter G. Berendsohn. The concept of “potential taxa” in databases. Taxon, 44:207–212, 1995.
• [Ber03] Walter G. Berendsohn. MoReTax – Handling Factual Information Linked to Taxonomic Concepts in Biology. No. 39 in Schriftenreihe für Vegetationskunde. Bundesamt für Naturschutz, 2003.
• [GG03] M. Geoffroy and A. Güntsch. Assembling and navigating the potential taxon graph. In [Ber03], pages 71–82, 2003.
• [TL07] D. Thau and B. Ludäscher. Reasoning about taxonomies in first-order logic. Ecological Informatics, 2(3):195–209, 2007.
• [FP09] N. M. Franz and R. K. Peet. Perspectives: towards a language for mapping relationships among taxonomic concepts. Systematics and Biodiversity, 7(1):5–20, 2009.
• …