Whole Tale: The Experience of Research through reproducible, computational narratives YesWorkflow: Revealing workflow, provenance from scripts Kurator: Automating data cleaning workflows EulerX: Agreeing to disagree about variant taxonomies Bertram Ludäscher [email protected] BCoN Workshop 2018-02-13..14 U Kansas Director,Center for Informatics Research in Science & Scholarship (CIRSS) School of Information Sciences (iSchool@Illinois) & National Center for Supercomputing Applications (NCSA) & Department of Computer Science (CS@Illinois) 1

Whole-Tale: The Experience of Research

Mar 17, 2018

Transcript
Page 1: Whole-Tale: The Experience of Research

Whole Tale: The Experience of Research … through reproducible, computational narratives

YesWorkflow: Revealing workflow, provenance from scripts
Kurator: Automating data cleaning workflows
EulerX: Agreeing to disagree about variant taxonomies

Bertram Ludäscher
[email protected]

BCoN Workshop, 2018-02-13..14, U Kansas

Director, Center for Informatics Research in Science & Scholarship (CIRSS), School of Information Sciences (iSchool@Illinois)

& National Center for Supercomputing Applications (NCSA) & Department of Computer Science (CS@Illinois)

Page 2: Whole-Tale: The Experience of Research

Whole Tale: The next step in the evolution of the scholarly article: The “Living” Paper

• 1st Generation:
  – narrative (prose)

• 2nd Generation: plus …
  – name .. identify .. include (access to) data

• 3rd Generation: plus …
  – name .. reference .. include code (software) ..
  – and provenance … and exec environment (containers)

Ludäscher: Whole-Tale++ 2

Whole Tale

Whole Tale Dashboard

Page 3: Whole-Tale: The Experience of Research

Whole Tale: What’s in a name?

(1) Whole Tale ⇔ Whole Story:
◦ Support (computational / data) scientists
◦ … along the complete research lifecycle
◦ ... from experiment to (new kind of) publication
◦ ... and back!

(2) Whole Tale ⇔ for the Long Tail of Science
– Easy sharing of your computational narratives, data, and exec-env since 2017!

– Power applications for everyone!

Page 4: Whole-Tale: The Experience of Research

The Whole Tale: Merging Science and Cyberinfrastructure Pathways

NSF-DIBBS award (5 years, 5 institutions)
• Illinois (NCSA & iSchool)
  • Bertram Ludäscher (PI), MT Campbell (PM) [Kandace Turner], Victoria Stodden (coPI), Matt Turk (coPI), Kacper Kowalik (sw-architect), Craig Willis (dev)

• U of Chicago
  • Kyle Chard (coPI), Mihael Hategan (dev)

• UT Austin / TACC
  • Niall Gaffney (coPI), Siva Kulasekaran (dev)

• U Notre Dame
  • Jarek Nabrzyski (coPI), Ian Taylor (dev), Adam Brinckman (dev)

• UCSB / NCEAS
  • Matt Jones (coPI), Bryce Mecum (dev)

Page 5: Whole-Tale: The Experience of Research

Whole Tale Motivation
• Can't reproduce result because:

• Don't know how to run analysis

• Can't get the software running

• Can't pay for the computer or compute power the result was computed on

Source: Bryce Mecum, WT team @ NCEAS

Page 6: Whole-Tale: The Experience of Research

Whole Tale Vision
Addressing reproducibility

Data Code

ExecutionEnvironment

Article

Page 7: Whole-Tale: The Experience of Research

Whole Tale Vision
• Living publication (data + code + environment)

• Facilitate reproducibility

• Encourage investigation of results by making it easy to recreate the environment the result was created in

Article

Page 8: Whole-Tale: The Experience of Research

Whole Tale Vision
Addressing reproducibility

Article + Tale

Page 9: Whole-Tale: The Experience of Research

Whole Tale Vision

Tale

Data

Code

D1 PROV

Page 10: Whole-Tale: The Experience of Research

WT Architecture

https://dashboard.wholetale.org

Page 11: Whole-Tale: The Experience of Research

DEMO: (re-)use existing tale or …

Page 12: Whole-Tale: The Experience of Research

… Create a New Tale!

Page 18: Whole-Tale: The Experience of Research

Running with RStudio: Locally or on WT …

You’re up and running quickly on Whole Tale!!

Page 19: Whole-Tale: The Experience of Research

Maybe you just want to use WT to learn R for Data Science ...

Page 20: Whole-Tale: The Experience of Research

Another example Tale: LIGO gravitational wave detection

(tutorial Jupyter notebook)

Page 28: Whole-Tale: The Experience of Research

New & Upcoming Features in WT ...
• Add your own Frontends (e.g. OpenRefine, ..)
• Persistent, shared or personal files:
  – /data/ (registered/external data, read-only, associated with a tale)
  – /home/ (your own data, r/w, associated with all your tales)
  – /workspace/ (shared r/w data, associated with a tale, across all users)

• WT “Derived Tales”:
  – take a tale; modify it to your liking; and publish as a derived work

• WT “Take-Out”:
  – Want to run your tales elsewhere?
  – Take out your tale and run it on your own (or cloud) platform

• WT “Scale-Out”:
  – If the WT dashboard isn’t enough → run your own WT system!

• WT Provenance support:
  – … via DataONE provenance tools, ProvONE model (W3C PROV extension)
  – … via YesWorkflow

• Interest in joining a WT Biodiversity Informatics Working Group!?
  – We already have: archaeology & ecology, astronomy, materials science
  – Your input wanted! (Is WT developing something useful for you?)
  – Try out WT, create some examples (in R, Python, ...) and provide feedback!
  – => possibility to fund a summer intern!

Page 29: Whole-Tale: The Experience of Research

Additional Material …

… teasers ahead!

Page 30: Whole-Tale: The Experience of Research

Provenance is: keeping records …

• Grand Canyon’s rock layers are a record of the early geologic history of North America. The ancestral Puebloan granaries at Nankoweap Creek tell archaeologists about more recent human history. (By Drenaline, licensed under CC BY-SA 3.0)

• Not shown: computational archaeologists reconstructing past climate from multiple tree-ring databases → computational provenance is key for transparency & reproducibility

Ludäscher: Workflows & Provenance => Understanding 30

Page 31: Whole-Tale: The Experience of Research

... and provenance is: Understanding what happened!

Zrzavý, Jan, David Storch, and Stanislav Mihulka. Evolution: Ein Lese-Lehrbuch. Springer-Verlag, 2009.

Author: Jkwchui (Based on drawing by Truth-seeker2004)

Page 32: Whole-Tale: The Experience of Research

Computational Provenance …
• Origin, processing history of artifacts
  – data products, figures, ...
  – also: underlying workflow → understand methods, data flow, and dependencies

Climate Change Impacts in the United States

U.S. National Climate Assessment, U.S. Global Change Research Program

Page 33: Whole-Tale: The Experience of Research

YesWorkflow: How does the LIGO script produce its results??

Page 34: Whole-Tale: The Experience of Research

YesWorkflow: Prospective & Retrospective Provenance … (almost) for free!

• YW annotations in a (Python, R, …) script recreate a workflow view from the script …

[Figure: YW workflow view of the annotated script. Steps: initialize_run, load_screening_results, calculate_strategy, log_rejected_sample, collect_data_set, transform_images, log_average_image_intensity. Data items: cassette_id, sample_score_cutoff, sample_name, sample_quality, rejected_sample, accepted_sample, num_images, energies, sample_id, energy, frame_number, total_intensity, pixel_count, corrected_image_path. Data items with file templates: sample_spreadsheet (file:cassette_{cassette_id}_spreadsheet.csv), calibration_image (file:calibration.img), run_log (file:run/run_log.txt), rejection_log (file:/run/rejected_samples.txt), raw_image (file:run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw), corrected_image (file:data/{sample_id}/{sample_id}_{energy}eV_{frame_number}.img), collection_log (file:run/collected_images.csv).]

YW annotation keywords: @BEGIN .. @END .. @IN .. @OUT .. @URI .. @LOG ..
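As a hedged illustration of the annotation keywords listed above, the sketch below embeds YW-style comments in a tiny Python script (held in a string) and lists them with a regular expression. The step and data names are borrowed from the workflow view on this slide, but how they are wired together here is an assumption. The helper `list_annotations` is hypothetical and is not part of the YesWorkflow tool, which performs its own extraction.

```python
import re

# A toy script annotated with YW-style comments. The annotations are plain
# comments, so they do not change how the script itself would run.
ANNOTATED_SCRIPT = '''
# @BEGIN load_screening_results
# @IN sample_spreadsheet @URI file:cassette_{cassette_id}_spreadsheet.csv
# @OUT sample_name
# @OUT sample_quality
rows = [("s1", 45), ("s2", 12)]
# @END load_screening_results
'''

# Hypothetical helper pattern (not the YesWorkflow API): "@KEYWORD value".
ANNOTATION = re.compile(r"@(BEGIN|END|IN|OUT|URI|LOG)\s+(\S+)")

def list_annotations(source):
    """Return (keyword, value) pairs in the order they appear in comments."""
    found = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            found.extend(ANNOTATION.findall(stripped))
    return found

annotations = list_annotations(ANNOTATED_SCRIPT)
for keyword, value in annotations:
    print(keyword, value)
```

Because the annotations live in comments, the same idea carries over to R, MATLAB, or shell scripts by changing only the comment marker.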

Page 35: Whole-Tale: The Experience of Research

Paleoclimate Reconstruction (openSKOPE.org)
• … explained using YesWorkflow!

[Figure: YW workflow view of the paleoclimate reconstruction script. Node labels: GetModernClimate, PRISM_annual_growing_season_precipitation, SubsetAllData, dendro_series_for_calibration, dendro_series_for_reconstruction, CAR_Analysis_unique, cellwise_unique_selected_linear_models, CAR_Analysis_union, cellwise_union_selected_linear_models, CAR_Reconstruction_union, raster_brick_spatial_reconstruction, raster_brick_spatial_reconstruction_errors, CAR_Reconstruction_union_output, ZuniCibola_PRISM_grow_prcp_ols_loocv_union_recons.tif, ZuniCibola_PRISM_grow_prcp_ols_loocv_union_errors.tif, master_data_directory, prism_directory, tree_ring_data, calibration_years, retrodiction_years.]

Kyle B., (computational) archaeologist: "It took me about 20 minutes to comment. Less than an hour to learn and YW-annotate, all-told."

Page 36: Whole-Tale: The Experience of Research

Data Curation Workflows (Filtered-Push … Kepler … Kurator projects)


Page 37: Whole-Tale: The Experience of Research


http://kurator.acis.ufl.edu/kurator-web/

Page 38: Whole-Tale: The Experience of Research


http://kurator.acis.ufl.edu/kurator-web/

Page 39: Whole-Tale: The Experience of Research

DwCA Taxon Lookup Workflow

• Declare inputs, outputs, and steps of a script (or wf) with YW annotations to ...
  – communicate provenance graphically (via Graphviz)
  – combine different forms of provenance
  – query provenance
• Simple YW annotations in comments:
  – @BEGIN Step, @END Step
  – @IN Data, @OUT Data
  – @URI Template, @LOG Pattern
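The first bullet above mentions communicating provenance graphically via Graphviz. As a rough sketch of that idea, the snippet below turns a hand-written list of steps with inputs and outputs into Graphviz DOT text. The `steps` data and the `to_dot` helper are illustrative assumptions; YesWorkflow generates its own, richer graphs.

```python
# Each step lists the data it reads and writes. The names come from the
# example workflow shown earlier in the slides; the wiring is illustrative.
steps = [
    {"name": "load_screening_results",
     "inputs": ["sample_spreadsheet"],
     "outputs": ["sample_name", "sample_quality"]},
    {"name": "calculate_strategy",
     "inputs": ["sample_name", "sample_quality"],
     "outputs": ["accepted_sample", "rejected_sample"]},
]

def to_dot(steps):
    """Render steps as boxes; data items keep Graphviz's default node shape."""
    lines = ["digraph workflow {", "  rankdir=LR;"]
    for step in steps:
        lines.append(f'  "{step["name"]}" [shape=box];')
        for data in step["inputs"]:
            lines.append(f'  "{data}" -> "{step["name"]}";')
        for data in step["outputs"]:
            lines.append(f'  "{step["name"]}" -> "{data}";')
    lines.append("}")
    return "\n".join(lines)

dot_text = to_dot(steps)
print(dot_text)
```

The printed text can be saved to a file and rendered with the Graphviz command-line tool, for example `dot -Tpng workflow.dot -o workflow.png`.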


Page 40: Whole-Tale: The Experience of Research

Taxon Lookup Workflow: Data View and Process View

Page 41: Whole-Tale: The Experience of Research

The story of two individual records


• One took the GBIF route, while …

• … the other went all WoRMS!

Non-Marine? → GBIF

Marine? → WoRMS

Page 42: Whole-Tale: The Experience of Research

The aggregate story ..


• How many records were observed as inputs or outputs of workflow steps?

• Were there any NULL values? How many?
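Questions like the two above are queries over retrospective provenance. The sketch below answers them for a small, made-up list of logged events. The record layout (`step`, `direction`, `record_id`, `value`), the step names, and the sample values are assumptions for illustration only; they are not the Kurator or YesWorkflow log format.

```python
from collections import Counter

# Hypothetical retrospective-provenance events: one entry per record that a
# workflow step read ("in") or wrote ("out"). None stands for a NULL value.
events = [
    {"step": "clean_name", "direction": "in", "record_id": 1, "value": "Puma concolor"},
    {"step": "clean_name", "direction": "out", "record_id": 1, "value": "Puma concolor"},
    {"step": "clean_name", "direction": "in", "record_id": 2, "value": None},
    {"step": "clean_name", "direction": "out", "record_id": 2, "value": None},
    {"step": "lookup_taxon", "direction": "in", "record_id": 1, "value": "Puma concolor"},
]

# How many records were observed as inputs or outputs of each step?
observed = Counter((e["step"], e["direction"]) for e in events)

# Were there any NULL values, and how many?
null_count = sum(1 for e in events if e["value"] is None)

for (step, direction), count in sorted(observed.items()):
    print(step, direction, count)
print("NULL values:", null_count)
```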

Page 43: Whole-Tale: The Experience of Research

YesWorkflow Summary
• Lightweight YW annotations can be added easily to your scripts to reap workflow benefits
  – Documentation of what’s important
  – Visualization of dependencies
  – Querying provenance (prospective, retrospective, and hybrid)
  – Independent of system or language used (R, Python, MATLAB, workflow tools, …)

→ make provenance actionable
→ provenance for self!

=> github.com/yesworkflow-org/yw
=> try.yesworkflow.org


Page 44: Whole-Tale: The Experience of Research

Demo Time

(Disclaimer)
https://github.com/idaks/dataone-ahm-2016-poster
https://github.com/idaks/wt-prov-summer-2017
https://github.com/yesworkflow-org/yw-idcc-17

Page 45: Whole-Tale: The Experience of Research

DataONE: Search and Provenance Display

Page 46: Whole-Tale: The Experience of Research

DataONE: Search and Provenance Display

Page 47: Whole-Tale: The Experience of Research

Adding YesWorkflow to DataONE

Yaxing’s script with inputs & output products

Christopher’s YesWorkflow model

Christopher using Yaxing’s outputs as inputs for his script

Christopher’s results can be traced back all the way to Yaxing’s input

Page 48: Whole-Tale: The Experience of Research

Yi-Yun Cheng (1), Nico Franz (2), Jodi Schneider (1), Shizhuo Yu (3), Thomas Rodenhausen (4), Bertram Ludäscher (1)
(1) School of Information Sciences, University of Illinois at Urbana-Champaign; (2) School of Life Sciences, Arizona State University; (3) Department of Computer Science, University of California at Davis; (4) School of Information, University of Arizona

Agreeing to Disagree: Reconciling Conflicting Taxonomic Views using a Logic-based Approach

Acknowledgments

Support of the authors’ research through the National Science Foundation is kindly acknowledged (DEB-1155984, DBI-1342595, and DBI-1643002). The authors thank Professor Kathryn La Barre for her comments and suggestions. We would also like to thank Dr. Laetitia Navarro and Jeff Terstriep for help with creating map overlays in QGIS.

CONCLUSION

• Our logic-based taxonomy alignment approach can be used to solve crosswalking issues.
  We will be able to mitigate the membership condition problems that occur in equivalent crosswalking.

• The RCC-5 approach preserves the original taxonomies while providing an alignment view.
  We can solve data integration problems that happen in the more coarse-grained relative crosswalking, which otherwise is subject to information loss.

• Our study also underscores the benefits of designing different alignment workflows (bottom-up vs. top-down) to match the needs of specific taxonomy alignment problems.
  Bottom-up approach: seems to work well whenever we have non-overlapping relationships at the leaf-level (lowest-level) articulations, and we are not sure how the higher-level concepts should be aligned.
  Top-down approach: seems favorable when there is an expectation of certain higher-level articulations in conjunction with under-specified, complex, and often overlapping leaf-level relations.

RELATED WORK

• Taxonomy Alignment Problems (TAP): Taxonomies T1, T2 are inter-linked via a set of input articulations A, defined as RCC-5 relations, to yield a “merged” taxonomy T3.

• Euler/X: Articulations – a constraint or rule that defines a relationship (a set constraint) between two concepts from different taxonomies.

Region Connection Calculus (RCC-5)

Possible Worlds – When encoding and solving TAPs via ASP, the different answer sets represent alternative taxonomy merge solutions or possible worlds (PWs).

INTRODUCTION

Tina: Hey Amy, can you recommend a signature dish from where you live?

Amy: Oh, definitely the half-smokes from the Northeast! They are these tasty half-pork and half-beef sausages.

Tina: What a coincidence! We have half-smokes in the South, too! Where do you live in the Northeast? New York? Boston?

Amy: Wrong guesses! Where do you live in the South?

Tina and Amy together: Washington, D.C.

[The two of them look at each other, confused.]

“In the face of incompatible information or data structures among users or among those specifying the system, attempts to create unitary knowledge categories are futile. Rather, parallel or multiple representational forms are required …” (Bowker & Star, 2000).

CASE 1 RESULTS: CEN vs. NDC

• State-level alignments are all congruent (bottom-up)
• Inferred new articulations for regional-level alignments

CASE 2 RESULTS: CEN vs. TZ

Figure 3. (Left) CEN-NDC taxonomy alignment problem with 49 input articulations between TCEN and TNDC
Figure 4. (Right) The unique possible world (PW) T3 reconciling TCEN and TNDC via inferred relationships

Figure 1. National Diversity Council map (NDC) vs. Census Bureau map (CEN)

• GitHub link: https://github.com/EulerProject/ASIST17

• Email: [email protected]

[Map labels. NDC: West, Southwest, Southeast, Midwest, Northeast. CEN: West, South, Midwest, Northeast. TZ: Pacific, Mountain, Central, Eastern.]

RESEARCH DESIGN

Step 1. Supply input taxonomies T1 and T2
Step 2. Formulate RCC-5 articulations between T1 and T2
Step 3. Iteratively edit articulations in Euler/X

RCC-5 relations between regions X and Y:
Congruence: X == Y
Inclusion: X > Y
Inverse Inclusion: X < Y
Overlap: X >< Y
Disjointness: X ! Y

[Diagram labels: T1, T2, Add/Edit Articulations A, Euler/X, N Possible Worlds, N=1, T3, N=0 or N>1, Inconsistent (N=0), Ambiguous (N>1).]

[Figure residue: node labels, edge labels, and legends of the CEN/NDC and CEN/TZ alignment graphs; the graph layouts are not recoverable from the transcript.]

Figure 2. The process of aligning taxonomies T1 and T2 with Euler/X

Figure 5. Top-down input alignments between TCEN and TTZ

Figure 6. The unique PW for the TCEN with TTZ alignment

Figure 10. Combined concepts solution for TCEN and TTZ

taxonomy CEN Census_Regions
(USA Northeast Midwest South West)
(Northeast CT MA ME NH NJ NY PA RI VT)
(Midwest IL IN IA KS MI MN MO NE ND OH SD WI)
(South AL AR DE DC FL GA KY LA MD MS NC OK SC TN TX VA WV)
(West AZ CA CO ID MT NV NM OR UT WA WY)

taxonomy NDC National_Diversity_Council
(USA Midwest Northeast Southeast Southwest West)
(Northeast CT DC DE MD MA ME NH NJ NY PA RI VT)
(Midwest IA IL IN KS MI MN MO ND NE OH SD WI)
(Southeast AL AR FL GA KY LA MS NC SC TN VA WV)
(Southwest AZ NM OK TX)
(West CA CO ID MT NV OR WA WY UT)

articulations CEN NDC
[CEN.AL equals NDC.AL]
[CEN.AR equals NDC.AR]
[CEN.AZ equals NDC.AZ]
[CEN.CA equals NDC.CA]
[CEN.CO equals NDC.CO]
[CEN.CT equals NDC.CT]
[CEN.DC equals NDC.DC]
[CEN.DE equals NDC.DE]
[CEN.FL equals NDC.FL]
[CEN.GA equals NDC.GA]
[CEN.IA equals NDC.IA]
[CEN.ID equals NDC.ID]
[CEN.IL equals NDC.IL]
[CEN.IN equals NDC.IN]
[CEN.KS equals NDC.KS]
[CEN.KY equals NDC.KY]
[CEN.LA equals NDC.LA]
[CEN.MA equals NDC.MA]
[CEN.MD equals NDC.MD]
[CEN.ME equals NDC.ME]
[CEN.MI equals NDC.MI]
[CEN.MN equals NDC.MN]
...

taxonomy CEN Census_Regions
(USA Midwest South West Northeast)

taxonomy TZ Time_Zone
(USA Pacific Mountain Central Eastern)

articulations CEN TZ
[CEN.Midwest disjoint TZ.Pacific]
[CEN.Midwest overlaps TZ.Eastern]
[CEN.Midwest overlaps TZ.Mountain]
[CEN.Northeast is_included_in TZ.Eastern]
[CEN.South disjoint TZ.Pacific]
[CEN.South overlaps TZ.Central]
[CEN.South overlaps TZ.Eastern]
[CEN.South overlaps TZ.Mountain]
[CEN.USA equals TZ.USA]
[CEN.West disjoint TZ.Central]
[CEN.West disjoint TZ.Eastern]
[CEN.West overlaps TZ.Mountain]
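The input text above has a simple shape: a `taxonomy` header, parenthesized parent/children groups, and bracketed articulations. As a hedged sketch (this is not the Euler/X parser), the following reads a shortened copy of the CEN/TZ input into plain Python structures. The function name `parse_alignment_input`, the one-item-per-line layout, and the returned data layout are all assumptions made for illustration.

```python
# Shortened copy of the CEN vs. TZ input, laid out one item per line.
INPUT_TEXT = """
taxonomy CEN Census_Regions
(USA Midwest South West Northeast)

taxonomy TZ Time_Zone
(USA Pacific Mountain Central Eastern)

articulations CEN TZ
[CEN.Northeast is_included_in TZ.Eastern]
[CEN.USA equals TZ.USA]
[CEN.West overlaps TZ.Mountain]
"""

def parse_alignment_input(text):
    """Return (taxonomies, articulations) parsed from the text."""
    taxonomies = {}      # e.g. {"CEN": {"USA": ["Midwest", ...]}}
    articulations = []   # e.g. [("CEN.USA", "equals", "TZ.USA")]
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("taxonomy "):
            # "taxonomy <prefix> <name>": remember the prefix for the groups.
            current = line.split()[1]
            taxonomies[current] = {}
        elif line.startswith("(") and current is not None:
            # "(parent child child ...)": first token is the parent concept.
            parent, *children = line.strip("()").split()
            taxonomies[current][parent] = children
        elif line.startswith("["):
            # "[left relation right]": one articulation between two concepts.
            left, relation, right = line.strip("[]").split()
            articulations.append((left, relation, right))
    return taxonomies, articulations

taxonomies, articulations = parse_alignment_input(INPUT_TEXT)
print(taxonomies["TZ"]["USA"])
print(articulations)
```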


Page 49: Whole-Tale: The Experience of Research

For another time?

Non-unitary syntheses of systematic knowledge
Nico Franz

School of Life Sciences, Arizona State University

CIRSS Seminar – Center for Informatics Research in Science and Scholarship

February 17, 2017 – iSchool, University of Illinois Urbana-Champaign

@ http://www.slideshare.net/taxonbytes/franz-2017-uiuc-cirss-non-unitary-syntheses-of-systematic-knowledge

Tracing taxonomic names (concepts!) over time …

Page 50: Whole-Tale: The Experience of Research

Taxonomic concept alignment, Andropogon glomeratus-virginicus complex, spanning across 11 classifications authored 1889-2015

• 36 unique taxonomic names

• 88 taxonomic concept labels ⇒ name sec. author strings

• Alignment by A.S. Weakley ⇒ row position = congruence

• 1/36 names with unique 1 : 1 name : meaning cardinality across all classifications

• Andropogon virginicus

• Source: Franz et al. 2016 [1]

[1] Franz et al. 2016. Names are not good enough: reasoning over taxonomic change in the Andropogon complex. Semantic Web Journal (IOS). doi:10.3233/SW-160220

Page 51: Whole-Tale: The Experience of Research

http://taxonbytes.org/wp-content/uploads/2014/10/Peet-BIGCB-2014-Changing-Perspectives-on-Plant-Distributions.pdf

Page 52: Whole-Tale: The Experience of Research

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

"Taxonomic concept labels" identify input concept regions

RCC–5 articulations provided for each species-level concept

• Input visualization: MSW3 (2005) versus MSW2 (1993)

Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023

Page 53: Whole-Tale: The Experience of Research

• Alignment visualization: "grey means taxonomically congruent"

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)


Page 54: Whole-Tale: The Experience of Research

One name & congruent region

Many names & congruent region

One name & non-congruent regions

Many names & non-congruent regions

New names & exclusive regions

• Application of coverage constraint: parent-to-parent articulations (><) are fully defined by alignment signal propagated from their respective children.

→ Sensible when complete sampling of children is intended.

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)

• Alignment visualization: "grey means taxonomically congruent"

Page 55: Whole-Tale: The Experience of Research

1 in 3 names is unreliable across MSW2/MSW3 classifications

Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023

Page 56: Whole-Tale: The Experience of Research

"Controlling the taxonomic variable"

The 'consensus'

The 'bible'

The (formerly) federal 'standard'

The 'best', latest regional flora

"Just bad"

Expert views are in conflict

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Page 57: Whole-Tale: The Experience of Research

"Controlling the taxonomic variable"

The 'consensus'

The 'bible'

The (formerly) federal 'standard'

The 'best', latest regional flora

"Just bad"

Impact: Name-based aggregation has created a novel synthesis that nobody believes in

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Page 58: Whole-Tale: The Experience of Research

"Controlling the taxonomic variable"

The 'consensus'

The 'bible'

The (formerly) federal 'standard'

The 'best', latest regional flora

"Just bad"

Expert views are reconciled

Solution: Instead of aggregating an artificial 'consensus', build translation services

Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610

Page 59: Whole-Tale: The Experience of Research

Leaving taxon and species headaches …
• To illustrate Euler think of a simpler use case:
• Agreeing to disagree!
• … when there are multiple, legitimate perspectives

• Sorting things out!
  – Euler as a taxon concept (& name) “microscope” ...
  – .. or “time machine”?

Page 60: Whole-Tale: The Experience of Research

Two Taxonomies: NDC vs CEN

“… in the face of incompatible information or data structures among users or among those specifying the system, attempts to create unitary knowledge categories are futile. Rather, parallel or multiple representational forms are required” [Bowker & Star, 2000, p.159]

National Diversity Council map (NDC): West, Southwest, Southeast, Midwest, Northeast

US Census Bureau map (CEN): West, South, Midwest, Northeast

Source: Yi-Yun (Jessica) Cheng (PhD student, iSchool @ Illinois)

Page 61: Whole-Tale: The Experience of Research

The taxonomies

• The Census Regions Map (CEN) consists of four regions: West, Midwest, Northeast, and South, i.e., the contiguous 48 states and Washington D.C.

Page 62: Whole-Tale: The Experience of Research

The taxonomies

• The National Diversity Council Map (NDC) consists of five regions: West, Southwest, Midwest, Northeast, Southeast, the 48 states and Washington D.C.

• NDC splits South into SW and SE

• Do NDC and CEN agree on “West”? “Midwest”? …

• How can we sort this out?

Page 63: Whole-Tale: The Experience of Research

Sorting things out …

• Given:
  – taxonomies T1, T2
  – and relations T1 ~ T2 (articulations, alignment)
• Find:
  – merged taxonomy T3
• Such that:
  – T1, T2 are preserved
  – all pairwise relations are explicit

[Figure residue: T1 = CEN (nodes CEN.USA, CEN.Midwest, CEN.South, CEN.West, CEN.Northeast; 5 nodes, 4 is_a edges) and T2 = NDC (nodes NDC.USA, NDC.Northeast, NDC.Southeast, NDC.Midwest, NDC.Southwest, NDC.West; 6 nodes, 5 is_a edges), linked by 9 articulations.]

Page 64: Whole-Tale: The Experience of Research

5 ways to relate concepts (regions)

• Idea: relate concepts X and Y with articulations

• Articulation Language: Region Connection Calculus (RCC5): congruence, inclusion, inverse inclusion, overlap, disjointness

Congruence: X == Y
Inclusion: X > Y
Inverse Inclusion: X < Y
Overlap: X >< Y
Disjointness: X ! Y
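When concepts are modeled as finite sets (here, sets of states), the five RCC-5 relations can be decided with ordinary set comparisons. The sketch below assumes that simplification. The helper `rcc5` is hypothetical and not part of Euler/X, which reasons over articulations logically; the state lists are copied from the CEN and NDC taxonomy listing on the poster (page 48).

```python
def rcc5(x, y):
    """Return the RCC-5 relation symbol between two non-empty sets."""
    if x == y:
        return "=="   # congruence
    if x > y:
        return ">"    # inclusion: x properly includes y
    if x < y:
        return "<"    # inverse inclusion: x is properly included in y
    if x & y:
        return "><"   # overlap: some shared and some unshared members
    return "!"        # disjointness: nothing shared

cen_south = set("AL AR DE DC FL GA KY LA MD MS NC OK SC TN TX VA WV".split())
cen_midwest = set("IL IN IA KS MI MN MO NE ND OH SD WI".split())
ndc_southeast = set("AL AR FL GA KY LA MS NC SC TN VA WV".split())
ndc_southwest = set("AZ NM OK TX".split())
ndc_midwest = set("IA IL IN KS MI MN MO ND NE OH SD WI".split())
ndc_west = set("CA CO ID MT NV OR WA WY UT".split())

print(rcc5(cen_midwest, ndc_midwest))   # ==
print(rcc5(cen_south, ndc_southeast))   # >
print(rcc5(cen_south, ndc_southwest))   # ><
print(rcc5(cen_south, ndc_west))        # !
```

Deriving region-level relations from state membership in this way is one reading of "automatically derived from data"; expert-supplied articulations would bypass the set comparison entirely.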

CEN.South

NDC.Northeast

><

NDC.Southwest

><

NDC.Southeast>

CEN.Midwest NDC.Midwest==

CEN.USA

CEN.West

CEN.NortheastNDC.USA

==

!

><NDC.West

>

<

Nodes

CEN 5NDC 6 Edges

is_a (CEN) 4is_a (NDC) 5articulations 9

Ludäscher:Whole-Tale++ 64
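When each concept has a known extension (a set of members or map cells), the five RCC5 relations reduce to plain set comparisons. The sketch below assumes non-empty sets; the function name `rcc5` is illustrative and the return values reuse the slide's symbols.

```python
def rcc5(x, y):
    """Return the RCC5 relation between two non-empty sets, in the slide's notation."""
    x, y = set(x), set(y)
    if not x or not y:
        raise ValueError("this sketch only handles non-empty extensions")
    if x == y:
        return "=="  # congruence
    if x > y:
        return ">"   # inclusion: X properly contains Y
    if x < y:
        return "<"   # inverse inclusion: X is properly contained in Y
    if x & y:
        return "><"  # overlap: a shared part, but neither contains the other
    return "!"       # disjointness: nothing in common


# Toy extensions over integer "cells".
print(rcc5({1, 2}, {1, 2}))     # congruence
print(rcc5({1, 2, 3}, {1, 2}))  # inclusion
print(rcc5({1}, {1, 2}))        # inverse inclusion
print(rcc5({1, 2}, {2, 3}))     # overlap
print(rcc5({1}, {2}))           # disjointness
```

Exactly one of the five relations holds for any pair of non-empty sets, which is what makes RCC5 usable as an articulation vocabulary.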

Page 65: Whole-Tale: The Experience of Research

Merged taxonomy T3

[Figure: the input taxonomies T1 and T2, the articulations T1 ~ T2, and the merged taxonomy T3. In T3 the congruent pairs CEN.USA/NDC.USA and CEN.Midwest/NDC.Midwest appear as single merged nodes. T3 nodes: CEN 3, NDC 4, congruent 2. T3 edges: is_a (input) 8, overlaps (input) 3.]

Page 66: Whole-Tale: The Experience of Research

How we align two taxonomies T1 and T2

• Step 1. Supply input taxonomies T1 and T2

• Step 2. Describe the relationships between T1 and T2

• Step 3. Iteratively edit articulations in Euler/X

[Figure: workflow loop. T1, T2, and the articulations A are passed to Euler/X, which computes N possible worlds. If N = 1, the result is the merged taxonomy T3. If N = 0 (inconsistent) or N > 1 (ambiguous), add or edit articulations and run again.]

• … but where do the articulations come from??
– expert opinion
– automatically derived from data
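The N = 0 / N = 1 / N > 1 distinction can be illustrated with a brute-force sketch for a handful of concepts. This is not how Euler/X reasons internally; it is an assumption-laden toy that enumerates which Venn cells are non-empty, keeps the configurations that satisfy the given articulations, and counts the distinct sets of pairwise relations. The function names `possible_worlds` and `diagnose` are hypothetical, and taxonomy constraints such as sibling disjointness or coverage are left out for brevity.

```python
from itertools import combinations, product


def rcc5(x, y):
    """RCC5 relation between two non-empty frozensets."""
    if x == y:
        return "=="
    if x > y:
        return ">"
    if x < y:
        return "<"
    return "><" if x & y else "!"


def possible_worlds(concepts, articulations):
    """Enumerate distinct complete relation assignments consistent with the input."""
    n = len(concepts)
    # A Venn cell records which concepts contain it; the all-False cell is skipped.
    cells = [c for c in product([False, True], repeat=n) if any(c)]
    pairs = list(combinations(concepts, 2))
    worlds = set()
    for mask in range(1, 2 ** len(cells)):
        present = [c for i, c in enumerate(cells) if mask >> i & 1]
        ext = {name: frozenset(j for j, c in enumerate(present) if c[k])
               for k, name in enumerate(concepts)}
        if not all(ext.values()):
            continue  # every concept needs a non-empty extension
        if all(rcc5(ext[a], ext[b]) == rel for (a, b), rel in articulations.items()):
            worlds.add(tuple(rcc5(ext[a], ext[b]) for a, b in pairs))
    return worlds


def diagnose(worlds):
    """Map the number of possible worlds to the three outcomes of the workflow."""
    if not worlds:
        return "inconsistent"
    return "unique" if len(worlds) == 1 else "ambiguous"


names = ["A", "B", "C"]
print(diagnose(possible_worlds(names, {("A", "B"): "<", ("B", "C"): "<"})))
print(diagnose(possible_worlds(names, {("A", "B"): "><", ("B", "C"): "><"})))
print(diagnose(possible_worlds(names, {("A", "B"): "<", ("B", "C"): "<", ("A", "C"): "!"})))
```

In the ambiguous case the relation between A and C is left open, so adding one more articulation for that pair is the kind of edit Step 3 refers to.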

Page 67: Whole-Tale: The Experience of Research

Case 1: Census Region vs. National Diversity Council

[Figure: CEN map (West, South, Midwest, Northeast) next to NDC map with states (West, Southwest, Southeast, Midwest, Northeast)]

• … but where do the articulations come from??
– automatically derived from data
– expert input

Page 68: Whole-Tale: The Experience of Research

[Figure: bottom-up input for Case 1. Every state-level concept in CEN is articulated as congruent with its NDC counterpart (CEN.IL == NDC.IL, CEN.IN == NDC.IN, …, CEN.NC == NDC.NC). The states hang under CEN.West, CEN.Midwest, CEN.South, CEN.Northeast and under NDC.West, NDC.Midwest, NDC.Northeast, NDC.Southeast, NDC.Southwest, which in turn hang under CEN.USA and NDC.USA. Nodes: CEN 54, NDC 55. Edges: isa_CEN 53, isa_NDC 54, Art. 49.]
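The counts in this figure follow from simple bookkeeping, sketched below: 48 contiguous states plus DC give 49 leaf concepts per taxonomy and 49 state-level congruence articulations. The string form `CEN.XX == NDC.XX` mirrors the figure labels and is not claimed to be the Euler/X file syntax; the variable names are illustrative.

```python
# Postal codes of the 48 contiguous states plus DC.
STATES = """AL AR AZ CA CO CT DC DE FL GA IA ID IL IN KS KY LA MA MD ME MI MN MO MS MT
NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV WY""".split()

# One congruence articulation per state-level pair.
articulations = [f"CEN.{s} == NDC.{s}" for s in STATES]

# Nodes = leaves + regions + root; a tree has one is_a edge fewer than nodes.
cen_nodes = len(STATES) + 4 + 1  # four CEN regions plus CEN.USA
ndc_nodes = len(STATES) + 5 + 1  # five NDC regions plus NDC.USA

print(len(articulations), "articulations, e.g.", articulations[12])
print("CEN nodes/edges:", cen_nodes, cen_nodes - 1)
print("NDC nodes/edges:", ndc_nodes, ndc_nodes - 1)
```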

Page 69: Whole-Tale: The Experience of Research

[Figure: merged taxonomy for Case 1. Each state-level pair is combined into one node (CEN.DC/NDC.DC, CEN.NM/NDC.NM, …), as are CEN.USA/NDC.USA and CEN.Midwest/NDC.Midwest. The remaining region nodes are CEN.West, CEN.Northeast, CEN.South, NDC.West, NDC.Southwest, NDC.Northeast, NDC.Southeast. Nodes: CEN 3, NDC 4, comb 51. Edges: input 61, inferred 3, overlaps inferred 3.]

USA, Midwest, and state-level alignments are all congruent

Page 70: Whole-Tale: The Experience of Research

[Figure: merged CEN/NDC taxonomy, repeated from page 69. Nodes: CEN 3, NDC 4, comb 51. Edges: input 61, inferred 3, overlaps inferred 3.]

The overlapping relations are automatically derived from data
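A hedged sketch of the bottom-up idea: once state-level concepts are known to be congruent, region-level articulations can be computed by comparing the sets of states under each region. Only seven states are used here. The CEN assignments follow the Census Bureau regions; the NDC assignments are ASSUMED for illustration, except DC, which the slides place in NDC.Northeast. The helper names `extents` and `derive` are hypothetical.

```python
from collections import defaultdict


def rcc5(x, y):
    """RCC5 relation between two non-empty sets."""
    if x == y:
        return "=="
    if x > y:
        return ">"
    if x < y:
        return "<"
    return "><" if x & y else "!"


# State -> region. NDC rows other than DC are illustrative assumptions.
CEN = {"MA": "Northeast", "DC": "South", "GA": "South", "TX": "South",
       "AZ": "West", "CA": "West", "IL": "Midwest"}
NDC = {"MA": "Northeast", "DC": "Northeast", "GA": "Southeast", "TX": "Southwest",
       "AZ": "Southwest", "CA": "West", "IL": "Midwest"}


def extents(prefix, membership):
    """Group states by region, e.g. {'CEN.South': {'DC', 'GA', 'TX'}, ...}."""
    regions = defaultdict(set)
    for state, region in membership.items():
        regions[f"{prefix}.{region}"].add(state)
    return dict(regions)


def derive(t1, t2):
    """Compute the RCC5 relation for every pair of regions across two taxonomies."""
    return {(a, b): rcc5(x, y) for a, x in t1.items() for b, y in t2.items()}


arts = derive(extents("CEN", CEN), extents("NDC", NDC))
for (a, b), rel in sorted(arts.items()):
    if rel != "!":  # skip the disjoint pairs to keep the output short
        print(a, rel, b)
```

On this toy subset, three region pairs come out as overlaps (CEN.South with NDC.Northeast, CEN.South with NDC.Southwest, CEN.West with NDC.Southwest), in line with the three inferred overlaps reported in the figure.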

Page 71: Whole-Tale: The Experience of Research

[Figure: merged CEN/NDC taxonomy, repeated from page 69. Nodes: CEN 3, NDC 4, comb 51. Edges: input 61, inferred 3, overlaps inferred 3.]

DC is in both the South (CEN) and the Northeast (NDC)

Page 72: Whole-Tale: The Experience of Research

Case 2: Census Region vs. Time Zone

[Figure: CEN map (West, South, Midwest, Northeast) next to TZ map (Pacific, Mountain, Central, Eastern)]

• … but where do the articulations come from??
– automatically derived from data
– expert input

Page 73: Whole-Tale: The Experience of Research

Top-down regional alignment

[Figure, Input: CEN regions (Northeast, Midwest, South, West) and TZ regions (Eastern, Central, Mountain, Pacific) linked by region-level articulations using <, ><, !, and ==, including CEN.USA == TZ.USA. Nodes: CEN 5, TZ 5. Edges: isa_CEN 4, isa_TZ 4, Art. 12.]

[Figure, Output (possible world): merged taxonomy in which CEN.USA and TZ.USA form one combined node. Nodes: CEN 4, comb 1, TZ 4. Edges: input 7, overlaps input 6, overlaps inferred 1, inferred 1.]

Page 74: Whole-Tale: The Experience of Research

How do we know if our ‘expert articulations’ are correct?

[Figure: map partitioned into regions R1 to R9]

GIS solution as the ground truth

Page 75: Whole-Tale: The Experience of Research

[Figure: map regions R1 to R9 and the merged CEN/TZ taxonomy with combined concepts such as CEN.South*TZ.Central, CEN.South\TZ.Eastern, TZ.Eastern\CEN.Midwest, CEN.Midwest*TZ.Mountain, and CEN.West\TZ.Mountain. Nodes: CEN 4, newComb 18, comb 1, TZ 4. Edges: input 6, inferred 37.]

Combined concepts solution for regional-level alignments
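The combined-concept names in the figure can be read as set expressions, assuming `*` denotes intersection and `\` denotes difference. Under that assumption, a pair of overlapping concepts splits into up to three non-empty parts, as in the sketch below. The cell identifiers `c1` to `c4` are hypothetical and do not correspond to the regions R1 to R9 on the map; the function name is illustrative.

```python
def combined_concepts(name_x, x, name_y, y):
    """Split two extensions into intersection and differences, dropping empty parts."""
    x, y = set(x), set(y)
    parts = {
        f"{name_x}*{name_y}": x & y,   # shared part
        f"{name_x}\\{name_y}": x - y,  # part of X outside Y
        f"{name_y}\\{name_x}": y - x,  # part of Y outside X
    }
    return {name: ext for name, ext in parts.items() if ext}


# Hypothetical map cells covered by each concept.
south = {"c1", "c2", "c3"}
central = {"c3", "c4"}

for name, ext in combined_concepts("CEN.South", south, "TZ.Central", central).items():
    print(name, sorted(ext))
```

The resulting parts are pairwise disjoint and together cover both inputs, which is why overlaps between the original regions no longer appear once the combined concepts are used.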

Page 76: Whole-Tale: The Experience of Research

Do the taxonomies have to be spatial in order to use RCC-5?

• No! The more typical cases for taxonomy alignment are between non-spatial taxonomies
– for which no “GIS route” or direct visual cues about regional extensions are available

– the use of RCC-5 as an alignment vocabulary is a suitable approach for performing a wide range of multi-hierarchy reconciliations

Page 77: Whole-Tale: The Experience of Research

Conclusion & Discussion

• Underscores the benefits of designing different alignment workflows (bottom-up vs. top-down)
– Bottom-up: suited to cases where the lowest-level articulations are non-overlapping but it is unclear how the higher-level concepts align

– Top-down: suited to cases where leaf-level relations often overlap. Expert input will frequently be needed to establish the expected relations under the top-down approach

https://github.com/EulerProject/[email protected]


Page 78: Whole-Tale: The Experience of Research

Implications

• Logic-based taxonomy alignment approach
– Disambiguates name-based taxonomy alignment over time

• 40% of the concepts in biology taxonomies undergo name changes over time (Franz et al., 2016)

– May mitigate problems in equivalent crosswalking
• Membership condition problem that was often criticized in crosswalking

– Preserves the original taxonomies while providing an alignment view

• Solves data integration problems that arise in the more coarse-grained relative crosswalking


Page 79: Whole-Tale: The Experience of Research

Some EulerX History

• … Aristotle …
• … Euler …
• …
• … Greg Whitbread …

• [BPB93] J. H. Beach, S. Pramanik, and J. H. Beaman. Hierarchic taxonomic databases. Advances in Computer Methods for Systematic Biology: Artificial Intelligence, Databases, Computer Vision, 1993.

• [Ber95] Walter G. Berendsohn. The concept of “potential taxa” in databases. Taxon, 44:207–212, 1995.

• [Ber03] Walter G. Berendsohn. MoReTax – Handling Factual Information Linked to Taxonomic Concepts in Biology. No. 39 in Schriftenreihe für Vegetationskunde. Bundesamt für Naturschutz, 2003.

• [GG03] M. Geoffroy and A. Güntsch. Assembling and navigating the potential taxon graph. In [Ber03], pages 71–82, 2003.

• [TL07] D. Thau and B. Ludäscher. Reasoning about taxonomies in first-order logic. Ecological Informatics, 2(3):195–209, 2007.

• [FP09] N. M. Franz and R. K. Peet. Perspectives: towards a language for mapping relationships among taxonomic concepts. Systematics and Biodiversity, 7(1):5–20, 2009.

• …