Saving the Elephant with Slonik Agnieszka Figiel @agnessa480 UNEP-WCMC Railsberry 2013

May 11, 2015


Railsberry 2013 presentation about how we're saving the planet at WCMC using PostgreSQL :)
Transcript
Page 1

Saving the Elephant with Slonik

Agnieszka Figiel (@agnessa480), UNEP-WCMC

Railsberry 2013

Page 2

Taxon concepts and ranks

[Diagram labelled "taxon concepts" and "ranks"]

Page 3

A brief history of gorilla classification

```
Author & Year                   Scientific name
Savage, 1847                    Troglodytes gorilla (Pan gorilla)
I. Geoffroy St. Hilaire, 1852   Gorilla gorilla
Tuttle, 1967                    Pan gorilla
Groves, 1967                    Gorilla gorilla gorilla
```

Relationships illustrated: homonym, synonym, split / merge

Page 4

A matter of opinion

Taxonomy A: Loxodonta africana

Taxonomy B: Loxodonta africana, Loxodonta cyclotis

Page 5

#1: CTEs

```sql
WITH name [ ( columns ) ] AS (
    attached query
)
primary query
```
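A minimal sketch of this shape, run through Python's built-in sqlite3 module so it needs no PostgreSQL server (SQLite accepts the same plain WITH syntax). The CTE name `small_numbers` and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# WITH name (columns) AS (attached query), followed by the primary query.
# small_numbers is a hypothetical CTE built from literal rows.
query = """
WITH small_numbers (n) AS (
    VALUES (1), (2), (3), (4)
)
SELECT SUM(n) FROM small_numbers
"""
total = conn.execute(query).fetchone()[0]
print(total)  # 10
```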

Page 6
Page 7

```sql
WITH endemic_taxon_concepts AS (
    -- taxon concepts recorded in exactly one geo entity
    SELECT taxon_concept_id
    FROM distributions
    GROUP BY taxon_concept_id
    HAVING COUNT(*) = 1
), countries_with_endemic_distributions AS (
    -- number of endemic taxon concepts per geo entity
    SELECT d.geo_entity_id, COUNT(d.taxon_concept_id) AS cnt
    FROM distributions d
    INNER JOIN endemic_taxon_concepts q
        ON d.taxon_concept_id = q.taxon_concept_id
    GROUP BY d.geo_entity_id
)
SELECT geo_entities.name_en, cnt
FROM countries_with_endemic_distributions q
INNER JOIN geo_entities ON geo_entities.id = q.geo_entity_id
ORDER BY cnt DESC
```

Page 8

```
name                       cnt
Indonesia                 1353
Mexico                    1069
Madagascar                 970
Australia                  886
Brazil                     763
Ecuador                    564
Papua New Guinea           561
South Africa               532
United States of America   520
```
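The endemism query can be tried on toy data. This sketch assumes only the table and column names the query itself uses (`distributions`, `geo_entities`, `name_en`); the rows are invented, and SQLite stands in for PostgreSQL because the query needs nothing PostgreSQL-specific. The function name `endemic_counts` is hypothetical.

```python
import sqlite3

# The query from the slide, unchanged apart from layout.
ENDEMIC_SQL = """
WITH endemic_taxon_concepts AS (
    SELECT taxon_concept_id
    FROM distributions
    GROUP BY taxon_concept_id
    HAVING COUNT(*) = 1
), countries_with_endemic_distributions AS (
    SELECT d.geo_entity_id, COUNT(d.taxon_concept_id) AS cnt
    FROM distributions d
    INNER JOIN endemic_taxon_concepts q
        ON d.taxon_concept_id = q.taxon_concept_id
    GROUP BY d.geo_entity_id
)
SELECT geo_entities.name_en, cnt
FROM countries_with_endemic_distributions q
INNER JOIN geo_entities ON geo_entities.id = q.geo_entity_id
ORDER BY cnt DESC
"""


def endemic_counts():
    conn = sqlite3.connect(":memory:")
    # Invented toy data; only the column names follow the slide.
    conn.executescript("""
        CREATE TABLE geo_entities (id INTEGER PRIMARY KEY, name_en TEXT);
        CREATE TABLE distributions (taxon_concept_id INTEGER, geo_entity_id INTEGER);
        INSERT INTO geo_entities VALUES (1, 'Madagascar'), (2, 'Mexico'), (3, 'Brazil');
        INSERT INTO distributions VALUES
            (10, 1), (11, 1),  -- two taxa found only in Madagascar
            (12, 2),           -- one taxon found only in Mexico
            (13, 1), (13, 2),  -- taxa in several countries are not endemic
            (14, 2), (14, 3);
    """)
    return conn.execute(ENDEMIC_SQL).fetchall()


print(endemic_counts())  # [('Madagascar', 2), ('Mexico', 1)]
```

Brazil is missing from the result because the inner join only keeps geo entities that have at least one endemic taxon concept.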

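Page 9

Data-modifying CTEs

```sql
-- #{...} is Ruby string interpolation: old_geo_entity_ids must expand to a
-- comma-separated list of trusted integer ids, new_geo_entity_id to one id
WITH deactivated_geo_entities AS (
    UPDATE geo_entities
    SET is_active = FALSE
    WHERE id IN (#{old_geo_entity_ids})
    RETURNING id
)
UPDATE distributions
SET geo_entity_id = #{new_geo_entity_id}
FROM deactivated_geo_entities
WHERE distributions.geo_entity_id = deactivated_geo_entities.id
```

CTEs are materialized by design: each WITH query is evaluated once and its result reused, which also makes it an optimization fence. (Since PostgreSQL 12, a non-recursive, side-effect-free CTE referenced only once is inlined into the outer query unless declared AS MATERIALIZED; data-modifying CTEs are still always executed exactly once.)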
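SQLite, used here as a runnable stand-in, has no data-modifying CTEs, so this sketch swaps in a different technique: the same two updates run as separate statements inside one transaction, with bound parameters instead of Ruby interpolation. The helper `merge_geo_entities` and the minimal table layouts are assumptions for illustration, not the application's code.

```python
import sqlite3


def merge_geo_entities(conn, old_ids, new_id):
    """Deactivate the old geo entities and re-point their distributions.

    Sketch only: SQLite does not allow UPDATE inside WITH, so the two
    statements run separately inside one transaction instead.
    """
    marks = ", ".join("?" for _ in old_ids)  # bound parameters, not interpolation
    with conn:  # commits both UPDATEs together, or rolls both back on error
        conn.execute(
            f"UPDATE geo_entities SET is_active = 0 WHERE id IN ({marks})",
            old_ids,
        )
        conn.execute(
            "UPDATE distributions SET geo_entity_id = ? "
            f"WHERE geo_entity_id IN ({marks})",
            [new_id, *old_ids],
        )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE geo_entities (id INTEGER PRIMARY KEY, is_active INTEGER DEFAULT 1);
    CREATE TABLE distributions (id INTEGER PRIMARY KEY, geo_entity_id INTEGER);
    INSERT INTO geo_entities (id) VALUES (1), (2), (3);
    INSERT INTO distributions (geo_entity_id) VALUES (1), (2), (3);
""")
merge_geo_entities(conn, [1, 2], 3)
print(conn.execute("SELECT geo_entity_id FROM distributions ORDER BY id").fetchall())
# [(3,), (3,), (3,)]
```

The PostgreSQL version on the slide gets the same all-or-nothing behaviour from a single statement, which is what makes the data-modifying CTE attractive.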

Page 10

#2: Recursive CTEs

```sql
WITH RECURSIVE name [ ( columns ) ] AS (
    non-recursive term
    UNION [ ALL ]
    recursive term
)
primary query
```
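A minimal sketch of the two terms at work, again with SQLite through Python's sqlite3 as a stand-in (the CTE `counter` is hypothetical): the non-recursive term seeds one row, and the recursive term keeps adding rows until a step produces none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute("""
WITH RECURSIVE counter (n) AS (
    SELECT 1          -- non-recursive term: the seed row
    UNION ALL
    SELECT n + 1      -- recursive term: re-run on the newest rows only
    FROM counter
    WHERE n < 5       -- recursion ends when this step adds no rows
)
SELECT n FROM counter ORDER BY n
""").fetchall()
numbers = [n for (n,) in rows]
print(numbers)  # [1, 2, 3, 4, 5]
```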

Page 11

```sql
WITH RECURSIVE self_and_descendants (id, full_name) AS (
    -- non-recursive term: the starting taxon concept
    SELECT id, full_name
    FROM taxon_concepts
    WHERE id = 472
    UNION
    -- recursive term: children of the rows found so far
    SELECT hi.id, hi.full_name
    FROM taxon_concepts hi
    JOIN self_and_descendants d ON d.id = hi.parent_id
)
SELECT COUNT(*) FROM self_and_descendants
```

```
count
  432
```
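The same descendants query on an invented miniature taxonomy, as a sketch in SQLite. The ids and rows are made up, and the hard-coded id 472 becomes a bound parameter of the hypothetical helper `count_self_and_descendants`.

```python
import sqlite3

# Invented miniature taxonomy: (id, parent_id, full_name).
TAXA = [
    (1, None, "Canidae"),
    (2, 1, "Canis"),
    (3, 2, "Canis lupus"),
    (4, 2, "Canis aureus"),
    (5, 1, "Cerdocyon"),
    (6, 5, "Cerdocyon thous"),
    (7, None, "Felidae"),
]

DESCENDANTS_SQL = """
WITH RECURSIVE self_and_descendants (id, full_name) AS (
    SELECT id, full_name FROM taxon_concepts WHERE id = ?
    UNION
    SELECT hi.id, hi.full_name
    FROM taxon_concepts hi
    JOIN self_and_descendants d ON d.id = hi.parent_id
)
SELECT COUNT(*) FROM self_and_descendants
"""


def count_self_and_descendants(conn, taxon_id):
    # The slide hard-codes id = 472; here the id is a bound parameter.
    return conn.execute(DESCENDANTS_SQL, (taxon_id,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon_concepts (id INTEGER PRIMARY KEY, parent_id INTEGER, full_name TEXT)"
)
conn.executemany("INSERT INTO taxon_concepts VALUES (?, ?, ?)", TAXA)
print(count_self_and_descendants(conn, 1))  # 6: Canidae and everything below it
```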

Page 12

```sql
WITH RECURSIVE self_and_ancestors (parent_id, full_name, level) AS (
    -- start from the taxon concept itself, at level 1
    SELECT parent_id, full_name, 1
    FROM taxon_concepts
    WHERE id = 5563
    UNION
    -- climb to the parent, one level further away
    SELECT hi.parent_id, hi.full_name, q.level + 1
    FROM taxon_concepts hi
    JOIN self_and_ancestors q ON hi.id = q.parent_id
)
SELECT full_name
FROM self_and_ancestors
ORDER BY level DESC
```
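A sketch of the ancestors query on invented rows in SQLite: the names follow the crocodile lineage used in the talk, but the ids are made up (5563 is not reused), and `ancestors` is a hypothetical helper. Each step up the tree adds one to `level`, so sorting by `level DESC` lists the root first.

```python
import sqlite3

# Invented ids; only the names come from the talk.
LINEAGE = [
    (1, None, "Animalia"),
    (2, 1, "Chordata"),
    (3, 2, "Reptilia"),
    (4, 3, "Crocodylia"),
    (5, 4, "Crocodylidae"),
    (6, 5, "Crocodylus"),
    (7, 6, "Crocodylus niloticus"),
    (8, 6, "Crocodylus porosus"),  # a sibling, which must not appear
]

ANCESTORS_SQL = """
WITH RECURSIVE self_and_ancestors (parent_id, full_name, level) AS (
    SELECT parent_id, full_name, 1 FROM taxon_concepts WHERE id = ?
    UNION
    SELECT hi.parent_id, hi.full_name, q.level + 1
    FROM taxon_concepts hi
    JOIN self_and_ancestors q ON hi.id = q.parent_id
)
SELECT full_name FROM self_and_ancestors ORDER BY level DESC
"""


def ancestors(conn, taxon_id):
    return [name for (name,) in conn.execute(ANCESTORS_SQL, (taxon_id,))]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon_concepts (id INTEGER PRIMARY KEY, parent_id INTEGER, full_name TEXT)"
)
conn.executemany("INSERT INTO taxon_concepts VALUES (?, ?, ?)", LINEAGE)
print(ancestors(conn, 7))
```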

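Page 13

```sql
WITH crocodile_ancestry AS (
    WITH RECURSIVE self_and_ancestors (
        -- [AS IN PREVIOUS SLIDE]
    )
)
-- ARRAY_AGG takes rows in the order the sorted inner query returns them;
-- PostgreSQL documents this as usually, not always, reliable, and an
-- ORDER BY inside the aggregate call makes the order explicit
SELECT ARRAY_TO_STRING(ARRAY_AGG(full_name), ' > ') AS breadcrumb
FROM crocodile_ancestry
```

```
breadcrumb
Animalia > Chordata > Reptilia > Crocodylia > Crocodylidae > Crocodylus > Crocodylus niloticus
```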
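An alternative sketch, not the technique on the slide: instead of aggregating afterwards with ARRAY_AGG, the breadcrumb string is extended at every recursive step, so its order never depends on aggregate input order. It runs in SQLite on invented rows; `breadcrumb` is a hypothetical helper. A PostgreSQL version may need the seed column cast (for example `full_name::TEXT`) so that both terms produce the same column type.

```python
import sqlite3

# Invented ids; the names follow the lineage shown on the slide.
LINEAGE = [
    (1, None, "Animalia"),
    (2, 1, "Chordata"),
    (3, 2, "Reptilia"),
    (4, 3, "Crocodylia"),
    (5, 4, "Crocodylidae"),
    (6, 5, "Crocodylus"),
    (7, 6, "Crocodylus niloticus"),
]

BREADCRUMB_SQL = """
WITH RECURSIVE ancestry (parent_id, breadcrumb) AS (
    -- start at the taxon itself
    SELECT parent_id, full_name FROM taxon_concepts WHERE id = ?
    UNION ALL
    -- each step up prepends the parent's name to the string so far
    SELECT hi.parent_id, hi.full_name || ' > ' || a.breadcrumb
    FROM taxon_concepts hi
    JOIN ancestry a ON hi.id = a.parent_id
)
-- only the row that reached the root (no parent) holds the full path
SELECT breadcrumb FROM ancestry WHERE parent_id IS NULL
"""


def breadcrumb(conn, taxon_id):
    row = conn.execute(BREADCRUMB_SQL, (taxon_id,)).fetchone()
    return row[0] if row else None  # None if the id does not exist


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon_concepts (id INTEGER PRIMARY KEY, parent_id INTEGER, full_name TEXT)"
)
conn.executemany("INSERT INTO taxon_concepts VALUES (?, ?, ?)", LINEAGE)
print(breadcrumb(conn, 7))
```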

Page 14

Cascade with exceptions

Page 15
Page 16

```sql
WITH RECURSIVE cascading_refs (taxon_concept_id, exclusions) AS (
    -- start at the taxon concept carrying reference 369,
    -- together with its list of excluded descendants
    SELECT h.id, h_refs.excluded_taxon_concepts_ids
    FROM taxon_concepts h
    LEFT JOIN taxon_concept_references h_refs
        ON h_refs.taxon_concept_id = h.id
    WHERE h.id = 10 AND h_refs.reference_id = 369

    UNION

    -- cascade to the children, skipping any listed in the exclusions
    -- (and, with them, their whole subtrees)
    SELECT hi.id, cascading_refs.exclusions
    FROM taxon_concepts hi
    JOIN cascading_refs ON cascading_refs.taxon_concept_id = hi.parent_id
    WHERE NOT COALESCE(cascading_refs.exclusions, ARRAY[]::INT[]) @> ARRAY[hi.id]
)
UPDATE taxon_concepts
SET has_std_ref = TRUE
FROM cascading_refs
WHERE cascading_refs.taxon_concept_id = taxon_concepts.id
```
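A sketch of the cascade in SQLite, with two labelled substitutions: SQLite has no INT[] type, so the exclusion list is a comma-separated string (`excluded_ids`, a hypothetical column) checked with `instr` in place of `@>`, and the update uses `WHERE id IN (...)` rather than `UPDATE ... FROM`. The data and the helper `cascade_std_ref` are invented; reference 369 is reused only as a label.

```python
import sqlite3

# Invented data. excluded_ids is a hypothetical comma-separated stand-in
# for the INT[] column excluded_taxon_concepts_ids.
SETUP = """
CREATE TABLE taxon_concepts (
    id INTEGER PRIMARY KEY, parent_id INTEGER, full_name TEXT,
    has_std_ref INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE taxon_concept_references (
    taxon_concept_id INTEGER, reference_id INTEGER, excluded_ids TEXT
);
INSERT INTO taxon_concepts (id, parent_id, full_name) VALUES
    (1, NULL, 'Canidae'), (2, 1, 'Canis'), (3, 2, 'Canis lupus'),
    (4, 3, 'Canis lupus dingo'), (5, 2, 'Canis aureus'),
    (6, 1, 'Cerdocyon'), (7, 6, 'Cerdocyon thous');
-- reference 369 on Canidae cascades down, except to Canis lupus (id 3)
INSERT INTO taxon_concept_references VALUES (1, 369, '3');
"""

CASCADE_SQL = """
WITH RECURSIVE cascading_refs (taxon_concept_id, exclusions) AS (
    -- inner join: the slide's LEFT JOIN behaves the same, because its
    -- WHERE clause filters on the reference
    SELECT h.id, r.excluded_ids
    FROM taxon_concepts h
    JOIN taxon_concept_references r ON r.taxon_concept_id = h.id
    WHERE h.id = ? AND r.reference_id = ?
    UNION
    SELECT hi.id, c.exclusions
    FROM taxon_concepts hi
    JOIN cascading_refs c ON c.taxon_concept_id = hi.parent_id
    -- skip excluded children, so their subtrees are never reached
    WHERE instr(',' || COALESCE(c.exclusions, '') || ',', ',' || hi.id || ',') = 0
)
UPDATE taxon_concepts SET has_std_ref = 1
WHERE id IN (SELECT taxon_concept_id FROM cascading_refs)
"""


def cascade_std_ref(conn, taxon_id, reference_id):
    with conn:  # commit the update
        conn.execute(CASCADE_SQL, (taxon_id, reference_id))


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
cascade_std_ref(conn, 1, 369)
flagged = [i for (i,) in conn.execute(
    "SELECT id FROM taxon_concepts WHERE has_std_ref = 1 ORDER BY id")]
print(flagged)  # [1, 2, 5, 6, 7]
```

Canis lupus dingo (id 4) is never flagged even though it is not in the exclusion list itself: excluding its parent stops the recursion before it is reached.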

Page 17

#3: Window functions

```sql
SELECT ROW_NUMBER() OVER (ORDER BY full_name), full_name
FROM taxon_concepts
WHERE parent_id = 335
ORDER BY full_name
```

```
row_number  full_name
         1  Canis
         2  Cerdocyon
         3  Chrysocyon
         4  Cuon
         5  Dusicyon
```
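The same query in SQLite (window functions need SQLite 3.25 or newer), on invented rows inserted out of order; parent id 1 plays the role of 335. Because the window function is evaluated after WHERE, only the filtered children are numbered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon_concepts (id INTEGER PRIMARY KEY, parent_id INTEGER, full_name TEXT)"
)
# Invented rows, deliberately not in alphabetical order.
conn.executemany(
    "INSERT INTO taxon_concepts VALUES (?, ?, ?)",
    [
        (1, None, "Canidae"),
        (2, 1, "Cuon"),
        (3, 1, "Canis"),
        (4, 1, "Dusicyon"),
        (5, 1, "Chrysocyon"),
        (6, 1, "Cerdocyon"),
        (7, 99, "Felis"),  # different parent, filtered out by WHERE
    ],
)
rows = conn.execute("""
SELECT ROW_NUMBER() OVER (ORDER BY full_name), full_name
FROM taxon_concepts
WHERE parent_id = ?
ORDER BY full_name
""", (1,)).fetchall()
print(rows)
# [(1, 'Canis'), (2, 'Cerdocyon'), (3, 'Chrysocyon'), (4, 'Cuon'), (5, 'Dusicyon')]
```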

Page 18

```sql
WITH RECURSIVE q (id, full_name, path) AS (
    -- the root (id 335, Canidae) gets the path {1}
    SELECT id, full_name, ARRAY[1]
    FROM taxon_concepts h
    WHERE id = 335
    UNION
    -- a child's path is its parent's path plus its position
    -- among its siblings, ordered by name
    SELECT hi.id, hi.full_name,
           q.path || (
               ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY hi.full_name)
           )::INT
    FROM taxon_concepts hi
    JOIN q ON hi.parent_id = q.id
)
SELECT path, full_name
FROM q
ORDER BY path
```

CTE + window function

Page 19

```
path        full_name
{1}         Canidae
{1,1}       Canis
{1,1,1}     Canis adustus
{1,1,2}     Canis aureus
{1,1,3}     Canis familiaris
(...)
{1,1,7}     Canis lupus
{1,1,7,1}   Canis lupus crassodon
{1,1,7,2}   Canis lupus dingo
{1,2}       Cerdocyon
{1,2,1}     Cerdocyon thous
```
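A sketch of the same outline in SQLite, with two changes it forces: SQLite rejects window functions inside the recursive term, so siblings are numbered in a separate non-recursive CTE first, and without arrays the path is dotted text that Python parses into integer tuples before sorting. The data and the helper `outline` are invented for illustration.

```python
import sqlite3

# Invented rows: (id, parent_id, full_name).
TAXA = [
    (1, None, "Canidae"),
    (2, 1, "Canis"),
    (3, 1, "Cerdocyon"),
    (4, 2, "Canis lupus"),
    (5, 2, "Canis aureus"),
    (6, 4, "Canis lupus dingo"),
    (7, 3, "Cerdocyon thous"),
]

PATH_SQL = """
WITH RECURSIVE numbered AS (
    -- position of each row among its siblings, ordered by name
    SELECT id, parent_id, full_name,
           ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY full_name) AS pos
    FROM taxon_concepts
), q (id, full_name, path) AS (
    SELECT id, full_name, '1' FROM numbered WHERE id = ?
    UNION ALL
    -- child path = parent path + '.' + sibling position
    SELECT n.id, n.full_name, q.path || '.' || n.pos
    FROM numbered n
    JOIN q ON n.parent_id = q.id
)
SELECT path, full_name FROM q
"""


def outline(conn, root_id):
    rows = conn.execute(PATH_SQL, (root_id,)).fetchall()
    # '1.1.2' -> (1, 1, 2): tuples compare element by element like PostgreSQL
    # arrays, while plain text would sort '1.10' before '1.2'
    parsed = [(tuple(int(p) for p in path.split(".")), name) for path, name in rows]
    return sorted(parsed)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon_concepts (id INTEGER PRIMARY KEY, parent_id INTEGER, full_name TEXT)"
)
conn.executemany("INSERT INTO taxon_concepts VALUES (?, ?, ?)", TAXA)
for path, name in outline(conn, 1):
    print(path, name)
```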

Page 20

With CTEs and windowing, SQL is Turing complete.