Saving the Elephant with Slonik Agnieszka Figiel @agnessa480 UNEP-WCMC Railsberry 2013
May 11, 2015
Saving the Elephant with SlonikAgnieszka Figiel @agnessa480UNEP-WCMC
Railsberry 2013
Taxon concepts and ranks
taxon conceptsranks
A brief history of gorilla classification
Author & Year Scientific name
Savage1847
Troglodytes gorilla(Pan gorilla)
I. Geoffroy St. Hilaire 1952
Gorilla gorilla
Tuttle1967
Pan gorilla
Groves1967
Gorilla gorilla gorilla
homonym
synonym
split / merge
A matter of opinion
Taxonomy A:Loxodonta africana
Taxonomy B:Loxodonta africanaLoxodonta cyclotis
#1: CTE's
WITH name [ ( columns) ] AS ( attached query)primary query
WITH endemic_taxon_concepts AS ( SELECT taxon_concept_id FROM distributions GROUP BY taxon_concept_id HAVING COUNT(*) = 1), countries_with_endemic_distributions AS ( SELECT d.geo_entity_id, COUNT(d.taxon_concept_id) AS cnt FROM distributions d INNER JOIN endemic_taxon_concepts q ON d.taxon_concept_id = q.taxon_concept_id GROUP BY d.geo_entity_id)SELECT geo_entities.name_en, cntFROM countries_with_endemic_distributions qINNER JOIN geo_entities ON geo_entities.id = q.geo_entity_idORDER BY cnt DESC
name cnt
Indonesia 1353
Mexico 1069
Madagascar 970
Australia 886
Brazil 763
Ecuador 564
Papua New Guinea 561
South Africa 532
United States of America 520
Data-modifying CTE's
WITH deactivated_geo_entities AS ( UPDATE geo_entities SET is_active = FALSE WHERE id IN (#{old_geo_entity_ids}) RETURNING id)UPDATE distributionsSET geo_entity_id = #{new_geo_entity_id}FROM deactivated_geo_entitiesWHERE distributions.geo_entity_id = deactivated_geo_entities.id
CTE = materialize by design
#2: Recursive CTE's
WITH RECURSIVE name [ (columns) ] AS ( non-recursive term
UNION [ALL]
recursive term)primary query
WITH RECURSIVE self_and_descendants (id, full_name) AS ( SELECT id, full_name FROM taxon_concepts WHERE id = 472 UNION SELECT hi.id, hi.full_name FROM taxon_concepts hi JOIN self_and_descendants d ON d.id = hi.parent_id)SELECT COUNT(*) FROM self_and_descendants
count
432
WITH RECURSIVE self_and_ancestors ( parent_id, full_name, level) AS ( SELECT parent_id, full_name, 1 FROM taxon_concepts WHERE id = 5563 UNION SELECT hi.parent_id, hi.full_name, q.level + 1 FROM taxon_concepts hi JOIN self_and_ancestors q ON hi.id = q.parent_id )SELECT full_nameFROM self_and_ancestors ORDER BY level DESC
WITH crocodile_ancestry AS ( WITH RECURSIVE self_and_ancestors ( -- [AS IN PREVIOUS SLIDE] ))SELECT ARRAY_TO_STRING(ARRAY_AGG(full_name), ' > ')AS breadcrumb FROM crocodile_ancestry
breadcrumb
Animalia > Chordata > Reptilia > Crocodylia > Crocodylidae > Crocodylus > Crocodylus niloticus
Cascade with exceptions
WITH RECURSIVE cascading_refs(taxon_concept_id, exclusions) AS ( SELECT h.id, h_refs.excluded_taxon_concepts_ids FROM taxon_concepts h LEFT JOIN taxon_concept_references h_refs ON h_refs.taxon_concept_id = h.id WHERE h.id = 10 AND h_refs.reference_id = 369
UNION
SELECT hi.id, cascading_refs.exclusions FROM taxon_concepts hi JOIN cascading_refs ON cascading_refs.taxon_concept_id = hi.parent_id WHERE NOT COALESCE(cascading_refs.exclusions, ARRAY[]::INT[]) @> ARRAY[hi.id])UPDATE taxon_concepts SET has_std_ref = TRUEFROM cascading_refsWHERE cascading_refs.taxon_concept_id = taxon_concepts.id
#3: Window functions
SELECT ROW_NUMBER() OVER(ORDER BY full_name), full_name FROM taxon_conceptsWHERE parent_id = 335 ORDER BY full_name
row_number full_name
1 Canis
2 Cerdocyon
3 Chrysocyon
4 Cuon
5 Dusicyon
WITH RECURSIVE q(id, full_name, path) AS ( SELECT id, full_name, ARRAY[1] FROM taxon_concepts h WHERE id = 335 UNION SELECT hi.id, hi.full_name, q.path || ( ROW_NUMBER() OVER( PARTITION BY parent_id ORDER BY hi.full_name ) )::INT FROM taxon_concepts hi JOIN q ON hi.parent_id = q.id)SELECT path, full_name FROM qORDER BY path
CTE + window function
path full_name
{1} Canidae
{1,1} Canis
{1,1,1} Canis adustus
{1,1,2} Canis aureus
{1,1,3} Canis familiaris
(...)
{1,1,7} Canis lupus
{1,1,7,1} Canis lupus crassodon
{1,1,7,2} Canis lupus dingo
{1,2} Cerdocyon
{1,2,1} Cerdocyon thous
With CTE and Windowing,SQL is Turing Complete.
SQL Antipatterns: Avoiding the Pitfalls of Database Programming Bill Karwin
PostgreSQL: Up and Running Regina Obe, Leo Hsu
High Performance SQL with PostgreSQL 8.4
https://github.com/unepwcmc/SAPI
Checklist of CITES Species
Biodiversity Information Standards (TDWG)
Items freed into the public domain Pearson Scott Foresman
PostgreSQL
Code & Demo
Graphics
Taxonomy