Spatial Indexing of large astronomical databases László Dobos, István Csabai, Márton Trencséni ELTE, Hungary

Dec 30, 2015

Page 1:

Spatial Indexing of large astronomical databases

László Dobos, István Csabai, Márton Trencséni

ELTE, Hungary

Page 2:

Typical datasets

~200M data points

multidimensional parameter space: real space, magnitudes, colors, redshift, etc.

over several hundred GBs

Page 3:

Typical tasks

Object types are classified by a set of linear inequalities in magnitude space (n-dimensional polyhedra)

Compute a histogram of the whole parameter space

Find similar objects

Find clusters

Compare the distribution of two very large datasets
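
The first task above, classifying objects by linear inequalities, can be sketched as a point-in-polyhedron test. This is a minimal illustration, not the authors' code: the function name `in_polyhedron`, the half-space representation `(a, b)` meaning a · x <= b, and the 2D cut `EXAMPLE_CUT` are all assumptions made for the example.

```python
def in_polyhedron(point, halfspaces):
    """Return True if the point satisfies every inequality a . x <= b.

    halfspaces is a list of (a, b) pairs, where a is a coefficient vector
    with the same dimension as the point and b is a scalar bound.
    """
    return all(
        sum(ai * xi for ai, xi in zip(a, point)) <= b
        for a, b in halfspaces
    )


# Hypothetical 2D cut: 0 <= x <= 1, 0 <= y <= 1, x + y <= 1.5
EXAMPLE_CUT = [
    ((-1.0, 0.0), 0.0),   # -x <= 0
    ((1.0, 0.0), 1.0),    #  x <= 1
    ((0.0, -1.0), 0.0),   # -y <= 0
    ((0.0, 1.0), 1.0),    #  y <= 1
    ((1.0, 1.0), 1.5),    #  x + y <= 1.5
]
```

Running this test on every row is exactly the per-row evaluation that becomes expensive on hundreds of millions of points.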

Page 4:

Database servers

Ideal for storing a large amount of data even when data structure is not so complex

Optimized access of data over file systems

Clever caching methods

MSSQL 2005 can be programmed efficiently

Page 5:

Problems

In DB servers, evaluating data points against inequalities is done on a per-row basis

Traditionally, points close in the magnitude space may be far apart on the disk(s)

When the expected result set is small, it is not optimal to run a table scan

Page 6:

Main idea

Divide the parameter space into small cells; two main methods:

hierarchical: kd-tree, modified kd-tree, etc.

adaptive: Voronoi tessellation

Intersect cells with the search polyhedra instead of checking every point

Check on a per-point basis only when needed
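
The cell-level check can be sketched for axis-aligned box cells, such as the leaves of a kd-tree. This is a simplified illustration under assumptions: the name `classify_box`, the `(a, b)` half-space form (a · x <= b) and the three labels are made up for the example. The "outside" test is conservative: it only detects boxes cut off entirely by one inequality, so some disjoint boxes are reported as "partial" and fall through to per-point checks. An exact intersection test would need more work, for example linear programming.

```python
def classify_box(lo, hi, halfspaces):
    """Classify the axis-aligned box [lo, hi] against constraints a . x <= b.

    Returns:
      "inside"  - every point of the box satisfies every inequality,
      "outside" - a single inequality excludes the whole box,
      "partial" - anything else; the points must be checked one by one.
    """
    fully_inside = True
    for a, b in halfspaces:
        # A linear function over a box takes its extremes at corners:
        # pick lo or hi per coordinate depending on the coefficient sign.
        min_val = sum(ai * (l if ai >= 0 else h) for ai, l, h in zip(a, lo, hi))
        max_val = sum(ai * (h if ai >= 0 else l) for ai, l, h in zip(a, lo, hi))
        if min_val > b:
            return "outside"      # even the best corner violates this bound
        if max_val > b:
            fully_inside = False  # some corner violates it, some do not
    return "inside" if fully_inside else "partial"


# Hypothetical search region: x >= 0, y >= 0, x + y <= 1
TRIANGLE = [((-1.0, 0.0), 0.0), ((0.0, -1.0), 0.0), ((1.0, 1.0), 1.0)]
```

Cells classified as "inside" can be returned whole and "outside" cells skipped, so only "partial" cells need the per-point test.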

Page 7:

Voronoi-Delaunay tessellation

Page 8:
Page 9:

kd-Trees
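
As a rough illustration of the hierarchical approach, a kd-tree can be built by splitting the points at the median along one axis, then recursing with the next axis. The sketch below is an assumption-laden toy, not the authors' implementation: the function name `build_kd_leaves`, the `leaf_size` parameter and the choice to store each leaf as the tight bounding box of its members are all made up for the example.

```python
def build_kd_leaves(points, leaf_size, depth=0):
    """Recursively split points at the median and return the leaf cells.

    points must be a non-empty list of equal-length tuples and leaf_size >= 1.
    Each leaf is (lo, hi, members), where lo and hi are the corners of the
    bounding box of the member points.
    """
    if len(points) <= leaf_size:
        dims = range(len(points[0]))
        lo = tuple(min(p[d] for p in points) for d in dims)
        hi = tuple(max(p[d] for p in points) for d in dims)
        return [(lo, hi, points)]
    axis = depth % len(points[0])          # cycle through the dimensions
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                # median split keeps the tree balanced
    return (build_kd_leaves(ordered[:mid], leaf_size, depth + 1)
            + build_kd_leaves(ordered[mid:], leaf_size, depth + 1))
```

Because every split is at the median, the leaves hold roughly equal numbers of points, so dense regions of the parameter space end up with smaller cells.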

Page 10:
Page 11:
Page 12:

Steps for creating the tessellation

1. For adaptive (like Voronoi): choose starting points (randomly?)

2. Calculate the cells, store in the database

3. Lookup the cell of each data point

4. Create a DB index on the cell ID (this orders the dataset on the disk)

5. Run queries
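
Steps 1 to 4 can be sketched in memory for the Voronoi case. This is only an illustration under stated assumptions: the names `nearest_seed` and `build_cell_index` are hypothetical, seeds are drawn as a random sample of the data, a point's cell is taken to be its nearest seed under Euclidean distance, and an in-memory sort stands in for the effect a database index on the cell ID has on the disk layout.

```python
import random


def nearest_seed(point, seeds):
    """Return the index of the closest seed, i.e. the Voronoi cell of the point."""
    return min(
        range(len(seeds)),
        key=lambda i: sum((p - s) ** 2 for p, s in zip(point, seeds[i])),
    )


def build_cell_index(points, n_seeds, rng):
    """Tag every point with a cell ID and order the rows by that ID."""
    # Step 1: choose starting points (here: a random sample of the data).
    seeds = rng.sample(points, n_seeds)
    # Steps 2-3: each Voronoi cell is defined by its seed, so the cell of a
    # data point is looked up as the index of its nearest seed.
    tagged = [(nearest_seed(p, seeds), p) for p in points]
    # Step 4: sorting by cell ID mimics an index that orders rows on disk,
    # so that points in the same cell are stored next to each other.
    tagged.sort(key=lambda row: row[0])
    return seeds, tagged
```

After the sort, all rows of one cell form a contiguous run, which is what lets a query read only the cells that overlap the search region.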

Page 13:

Technical details

Done in MSSQL using the new SQL CLR features

Allows running programs within the process of the database server: very fast!

Linear programming, Voronoi etc. libraries are ported to .NET/C#

Page 14:

Preliminary results

Figure: BoxTree vs. standard SQL query times. x-axis: # returned rows / # total rows (0 to 0.35); y-axis: query time [msec] (0 to 80000). Series: BoxTree durations, SQL durations.

Page 15:

Scientific ideas

SDSS photometry – 5D – 300M points

finding all objects with similar colors

source classification

star – quasar separation

blue – red galaxy locus etc.

Karhunen-Loeve (PCA) coeffs of Bruzual-Charlot models – 5-15D – 100K-100M points

Quick match with observed spectra

Page 16:

Scientific ideas cont.

Magnitudes of spectral synthesis models – 5-10D – 100K-100M points

match with observations

photo-z

physical props. from photometry

check consistency of various models (BC-GRASIL)

Multiresolution visualization of a large number of points