Distributed and Streaming Evaluation of Batch Queries for Data-Intensive Computational Turbulence
Kalin Kanov
Department of Computer Science, Johns Hopkins University


Dec 21, 2015




Streaming Evaluation Method

• Linear data requirements of the computation allow for:
  – Incremental evaluation
  – Streaming over the data
  – Concurrent evaluation of batch queries
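Because an interpolated value is a linear combination of stored values, each query's result can be built up as a running sum while the data is scanned. The Python sketch below illustrates the idea in one dimension under stated assumptions: a hypothetical unit-spaced grid (grid point k at x = k), atoms of `atom_size` consecutive values, and query stencils that stay inside the data (no periodic wrap). Each needed atom is read once, in storage order, and every query whose stencil touches it adds its share. The function names are illustrative and are not taken from the author's implementation.

```python
import math
from collections import defaultdict


def lagrange_weights(xq, N):
    """Map stencil index -> Lagrange coefficient l^{n-N/2+i}(xq), i = 1..N.

    Assumes a hypothetical unit-spaced grid (grid point k at x = k) and
    n = floor(xq), so the N-point stencil surrounds xq.
    """
    n = math.floor(xq)
    nodes = [n - N // 2 + i for i in range(1, N + 1)]
    weights = {}
    for k in nodes:
        w = 1.0
        for m in nodes:
            if m != k:
                w *= (xq - m) / (k - m)
        weights[k] = w
    return weights


def streaming_evaluate(queries, data, atom_size, N):
    """Evaluate a batch of interpolation queries in a single pass over the atoms."""
    stencils = [lagrange_weights(q, N) for q in queries]

    # Index the batch: atom id -> ids of the queries whose stencil touches it.
    atom_to_queries = defaultdict(set)
    for qid, stencil in enumerate(stencils):
        for k in stencil:
            atom_to_queries[k // atom_size].add(qid)

    partial = [0.0] * len(queries)  # one running sum per query
    # Stream the needed atoms in storage order; each atom is read only once,
    # no matter how many queries depend on it.
    for atom in sorted(atom_to_queries):
        start = atom * atom_size
        for offset, value in enumerate(data[start:start + atom_size]):
            k = start + offset
            for qid in atom_to_queries[atom]:
                w = stencils[qid].get(k)
                if w is not None:
                    partial[qid] += w * value
    return partial


# Usage: a cubic is reproduced (up to rounding) by a 4-point stencil.
data = [k**3 - 2 * k for k in range(64)]
print(streaming_evaluate([10.25, 11.5, 40.75], data, atom_size=4, N=4))
```

In this example the queries at 10.25 and 11.5 both need atoms 2 and 3, so those atoms are fetched once for both queries, whereas evaluating each query directly would fetch them once per query.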


Motivation

• Heavy DB usage slows down the service by a factor of 10 to 20

• Query evaluation techniques adapted from simulation code do not access data coherently

• Substantial storage overhead incurred to localize each computation

• 95% of queries perform Lagrange polynomial interpolation


Turbulence Database Cluster


MHD Database

• Stores velocity, magnetic field, magnetic vector potential and pressure fields
  – 10 attributes, 4 bytes each
  – 1024 time-steps over a 1024³ grid
  – 40 TB total size

• To reduce the total amount of I/O:
  – Smaller atoms (4³ voxels)
  – No replication
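The sizes on this slide can be checked with simple arithmetic, reading "40 TB" as 40 × 2⁴⁰ bytes. The sketch below assumes the 10 attributes split as 3 (velocity) + 3 (magnetic field) + 3 (vector potential) + 1 (pressure), and that an atom is a 4³ block of grid voxels; the constant and function names are hypothetical.

```python
# Sizes from the slide. The 3 + 3 + 3 + 1 split of the 10 attributes
# (vector velocity, magnetic field, vector potential; scalar pressure) is assumed.
ATTRIBUTES = 10
BYTES_PER_VALUE = 4      # 4-byte values
GRID_SIDE = 1024         # 1024^3 grid
TIME_STEPS = 1024
ATOM_SIDE = 4            # 4^3-voxel atoms


def total_bytes():
    """Raw size of all attributes over all time-steps."""
    return ATTRIBUTES * BYTES_PER_VALUE * GRID_SIDE**3 * TIME_STEPS


def atoms_per_timestep():
    """Number of 4^3 atoms needed to tile one 1024^3 time-step."""
    return (GRID_SIDE // ATOM_SIDE) ** 3


def atom_of(x, y, z):
    """Coordinates of the atom containing grid voxel (x, y, z)."""
    return (x // ATOM_SIDE, y // ATOM_SIDE, z // ATOM_SIDE)


print(total_bytes() / 2**40)   # 40.0 (TiB)
print(atoms_per_timestep())    # 16777216
print(atom_of(5, 1023, 8))     # (1, 255, 2)
```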


Lagrange Polynomial Interpolation

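$$
f(x', y') = \sum_{j=1}^{N} l_y^{\,p-N/2+j}(y') \sum_{i=1}^{N} l_x^{\,n-N/2+i}(x') \, f\left(x_{n-N/2+i},\, y_{p-N/2+j}\right)
$$

Lagrange coefficients: $l_x^{\,n-N/2+i}(x')$ and $l_y^{\,p-N/2+j}(y')$. Data: $f(x_{n-N/2+i}, y_{p-N/2+j})$.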
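The double sum above can be evaluated directly, which is what the streaming method is compared against. The sketch below uses the standard Lagrange basis $l^{k}(x') = \prod_{m \ne k} (x' - x_m)/(x_k - x_m)$ over the $N$ stencil nodes, and assumes (hypothetically) unit grid spacing, $n = \lfloor x' \rfloor$, $p = \lfloor y' \rfloor$, a field stored as a nested list `f[x][y]`, and a stencil that lies inside the array.

```python
import math


def lagrange_coeffs(xq, n, N):
    """Stencil nodes n-N/2+i and coefficients l^{n-N/2+i}(xq) for i = 1..N,
    assuming unit grid spacing (node k at coordinate k)."""
    nodes = [n - N // 2 + i for i in range(1, N + 1)]
    coeffs = []
    for k in nodes:
        c = 1.0
        for m in nodes:
            if m != k:
                c *= (xq - m) / (k - m)
        coeffs.append(c)
    return nodes, coeffs


def interpolate_2d(f, xq, yq, N):
    """Evaluate the double sum for f(x', y') over an N x N stencil.

    f is indexed f[x][y]; n and p are taken as floor(x') and floor(y')
    (an assumption), and the stencil must lie inside the array.
    """
    n, p = math.floor(xq), math.floor(yq)
    xs, lx = lagrange_coeffs(xq, n, N)
    ys, ly = lagrange_coeffs(yq, p, N)
    total = 0.0
    for yj, cy in zip(ys, ly):                                  # outer sum over j
        inner = sum(cx * f[xi][yj] for xi, cx in zip(xs, lx))   # inner sum over i
        total += cy * inner
    return total


grid = [[x * y + y**2 for y in range(16)] for x in range(16)]
print(interpolate_2d(grid, 5.3, 7.6, N=4))
```

Because the test field $xy + y^2$ is a polynomial of degree below $N = 4$ in each direction, the printed value agrees with the exact $5.3 \cdot 7.6 + 7.6^2 = 98.04$ up to floating-point rounding.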


Processing a Batch Query


Additional Optimizations

• Concurrently compute values that are stored together

• Iterate in the appropriate order

• Compute the Lagrange coefficients with the procedures described by Purser and Leslie*

*R. J. Purser and L. M. Leslie. An Efficient Interpolation Procedure for High-Order Three-Dimensional Semi-Lagrangian Models. Monthly Weather Review, 119:2492, 1991.
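The coefficient computation can be made cheaper on a uniform grid. The sketch below does not reproduce Purser and Leslie's procedure; it shows one generic technique, assuming unit-spaced stencil nodes at local offsets 0..N-1. The denominators depend only on N and are computed once, and each numerator (a product over all other nodes) is formed from prefix and suffix products, which costs O(N) per coordinate instead of the O(N²) of the direct product formula. The function names are hypothetical.

```python
import math


def denominators(N):
    """prod_{m != k} (k - m) for local nodes k = 0..N-1 (unit spacing).
    These do not depend on the query point, so compute them once per N."""
    return [math.prod(k - m for m in range(N) if m != k) for k in range(N)]


def fast_coeffs(t, denom):
    """Lagrange coefficients at local coordinate t (nodes at 0, 1, ..., N-1).

    prefix[k] = prod_{m < k} (t - m) and suffix[k] = prod_{m >= k} (t - m),
    so prefix[k] * suffix[k + 1] is the numerator prod_{m != k} (t - m).
    """
    N = len(denom)
    prefix = [1.0] * (N + 1)
    for k in range(N):
        prefix[k + 1] = prefix[k] * (t - k)
    suffix = [1.0] * (N + 1)
    for k in range(N - 1, -1, -1):
        suffix[k] = suffix[k + 1] * (t - k)
    return [prefix[k] * suffix[k + 1] / denom[k] for k in range(N)]


# Usage: with N = 4 the stencil is n-1..n+2, so x' = n + 0.25 has t = 1.25.
denom = denominators(4)
coeffs = fast_coeffs(1.25, denom)
print(coeffs, sum(coeffs))
```

The coefficients always sum to 1 (up to rounding), which makes a quick sanity check; for larger N the reuse of `denom` across every query point in a batch is where the saving comes from.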


Experimental Evaluation

• Random workloads:
  – across the entire cube space
  – a 128³ subset of the entire space

• Workload derived from the usage log of the Turbulence Database cluster

• Compare with:
  – Direct methods of evaluation
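A random workload of the kind listed above can be generated by drawing query points uniformly from a cube. In the sketch below the coordinates are in grid units of the 1024³ domain, and the placement of the 128³ subset (corner at 256, 256, 256) is a hypothetical choice, since the slide does not say where the subset lies. The helper `distinct_atoms` counts only the 4³ atoms that contain the query points themselves, ignoring the extra atoms their interpolation stencils reach into.

```python
import random

GRID_SIDE = 1024   # full 1024^3 cube, in grid units
ATOM_SIDE = 4      # 4^3-voxel atoms


def random_workload(count, side=GRID_SIDE, origin=(0.0, 0.0, 0.0), seed=None):
    """Uniformly random query points in the cube [origin, origin + side)^3."""
    rng = random.Random(seed)
    return [tuple(o + side * rng.random() for o in origin) for _ in range(count)]


def distinct_atoms(points):
    """Number of distinct atoms containing the query points themselves
    (ignores the extra atoms their interpolation stencils spill into)."""
    return len({tuple(int(c) // ATOM_SIDE for c in p) for p in points})


full = random_workload(10_000, seed=1)
subset = random_workload(10_000, side=128, origin=(256.0, 256.0, 256.0), seed=1)
print(distinct_atoms(full), distinct_atoms(subset))
```

With the same number of queries, the subset workload falls into fewer distinct atoms, so more of its queries can share each atom that is read.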


Setup

• Experimental version of the MHD database
  – ~300 time-steps of the velocity fields of the MHD DNS
  – Two 2.33 GHz dual quad-core Windows 2003 servers with SQL Server 2008 and 8 GB of memory
  – Data tables striped across 7 disks


Questions/Comments