Big Data: Big Challenges for Computer Science Henri Bal Vrije Universiteit Amsterdam


Apr 02, 2015


Megan Haggett
Page 1

Big Data: Big Challenges for Computer Science

Henri Bal
Vrije Universiteit Amsterdam

Page 2

Multiple types of data explosions

High-volume data

10-100x global internet traffic per year (by 2018)

Complex data

Page 3

Graphics Processing Units (GPUs)

Page 4

Differences between CPUs and GPUs

● CPU: minimize latency of 1 activity (thread)
  ● Must be good at everything
  ● Big on-chip caches
  ● Sophisticated control logic
● GPU: maximize throughput of all threads using large-scale parallelism

[Figure: chip layout diagram with blocks labelled Control, ALU, and Cache]
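
The latency versus throughput trade-off can be made concrete with a toy model. The sketch below is illustrative only: the core counts and per-task times are made-up numbers, not measurements of any real CPU or GPU, and the model ignores memory bandwidth and scheduling overhead.

```python
import math

def batch_time(num_tasks: int, num_cores: int, time_per_task: float) -> float:
    """Time to finish num_tasks independent, equal-sized tasks.

    Each core runs one task at a time, so the batch completes in
    ceil(num_tasks / num_cores) rounds of time_per_task each.
    """
    rounds = math.ceil(num_tasks / num_cores)
    return rounds * time_per_task

# Hypothetical processors (assumed numbers, for illustration only).
CPU = {"num_cores": 8, "time_per_task": 1.0}       # few fast cores
GPU = {"num_cores": 2048, "time_per_task": 10.0}   # many slow cores

for n in (1, 100_000):
    cpu_t = batch_time(n, **CPU)
    gpu_t = batch_time(n, **GPU)
    print(f"{n:>7} tasks: CPU {cpu_t:>8.1f}  GPU {gpu_t:>8.1f}")
```

Under these assumed numbers the low-latency design wins for a single task, while the high-throughput design wins once there are many independent tasks.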

Page 5

Example: NVIDIA Maxwell

● 16 independent streaming multiprocessors
● 2048 compute cores
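
Dividing the two figures above gives the number of cores per multiprocessor. The short sketch below does that arithmetic and, as an added assumption, estimates how many passes a data-parallel job would need if each core handled one item per pass. The job size is invented for illustration.

```python
import math

STREAMING_MULTIPROCESSORS = 16   # from the slide
COMPUTE_CORES = 2048             # from the slide

cores_per_sm = COMPUTE_CORES // STREAMING_MULTIPROCESSORS
print(cores_per_sm)  # 128

def passes_needed(num_items: int, cores: int = COMPUTE_CORES) -> int:
    """Passes required if every core processes one item per pass."""
    return math.ceil(num_items / cores)

# Hypothetical job size (assumption): one million independent items.
print(passes_needed(1_000_000))  # 489
```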

Page 6

Ongoing GPU work at VU

● Applications
  ● Multimedia data
  ● Digital forensics data
  ● Climate modelling
  ● Radio astronomy data
● Methodologies
  ● Hadoop on accelerators
  ● Programming methods for accelerators
● Teaching GPUs (with UvA)
● National ICT research infrastructure

COMMIT/

Page 7

Complex data

● Still smaller in volume than astronomy etc.
● Much more complicated, semantically rich data
● Growing fast ...

Page 8

Semantic web

● Make the Web smarter by injecting meaning so that machines can reason about it
  ● Initial idea by Tim Berners-Lee in 2001
● Has now attracted the interest of big IT companies
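
One common way to "inject meaning" is to state facts as subject-predicate-object triples, as RDF does. The sketch below uses plain Python tuples rather than a real RDF library. All identifiers such as `ex:Amsterdam` are made up for illustration, and treating `ex:locatedIn` as transitive is an assumption of this example.

```python
# Facts as (subject, predicate, object) triples. All identifiers are
# hypothetical examples, not taken from a real dataset.
triples = {
    ("ex:VU", "rdf:type", "ex:University"),
    ("ex:VU", "ex:locatedIn", "ex:Amsterdam"),
    ("ex:Amsterdam", "ex:locatedIn", "ex:Netherlands"),
}

def objects(subject: str, predicate: str) -> set[str]:
    """Return every object o such that (subject, predicate, o) is a fact."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}

def located_in(place: str) -> set[str]:
    """Follow ex:locatedIn links transitively (a simple form of reasoning)."""
    found: set[str] = set()
    frontier = [place]
    while frontier:
        current = frontier.pop()
        for parent in objects(current, "ex:locatedIn"):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found

print(sorted(located_in("ex:VU")))
```

The fact that `ex:VU` lies in `ex:Netherlands` is never stated directly; a machine derives it from the two stated links.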

Page 9

WebPIE: a Web-scale Parallel Inference Engine

● Web-scale parallel reasoner doing full materialization
● Orders of magnitude faster than previous work by using smart parallel algorithms
● Jacopo Urbani + Frank van Harmelen (VU)

Christiaan Huygens nomination for Urbani's PhD thesis
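
Full materialization means computing every fact that the rules imply and storing it alongside the input. The sketch below does this on one machine with a naive fixpoint loop over two RDFS-style rules (subclass transitivity and type inheritance). It is not WebPIE's algorithm, which the slides describe only as parallel. The rule selection, class names, and input data are assumptions made for illustration.

```python
Triple = tuple[str, str, str]

def apply_rules(facts: set[Triple]) -> set[Triple]:
    """One round of rule application over all current facts."""
    derived: set[Triple] = set()
    sub = [(s, o) for (s, p, o) in facts if p == "rdfs:subClassOf"]
    typ = [(s, o) for (s, p, o) in facts if p == "rdf:type"]
    # Rule 1: A subClassOf B, B subClassOf C  =>  A subClassOf C
    for a, b in sub:
        for b2, c in sub:
            if b == b2:
                derived.add((a, "rdfs:subClassOf", c))
    # Rule 2: x type A, A subClassOf B  =>  x type B
    for x, a in typ:
        for a2, b in sub:
            if a == a2:
                derived.add((x, "rdf:type", b))
    return derived

def materialize(facts: set[Triple]) -> set[Triple]:
    """Repeat rule application until no new fact appears (a fixpoint)."""
    closure = set(facts)
    while True:
        new = apply_rules(closure) - closure
        if not new:
            return closure
        closure |= new

# Hypothetical input data.
base = {
    ("ex:Professor", "rdfs:subClassOf", "ex:Researcher"),
    ("ex:Researcher", "rdfs:subClassOf", "ex:Person"),
    ("ex:alice", "rdf:type", "ex:Professor"),
}
closure = materialize(base)
print(sorted(closure - base))
```

Every round rescans all facts, which is what makes the naive approach expensive at Web scale.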

Page 10

Reasoning on changing data

● WebPIE must recompute everything if data changes
  ● Takes on the order of 1 day on a 64-node compute cluster
● Challenge: real-time incremental reasoning, combining new (streaming) data & historic data
  ● Nanopublications (http://nanopub.org)
  ● Handling 2 million news articles per day (Piek Vossen, VU)
  ● Data streams from (health) sensors & smart phones
● Exploit massive parallel computing and GPUs
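
One way to avoid recomputing everything is to let only the newly arrived facts drive rule application, joining them against what is already known (the idea behind semi-naive evaluation). The sketch below handles additions only, using two RDFS-style rules (subclass transitivity and type inheritance) as a stand-in rule set. It is an illustration under those assumptions, not the method used in the projects listed above, and it does not cover deletions. All data is invented.

```python
Triple = tuple[str, str, str]
SUB, TYPE = "rdfs:subClassOf", "rdf:type"

def consequences(delta: set[Triple], known: set[Triple]) -> set[Triple]:
    """Facts derivable in one step where at least one premise is in delta."""
    out: set[Triple] = set()
    for (s, p, o) in delta:
        for (s2, p2, o2) in known:
            # Subclass transitivity, with the new fact on either side.
            if p == SUB and p2 == SUB:
                if o == s2:
                    out.add((s, SUB, o2))
                if o2 == s:
                    out.add((s2, SUB, o))
            # Type inheritance: new type fact meets known subclass fact ...
            if p == TYPE and p2 == SUB and o == s2:
                out.add((s, TYPE, o2))
            # ... or new subclass fact meets known type fact.
            if p == SUB and p2 == TYPE and o2 == s:
                out.add((s2, TYPE, o))
    return out

def add_facts(closure: set[Triple], new_facts: set[Triple]) -> set[Triple]:
    """Extend an already materialized closure in place; return what was added."""
    delta = new_facts - closure
    added: set[Triple] = set()
    while delta:
        closure |= delta
        added |= delta
        delta = consequences(delta, closure) - closure
    return added

# Historic data, assumed to be fully materialized already (hypothetical).
historic = {
    ("ex:Professor", SUB, "ex:Researcher"),
    ("ex:alice", TYPE, "ex:Professor"),
    ("ex:alice", TYPE, "ex:Researcher"),
}
# One newly streamed fact.
added = add_facts(historic, {("ex:Researcher", SUB, "ex:Person")})
print(sorted(added))
```

Only joins that involve a new fact are evaluated, so the cost depends on the size of the update and its consequences rather than on rederiving the historic closure.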

Page 11

Other work on complex data

● Use semantic web to describe and reason about computer infrastructure (Cees de Laat, UvA)

● Machine learning using GPUs (Hadoop)
  ● Joint work with Max Welling (UvA)
● Business applications
  ● With Frans Feldberg (VU, Economy)

Page 12

Discussion

● We can process peta-scale (10^15, LHC) simple data with cluster and grid technology
● Exascale (10^18, SKA) may be feasible with GPUs, but requires new parallel programming methodologies
● Processing complex data is vastly more complicated, even at smaller scales
● Complex data is also escalating in size
● Dynamic (streaming) data will be next
● Processing exa-scale dynamic complex data?
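
To get a feel for the gap between the two scales, the sketch below computes how long a single sequential pass over 10^15 and 10^18 units of data would take. The slides do not say what the exponents count; bytes are assumed here. The 1 GB/s per-node read rate and the node counts are assumed round numbers, not figures from the talk.

```python
PETA = 10**15
EXA = 10**18
SECONDS_PER_DAY = 86_400

def scan_days(num_bytes: int, nodes: int,
              bytes_per_sec_per_node: float = 1e9) -> float:
    """Days needed to read num_bytes once, split evenly over nodes."""
    seconds = num_bytes / (nodes * bytes_per_sec_per_node)
    return seconds / SECONDS_PER_DAY

for label, size in (("peta", PETA), ("exa", EXA)):
    for nodes in (1, 1000):
        days = scan_days(size, nodes)
        print(f"{label}-scale, {nodes:>4} node(s): {days:,.2f} days")
```

Under these assumptions an exa-scale input spread over 1000 nodes takes as long to read as a peta-scale input on a single node, which is one way to see why more hardware alone does not close the gap.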