Modeling Big Data
Mark Nash
Dec 29, 2015
Transcript
Page 1:

Modeling Big Data

• Execution speed limited by:
  – Model complexity
  – Software efficiency
  – Spatial and temporal extent and resolution
  – Data size & access speed
  – Hardware performance

Page 2:

Data Access

[Diagram: data access hierarchy, fastest to slowest: ALU and cache (static RAM) inside the CPU, then dynamic RAM, then hard disk and network.]
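To make the hierarchy concrete, here is a minimal Python sketch (NumPy and the scratch file name are assumptions for illustration) that sums the same array once from RAM and once after a round trip through a file on disk. Absolute numbers depend on the hardware and on the operating system's file cache; only the relative gap matters.

    import os
    import time
    import numpy as np

    n = 20_000_000                      # ~160 MB of float64 values
    data = np.random.rand(n)

    # Pass 1: data already resident in RAM
    t0 = time.perf_counter()
    ram_total = data.sum()
    ram_time = time.perf_counter() - t0

    # Pass 2: same data written to disk, then read back and summed
    data.tofile("scratch.bin")
    t0 = time.perf_counter()
    disk_total = np.fromfile("scratch.bin", dtype=np.float64).sum()
    disk_time = time.perf_counter() - t0
    os.remove("scratch.bin")

    print(f"RAM pass:  {ram_time:.3f} s")
    print(f"Disk pass: {disk_time:.3f} s (may be flattered by the OS file cache)")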

Page 4:

Vector or Array Computing

• Supercomputers of the ’60s, ’70s, and ’80s

• Harvard Architecture:
  – Separate program and data
  – 2^n processors execute the same program on different data

• Vector arithmetic

• Limited flexibility
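A minimal Python/NumPy sketch of the vector-arithmetic idea (NumPy is an assumption for illustration, not something named on the slide): one operation is applied across an entire array of data at once instead of looping element by element.

    import numpy as np

    # Two data vectors; the same operation is applied to every element at once.
    a = np.arange(100_000, dtype=np.float64)
    b = np.arange(100_000, dtype=np.float64)

    # Vectorized form: one expression over the whole array
    c = a + 2.0 * b

    # Equivalent scalar loop (much slower in pure Python)
    d = np.empty_like(a)
    for i in range(a.size):
        d[i] = a[i] + 2.0 * b[i]

    assert np.allclose(c, d)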

Page 5:

Von Neumann Architecture

• Instructions and data share memory

• More flexible

• Allows for one task to be divided into many processes and executed individually

[Diagram: a CPU containing the ALU and cache, connected to RAM and I/O; instructions and data share the same memory.]

Page 6:

Applications and Services

Page 7:

Multiprocessing

• Multiple processors (CPUs) working on the same task

• Processes
  – Applications: have a UI
  – Services: run in the background
  – Can be executed individually
  – See “Task Manager”

• Processes can have multiple “threads”
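A minimal Python sketch of multiprocessing (the function name and workload are hypothetical): multiprocessing.Pool starts several worker processes, each of which appears as its own entry in Task Manager.

    from multiprocessing import Pool

    def model_piece(piece_id):
        """Stand-in for one independent piece of a larger modeling task."""
        return piece_id, sum(i * i for i in range(100_000))

    if __name__ == "__main__":
        # Four worker processes; each shows up as a separate process in Task Manager.
        with Pool(processes=4) as pool:
            results = pool.map(model_piece, range(8))
        print(results)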

Page 9:

Threads

• A process can have many threads
  – ArcGIS can now have two: one for the GUI and one for a geoprocessing task

• Each thread obtains a portion of the CPU cycles

• Must “sleep” or they can lock up

• Share access to memory, disk, I/O
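A minimal Python threading sketch of these points (the counter workload is hypothetical): the threads share the process's memory, so the shared value is guarded with a lock, and each thread sleeps briefly to yield the CPU instead of spinning.

    import threading
    import time

    counter = 0
    lock = threading.Lock()

    def worker():
        global counter
        for _ in range(5):
            with lock:          # threads share memory, so guard the update
                counter += 1
            time.sleep(0.01)    # "sleep" so other threads get CPU time

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(counter)  # 4 threads x 5 increments = 20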

Page 10:

Distributed Processing

• Task must be broken up into processes that can be run independently or sequentially

• Typically:
  – Command line-driven
  – Scripts or compiled programs
  – R, Python, C++, Java, PHP, etc.
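As a minimal sketch of this command-line style (the worker script process_chunk.py and its argument are hypothetical), a small Python driver can launch one independent process per chunk of the task:

    import subprocess

    # One independent chunk of work per command-line invocation.
    chunks = ["tile_01", "tile_02", "tile_03", "tile_04"]

    procs = [
        subprocess.Popen(["python", "process_chunk.py", chunk])
        for chunk in chunks
    ]

    # Wait for every independent process to finish.
    for p in procs:
        p.wait()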

Page 11:

Distributed Processing

• Grid – distributed computing

• Beowulf – a cluster built from lots of simple computer boards (motherboards)

• Condor – software to share idle time on computers

• “The Cloud?” – web-based “services”; should allow submission of processes in the future

Page 12:

Trends

• Processor clock speeds are no longer increasing

• The internet is not getting faster

• RAM continues to decrease in price

• Hard disks continue to increase in size
  – Solid-state drives (SSDs) available

• Number of “Cores” continues to increase

Page 13:

Future Computers?

• 128k cores, lots of “cache”

• Multi-terabyte RAM

• Terabyte SSD Drives

• Hundreds-of-terabyte hard disks?

• Allows for:
  – Large datasets in RAM (multi-terabyte)
  – Even larger datasets on “hard disks”
  – Lots of tasks to run simultaneously

Page 14:

Reality Check

• Whether through local processing or distributed processing, we will need to “parallelize” spatial analysis in the future to manage:
  – Larger datasets
  – Larger modeling extents and finer resolution
  – More complex models

• Desire:
  – Break processing up into “chunks” that can each be executed somewhat independently of one another (see the sketch below)
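A minimal Python sketch of that chunking idea for a spatial extent (the helper function and the 4 x 4 split are illustrative assumptions): the study area is divided into tile bounding boxes that can each be handed to an independent process.

    def tile_extent(xmin, ymin, xmax, ymax, nx, ny):
        """Split a rectangular extent into nx * ny independent tile bounding boxes."""
        dx = (xmax - xmin) / nx
        dy = (ymax - ymin) / ny
        return [
            (xmin + i * dx, ymin + j * dy, xmin + (i + 1) * dx, ymin + (j + 1) * dy)
            for j in range(ny)
            for i in range(nx)
        ]

    # Example: a 100 km x 100 km study area split into 16 chunks
    tiles = tile_extent(0, 0, 100_000, 100_000, 4, 4)
    print(len(tiles), tiles[0])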

Page 15:

Challenge

• Having all the software you need on the computer executing the task
  – Virtual application: an entire computer disk image is sent to another computer
  – All required software installed

• Often easier to manage your own cluster
  – Programs installed “once”
  – Shared hard disk access
  – Communication between threads

Page 16:

Software

• ArcGIS: installation, licensing, and processing requirements make it almost impossible to use

• Quantum GIS, GRASS: installation makes them challenging to use

• FWTools, C++ applications

• Use standard language libraries and functions to avoid compatibility problems

Page 17:

Data Issues

• Break data along natural lines:
  – Different species
  – Different time slices

• Window spatial data
  – Oversized windows

• Vector data: size typically not an issue

• Raster data: size is an issue

Page 18:

Windowing Spatial Data

• Raster arithmetic is natural
  – Each pixel result is only dependent on one pixel in the source raster

[Example: adding two 2 x 2 rasters pixel by pixel]

    1  2      12   9      13  11
    2  3  +   13  10  =   15  13
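The same example in Python/NumPy (an illustrative assumption): because each output pixel depends only on the corresponding input pixels, the rasters could equally be split into tiles and added tile by tile.

    import numpy as np

    a = np.array([[1, 2],
                  [2, 3]])
    b = np.array([[12,  9],
                  [13, 10]])

    # Element-wise addition: each output pixel comes from one pixel in each input.
    print(a + b)
    # [[13 11]
    #  [15 13]]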

Page 19:

Windowing Spatial Data

• N x N filters:
  – Need to use oversized windows (each tile must carry extra rows and columns from neighboring tiles)

[Example 5 x 5 raster (rows x columns):]

    12  20  23  34  40
    15  23  30  31  39
    15  22  29  30  40
    14  20  28  29  38
    13  19  25  32  37
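A minimal Python/NumPy sketch of why the window must be oversized (the 3 x 3 mean filter and the plain-loop implementation are illustrative assumptions): to filter the central 3 x 3 block of the raster above, a worker needs the full 5 x 5 window, i.e. one extra row and column of "halo" pixels on every side.

    import numpy as np

    window = np.array([[12, 20, 23, 34, 40],
                       [15, 23, 30, 31, 39],
                       [15, 22, 29, 30, 40],
                       [14, 20, 28, 29, 38],
                       [13, 19, 25, 32, 37]], dtype=float)

    def mean_filter_3x3(win):
        """Apply a 3x3 mean filter; output shrinks by the 1-pixel halo on each side."""
        rows, cols = win.shape
        out = np.empty((rows - 2, cols - 2))
        for r in range(out.shape[0]):
            for c in range(out.shape[1]):
                out[r, c] = win[r:r + 3, c:c + 3].mean()
        return out

    # The 5x5 oversized window yields the filtered values for its central 3x3 block.
    print(mean_filter_3x3(window))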

Page 20:

Windowing Spatial Data

• Others are problematic:
  – Viewsheds
  – Stream networks
  – Spatial simulations
