Enabling Near-Data Accelerators in Datacenterspxt176/publications/Enabling Near-Data...Enabling Near-Data Accelerators in Datacenters Dave Ojika, ... DRAM 0.74% 5.74% ... Intel Corporation

2015 Intel Big Data Software Summit, http://goto/bigdatasoftware

Enabling Near-Data Accelerators in Datacenters
Dave Ojika, Jayson Strayer, Gaurav Kaul, Prashanth Thinakaran, Darin Acosta

• Motivation
  • Bring unconventional compute cores (especially FPGAs) into mainstream big data use
  • Abstract away software complexity by introducing an efficient accelerator programming model
  • Enable a data-oriented framework for near-memory, distributed processing

• Approach
  • Accelerate computation on the FPGA; transfer data over the low-latency DDR bus
  • Provide in-memory storage using the open-source Tachyon framework
  • Offload Spark workloads to the accelerator

• Method
  • Use a Compute Near Memory (CNM) architecture for design-space exploration
  • Map data to cores with affinity to specific memory regions
  • Integrate a Java-OpenCL middleware to support scheduling of tasks on the accelerator
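The data-affinity mapping in the Method list can be sketched as follows. This is a minimal Python illustration, not the authors' mapping: it assumes each partition is already tagged with the memory region that holds it and that the cores local to each region are known (`map_partitions_to_cores` and its argument formats are hypothetical). Partitions in a region are spread round-robin over that region's cores, so no core is assigned data held in another region's memory.

```python
from collections import defaultdict


def map_partitions_to_cores(partitions, region_to_cores):
    """Assign each partition to a core local to the memory region holding it.

    partitions: iterable of (partition_id, region_id) pairs (hypothetical format).
    region_to_cores: dict mapping region_id -> list of core ids local to that region.
    Returns a dict mapping core id -> list of partition ids.
    """
    assignment = defaultdict(list)
    cursor = defaultdict(int)  # per-region round-robin position
    for pid, region in partitions:
        cores = region_to_cores[region]
        core = cores[cursor[region] % len(cores)]
        cursor[region] += 1
        assignment[core].append(pid)
    return dict(assignment)


parts = [("p0", 0), ("p1", 0), ("p2", 1), ("p3", 0)]
local_cores = {0: ["c0", "c1"], 1: ["c2"]}
print(map_partitions_to_cores(parts, local_cores))
# {'c0': ['p0', 'p3'], 'c1': ['p1'], 'c2': ['p2']}
```

Round-robin within a region is only one balancing choice; a real mapper might also weight cores by current load.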

Highlights
Accelerator Overview

• Memory-speed data access
• Memory-centric buffer that synchronizes with the underlying file system
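One way to read "memory-centric buffer that synchronizes with the underlying file system" is as a write-through cache. The sketch below assumes that policy (the poster does not say which one is used) and uses a plain dict as a hypothetical stand-in for the file system; `MemoryCentricBuffer` is an illustrative name.

```python
class MemoryCentricBuffer:
    """In-memory buffer kept in sync with a backing store (write-through policy assumed)."""

    def __init__(self, backing_store):
        self.backing = backing_store  # stand-in for the underlying file system
        self.cache = {}               # memory-resident copies, served at memory speed

    def write(self, path, data):
        self.cache[path] = data
        self.backing[path] = data     # write-through: the file system never lags the buffer

    def read(self, path):
        if path not in self.cache:    # miss: load once from the file system
            self.cache[path] = self.backing[path]
        return self.cache[path]


fs = {"/data/old.csv": b"1,2,3"}
buf = MemoryCentricBuffer(fs)
buf.write("/data/new.csv", b"4,5,6")
print(fs["/data/new.csv"], buf.read("/data/old.csv"))
```

Write-through pays the file-system latency on every write in exchange for never holding unsynchronized data in memory; an asynchronous write-back policy would make the opposite trade.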

[Figure: Accelerator interface. Labels: Connect object; Write, Read, and Copy methods; Cmd register (write_bit, read_bit, copy_bit); Data register (4 TB image); interface; R, W, and R/W access.]
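The figure's labels suggest a host-side Connect object whose Write, Read, and Copy methods each raise one command bit (write_bit, read_bit, copy_bit) and act on the data image. The Python sketch below models that behaviour under loud assumptions: the bit encodings, the method signatures, and the small bytearray standing in for the 4 TB image are all hypothetical and not taken from the poster.

```python
WRITE_BIT, READ_BIT, COPY_BIT = 0x1, 0x2, 0x4  # hypothetical command-register encodings


class Connect:
    """Hypothetical host handle: each method sets one command bit, then acts on the data image."""

    def __init__(self, image_size):
        self.image = bytearray(image_size)  # small stand-in for the 4 TB data image
        self.cmd = 0                        # stand-in for the command register
        self.issued = []                    # history of issued commands, for inspection

    def _check(self, start, length):
        if start < 0 or length < 0 or start + length > len(self.image):
            raise ValueError("access outside the data image")

    def _issue(self, bit):
        self.cmd = bit
        self.issued.append(bit)

    def write(self, offset, data):
        self._check(offset, len(data))
        self._issue(WRITE_BIT)
        self.image[offset:offset + len(data)] = data

    def read(self, offset, length):
        self._check(offset, length)
        self._issue(READ_BIT)
        return bytes(self.image[offset:offset + length])

    def copy(self, src, dst, length):
        # Device-side copy: data moves within the image without passing through the host.
        self._check(src, length)
        self._check(dst, length)
        self._issue(COPY_BIT)
        self.image[dst:dst + length] = self.image[src:src + length]


conn = Connect(16)
conn.write(0, b"abcd")
conn.copy(0, 8, 4)
print(conn.read(8, 4))  # b'abcd'
```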

Workload Analysis

In-Memory Framework

Data and Compute Layer
Current Developments

• Boosted Decision Tree (BDT)
  • Latency-sensitive
  • Poor data locality
  • Fits in 4 TB memory
  • 7-fold cross-validation
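The 7-fold cross-validation noted above splits the events into seven folds and holds each fold out once for testing. A minimal sketch of the index split, assuming contiguous, unshuffled folds (in practice events would usually be shuffled first); `k_fold_indices` is an illustrative name.

```python
def k_fold_indices(n, k=7):
    """Yield (train, test) index lists for k-fold cross-validation over n samples.

    Folds are contiguous and their sizes differ by at most one.
    """
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


folds = list(k_fold_indices(20))
print([len(test) for _, test in folds])  # [3, 3, 3, 3, 3, 3, 2]
```

Each event is tested exactly once and trained on in the other six folds, so every model sees about 6/7 of the data.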

Hits by memory level and dataset size (quad-core i7 CPU, 8 GB RAM):

Level   1 GB     10 GB
L1      94.06%   89.75%
DRAM    0.74%    5.74%

• The fraction of store-bound stalls increases with dataset size; the memory bandwidth requirement is too high for the CPU

The workload can be trivially parallelized across DIMMs.
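Because every event is scored independently, BDT inference partitions cleanly: each DIMM-local partition can be scored on its own and the results concatenated. The Python sketch below illustrates this with a toy boosted ensemble (the score is the sum of the tree outputs, and a positive score is read as Higgs); the tree encoding, the `score_partitioned` helper, and the sign threshold are assumptions. Threads stand in for per-DIMM compute units, so the example shows the independence of partitions rather than a real speedup (CPython threads do not run Python code in parallel).

```python
from concurrent.futures import ThreadPoolExecutor


def eval_tree(node, x):
    """Walk one tree: an internal node is (feature, threshold, left, right); a leaf is a float."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if x[feature] < threshold else right
    return node


def score(ensemble, x):
    # Boosting: the ensemble's output is the sum of all tree outputs.
    return sum(eval_tree(tree, x) for tree in ensemble)


def score_partitioned(ensemble, events, n_partitions):
    """Score events in contiguous partitions (one per DIMM, hypothetically) and concatenate."""
    if not events:
        return []
    size = -(-len(events) // n_partitions)  # ceiling division
    chunks = [events[i:i + size] for i in range(0, len(events), size)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        per_chunk = pool.map(lambda chunk: [score(ensemble, x) for x in chunk], chunks)
        return [s for scores in per_chunk for s in scores]


ensemble = [(0, 0.5, -1.0, 1.0), (1, 2.0, -0.5, 0.5)]
events = [[0.2, 1.0], [0.9, 3.0], [0.7, 1.5]]
scores = score_partitioned(ensemble, events, n_partitions=2)
labels = ["Higgs" if s > 0 else "No-Higgs" for s in scores]
print(scores, labels)
```

Since `pool.map` preserves chunk order, the partitioned result matches a sequential pass over the events.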

• 1st Place, ATLAS ’14 Higgs ML Challenge: Deep Learning from Oxdata’s H2O
• Where do FPGA accelerators stand?
  • Explore BDT on the CNM accelerator

High-energy physics experiment at CERN’s LHC (collaboration with UF Physics)

• Simics simulator
  • Functional model
  • Software stack
  • Apps and workload exploration

[Figure: Software stack. Labels: tasks; host middleware (queue, scheduler); driver; FPGA; Tachyon; file system (local or HDFS).]
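The queue and scheduler in the host middleware can be pictured as a FIFO of pending tasks drained onto whichever accelerator slot is idle. This is a hypothetical Python sketch of that policy only; the class and method names are illustrative, and it does not model JOC, OpenCL command queues, or the driver.

```python
from collections import deque


class Scheduler:
    """Hypothetical host-middleware scheduler: FIFO task queue, first-free-device dispatch."""

    def __init__(self, n_devices):
        self.pending = deque()               # tasks waiting for an accelerator
        self.idle = deque(range(n_devices))  # ids of idle accelerator slots
        self.running = {}                    # device id -> task currently executing

    def submit(self, task):
        self.pending.append(task)
        self._dispatch()

    def complete(self, device):
        """Called when a device finishes; frees it and dispatches the next task."""
        del self.running[device]
        self.idle.append(device)
        self._dispatch()

    def _dispatch(self):
        while self.pending and self.idle:
            self.running[self.idle.popleft()] = self.pending.popleft()


sched = Scheduler(n_devices=1)
sched.submit("train-fold-1")
sched.submit("train-fold-2")
print(sched.running, list(sched.pending))  # {0: 'train-fold-1'} ['train-fold-2']
sched.complete(0)
print(sched.running, list(sched.pending))  # {0: 'train-fold-2'} []
```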

• In-memory data exchange
• Reliable file sharing at memory speed
• Caching of working-set files in memory
• Fault-tolerant and distributed

API

Tachyon uses memory aggressively and leverages data lineage for fault tolerance.
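Lineage lets an in-memory store recover lost data by recomputation rather than by keeping replicas: it records how each file was produced and reruns that step when the in-memory copy disappears. A toy Python sketch of the idea, not Tachyon's API; `LineageStore` and its methods are hypothetical, and source data is assumed to stay available (in a real deployment it would be persisted to the underlying file system).

```python
class LineageStore:
    """Toy store that records how each derived file was produced so it can be recomputed."""

    def __init__(self):
        self.memory = {}   # path -> data currently held in memory
        self.lineage = {}  # path -> (function, input paths) that produced it
        self.recomputed = 0

    def put(self, path, data):
        self.memory[path] = data  # source data: assumed to stay available

    def derive(self, path, fn, inputs):
        self.lineage[path] = (fn, inputs)
        self.memory[path] = fn(*(self.get(p) for p in inputs))

    def evict(self, path):
        self.memory.pop(path, None)  # simulate losing the in-memory copy

    def get(self, path):
        if path not in self.memory:
            fn, inputs = self.lineage[path]  # replay the recorded step
            self.memory[path] = fn(*(self.get(p) for p in inputs))
            self.recomputed += 1
        return self.memory[path]


store = LineageStore()
store.put("raw", [3, 1, 2])
store.derive("sorted", sorted, ["raw"])
store.evict("sorted")
print(store.get("sorted"), store.recomputed)  # [1, 2, 3] 1
```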

• OpenCL driver integration

• Container enablement

• Cloud orchestration

• NVM support and NFV

Compute Near Memory (CNM)

[Figure: CNM stack. Labels: application; big data framework; APIs; JOC; classifier output: Higgs or no-Higgs.]

Prototype with PCI, DDR, and Direct I/O interfaces

JOC: Java-to-OpenCL Component

• BDT on CPU (2nd Place, ATLAS ’14 Higgs ML Challenge)

Application to architecture transformation

• Utilize parallelism on the FPGA
• Leverage low-latency DDR and 100 Gb/s optical links

[Figure: Level 1 data path. Labels: 100 Gb/s transceiver IP, FIFO, decoder (data filter), data reassembly.]

Level 1: FPGA receives and pre-processes data in real time
• Direct I/O (QSFP)
• Real-time
• Low latency
• Low power

Level 2: FPGA as a high-performance accelerator
• Compute engine
• Up to 3 TFLOPS
• OpenCL kernel
• Host-memory bandwidth

Altera Arria 10 FPGA

Generic implementation
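The Level 1 path (FIFO, a decoder acting as a data filter, then data reassembly) can be modelled in a few lines. The sketch below is a hypothetical software model, not the FPGA design: it assumes each packet carries (event_id, seq, total, payload) and that the filter decides per event which data to keep.

```python
from collections import defaultdict, deque


def level1_pipeline(packets, keep):
    """Model FIFO -> decoder (data filter) -> data reassembly.

    packets: iterable of hypothetical (event_id, seq, total, payload) tuples.
    keep: predicate on event_id standing in for the decoder's filter.
    Returns (event_id, payload) pairs for every event whose fragments all arrived.
    """
    fifo = deque(packets)        # buffers transceiver output in arrival order
    partial = defaultdict(dict)  # event_id -> {seq: payload} collected so far
    events = []
    while fifo:
        event_id, seq, total, payload = fifo.popleft()
        if not keep(event_id):   # decoder drops filtered-out data early
            continue
        partial[event_id][seq] = payload
        if len(partial[event_id]) == total:  # all fragments present: reassemble in order
            fragments = partial.pop(event_id)
            events.append((event_id, b"".join(fragments[i] for i in range(total))))
    return events


packets = [(1, 1, 2, b"cd"), (2, 0, 1, b"xy"), (1, 0, 2, b"ab")]
print(level1_pipeline(packets, keep=lambda event_id: event_id != 2))  # [(1, b'abcd')]
```

In this model, filtering before reassembly means rejected events never occupy reassembly state.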

[Figure: Data flow to the datastore. Labels: DRAM (three modules), pre-processed dataset, datastore, synchronize.]

• In-memory data store
  • Memory-centric distributed storage
  • Reliable data sharing at memory speed

Development kit

Cloud Orchestration

Training time for 11 million events: 5 hours!

Xeon E5-2680 @ 2.8 GHz

BDT on MATLAB

• Prediction time: 370 ms
  • Acceptable for online, real-time prediction
• Training time: 5 hours
  • Grows with increasing data size
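For scale, the quoted training run works out to roughly 611 events per second, assuming the 5 hours is wall-clock time for all 11 million events:

```latex
\frac{11 \times 10^{6}\ \text{events}}{5\ \text{h} \times 3600\ \text{s/h}}
  = \frac{11 \times 10^{6}\ \text{events}}{18\,000\ \text{s}}
  \approx 611\ \text{events/s}
```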

• Data affinity
  • Cores cooperate with each other on shared data accesses

• Shared Virtual Memory (SVM)
  • Accelerator model: the CPU and the device both access shared data using the same virtual addresses
  • No explicit data marshaling

Dave Ojika: Cloud Infrastructure
Jayson Strayer: Platform Silicon
Gaurav Kaul: Health and Life Sciences
Prashanth Thinakaran: Big Data
Darin Acosta: Physics Professor, UF
