Tao Zhong, Kshitij A. Doshi, Xi Tang, Ting Lou, Zhongyan Lu, Hong Li
Presented by: Raminder Kaur, Wayne State University

A big-data architecture for real-time analytics

Jun 29, 2015



On mixing high-speed updates and in-memory queries: A big-data architecture for real-time analytics
Page 2: A big-data architecture for real-time analytics

- Introduction
- Motivation and Background
- Architecture
- Framework
- Result
- Future work
- Conclusion
- Index term
- References

Wayne State University

Page 3

This paper describes:
- a few key additional requirements that arise from having to support in-memory processing of data while updates proceed concurrently;
- RAF (Real-time Analytics Foundation), an architectural approach that addresses these requirements;
- two RAF-based solutions (discussed later).

Page 4

A few examples of information in motion, which may be only seconds old and not yet well categorized or linked to other data:

- GPS-based navigation: to reduce wasted energy, accidents, delays, and emergencies.

- A credit card company: to detect and intercept suspicious transactions.

- A metropolitan or regional power grid: to modulate power generation, perform load balancing, direct repair actions, and take policy-enforcement steps.

An essential feature of these examples is the need to integrate new transactions into analysis results within a very short time, sometimes as little as a few tens of milliseconds.
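As a rough illustration of why this latency budget favors incremental processing, the Python sketch below folds each credit card transaction into a running per-card aggregate as it arrives, so checking a new transaction against the card's history costs constant time instead of a rescan. The class names, the three-times-average rule, and the `min_history` parameter are hypothetical and are not taken from RAF.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CardStats:
    """Running aggregate for one card (hypothetical schema)."""
    count: int = 0
    total: float = 0.0

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


class IncrementalMonitor:
    """Integrates each transaction into the analysis state on arrival.

    Hypothetical rule, for illustration only: once a card has at least
    `min_history` transactions, flag any amount above `factor` times
    that card's running average.
    """

    def __init__(self, factor: float = 3.0, min_history: int = 3) -> None:
        self.stats = defaultdict(CardStats)
        self.factor = factor
        self.min_history = min_history

    def ingest(self, card: str, amount: float) -> bool:
        s = self.stats[card]
        suspicious = s.count >= self.min_history and amount > self.factor * s.mean
        # Constant-time update: no rescan of past transactions is needed.
        s.count += 1
        s.total += amount
        return suspicious


monitor = IncrementalMonitor()
for amount in (20.0, 25.0, 30.0):
    monitor.ingest("card-1", amount)
print(monitor.ingest("card-1", 500.0))  # True, since 500 > 3 * 25
```

The design choice is to keep analysis state that can absorb one event at a time; a batch job that re-aggregated all history per transaction could not meet a budget of tens of milliseconds as data grows.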

Page 5

RDDs make in-memory solutions less failure-prone. RAF extends the RDD approach so that resiliency is blended with a few additional characteristics:

• Efficient allocation and control of memory resources

• Resilient update of information at much finer resolution

• Flexible and highly efficient concurrency control

• Replication and partitioning of data transparent to clients
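One generic way to combine fine-grained updates with low-overhead concurrency control is optimistic, per-record versioning: a writer reads a record and its version, computes the new value, and commits only if the version is unchanged, retrying otherwise. The sketch below assumes this technique for illustration; the source does not say that RAF uses it, and `VersionedStore`, `compare_and_set`, and `increment` are hypothetical names.

```python
import threading
from typing import Any, Dict, Tuple


class VersionedStore:
    """Per-record values tagged with version numbers (hypothetical API).

    The lock guards only the brief check-and-swap, so writers never hold
    it while computing; a conflicting writer simply retries.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, Any]] = {}
        self._lock = threading.Lock()

    def read(self, key: str) -> Tuple[int, Any]:
        with self._lock:
            return self._data.get(key, (0, None))

    def compare_and_set(self, key: str, expected_version: int, value: Any) -> bool:
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False  # another writer committed first
            self._data[key] = (version + 1, value)
            return True


def increment(store: VersionedStore, key: str) -> None:
    """Optimistic retry loop: re-read and re-apply until the commit wins."""
    while True:
        version, value = store.read(key)
        if store.compare_and_set(key, version, (value or 0) + 1):
            return


store = VersionedStore()


def worker() -> None:
    for _ in range(1000):
        increment(store, "balance")


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.read("balance"))  # (4000, 4000): no update was lost
```

Because conflicts are detected per record, writers touching different keys never interfere, which is the sense in which this control is "fine-grained".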

Architecturally, RAF elevates memory across an entire cluster to a first-class storage entity and defines high-level mechanisms by which applications on RAF can orchestrate distributed actions on objects stored in cluster memory.

To promote responsible and transparent use of memory, RAF opts for programming languages such as C and C++ over mixed-language environments in which garbage collection is opaque.

Page 6

Data has a lot of value when mined. As data continues to compound at brisk rates, institutions need to grapple with two broad demands:
- accumulating, processing, synopsizing, and utilizing information in a timely manner;
- storing the refined data resiliently and keeping it accessible at high speed.

The term Big Data itself is elastic: it serves well as a description of the scale or volume of these solutions, but it does not define a constraining principle for organizing storage.

Page 7

Requirements for low-latency, high-throughput analytics on datasets:
- In-memory structures and storage
- Resiliency
- Sharing data through memory
- Uniform interaction with storage
- Minimizing memory recycling
- Efficient integration of CRUD
- Synchronizing efficiently
- Searching efficiently
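The requirement to minimize memory recycling can be pictured as a pool of preallocated, fixed-size slots that are reused through a free list rather than freed and reallocated. In C or C++ this would typically be a preallocated arena; the Python sketch below only models the bookkeeping, and `SlotPool` and its methods are hypothetical names.

```python
from typing import List


class SlotPool:
    """Fixed-capacity pool of preallocated record slots (hypothetical).

    All storage is allocated once up front; released slots go on a free
    list and are handed out again, so steady-state operation performs
    no new allocations for records.
    """

    def __init__(self, capacity: int, record_size: int) -> None:
        # One contiguous buffer, carved into equal-sized slots.
        self.buffer = bytearray(capacity * record_size)
        self.record_size = record_size
        self.free: List[int] = list(range(capacity - 1, -1, -1))

    def acquire(self) -> int:
        if not self.free:
            raise MemoryError("pool exhausted")
        return self.free.pop()

    def release(self, slot: int) -> None:
        self.free.append(slot)

    def view(self, slot: int) -> memoryview:
        start = slot * self.record_size
        return memoryview(self.buffer)[start:start + self.record_size]


pool = SlotPool(capacity=2, record_size=8)
a = pool.acquire()
pool.view(a)[:5] = b"hello"
pool.release(a)
b = pool.acquire()  # the released slot is reused, not reallocated
print(a == b)  # True
```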

Page 9

Translation of the eight requirements into five design elements:
- C- and C++-based programming for efficient sharing of data through memory
- Resilient storing of new content
- Efficient concurrency
- Processing information in motion
- Fast, general, ad-hoc searches

Page 10

This framework targets the execution of complex queries at very low latency.

Information upon which queries operate may be available on some storage medium, or generated dynamically as a result of ongoing transactional activities.

RAF provides a distributed computing environment integrated with a memory-centric, distributed storage system, through which one application can pass data to another by sharing it in memory.
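On a single host, the idea of one application passing data to another through memory can be approximated with Python's standard `multiprocessing.shared_memory` module: a producer writes a record into a named segment, and a consumer attaches by name and reads it in place. This is a stand-in, not RAF's cluster-wide mechanism; the record layout is a hypothetical example, and both sides run in one process here only to keep the sketch short (normally the consumer would be a separate process).

```python
import struct
from multiprocessing import shared_memory

# Hypothetical record layout: little-endian int64 id, float64 value.
RECORD = struct.Struct("<qd")

# Producer: create a named segment and write one record into it.
producer = shared_memory.SharedMemory(create=True, size=RECORD.size)
RECORD.pack_into(producer.buf, 0, 42, 3.5)

# Consumer: attach to the same segment by name and read the record in
# place, without serializing it through a socket or a file.
consumer = shared_memory.SharedMemory(name=producer.name)
record_id, value = RECORD.unpack_from(consumer.buf, 0)
print(record_id, value)  # 42 3.5

consumer.close()
producer.close()
producer.unlink()  # the creator frees the segment once everyone is done
```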

Page 11

RDD: a dataset held in the memory of one or more machines, designed so that if one or more of those machines fail, the RDD can be reconstructed.

Transformations: operations on an RDD that generate new datasets. RAF transformations include join, map, and union.

Filter: a particular type of transformation that produces a dataset whose contents satisfy a specified condition.
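The interplay of RDDs, transformations, and filters can be sketched as a toy dataset that stores its partitions together with the lineage (parent plus function) used to produce them, so a lost partition is recomputed rather than restored. `SketchRDD` and its methods are hypothetical and do not reflect RAF's actual API; join is omitted for brevity.

```python
from typing import Callable, Dict, List


class SketchRDD:
    """Toy resilient dataset (hypothetical, not RAF's API).

    Each dataset keeps a per-partition compute function that closes over
    its parent, i.e. its lineage. A lost partition is rebuilt by calling
    that function again instead of being restored from a copy.
    """

    def __init__(self, num_partitions: int, compute: Callable[[int], list]) -> None:
        self.num_partitions = num_partitions
        self._compute = compute
        self._cache: Dict[int, list] = {}

    @staticmethod
    def parallelize(data: list, num_partitions: int) -> "SketchRDD":
        # The source chunks stand in for stable storage that can be re-read.
        chunks = [data[i::num_partitions] for i in range(num_partitions)]
        return SketchRDD(num_partitions, lambda i: list(chunks[i]))

    def partition(self, i: int) -> list:
        if i not in self._cache:
            self._cache[i] = self._compute(i)  # (re)compute from lineage
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        """Simulate losing the machine that held partition i."""
        self._cache.pop(i, None)

    def map(self, f: Callable) -> "SketchRDD":
        return SketchRDD(self.num_partitions,
                         lambda i: [f(x) for x in self.partition(i)])

    def filter(self, pred: Callable) -> "SketchRDD":
        return SketchRDD(self.num_partitions,
                         lambda i: [x for x in self.partition(i) if pred(x)])

    def union(self, other: "SketchRDD") -> "SketchRDD":
        n = self.num_partitions
        return SketchRDD(n + other.num_partitions,
                         lambda i: self.partition(i) if i < n else other.partition(i - n))

    def collect(self) -> List:
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


nums = SketchRDD.parallelize(list(range(10)), 2)
even_squares = nums.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(even_squares.collect())  # [0, 4, 16, 36, 64]

even_squares.lose_partition(0)
print(even_squares.collect())  # [0, 4, 16, 36, 64], rebuilt from lineage

combined = even_squares.union(nums.filter(lambda x: x > 7))
print(combined.collect())  # [0, 4, 16, 36, 64, 8, 9]
```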

Delegate: a bridging module whose purpose is to capture a version of the datastore as of a particular time and present it as a memory-resident RDD.
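A delegate's job of presenting the datastore as of a particular time can be modeled with a multi-version store: each key keeps a timestamped history, and a snapshot at time `ts` freezes the latest version at or before `ts` into a read-only mapping that later updates do not disturb. `MultiVersionStore` and `snapshot` are hypothetical names in a minimal sketch, not the Delegate implementation.

```python
import bisect
from types import MappingProxyType
from typing import Dict, List, Mapping, Tuple


class MultiVersionStore:
    """Datastore keeping a timestamped history per key (hypothetical API)."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Tuple[int, object]]] = {}

    def put(self, key: str, value: object, ts: int) -> None:
        # Assumes timestamps for a given key arrive in increasing order.
        self._history.setdefault(key, []).append((ts, value))

    def snapshot(self, ts: int) -> Mapping[str, object]:
        """Freeze the store as of time ts into a read-only, in-memory view.

        Later puts append new versions and never touch the frozen dict,
        so analytics can read the snapshot while updates continue.
        """
        frozen = {}
        for key, versions in self._history.items():
            # Position of the last version written at or before ts.
            i = bisect.bisect_right([t for t, _ in versions], ts) - 1
            if i >= 0:
                frozen[key] = versions[i][1]
        return MappingProxyType(frozen)


store = MultiVersionStore()
store.put("sensor-1", 10, ts=1)
store.put("sensor-2", 7, ts=2)
snap = store.snapshot(ts=2)
store.put("sensor-1", 99, ts=3)    # update arrives after the snapshot
print(dict(snap))                  # {'sensor-1': 10, 'sensor-2': 7}
print(dict(store.snapshot(ts=3)))  # {'sensor-1': 99, 'sensor-2': 7}
```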

Page 13

- Efficient storage sharing using Delegate
- Memory-centric storage operation
  - Reliability
- Data and storage types
  - Structured data
  - Storage types (replicated store and partitioned store)
- Distributed execution of analytics tasks
  - Analytics tasks interface
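The two storage types differ in data placement: a partitioned store keeps each key on exactly one node, while a replicated store keeps a full copy on every node. The sketch below puts both behind the same `put`/`get` calls, so clients need not know where data lives. All class names are hypothetical, and hash-modulo placement is a simplifying assumption; production systems often use consistent hashing instead.

```python
import hashlib
from typing import Dict, List, Optional


class Node:
    """One cluster member's in-memory data (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, object] = {}


def stable_hash(key: str) -> int:
    # Python's built-in hash() of str is salted per process, so use a
    # digest that every client computes identically.
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class PartitionedStore:
    """Each key lives on exactly one node, chosen by hashing the key."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes

    def _owner(self, key: str) -> Node:
        return self.nodes[stable_hash(key) % len(self.nodes)]

    def put(self, key: str, value: object) -> None:
        self._owner(key).data[key] = value

    def get(self, key: str) -> Optional[object]:
        return self._owner(key).data.get(key)


class ReplicatedStore:
    """Every node holds a full copy: writes go to all, reads to any one."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes

    def put(self, key: str, value: object) -> None:
        for node in self.nodes:
            node.data[key] = value

    def get(self, key: str, preferred: int = 0) -> Optional[object]:
        return self.nodes[preferred].data.get(key)


p_nodes = [Node(f"p{i}") for i in range(3)]
partitioned = PartitionedStore(p_nodes)
for i in range(30):
    partitioned.put(f"subscriber-{i}", i)
print(sum(len(n.data) for n in p_nodes))  # 30: each key is stored once

r_nodes = [Node(f"r{i}") for i in range(3)]
replicated = ReplicatedStore(r_nodes)
replicated.put("tariff-plan", "gold")
print([n.data["tariff-plan"] for n in r_nodes])  # ['gold', 'gold', 'gold']
```

The trade-off shows directly: partitioning spreads capacity across nodes, while replication costs a write to every node but lets any node serve reads and survive the loss of the others.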

Page 16

Unit testing:
- Scalability testing results (how well update operations scale)
- Latency relative to Hive/HDFS (how long it takes to complete a query)
Note: these unit test results show the advantage of RAF's in-memory, distributed-processing-oriented design.

Solution-level implementation and testing:
- Telecommunications subscriber management
- Safe City solution

Page 19

Motivated by the high degree of familiarity that many developers have with database interfaces, we are incrementally introducing SQL-92/JDBC/ODBC-like interfaces on top of RAF. A number of optimizations are also being added.

These optimizations include:
- application-requested indexing, to accelerate searches;
- blending in column-store capabilities where appropriate (for example, for rarely written data);
- compression, to reduce the data transported between nodes.
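Application-requested indexing can be sketched as a table that scans by default but builds a secondary index on a column when the application asks for one, then keeps that index current as new rows arrive. `Table`, `create_index`, and `find` are hypothetical names for illustration only.

```python
from collections import defaultdict
from typing import Any, Dict, List


class Table:
    """Row store with secondary indexes built only on request (hypothetical)."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Any]] = []
        self.indexes: Dict[str, Dict[Any, List[int]]] = {}

    def insert(self, row: Dict[str, Any]) -> None:
        self.rows.append(row)
        row_id = len(self.rows) - 1
        # Keep every requested index current as updates arrive.
        for column, index in self.indexes.items():
            index[row.get(column)].append(row_id)

    def create_index(self, column: str) -> None:
        """Application-requested: build an index over existing rows."""
        index: Dict[Any, List[int]] = defaultdict(list)
        for row_id, row in enumerate(self.rows):
            index[row.get(column)].append(row_id)
        self.indexes[column] = index

    def find(self, column: str, value: Any) -> List[Dict[str, Any]]:
        if column in self.indexes:
            return [self.rows[i] for i in self.indexes[column].get(value, [])]
        return [r for r in self.rows if r.get(column) == value]  # full scan


t = Table()
t.insert({"city": "Detroit", "id": 1})
t.create_index("city")
t.insert({"city": "Detroit", "id": 2})
t.insert({"city": "Ann Arbor", "id": 3})
print([r["id"] for r in t.find("city", "Detroit")])  # [1, 2]
```

Letting the application decide which columns to index avoids paying index-maintenance cost on every update for columns that are never searched.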

Page 20

Discussed RAF, an architectural approach that meshes memory-centric, non-relational query processing for low-latency analytics with memory-centric update processing to accommodate high volumes of updates.

Introduced Delegate, which participates as a special type of content transformer in a hierarchy of RDD transformations.

In RAF, protocol buffers are used to obtain data abstraction and efficient conveyance among applications, providing applications with a high degree of independence in location, representation, and transmission of data.
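Part of what makes protocol buffers efficient for conveying data between applications is their compact wire format: integers, for example, are encoded as base-128 varints, in which each byte carries 7 bits of payload and the high bit signals that more bytes follow. The sketch below hand-codes that encoding for non-negative integers only to show the idea; real applications would rely on the generated protobuf classes rather than on these hypothetical helper functions.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a protobuf-style base-128 varint."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F  # low 7 bits of payload
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


print(encode_varint(300).hex())            # ac02
print(decode_varint(bytes([0xAC, 0x02])))  # (300, 2)
```

Small values, which dominate many records, occupy a single byte, which reduces the volume of data conveyed between applications.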

A lightweight but expressive interface for RAF.

Using unit tests, we show high cluster scaling capability for transactions and an order-of-magnitude latency improvement for query processing.

Discussed two real-world usage scenarios in which RAF is being used.

Page 21

RDD: Resilient Distributed Dataset
RAF: Real-time Analytics Foundation
CRUD: Create/Retrieve/Update/Delete
HDFS: Hadoop Distributed File System

Page 22

- Apache Hadoop: http://hadoop.apache.org/
- Apache HBase: http://hbase.apache.org/
- Memcached: http://www.memcached.org/
- Oracle Coherence: http://www.oracle.com/technetwork/middleware/coherence/
- H. Plattner and A. Zeier: In-Memory Data Management
- Protobuf: http://code.google.com/p/protobuf/
- Redis: http://www.redis.io/
- SQLStream: http://www.sqlstream.com/
- Vertica: http://www.vertica.com/
- VoltDB: http://www.voltdb.com

Page 23

Thanks!