Associative Data Schemes for Cloud Computing

Amir Basirat, PhD Candidate (Amir.Basirat@monash.edu). Supervisor: Dr Asad Khan. Clayton School of IT, Monash University. STINT Workshop, Lulea, Sweden - May 2012.

1

Associative Data Schemes for Cloud Computing

Amir Basirat, PhD Candidate

Amir.Basirat@monash.edu

Supervisor: Dr Asad Khan

Clayton School of IT, Monash University

STINT Workshop, Lulea, Sweden - May 2012

2

Contents

1. Cloud Computing

2. Hadoop MapReduce

3. Pattern Recognition and Distributed Approach

4. Graph Neuron for Scalable Pattern Recognition

5. HGN and DHGN

6. Research Objective

7. Web-based GN

8. EdgeHGN

9. Simulation Showcase

3

What is Cloud Computing?

The vision of Cloud Computing encompasses a general shift of computer processing, storage, and software delivery away from the desktop and local servers, across the network, and into the next generation of data centers hosted by large infrastructure companies.

4

Big Data!

An IDC estimate put the size of the “digital universe” at 0.18 zettabytes in 2006, and forecast a tenfold growth to 1.8 zettabytes by 2011.

This flood of data is coming from many sources. Consider the following:

• The New York Stock Exchange generates about one terabyte of new trade data per day.

• Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.

• Ancestry.com, the genealogy site, stores around 2.5 petabytes of data.

• The Internet Archive stores around 2 petabytes of data, and is growing at a rate of 20 terabytes per month.

• The Large Hadron Collider near Geneva, Switzerland, will produce about 15 petabytes of data per year.

5

Challenge?

Our existing capability to generate data seems to outstrip our capability to analyze it.

6

Data Management in Cloud

There are some underlying issues that any data management scheme deployed for clouds needs to address properly (Abadi, 2009), including:

• the capability to parallelise the data workload

• security concerns as a result of storing data at an untrusted host

• data replication functionality.

Thus the question of how to process immense data sets effectively is becoming increasingly urgent.

8

Hadoop

In a nutshell, what Hadoop provides: “A reliable shared storage and analysis system. The storage is provided by HDFS and analysis by MapReduce”

(Hadoop, 2011)

9

10

MapReduce

(Hadoop, 2011)

The MapReduce programming model requires solutions to be expressed as two functions, Map and Reduce.

• A map function takes a key/value pair, computes, and emits a set of intermediate key/value pairs as output.

• A reduce function merges all intermediate values associated with the same intermediate key, executes some computation on them, and emits the final output.

11

Word Count in MapReduce

class MAPPER
    method MAP(docid a, doc d)
        for all term t in doc d do
            EMIT(term t, count 1)

class REDUCER
    method REDUCE(term t, counts [c1, c2, …])
        sum = 0
        for all count c in counts [c1, c2, …] do
            sum = sum + c
        EMIT(term t, count sum)

Pseudo code for word count algorithm in MapReduce
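
The pseudo code above can be tried out on a single machine. The following is a minimal Python sketch, not Hadoop code: the function names, the in-memory grouping step, and the two sample documents are assumptions made for illustration only.

```python
from collections import defaultdict


def map_word_count(doc_id, text):
    """Map: emit (term, 1) for every term in the document."""
    for term in text.lower().split():
        yield term, 1


def reduce_word_count(term, counts):
    """Reduce: sum all counts emitted for one term."""
    return term, sum(counts)


def run_word_count(documents):
    """Run map, group intermediate values by key, then reduce."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for term, count in map_word_count(doc_id, text):
            grouped[term].append(count)
    return dict(reduce_word_count(t, c) for t, c in grouped.items())


# Two hypothetical documents keyed by document id.
documents = {"d1": "the cloud stores data", "d2": "the cloud scales"}
word_counts = run_word_count(documents)
```

Here the dictionary `grouped` stands in for the framework step that collects all intermediate values sharing a key before Reduce is called.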

12

Challenges and Hurdles in MapReduce

• The Map function operates on the assumption that all related data is distributed vertically, i.e. that records are uniformly distributed across the network. However, parts of related records may be stored at different physical locations.

• Intermediate records need to be sorted before they are passed to the Reduce function.

• The solution must be expressed in terms of Map and Reduce functions working on key/value pairs, which in some cases may not be possible or natural, for example in multi-stage processes.

• Moreover, dependency on HDFS for data storage and retrieval can create single points of failure for the MapReduce infrastructure, especially at master nodes.
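
The sorting requirement in the list above can be illustrated with a small sketch. It assumes the intermediate pairs fit in memory and uses Python's standard `itertools.groupby`, which only groups adjacent equal keys, so the sort has to come first. The function name and sample pairs are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_and_sort(intermediate):
    """Sort (key, value) pairs by key and collect the values per key."""
    ordered = sorted(intermediate, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


# Intermediate pairs as two hypothetical mappers might emit them.
pairs = [("cloud", 1), ("data", 1), ("cloud", 1), ("node", 1), ("data", 1)]
grouped = shuffle_and_sort(pairs)
```

Without the `sorted` call, the two `"cloud"` pairs would land in separate groups and the reducer would see the same key twice.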

13

Existing data management schemes do not work well when data is partitioned among numerous available nodes dynamically.

Approaches to scalable data management in the cloud, which offer greater portability, manageability and compatibility of applications and data, are yet to be fully realised.

14

Solution?

Treat data records as patterns

As a result, data storage and retrieval are performed using a distributed pattern recognition approach. This approach is implemented by integrating loosely coupled computational networks, followed by a divide-and-distribute step that distributes these networks within the cloud dynamically.

The aim is to develop a distributed data access scheme that enables data storage and retrieval by association.

15

Associative Model of Data

This associative model treats data records as patterns, so it does not matter how the data is represented.

The associative model uses a single, common structure for all types of data.
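
To make the idea of retrieval by association concrete, here is a minimal sketch. It is not the GN-based scheme of this talk: it assumes a made-up encoding (`to_pattern`) that turns any record into a fixed-width tuple of symbols, and it recalls the stored record with the smallest Hamming distance to a probe. The class name, encoding, and sample records are all hypothetical.

```python
def to_pattern(record, width=16):
    """Encode a record as a fixed-width tuple of symbols (assumed encoding)."""
    text = str(record)
    return tuple(text[:width].ljust(width))


def distance(a, b):
    """Hamming distance between two equal-length patterns."""
    return sum(x != y for x, y in zip(a, b))


class AssociativeStore:
    """One common structure (a pattern) for every type of record."""

    def __init__(self):
        self.patterns = []  # stored patterns
        self.records = []   # original records, index-aligned with patterns

    def store(self, record):
        self.patterns.append(to_pattern(record))
        self.records.append(record)

    def recall(self, probe):
        """Return the stored record whose pattern is closest to the probe."""
        probe_pattern = to_pattern(probe)
        best = min(range(len(self.patterns)),
                   key=lambda i: distance(self.patterns[i], probe_pattern))
        return self.records[best]


store = AssociativeStore()
for record in ["alice,monash", 2012, ("lulea", "se")]:
    store.store(record)

recalled = store.recall("alise,monash")  # probe with one wrong character
```

A string, an integer, and a tuple all go through the same pattern structure, and a distorted probe still recalls the intended record.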

17

Distributed Pattern Recognition

The distributed computing approach offers seemingly unlimited scalability as the number of patterns grows. With the rapid advent of network computing technology, processing can be performed within the body of a network rather than relying on exhaustive use of a single CPU.

Existing approaches still lag behind because of the highly complex recognition algorithms they implement.

The neural network approach offers a promising tool for large-scale pattern recognition. However, there are several issues related to its implementation. These include:

• convergence problems

• complex iterative learning procedures

• low scalability with regard to the training data required for optimum recognition.

19

An eight-node GN in the process of storing four patterns (Khan, 2002): P1 (red), P2 (blue), P3 (black), and P4 (green).
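
The figure itself is not reproduced in this transcript, so the following is only a simplified sketch of how a GN array might store and recall patterns. It assumes one node per (position, value) pair, four positions with two possible values (giving eight nodes), and a per-node memory of the neighbouring values seen so far. The class, the sample patterns, and the omission of message passing between nodes are all assumptions, not details taken from the slides.

```python
class GraphNeuron:
    """Simplified GN array: one node per (position, value) pair."""

    def __init__(self, length, values):
        self.length = length
        # Each node remembers the (left value, right value) pairs it has seen.
        self.bias = {(pos, val): set()
                     for pos in range(length) for val in values}

    def _neighbours(self, pattern, pos):
        left = pattern[pos - 1] if pos > 0 else None
        right = pattern[pos + 1] if pos < self.length - 1 else None
        return left, right

    def store(self, pattern):
        """Each activated node records its adjacent values in one pass."""
        for pos, val in enumerate(pattern):
            self.bias[(pos, val)].add(self._neighbours(pattern, pos))

    def recall(self, pattern):
        """True if every activated node has already seen its neighbours."""
        return all(self._neighbours(pattern, pos) in self.bias[(pos, val)]
                   for pos, val in enumerate(pattern))


gn = GraphNeuron(length=4, values="XO")  # 4 positions x 2 values = 8 nodes
for p in ["XXOO", "OXXO"]:
    gn.store(p)

known = gn.recall("XXOO")
unknown = gn.recall("OOOO")
```

Storing needs a single pass over the pattern, with no iterative training. Because each node only sees its immediate neighbours in this sketch, some unseen combinations of stored fragments could be recalled falsely.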

21

Hierarchical Graph Neuron (HGN)

HGN compositions for two-dimensional (7x5) and three-dimensional (7x5x3) pattern sizes.

22

Distributed Hierarchical Graph Neuron (DHGN)

DHGN distributed pattern recognition architecture (Muhamad Amin and Khan, 2009).

24

Research Objectives

• Redesigning the data management architecture from a scalable associative computing perspective, to create database-like functionality that can scale up or down dynamically over the available infrastructure without interruption or degradation.

• Investigating a distributed data access scheme that enables data storage and retrieval by association, with data records treated as patterns.

• Processing the database and handling the dynamic load using a distributed pattern recognition approach.

• Developing an intelligent MapReduce framework that allows complex data representations to be used as keys for Map operations.

• Reducing cloud storage fragmentation by implementing a divide-and-distribute approach.

• Enhancing the existing cloud data management models for scalability.

• Validating the results and finding the asymptotic limits of the technique through a rigorously designed computer simulation environment.

26

Progress to Date

• Proposing a Web-based GN for Real-time Image Recognition

27

Web-based GN

(a) Total number of positive and negative matches. (b) Distortion rates for each line of image (each constructed HGN).

Image distortion rates vs. rotation degrees.

29

Edge Detecting Hierarchical Graph Neuron (EdgeHGN)

7-by-7 bit Binary Character A and its 7 equally-sized DHGN subnets

Reducing number of neurons by applying a drop-fall technique

30

Drop Fall Scheme

• Drop-fall is often used for dividing touching pairs of digits into isolated characters. The drop-fall algorithm simulates the path produced by a drop of water falling from above the character and sliding downwards along the contour under the action of gravity.

• When the drop gets stuck in a groove, it melts the character's stroke and then continues to fall. The dividing path produced by the drop-fall algorithm depends on three aspects: a start point, movement rules, and direction.

• There are four possible directions, which generally produce four different paths to divide touching digits. The path can start on the left or right side and can evolve downwards or upwards. One of the four is likely to produce the right result.

• Therefore, a set of drop-fall algorithms consists of four methods that try to segment a block by simulating a drop-falling process: the descending-left, descending-right, ascending-left, and ascending-right algorithms.
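
The descending variants described above can be sketched as follows. The slides do not give the exact movement rules, so the priority order used here (down, down-left, down-right, right, left, otherwise cut straight down) is an assumption, as are the function name and the small test image. A value of 1 marks ink and 0 marks background.

```python
def drop_fall_descending(image, start_col):
    """Trace a descending drop-fall path through a binary image (1 = ink)."""
    rows, cols = len(image), len(image[0])

    def free(r, c):
        # A cell is free if it lies inside the image and is background.
        return 0 <= c < cols and image[r][c] == 0

    r, c = 0, start_col
    previous = None
    path = [(r, c)]
    while r < rows - 1:
        # Assumed priority: down, down-left, down-right, right, left.
        candidates = [(r + 1, c), (r + 1, c - 1), (r + 1, c + 1),
                      (r, c + 1), (r, c - 1)]
        # Never step back to the previous cell, to avoid oscillating sideways.
        step = next((p for p in candidates if p != previous and free(*p)), None)
        if step is None:
            step = (r + 1, c)  # stuck in a groove: cut through the stroke
        previous = (r, c)
        r, c = step
        path.append(step)
    return path


image = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
path = drop_fall_descending(image, start_col=0)
```

Choosing `start_col` on the left or right edge gives the descending-left or descending-right start. An ascending variant could be obtained by running the same routine on the vertically flipped image.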

31

EdgeHGN Performance

33

Disclaimer

I am not proposing any computer vision scheme for image processing here.

I am not suggesting in any way that my scheme can compete against the many image processing and face recognition algorithms treated in the literature.

I am doing pattern matching, and I could use any form of data representation for the purpose of my research.

Images are complex matrices of values, but people relate to images very well, which is why I found them an easy way to illustrate the effectiveness and strength of my proposed model.

34

Binary Image Recognition

Fifty different individuals in the face image dataset obtained from the Face Recognition Data.

35

Sobel Operator

Edge map after applying Global Binary Signature and Sobel's edge detection

In simple terms, the Sobel operator calculates the gradient of the image intensity at each point, giving the direction of the largest increase in intensity and the rate of change in that direction.

The result therefore shows how "abruptly" or "smoothly" the image changes at that point, and therefore how likely it is that that part of the image represents an edge, as well as how that edge is likely to be oriented.
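
As an illustration of the description above, the sketch below applies the standard 3x3 Sobel kernels to a small grayscale image held as nested lists. The fixed threshold used to turn the gradient magnitude into a binary edge map is an assumption for this example and is not the Global Binary Signature step mentioned on the slide. Border pixels are simply left at zero.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel(image):
    """Return gradient magnitude and direction (radians) for interior pixels."""
    rows, cols = len(image), len(image[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    direction = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = sum(SOBEL_X[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            magnitude[r][c] = math.hypot(gx, gy)   # rate of change
            direction[r][c] = math.atan2(gy, gx)   # orientation of the gradient
    return magnitude, direction


def edge_map(image, threshold):
    """Binary edge map: 1 where the gradient magnitude exceeds the threshold."""
    magnitude, _ = sobel(image)
    return [[1 if m > threshold else 0 for m in row] for row in magnitude]


# A 5x5 image with a vertical step from dark (0) to light (255).
image = [[0, 0, 255, 255, 255] for _ in range(5)]
edges = edge_map(image, threshold=100)
```

For this step image the edge pixels appear in the two columns next to the intensity jump, and the flat region on the right produces no response.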

36

References

Abadi, D. J. (2009). Data management in the cloud: Limitations and opportunities, Bulletin of the Technical Committee on Data Engineering, pp. 3-12.

Khan, A. I. and Muhamad Amin, A. (2007). One shot associative memory method for distorted pattern recognition, AI 2007: Advances in Artificial Intelligence, Springer, Berlin/Heidelberg, pp. 705-709.

Muhamad Amin, A. and Khan, A. I. (2009). Collaborative-comparison learning for complex event detection using distributed hierarchical graph neuron (DHGN) approach in wireless sensor network, AI 2009: Advances in Artificial Intelligence, Springer, Berlin/Heidelberg, pp. 111-120.

Nasution, B. B. and Khan, A. I. (2008). A hierarchical graph neuron scheme for real-time pattern recognition, IEEE Transactions on Neural Networks 19(2): 212-229.

Shiers, J. (2009). Grid today, clouds on the horizon, Computer Physics Communications, pp. 559-563.

Welsh, M., Malan, D., Duncan, B., Fulford-Jones, T. and Moulton, S. (2004). Wireless sensor networks for emergency medical care, GE global conference, Harvard University and Boston University School of Medicine, Boston, MA.

37

Acknowledgement

Thank You.

I would like to thank everyone who helped make this possible. First and foremost, my immense gratitude goes to my thesis supervisor, Dr Asad Khan, for his support and kind contributions.
