Analysis of economic data using big data

Post on 14-Apr-2017

Analysis of Economic Data Using Bigdata

Presented By SHIVUMANJESH P

[4JC13MCA51] VI SEM MCA

SJCE

Internal Guide C J HARSHITHA Assistant Professor

Dept. Of MCA SJCE

External Guide Imran basha Senior Consultant

Snipe IT Solutions

JSS MAHAVIDYAPEETHA SRI JAYACHAMARAJENDRA COLLEGE OF ENGINEERING MYSURU-570006

AN AUTONOMOUS INSTITUTE AFFILIATED TO VISVESVARAYA TECHNOLOGICAL UNIVERSITY, BELAGAVI.

Presentation on

Problem Definition

1. Inflation is rising as a serious threat to countries' development.

2. Unscientific farming

3. Big-picture problem: economic indicators and decision makers rely on native economic transactions and on data records.

Objective

To examine economic data and record the year-to-year increases and decreases in the prices of vegetables and food items.

To favor fresh and edible food products and to overcome various problems of deficiency and malnutrition.

To maintain continuous connectivity across the demand-supply chain.
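The first objective, recording year-to-year price increases and decreases, can be sketched in a few lines. This is a minimal illustration in Python, not the project's own code (the slides name Hive, Pig and R as the tools used); the function name `year_over_year` and the sample prices are assumptions made up for the example.

```python
def year_over_year(prices):
    """Return (year, change, percent_change) for each consecutive pair of years.

    `prices` maps a year to the price recorded for one product in that year.
    """
    years = sorted(prices)
    changes = []
    for prev, curr in zip(years, years[1:]):
        diff = prices[curr] - prices[prev]
        pct = 100.0 * diff / prices[prev]
        changes.append((curr, diff, round(pct, 1)))
    return changes


# Hypothetical prices per kg for one product (not taken from the project data).
tomato = {2010: 20.0, 2011: 25.0, 2012: 22.0}

for year, diff, pct in year_over_year(tomato):
    direction = "increase" if diff > 0 else "decrease" if diff < 0 else "no change"
    print(year, direction, diff, pct)
```

Each tuple carries both the absolute and the relative change, so the same result can feed a table of increases and decreases or a plot.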

Scope of The Project

• Economic data analysis has an immense impact on E-commerce and also builds potential for business activities and investments.

• The analysis is limited to particular products and can be further extended based on requirements and developments.

• The big data analysis can be presented through an Android application that provides simple and smart user interfaces about the products people use in daily life.

• It requires a high-end system specification for implementation, since it deals with large data sets with diversified features and functionalities.

User characteristics

• The system will provide a very precise and simple platform to the respective users.

• The admin provides access to the developer as well as to the user, and supplies the data sets.

• The developer collects the data sets, clusters the data based on particular criteria, and analyzes the behavior of the data elements.

• The user gets the desired result by firing a query.

General constraints

• Big data usage is efficient for large data sets and is not suitable for low-volume data.

• Since the main objective is data analysis, the user interface section is given the least priority.

• Sometimes it may be tedious to deal with completely unstructured data items.

• The data obtained from various sources may not have the same parameters.
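The last constraint, that sources may not share the same parameters, is commonly handled by mapping each source's field names onto one common schema before analysis. The sketch below shows one possible way to do that in Python; the alias table, the field names and the sample records are all hypothetical and are not taken from the project's data sets.

```python
# Hypothetical aliases: each source-specific field name maps to a common name.
FIELD_ALIASES = {
    "product": "product", "item": "product", "commodity": "product",
    "year": "year", "yr": "year",
    "price": "price", "rate": "price",
}


def normalize(record):
    """Rename known fields to the common schema and drop unknown ones."""
    out = {}
    for key, value in record.items():
        canonical = FIELD_ALIASES.get(key.strip().lower())
        if canonical is not None:
            out[canonical] = value
    return out


# Two made-up records describing the same kind of fact with different fields.
source_a = {"Item": "Onion", "Yr": 2012, "Rate": 18.5}
source_b = {"commodity": "Onion", "year": 2013, "price": 21.0, "market": "Mysuru"}

rows = [normalize(source_a), normalize(source_b)]
print(rows)
```

Dropping unknown fields keeps the example short; a real pipeline might instead log them so that no source parameter is lost silently.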

Functional Requirements

Storage

• Hadoop Distributed File System (HDFS) is designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.

• The economic data is a highly diversified data set that is both large and varied in nature.

• A dataset is typically generated or copied from source, and then various analyses are performed on that dataset over time.

• Applications that require low-latency access to data, in the tens of milliseconds range, will not work well with HDFS.
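To make the storage points concrete, the arithmetic below estimates how a file is laid out in HDFS. It is a rough sketch under stated assumptions: a 128 MB block size and a replication factor of 3, which are common Hadoop defaults but are configurable per cluster (`dfs.blocksize`, `dfs.replication`) and are not stated in the slides. The function name `hdfs_footprint` is made up for the example.

```python
MB = 1024 * 1024


def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Return (number_of_blocks, raw_bytes_stored_across_the_cluster).

    Assumes a 128 MB block size and 3 replicas unless told otherwise.
    """
    # Ceiling division: a partly filled last block still counts as a block.
    blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    raw_bytes = file_size_bytes * replication
    return blocks, raw_bytes


blocks, raw = hdfs_footprint(1024 * MB)  # a 1 GiB data set
print(blocks, raw // MB)                 # block count and raw megabytes stored
```

A large file spreads over many blocks that can be read in parallel, while a tiny file still costs one block entry in the NameNode's metadata, which is one reason HDFS favors a few large files over many small ones.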

Computation

• MapReduce is a processing technique that allows for massive scalability across hundreds or thousands of servers in a Hadoop cluster.

• The MapReduce algorithm contains two important tasks, namely Map and Reduce.

• In economic data analysis, this algorithm helps in finding the demand for particular goods based on certain keywords.

• The shuffle and sort process depends mainly on the volume of the data sets.
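The Map, shuffle-and-sort, and Reduce steps can be illustrated with a single-process Python sketch that counts keyword occurrences as a stand-in for demand. It only mimics the data flow of MapReduce; it is not Hadoop code and not the project's implementation, and the records, keywords and function names are hypothetical.

```python
from collections import defaultdict


def map_phase(record, keywords):
    """Map: emit (keyword, 1) for every keyword found in one record."""
    words = record.lower().split()
    return [(kw, 1) for kw in keywords if kw in words]


def shuffle_and_sort(pairs):
    """Group values by key and order the keys, as happens between Map and Reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


# Made-up transaction lines and keywords of interest.
records = ["fresh tomato 2 kg", "onion 1 kg", "tomato 5 kg"]
keywords = ["tomato", "onion", "potato"]

pairs = [pair for record in records for pair in map_phase(record, keywords)]
demand = [reduce_phase(key, values) for key, values in shuffle_and_sort(pairs)]
print(demand)
```

Every emitted pair passes through the grouping step, which is why the shuffle and sort work grows with the volume of data the mappers produce.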

Performance Requirements

• The major reason for choosing the big data domain for economic analysis is the velocity of data processing.

• Connecting commodity systems as nodes helps in quick retrieval of the data items.

• Distributed system environments offer considerable flexibility.

• Hardware Requirements

Processor : Core i3 onwards

RAM : 4GB +

Hard disk space : 40GB +

• Software Requirements

Technology : Hadoop

Tools : Apache Hive, Apache Pig, Apache Sqoop, Apache Oozie, R Studio

Operating System : Linux

System Architecture

Class Diagram

Algorithm Design

Dataflow Diagram

Level 1 DFD

Activity Diagram

Use case Diagram – Admin & User

Use case Diagram-Developer

Sequence Diagram

Requirements

Unstructured Datasets

Structured Datasets

System Implementation

R Environment

Experimental Results

Test Cases

| Test case no | Test case description | Required input | Expected output | Actual output | Test pass/fail |
|---|---|---|---|---|---|
| #TC 01 | Verification of the nodes | Command to start Hadoop nodes (start-all.sh) | All nodes should start | All nodes are present | P |
| #TC 02 | Verification of Hive installation | Command Hive version | It should return installed Hive | Hive version is returned | P |
| #TC 03 | Verification of Pig installation | Command to start Pig (/opt/pig) | It should return grunt shell | grunt shell is returned | P |
| #TC 04 | Verification of Sqoop installation | Command Sqoop version | It should return installed Sqoop | Sqoop version is returned | P |
| #TC 05 | Verification of data imported to HDFS from RDBMS | Entering to Hadoop file system from local file system | Imported data should be present in HDFS | Imported data is present in HDFS | P |
| #TC 06 | Validating user query | Entering query | Valid query should be entered | Valid query is entered | P |
| #TC 07 | Testing the processed data | Post query | Processed data should be correct | Processed data should be valid | P |
| #TC 08 | Importing the processed data to R | Import Dataset | Processed data should be imported | Processed data is imported | P |
| #TC 09 | Mapping of processed dataset | Barplot() | Processed dataset should be mapped correctly | Processed data is mapped correctly | P |
| #TC 10 | Mapping in pie chart | Pie() | Processed data should be mapped in percent | Processed data is not mapped with percent | F |
| #TC 11 | Retrieving the result in less than 5 seconds | Dump() | Result should be displayed within 5 seconds | Result takes more than 5 seconds to display | |
| #TC 12 | Plotting the values obtained in R | Plot() | All the values should be obtained | Some values are missing | |

Conclusion

• The statistical analysis is carried out for fruits and vegetables from 1970 to 2013.

• The major requirements are based on the context of the inflation problem.

• The analysis is done mainly on a product basis and a year basis.

• This analysis serves as a vital input for machine learning mechanisms.

Future Enhancements

• The analysis can be extended further to food grains.

• An enterprise application can be built by embedding a search engine, which will be helpful for the end user.

• The data sets can be tuned, which may lead to deriving other requirements of a different paradigm.

• The graphical representation can be improved further by displaying accurate values rather than ranges.
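As one possible direction for the last enhancement, chart labels can carry the exact value together with its percent share, which also matches what test case TC 10 expects from the pie chart. The sketch below only computes the labels and is written in Python rather than the R used in the project; the function name `share_labels` and the sample values are assumptions made up for the example.

```python
def share_labels(values):
    """Return one label per item with its exact value and percent share."""
    total = sum(values.values())
    if total == 0:
        raise ValueError("cannot compute shares of an all-zero data set")
    return [
        f"{name}: {value} ({100.0 * value / total:.1f}%)"
        for name, value in values.items()
    ]


# Hypothetical values for three products (not from the project data).
prices = {"Tomato": 30, "Onion": 50, "Potato": 20}

labels = share_labels(prices)
print(labels)
```

Labels built this way could then be handed to whichever plotting call draws the chart, for example through the `labels` argument of R's `pie()`.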

Company Details

Company Name : Snipe IT Solutions

Address : # 123, 3rd floor, 70th Cross, 5th Block, Rajajinagar Nagar, Bengaluru.

External guide : Imran basha Senior Consultant

Snipe IT Solutions

Email : mkimranbasha@gmail.com

Ph no : 9590071811

 

Thank You
