Data Analysis and Simulation Modeling

By Varun Sharma
Jan 19, 2017

Briefing

The first half of this report deals with simulation modeling, i.e., generating data through computer simulation when you don’t have any.

In the second half, I discuss data analysis and making predictions based on training examples.

Some Important Terms

Data Analysis is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making.

Simulation modeling is the process of creating and analyzing a digital prototype of a physical model to predict its performance in the real world.

Monte Carlo Simulations

Monte Carlo methods (or Monte Carlo experiments) are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results.
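Every experiment in this report follows the same pattern: run a random trial many times and report the fraction of runs that succeed. A minimal sketch of that driver is below; the helper name `monte_carlo` and the die-rolling example are hypothetical illustrations, not taken from the slides.

```python
import random

def monte_carlo(trial, n):
    """Estimate P(trial() is True) by running the random trial n times."""
    successes = sum(1 for _ in range(n) if trial())
    return successes / n

# Hypothetical example: probability that a fair die shows a 6 (exact value 1/6)
def rolled_six():
    return random.randint(1, 6) == 6

print(monte_carlo(rolled_six, 100_000))
```

With n runs, the estimate's standard error is about sqrt(p(1 - p)/n); for p = 0.1 and n = 500,000 that is roughly 0.0004, which is why 500k-run estimates land so close to the exact values.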

First Model

You are given 6 balls in a bag: three are white and the other three are black. You pick three balls with your eyes closed, without putting any back. Find the probability that all three are the same color.

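```python
import random

def run():
    c = [1, 1, 1, 2, 2, 2]  # 1 and 2 label the two colors
    a = []
    for i in range(3):
        a.append(random.choice(c))
        c.remove(a[i])  # remove the drawn ball: no replacement
    # all of color 1 sums to 3, all of color 2 sums to 6; mixed draws give 4 or 5
    return sum(a) == 3 or sum(a) == 6
```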

Observation:

Running this simulation 500,000 times gives an estimate of 0.099574, which is very close to the exact value from probability theory: 2 × C(3,3) / C(6,3) = 2/20 = 0.1.
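The exact value comes from counting: each of the C(6,3) = 20 ways to pick 3 of the 6 balls is equally likely, and exactly one of them is all white and one is all black, giving 2/20. The same count works for any number of balls per color. A small sketch using Python's `math.comb` and `fractions.Fraction` to keep the answer exact (the function name `p_same_color` is hypothetical):

```python
from fractions import Fraction
from math import comb

def p_same_color(per_color, draws=3):
    """Exact probability that `draws` balls drawn without replacement from a bag
    of `per_color` white and `per_color` black balls all share one color."""
    favorable = 2 * comb(per_color, draws)  # all white or all black
    return Fraction(favorable, comb(2 * per_color, draws))

print(p_same_color(3))  # 1/10
print(p_same_color(4))  # 1/7
```

For the 8-ball modification, `p_same_color(4)` is 2 × 4/56 = 1/7 ≈ 0.1429.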

Modification of the model:

Everything is the same, but this time you are given 8 balls in total, 4 of each color.

```python
import random

def run():
    # Declared and initialized every time the function is called (in each iteration)
    c = [1, 1, 1, 1, 2, 2, 2, 2]
    a = []
    for i in range(3):
        a.append(random.choice(c))
        # Removes the first instance of a[i] in the list, to simulate no replacement
        c.remove(a[i])
    return sum(a) == 3 or sum(a) == 6
```

Observation:

Running this simulation the same 500,000 times gives 0.143306, which is very close to the exact value of 2 × C(4,3) / C(8,3) = 8/56 = 1/7 ≈ 0.143.
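The two models differ only in the contents of the list `c`, so they can be folded into one parameterized trial. A sketch, assuming Python's `random.sample` for the without-replacement draw; the names `draw_trial` and `estimate` are hypothetical:

```python
import random

def draw_trial(per_color, draws=3):
    """One trial: draw `draws` balls without replacement from a bag holding
    `per_color` balls of color 1 and `per_color` balls of color 2."""
    bag = [1] * per_color + [2] * per_color
    hand = random.sample(bag, draws)  # sampling without replacement
    return len(set(hand)) == 1        # True if every drawn ball has the same color

def estimate(per_color, trials=500_000):
    """Fraction of trials in which all drawn balls share a color."""
    return sum(draw_trial(per_color) for _ in range(trials)) / trials
```

`estimate(3)` and `estimate(4)` reproduce the 6-ball and 8-ball experiments. Checking `len(set(hand)) == 1` instead of the sum trick keeps the test correct if more colors or different labels are used.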

HIV Virus Simulation

[Figure: virus population over time, in two panels: "No Drugs" and "Drugs with Change"]

Observation

With no drugs, the virus population grows without any barrier, increasing exponentially.

With drugs, however, the virus population initially grows slowly, picking up resistances along the way. When the drug given to the patient is changed, the virus population drops significantly.

Meanwhile, the average population resistant to the given drugs starts to rise. After a few life cycles, the average virus population equals the average resistant population.

This means that only the viruses that developed resistance survived, and in the end every virus was resistant.
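The slides show only the resulting plots, not the code behind them. The sketch below is one common way to build such a model, offered purely as an illustration: at each step every virus is cleared with some probability, survivors reproduce with a probability that falls as the population gets crowded, an active drug blocks reproduction unless the virus is resistant to it, and offspring can mutate their resistances. The class and function names, parameter values, and drug names are hypothetical assumptions, not the author's.

```python
import random

class Virus:
    """Hypothetical virus particle; all probabilities are illustrative assumptions."""

    def __init__(self, resistances, birth_prob=0.1, clear_prob=0.05, mut_prob=0.005):
        self.resistances = resistances  # drug name -> True if resistant
        self.birth_prob = birth_prob
        self.clear_prob = clear_prob
        self.mut_prob = mut_prob

    def resists(self, drugs):
        # True only if resistant to every active drug (vacuously True with no drugs)
        return all(self.resistances.get(d, False) for d in drugs)

    def offspring(self, density, drugs):
        """Return a child virus, or None if this virus does not reproduce."""
        if not self.resists(drugs):
            return None  # an active drug blocks reproduction
        if random.random() >= self.birth_prob * (1 - density):
            return None  # crowding lowers the birth chance
        # each resistance trait flips with probability mut_prob in the child
        traits = {d: (not r) if random.random() < self.mut_prob else r
                  for d, r in self.resistances.items()}
        return Virus(traits, self.birth_prob, self.clear_prob, self.mut_prob)


def simulate(viruses, max_pop, steps, drugs_at):
    """drugs_at(t) gives the set of drugs active at step t. Returns per-step
    lists of the total population and the population resisting the active drugs."""
    total, resistant = [], []
    for t in range(steps):
        drugs = drugs_at(t)
        # each virus is cleared from the body with probability clear_prob
        viruses = [v for v in viruses if random.random() >= v.clear_prob]
        density = len(viruses) / max_pop
        children = [v.offspring(density, drugs) for v in viruses]
        viruses += [c for c in children if c is not None]
        total.append(len(viruses))
        resistant.append(sum(v.resists(drugs) for v in viruses))
    return total, resistant
```

For a drug-change scenario, a schedule such as `lambda t: set() if t < 150 else {"drugA"} if t < 300 else {"drugB"}` switches drugs mid-run; averaging `total` and `resistant` over many runs gives average-population curves of the kind described above.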

Machine Learning

Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence.

In this report, I deal with regression analysis using supervised machine learning.

Regression Analysis

[Figure: two panels, "Dataset" and "Model"]

Observation

Theta found by gradient descent: -3.630291, 1.166362

For a city with a population of 35,000, we predict a profit of 4519.767868.

For a city with a population of 70,000, we predict a profit of 45342.450129.
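The slides report only the learned parameters. Under the usual hypothesis h(x) = θ0 + θ1·x, with population and profit both measured in units of 10,000 (an assumption, but one the reported numbers bear out), the predictions can be checked by hand: -3.630291 + 1.166362 × 3.5 ≈ 0.45198, i.e. about 4,520, and -3.630291 + 1.166362 × 7 ≈ 4.53424, i.e. about 45,342. A minimal batch gradient descent for the same model is sketched below; the function names, learning rate, and iteration count are hypothetical.

```python
def gradient_descent(xs, ys, alpha=0.01, iterations=1500):
    """Batch gradient descent for h(x) = theta0 + theta1 * x, minimizing
    J(theta) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Simultaneous update: both partial derivatives use the same errors
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


def predict(theta, x):
    """Hypothesis h(x); x and the result are in units of 10,000 (assumed)."""
    return theta[0] + theta[1] * x


theta = (-3.630291, 1.166362)  # values reported above
for population in (35_000, 70_000):
    print(population, predict(theta, population / 10_000) * 10_000)
```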

Multivariate Gradient Descent

Estimating the cost of a house

Dataset: inputs [Area (sq. feet), Bedrooms], output [Price]

Normalizing the features...

Running gradient descent on the normalized dataset...

Theta computed from gradient descent: 334302.063993, 100087.116006, 3673.548451

Predicted price of a 3-bedroom house with an area of 1650 sq. feet: $289314.620338
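Normalization matters here because area (around a thousand square feet or more) and bedroom count (a handful) differ by orders of magnitude, which makes plain gradient descent converge slowly. The detail that is easy to get wrong is the prediction step: a new house must be normalized with the training mean and standard deviation before θ is applied. A minimal pure-Python sketch follows; the function names, learning rate, and iteration count are hypothetical, and using the sample (n - 1) standard deviation is an assumption.

```python
import statistics

def normalize(rows):
    """Z-score each feature column; return normalized rows plus mu and sigma."""
    cols = list(zip(*rows))
    mu = [statistics.mean(c) for c in cols]
    sigma = [statistics.stdev(c) for c in cols]  # sample standard deviation
    norm = [[(v - m) / s for v, m, s in zip(r, mu, sigma)] for r in rows]
    return norm, mu, sigma

def gradient_descent(X, y, alpha=0.01, iterations=400):
    """X: normalized rows without a bias column; returns [theta0, theta1, ...]."""
    m, n = len(X), len(X[0])
    theta = [0.0] * (n + 1)
    for _ in range(iterations):
        errors = [theta[0] + sum(t * v for t, v in zip(theta[1:], r)) - yi
                  for r, yi in zip(X, y)]
        grads = [sum(errors) / m] + [
            sum(e * r[j] for e, r in zip(errors, X)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grads)]
    return theta

def predict(theta, x, mu, sigma):
    """Normalize a new example with the training mu/sigma, then apply theta."""
    z = [(v - m) / s for v, m, s in zip(x, mu, sigma)]
    return theta[0] + sum(t * v for t, v in zip(theta[1:], z))
```

Usage: `norm, mu, sigma = normalize(rows)`, then `theta = gradient_descent(norm, prices)`, then `predict(theta, [1650, 3], mu, sigma)`.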

Machine Learning for Indian Railways

With advanced computers and storage techniques available, Indian Railways has the capability to generate and store data like never before.

The problem is that this data has become so large that it cannot be analyzed with conventional methods.

But the possibilities are enormous. CRIS (Centre for Railway Information Systems) is currently working on models to predict train arrival delays, possible component breakdowns, and more.

Future Aspects

Whenever a train arrives late, it inconveniences passengers, disrupts schedules, and calls into question the reliability of Indian Railways’ services.

Many events follow recognizable patterns, and train arrival times are no exception. Analyzing weather, season, date, and time of day reveals patterns in how these factors affect arrival times.

The data also reveals the ‘hotspots’ of delays in train arrivals. By feeding all of these factors into the system, we can predict the chance that a given train will be late at a particular time, and by how much.

This helps us plan ahead and provide better service.
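The slides do not say which model CRIS uses or which factors it feeds in. As one possible illustration only, the chance of a delay could be modeled with logistic regression over one-hot encoded factors such as weather and season. The factor names, categories, hour-of-day scaling, and training settings below are all hypothetical.

```python
import math

# Hypothetical factor vocabularies; a real system would derive these from its data
CATEGORIES = {
    "weather": ["clear", "rain", "fog"],
    "season": ["summer", "monsoon", "winter"],
}

def encode(record):
    """One-hot encode the categorical factors, with a leading bias term of 1."""
    x = [1.0]
    for factor, values in CATEGORIES.items():
        x += [1.0 if record[factor] == v else 0.0 for v in values]
    x.append(record["hour"] / 23.0)  # hour of day scaled to [0, 1]
    return x

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, alpha=0.5, iterations=2000):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    theta = [0.0] * len(X[0])
    m = len(X)
    for _ in range(iterations):
        errors = [sigmoid(sum(t * v for t, v in zip(theta, x))) - yi
                  for x, yi in zip(X, y)]
        theta = [t - alpha * sum(e * x[j] for e, x in zip(errors, X)) / m
                 for j, t in enumerate(theta)]
    return theta

def p_late(theta, record):
    """Predicted probability that a train with these conditions arrives late."""
    return sigmoid(sum(t * v for t, v in zip(theta, encode(record))))
```

Predicting by how much a train will be late would instead use a regression target (minutes of delay), as in the regression examples earlier.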

Thank You!