Big Data Harisfazillah Jamel Startup and Developer 4th Meetup 5th November 2016
Why Big Data?
Big Data is not only for big players.
Big Data is also for us: startups and developers.
Data is raw gold. Information about us is the end product.
Data defines us: web server logs, web page analytics and comments about our products.
What Is Big Data?
Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate to deal with them. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy. (Wikipedia)
Let's redefine big data for us.
What Is Big Data?
● Volume: very big data
● Variety: multiple sources
● Velocity: data streaming in
● Veracity: accuracy of the data
Redefine Big Data For Startup
4 important terms:
● Data Sets
● Data Processing
● Analytics
● Visualization
Big Data is big. We need to focus.
What Should We Call Our Big Data?
● Small Data
● Startup Data
● No Data
We need to visualize our data from day 0.
It’s a must
Why Big Data?
Big data analytics examines large amounts of data to uncover hidden patterns, correlations and other insights. (SAS)
We need to find our own insights and visualize our future.
Data Sets
We don’t have any data (no data), or we lack data. If we want data, we go and find it.
We can use our own data, or
we have a place to start: www.data.gov.my
Data Set : Our Own Data?
● Web server log
  ○ IP address of the visitors. IP2Country
● Web access analysis
  ○ Most visited pages
● Comments from our users.
  ○ Good, bad, Like, Dislike.
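A minimal sketch of the web server log idea: pull the visitor IP and requested page out of each line and count the most visited pages. It assumes the server writes Common Log Format; the sample lines are made up and use documentation-only IP ranges, and the function names are only illustrative.

```python
import re
from collections import Counter

# Common Log Format: ip ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def most_visited(lines, top=3):
    """Count successful (2xx) requests per path and return the top entries."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["status"].startswith("2"):
            counts[entry["path"]] += 1
    return counts.most_common(top)


# Made-up sample lines (assumption: Common Log Format).
SAMPLE_LOG = [
    '192.0.2.10 - - [05/Nov/2016:10:00:01 +0800] "GET /index.html HTTP/1.1" 200 1043',
    '198.51.100.7 - - [05/Nov/2016:10:00:05 +0800] "GET /products HTTP/1.1" 200 2310',
    '192.0.2.10 - - [05/Nov/2016:10:00:09 +0800] "GET /products HTTP/1.1" 200 2310',
    '203.0.113.4 - - [05/Nov/2016:10:00:12 +0800] "GET /missing HTTP/1.1" 404 512',
]

if __name__ == "__main__":
    for path, hits in most_visited(SAMPLE_LOG):
        print(path, hits)
```

The `ip` field from `parse_line` is what an IP-to-country lookup would take as input; that lookup is not shown here.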
Issues With The Data?
Lack of usable information.
We need to collect data on our own.
This is a business opportunity for startups.
What Needs To Be Collected?
Good Bad Like Dislike
What we want to know from big data, and from any data that we analyze, is this:
GOOD BAD LIKE DISLIKE
Sentiment analysis
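A toy sketch of how user comments could be tallied into GOOD, BAD, LIKE and DISLIKE. The keyword lists are hypothetical and far too small for real use; this is plain keyword matching, not a real sentiment model.

```python
from collections import Counter

# Hypothetical keyword lists, for illustration only.
LEXICON = {
    "GOOD": {"good", "great", "excellent"},
    "BAD": {"bad", "poor", "terrible"},
    "LIKE": {"like", "love"},
    "DISLIKE": {"dislike", "hate"},
}


def label_comment(comment):
    """Return the set of labels whose keywords appear in the comment."""
    words = {word.strip(".,!?").lower() for word in comment.split()}
    return {label for label, keywords in LEXICON.items() if words & keywords}


def tally(comments):
    """Count how many comments carry each label."""
    counts = Counter()
    for comment in comments:
        counts.update(label_comment(comment))
    return counts


if __name__ == "__main__":
    sample = ["Great service, I love it!", "Bad support.", "I hate the new page"]
    print(tally(sample))
```

Keyword matching does not handle negation ("I don't like it" still counts as LIKE), which is one reason a proper sentiment analysis tool is needed for real data.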
When Who Where What Why How
When - @timestamp is important for data analysis.
Who - Anonymity is important, but we still need to know the visitor's gender and age.
Where - Anonymity is important, but we still need the IP address to know which country, state or county the visitor comes from.
What - The operating system and the browser version.
Why - The keywords that lead them to us.
How - How they came to know about us.
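The six questions above can be sketched as one record per visit. Apart from @timestamp, the field names (`gender`, `age`, `ip`, `os`, `browser`, `keywords`, `referrer`) are assumptions chosen for illustration, and the sample values are made up.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class VisitEvent:
    timestamp: str   # When: ISO 8601 time of the visit
    gender: str      # Who: no name, only coarse demographics
    age: int         # Who
    ip: str          # Where: resolved later to country or state
    os: str          # What
    browser: str     # What
    keywords: list   # Why: search keywords that led the visitor to us
    referrer: str    # How: where the visitor heard about us


def to_document(event):
    """Convert an event to a JSON-ready dict keyed by @timestamp."""
    doc = asdict(event)
    doc["@timestamp"] = doc.pop("timestamp")
    return doc


if __name__ == "__main__":
    event = VisitEvent(
        timestamp="2016-11-05T10:00:00+08:00",
        gender="female",
        age=29,
        ip="192.0.2.10",
        os="Linux",
        browser="Firefox 49",
        keywords=["big data", "startup"],
        referrer="meetup",
    )
    print(json.dumps(to_document(event), indent=2))
```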
How To Visualize Our Data
I’m a fan of ELK:
Elasticsearch, Logstash & Kibana
ELK is one of the Big Data tools.
Index The Data With ES
Use Elasticsearch to index our data.
One misconception: ES is not for storage.
Don’t use ES to store our data.
Data needs to be archived elsewhere.
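A small sketch of "archive first, then index": each record is appended to a plain file as one JSON line, and the same records are turned into a body for the Elasticsearch `_bulk` endpoint. The index name `weblogs` and the archive path are hypothetical. The bulk body is only built here, not sent; older versions such as 5.x also expect a `_type` in the action line or in the URL.

```python
import json


def archive(documents, path):
    """Append each document as one JSON line to an archive file."""
    with open(path, "a", encoding="utf-8") as handle:
        for doc in documents:
            handle.write(json.dumps(doc) + "\n")


def bulk_body(documents, index):
    """Build a newline-delimited body for the Elasticsearch _bulk endpoint."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # source line
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


if __name__ == "__main__":
    docs = [{"@timestamp": "2016-11-05T10:00:00+08:00", "path": "/products"}]
    archive(docs, "weblogs-archive.ndjson")  # hypothetical archive location
    print(bulk_body(docs, "weblogs"))        # hypothetical index name
```

If the index is lost or needs a new mapping, it can be rebuilt by replaying the archive file through `bulk_body`.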
ES Search API
The result is in JSON. Developers love JSON. (Maybe)
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/_exploring_your_data.html
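A sketch of working with the search API from code: build a request body that asks for the most common values of a field, then read the buckets out of the JSON response. The field name `path.keyword` and the aggregation name `top_pages` are assumptions, and the sample response is hand-written and trimmed to the parts used here.

```python
import json


def top_pages_query(field="path.keyword", size=5):
    """Build a search body that asks for the most common values of a field."""
    return {
        "size": 0,  # no individual hits, only the aggregation
        "aggs": {"top_pages": {"terms": {"field": field, "size": size}}},
    }


def read_buckets(response, agg_name="top_pages"):
    """Extract (value, count) pairs from a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Hand-written sample response (assumption), trimmed for illustration.
SAMPLE_RESPONSE = json.loads("""
{
  "took": 3,
  "hits": {"total": 3, "hits": []},
  "aggregations": {
    "top_pages": {
      "buckets": [
        {"key": "/products", "doc_count": 2},
        {"key": "/index.html", "doc_count": 1}
      ]
    }
  }
}
""")

if __name__ == "__main__":
    print(json.dumps(top_pages_query(), indent=2))
    print(read_buckets(SAMPLE_RESPONSE))
```

The query body would be sent to the `_search` endpoint of the index; the HTTP call is left out so the sketch runs without a cluster.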
Kibana
We can use Kibana to view our data in ES.
DKAN
We can store data with DKAN. DKAN follows CKAN.
The open source open data platform with a full suite of cataloging, publishing and visualization features that allows organizations to easily share data with the public.
http://www.nucivic.com/dkan/
Take advantage of the DKAN Datastore API.
GeoSpatial Is Important
Our data needs to have spatial information (GPS coordinates).
We can use GeoServer to run our own map server.
http://geoserver.org/
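One common way to attach GPS coordinates to a record is a GeoJSON point feature, sketched below. The function name and the sample properties are illustrative, and the coordinates are only an approximate location for Kuala Lumpur. Note that GeoJSON puts longitude before latitude.

```python
import json


def to_geojson_feature(latitude, longitude, properties):
    """Wrap a record and its GPS coordinates in a GeoJSON Point feature."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    return {
        "type": "Feature",
        # GeoJSON order is [longitude, latitude], not the other way around.
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "properties": dict(properties),
    }


if __name__ == "__main__":
    feature = to_geojson_feature(3.139, 101.6869, {"comment": "LIKE"})
    print(json.dumps(feature, indent=2))
```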
The End
Q & A
019-6085482
http://linuxmalaysia.harisfazillah.info/