BDA Technologies & Selected Case Studies
19th January 2015

Ettikan Kandasamy Karuppiah (Ph.D.), Principal Researcher & Director of Accelerative Technologies Lab, MIMOS Berhad

SEMINAR INTERNET COMPUTING TECHNOLOGY
Theme: Delivering Values From Hyperconnectivities
2.00-2.45pm @ Bilik Serbaguna 1, MAMPU

Big Data Analytics at a Glance

Big data is defined by the high volume, velocity, variety, veracity and value of data generated every second, minute, hour and day by devices, humans, etc.

VOLUME: Growing data. 90% of the world's data was generated over the last 2 years.
VELOCITY: Increasing data. 175,000 tweets per second.
VARIETY: Broadening data. 80% of the world's data is unstructured (text, geospatial, audio, video).
VERACITY: Establishing the quality and accuracy of big data sources, especially for unstructured data, which Big Data technology makes possible.
VALUE: Turning big data into value: economic benefits, government benefits, societal benefits.

Big Data Computing in the ICT Sector

The Malaysian ICT services sub-sector has huge growth potential, with a projected share of 35% of the nation's Digital Economy in 2020. This requires a transformative platform.
Source: MDEC, as taken from APeJ Big Data MaturityScape Assessment 2013 by IDC

Software Solutions and Support is the Key GDP
Contributor

MIMOS BigData Technologies R&D

Business Value:
Data Modeling & Visualization for PDRM Workforce Planning & GPGPU Data Security Library
Data Cleansing Engine for PERKESO & Data Warehouse for PERKESO
Sentiment Analysis Model & Data Modeling & Data Warehouse for PIK MOH & GPGPU Video Data Analytics Library
GE13 Electoral Roll Analysis with Hadoop & GPU
High Risk Profiling, Illicit, Taxable & Drugs Detection (PoC)

R&D:
Establish work on General Purpose Graphics Processing Unit (GPGPU) for text manipulation
MultiCore Java Compiler
Data Encryption/Decryption for National Data Protection
GPU Accelerated Libraries for Data Cleansing & Financial Risk Modeling
Accelerated Libraries for Database Accelerator Library (Galactica)
MiAccLib: Cleansing, Finance, Algo/Map, Crypto, Video, BigData, Image

Acquire / Train:
Hadoop Trainings
Conducted Workshop, Hadoop Programming training to Malaysian Research Community

Collaboration:
nVidia COE for GPGPU Established
ESRI Inc/US MoU Established
Intel Malaysia/US MoU
AMD Malaysia/US/Europe MoU

RM10 (10th Malaysia Plan) -> Foundation & Early Adoption for Heterogeneous Computing
RM11 (11th Malaysia Plan) -> Maturation & Progressive Deployment of Scalable Heterogeneous Computing

Assisting Both Government & Private Sector Needs: Private Sector to Go Global; National Public Sector

© 2014 MIMOS Berhad. All Rights Reserved.
Source: MDeC

DECISIONS REQUESTED

FCC is requested to:
1. Take note of data science upskilling for civil servants.
2. Take note of MAMPU developing the Government Open Data framework by 2015.
3. Endorse the DG Lab on BDA to identify use cases and pilot projects that address societal wellbeing.
4. Take note of MIMOS defining and developing the Big Data technology platform for Government by 2015.
5. Mandate opening up of all relevant data (Open/Non-Open) to the DG Lab on BDA for the pilot projects.

Data classification levels: Rahsia Besar (Top Secret), Rahsia (Secret), Sulit (Confidential), Terhad (Restricted), Terbuka (Open)
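
The five classification levels can be read as an ordered scale, which makes the access rules in this deck easy to express: only Terbuka data is published as open data, while a secured sandbox may be granted a higher ceiling. The following is a minimal Python sketch under stated assumptions. The `Classification` enum, the `visible_to` helper, the catalogue entries and the sandbox ceiling are all hypothetical illustrations and do not describe any MAMPU or MIMOS system.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Classification levels, ordered from least to most restricted."""
    TERBUKA = 0        # Open
    TERHAD = 1         # Restricted
    SULIT = 2          # Confidential
    RAHSIA = 3         # Secret
    RAHSIA_BESAR = 4   # Top Secret


def visible_to(datasets, ceiling):
    """Return names of datasets classified at or below the given ceiling."""
    return [d["name"] for d in datasets if d["classification"] <= ceiling]


# Hypothetical catalogue entries, for illustration only.
CATALOGUE = [
    {"name": "meteorology", "classification": Classification.TERBUKA},
    {"name": "transport_timetables", "classification": Classification.TERBUKA},
    {"name": "welfare_case_files", "classification": Classification.SULIT},
]

# Public open data: Terbuka only.
open_data = visible_to(CATALOGUE, Classification.TERBUKA)
# Assumed ceiling for a secured sandbox; the real limit is a policy decision.
sandbox_data = visible_to(CATALOGUE, Classification.SULIT)
```

Using an ordered enumeration keeps the comparison explicit, so changing the sandbox ceiling is a one-line policy change rather than a code change.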
Policy + Technology

Policy: Opening Up Non-Sensitive Government Data
Policy for all government agencies to open up data categorised under Terbuka (Open).
E.g. non-sensitive data like meteorology, transport timetables and pricing of essential goods, based on Open Data criteria.

Technology: Developing BDA Open Innovation Platform
An open-innovation platform between Government, businesses and Rakyat (the public) to improve e-participation and user satisfaction.
Prioritization through the development of high-impact, low-cost, demand-driven life-event solutions.

BDA DG (Digital Government) LAB
Secure environment (sandbox) for Government Data
Expertise - Community; Data - Government; Data Project Sponsor
Sector-specific use cases / life-events: e.g. Welfare, Education, Healthcare, Transportation
OUTCOMES: POCs, pilots & apps
DATA: Open Data (Data.gov.my), Community, Government
BDA Technology Platform

BDA Technology Platform Strategy

Research & Development on KEY Data Extraction, Processing & Analytics Components
Key Values:
i. National Data Sovereignty
ii. Trusted Data
iii. Secured Data
Localized Entity (i.e. MIMOS, Cybersecurity)

Components: Data Extraction; Data Staging (Cleansing, Harmonisation, Anonymisation); Data DB Store; Data Model & Analytics; Data Visualization; Security; Traceability; Infrastructure Management
Machine Learning - Malaysian Context (BM, English, Chinese, Tamil)
Accelerated Computing
Secured Cloud Services
Visualization - Malaysian Perspective

MIMOS platforms: Mi-Cloud, Mi-Harmony, Mi-UAP, Mi-Mobile, Mi-MOCHA, Mi-Helio, Mi-Morphe, Mi-Harvester, Mi-CLIP, Mi-Doc, Mi-Scrambler, Mi-Portal, Mi-BIS, Mi-ARMC, Mi-Trust, Mi-SP (Video Analytics), Mi-STP, Mi-Target, Mi-HPDW, Mi-AccLytics, Mi-DSS, Mi-AccLib, Mi-Trace, Mi-ROSS, Mi-DW, Mi-Market, Galactica
Customization; 3rd Party Systems & Hardware
Data Source: Structured + Open Linked Data, Unstructured
Data Extraction; Data Staging (Cleansing, Harmonisation, Anonymisation); Data DB Store; Data Model & Analytics; Data Visualization; Applications
Data Security; Security; Data Management; Infrastructure Management; Traceability

Extracting Value from Data

Layers, from data sources up to data sharing:
Data Sources: Unstructured Data Sources, Structured Data Sources
Staging: Data Harvesting, Staging Data
Cleansing: Data Cleansing, Data Correction
Harmonisation: Data Harmonisation, Harmonisation Terminologies
Data Anonymisation: Granular Primary Database, Scrambled Database & Data Marts, Published Data Marts
Data Sharing: Data Visualization

Virtualized Platform & Integrity Manager: Mi-CLOUD + Mi-Mocha
Unstructured Data Collector: Mi-Clip
Data Harmonisation: Mi-Harmony + Mi-Semantics
Detect Correction Exception: Mi-Morphe + Mi-AccLib
Data Anonymisation: Mi-Scramble + Mi-Crypto + MiAccLib
Authentication & Authorization: Mi-UAP, Mi-ARMC
Data Warehouse Platform (Mi-Galactica, Mi-AccConnect, Mi-HPDW)
Data Modeling
Data Statistics: Mi-AccStat
Sentiment Analytics: Mi-Intelligence; Mi-NLP
Data Visualization: Mi-HELIO; Mi-BIS
Data Analytics: Mi-Portal
Social Network Analytics: Mi-Visualitic
Knowledge Harvester (LOD): Mi-Harvester
Data Analytics: Mi-HPDW
Data Analytics: Mi-Target
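
The staging flow described above (cleansing, then harmonisation, then anonymisation before data marts are published) can be sketched in a few lines of Python. This is an illustration only: the function names, the terminology table, the sample record and the use of a keyed HMAC-SHA-256 pseudonym are assumptions made for the example, and do not describe how Mi-Morphe, Mi-Harmony or Mi-Scramble actually work.

```python
import hashlib
import hmac


def cleanse(record):
    """Trim surrounding whitespace from every string field."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def harmonise(record, terminology):
    """Map variant terms onto one agreed term, field by field."""
    return {k: terminology.get(k, {}).get(v, v) for k, v in record.items()}


def anonymise(record, sensitive_fields, key):
    """Replace sensitive values with a keyed pseudonym (a stand-in technique)."""
    out = dict(record)
    for name in sensitive_fields:
        if out.get(name) is not None:
            digest = hmac.new(key, str(out[name]).encode("utf-8"), hashlib.sha256)
            out[name] = digest.hexdigest()[:16]
    return out


def publish(staged, terminology, sensitive_fields, key):
    """Run staged records through cleansing, harmonisation and anonymisation."""
    return [
        anonymise(harmonise(cleanse(r), terminology), sensitive_fields, key)
        for r in staged
    ]


# Hypothetical staged record and terminology table, for illustration only.
staged = [{"name": " Aminah ", "state": "KL", "id_number": "0001"}]
terminology = {"state": {"KL": "Kuala Lumpur", "K. Lumpur": "Kuala Lumpur"}}
data_mart = publish(staged, terminology, ["name", "id_number"], key=b"demo-key")
```

The order matters in this sketch: harmonising before anonymising means the same person or place always yields the same pseudonym, so published data marts can still be joined. A keyed hash is pseudonymisation rather than full anonymisation, so a real deployment would need a stronger, reviewed scheme.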
Technology Challenges Ahead (11th Malaysia Plan)

New Platforms & Revisions
NEWER Channels of Consumption (e.g. omni-channel data market)
NEWER Sources of Data (e.g. high-speed streams)
NEWER Methods of Visualization (e.g. multi-dimensional view)
NEWER Paradigms on Computing (e.g. Docker)
Technology Pull / Technology Push

Big Data Moving Forward

IoA: Internet of Anything
II: Industrial Internet
IoE: Internet of Everything
IoT: Internet of Things
Software Defined Network; Big Data Processing; Mobile Systems; Wearables; Cloud Computing; Cyber-biological systems; Cyber-physical systems; Internet of Humans

Open Platform & BDA Middleware
Architecture

Data Sources: Structured, semi-structured & unstructured data sources (Open Linked Data, Web & Social Media, RDBMS, Files)
Data Extraction: Flume, Sqoop, Kafka, Mi-Clip, Mi-Harvester, Mi-Morphe
Data Staging: Data Cleansing (Mi-Morphe, Mi-AccLib); Data Harmonization (Mi-Harmony); Data Anonymisation (Mi-Scramble, Mi-Crypto, Mi-AccLib)
Data Model: Mi-HPDW
Data Storage: RDBMS, Galactica FS, HDFS, NoSQL; Data warehouse / Data mart (Galactica, Hadoop, Mi-HPDW)
Storage Infrastructure: Mi-Cloud, Mi-Mocha
Query & Search: YARN, Mi-AccConnect, Pig, Hive, Impala, Shark, Galactica, Galactica Connector, Apache Drill | Spark/Shark | Hue, Cloudera Search & Solr, RDF Graph DB
Data Analytics Tools (Machine Learning): R, Mahout, ML-Lib (Spark), Mi-NLP, Mi-AccStat, Mi-Intelligence, Mi-Target, GIS
Data Visualisation: Mi-Helio, Mi-BIS, Mi-Portal, Mi-Visualitics
Data Security: Mi-UAP, Mi-Trust, Sentry
Data Management: Cloudera Manager/Falcon, ZooKeeper, Oozie
Legend: MIMOS Solution; 3rd Party Solution

MIMOS BigData Stack With Reference to Hadoop Stack

Data Type (Data Sources Type): RDBMS | Streaming (twitter, logs, etc.) | NoSQL
Stream: Spark | Kafka | Spring XD & Storm
Search: Cloudera Search & Solr
Application Program Interface: Thrift | REST | Java API | Avro
Management: YARN (resource management) | Big Data Orchestration Engine/Layer | ZooKeeper (configuration and synchronization) | Oozie (workflow scheduler) | Cloudera Manager | Management for Lustre
Storage: HDFS | HPDW-Storage | Galactica FS | NoSQL (HBase) | Distributed Database (Cassandra) | RDBMS (PostgreSQL, MySQL)
Visualization: Mi-Helio | Mi-Portal | Mi-BIS (Mi-AccConnect) | 3rd Party Apps
Batch Query: MapReduce v2 | Pig | Hive
Real Time Query: Mi-BIS with Impala through Mi-AccConnect | Hue | Galactica | Apache Drill | Spark/Shark | HPDW-BigData DB
Machine Learning: Mi-BIS (Weka) | Accstats (R and Cloudera C++) | ML-LIB (Spark) | Revolution R, Weka
Processing: Mi-Morphe | Morphlines | Mi-Acclib MapReduce v2 (Accelerated ETL) | HPDW Data Model Plugin (for MiMorphe v3/Pentaho)
Analytics: Simulator | Planning Tool | Predictive Prescriptive | Prediction Algorithm | Mi-BIS (Mi-Accstats) | Mi-BIS (Data Mining) | Revolution R | 3rd Party GIS
Security and Authentication: Sentry | Mi-UAP | Mi-ARMC | Mi-Trust
Data Management: Sqoop | Flume
Hardware: Multi & Many Cores Processors (CPU + GPU)
Legend: Complete 3rd Party | 3rd Party & MIMOS Offering | MIMOS Technologies | 3rd Party Technologies

Proof of
Concepts: Selected Use Cases

Proof of Concepts - Mixed Scenario (Technology Capabilities)

Challenges to be Addressed During Initial Roll-Outs

Data Challenges (Stage 1)

Data is stored in partial and distributed locations.
Data formats are both digital and non-digital; some data exists only on paper.
Incomplete data sets (quality issues).
Cleanliness of the data: missing values (random and non-random), CR, noise.
Cleaning while maintaining integrity and value.
Extracting the features.
Data in plural languages (at least English and Malay).
Structured data has longer historical value to be acquired:
  Data storage media and format for extraction and usage.
  How to authenticate the key values? Where is the reference point?
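
Cleaning "while maintaining integrity and value" is easier to reason about when corrections are recorded rather than applied silently. The following is a minimal Python sketch under stated assumptions: the `clean_record` helper, the list of missing-value tokens (including the Malay "tiada") and the reading of "CR" as stray carriage returns are all hypothetical choices for illustration, not a description of the MIMOS tools.

```python
from dataclasses import dataclass, field

# Hypothetical tokens treated as "missing"; a real list would be agreed
# with the data owner.
MISSING_TOKENS = {"", "-", "n/a", "na", "null", "tiada"}


@dataclass
class CleanResult:
    record: dict
    missing: list = field(default_factory=list)    # fields with no usable value
    corrected: list = field(default_factory=list)  # fields whose text was tidied


def clean_record(raw):
    """Tidy string fields and flag missing values instead of guessing them."""
    result = CleanResult(record={})
    for key, value in raw.items():
        if value is not None and not isinstance(value, str):
            result.record[key] = value  # leave non-text values untouched
            continue
        text = value or ""
        # Replace carriage returns, then collapse all runs of whitespace
        # (including newlines) into single spaces.
        tidy = " ".join(text.replace("\r", " ").split())
        if tidy.lower() in MISSING_TOKENS:
            result.record[key] = None
            result.missing.append(key)
            continue
        if tidy != text:
            result.corrected.append(key)
        result.record[key] = tidy
    return result


result = clean_record({"name": " Ali\r\nbin  Abu ", "state": "N/A", "age": 42})
```

Keeping `missing` and `corrected` alongside the cleaned record leaves an audit trail, so a domain expert can later decide whether a missing value is random or systematic before any imputation is attempted.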
As for unstructured data (e.g. social media), current technology is adequate to support pre-processing and analytics, with some local challenges.
Who is the data owner?
How to ensure the security level of the data for sharing?
PDP (personal data protection) compliance confusion.
More to be shared by visiting MIMOS Lab.

Analytics Challenges (Stage 2)

Tools are available, but the right approach is still critical for evaluation.
Which are the best/right algorithms to use?
Can you identify the right domain expert within the organization?
Who are the local domain experts to consult for method/algorithm selection?
You may not have a data scientist in a specific government organization, but how do you form an analytics team (external + internal)?
What exactly are the data owner's business needs? Why do they need to do this? It is a headache for them; best to leave the data to rest in peace!!
Which data to include and which to exclude; what to anonymize? Concern over meaning/trend extraction.
Plurality of languages and interpretation accuracy.
Semantification of language-specific analytics.
Bottlenecks to be identified, and an accelerated approach required for the specific processing.
Agile is the best way.

Results Challenges (Stage 3)
Visualization of the results in a simple, actionable and communicable form.
How to handle continuously changing analytics (and the results) due to:
  New data inclusion
  New domain expert inclusion
  New additional factors to be considered
Who validates the results?
How to translate results into value for the (government) organization?
How to translate the value into actions?
How to follow up on the 2nd cycle of activities?

Benefiting Humanity Through Technology

Thank You