Better Together – Apache Ignite & Apache Spark
Fast Data Meets Open Source

DMITRIY SETRAKYAN
GridGain Founder & Chief Product Officer
Apache Ignite PMC

VALENTIN KULICHENKO
GridGain Lead Architect
Apache Ignite PMC

http://ignite.apache.org
@apacheignite
@dsetrakyan

Apache®, Apache Ignite, Ignite®, and the Apache Ignite logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.

Aug 14, 2020

Agenda

• Apache Ignite™ Overview
• Data Grid
• Partitioning Schemes
• SQL
• Shared Memory Layer
• Share Spark RDDs
• In-Memory File System
• DevOps: Yarn and Mesos
• Faster MapReduce & Hive
• Ignite MapReduce
• Demo - Shared Ignite RDDs
• Demo - SQL using Apache Zeppelin
• Q & A

Apache Ignite - We Are Hiring!

• Very Active Community
• Great Way to Learn Distributed Computing
• How To Contribute:
  – https://ignite.apache.org/community/contribute.html#contribute
  – https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute

Apache Ignite™ In-Memory Data Fabric: Strategic Approach to IMC

• Supports Applications of various types and languages
• Open Source – Apache 2.0
• Simple Java APIs
• 1 JAR Dependency
• High Performance & Scale
• Automatic Fault Tolerance
• Management/Monitoring
• Runs on Commodity Hardware
• Supports existing & new data sources
• No need to rip & replace

Apache Ignite In-Memory Data Fabric

Why Share State in Spark?

• Long Running Applications
  – Passing State Between Jobs
• Disk File System (HDFS?)
  – Convert RDDs to Disk Files and Back
  – Argh#$%
• Share RDDs In-Memory
  – Native Spark API
  – Native Spark Transformations

Why Ignite Data Grid?

• In-Memory Key-Value Store
  – Good for Caching Tuples
• Foundation for Shared Memory State
  – IgniteRDD is based on Data Grid
  – Ignite File System is based on Data Grid
• On-Heap & Off-Heap Memory
• In-Memory Indexes
  – Fast SQL
• Built for High Throughput and Low Latencies

Data Grid: JCache (JSR 107)

• Key-Value Store (JCache, JSR 107)
  – In-Memory Key-Value Store
  – Basic Cache Operations
  – ConcurrentMap APIs
  – Collocated Processing (EntryProcessor)
  – Events and Metrics
  – Pluggable Persistence
• Data Grid
  – ACID Transactions
  – SQL Queries (ANSI 99)
  – In-Memory Indexes
  – On-Heap & Off-Heap Memory
  – Automatic RDBMS Integration
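
The key-value bullets on this slide can be illustrated with a toy model. This is a minimal plain-Python sketch, not the JCache or Ignite API: `ToyCache`, `put_if_absent`, and `invoke` are hypothetical names chosen to mirror the ConcurrentMap-style operations and the EntryProcessor idea of running logic where the entry lives instead of fetching the value to the caller.

```python
from typing import Any, Callable, Dict, Optional


class ToyCache:
    """Hypothetical in-memory key-value store; not the JCache or Ignite API."""

    def __init__(self) -> None:
        self._data: Dict[Any, Any] = {}

    def put(self, key: Any, value: Any) -> None:
        self._data[key] = value

    def get(self, key: Any) -> Optional[Any]:
        return self._data.get(key)

    def put_if_absent(self, key: Any, value: Any) -> bool:
        """ConcurrentMap-style operation: store only when the key is missing."""
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def invoke(self, key: Any, processor: Callable[[Optional[Any]], Any]) -> Any:
        """EntryProcessor-style call: the function runs next to the entry,
        so only the function and its result cross the (imagined) network."""
        new_value = processor(self._data.get(key))
        self._data[key] = new_value
        return new_value


cache = ToyCache()
cache.put("hits", 1)
cache.put_if_absent("hits", 100)              # ignored, key already present
cache.invoke("hits", lambda v: (v or 0) + 1)  # increment in place
print(cache.get("hits"))                      # prints 2
```

In a real grid the point of the processor call is to avoid moving the value over the network; here everything lives in one process, so only the calling pattern is shown.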

Data Grid: Distributed Caching

[Diagram panels: Partitioned Cache, Replicated Cache]
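
The difference between the two cache modes named on this slide can be sketched in a few lines. This is a simplified plain-Python model under stated assumptions: the node names are hypothetical, the modulo-hash placement is only an illustration, and Ignite's actual key-to-node mapping and backup copies are not modeled.

```python
import zlib
from typing import Dict, List

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members


def owner(key: str, nodes: List[str]) -> str:
    """Pick one owning node per key with a stable hash (simplified scheme)."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def place_partitioned(keys: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Partitioned cache: each key is stored on exactly one node."""
    layout: Dict[str, List[str]] = {n: [] for n in nodes}
    for k in keys:
        layout[owner(k, nodes)].append(k)
    return layout


def place_replicated(keys: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Replicated cache: every node stores every key."""
    return {n: list(keys) for n in nodes}


keys = [f"key-{i}" for i in range(9)]
partitioned = place_partitioned(keys, NODES)
replicated = place_replicated(keys, NODES)

for node in NODES:
    print(node, "partitioned:", len(partitioned[node]),
          "replicated:", len(replicated[node]))
```

The trade-off the sketch makes visible: a partitioned layout splits the data set across nodes, so capacity grows with the cluster, while a replicated layout keeps a full copy on every node, so any node can answer a read locally.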

Data Grid: Ad-Hoc SQL (ANSI 99)

• ANSI-99 SQL
• Always Consistent
• Fault Tolerant
• In-Memory Indexes (On-Heap and Off-Heap)
• Automatic Group By, Aggregations, Sorting
• Cross-Cache Joins, Unions, etc.
• Ad-Hoc SQL Support

SQL Cross-Cache GROUP BY Example
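
The query shown on this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of a join with GROUP BY across two data sets. It assumes hypothetical `Person` and `Organization` tables and uses Python's built-in `sqlite3` module as the SQL engine, with each table playing the role of one cache; it is not the original example and does not use Ignite's query API.

```python
import sqlite3

# sqlite3 is only a stand-in SQL engine; each table plays the role of one cache.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Organization (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT,
                         salary REAL, org_id INTEGER);
    INSERT INTO Organization VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO Person VALUES
        (1, 'Ann', 100.0, 1),
        (2, 'Bob', 200.0, 1),
        (3, 'Cid', 400.0, 2);
""")

# Join the two "caches" and aggregate per organization.
query = """
    SELECT o.name, COUNT(*) AS headcount, AVG(p.salary) AS avg_salary
    FROM Person p
    JOIN Organization o ON p.org_id = o.id
    GROUP BY o.name
    ORDER BY o.name
"""
rows = conn.execute(query).fetchall()
for name, headcount, avg_salary in rows:
    print(name, headcount, avg_salary)
conn.close()
```

With the sample rows above, the result is one line per organization: Alpha with 2 people and an average salary of 150.0, and Beta with 1 person and 400.0.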

Apache Ignite for Spark and Hadoop

DevOps: Integration with Yarn and Mesos

• Automatic Resource Management
• Easy Data Center Installation
• Easy Data Center Configuration
• On-Demand Elasticity

Share RDDs Across Spark Jobs

• IgniteRDD Deployment Modes
  – Share RDD across tasks on the host
  – Share RDD across tasks in the application
  – Share RDD globally
  – Embedded vs External Deployments
• Faster SQL
  – In-Memory Indexes
  – SQL on top of Shared RDD

IgniteContext

• Main Entry Point from Spark to Ignite
• Specify Different Ignite Configurations
• Embedded vs External Deployments
  – Client vs Server Modes

IgniteRDD

• Implementation of SparkRDD
• Mutable (unlike native RDDs)
• Partitioned over Ignite Partitioned Caches
• Indexed SQL
  – Spark only does Full Scans
  – Indexes are 1000x faster
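
The contrast between a full scan and an indexed lookup can be sketched as follows. This is a plain-Python illustration under stated assumptions: the `(id, city)` record shape and the hash index are hypothetical, the slides do not describe Ignite's actual index structures, and the sketch makes no attempt to measure or reproduce the 1000x figure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (id, city); a hypothetical record shape

rows: List[Row] = [(i, f"city-{i % 100}") for i in range(10_000)]


def full_scan(data: List[Row], city: str) -> List[int]:
    """Inspect every row: cost grows with the size of the data set."""
    return [row_id for row_id, c in data if c == city]


def build_index(data: List[Row]) -> Dict[str, List[int]]:
    """Build a hash index on city once, up front."""
    index: Dict[str, List[int]] = defaultdict(list)
    for row_id, c in data:
        index[c].append(row_id)
    return index


def indexed_lookup(index: Dict[str, List[int]], city: str) -> List[int]:
    """Jump straight to the matching ids without touching other rows."""
    return index.get(city, [])


city_index = build_index(rows)
scan_hits = full_scan(rows, "city-7")
index_hits = indexed_lookup(city_index, "city-7")
print(len(scan_hits), len(index_hits))  # both lookups find the same 100 rows
```

The scan touches all 10,000 rows per query, while the indexed lookup touches only the matching entries after a one-time build cost; the gap widens as the data set grows and the query becomes more selective.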

Ignite In-Memory File System

• Ignite In-Memory File System (IGFS)
  – Hadoop-compliant
  – Easy to Install
  – On-Heap and Off-Heap
  – Caching Layer for HDFS
  – Write-through and Read-through HDFS
  – Performance Boost
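
The read-through and write-through behavior listed on this slide can be sketched with a toy caching layer. This is a plain-Python model, not the IGFS or Hadoop API: `SlowStore` and `CachingLayer` are hypothetical classes, with `SlowStore` standing in for the underlying file system (HDFS in the slide).

```python
from typing import Dict, Optional


class SlowStore:
    """Hypothetical stand-in for the underlying file system (e.g. HDFS)."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        self.reads = 0

    def read(self, path: str) -> Optional[bytes]:
        self.reads += 1
        return self.files.get(path)

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data


class CachingLayer:
    """In-memory layer with read-through and write-through behavior."""

    def __init__(self, backing: SlowStore) -> None:
        self.backing = backing
        self.memory: Dict[str, bytes] = {}

    def read(self, path: str) -> Optional[bytes]:
        if path in self.memory:               # served from memory
            return self.memory[path]
        data = self.backing.read(path)        # read-through on a miss
        if data is not None:
            self.memory[path] = data
        return data

    def write(self, path: str, data: bytes) -> None:
        self.memory[path] = data              # keep the in-memory copy
        self.backing.write(path, data)        # write-through to the store


store = SlowStore()
store.write("/logs/a.txt", b"existing file")
fs = CachingLayer(store)

fs.read("/logs/a.txt")       # miss: loaded from the backing store
fs.read("/logs/a.txt")       # hit: no second trip to the backing store
fs.write("/logs/b.txt", b"new file")
print(store.reads, store.files["/logs/b.txt"])
```

Only the first read of a path reaches the backing store, and every write lands in both places, which is the sense in which the in-memory layer can sit in front of an existing file system without replacing it.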

Apache Ignite Roadmap

• Non-Collocated Joins (released in 1.7)
• Data Manipulation Language (DML in 2.0)
  – INSERT, UPDATE, DELETE
• Data Definition Language (DDL in 2.1)
  – CREATE, ALTER, DROP
• More IGFS Performance
• Native Data Frame Integration

Interactive SQL with Apache Zeppelin

ANY QUESTIONS?

Thank you for joining us. Follow the conversation.

http://www.ignite.apache.org
@apacheignite
@dsetrakyan