Streaming Data Integration with Apache Kafka
Confidential
About Gwen
Gwen Shapira – System Architect @Confluent
PMC @ Apache Kafka
Moving data around since 2000
Previously:
• Software Engineer @ Cloudera
• Oracle Database Consultant
Find me:
• @gwenshap
The Plan
1. What is data integration about?
2. How have things changed?
3. What is difficult and important?
4. How do we solve these problems in Kafka?
Data Integration
Making sure the right data gets to the right places
10 years ago…
• Informatica
• DataStage
• Manual optimizations
5 years ago…
Today…
• Everything streaming
• Everything real-time
• Everything in-memory
• Everything in containers
• Everything in the cloud
These Things Matter
• Reliability – Losing data is (usually) not OK.
  • Exactly once vs. at least once
• Timeliness
  • Push vs. pull
• High throughput, varying throughput
  • Compression, parallelism, back pressure
• Data formats
  • Flexibility, structure
• Security
• Error handling
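Delivery guarantees come down to the order of two steps: writing a record to its destination and recording how far the copy has progressed. The toy Python sketch below assumes an in-memory list as the log and a crash injected by hand (all names are hypothetical, not Kafka APIs). It shows why committing after the write gives at-least-once delivery, and how a destination keyed by offset can absorb the resulting replays.

```python
"""Toy model of at-least-once delivery with and without an idempotent sink.

Hypothetical illustration: the in-memory log, both sinks and the injected
crash are stand-ins, not Kafka or Kafka Connect APIs.
"""
from typing import Dict, List, Optional


def run(log: List[str], committed: int, appended: List[str],
        keyed: Dict[int, str], crash_at: Optional[int] = None) -> int:
    """Copy records from `committed` onward; return the new committed offset.

    Each record is written first and its offset committed second, so a crash
    between the two steps causes the record to be copied again on restart.
    """
    for offset in range(committed, len(log)):
        value = log[offset]
        appended.append(value)  # plain append: a replay creates a duplicate
        keyed[offset] = value   # keyed by offset: a replay overwrites itself
        if offset == crash_at:
            return committed    # simulated crash before the commit
        committed = offset + 1  # commit only after the write succeeded
    return committed


log = ["a", "b", "c"]
appended: List[str] = []
keyed: Dict[int, str] = {}
committed = run(log, 0, appended, keyed, crash_at=1)  # crashes after writing "b"
committed = run(log, committed, appended, keyed)      # restart from last commit
```

The appending sink ends up with `a, b, b, c`, while the offset-keyed sink holds `a, b, c`. Idempotent writes are one common way to get exactly-once results on top of at-least-once delivery. This sketch illustrates the trade-off; it does not show the mechanism Kafka itself uses.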
After: Stream Data Platform with Kafka
Figure: Kafka as a distributed, fault-tolerant platform that stores messages and processes streams, connecting user tracking, operational logs, operational metrics, Espresso, Cassandra, Oracle, Hadoop, log search, monitoring, the data warehouse, and applications for search, security, and fraud detection.
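Kafka sits in the middle of this platform because it stores messages instead of just passing them along. Each downstream system can read at its own pace and can even re-read history. The toy single-process Python model below illustrates that idea. The `Log` class and reader names are hypothetical, and the model deliberately ignores partitioning, replication, and everything else that makes Kafka distributed and fault tolerant.

```python
"""Toy, single-process model of a stored message log with independent readers.

Hypothetical illustration: the Log class and reader names are made up, and
nothing here is distributed or fault tolerant the way Kafka is.
"""
from typing import Dict, List


class Log:
    """Append-only sequence of records; each reader tracks its own offset."""

    def __init__(self) -> None:
        self.records: List[str] = []
        self.positions: Dict[str, int] = {}  # reader name -> next offset to read

    def append(self, value: str) -> int:
        """Store a record and return its offset."""
        self.records.append(value)
        return len(self.records) - 1

    def poll(self, reader: str, max_records: int = 10) -> List[str]:
        """Return up to max_records unread records for this reader and advance it."""
        start = self.positions.get(reader, 0)
        batch = self.records[start:start + max_records]
        self.positions[reader] = start + len(batch)
        return batch

    def seek(self, reader: str, offset: int) -> None:
        """Move a single reader to another offset without affecting the others."""
        self.positions[reader] = offset


log = Log()
for event in ["page_view", "login", "purchase"]:
    log.append(event)

warehouse_batch = log.poll("data-warehouse")             # bulk reader takes everything
monitoring_batch = log.poll("monitoring", max_records=1)  # slower reader takes one
log.seek("data-warehouse", 0)                             # replay history for one reader
```

Each reader moves only its own offset. A slow monitoring job therefore never holds back the warehouse load, and rewinding one reader leaves the other untouched.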
Introducing Kafka Connect
Large-scale streaming data import/export for Kafka
Overview of Connect
1. Install a cluster of Workers.
2. Download (or build) and install Connector Plugins.
3. Use the REST API to start and configure Connectors.
4. Connectors start Tasks. Tasks run inside Workers and copy data.
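Step 3 is plain HTTP against any worker. The minimal Python sketch below uses only the standard library to build a request that creates a connector through the `POST /connectors` endpoint. It assumes a worker listening on `localhost:8083` (the usual default REST port). It also assumes the `FileStreamSourceConnector` example that ships with Apache Kafka is installed as a plugin (step 2). The connector name, file path, and topic are placeholders. The request is built but not sent.

```python
import json
import urllib.request


def create_connector_request(worker_url: str, name: str,
                             config: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /connectors request for a Connect worker."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=worker_url.rstrip("/") + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: adjust the worker URL, file and topic for a real cluster.
req = create_connector_request(
    "http://localhost:8083",
    "local-file-source",
    {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",      # upper bound on Tasks the Connector may start
        "file": "/tmp/test.txt",
        "topic": "connect-test",
    },
)
# urllib.request.urlopen(req) would submit it to a running worker.
```

Once the request is submitted, the worker asks the Connector for up to `tasks.max` task configurations and schedules those Tasks across the Workers in the cluster, which is step 4.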
Questions?