A Microservice Architecture for Big Data Pipelines
BigData.be Meetup, June 2016
Let’s face it: Big Data is no longer a Big Deal
Image © User:Kleiner / Wikimedia Commons / CC BY-SA 3.0
www.realimpactanalytics.com
Yardsticks of Software Development:
1. Create Modularity
2. Ensure Quality
3. Scale Development
4. Painless Deployment
1. Challenge: Big Data in Production
2. Zen of Micro Services
3. Data Modules
4. Conclusion
Image © User:Guma89 / Wikimedia Commons / CC BY-SA 3.0
Modularity Is Imperative for On-Premises Deployments:
[Diagram: RealImpact → several Products → several Clients per Product]
The Promised Land
Image: http://hanciong.deviantart.com/art/old-world-map-253195357
www.realimpactanalytics.com
Micro Services: Maximal Modularity
1. No shared state
2. Minimal coupling
3. Separation of concerns
4. Mix & match
Micro Services: Scalable Development
1. Team responsibility
2. Less code = faster ramp-up
3. Technology independence
Micro Services: Painless Deployment
1. Reproducible environments
2. Versioned APIs
3. Installation = docker-compose up
Prod
Dev
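One way to make "Versioned APIs" concrete: a consumer records the API version it was built against and refuses a provider whose version is incompatible. A minimal Python sketch, assuming the services follow semantic versioning (MAJOR.MINOR.PATCH, with breaking changes only on a major bump); the helper names are hypothetical.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into a comparable tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(required: str, provided: str) -> bool:
    """Assumed semantic-versioning rule: the provider must keep the same MAJOR
    version (no breaking changes) and be at least as new as what was required."""
    req = parse_version(required)
    prov = parse_version(provided)
    return prov[0] == req[0] and prov >= req


if __name__ == "__main__":
    print(is_compatible("1.2.0", "1.4.1"))  # True: same major, newer minor
    print(is_compatible("1.2.0", "2.0.0"))  # False: a major bump may break the API
```

With a check like this, Dev and Prod can run different versions of a service side by side, and a mismatch fails fast at start-up instead of at the first broken call.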
Micro Services: QA Friendly
1. Three levels of testing
• Class / function level
• Service level
• Integration level
2. Staging is no big deal
Translation to Big Data Pipelines…
TrendingAnalysis
Twitter Data
TopTweeters
Recommend
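Treating each box on the slide as an independent module turns the pipeline into a dependency graph, and a scheduler then only needs a topological order to run it. A minimal Python sketch using the standard library's graphlib (Python 3.9+); the slide does not show the arrows, so the wiring below (TrendingAnalysis and TopTweeters read Twitter Data, Recommend consumes both) is an assumption for illustration.

```python
from graphlib import TopologicalSorter

# Assumed wiring of the boxes on the slide: each module maps to the set of
# modules or data sources it reads from.
PIPELINE: dict[str, set[str]] = {
    "TrendingAnalysis": {"Twitter Data"},
    "TopTweeters": {"Twitter Data"},
    "Recommend": {"TrendingAnalysis", "TopTweeters"},
}


def run_order(pipeline: dict[str, set[str]]) -> list[str]:
    """Return an order in which every module runs after all of its inputs.

    Raises graphlib.CycleError if the modules depend on each other in a loop.
    """
    return list(TopologicalSorter(pipeline).static_order())


if __name__ == "__main__":
    # "Twitter Data" comes first and "Recommend" last; the two middle
    # modules are independent and may appear in either order.
    print(run_order(PIPELINE))
```

Because TrendingAnalysis and TopTweeters share no inputs other than the raw data, a scheduler could also run them in parallel; TopologicalSorter's prepare(), get_ready() and done() methods expose exactly that batch-by-batch view.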
container
Translation to Big Data Pipelines…
TrendingAnalysis
manifest.yaml
run.sh
jar
runtime
container
Translation to Big Data Pipelines…
TrendingAnalysis
datasources:
  - twitter
outputs:
  - id: daily-trends
    fields:
      - name: keyword
        type: string
      - name: relevance
        type: integer
parameters: …
manifest.yaml
run.sh
jar
runtime
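Because the manifest declares each output's fields and types, the platform (or a module-level test) can check a module's results against it without knowing anything about the jar inside. A minimal Python sketch, assuming the manifest has already been parsed from manifest.yaml (for example with a YAML library) into plain dicts and lists; the type mapping and the validate_row helper are hypothetical.

```python
# The manifest from the slide, as it might look once parsed from
# manifest.yaml; "parameters" is elided on the slide and left out here.
manifest = {
    "datasources": ["twitter"],
    "outputs": [
        {
            "id": "daily-trends",
            "fields": [
                {"name": "keyword", "type": "string"},
                {"name": "relevance", "type": "integer"},
            ],
        }
    ],
}

# Assumed mapping from manifest type names to Python types.
TYPES = {"string": str, "integer": int}


def validate_row(manifest: dict, output_id: str, row: dict) -> list[str]:
    """Return the problems found in one result row; an empty list means it matches."""
    output = next(o for o in manifest["outputs"] if o["id"] == output_id)
    declared = {field["name"]: field["type"] for field in output["fields"]}
    problems = []
    for name, type_name in declared.items():
        if name not in row:
            problems.append(f"missing field {name!r}")
        # Exact type check, so that True/False are not accepted as integers.
        elif type(row[name]) is not TYPES[type_name]:
            problems.append(f"field {name!r} should be {type_name}")
    for name in sorted(row.keys() - declared.keys()):
        problems.append(f"unexpected field {name!r}")
    return problems


if __name__ == "__main__":
    print(validate_row(manifest, "daily-trends", {"keyword": "euro2016", "relevance": 42}))
    print(validate_row(manifest, "daily-trends", {"keyword": "euro2016"}))
```

The first call prints an empty list; the second reports the missing relevance field.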
Translation to Big Data Pipelines…
TrendingAnalysis
HDFS
Input Data
Result
Parameters
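The slide reduces a module run to three things: input data and parameters go in, a result comes out, all on HDFS. A minimal Python sketch of how a runner might resolve those locations from a manifest before calling the module's run.sh; the hdfs:/// path layout, the ModuleRun record and the plan_run helper are all hypothetical, and the example parameter name "day" is made up because the slide elides the parameters.

```python
from dataclasses import dataclass

# Hypothetical HDFS layout: the slide only shows input data, parameters and
# a result on HDFS, not where they live.
INPUT_ROOT = "hdfs:///data"
RESULT_ROOT = "hdfs:///results"


@dataclass(frozen=True)
class ModuleRun:
    module: str
    inputs: dict[str, str]      # declared datasource -> path the module reads
    outputs: dict[str, str]     # declared output id -> path the module writes
    parameters: dict[str, str]  # values passed through to the module unchanged


def plan_run(module: str, manifest: dict, parameters: dict[str, str], run_id: str) -> ModuleRun:
    """Resolve where a module reads its datasources and writes its outputs.

    Modules never call each other directly; they only share these paths.
    """
    inputs = {source: f"{INPUT_ROOT}/{source}" for source in manifest["datasources"]}
    outputs = {
        output["id"]: f"{RESULT_ROOT}/{module}/{output['id']}/{run_id}"
        for output in manifest["outputs"]
    }
    return ModuleRun(module, inputs, outputs, dict(parameters))


if __name__ == "__main__":
    manifest = {"datasources": ["twitter"], "outputs": [{"id": "daily-trends"}]}
    run = plan_run("TrendingAnalysis", manifest, {"day": "2016-06-01"}, run_id="2016-06-01")
    print(run.inputs)   # {'twitter': 'hdfs:///data/twitter'}
    print(run.outputs)  # {'daily-trends': 'hdfs:///results/TrendingAnalysis/daily-trends/2016-06-01'}
```

This is also one reading of the "(well…)" next to "No shared state" later in the deck: modules share no memory, but they do share storage.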
Demo
Data Modules: QA Friendly?
1. Three levels of testing ✔
• Class / function level
• Module level
• Integration level
2. Staging is no big deal ✔
Data Modules: Painless Deployment?
1. Reproducible environments (✔)
2. Versioned APIs ✔
3. Installation = docker-compose up (✔)
Prod
Dev
Data Modules: Scalable Development?
1. Team responsibility ✔
2. Less code = faster ramp-up ✔
3. Technology independence ✔
Data Modules: Modularity?
1. No shared state (well…)
2. Minimal coupling ✔
3. Separation of concerns ✔
4. Mix & match ✔
Conclusion
Brussels Office: 5, Place du Champ de Mars, 1050 Brussels, Belgium
Cape Town Office: Sovereign Quay, 34 Somerset Road, 8005 Green Point, Cape Town, South Africa
São Paulo Office: 93, Rua Doutor Andrade Pertence, Vila Olímpia, São Paulo, Brazil
Luxembourg Office: 691, rue de Neudorf, 2220 Luxembourg, Grand Duchy of Luxembourg
Kuala Lumpur Office: 28-01, Integra Tower, 348 Jalan Tun Razak, 50400 Kuala Lumpur, Malaysia