Peeling the Onion
How Data Abstractions Help Build BigData Apps
Andreas Neumann @caskoid
November 2016
Cask, CDAP, Cask Hydrator and Cask Tracker are trademarks or registered trademarks of Cask Data. Apache Spark, Spark, the Spark logo, Apache Hadoop, Hadoop and the Hadoop logo are trademarks or registered trademarks of the Apache Software Foundation. All other trademarks and registered trademarks are the property of their respective owners.
- Physical Storage Medium
  - Lowest level of the abstraction stack
- Benefits
  - Application code not “polluted” with low-level storage APIs
  - Portability across storage engines
  - Portability across different versions of the storage engine
  - Testability in environments with a different storage engine
  - Reusability of code
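The portability and testability benefits above can be illustrated with a minimal sketch. All names here (`KeyValueStore`, `InMemoryStore`, `record_visit`) are hypothetical and are not CDAP APIs; the assumption is simply that application code talks to a small interface while the engine behind it can be swapped.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class KeyValueStore(ABC):
    """Hypothetical storage abstraction: the only API application code sees."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        ...


class InMemoryStore(KeyValueStore):
    """Test-friendly engine backed by a plain dict (an assumption for this sketch)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


def record_visit(store: KeyValueStore, user: str) -> int:
    """Application logic: bump a per-user counter without knowing the engine."""
    raw = store.get(user)
    count = int(raw.decode()) + 1 if raw is not None else 1
    store.put(user, str(count).encode())
    return count
```

In a unit test, `record_visit(InMemoryStore(), "alice")` runs without a cluster. A production engine would be another subclass of the same interface, leaving `record_visit` unchanged.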
Storage Format Abstraction
- Representation of data in the storage engine
  - Serialization of data types to native storage format
  - Mapping complex types to storage format (ORM)
  - Schema representation
  - Provided partially by some storage engines (e.g. SQL)
- Benefits
  - Application is not concerned with serialization/deserialization
  - Schema evolution
  - Enforces correct schema and representation
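A rough sketch of what a storage format layer might do, assuming JSON as the native format and a dataclass as the schema. The `Tweet` type and the `serialize`/`deserialize` helpers are hypothetical names chosen for illustration, not part of any library discussed in this deck.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class Tweet:
    user: str
    text: str
    retweets: int = 0  # added in a later schema version; default keeps old records readable


def serialize(obj) -> bytes:
    """Turn a typed record into the (assumed) native storage format: UTF-8 JSON."""
    return json.dumps(asdict(obj), sort_keys=True).encode("utf-8")


def deserialize(cls, raw: bytes):
    """Rebuild a typed record and enforce the schema declared by the dataclass."""
    data = json.loads(raw.decode("utf-8"))
    known = {f.name for f in fields(cls)}
    unknown = set(data) - known
    if unknown:
        raise ValueError(f"fields not in schema: {sorted(unknown)}")
    obj = cls(**data)  # raises TypeError if a required field is missing
    for f in fields(cls):
        if not isinstance(getattr(obj, f.name), f.type):
            raise TypeError(f"{f.name} must be of type {f.type.__name__}")
    return obj
```

The application only ever handles `Tweet` objects. A record written before `retweets` existed still loads with the default value, which is one simple form of schema evolution.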
- Ingest from Twitter into a Dataset
- Run MapReduce over the Dataset to compute frequent #hashtags
- Service to retrieve the top #hashtags
- See the lineage for this Dataset
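The hashtag computation in these steps can be sketched in plain Python. This is only an in-process imitation of the map and reduce phases, not the MapReduce job or the Service from the demo. The function names and the hashtag regex are assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Assumed hashtag pattern: '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def map_hashtags(tweet: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (hashtag, 1) for every hashtag in one tweet."""
    for tag in HASHTAG.findall(tweet):
        yield tag.lower(), 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce phase: sum the emitted counts per hashtag."""
    totals: Counter = Counter()
    for tag, n in pairs:
        totals[tag] += n
    return totals


def top_hashtags(tweets: Iterable[str], k: int) -> List[Tuple[str, int]]:
    """What a lookup service might return: the k most frequent hashtags."""
    pairs = (pair for tweet in tweets for pair in map_hashtags(tweet))
    return reduce_counts(pairs).most_common(k)
```

For example, `top_hashtags(["Loving #Hadoop and #Spark", "#spark is fast"], 1)` yields the single most frequent tag with its count.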