Data Pipeline: Big Data meets Salesforce
Carolina Ruiz Medina, Principal Developer on Product Innovation, @carolenlanube
Agustina García Peralta, Principal Developer on Platform Strategy, @agarciaodeian
Salesforce
Carolina Ruiz Medina
Principal Developer on Product Innovation
FinancialForce.com, MVP
@CarolEnLaNube @CodeCoffeeCloud

Agustina García Peralta
Principal Developer, Platform Strategy
FinancialForce.com
@agarciaodeian
About
GREAT ALONE. BETTER TOGETHER.
Native to Salesforce App Cloud since 2009
Investors include Salesforce Ventures
Customers in 27 countries
650+ employees, San Francisco based
Dreamforce.FinancialForce.com
First, a few quick words about FinancialForce.com.
FinancialForce.com builds ERP apps that are native to the Salesforce App Cloud, including accounting, professional services automation, human resources, and inventory applications. Our apps can be subscribed to separately or as part of a whole ERP family.
Our investors include Salesforce Ventures, which made its original investment in us in 2009.
We have customers in 27 countries around the world and over 650 employees, including those at our headquarters at 595 Market St. here in San Francisco.
We have quite a few sessions and parties planned here this week; you can learn more about them at Dreamforce.FinancialForce.com. Feel free to join us.
Agenda
  Data Pipeline - Overview
  Pipeline Use Cases
  How Pipeline works
  Demos
  Big Data
  Take away
  Q&A
Common scenario: large amounts of data
Asynchronous Apex
  @future
  Queueable
  Batch Apex
  Flex Queue (since Summer '15)
Any other option?
  Data Pipeline: a new feature that integrates Apache Pig into Salesforce
Apache Pig Background
What does it do?
  Processes massive amounts of data in parallel
  Key elements
    MapReduce: software for writing programs that process large amounts of data in parallel
    Hadoop cluster: a cluster for storing and analyzing large amounts of data
  Enables developers to create executions for analyzing LARGE AMOUNTS of data in PARALLEL
How does it work?
  It uses Pig Latin
    A data-flow language
    Sits between SQL and Java
    We can create our own UDFs (user-defined functions)
Why is it relevant?
  It is a technology associated with Hadoop, but it can be used by other frameworks, including Salesforce
Is there anything unique to Apache Pig running in Salesforce?
  It runs in a multitenant environment
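To make the MapReduce idea concrete, here is a minimal single-process sketch in Python. It only imitates the three phases (map, shuffle, reduce) on a few in-memory lines; a real Hadoop cluster distributes these phases across many machines. The sample lines and all function names are invented for illustration.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle_phase(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def reduce_phase(grouped, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in grouped.items()}

# Hypothetical sample data: one line of text per record.
lines = [
    "pig runs on hadoop",
    "pig latin is a data flow language",
    "hadoop stores data",
]

def word_mapper(line):
    # Emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.split()]

def count_reducer(word, counts):
    # Sum the ones emitted for this word.
    return sum(counts)

word_counts = reduce_phase(shuffle_phase(map_phase(lines, word_mapper)), count_reducer)
print(word_counts["pig"], word_counts["hadoop"], word_counts["data"])
```

Because each mapper call only sees one record and each reducer call only sees one key, the work can be split across machines, which is what makes the model suitable for large volumes.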
What is Data Pipeline?
Under Pilot program; GA by Summer '16 (Safe Harbor)
How does Data Pipeline work?
  Run Pig scripts written in the Pig Latin language
  Data Pipeline: Pig Script, Apex?
Execution features
  Runs asynchronously
  Runs in parallel
From where?
  Developer Console
  During deploy
  Tooling API, 33.0 onwards
Anything else?
  It is an ETL (Extract, Transform, Load) tool
  Pig scripts can be included in a package
Data Pipeline Advantages vs other processes
1. Performance
2. Ability to execute scripts in parallel
3. Avoids hitting governor limits
4. De-couples On-line Transaction Processing and On-line Analytical Processing
5. Allows you to think in terms of data flow
How can Pipeline help us?
Prepare data for Currency Revaluation (SObject to SObject)
  We have a large volume of financial transactions... and we need to process them now!
  ...so that our users can use them: to report, to print, or for another quick process to finish (revaluation)
Extracting information from a large amount of data (SObject to File)
  We have a large volume of financial transactions... and we need to process them now!
  ...so that our manager can check the progress, export data quickly...
What is a Pig script?
To build the solution, let's look at Pig scripts first.
Operators
  JOIN
  GROUP
  DISTINCT
  ORDER
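For readers new to these four operators, the sketch below shows roughly what each one does to a set of rows. It is plain Python over in-memory lists, not Pig Latin, and the sample rows and field names are invented for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows, invented for illustration only.
invoices = [
    {"account_id": 1, "amount": 150.0},
    {"account_id": 2, "amount": 250.0},
    {"account_id": 1, "amount": 150.0},
    {"account_id": 1, "amount": 350.0},
]
accounts = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]

# JOIN: pair each invoice with the account whose id matches its account_id.
accounts_by_id = {a["id"]: a for a in accounts}
joined = [
    {**inv, "name": accounts_by_id[inv["account_id"]]["name"]}
    for inv in invoices
    if inv["account_id"] in accounts_by_id
]

# GROUP: collect rows that share a key (groupby needs input sorted by that key).
by_name = sorted(joined, key=itemgetter("name"))
grouped = {name: list(rows) for name, rows in groupby(by_name, key=itemgetter("name"))}

# DISTINCT: drop duplicate rows, keeping the first occurrence of each.
distinct = list(dict.fromkeys((row["name"], row["amount"]) for row in joined))

# ORDER: sort rows by a field, largest amount first.
ordered = sorted(distinct, key=itemgetter(1), reverse=True)

print(len(grouped["Acme"]), distinct, ordered[0])
```

In a data-flow language each of these steps takes a named relation as input and produces a new named relation, which is the same shape as the intermediate variables here.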
Solution: SObject to SObject
Solution: SObject to File
File created
Demo
Use Case
Source records:
  LBX   | 7/7/2015  | $150.00 | I-00000
  Other | 7/7/2015  | $250.00 | I-00001
  LBX   | 7/7/2015  | $150.00 | I-00002
  LBX   | 12/7/2015 | $350.00 | I-00003
  Other | 15/7/2015 | $550.00 | I-00004
Result:
  7/7/2015  | LBX   | $300.00
  7/7/2015  | Other | $250.00
  12/7/2015 | LBX   | $350.00
  15/7/2015 | Other | $550.00
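The transformation in this use case can be sketched in a few lines of Python. This is not the Pig script from the demo; it only illustrates one plausible reading of the data, namely that the five sample records are grouped by date and type and their amounts summed (inferred from the $300.00 total for LBX on 7/7/2015). The column meanings and the comma-separated output format are assumptions.

```python
from collections import defaultdict
from decimal import Decimal

# The five sample records from the use case: (type, date, amount, invoice number).
# Column meanings are assumed from the values; the slide shows no header.
records = [
    ("LBX", "7/7/2015", "150.00", "I-00000"),
    ("Other", "7/7/2015", "250.00", "I-00001"),
    ("LBX", "7/7/2015", "150.00", "I-00002"),
    ("LBX", "12/7/2015", "350.00", "I-00003"),
    ("Other", "15/7/2015", "550.00", "I-00004"),
]

def summarize(rows):
    """Sum the amounts of all rows that share the same (date, type) pair."""
    totals = defaultdict(Decimal)  # missing keys start at Decimal("0")
    for kind, date, amount, _invoice in rows:
        totals[(date, kind)] += Decimal(amount)
    return dict(totals)

def to_lines(totals):
    """Render one output line per group; the comma separator is an assumption."""
    return [f"{date},{kind},${total:.2f}" for (date, kind), total in totals.items()]

totals = summarize(records)
for line in to_lines(totals):
    print(line)
```

Decimal is used instead of float so that currency amounts add up exactly.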
Use Case: SObject to File
No header!!
Demo
Data Pipeline: 2 more options
  Join 2 objects
  Read and process a JSON file
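As a rough illustration of these two options, the Python sketch below parses a small JSON document, keeps only a few fields, and joins the result with a second set of rows. It is not Pig Latin and does not use any Salesforce API; the JSON content, field names, and helper functions are all hypothetical.

```python
import json

# Hypothetical JSON content, invented for illustration; a real file would be
# read with open(...) and json.load instead of json.loads on a literal.
invoice_json = """
[
  {"number": "I-00000", "account": "A-1", "amount": 150.0, "notes": "n/a"},
  {"number": "I-00001", "account": "A-2", "amount": 250.0, "notes": "n/a"},
  {"number": "I-00002", "account": "A-9", "amount": 150.0, "notes": "n/a"}
]
"""
accounts = [{"id": "A-1", "name": "Acme"}, {"id": "A-2", "name": "Globex"}]

def read_invoices(text, wanted_fields):
    """Parse the JSON text and keep only the fields we care about."""
    return [{field: row[field] for field in wanted_fields} for row in json.loads(text)]

def inner_join(left, right, left_key, right_key):
    """Pair each left row with the right row that has the same key value.

    Assumes key values on the right side are unique; left rows without a
    match are dropped, as in an inner join.
    """
    index = {row[right_key]: row for row in right}
    return [
        {**left_row, **index[left_row[left_key]]}
        for left_row in left
        if left_row[left_key] in index
    ]

invoices = read_invoices(invoice_json, ["number", "account", "amount"])
joined = inner_join(invoices, accounts, "account", "id")
for row in joined:
    print(row["number"], row["name"], row["amount"])
```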
But that is not all!!
  Thousands of invoices
  Keep them somewhere for audit processes
  No need for all the information, just some field values
Big Data
#Big Data #Big Objects
Big Data: Big Objects
Under Pilot program; GA by Summer '16 (Safe Harbor)

Comparison | Custom Object | Big Object
Creation | Manual & Metadata | Metadata
API name | myObject__c | myObject__b
Enable Reports, Track Activities, Track Field History, etc. | Options available | Options not available
Field types | All | Text; Date/Time; Lookup
Numbers!!!

Comparison | Custom Object | Big Object
Able to edit / delete fields? | Yes | No
Triggers, Field Sets, etc. | Options available | Options not available

Comparison | Custom Object | Big Object
How to populate records | All options | Bulk API; SOAP API; Data Pipeline
Can I amend a record? | Yes | No. Only clone is available
Can I see data by creating a Tab? | Yes | No. Only via SOQL
For free? | Yes | No. Talk with Salesforce about it
Storage? | It counts against the storage limitation | It DOES NOT count against the storage limitation
Yes!!
Big Data Big Objects & Pipeline
Data Pipeline - Limitations
  Size complexity: 20 operators, 20 loads and 10 stores per script
  Run up to 30 scripts a day
  Bulk API: Store calls it, and its limits are in place
  Does not support some operators, like COUNT
  Can't break the rules of the Salesforce Platform: triggers, validations, required fields, etc.
  Once you run the process there is no way back
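Since a run cannot be undone, it can help to check a script against the per-script limits before submitting it. The Python sketch below is a hypothetical pre-flight check, not a Salesforce tool: it naively counts LOAD and STORE keywords with a regular expression and compares them with the 20-load and 10-store limits listed above. It would also count keywords that appear inside comments or string literals, and it does not attempt the operator count because what counts as an operator is not specified here. The sample script is generic Apache Pig Latin using PigStorage, not the Salesforce-specific loader.

```python
import re

MAX_LOADS = 20   # from the limitations list: 20 loads per script
MAX_STORES = 10  # from the limitations list: 10 stores per script

def count_keyword(script, keyword):
    """Count whole-word, case-insensitive occurrences of a keyword."""
    return len(re.findall(rf"\b{keyword}\b", script, flags=re.IGNORECASE))

def check_limits(script):
    """Return (loads, stores, problems) for a Pig script given as text."""
    loads = count_keyword(script, "LOAD")
    stores = count_keyword(script, "STORE")
    problems = []
    if loads > MAX_LOADS:
        problems.append(f"{loads} loads exceeds the limit of {MAX_LOADS}")
    if stores > MAX_STORES:
        problems.append(f"{stores} stores exceeds the limit of {MAX_STORES}")
    return loads, stores, problems

# Hypothetical sample script, invented for illustration.
sample_script = """
invoices = LOAD 'invoices.csv' USING PigStorage(',');
by_type = GROUP invoices BY $0;
STORE by_type INTO 'summary';
"""

loads, stores, problems = check_limits(sample_script)
print(loads, stores, problems)
```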
Data Pipeline: Take away
1. New feature is in Pilot
2. Run scripts via:
   Developer Console
   Deploy
   Tooling API (since API 33.0)
3. Run scripts asynchronously and in parallel
4. Better performance
5. Easy to use!!
Q&A
ISV Scale: Big Data for ISV
4pm, Park Central Hotel, Franciscan Ballroom
Links and more
https://pig.apache.org/
http://goo.gl/h5N7Sa
https://goo.gl/KXQSKC

Carolina Ruiz Medina
@CarolEnLaNube @CodeCoffeeCloud
http://www.meetup.com/es/South-Spain-Salesforce-Developer-Group/

Agustina García Peralta
@agarciaodeian
www.agarciaodeian.com
http://www.meetup.com/es/Spain-Salesforce-Developer-User-Group/
Thank you