Spring Batch Christopher Jeffers August 2012

Jan 11, 2016


Spring Batch

Christopher Jeffers

August 2012


Agenda

• Intro to Spring Batch and Use-Cases

• Spring Batch Technical Explanation

– Architecture

– The Batch Job

– Skipping and Retrying Steps

– Scaling Features

• Spring Batch Evaluation

– Solving Use-Cases

– Benefits

– Issues

– Integration Options

– Future Steps


Spring Batch Overview

• Lightweight framework designed to enable the development of robust batch applications used in enterprise systems

• As a part of Spring, it builds on the ease of use of the POJO-based development approach, while making it easy for developers to use more advanced enterprise services when necessary

• Provides reusable functions that are essential in processing large volumes of data

• Provides scaling features, including multi-threading and massive parallelism for Spring Batch Jobs


Batch Use-Cases

• DataRoomBatch

– Physically delete all rows marked for deletion from a given bucket (DeepSix)

– Rerun user documents through publishing workflow

– Proactive auditing of the environment

• Public Records Batch Processing

– User inputs a file with search criteria for many individuals; the program searches the database for changes in information and returns a report of hits to the user

– Read, Process, and Write sequence

– Satisfies Government and Corporate requirements


Reason for Spring Batch POC

• Current batch system for public records is not powerful enough to handle very large requests

• Have had to turn away customers because of this

• A more powerful and flexible batch solution could solve this problem




Architecture

• Layered architecture

• The application layer contains all batch jobs and custom code

• Batch Core contains runtime classes necessary to launch and control a batch job

• Batch Infrastructure contains common readers and writers, and services used by both the application and the core framework

http://static.springsource.org/spring-batch/reference/html/spring-batch-intro.html


The Batch Job

• A Job entity encapsulates an entire batch process

• A Job is composed of Steps, each of which encapsulates one phase of the batch job

– A Step can be as complex or as simple as the developer wants

http://static.springsource.org/spring-batch/reference/html/domain.html


Chunk Processing

• Typical Spring Batch Step

– Read, Process, Write sequence

• Multiple items are read and processed before being written as a “chunk”

– Size of chunk declared in configuration (commit-interval)

http://static.springsource.org/spring-batch/reference/html/configureStep.html
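The read, process, write sequence above can be sketched in plain Java. This is only an illustration of the chunking idea, not Spring Batch code: `ChunkSketch` and `run` are hypothetical names, the "writer" simply collects each chunk into a list, and a real Step would commit a transaction where the comment indicates.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration of chunk-oriented processing (hypothetical class,
// not part of Spring Batch).
class ChunkSketch {
    // Reads items one at a time, processes each, and "writes" them in groups
    // of at most commitInterval items. Returns the chunks in write order.
    static <I, O> List<List<O>> run(Iterator<I> reader,
                                    Function<I, O> processor,
                                    int commitInterval) {
        List<List<O>> written = new ArrayList<>();
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == commitInterval) {
                written.add(chunk);        // a real step would write and commit here
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            written.add(chunk);            // final, possibly partial, chunk
        }
        return written;
    }
}
```

With five input items and a commit interval of 2, this sketch writes three chunks: two full ones and a final chunk holding the last item.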


Step Flow

• Steps can be configured to flow sequentially or conditionally

– Allows for some complex jobs

http://static.springsource.org/spring-batch/reference/html/configureStep.html
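As a rough illustration of conditional flow, the sketch below picks the next step from the exit status of the current one. It is plain Java with hypothetical names (`StepFlowSketch`, `step`, `on`, `run`); in Spring Batch itself the transitions are declared in the Job configuration rather than coded this way.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical mini flow engine: each step returns an exit status, and a
// transition table maps (step, status) to the next step.
class StepFlowSketch {
    private final Map<String, Supplier<String>> steps = new HashMap<>();
    private final Map<String, Map<String, String>> transitions = new HashMap<>();

    // Registers a step whose body returns its exit status.
    void step(String name, Supplier<String> body) {
        steps.put(name, body);
    }

    // Declares: when step "from" ends with exitStatus, continue with step "to".
    void on(String from, String exitStatus, String to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(exitStatus, to);
    }

    // Runs steps starting at "start" until no transition matches.
    // Returns the names of the executed steps in order.
    List<String> run(String start) {
        List<String> executed = new ArrayList<>();
        String current = start;
        while (current != null) {
            executed.add(current);
            String status = steps.get(current).get();
            current = transitions
                    .getOrDefault(current, Collections.emptyMap())
                    .get(status);          // null means the job ends here
        }
        return executed;
    }
}
```

For example, a "load" step that ends with status FAILED can be routed to a "cleanup" step, while COMPLETED routes to a "report" step.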


Job Repository

• The JobRepository performs CRUD operations on the metadata relating to Job and Step execution

– Examples: Job Parameters, Job/Step status, etc.

http://static.springsource.org/spring-batch/reference/html/domain.html


Step Skipping

• An item is skipped if an exception listed in the configuration is thrown, rather than stopping the batch execution

• Used for exceptions that will be thrown on every attempt

– FileNotFoundException, Parse Exceptions, etc.

• SkipListener can be used to log skipped items
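The skip behaviour can be sketched in plain Java as follows. This is an assumption-laden illustration, not the Spring Batch API: `SkipSketch`, `process`, and the `skipLimit` parameter are hypothetical, and the `skipped` list merely stands in for what a SkipListener would log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of skip logic (not Spring Batch classes).
class SkipSketch {
    // Stands in for a SkipListener: records each skipped item and its error.
    final List<String> skipped = new ArrayList<>();

    // Processes every item. An item that throws the skippable exception type
    // is recorded and skipped; once skipLimit skips have been used, or if a
    // different exception type is thrown, the exception propagates.
    <I, O> List<O> process(List<I> items,
                           Function<I, O> processor,
                           Class<? extends RuntimeException> skippable,
                           int skipLimit) {
        List<O> out = new ArrayList<>();
        for (I item : items) {
            try {
                out.add(processor.apply(item));
            } catch (RuntimeException e) {
                if (!skippable.isInstance(e) || skipped.size() >= skipLimit) {
                    throw e;               // not skippable, or limit exhausted
                }
                skipped.add(item + ": " + e.getMessage());
            }
        }
        return out;
    }
}
```

Parsing the strings "1", "x", "3" as integers with NumberFormatException marked skippable yields the values 1 and 3 and one recorded skip.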


Retrying Steps

• If an exception listed in the configuration is thrown, the operation is attempted again

• Used for exceptions that may not be thrown on every attempt of the Step

– ConcurrencyFailureException, DeadlockLoserDataAccessException, etc.

• Can set a limit on number of retries

• RetryListener can be used to log retried items

• RetryTemplate can be used to further customize retry logic
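A minimal plain-Java sketch of retry with a limit is shown below. It does not use RetryTemplate or any Spring class; `RetrySketch`, `execute`, and `maxAttempts` are hypothetical names, and the `attempts` counter stands in for what a RetryListener might record.

```java
import java.util.function.Supplier;

// Hypothetical illustration of retry logic (not the Spring Retry API).
class RetrySketch {
    // Total number of attempts made so far; a listener could log this.
    int attempts = 0;

    // Calls the operation until it succeeds. A retryable exception triggers
    // another attempt unless maxAttempts has been reached; any other
    // exception propagates immediately.
    <T> T execute(Supplier<T> operation,
                  Class<? extends RuntimeException> retryable,
                  int maxAttempts) {
        while (true) {
            attempts++;
            try {
                return operation.get();
            } catch (RuntimeException e) {
                if (!retryable.isInstance(e) || attempts >= maxAttempts) {
                    throw e;
                }
                // otherwise fall through and try again
            }
        }
    }
}
```

An operation that fails transiently twice and then succeeds completes on the third attempt, provided `maxAttempts` is at least 3.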


Scaling Features (Single Process)

• Multi-Threaded Jobs or Steps

– Using Spring’s TaskExecutor object

• Parallel Steps

– Using split flows and a TaskExecutor in Job configuration

http://static.springsource.org/spring-batch/reference/html/scalability.html
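The idea behind parallel steps can be approximated with the JDK's own `ExecutorService`, used here as a stand-in for Spring's TaskExecutor. `SplitFlowSketch` and `runInParallel` are hypothetical names; each flow is modelled as a `Callable` that returns its exit status.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of a split: independent flows run on a thread
// pool and the job continues only after all of them have finished.
class SplitFlowSketch {
    // Runs all flows concurrently and returns their exit statuses in the
    // same order as the input list.
    static List<String> runInParallel(List<Callable<String>> flows, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = pool.invokeAll(flows); // blocks until all finish
            List<String> statuses = new ArrayList<>();
            for (Future<String> future : futures) {
                statuses.add(future.get());
            }
            return statuses;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt flag
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the flows share no state in this sketch, no synchronization is needed; multi-threaded steps that share a reader or writer would need thread-safe components.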


Scaling Features (Multi-Process)

• Remote Chunking

– Splits Step processing across multiple processes, using some middleware to communicate

http://static.springsource.org/spring-batch/reference/html/scalability.html


Scaling Features (Multi-Process)

• Step Partitioning

– Splits input and executes remote steps in parallel

– PartitionHandler sends StepExecution requests to remote steps

– Partitioner generates the input for new step executions

http://static.springsource.org/spring-batch/reference/html/scalability.html
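To make the Partitioner's role concrete, here is a plain-Java sketch that divides a range of row ids into contiguous partitions. It assumes the input is keyed by a numeric id; `RangePartitionerSketch`, `partition`, and the `partitionN` names are hypothetical, and a real Partitioner would place these bounds into the execution context of each new step execution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of range partitioning (not the Spring Batch
// Partitioner interface).
class RangePartitionerSketch {
    // Splits the inclusive id range [minId, maxId] into at most gridSize
    // contiguous partitions. Each value is {startId, endId}, both inclusive.
    static Map<String, long[]> partition(long minId, long maxId, int gridSize) {
        Map<String, long[]> result = new LinkedHashMap<>();
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize;    // ceiling division
        long start = minId;
        for (int i = 0; i < gridSize && start <= maxId; i++) {
            long end = Math.min(start + size - 1, maxId);
            result.put("partition" + i, new long[] {start, end});
            start = end + 1;
        }
        return result;
    }
}
```

Splitting ids 1 through 10 across a grid size of 3 gives the ranges 1 to 4, 5 to 8, and 9 to 10; each range could then be handed to a separate step execution.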


Job Flow with Client/Server and Partitioning




Solving the Use-Cases

• DataRoomBatch (DeepSix Example)

– Bucket is input to JdbcCursorItemReader

– Create an Item Processor to check if the row is marked for deletion and delete it if so

– Item Writer could be empty or used to output statistics

– Partitioning easily done by dividing up number of rows per partition


Solving the Use-Cases

• Public Records Batch Processing

– Input file is input to FlatFileItemReader

– Custom Item Processor to search the database for hits

– Custom Item Writer to compile report of search results

– Following step to send report to user

– Easy to implement a Partitioner for the input file


Benefits of Spring Batch

• Part of Spring Framework

– Allows easy integration with other Spring features

– General simplicity offered by Spring

• Step flow customizable

• Basic Item Readers and Writers already available

• Features available for monitoring Jobs and Steps

• Many scaling options available


Issues with Spring Batch

• No built-in scheduler

– Not a big issue, scheduler libraries easily integrated

• Potentially a lot of XML configuration

– Business logic across Java and XML files can complicate debugging and maintenance

– Annotations can help

• Anything but very basic components will need to be created as new classes


Helpful Integration Options

• Spring Batch Admin

– Web-Based administration console

– Contains Spring Batch Integration, allowing use of Spring Integration messages to launch and monitor jobs

• Scheduler (cron, Spring Scheduling, Quartz)

• Clustering Framework (Hadoop, GridGain, Terracotta)

– Ideal for improving horizontal scaling

– Spring Data Hadoop is a fairly new Spring feature that helps integrate Spring with Hadoop


Future Steps

• Get Spring Batch set up with a clustered environment

– Evaluate performance

– Figure out dynamic load balancing

• Play around with more features and integration options

– Spring Batch Admin, manual job restarting, etc.

• Integrate Spring Batch Admin into the Cobalt GUI?

• Look more into the information stored in Meta-data database and figure out how to use for monitoring/managing jobs

• Look into Partitioning and how much must be done to implement sending partitions off to remote machines

• Look into job/step timeout


Questions?