
1140 Modeling and parsing business data (IBM IMPACT 2014)

Jun 08, 2015


Matt Lucas

Presentation from IBM Impact 2014. Data Format Description Language is a modeling language for describing general text and binary data. It solves a long-standing problem - how to describe data formats of all kinds in a standardized way. This session introduces the language and demonstrates how it is used within various IBM products for parsing and writing various common data formats.
Transcript
Page 1: 1140 Modeling and parsing business data (IBM IMPACT 2014)

Please Note

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.

Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

Page 2: 1140 Modeling and parsing business data (IBM IMPACT 2014)

Agenda

•  Introduction
•  OGF DFDL – a standard for modeling text and binary data
•  The IBM DFDL Component
•  DFDL support in IBM Integration Bus
•  Questions

Page 3: 1140 Modeling and parsing business data (IBM IMPACT 2014)

Data Modeling – Why DFDL?

•  Much of the data in the world resides in files, is not XML, is a mixture of textual and binary with custom syntax and encodings, and does not have a shareable machine-readable description
•  But there has been no universal standard for modeling this data!
   – XML -> use XML Schema
   – RDBMS -> use database schema
   – Text/binary -> ??
•  Existing standards are too prescriptive: “Put your data in this format!”
•  Organizations including IBM evolved their own way of modeling text and binary data based on customer need.
•  IBM® examples…
   – WebSphere® Message Broker: MRM message set
   – WebSphere ESB: Data Handlers
   – WebSphere Transformation Extender: Type Trees
   – WebSphere DataPower: FFD
   – WebSphere Cast Iron: Flat File Schema
   – Sterling B2B Integrator: DDF and IDF files

✓  DFDL: a universal, shareable, non-prescriptive description for general text & binary data formats

Page 4: 1140 Modeling and parsing business data (IBM IMPACT 2014)

Agenda

•  Introduction
•  OGF DFDL – a standard for modeling text and binary data
•  The IBM DFDL Component
•  DFDL support in IBM Integration Bus
•  Questions

Page 5: 1140 Modeling and parsing business data (IBM IMPACT 2014)

Data Format Description Language (DFDL)

•  A new open standard
   – From the Open Grid Forum (OGF): http://www.ogf.org/
   – Version 1.0, ‘Proposed Recommendation’ status
•  A way of describing data…
   – It is NOT a data format itself!
•  A powerful modeling language…
   – Text, binary and bit
   – Commercial record-oriented
   – Scientific and numeric
   – Modern and legacy
   – Industry standards
•  While allowing high performance…
   – You choose the right data format for the job
•  Leverage XML Schema technology
   – Uses W3C XML Schema 1.0 subset & type system to describe the logical structure of the data
   – Uses XSDL annotations to describe the physical representation of the data
   – The result is a DFDL schema
•  Keep simple cases simple
•  Annotations are human readable
•  Both read and write
   – A DFDL Processor can parse and serialize data using a DFDL schema
•  Intelligent parsing
   – Automatically resolve choices and optionality
•  Validation of data when parsing and serializing

Page 6: 1140 Modeling and parsing business data (IBM IMPACT 2014)

Example – Delimited Text Data

intval=5;fltval=-7.1E8

[Figure: the data above with callouts. "intval=" and "fltval=" are initiators, ";" is the separator, "5" is an ASCII text integer, and "-7.1E8" is an ASCII text floating point.]

Separators, initiators (aka tags), & terminators are all examples in DFDL of delimiters
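
To make the roles of the separator and the initiators concrete, here is a minimal Python sketch that pulls the two raw text values out of the example data. It is only an illustration of the delimiter idea, not a DFDL processor; the function name `split_delimited` and its parameters are made up for this example.

```python
def split_delimited(data, separator, initiators):
    """Split `data` on `separator`, then strip each field's initiator.

    Hypothetical helper for illustration only; a real DFDL processor is
    driven by a DFDL schema rather than by hand-passed arguments.
    """
    fields = data.split(separator)
    if len(fields) != len(initiators):
        raise ValueError("field count does not match initiator count")
    values = []
    for field, initiator in zip(fields, initiators):
        if not field.startswith(initiator):
            raise ValueError(f"expected initiator {initiator!r} in {field!r}")
        # What remains after the initiator is the element's text value.
        values.append(field[len(initiator):])
    return values


raw_values = split_delimited("intval=5;fltval=-7.1E8", ";", ["intval=", "fltval="])
print(raw_values)  # ['5', '-7.1E8']
```

The values are still text at this point; turning "5" into an integer and "-7.1E8" into a float is the job of the type and representation information in the schema.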

Page 7: 1140 Modeling and parsing business data (IBM IMPACT 2014)

Example – DFDL Schema

<xs:complexType name="myNumbers">
  <xs:sequence>
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/v1.0">
        <dfdl:sequence separator=";" encoding="ascii" …/>
      </xs:appinfo>
    </xs:annotation>
    <xs:element name="myInt" type="xs:int">
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/v1.0">
          <dfdl:element representation="text"
                        textNumberPattern="###0" encoding="ascii"
                        lengthKind="delimited" initiator="intval=" …/>
        </xs:appinfo>
      </xs:annotation>
    </xs:element>
    <xs:element name="myFloat" type="xs:float">
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/v1.0">
          <dfdl:element representation="text"
                        textNumberPattern="##0.0#E0" encoding="ascii"
                        lengthKind="delimited" initiator="fltval=" …/>
        </xs:appinfo>
      </xs:annotation>
    </xs:element>
  </xs:sequence>
</xs:complexType>

[Callouts: "DFDL annotation" marks a dfdl: element inside xs:appinfo; "DFDL properties" marks the attributes of that element.]

Data described: intval=5;fltval=-7.1E8

Page 8: 1140 Modeling and parsing business data (IBM IMPACT 2014)

Example – DFDL Schema (Short Form)

<xs:complexType name="myNumbers">
  <xs:sequence dfdl:separator=";" dfdl:encoding="ascii" …>
    <xs:element name="myInt" type="xs:int"
                dfdl:representation="text" dfdl:textNumberPattern="###0"
                dfdl:encoding="ascii" dfdl:lengthKind="delimited"
                dfdl:initiator="intval=" … />
    <xs:element name="myFloat" type="xs:float"
                dfdl:representation="text" dfdl:textNumberPattern="##0.0#E0"
                dfdl:encoding="ascii" dfdl:lengthKind="delimited"
                dfdl:initiator="fltval=" … />
  </xs:sequence>
</xs:complexType>

[Callout: "DFDL properties" marks the dfdl: attributes.]

Data described: intval=5;fltval=-7.1E8

Page 9: 1140 Modeling and parsing business data (IBM IMPACT 2014)

•  A DFDL processor uses a DFDL schema to understand a data stream
•  It consists of a DFDL parser and (optionally) a DFDL serializer
•  The DFDL parser reads a data stream and creates a DFDL ‘infoset’
•  The DFDL serializer takes a DFDL ‘infoset’ and writes a data stream

[Figure: a DFDL Processor parses the data stream into the infoset, and a DFDL Processor serializes the infoset back into the data stream, both using the DFDL schema.]

Infoset:

<Document>
  <Element name="myNumbers">
    <Element name="myInt" dataType="xs:int" dataValue="5"/>
    <Element name="myFloat" dataType="xs:float" dataValue="-7.1E08"/>
  </Element>
</Document>

Data stream:

intval=5;fltval=-7.1E8

DFDL schema:

<xs:complexType name="myNumbers">
  <xs:sequence dfdl:separator=";" dfdl:encoding="ascii" ... >
    <xs:element name="myInt" type="xs:int"
                dfdl:representation="text" dfdl:encoding="ascii"
                dfdl:textNumberPattern="###0" dfdl:lengthKind="delimited"
                dfdl:initiator="intval=" ... />
    <xs:element name="myFloat" type="xs:float"
                dfdl:representation="text" dfdl:encoding="ascii"
                dfdl:textNumberPattern="##0.0#E0" dfdl:lengthKind="delimited"
                dfdl:initiator="fltval=" ... />
  </xs:sequence>
</xs:complexType>
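
The parse and serialize directions can be sketched in a few lines of Python. This is a toy under stated assumptions, not how a DFDL processor is implemented: the `SCHEMA` dictionary is a hand-written stand-in for the myNumbers DFDL schema, the infoset is reduced to a list of (name, value) pairs, and `textNumberPattern` is ignored, so the serialized float is not formatted the way the original data was.

```python
# Hypothetical stand-in for the myNumbers DFDL schema: one entry per element.
SCHEMA = {
    "separator": ";",
    "elements": [
        {"name": "myInt", "type": int, "initiator": "intval="},
        {"name": "myFloat", "type": float, "initiator": "fltval="},
    ],
}


def parse(data, schema):
    """Read a data stream and return an infoset-like list of (name, value)."""
    fields = data.split(schema["separator"])
    if len(fields) != len(schema["elements"]):
        raise ValueError("unexpected number of fields")
    infoset = []
    for field, element in zip(fields, schema["elements"]):
        if not field.startswith(element["initiator"]):
            raise ValueError(f"missing initiator {element['initiator']!r}")
        text = field[len(element["initiator"]):]
        # Convert the text to the logical type named in the schema stand-in.
        infoset.append((element["name"], element["type"](text)))
    return infoset


def serialize(infoset, schema):
    """Write an infoset-like list back out as a delimited data stream."""
    parts = []
    for (name, value), element in zip(infoset, schema["elements"]):
        if name != element["name"]:
            raise ValueError(f"unexpected element {name!r}")
        parts.append(f"{element['initiator']}{value}")
    return schema["separator"].join(parts)


infoset = parse("intval=5;fltval=-7.1E8", SCHEMA)
print(infoset)
print(serialize(infoset, SCHEMA))
```

Because the same description drives both functions, re-parsing the serialized text yields the same infoset, which is the property the slide attributes to a DFDL schema.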

Page 10: 1140 Modeling and parsing business data (IBM IMPACT 2014)

DFDL 1.0 Features

•  Text data types such as strings, numbers, zoned decimals, calendars, booleans
•  Binary data types such as integers, floats, BCD, packed decimals, calendars, booleans
•  Fixed length data and data delimited by text or binary markup
•  Bi-directional text
•  Bit data of arbitrary length
•  Pattern languages for text numbers and calendars
•  Ordered, unordered and floating content
•  Default values on parsing and serializing
•  Nil values for handling out-of-band data
•  Fixed and variable arrays
•  XPath 2.0 expression language including variables to model dynamic data
•  Speculative parsing to resolve choices and optional content
•  Validation to XML Schema 1.0 rules
•  Scoping mechanism to allow common property values to be applied at multiple points
•  Hide elements in the data
•  Calculate element values
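
Of the features listed, speculative parsing is probably the least self-explanatory. The Python sketch below shows the general idea for a choice: try each branch from the same offset and accept the first one that parses, backing out of any branch that fails. The names (`ParseError`, `parse_choice`, the two branch parsers) are invented for this illustration and do not come from the DFDL specification or any DFDL implementation.

```python
class ParseError(Exception):
    """Raised by a branch parser when the data does not match."""


def parse_int(text, pos):
    """Parse an optional sign plus digits at `pos`; return (value, new_pos)."""
    end = pos
    if end < len(text) and text[end] in "+-":
        end += 1
    digits_start = end
    while end < len(text) and text[end] in "0123456789":
        end += 1
    if end == digits_start:
        raise ParseError(f"no integer at offset {pos}")
    return int(text[pos:end]), end


def parse_word(text, pos):
    """Parse a run of letters at `pos`; return (value, new_pos)."""
    end = pos
    while end < len(text) and text[end].isalpha():
        end += 1
    if end == pos:
        raise ParseError(f"no word at offset {pos}")
    return text[pos:end], end


def parse_choice(text, pos, branches):
    """Try each branch in order from the same offset; the first success wins."""
    for name, branch in branches:
        try:
            value, new_pos = branch(text, pos)
            return name, value, new_pos
        except ParseError:
            continue  # backtrack: the next branch starts again at `pos`
    raise ParseError(f"no branch of the choice matched at offset {pos}")


BRANCHES = [("number", parse_int), ("word", parse_word)]
print(parse_choice("42;", 0, BRANCHES))   # ('number', 42, 2)
print(parse_choice("abc;", 0, BRANCHES))  # ('word', 'abc', 3)
```

Optional content can be handled the same way: attempt the optional element, and if it fails, continue from the original offset as though it were absent.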

Page 11: 1140 Modeling and parsing business data (IBM IMPACT 2014)

When should I use DFDL?

•  DFDL’s sweet spot is when you need to model and parse a text or binary data format and where either:
   – You have a specification of the data format ‘on the wire’
   – You have actual wire examples of the data format
•  DFDL is recommended to model:
   – Binary data from COBOL, C, PL/1, ASM programs
   – Text data with delimiters such as CSV
   – Text industry standards such as SWIFT, HL7, EDIFACT, X12, …
   – Binary industry standards such as ISO8583, TLog, ...
•  DFDL is not recommended to model:
   – XML
      •  Already have XML parsers and XML Schema / DTDs
   – JSON
      •  Already have JSON parsers, and JSON schema under design
   – GPB, HDF5, …
      •  With serialization formats like GPB, the wire format is never exposed to the consumer and access to the data is using APIs
•  DFDL expressions are not recommended for implementing complex validation rules.
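
Binary data from COBOL programs often contains packed decimal fields, one of the binary types DFDL 1.0 supports. As a rough illustration of what has to happen to such a field on the wire, the Python sketch below decodes one by hand. It assumes the common COMP-3 layout (two decimal digits per byte, with the final half-byte holding the sign: C, F, A or E for positive, D or B for negative). The function name `unpack_comp3` and the `scale` parameter are illustrative and not part of DFDL or any IBM API.

```python
from decimal import Decimal


def unpack_comp3(raw, scale=0):
    """Decode packed-decimal bytes into a Decimal.

    `scale` is the number of implied decimal places (assumed to come from
    the record layout, for example a COBOL PIC S9(3)V99 has scale 2).
    """
    if not raw:
        raise ValueError("empty packed-decimal field")
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)    # high half-byte
        nibbles.append(byte & 0x0F)  # low half-byte
    sign_nibble = nibbles.pop()      # the last half-byte carries the sign
    if any(n > 9 for n in nibbles):
        raise ValueError("invalid digit in packed-decimal field")
    if sign_nibble in (0xC, 0xF, 0xA, 0xE):
        sign = 1
    elif sign_nibble in (0xD, 0xB):
        sign = -1
    else:
        raise ValueError("invalid sign in packed-decimal field")
    digits = int("".join(str(n) for n in nibbles) or "0")
    # Shift the decimal point left by `scale` places.
    return Decimal(sign * digits).scaleb(-scale)


print(unpack_comp3(bytes([0x12, 0x34, 0x5C]), scale=2))  # 123.45
print(unpack_comp3(bytes([0x00, 0x12, 0x3D])))           # -123
```

In a DFDL model this logic is not written by hand; the element's representation properties describe the field and the processor does the decoding.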

Page 12: 1140 Modeling and parsing business data (IBM IMPACT 2014)

DFDL Adoption

•  IBM DFDL reusable component ships with:
   – WebSphere Message Broker v8.0
   – Integration Bus v9.0
   – Rational® Performance Test Server v8.0.1
   – Rational Test Virtualization Server v8.0.1
   – Rational Test Workbench v8.0.1
   – Rational Developer for System z v8.5
   – InfoSphere® Master Data Management v11
•  Further IBM products and appliances looking to adopt in 2014 and beyond
•  Open-source DFDL implementation in progress: ‘Daffodil’
   – Available as an alpha release (parser only)
   – More features added every release
•  DFDL web community on GitHub for collaborative authoring of DFDL schemas for commercial and scientific data formats

Page 13: 1140 Modeling and parsing business data (IBM IMPACT 2014)

DFDL Schemas Web Community

•  Free public repository for DFDL models
•  Hosted on the popular GitHub community website
•  Unlimited read-only access
•  Collaboration encouraged
•  Evolving content

Page 14: 1140 Modeling and parsing business data (IBM IMPACT 2014)

Getting Started with DFDL

Page 15: 1140 Modeling and parsing business data (IBM IMPACT 2014)

DFDL Videos on the Web

Page 16: 1140 Modeling and parsing business data (IBM IMPACT 2014)

Useful DFDL Links

•  OGF DFDL home page: http://www.ogf.org/dfdl/
•  DFDL 1.0 specification: http://www.ogf.org/documents/GFD.174.pdf
•  DFDL tutorials: http://redmine.ogf.org/dmsf/dfdl-wg?folder_id=5485
•  DFDL-WG Redmine project: http://redmine.ogf.org/projects/dfdl-wg
•  DFDL developerWorks: http://www.ibm.com/developerworks/library/se-dfdl/index.html
•  DFDL Wikipedia page: http://en.wikipedia.org/wiki/DFDL
•  DFDL Schemas on GitHub: https://github.com/DFDLSchemas
•  Daffodil open source project: https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL

Page 17: 1140 Modeling and parsing business data (IBM IMPACT 2014)

Agenda

•  Introduction
•  OGF DFDL – a standard for modeling text and binary data
•  The IBM DFDL Component
•  DFDL support in IBM Integration Bus
•  Questions

Page 18: 1140 Modeling and parsing business data (IBM IMPACT 2014)

IBM DFDL

•  Designed as an embeddable component
   – First shipped in 2011
•  DFDL processor
   – High performance parser and serializer
   – Java and C
   – Streaming, on-demand, speculative
   – Pre-compiles DFDL schema
   – Parser emits SAX-like events
•  Tooling for creating DFDL models
   – DFDL Schema editor Eclipse plugins
   – Wizards for CSV, COBOL & C
   – Debug model using real data from within tooling
•  IBM DFDL implements the majority of the OGF DFDL 1.0 specification
   – Some more advanced features of DFDL are not yet available
   – Will be added in future DFDL deliverables until 100% achieved

[Figure: the IBM DFDL Processor uses a DFDL schema to convert between the data stream and the infoset.]

Infoset:

<Document>
  <Element name="myNumbers">
    <Element name="myInt" …/>
    <Element name="myFloat" …/>
  </Element>
</Document>

Data stream:

intval=5;fltval=-7.1E8

DFDL schema:

<xs:schema …>
  <xs:annotation>
    <xs:appinfo …>
    </xs:appinfo>
  </xs:annotation>
  ...
</xs:schema>
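
The phrase "SAX-like events" means the parser reports elements to the caller as it encounters them rather than building the whole infoset first. The Python generator below sketches that style for the example data. The event names and tuple shapes are invented for this illustration and are not the IBM DFDL API; the element list is a hand-written stand-in for a schema.

```python
def parse_events(data, separator, elements, root="myNumbers"):
    """Yield SAX-like events while scanning `data` from left to right.

    `elements` is a list of (name, initiator, converter) tuples, a
    hypothetical stand-in for what a DFDL schema would describe.
    """
    yield ("start_document",)
    yield ("start_element", root)
    pos = 0
    for name, initiator, convert in elements:
        if not data.startswith(initiator, pos):
            raise ValueError(f"missing initiator {initiator!r} at offset {pos}")
        pos += len(initiator)
        end = data.find(separator, pos)
        if end == -1:
            end = len(data)  # the last field runs to the end of the data
        yield ("start_element", name)
        yield ("element_value", convert(data[pos:end]))
        yield ("end_element", name)
        pos = end + len(separator)
    yield ("end_element", root)
    yield ("end_document",)


ELEMENTS = [("myInt", "intval=", int), ("myFloat", "fltval=", float)]
for event in parse_events("intval=5;fltval=-7.1E8", ";", ELEMENTS):
    print(event)
```

Because the generator only advances when the caller asks for the next event, a consumer that needs just the first element can stop early, which is one way to picture the "streaming, on-demand" behaviour listed above.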

Page 19: 1140 Modeling and parsing business data (IBM IMPACT 2014)

What’s New in IBM DFDL

•  Latest release of IBM DFDL is v1.1.
•  Since IBM DFDL v1.0 several spec features have been added:
   – Extracting data using a prefixed length
   – Extracting data using a regular expression
   – Extracting binary data with delimiters
   – User-defined variables
   – Default values when serializing
   – More XPath functions in expressions
•  Since IBM DFDL v1.0 several tooling features have been added:
   – Keyboard shortcuts in DFDL editor
   – MBCS enablement in DFDL debugger
   – Copy/paste in DFDL editor
   – Generation of sample values
•  Continual increase in performance
•  High on the list:
   – Unordered sequences
   – Direct dispatch for choices
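
Two of the extraction methods named above, a prefixed length and a regular expression, are easy to picture with a small Python sketch. The details here are assumptions chosen for the example: a 2-byte big-endian length prefix, and helper names (`read_prefixed`, `read_pattern`) that are not taken from DFDL or IBM DFDL.

```python
import re
import struct


def read_prefixed(buffer, offset):
    """Read a field whose length is given by a 2-byte big-endian prefix.

    Returns (field_bytes, new_offset). The prefix size and byte order are
    assumptions for this sketch; a real format would specify them.
    """
    (length,) = struct.unpack_from(">H", buffer, offset)
    start = offset + 2
    end = start + length
    if end > len(buffer):
        raise ValueError("length prefix runs past the end of the data")
    return buffer[start:end], end


def read_pattern(text, offset, pattern):
    """Read a field whose extent is whatever `pattern` matches at `offset`."""
    match = re.compile(pattern).match(text, offset)
    if match is None:
        raise ValueError(f"pattern {pattern!r} did not match at offset {offset}")
    return match.group(0), match.end()


data = b"\x00\x05HELLO\x00\x03IBM"
first, pos = read_prefixed(data, 0)
second, pos = read_prefixed(data, pos)
print(first, second, pos)  # b'HELLO' b'IBM' 12

print(read_pattern("AB123rest", 0, r"[A-Z]{2}\d+"))  # ('AB123', 5)
```

In both cases the field's extent is discovered from the data itself rather than from a fixed length or a delimiter, which is what distinguishes these methods from the delimited example earlier in the deck.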

Page 20: 1140 Modeling and parsing business data (IBM IMPACT 2014)

Agenda

•  Introduction
•  OGF DFDL – a standard for modeling text and binary data
•  The IBM DFDL Component
•  DFDL support in IBM Integration Bus
•  Questions

Page 21: 1140 Modeling and parsing business data (IBM IMPACT 2014)

IBM DFDL in WMB and IIB

•  DFDL domain and parser
   – Available in usual way: input nodes, ESQL, Java, …
   – More capable and higher performing than MRM CWF/TDS
•  DFDL models
   – DFDL schema files reside in libraries, not in message sets
•  DFDL wizards and editor for creating DFDL models
•  DFDL model debugger
   – Debug parsing & writing of data within toolkit
   – No message flow or runtime deploy necessary!
•  DFDL schema deployed in BAR file
   – No dictionary file
•  No automatic migration from MRM

Page 22: 1140 Modeling and parsing business data (IBM IMPACT 2014)

New Message Model Wizard

[Screenshot with callouts:]
•  New launcher for creating Message Models
•  Selecting one of these creates a DFDL schema

Page 23: 1140 Modeling and parsing business data (IBM IMPACT 2014)

Using the DFDL domain and parser

[Screenshot with callouts:]
•  DFDL domain
•  Specify message name
•  On Demand or Complete parsing
•  Optional validation

Page 24: 1140 Modeling and parsing business data (IBM IMPACT 2014)

DFDL message tree

( ['MQROOT' : 0xd6d218]
  (0x01000000:Name):Properties = ( ['MQPROPERTYPARSER' : 0x141d34e8]
    (0x03000000:NameValue):MessageSet = '' (CHARACTER)
    (0x03000000:NameValue):MessageType = '{}:Company' (CHARACTER)
    (0x03000000:NameValue):MessageFormat = '' (CHARACTER)
    (0x03000000:NameValue):Encoding = 273 (INTEGER)
    (0x03000000:NameValue):CodedCharSetId = 850 (INTEGER)
    ....
  )
  (0x01000000:Name):DFDL = ( ['dfdl' : 0xd812c8]
    (0x01000000:Name):Company = (
      (0x03000000:NameValue):CompanyName = 'IBM' (CHARACTER)
      (0x01000000:Name):Employee = (
        (0x03000000:NameValue):EmpNo = 12345 (INTEGER)
        (0x03000000:NameValue):Dept = 21 (INTEGER)
        (0x03000000:NameValue):EmpName = 'Steve Hanson' (CHARACTER)
        ...
      )
    )
  )
)

[Callouts on the trace:]
•  DFDL message name
•  Message name in tree (like XMLNSC)
•  DFDL domain
•  Compact ‘Name/Value’ syntax elements
•  Data types from DFDL schema

Page 25: 1140 Modeling and parsing business data (IBM IMPACT 2014)

Using the IBM DFDL classes

•  The IBM DFDL Java classes may be used to create stand-alone Java applications for parsing/serializing text & binary data formats
•  IIB license permits IBM DFDL classes to be used by Java applications:
   – On a computer where IIB is installed
   – On a remote computer where IIB is not installed
•  When used remotely, charged as if IIB was installed (via ITLM)
•  IBM DFDL for Java is fully supported when used in this manner
•  Javadoc for APIs and a sample program are provided

[Figure: a stand-alone Java application uses IBM DFDL for Java to convert between the data stream and the infoset.]

Infoset:

<Document>
  <Element name="myNumbers">
    <Element name="myInt" dataType="xs:int" dataValue="5"/>
    <Element name="myFloat" dataType="xs:float" dataValue="-7.1E08"/>
  </Element>
</Document>

Data stream:

intval=5;fltval=-7.1E8

Page 26: 1140 Modeling and parsing business data (IBM IMPACT 2014)

DFDL Schemas for Industry Formats

•  Fully supported, full function DFDL schemas will appear as part of IBM products such as the industry connectivity packs for IIB
•  Unsupported, part function DFDL schemas will appear on DFDLSchemas GitHub site as and when available
   – EDIFACT
   – HL7 v2.5.1, v2.6 and v2.7
      •  Full version included with IIB Connectivity Pack for Healthcare
   – IBM/Toshiba 4690 SurePos ACE v7r3 TLog
      •  Full version included with IIB Retail Pack
   – ISO 8583 1987
      •  Full version included with WMB v8 and IIB v9 sample & on GitHub
   – NACHA 2013, PCAP
•  Please make IBM aware of formats that you are interested in.

Page 27: 1140 Modeling and parsing business data (IBM IMPACT 2014)

Agenda

•  Introduction
•  OGF DFDL – a standard for modeling text and binary data
•  The IBM DFDL Component
•  DFDL support in IBM Integration Bus
•  Questions

Page 28: 1140 Modeling and parsing business data (IBM IMPACT 2014)

We Value Your Feedback

•  Don’t forget to submit your Impact session and speaker feedback! Your feedback is very important to us – we use it to continually improve the conference.
•  Use the Conference Mobile App or the online Agenda Builder to quickly submit your survey
   – Navigate to “Surveys” to see a view of surveys for sessions you’ve attended


Page 29: 1140 Modeling and parsing business data (IBM IMPACT 2014)

Thank You

Page 30: 1140 Modeling and parsing business data (IBM IMPACT 2014)

Legal Disclaimer

•  © IBM Corporation 2014. All Rights Reserved.

•  The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

•  References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.

•  If the text contains performance statistics or references to benchmarks, insert the following language; otherwise delete: Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

•  If the text includes any customer examples, please confirm we have prior written approval from such customer and insert the following language; otherwise delete: All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.

•  Please review text for proper trademark attribution of IBM products. At first use, each product name must be the full name and include appropriate trademark symbols (e.g., IBM Lotus® Sametime® Unyte™). Subsequent references can drop “IBM” but should include the proper branding (e.g., Lotus Sametime Gateway, or WebSphere Application Server). Please refer to http://www.ibm.com/legal/copytrade.shtml for guidance on which trademarks require the ® or ™ symbol. Do not use abbreviations for IBM product names in your presentation. All product names must be used as adjectives rather than nouns. Please list all of the trademarks that you use in your presentation as follows; delete any not included in your presentation. IBM, the IBM logo, Lotus, Lotus Notes, Notes, Domino, Quickr, Sametime, WebSphere, UC2, PartnerWorld and Lotusphere are trademarks of International Business Machines Corporation in the United States, other countries, or both. Unyte is a trademark of WebDialogs, Inc., in the United States, other countries, or both.

•  If you reference Adobe® in the text, please mark the first use and include the following; otherwise delete: Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.

•  If you reference Java™ in the text, please mark the first use and include the following; otherwise delete: Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

•  If you reference Microsoft® and/or Windows® in the text, please mark the first use and include the following, as applicable; otherwise delete: Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both.

•  If you reference Intel® and/or any of the following Intel products in the text, please mark the first use and include those that you use as follows; otherwise delete: Intel, Intel Centrino, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

•  If you reference UNIX® in the text, please mark the first use and include the following; otherwise delete: UNIX is a registered trademark of The Open Group in the United States and other countries.

•  If you reference Linux® in your presentation, please mark the first use and include the following; otherwise delete: Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.

•  If the text/graphics include screenshots, no actual IBM employee names may be used (even your own), if your screenshots include fictitious company names (e.g., Renovations, Zeta Bank, Acme) please update and insert the following; otherwise delete: All references to [insert fictitious company name] refer to a fictitious company and are used for illustration purposes only.