VTL (Validation and Transformation Language) A new standard for data validation and processing Marco Pellegrino Eurostat Acknowledgements: Bank of Italy, SDMX Technical Working Group, DDI Alliance, Bryan Fitzpatrick, Arofan Gregory, and others… Eurostat

Dec 24, 2015

Page 1

VTL (Validation and Transformation Language)

A new standard for data validation and processing

Marco Pellegrino, Eurostat

Acknowledgements: Bank of Italy, SDMX Technical Working Group, DDI Alliance, Bryan Fitzpatrick, Arofan Gregory, and others…

Eurostat

Page 2

Background

Data validation, a critical issue for the ESS

Eurostat and Member States: double work or "no work"?

Inefficiencies:
• Lack of coordination
• Lack of documentation
• Lack of formalisation of validation procedures and rules
• Low harmonisation of software solutions

Need for a comprehensive solution: portfolio of actions in the framework of the ESS Vision 2020

Page 3

SDMX originally focused on data collection and dissemination

Current trend: support more stages of the statistical production process

Approach

GSBPM (Generic Statistical Business Process Model)

Page 4

Data Validation Process

Before/During Transmission (“First Level”) - Covered by SDMX today
- Format Check (SDMX-ML)
- Code Check (SDMX DSD)

After Transmission (“Second Level”) - Not yet covered by SDMX
SDMX-VTL
- Detailed value check
- Mirror check
- …
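To make the two levels concrete, here is a minimal Python sketch (not VTL, and not taken from the slides). It assumes a first-level code check means "every value belongs to a code list" and a second-level mirror check means "what one party reports as exports should roughly match what its partner reports as imports". The field names `reporter`, `partner`, `value`, the tolerance and all figures are invented for illustration.

```python
def code_check(points, component, code_list):
    """First level: return the data points whose code is not in the code list."""
    return [p for p in points if p[component] not in code_list]


def mirror_check(exports, imports, tolerance):
    """Second level: compare each export flow with the partner's mirror import flow."""
    # Index imports by (origin, destination) so they line up with export flows.
    reported = {(p["partner"], p["reporter"]): p["value"] for p in imports}
    issues = []
    for p in exports:
        mirror = reported.get((p["reporter"], p["partner"]))
        if mirror is None or abs(p["value"] - mirror) > tolerance:
            issues.append((p["reporter"], p["partner"], p["value"], mirror))
    return issues


# Invented figures, for illustration only.
exports = [
    {"reporter": "IT", "partner": "FR", "value": 100.0},
    {"reporter": "IT", "partner": "DE", "value": 50.0},
]
imports = [
    {"reporter": "FR", "partner": "IT", "value": 101.0},
    {"reporter": "DE", "partner": "IT", "value": 80.0},
]

bad_codes = code_check(
    exports + [{"reporter": "XX", "partner": "FR", "value": 1.0}],
    "reporter",
    {"IT", "FR", "DE"},
)
mirror_issues = mirror_check(exports, imports, tolerance=5.0)
```

The code check needs only the structure definition, which is why it can run before or during transmission; the mirror check needs data from two reporters, so it can only run afterwards.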

Page 5

Main goals:

Define and preserve validation rules (document and preserve the validation know-how)

Exchange and share validation rules (with reporting institutions & other correspondents)

Apply validation rules in the collection and production processes (aiming at an industrialized processing of statistical data)

At a later stage:

Improve the VTL to support more complex algorithms for data compilation and estimation

The VTL initiative

Page 6

What is VTL 1.0?

• A reference framework for the creation of rules for data validation and transformation

• It maps to a clear and generic information model

• It aligns with relevant statistical information standards such as SDMX and GSIM

SDMX

VTL: part 1 - part 2

EBNF (Extended Backus-Naur Form): technical notation

Page 7

Page 8

Main VTL features

• User orientation

• Integrated approach

• IT implementation independence

• Active role for processing

• Extensibility and customizability

• Language effectiveness

Proper governance is needed

Page 9

The VTL Information Model

• VTL is a “stand-alone” specification
– It can be used with SDMX, DDI, or potentially anything else
– It can be used on its own

• Because different standards have different information models, VTL must establish its own information model
– Other information models can be mapped against it
– VTL uses GSIM as a basis

Page 10

VTL Data Model

• Organizes Data Points into Data Sets

• Describes Data Structures using Structure Components
– Measures
– Attributes
– Identifiers

• Very similar to GSIM
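The data model above can be sketched in a few lines of Python. This is an illustrative reading, not part of the VTL specification: the class names `Role`, `Component` and `DataSet`, the component names and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    IDENTIFIER = "identifier"
    MEASURE = "measure"
    ATTRIBUTE = "attribute"


@dataclass(frozen=True)
class Component:
    """One Structure Component: a name plus the role it plays."""
    name: str
    role: Role


@dataclass
class DataSet:
    """A Data Structure together with the Data Points that follow it."""
    structure: list[Component]
    data_points: list[dict]

    def names(self, role: Role) -> list[str]:
        return [c.name for c in self.structure if c.role is role]

    def key(self, point: dict) -> tuple:
        # The identifier values together single out one data point.
        return tuple(point[n] for n in self.names(Role.IDENTIFIER))


# Invented structure and values, for illustration only.
ds = DataSet(
    structure=[
        Component("Country", Role.IDENTIFIER),
        Component("Year", Role.IDENTIFIER),
        Component("Population", Role.MEASURE),
        Component("Status", Role.ATTRIBUTE),
    ],
    data_points=[
        {"Country": "IT", "Year": 2014, "Population": 100.0, "Status": "final"},
        {"Country": "FR", "Year": 2014, "Population": 110.0, "Status": "provisional"},
    ],
)
```

Here `ds.key(ds.data_points[0])` returns the tuple of identifier values for the first data point, which is what operators would use to line up data points across data sets.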

Page 11

Page 12

[Figure: a Logical Data Set, showing Data Points with two Identifier Components and one Measure Component]

Page 13

Transformation Model

• Takes a set of Transformation Expressions and organizes them into a Transformation Scheme

• Each Expression has an Operand, an Operator, and a Result
– Operands can have Parameters
– Operators and Results are identified by the Expression when it is executed
– VTL specifies the Operators and the types of Parameters

• VTL uses the SDMX Transformation model
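As a rough illustration of the idea, the following Python sketch treats a Transformation Scheme as an ordered list of expressions, each naming its operands, its operator and its result. The `Transformation` class and `run_scheme` function are hypothetical names, parameters are left out, and plain numbers stand in for Data Sets to keep the example short.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass
class Transformation:
    result: str                      # name the result is bound to
    operator: Callable[..., float]   # the operator to apply
    operands: tuple[str, ...]        # names of the inputs


def run_scheme(scheme, inputs):
    """Evaluate the expressions in order; later ones may reuse earlier results."""
    env = dict(inputs)
    for t in scheme:
        env[t.result] = t.operator(*(env[name] for name in t.operands))
    return env


# Invented scheme and inputs, for illustration only.
scheme = [
    Transformation("total", operator.add, ("a", "b")),
    Transformation("share", operator.truediv, ("a", "total")),
]
env = run_scheme(scheme, {"a": 30.0, "b": 70.0})
```

The second expression uses the result of the first as an operand, which is what ties individual expressions together into a scheme.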

Page 14

Transformations and Process models

Transformation model
It exists in SDMX, but not in GSIM and DDI

It allows defining calculations through mathematical expressions

It does not allow cycles (same structure as a spreadsheet)

Process model
It exists in SDMX, GSIM, DDI and other standards (e.g. BPM)

It allows defining calculations through a process

It allows cycles (like a procedural programming language)
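The "no cycles" property of the transformation model can be illustrated with a small Python sketch. It assumes each result simply lists the names it depends on, as cells do in a spreadsheet; the dependency dictionaries and the `evaluation_order` helper are invented for the example. With no cycles, a valid evaluation order always exists; with a cycle, none does.

```python
from graphlib import CycleError, TopologicalSorter


def evaluation_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which every name comes after the names it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


# Acyclic: "share" depends on "total", which depends on the inputs "a" and "b".
acyclic = {"total": {"a", "b"}, "share": {"a", "total"}}
order = evaluation_order(acyclic)

# Cyclic: "x" and "y" depend on each other, so no evaluation order exists.
cyclic = {"x": {"y"}, "y": {"x"}}
try:
    evaluation_order(cyclic)
    has_cycle = False
except CycleError:
    has_cycle = True
```

A process model, by contrast, can loop back to an earlier step, so it cannot be reduced to a single dependency ordering of this kind.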

Page 15

Page 16

GSIM Process Model

Page 17

Process Method and Rules

Page 18

Governance and Standards Alignment

• VTL will be maintained by the SDMX TWG
• Extensions will be considered for inclusion in future versions

• VTL has already produced some feedback to GSIM for its next version
• VTL can be mapped against SDMX
• VTL can be directly utilized by DDI in those places where computations are included
• VTL could be used in CSPA services where processing is performed
– As GSIM processing Rules

Page 19

What's next?

• More operators and features + bug-fixing + fine-tuning = VTL 1.1

• Reuse of rules, structural validation?

• SDMX specifications (e.g. for exchanging VTL rules in SDMX messages, for storing rules and for requesting validation rules from web services) in progress

• Implementation tests with some pilot domains

• Integration within the ESS Validation Architecture (Validation project with national statistical institutes).

Page 20

Conclusions

• A formal, unambiguous and standard language was needed for encoding validation rules so that these can be translated into specific data editing systems

• Use of generic software services provided within the ESS community is foreseen

• Great achievement, led by a task-force with experts from statistical institutes, central banks, international organisations and (a few) private experts

Thanks for your attention! [email protected]

Page 21

Examples

Page 22

Is the total = 100?

check (ds1[keep (Country,Year, Percentage)][aggregate sum(Percentage)]=100, imbalance(Percentage), all)

VTL Grammar: A Simple Example

Page 23

ds1[keep (Country,Year, Percentage)][aggregate sum(Percentage)]

check (ds1[keep (Country,Year, Percentage)][aggregate sum(Percentage)]=100, imbalance(Percentage), all)

Steps
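The two steps above can be mirrored in Python (not VTL) under one reading of the expression, which is an assumption rather than a statement of VTL semantics: keeping Country, Year and Percentage and then aggregating sums Percentage per Country and Year; `imbalance(Percentage)` reports the difference between that sum and 100; and `all` returns every group, not only the failing ones. The `Sector` component, the `check_total` helper and all values are invented for illustration.

```python
from collections import defaultdict


def check_total(data_points, expected=100.0, tolerance=1e-9):
    # Step 1: keep Country, Year and Percentage, then sum Percentage per group.
    totals = defaultdict(float)
    for p in data_points:
        totals[(p["Country"], p["Year"])] += p["Percentage"]
    # Step 2: compare each total with the expected value and report the imbalance.
    return [
        {
            "Country": country,
            "Year": year,
            "ok": abs(total - expected) <= tolerance,
            "imbalance": total - expected,
        }
        for (country, year), total in sorted(totals.items())
    ]


# Invented data, for illustration only.
ds1 = [
    {"Country": "IT", "Year": 2014, "Sector": "A", "Percentage": 60.0},
    {"Country": "IT", "Year": 2014, "Sector": "B", "Percentage": 40.0},
    {"Country": "FR", "Year": 2014, "Sector": "A", "Percentage": 55.0},
    {"Country": "FR", "Year": 2014, "Sector": "B", "Percentage": 40.0},
]
results = check_total(ds1)
```

In this invented data the IT percentages add up to 100 and pass, while the FR percentages add up to 95 and are reported with an imbalance of -5.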

Page 24

VTL Grammar: Another Example

• We have two Data Sets (D1 and D2) with the same structure:

Page 25

VTL Grammar: Another Example (cont.)

• We want to create a table (Dresult) which provides totals, combining the values for the US and the European Union:

Dresult := D1 + D2
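A hedged Python sketch of what such an addition amounts to, assuming data points are matched on their identifier values and their measures are added, with unmatched points dropped. The original tables are not reproduced in this transcript, so the `Year` identifier, the `Value` measure, the `add_data_sets` helper and all figures are invented for illustration.

```python
def add_data_sets(d1, d2, identifiers, measure):
    """Match data points on their identifiers and add the measure values."""
    right = {tuple(p[i] for i in identifiers): p[measure] for p in d2}
    result = []
    for p in d1:
        key = tuple(p[i] for i in identifiers)
        if key in right:  # only points present in both data sets are kept
            row = dict(zip(identifiers, key))
            row[measure] = p[measure] + right[key]
            result.append(row)
    return result


# Invented figures: D1 stands for the United States, D2 for the European Union.
D1 = [{"Year": 2013, "Value": 10.0}, {"Year": 2014, "Value": 12.0}]
D2 = [{"Year": 2013, "Value": 20.0}, {"Year": 2014, "Value": 23.0}]

Dresult = add_data_sets(D1, D2, identifiers=("Year",), measure="Value")
```

The point of the VTL notation is that the whole matching loop collapses into a single `+` between two Data Sets with the same structure.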

Page 26

Results

Dresult is a Data Set containing the United States plus the European Union: