Metadata driven application for data processing – from local toward global solution Rudi Seljak Statistical Office of the Republic of Slovenia

Jan 03, 2016


Godfrey Bell
Page 1

Metadata driven application for data processing – from local toward global solution

Rudi Seljak
Statistical Office of the Republic of Slovenia

Page 2

Summary of presentation

• Introduction
• Current generic application – main characteristics
• Development of global solution
• Changes in the statistical process
• Conclusions

Page 3

Introduction

• Statistical data processing:
  – Demanding, time consuming and very expensive task
  – Constant pressure for budget cuts
• Rationalisation of the statistical process:
  – Take advantage of the rapid IT development
  – Movement from domain oriented to process oriented production
  – Stove-pipe IT solutions replaced by general applications
• Statistical Office of the Republic of Slovenia (SURS):
  – SURS began systematic development of generic solutions 6 years ago
  – Prototype solutions for several parts of the process were developed
  – These solutions were already used for several large surveys (e.g. the 2010 Agriculture Census and the 2011 Population Census)
  – The prototype generic solutions are now being upgraded to more global solutions

Page 4

Generalised solutions – main characteristics

• Small, generic solutions for small parts of the statistical process, called building blocks:
  – Enable easy and flexible linking of the inputs and outputs of the individual components into the whole statistical process
  – Can be plugged into different databases in different environments (e.g. ORACLE, SAS), provided the input database meets a few basic conditions
  – Designed as fully metadata driven (MDD) systems: one program code → the parameters for executing the processing for a concrete survey are provided through special metadata tables
  – The process metadata can be provided in different environments (SAS, MS Access, ORACLE) → the metadata organisation must follow strict rules for its structure (tables and variables)
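The requirement that process metadata follow a strict structure can be illustrated with a small structure check. The sketch below is in Python rather than SAS and is only an assumption about what such a check might look like: the column names are taken from the example rule table on the Process metadata slide, while the function name `check_metadata` and the individual checks are hypothetical.

```python
# Hypothetical structure check for a process metadata table.
# Column names follow the example rule table in this presentation;
# the function name and the checks themselves are assumptions.
REQUIRED_COLUMNS = ("Table", "Variable", "Condition", "Corr_rule", "Step")

def check_metadata(rows):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for i, row in enumerate(rows, start=1):
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
            continue
        if not isinstance(row["Step"], int) or row["Step"] < 1:
            problems.append(f"row {i}: Step must be a positive integer")
        for col in ("Table", "Variable", "Condition", "Corr_rule"):
            if not str(row[col]).strip():
                problems.append(f"row {i}: {col} is empty")
    return problems

good = {"Table": "TABLE1", "Variable": "X", "Condition": "X/Y > 1000",
        "Corr_rule": "Round(X/100)", "Step": 1}
bad = {"Table": "TABLE1", "Variable": "Z", "Condition": "", "Corr_rule": "X"}

print(check_metadata([good]))
print(check_metadata([good, bad]))
```

Running such a check before the general program starts would let structural mistakes surface as readable messages instead of as failures in the middle of processing.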

Page 5

Building blocks - functioning

[Diagram. Elements: Different microdata databases; Ad-hoc program (twice); Building block (General SAS program); Different databases of process metadata]

Page 6

Linking building blocks into the process

[Diagram. Elements: Microdata; Building block 1, Building block 2, ..., Building block n; Ad-hoc programs; Transformed data (after each step)]
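The idea of linking building blocks and ad-hoc programs into one process can be sketched as a chain of functions, each taking the transformed data of the previous step as its input. This is a Python illustration under stated assumptions, not the SAS implementation: the three step functions, the record layout and `run_process` are all hypothetical.

```python
from functools import reduce

def validate(records):
    # Building block 1 (hypothetical): flag records with a negative value.
    return [dict(r, valid=r["value"] >= 0) for r in records]

def rescale(records):
    # Ad-hoc program (hypothetical): survey-specific unit conversion.
    return [dict(r, value=r["value"] * 1000) for r in records]

def aggregate(records):
    # Building block n (hypothetical): total over the valid records.
    return [{"total": sum(r["value"] for r in records if r["valid"])}]

def run_process(microdata, steps):
    """Feed the output of each step into the next one."""
    return reduce(lambda data, step: step(data), steps, microdata)

microdata = [{"id": 1, "value": 2},
             {"id": 2, "value": -5},
             {"id": 3, "value": 3}]
result = run_process(microdata, [validate, rescale, aggregate])
print(result)
```

Because every step has the same shape (data in, transformed data out), blocks and ad-hoc programs can be reordered or swapped without changing the code that runs the process.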

Page 7

Process metadata

• The system is to a very large extent based on the process metadata:
  – Processing rules which enable adjustment of the general program to different surveys
• At the moment the process metadata are entered directly into an MS Access database:
  – High probability of syntax errors
  – Users must be thoroughly instructed in order to fill in the metadata correctly

Table    Variable   Condition     Corr_rule      Step
TABLE1   X          X/Y > 1000    Round(X/100)   1
TABLE1   Z          Z NE X        X              2
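To show how a rule table of this kind can drive a deterministic correction, the following Python sketch applies the two example rules in Step order. It is an illustration only: in the presented system the rules are stored as expressions read by a SAS program, whereas here each Condition and Corr_rule is hand-translated into a Python callable to avoid writing an expression parser. The names `RULES` and `apply_corrections`, the sample records and the guard against division by zero are assumptions.

```python
# Each rule mirrors one row of the example table; the lambdas are a
# hand-made translation of the Condition and Corr_rule expressions.
RULES = [
    {"table": "TABLE1", "variable": "Z", "step": 2,
     "condition": lambda r: r["Z"] != r["X"],            # Z NE X
     "corr_rule": lambda r: r["X"]},                     # X
    {"table": "TABLE1", "variable": "X", "step": 1,
     # X/Y > 1000 (the Y != 0 guard is an added assumption)
     "condition": lambda r: r["Y"] != 0 and r["X"] / r["Y"] > 1000,
     "corr_rule": lambda r: round(r["X"] / 100)},        # Round(X/100)
]

def apply_corrections(table_name, records, rules):
    """Apply the rules for one table to every record, in Step order."""
    ordered = sorted((r for r in rules if r["table"] == table_name),
                     key=lambda r: r["step"])
    corrected = []
    for record in records:
        record = dict(record)  # do not modify the caller's data
        for rule in ordered:
            if rule["condition"](record):
                record[rule["variable"]] = rule["corr_rule"](record)
        corrected.append(record)
    return corrected

data = [{"X": 250000, "Y": 2, "Z": 250000},
        {"X": 40, "Y": 2, "Z": 40}]
out = apply_corrections("TABLE1", data, RULES)
print(out)
```

In the first record X is rescaled in step 1, so step 2 then sets Z to the corrected X; the second record satisfies neither condition and is left unchanged. This is why the Step column matters: the order of the rules changes the result.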

Page 8

Building blocks

• The basic tools of the whole system are the building blocks, each of which covers a particular processing phase.
• They are SAS macros that operate on the basis of the process metadata.
• So far, building blocks have been created for the following phases:
  – Data validation (logical controls)
  – Deterministic corrections
  – Data imputations
  – Standard error estimation
  – Aggregation
  – Tabulation
  – Calculation of quality indicators
  – Disclosure control (testing phase)
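As an illustration of how one such block can serve different surveys with the same code, here is a minimal Python sketch of an aggregation block whose behaviour is set entirely by a small specification. The specification keys (`group_by`, `variable`), the function name and the sample data are hypothetical and are not taken from the SURS macros.

```python
from collections import defaultdict

def aggregate(records, spec):
    """Sum spec['variable'] within the groups defined by spec['group_by']."""
    totals = defaultdict(float)
    for r in records:
        totals[r[spec["group_by"]]] += r[spec["variable"]]
    return dict(totals)

# Two hypothetical surveys handled by the same code, different metadata.
farm_spec = {"group_by": "region", "variable": "area_ha"}
farms = [{"region": "East", "area_ha": 10.0},
         {"region": "West", "area_ha": 4.0},
         {"region": "East", "area_ha": 5.5}]

shop_spec = {"group_by": "activity", "variable": "turnover"}
shops = [{"activity": "retail", "turnover": 120.0},
         {"activity": "retail", "turnover": 80.0}]

farm_totals = aggregate(farms, farm_spec)
shop_totals = aggregate(shops, shop_spec)
print(farm_totals)
print(shop_totals)
```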

Page 9

Building a global solution

• The developed system is a very open and flexible tool.
• However, certain re-integration would be needed to increase its functionality:
  – To move the process metadata into the ORACLE environment
  – To create a single, unique database of process metadata where the process metadata for all surveys are stored and maintained
  – To develop graphical interfaces for user friendly management of the process metadata
  – To link the system with the metadata repository
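A single database holding the process metadata of all surveys could look roughly like the sketch below. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for the ORACLE environment; the table name `process_rules`, its columns, the survey identifiers and the `SURVEY_B` rule are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the central database
conn.execute("""
    CREATE TABLE process_rules (
        survey     TEXT NOT NULL,
        table_name TEXT NOT NULL,
        variable   TEXT NOT NULL,
        condition  TEXT NOT NULL,
        corr_rule  TEXT NOT NULL,
        step       INTEGER NOT NULL,
        PRIMARY KEY (survey, table_name, step)
    )
""")
conn.executemany(
    "INSERT INTO process_rules VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("SURVEY_A", "TABLE1", "Z", "Z NE X", "X", 2),
        ("SURVEY_A", "TABLE1", "X", "X/Y > 1000", "Round(X/100)", 1),
        ("SURVEY_B", "TABLE9", "W", "W < 0", "0", 1),  # hypothetical rule
    ],
)

def rules_for(survey):
    """Return (variable, condition, corr_rule) for one survey, in step order."""
    cur = conn.execute(
        "SELECT variable, condition, corr_rule FROM process_rules "
        "WHERE survey = ? ORDER BY table_name, step",
        (survey,),
    )
    return cur.fetchall()

print(rules_for("SURVEY_A"))
```

Keeping the survey identifier as part of the key means every survey's rules live in one table, and the general program only needs the survey name to pick up its own parameters.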

Page 10

The new system

[Diagram. Elements: Different microdata databases; Ad-hoc program (twice); General SAS program; Database of processing metadata; Application for metadata management; Metadata repository; Data on tables and variables]

Page 11

Application for metadata management
Deterministic corrections

Page 12

Application for metadata management
Execution of the particular process step

Page 13

New application and statistical process

• The generic MDD application introduces changes in the implementation of data processing at a general level:
  – Essentially different distribution of work between subject-matter statisticians, general methodologists and IT experts
  – Change in the role of subject-matter statisticians → changed expectations of their skills and capabilities
  – The work organisation of the IT Department and the General Methodology Department will have to change from domain oriented to process oriented
  – A different approach from IT and methodology experts will be needed:
    • Experts capable of thinking and operating at a much more general level
    • A survey is just one of the realisations of the general statistical process

Page 14

Conclusions

• SURS developments in recent years: flexible, metadata driven generic solutions for different phases of data processing.
• The very open system will be replaced with a more integrated and centralised system.
• Main goal: transition from stove-pipe oriented production to more integrated processing systems.
• Two main challenges:
  – To build generic IT solutions which would "cover" the wide diversity of statistical surveys
  – To change the very "domain oriented state of mind" among the employees

Page 15

Thank you for your attention