
Alessandra Capobianchi and Luisa Franconi
Istat - Division for Information Technology and Methodology - Italy

NTTS 2009, Brussels, 18-20 February 2009

Cell suppression in linked tables from structural business statistics using Tau-Argus 3.3.0: a conceptual framework

What are linked tables?

Tables that present data on the same response variable and share some categories of at least one explanatory variable are called “linked tables”.

Such an explanatory variable is called the “linked variable”.
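
The definition above can be sketched in a few lines of Python. This is only an illustration: the `Table` class, its field names and the toy category codes are assumptions and are not part of Tau-Argus or of the SBS regulations.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """A table described by its response variable and the categories of each explanatory variable."""
    name: str
    response: str
    explanatory: dict  # variable name -> set of category codes


def linked_variables(a: Table, b: Table) -> set:
    """Return the explanatory variables on which tables a and b are linked."""
    if a.response != b.response:
        return set()  # different response variable: not linked
    shared = a.explanatory.keys() & b.explanatory.keys()
    # A shared variable links the tables only if some categories coincide.
    return {v for v in shared if a.explanatory[v] & b.explanatory[v]}


t1 = Table("by_activity", "turnover", {"nace": {"15", "151", "1511"}})
t2 = Table("by_activity_and_size", "turnover",
           {"nace": {"15", "151"}, "size": {"small", "large"}})
t3 = Table("by_region", "employment", {"nace": {"15"}})

print(linked_variables(t1, t2))  # {'nace'}
print(linked_variables(t1, t3))  # set()
```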

Motivation

- EUROSTAT: Until now Eurostat has been in charge of protecting the tables required by the SBS Regulations and has performed a global confidentiality treatment. From 2008 Eurostat will no longer treat the confidentiality of the SBS tables, and each national statistical institute (NIS) has to take care of the protection process on its own.

- SBS tables: Community Structural Business Statistics (SBS) are a set of hierarchical linked tables whose spanning variables present different levels of the hierarchy in different tables.

- Software: Tau-Argus version 3.3.0 is available at the website of the ESSnet project, http://neon.vb.cbs.nl/casc/..%5Ccasc%5Ctau.htm . Currently it does not deal with hierarchical linked tables that present different levels of the hierarchy.

- NEED: To develop a scheme to cope with this problem.

Community Structural Business Statistics

• Community Structural Business Statistics (SBS) are collected within the framework of Council Regulation (EC, EURATOM) No. 58/97 of December 1996.

• Definitions and table breakdowns are specified in a series of Commission and Council Regulations.

• We focus our attention on the first four annexes, covering the 'business economy' (Annex 1), industry (Annex 2), distributive trades (Annex 3) and construction (Annex 4).

Aim: Achieve the protection of the Structural Business Statistics deriving from these annexes.

Conceptual scheme

The protection process of the SBS linked tables can be divided into three steps:

1. Translate the legal framework into a set of tables: create a set of tables for Argus.

2. Analyse the links between tables: establish an order in the protection of the tables.

3. Apply Tau-Argus to each table according to the order previously established: maintain coherence in the suppression pattern.

1. Translation

The tables we focus on are those related to:

- annual enterprise statistics (at 4-digit NACE code)

- annual enterprise statistics (at 3-digit NACE code) by size classes

- annual enterprise statistics (at 2-digit NACE code) by region (NUTS 2)

The main statistical unit is the enterprise, even if some statistics are also produced for the KAU and for the local unit.

Enterprise = the smallest combination of legal units that is an organisational unit producing goods or services

KAU = kind-of-activity unit, which groups all the parts of an enterprise contributing to the performance of an activity at the class level (four digits) of NACE

Local unit = an enterprise or part thereof (e.g. a workshop, factory, warehouse, office, mine or depot) situated in a geographically identified place

1. Translation: some peculiarities

However, this general scheme comprising three types of tables presents some relevant differences:

• the first table is replicated with KAU as response unit;

• in the second table the variable size class presents two different classifications for different sectors (C-F and G-K);

• for sector G:
  • the regional table is released at NACE 3 level instead of NACE 2;
  • only for this sector there is an additional table relating to NACE 3 by turnover in classes.

1. Translation: the set of tables

The tables considered in the general scheme need to be split into several tables that are homogeneous in the level of the classifying variables and in the response unit.

Definition of spanning variables for each table to be processed by Argus in order to fulfil SBS regulations

Tables processed by Argus  Classifying variable           Response unit  Annex
Tab1.1                     NACE 4                         Enterprise     Annex 1A, Annex 2A, Annex 4A and Annex 3B
Tab1.2                     NACE 4                         KAU            Annex 2E and Annex 4E
Tab2.1                     NACE 3 by size class 1         Enterprise     Annex 2D and Annex 4D
Tab2.2                     NACE 3 by size class 2         Enterprise     Annex 3C and Annex 1B
Tab2.3                     NACE 3 by NUTS-2               Local unit     Annex 3E
Tab2.4                     NACE 3 by turnover in classes  Enterprise     Annex 3C
Tab3.1                     NACE 2 by NUTS-2               Local unit     Annex 1C, Annex 2F and Annex 4F
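
For scripting the later steps it can help to hold this set of tables as data. The sketch below is one possible encoding; the `SbsTable` type and its field names are hypothetical, and only the table names, NACE levels, second variables and response units come from the table above.

```python
from typing import NamedTuple, Optional


class SbsTable(NamedTuple):
    name: str
    nace_digits: int                 # level of the linked variable (NACE)
    second_variable: Optional[str]   # other spanning variable, if any
    response_unit: str


SBS_TABLES = [
    SbsTable("Tab1.1", 4, None, "enterprise"),
    SbsTable("Tab1.2", 4, None, "KAU"),
    SbsTable("Tab2.1", 3, "size class 1", "enterprise"),
    SbsTable("Tab2.2", 3, "size class 2", "enterprise"),
    SbsTable("Tab2.3", 3, "NUTS-2", "local unit"),
    SbsTable("Tab2.4", 3, "turnover in classes", "enterprise"),
    SbsTable("Tab3.1", 2, "NUTS-2", "local unit"),
]

# Group the table names by NACE level, from most detailed to most aggregated.
by_level = {}
for table in sorted(SBS_TABLES, key=lambda t: -t.nace_digits):
    by_level.setdefault(table.nace_digits, []).append(table.name)

print(by_level)
# {4: ['Tab1.1', 'Tab1.2'], 3: ['Tab2.1', 'Tab2.2', 'Tab2.3', 'Tab2.4'], 2: ['Tab3.1']}
```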

2. Analysis of the Set of Linked Tables

The analysis of the levels of the hierarchy of the “linked variable” leads to a scheme of relationships that gives the order in which the tables are processed, from the most detailed level of the hierarchy to the most aggregated.

This is because the more detailed cells of a table contribute to the construction of marginal cells in other tables that present a more aggregated level of the hierarchy of the linked variable.

Common cells need to present a coherent suppression pattern.
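
A small sketch of why detailed cells matter for the more aggregated tables: summing the 4-digit cells over their 3-digit prefix gives the marginal cells that reappear in a NACE 3 table. The codes and values are made up, and the sketch assumes that each NACE group code is the prefix of the codes of its classes.

```python
from collections import defaultdict

# Toy cell values for a NACE 4 table (codes and values are invented).
nace4_cells = {"1511": 120.0, "1512": 30.0, "1513": 50.0, "1520": 75.0}


def aggregate(cells: dict, digits: int) -> dict:
    """Sum cell values over the first `digits` characters of the NACE code."""
    totals = defaultdict(float)
    for code, value in cells.items():
        totals[code[:digits]] += value
    return dict(totals)


nace3_marginals = aggregate(nace4_cells, 3)
print(nace3_marginals)  # {'151': 200.0, '152': 75.0}
```

If cell 1512 were suppressed in the detailed table while the marginal 151 and the cells 1511 and 1513 were published elsewhere, 1512 could be recovered by subtraction, which is why the common cells must share one suppression pattern.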

2. Analysis of the Set of Linked Tables

The most detailed table is Tab1.1. It covers all enterprises, classified according to the classes of the NACE classification (4-digit codes).

This table will be called the “starting table”.

2. Analysis of the Set of Linked Tables

The next table to be processed should present the hierarchical level of the “linked variable” immediately above that of the starting table.

In SBS Community statistics there are two such tables, Tab2.1 and Tab2.2, which:

- present the 3-digit NACE code as classifying variable;

- are related to different sectors of the economy, with no link existing between the two tables;

- present two different classifications of the variable size class, relative to different sectors of the NACE code.


2. Analysis of the Set of Linked Tables

The next level of the hierarchy of the “linked variable” is the 2-digit NACE code.

In SBS Community statistics Tab3.1 presents this hierarchical level and is related to sectors C to K, excluding G, of the NACE classification.

- Tab3.1 presents marginals derived from Tab2.1 and Tab2.2 for sectors C to K excluding G.

- The response units are different, but enterprises either coincide with local units or comprise more than one local unit.
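
The resulting processing order for sectors other than G can be sketched as a topological sort of the links. The dictionary below encodes only the links stated in these slides (Tab1.1 feeds Tab2.1 and Tab2.2, which both feed Tab3.1); using Python's `graphlib` for the ordering is a choice made for this illustration.

```python
from graphlib import TopologicalSorter

# Each table mapped to the more detailed tables it shares cells with.
predecessors = {
    "Tab1.1": set(),
    "Tab2.1": {"Tab1.1"},
    "Tab2.2": {"Tab1.1"},
    "Tab3.1": {"Tab2.1", "Tab2.2"},
}

# static_order() yields every table after all of its predecessors.
order = list(TopologicalSorter(predecessors).static_order())
print(order)
```

Tab1.1 always comes first and Tab3.1 last; since Tab2.1 and Tab2.2 are not linked to each other, their relative order does not matter.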


2. Analysis of the Set of Linked Tables

Some relevant differences arise for sector G.

- The regional table, Tab2.3, is released at NACE 3 level instead of NACE 2 and is linked only to Tab2.2.

- There is an additional table released at NACE 3 level by turnover in classes, Tab2.4, linked to Tab3.2.

Also, for sectors C-F the table at NACE 4 level is presented not only for enterprises but also for KAU (Tab1.2).

Tab1.1 and Tab1.2 coincide almost perfectly, so it has been decided to apply the Tab1.1 suppression pattern to Tab1.2.

2. Analysis of the Set of Linked Tables

[Figure: scheme of the links between the tables]

3. Protection phase: a priori information

The order generated by the analysis of the links between the tables, as described in the previous scheme, aims to identify common cells in subsequent tables.

Common cells need to present a coherent suppression pattern.

Tau-Argus allows the user to supply a priori information for selected cells. This flexibility of the software can be used to impose coherent suppression patterns on a set of tables.

3. Protection phase: a priori information

This a priori information is organised in a history file.

In the history file the user can, before the protection phase, assign to predetermined cells of the table one of the following protection “statuses”.

Alphanumeric code  Meaning    Action to be taken by Argus
U                  Unsafe     The cell has to be protected.
S                  Safe       The cell is not at risk; it can be used as a secondary suppression.
P                  Protected  The cell cannot be used as a secondary suppression.
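
As an illustration, a history file could be generated with a few lines of Python. The record layout used here (the category codes of the cell followed by the status, comma-separated, one cell per line) is an assumption made for this sketch and should be checked against the Tau-Argus manual; `write_history` is a hypothetical helper, not a Tau-Argus function.

```python
import csv
import io

VALID_STATUS = {"U", "S", "P"}


def write_history(records, stream, separator=","):
    """Write one line per cell: the cell's category codes followed by its status."""
    writer = csv.writer(stream, delimiter=separator, lineterminator="\n")
    for codes, status in records:
        if status not in VALID_STATUS:
            raise ValueError(f"unknown a priori status: {status!r}")
        writer.writerow([*codes, status])


# Toy cells: (NACE code, category of the second spanning variable).
buffer = io.StringIO()
write_history([(("151", "Total"), "U"), (("152", "Total"), "P")], buffer)
print(buffer.getvalue())
# 151,Total,U
# 152,Total,P
```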

3. Protection of a Set of Linked Tables

The protection process using secondary cell suppression starts from Tab1.1, the most detailed table.

This table is protected by Argus according to the rules established by the Member State.

In order to pass on to Argus the information related to the protection of this starting table, we ask the software to save the “status” information for each single cell of the protected Tab1.1.

Five different output statuses are allowed by Tau-Argus.

3. Protection of a Set of Linked Tables

The second step is to protect the second table of the scheme, Tab2.1.

All the suppressions applied to the common marginal cells in the previous table, Tab1.1, have to be replicated in the current table, Tab2.1.

This is done by creating a history file for the current table (Tab2.1) containing a priori information that imposes on Argus the constraints stemming from the protection of the previous table (Tab1.1).

Different types of constraints may arise for each of the common cells.

3. Protection of a Set of Linked Tables

Output status (previous table)  Meaning                A priori status (current table)  Action taken by Argus
1                               Safe                   P                                This cell will not be selected as secondary suppression
5                               Unsafe                 none                             Argus will recognise a primary suppression
11                              Secondary suppression  U                                Set to manually unsafe in the current table, i.e. to be protected
14                              Missing value          none
9                               Manually unsafe        U                                Set to manually unsafe in the current table, i.e. cell to be protected
10                              Manually safe          P                                This cell will not be selected as secondary suppression

Output status of the common cell in the previous table, meaning, and corresponding status to be applied in the a priori information of the common cell in the current table.
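
The correspondence in this table can be written as a simple lookup. In the sketch below the status codes and their mapping come from the table; the function name, the cell identifiers and the use of `None` for "no a priori information" are assumptions made for illustration.

```python
# Output status in the previous table -> a priori status in the current table.
# None means that no a priori information is written for the cell.
OUTPUT_TO_APRIORI = {
    1: "P",    # safe
    5: None,   # unsafe: Argus recognises the primary suppression itself
    11: "U",   # secondary suppression
    14: None,  # missing value
    9: "U",    # manually unsafe
    10: "P",   # manually safe
}


def apriori_for_common_cells(previous_status: dict, common_cells) -> dict:
    """Return {cell: a priori status} for the common cells that need a constraint."""
    result = {}
    for cell in common_cells:
        status = OUTPUT_TO_APRIORI[previous_status[cell]]
        if status is not None:
            result[cell] = status
    return result


# Toy output statuses of three common cells in the previous table.
previous = {("151", "Total"): 11, ("152", "Total"): 1, ("153", "Total"): 5}
print(apriori_for_common_cells(previous, previous.keys()))
# {('151', 'Total'): 'U', ('152', 'Total'): 'P'}
```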

3. The process is applied following the relationship scheme

Tab1.1
• Apply Tau-Argus
• Create .saf file
• Convert information contained in the .saf file into a priori information for Tab2.1 and Tab2.2
• Select common cells (Tab1.1 with Tab2.1; Tab1.1 with Tab2.2) and create the “history files” (H2.1, H2.2)

Tab2.1
• Apply history file H2.1
• Apply Tau-Argus
• Create .saf file
• Convert information contained in the .saf file into a priori information for Tab3.1
• Select common cells (Tab2.1 with Tab3.1) and create the “history file” (H3.1_1)

Tab2.2
• Apply history file H2.2
• Apply Tau-Argus
• Create .saf file
• Convert information contained in the .saf file into a priori information for Tab3.1
• Select common cells (Tab2.2 with Tab3.1) and create the “history file” (H3.1_2)

Tab3.1
• Apply history files H3.1_1 and H3.1_2
• Apply Tau-Argus
• Create .saf file
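
The whole loop can be sketched as follows. This is a toy driver and not the authors' implementation: `toy_protect` merely stands in for a Tau-Argus run and its outputs are invented, the cell identifiers are made up, and the two history files for Tab3.1 (H3.1_1 and H3.1_2) are merged into one dictionary for brevity.

```python
from graphlib import TopologicalSorter

# Toy cell identifiers per table; identifiers shared by two tables are common cells.
TABLES = {
    "Tab1.1": {"1511", "1512", "151", "551"},
    "Tab2.1": {"151", "15"},
    "Tab2.2": {"551", "55"},
    "Tab3.1": {"15", "55"},
}
# Each table mapped to the tables that must be protected before it.
LINKS = {"Tab1.1": set(), "Tab2.1": {"Tab1.1"}, "Tab2.2": {"Tab1.1"},
         "Tab3.1": {"Tab2.1", "Tab2.2"}}
# Output status -> a priori status; statuses 5 and 14 produce no entry.
OUTPUT_TO_APRIORI = {1: "P", 9: "U", 10: "P", 11: "U"}


def toy_protect(name, history):
    """Stand-in for a Tau-Argus run; returns {cell: output status}."""
    if name == "Tab1.1":
        return {"1511": 5, "1512": 11, "151": 11, "551": 1}
    return {cell: 11 if history.get(cell) == "U" else 1 for cell in TABLES[name]}


def run_scheme(tables, links, protect):
    """Protect the tables in order, passing constraints on through history entries."""
    histories = {name: {} for name in tables}
    results = {}
    for name in TopologicalSorter(links).static_order():
        results[name] = protect(name, histories[name])
        for successor, predecessors in links.items():
            if name not in predecessors:
                continue
            # Common cells carry their status over to the next table.
            for cell in tables[name] & tables[successor]:
                apriori = OUTPUT_TO_APRIORI.get(results[name][cell])
                if apriori is not None:
                    histories[successor][cell] = apriori
    return results, histories


results, histories = run_scheme(TABLES, LINKS, toy_protect)
print(histories)
```

In this toy run the secondary suppression of cell 151 in Tab1.1 reaches Tab2.1 as status U, while the safe cell 551 reaches Tab2.2 as status P.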

Conclusions and Further Work

Conclusion

• We describe a process to protect linked hierarchical tables from SBS using Tau-Argus.

• We have successfully applied the process to the Italian sample of SBS.

Further work

• With the entry into force of the new SBS regulation, which brings changes due to the adoption of the new classification of economic activities, NACE Rev. 2, more work needs to be done.

• Protection patterns harmonised between subsequent years need to be studied and carefully tuned, so that coherence is maintained not only within a year but also across successive years.