Editing And Imputation For Manufacturing Statistics At Statistics Canada
Marie Brodeur, Director General, Industry Statistics Branch
Santiago, Chile, March 15 to 17, 2011
Jan 14, 2016
Outline Of The Presentation
Overview of the Manufacturing Program
Centralized Process Surveys
Overview of the UES Survey Process
Post Collection Processing Inputs & Tools
Use of Tax Data
The many phases of UES Post Collection Process
Managing the UES Post Collection Process
Statistics Canada
Chief Statistician of Canada
• Corporate Services
• National Accounts and Analytical Studies
• Business and Trade Statistics
• Informatics and Methodology
• Census and Operations
• Social, Health and Labour Statistics
Statistics Canada
Business and Trade Statistics
• Industry Statistics
• Economy-wide Statistics
• Agriculture, Technology and Transportation Statistics
• Manufacturing and Energy
• Distributive Trades
• Service Industries
• Enterprise Statistics
• Consumer Prices
• International Trade
• Producer Prices
• Investment and Capital Stock
• Enterprise Statistics
• Agriculture
• Small Business and Special Surveys
• Science, Innovation and Electronic Information
• Transportation
Manufacturing Distribution Of Sales
[Bar chart: Share of manufacturing sales by industry, 2010. Axis: share of sales, 0.0% to 18.0%. Industries shown: Leather and allied product manufacturing; Textile mills; Textile product mills; Clothing manufacturing; Printing and related support activities; Electrical equipment, appliance and component; Furniture and related product manufacturing; Beverage and tobacco product manufacturing; Miscellaneous manufacturing; Non-metallic mineral product manufacturing; Computer and electronic product manufacturing; Wood product manufacturing; Plastics and rubber products manufacturing; Paper manufacturing; Machinery manufacturing; Fabricated metal product manufacturing; Primary metal manufacturing; Chemical manufacturing; Petroleum and coal product manufacturing; Food manufacturing; Transportation equipment manufacturing]
Who Are Manufacturers?
Establishments primarily engaged in the physical or chemical transformation of materials and substances into new products
Includes assembly of the component parts of manufactured goods, blending of materials, finishing of manufactured products by dyeing, heat treating, plating and similar operations
Transformation of own materials or those owned by others
Service outputs: custom work, repair and maintenance
Product outputs: finished goods, intermediate goods
Manufacturing Program At Statistics Canada (STC)
Monthly Survey of Manufacturing (MSM)
Annual Survey of Manufactures and Logging (ASML)
Series of sub-annual commodity surveys
General Characteristics Of The MSM
Monthly indicator of manufacturing activity
Last Redesign in 1999
Designed to be a reliable indicator for both trends and levels
Establishment Survey (n = 10,500)
Stratified by Province, NAICS and Size
MSM Concepts
Sales
• Goods of own manufacture
Inventories
• Raw materials
• Goods-in-process
• Finished products
Orders
• New orders
• Unfilled orders
Goods purchased for resale (revenue and inventory)
• These data are collected but not released
Sales is the main concept; exceptionally, production is used for some industries (aerospace and shipbuilding)
Frame And Coverage
                                                                          Simple             Complex
Total number of establishments on the Business Register                   2,278,730          110,557
Value of sales of all establishments on the Business Register             $2,214.9 billion   $1,859.1 billion
Total number of manufacturing establishments on the Business Register     84,215             6,648
Value of sales of manufacturing establishments on the Business Register   $340.8 billion     $234.5 billion
MSM Sampling Plan
Take-Some
Take-All
Take-None
MSM Sampling Plan: Use Of Tax
[Diagram labels: Tax replaced; Survey]
Background
• The Goods and Services Tax (GST) is the federal Value Added Tax
• GST is collected by the Canada Revenue Agency (CRA)
• The CRA provides tax data to Statistics Canada
Information received includes the Business Number, revenue, tax remitted and input tax credit
Use Of Tax Data
Who is replaced?
• Single establishment enterprises
  Replace 50% of sampled data with GST data
• Chronic refusals
Who are not replaced?
• Very large single enterprise establishments
• Complex units (i.e. multiple establishments) – as it is found in the GST database
What Is The Annual Survey Of Manufactures And Logging (ASML)?
Measures the contribution of manufacturing industries to economic activity in Canada
In 2010, manufacturing accounted for 15% of GDP and 12% of total employment (SEPH)
Key input to SNA Input-Output tables
Survey collects data on
• what commodities are produced (Make matrix)
• where commodities are destined (provincial I/O tables)
• what commodities and primary inputs are used in production (Use matrix)
Survey Coverage
ASML is conducted under the umbrella of Statistics Canada’s Unified Enterprise Survey Program (UES)
Same as MSM
Establishments primarily engaged in manufacturing and logging activities and classified to NAICS 31, 32 and 33 as well as NAICS 113
Estimates produced for 261 NAICS6 level industries
Estimates produced for the 10 provinces and 3 territories.
Content: Commodity Variables
Revenue variables (16), expense variables (43), detailed opening and closing inventories (12), other financial (5)
Sales or outputs variables are valued at producer or FOB factory gate prices required by SNA
Commodities consumed (inputs) and produced (outputs), both goods and services
Collect commodity values and quantities (for selected goods)
Services produced and consumed collected as expense items and classified based on COA
Types Of Administrative (Tax) Data
From the Canada Revenue Agency (CRA)
• Agreement between two agencies
• T1 (unincorporated businesses)
• T2 (incorporated businesses)
• T4 (pay slips)
• GST (Goods and Services Tax)
• PD7 (payroll deduction accounts)
Editing And Imputation For Manufacturing Surveys
Why A Centralized Process?
Best Practices
Standardization of Processes
• Cross Survey Comparisons
• Enterprise Centric Processing/Coherence Analysis
Efficient use of Resources
Transportable Knowledge Across Survey Programs
Challenges Of A Centralized Process
Remain Centralized
Distribute processing
Priority Setting
Communication and Coordination
UES Post-Collection Processing
[Flow diagram labels: Pre-Grooming; Allocation / Estimation; Edit & Imputation; “Clean” Records; Central Data Store; Subject Matter Review & Correction Tool; Tax Data; USTART]
Collection
Collection Period: February to early October
Collection Processing System: Blaise
• Blaise can be seen as being a Collection Control Center
• Blaise has many functions:
  Call Scheduler
  Transaction history files
  Audit Trail Files
  And more
Blaise: Variables
Questionnaire number
Mail-out date
Number of calls
Length of the call
Number of contact attempts
Response code
And more
Blaise: Bonuses Over The Years
Blaise Transaction History (BTH) Files
• Collection data analysis:
  Produced a paper on best time to call
  Produced a paper on maximum # of attempts
Audit Trail Files
• Find outliers
• Difficult to answer questions
Collection
Precontact (Dec-Jan)
– Mostly for Business Register (BR) births; verification of contact information (name, address, …)
– By phone (in a few cases, a letter or a fact sheet is sent)
Mail-out of questionnaires (Jan-March)
– 2 or 3 mail-out dates
Follow-up in case of non-response for some units (begins about a month after mail-out)
– Phone call, remail or fax
Mail-back of questionnaires
Verifications of received questionnaires / Edits
– Is the questionnaire complete or are some key variables missing? (Edit follow-up by phone in some cases)
Collection
Coding of questionnaires (about 20 response codes)
• Response, non-response, out-of-scope, …
Imaging / Data capture (CADI - Computer Assisted Data Input)
Centralized Collection
[Flow diagram labels: Mailout (38K CEs); Pre-Contact (17K Businesses); Edit / Verification (BLAISE); Receipt (75% target); Delinquent Follow-Up; Capture / Imaging; “Clean” Records; Score Function]
UES: Data Collection / Score Function
Introduced in 2002, the UES score function is the main tool used at the collection stage to determine which priority to give for the follow-up of about 23,000 Collection Entities (CE) each year.
Reduces collection costs yet retains data quality
Similar to the collection goal of obtaining a high weighted coverage response rate.
PRIORITY 1: Extensive follow-up for the larger revenue CEs in cases of non-response.
PRIORITY 0: Minimum follow-up for the smaller CEs in cases of non-response.
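The two-tier follow-up rule can be sketched as follows. This is only an illustration, not the UES score function: the source does not give its formula, so the score used here (a CE's weighted revenue), the `coverage_target` cutoff and all names are assumptions.

```python
def assign_priorities(ces, coverage_target=0.8):
    """Assign follow-up priority 1 or 0 to each Collection Entity (CE).

    ces: list of (ce_id, weighted_revenue) pairs.
    Hypothetical rule: the largest CEs get PRIORITY 1 until the assumed
    share of total weighted revenue (coverage_target) is covered; the
    remaining, smaller CEs get PRIORITY 0.
    """
    total = sum(revenue for _, revenue in ces)
    priorities = {}
    covered = 0.0
    # Walk through CEs from largest to smallest weighted revenue.
    for ce_id, revenue in sorted(ces, key=lambda c: c[1], reverse=True):
        if total > 0 and covered < coverage_target * total:
            priorities[ce_id] = 1  # extensive follow-up on non-response
            covered += revenue
        else:
            priorities[ce_id] = 0  # minimum follow-up on non-response
    return priorities


example = assign_priorities([("A", 700), ("B", 200), ("C", 60), ("D", 40)])
```

Ranking by weighted revenue mirrors the stated goal of a high weighted coverage response rate: a few large CEs account for most of the covered revenue, so follow-up effort is concentrated on them.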
Chart Of Accounts
[Diagram labels: DISSEMINATION; COLLECTION; Sales; Operating revenue; Cost of sales; Gross profit; Expenses; EBIT; Outputs; Inputs; Value added; Shipments; Operating Surplus; GDP; LINK, BRIDGE, CONCORDANCE]
Expected Benefits Of A Chart Of Accounts
Standardization in business data collection
Higher survey response
Increase in quality of data
Comparison of data from various sources
Increase efficiency in using administrative data
Links To Chart Of Accounts
[Diagram labels: CHART OF ACCOUNT; Establishment; Legal entity; Enterprise]
UES: Use Of Tax Data
Validation (comparison)
• Verify dubious collected data against the equivalent tax data record
Imputation
• One of the methods used for non-response
Estimation
• Below take-none
Direct Data Replacement
Update Business Register
Allocation of survey data (use tax revenues, salaries and expenses)
Centralized Processing Systems And Databases
Develop centralized systems
• Move away from stand-alone
• Single point of access for security
Integrated Questionnaire Metadata System
Edit and imputation
Allocation and Estimation
Data Warehouse
Enterprise Portfolio Managers
Top 350 enterprises in Canada
Status
• Platinum, Gold, Silver, Bronze
Personal visits
Enterprise Profiling
Coordination of mail-out and collection
Enterprise/Establishment coherence
Holistic Response Management
• Strategic Response Unit
• Escalation Process / Statistics Act
Review and Correction (Post-Capture)
Done via an application which is a micro-editing tool
Opportunity to perform edits and to manually correct data before the automated edit and imputation process
Opportunity to gain an understanding of the quality of data coming in from the field
What Is Generally Done By SMOs During This Process?
Ensure that industry codes are valid and response codes are correct
Ensure that equivalent survey cells have consistent data
Enter data for records that came in after the collection cut-off date
Review high impact outliers in terms of profit, average salary, etc.
Check comments made by respondents and collection staff
Why Is This Process Necessary?
Reviewing and correcting records will increase the number and quality of donors for the automated edit and imputation (E&I) stage. This will improve the quality of data coming out of E&I.
Need to assess the quality of collected data
• Determine if problems with questionnaire
• Inability of respondent to provide a given data point
• Determine if enough data for E&I
What Should Not Be Done During This Process?
Do not plug data for non-response records. They will be imputed during the automated E&I.
What Is E & I?
Editing
• Verify that parts add up to total
• Ensure that there are no missing values where parts add up to total
• There must be consistency between related variables
Imputation
• Changing values in fields which fail edit rules with a view to ensuring that the resulting data satisfy all edit rules. In practice, reported data will rarely be changed
• Impute for missing data or partially responded data
• Impute entire records in the case of total non-response
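The editing rules above (parts add up to the total, no missing values within a parts/total block) can be illustrated with a small check. This is a minimal sketch under assumed names; the field names, the `tolerance` parameter and the failure format are hypothetical and not taken from the UES systems.

```python
def check_sum_of_parts(record, total_field, part_fields, tolerance=0.5):
    """Return the list of edit failures for one record.

    record: dict mapping field name to a number, or None when missing.
    tolerance: assumed rounding allowance (hypothetical).
    """
    failures = []
    # Edit 1: no missing values among the total and its parts.
    missing = [f for f in [total_field] + part_fields if record.get(f) is None]
    if missing:
        failures.append(("missing", missing))
        return failures  # the sum cannot be checked with missing values
    # Edit 2: the parts must add up to the total.
    parts_sum = sum(record[f] for f in part_fields)
    if abs(parts_sum - record[total_field]) > tolerance:
        failures.append(("sum_of_parts", parts_sum, record[total_field]))
    return failures


inventory_parts = ["raw_materials", "goods_in_process", "finished_products"]
clean = {"raw_materials": 10, "goods_in_process": 5,
         "finished_products": 25, "total_inventory": 40}
print(check_sum_of_parts(clean, "total_inventory", inventory_parts))
```

Fields flagged this way are the candidates for imputation; fields that pass are left as reported, in line with the note that reported data will rarely be changed.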
Why Is E&I Necessary?
To produce a complete and consistent data file that accounts for all sampled units
Both units that did not respond to the survey and units that did not provide a complete response must be imputed
Correct erroneous responses
E&I Terminology
Data Group
• Groupings (defined by SM) of records that will be kept together for imputation purposes
• These groupings are based on multiple dimensions:
  industry (NAICS)
  geography (province)
Data groups that will be used for a specific survey will depend on:
• initial sample design (number of units sampled and the level of stratification used)
• number of records that respond to the survey (a minimum of 5 or 10 records are required in a data group)
May be changed during production if not enough donors
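The minimum-donor requirement for data groups can be sketched as below. The fallback used here, collapsing a small industry-by-province group into an all-province group for the same industry, is an assumption made for illustration; the source only says that groups may be changed when there are not enough donors.

```python
from collections import Counter


def assign_data_groups(respondents, min_donors=5):
    """Map each (naics, province) cell to the data group used for imputation.

    respondents: list of dicts with 'naics' and 'province' keys.
    Cells with fewer than min_donors respondents are collapsed to
    (naics, "ALL"), a hypothetical national group for that industry.
    """
    counts = Counter((r["naics"], r["province"]) for r in respondents)
    groups = {}
    for (naics, province), n in counts.items():
        if n >= min_donors:
            groups[(naics, province)] = (naics, province)
        else:
            groups[(naics, province)] = (naics, "ALL")
    return groups


respondents = ([{"naics": "311", "province": "ON"}] * 5
               + [{"naics": "311", "province": "PE"}] * 2)
print(assign_data_groups(respondents))
```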
E&I Terminology (continued)
Edit Group
• Grouping of variables within a record that will be processed together in an imputation method
• Generally edit groups may be defined as follows for most surveys:
  revenue and expense sections
  employment section and provincial distribution of goods/services sold
• Allows for a record to be a donor if it has clean data in one section even when other sections are blank; this increases the donor pool
E&I Terminology (continued)
Key variables
• Total operating revenue
• Total operating expenses
• Salaries
• Cost of goods sold
The Stages Of The E&I System
Pre-processing
BANFF E & I System
Post-Processing
Allocation
Preprocessing
Deterministic Edits
Conditional edits - If A then B
Sum of Parts (SOP)
Assign 100% to percentage totals
Impute reporting period
Donor Outlier Detection
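Two of the pre-processing steps listed above, the Sum of Parts correction and the "If A then B" conditional edit, can be sketched as follows. This is an illustrative reading of those items rather than the production logic; the function and field names are hypothetical.

```python
def deterministic_sum_of_parts(record, total_field, part_fields):
    """Fill in a value only when it is fully determined by the others.

    - total missing, all parts present: total = sum of parts
    - total present, exactly one part missing: part = total - other parts
    Anything else is left untouched for the imputation stage.
    """
    rec = dict(record)  # work on a copy
    missing_parts = [f for f in part_fields if rec.get(f) is None]
    if rec.get(total_field) is None and not missing_parts:
        rec[total_field] = sum(rec[f] for f in part_fields)
    elif rec.get(total_field) is not None and len(missing_parts) == 1:
        known = sum(rec[f] for f in part_fields if rec.get(f) is not None)
        residual = rec[total_field] - known
        if residual >= 0:  # never create a negative component
            rec[missing_parts[0]] = residual
    return rec


def conditional_edit(record, condition, requirement):
    """'If A then B': passes when A is false or B is true."""
    return (not condition(record)) or requirement(record)


rec = {"new_orders": 30, "unfilled_orders": None, "total_orders": 100}
fixed = deterministic_sum_of_parts(
    rec, "total_orders", ["new_orders", "unfilled_orders"])
print(fixed["unfilled_orders"])
```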
BANFF E & I System
Impute for missing key variables as specified by subject matter (i.e. total revenue, total expenses)
Impute for other missing variables:
• Apply Historical Trend
• Apply Current Year Trend
• Use donor (for partial imputation)
Select a donor for massive imputation for total non-response
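Donor imputation can be illustrated with a small nearest-neighbour sketch. The matching rule (closest value of a single matching field such as total revenue, assumed to be available for the recipient, for example from tax data) is an assumption for illustration and does not describe how the BANFF system selects donors.

```python
def select_donor(recipient, donors, match_field="total_revenue"):
    """Pick the donor closest to the recipient on the matching field."""
    return min(donors,
               key=lambda d: abs(d[match_field] - recipient[match_field]))


def impute_from_donor(recipient, donor, fields):
    """Copy donor values into the recipient's missing fields only."""
    rec = dict(recipient)  # reported values are never overwritten
    for field in fields:
        if rec.get(field) is None:
            rec[field] = donor[field]
    return rec


donors = [
    {"total_revenue": 100, "salaries": 30},
    {"total_revenue": 500, "salaries": 120},
]
recipient = {"total_revenue": 450, "salaries": None}
donor = select_donor(recipient, donors)
print(impute_from_donor(recipient, donor, ["salaries"]))
```

In practice the donor pool would be restricted to clean records in the recipient's data group, and donor values are often scaled to the recipient's size rather than copied; both refinements are left out of this sketch.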
BANFF Algorithms
DIFTREND - Historical trend imputation
CURRATIO - Current ratio imputation
PREVALUE – Value from the previous period for the same unit is imputed
PREAUX – Historical value of a proxy variable for the same unit
CURAUX – Current value of a proxy variable for the same unit
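The five estimators listed above can be written out as plain formulas. The functions below are a sketch based on the one-line descriptions on this slide; they are not calls into the BANFF system, and the exact formulas (in particular the ratio forms used for DIFTREND and CURRATIO) are assumptions.

```python
def prevalue(prev_value):
    """PREVALUE: carry forward the unit's own value from the previous period."""
    return prev_value


def preaux(prev_aux):
    """PREAUX: use the historical value of a proxy (auxiliary) variable."""
    return prev_aux


def curaux(cur_aux):
    """CURAUX: use the current value of a proxy (auxiliary) variable."""
    return cur_aux


def curratio(cur_aux, respondents):
    """CURRATIO (assumed form): apply the current-period ratio of the target
    variable to the auxiliary variable, taken over clean respondents.

    respondents: list of (value, aux) pairs from the same data group.
    """
    total_value = sum(value for value, _ in respondents)
    total_aux = sum(aux for _, aux in respondents)
    return cur_aux * total_value / total_aux


def diftrend(prev_value, cur_aux, prev_aux):
    """DIFTREND (assumed form): move the unit's previous value by the trend
    observed in its auxiliary variable between the two periods."""
    return prev_value * cur_aux / prev_aux


# Unit reported 200 last year; its auxiliary revenue grew from 100 to 110.
print(diftrend(200, 110, 100))
```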
Post-Processing
Prorate components to ensure that they sum exactly to totals
Perform a number of consistency checks to ensure that micro-data are valid
Assign customer location (percentage cells)
Massive Imputation (donor selected during the processor stage but applied in the post-processor)
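Prorating components so that they sum exactly to a total can be sketched as below. The sketch assumes whole-number values and assigns any rounding residual to the largest component; both choices are assumptions for illustration, not a description of the production post-processor.

```python
def prorate(parts, total):
    """Scale components proportionally so they sum exactly to total.

    parts: dict mapping component name to a whole-number value.
    total: whole-number total the components must add up to.
    """
    current = sum(parts.values())
    if current == 0:
        return dict(parts)  # nothing to scale; leave for imputation
    scaled = {name: round(value * total / current)
              for name, value in parts.items()}
    # Rounding can leave a small gap; put it on the largest component.
    residual = total - sum(scaled.values())
    largest = max(scaled, key=scaled.get)
    scaled[largest] += residual
    return scaled


result = prorate({"raw": 30, "in_process": 50, "finished": 30}, 100)
print(result, sum(result.values()))
```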
Allocation - Definition & Purpose
Definition:
Allocation is the distribution of survey and administrative data from their acquisition level (Collection Entity) to the targeted statistical units (Establishments or Locations) as defined on the survey frame.
Purpose:
To provide fully-processed micro data on a fiscal year basis, for establishments or locations in-sample for the UES
Determine the distribution of value added by province
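Allocation from a Collection Entity to its establishments can be sketched as a proportional split. Using tax revenue as the allocation key follows the earlier mention of tax revenues for allocating survey data; the equal-split fallback and all names are assumptions for illustration.

```python
def allocate(ce_value, tax_revenue_by_establishment):
    """Distribute one CE-level value across its establishments.

    tax_revenue_by_establishment: dict of establishment id -> tax revenue,
    used as the allocation key. If no tax revenue is available, the value
    is split equally (hypothetical fallback).
    """
    total = sum(tax_revenue_by_establishment.values())
    n = len(tax_revenue_by_establishment)
    if total == 0:
        return {e: ce_value / n for e in tax_revenue_by_establishment}
    return {e: ce_value * revenue / total
            for e, revenue in tax_revenue_by_establishment.items()}


def value_by_province(allocated, province_of):
    """Sum allocated establishment values by province."""
    totals = {}
    for establishment, value in allocated.items():
        province = province_of[establishment]
        totals[province] = totals.get(province, 0) + value
    return totals


allocated = allocate(1000, {"E1": 300, "E2": 100})
print(value_by_province(allocated, {"E1": "ON", "E2": "QC"}))
```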
Sample Survey Allocation
[Diagram labels: SAMPLE (Establishment 1, Establishment 2, Establishment 3, Establishment 4); Collection/Processing (Questionnaire 1, Questionnaire 2); Allocation (Establishment 1, Establishment 2, Establishment 3, Establishment 4, Establishment U)]
Managing The UES Post Collection Process
Post Collection Operations Committee
• Discuss production issues of common interest
• Provide status reports on production and production readiness
Divisional Production meetings
• Working group level dealing with production issues relating to a specific subject matter division, including planning and ad hoc requests
Post Collection Processing Teams
• Structured by Subject Matter Division to provide the best support and to maximise subject matter expertise
Change Management Requests
• Improvements
Service Request Management Portal (SRM)
• Corrections
Future Directions
IBSP (Integrated Business Statistics Project)
• New and Improved UES, to consolidate and standardise processing for more annual and sub-annual business surveys
• Start RY2013. To be completed for RY2015
• Number of surveys to increase from 60 annual surveys to 120 annual and sub-annual surveys.