The Statistical Administrative Records System and Administrative Records Experiment 2000: System Design, Successes, and Challenges Dean H. Judson Planning, Research and Evaluation Division U.S. Census Bureau
Mar 28, 2015
Outline of Presentation
• General principles for using administrative records properly
• Overview of StARS/AREX history, goals and design
• Applications and evaluations: StARS 1999 and StARS 2000 versus Census 2000
General Principles for Using Administrative Records Properly
How Administrative Records Are Created and Used
Presentation (query results and displays)
Database
Recorded Events and Objects (administrative record)
Observed Events and Objects ("sampling frame")
Events and Objects (population)
Policy changes which change the definition of events and objects
“Ontologies” and thresholds for observation
Data entry errors and coding schemes
Data management issues
Query structure and spurious structure
Data collection
Some Important Principles
• Database ≠ Population!
• Database ≠ Truth!
• The “true” Data exist in the “real world”, as does the “true” Population.
• But, the database gives us information that points to the Truth, and points to the Population.
[Figure: Venn diagram. Population in StARS Database versus Resident U.S. Population on April 1, 2000. Database records outside the target population: Deceased; Non-U.S. Residents; Accidental Duplication.]
[Figure: Venn diagram. Population in Employee Database versus “Current” employees of Company X, October 1, 2001. Mismatches: “Oops! Accidentally included contractors!”; Terminated, not yet entered in database; Accidental Duplication.]
Ontologies and Data Quality
[Figure: four panels, each mapping “Real world” states to Database states: Proper Representation; Incomplete Representation; Ambiguous Representation; Meaningless States.]
Data Quality: The function that maps from the “real world” to the database allows one to reconstruct the “real world” from the database values. Source: Wand and Wang, 1996:90
Coverage versus Intensity/Content: How can we get the best of both?
[Figure: two-axis chart. Horizontal axis: Coverage of Target Population (Low to High). Vertical axis: Intensity/Content of Data Collection (Low to High). Plotted: Administrative Records/Data Warehouse; Careful, well-done sample survey.]
A Model for “Borrowing Strength”
• Original DW Database (X)
• Representative Sample of X
• Carefully Collected Data (Y): “Ground Truth”
• Estimated Model: Y=f(X)
• Augmented DW Database, with X and estimated Y’s
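The “borrowing strength” idea can be sketched in a few lines: fit Y=f(X) on the small sample where Y was carefully collected, then apply the fitted model to every X in the large database. The sketch below assumes a single numeric X and a straight-line f; the data values and the names `fit_line`, `sample_x`, `sample_y`, and `warehouse_x` are hypothetical and are not part of StARS.

```python
# Minimal sketch of "borrowing strength": estimate Y = f(X) on a small
# ground-truth sample, then predict Y for all records in the large database.
# All values below are made up for illustration.

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Representative sample of X with carefully collected Y ("ground truth")
sample_x = [1.0, 2.0, 3.0, 4.0]
sample_y = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(sample_x, sample_y)

# Original database holds X only; augment it with estimated Y
warehouse_x = [0.0, 2.5, 10.0]
augmented = [(x, intercept + slope * x) for x in warehouse_x]
```

In practice f would be a richer model and the sample would need to be representative of the database for the predictions to be trustworthy.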
Statistical Administrative Records System and Administrative Records Experiment
Background and History
• Statistical Administrative Records System
– Six large Federal input files: IRS 1040, IRS 1099, Selective Service, Medicare, Indian Health Service, HUD-TRACS/MTCS
– One lookup file: SSA/Census NUMIDENT
• AREX 2000
– Attempt to use StARS data to simulate an administrative records census
What Was the Purpose of StARS 1999 and AREX 2000?
• Test the feasibility of an administrative records census
– StARS: Nationwide
– AREX: two counties in Maryland, three in Colorado
• MD: 1.4M persons in 558K households
• CO: 1.2M persons in 459K households
• Test two methods for conducting an administrative records census
– top-down method
– bottom-up method (match to address list, additional operations)
Can We Do This?
• Title 13, U.S. Code (§6(a)-(c), abridged):
– “The Secretary…may call upon any other department…of the Federal Government…for information pertinent to the work provided for in this title…To the maximum extent possible, the Secretary…shall use [such] information instead of conducting direct inquiries”
• Privacy Act, 1974 (Title 5 §552a, abridged):
– “No agency shall disclose any record…unless…to the Bureau of the Census for purposes of planning or carrying out a census or survey or related [title 13] activity”
– “Each agency that maintains a system of records shall…publish in the Federal Register upon establishment…the existence and character of the system of records” (Published StARS in FR, January 1999)
The Statistical Administrative Records System-1999
[Flow diagram; boxes and record counts as shown]
• Input files
– TY98 IRS 1040: 119,946,193
– TY98 IRS 1099: 598,075,971
– Medicare: 56,837,022
– Selective Service: 13,176,234
– HUD TRACS: 3,342,234
– Indian Health Service: 3,106,821
– NUMIDENT: 676,589,439
• Edited files: Edited IRS 1040 (243,260,776), Edited IRS 1099, Edited Medicare, Edited Selective Service, Edited HUD TRACS, Edited Indian Health Service
• Census NUMIDENT: 396,185,872
• Person Characteristics File (PCF): 396,185,872
• Models: Race Model, Gender Model, Mortality Model
• Address Processing: 795,742,702
– Hygiene & Unduplication: 136,154,293
– Geocoding: 102,965,122 (75.6% Coded); 33,189,171 (24.4% Uncoded)
– Other boxes: TIGER, Code 1, ABI, “? Research”
• Person Processing: 875,750,973
– SSN Validation (PVS): 844,945,296 Valid (96.5%)
– Invalid SSNs: 30,805,677 (3.5%)
– Unduplication: 279,601,038
– Remove Deceased/Create Composite Record: 257,764,909
– Extraction of AREX Test Site Records: 1,459,760 in Baltimore Site; 1,229,274 in Colorado Site
Statistical Administrative Records System-2000 (DRAFT)
[Flow diagram; boxes and record counts as shown]
• Input files
– TY99 IRS IMF: 124,729,862
– TY99 IRS IRMF: 583,642,950
– Medicare: 59,198,432
– Selective Service: 13,370,053
– HUD TRACS: 1,991,672
– HUD MTCS: 6,232,562
– Indian Health Service: 2,730,407
– NUMIDENT: 721,228,119
• Edited files
– Edited IRS IMF: 253,825,653
– Edited IRS IRMF: 568,109,788
– Edited Medicare: 59,197,759
– Edited SSS: 14,538,895
– Edited HUD TRACS: 1,991,655
– Edited MTCS: 6,208,615
– Edited IHS: 2,728,548
• Census NUMIDENT: 408,447,131
• Person Characteristics File (PCF): 408,447,131
• Models: Race Model, Gender Model, Mortality Model
• Address Processing: 725,230,009
– Hygiene & Unduplication: 158,593,956
– Geocoding: 125,647,359
– Other boxes: TIGER/MAF, Code 1, ABI, “?”
• Person Processing: 905,432,071
– SSN Validation: 895,196,891
– Invalid SSNs: 10,235,180
– Unduplication: 289,968,449
– Remove Deceased/Create Composite Record: 265,950,850
Administrative Records Experiment in 2000 (AREX 2000)
• Five selected sites in Maryland and Colorado
– MD: Baltimore city, Baltimore county
– CO: El Paso county, Douglas county, Jefferson county
• Attempt to simulate an Administrative Records Census
• Not all aspects of an Administrative Records Census are simulated
– Group Quarters survey
– Coverage measurement survey
• Special operations not included in StARS
– Request for physical address (PO boxes/Rural Routes)
– Clerical hand geocoding
– Field verification of addresses not matched to DMAF
AREX 2000 Evaluations
• Process: Analyzing selected components of the AREX implementation processing
• Outcomes: Block level analysis: Age/Race/Sex/Hispanicity comparisons to Census 2000
• Household level analysis:
– Comparing household distributions for matched addresses
– Assessing the feasibility of using administrative records in lieu of a field interview to obtain data on nonresponding households
• Available at www.census.gov/pred/www/rpts.html#AREX
• (Synthesis of results from the Administrative Records Experiment in 2000)
Characteristics of Files Included in the StARS System
• IRS Individual Master 1040 File:
– Tax year data; April, 2000 refers to “tax year” 1999
– TY ’99 file arrives October, 2000
– Business entities, estates, other institutions included
– ~120 million return records/year; maximum of six person records per return
– Households below the filing threshold do not need to file
– Late filers differ systematically from early filers
– Tax Filing Unit ≠ Housing Unit: 10-20% of addresses are PO Boxes, business addresses, tax preparers (Czajka, 2000)
– TY95+: SSN’s of dependents requested, recorded
– 0.5% of primary filer, 1.6% of secondary filer, 3.4% of dependents’ SSN’s in error (Czajka, 1987)
– Age, race, sex, Hispanic origin microdata not available
Characteristics of Files Included in the StARS System, cont.
• IRS Information Returns Master File:
– Tax year data; April, 2000 refers to “tax year” 1999
– TY ‘99 file arrives October, 2000
– Business entities, estates, other institutions included
– ~700 million records/year
– Recipient address ≠ Housing Unit
– 10-20% of addresses are PO Boxes, business addresses, tax preparers
– Extremely limited microdata content: Age, race, sex, Hispanic origin microdata not available; name information often truncated
– Possible source of information on undocumented persons
Characteristics of Files Included in the StARS System, cont.
• Selective Service File:
– Requested 4/1/99(00) file “cut date”
– ~13 million records
– Registration required in 1940, suspended in 1975, resumed in 1980
– Presumably, males 18-25 are required to inform SSS when they move
– Females, non-immigrant aliens, hospitalized, incarcerated, and institutionalized males, and members of the armed forces are exempt
– Limited microdata content: Race, Hispanic origin microdata not available
– Address information may not be current
Characteristics of Files Included in the StARS System, cont.
• Medicare Enrollment Database (EDB):
– Requested 4/1/99(00) file “cut date”: current and historical Medicare enrollment (“Active” and “Inactive” cases)
– ~40 million records at any one point in time
– Recipient Address ≠ Housing Unit
• Proxy recipients listed on the file (e.g., John Doe’s benefits c/o Jane Doe; John Doe’s benefits c/o nursing home)
– Used in population estimates system for 65+ household population estimates
– A small portion of records at any point in time almost certainly belong to deceased persons (Kim and Sater, 2000)
– Coverage is high (93-102%) but not perfect and unevenly distributed geographically
• “Snowbird” states appear to have lower ratios of Medicare to 65+ population than “non-snowbird” states (Kim and Sater, 2000)
Characteristics of Files Included in the StARS System, cont.
• Indian Health Service patient file:
– Requested 4/1/99(00) file “cut date”
– ~10 million patient/transaction records
– Transaction record ≠ person record
– Unduplication
• about 10 million patient records, 2 million unduplicated SSN’s
– Many missing SSN’s (about 20%)
– Integral part of our race model
Characteristics of Files Included in the StARS System, cont.
• Housing and Urban Development Tenant Rental Assistance Certification System (HUD-TRACS/MTCS):
– Requested 4/1/99(00) file “cut date”
– HUD subsidy payments
– TRACS 1999: ~ 3.3 million records
– TRACS 2000: ~ 2 million records
– Short form data for all members of household (Race/Hispanic only for head of household)
– Address information may represent project or landlord address
Characteristics of Files Included in the StARS System, cont.
• Census NUMIDENT File:
– ~700 million transaction records; 400 million individual SSN records
– Post 1985: Enumeration at birth
– For each SSN: Date of birth, gender, race, place of birth
• About 50-60 million persons on the file are deceased but not identified as such
• No current residence information on the file
• Taxpayer ID Numbers (TINs) not on the file
• Demographic properties:
– About 35% of SSN’s on file have alternate names (marriage, divorce, etc.)
– About 6% missing gender
– Race coding has changed (prior to 1980, 3 races: White, Black, Other); 20% either “unknown” or “other”
– About 25% of SSN’s have transactions with different race codes
Creating Final StARS Database
• Select best address and demographics based on
– geocodability
– currency
– quality
• Impute missing demographics (from NUMIDENT/PERSON CHARACTERISTICS FILE)
• Flag records for deceased people
• Final database is like the census
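The “select best address” step can be illustrated with a small ranking function. This is only a sketch: the slide names the three criteria (geocodability, currency, quality) but not how they are weighted, so the strict priority order used here, the `quality` score, and the candidate records are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical candidate addresses for a single person, one per source file.
candidates = [
    {"source": "IRS 1040", "address": "PO BOX 12",
     "geocoded": False, "as_of": date(1999, 4, 15), "quality": 2},
    {"source": "Medicare", "address": "486 MAIN STREET",
     "geocoded": True, "as_of": date(1998, 10, 14), "quality": 3},
    {"source": "SSS", "address": "12 OAK AVE",
     "geocoded": True, "as_of": date(1997, 1, 1), "quality": 3},
]

def best_address(records):
    """Pick one record, ranking by geocodability, then currency, then quality.

    The lexicographic ordering is an assumed rule, not the StARS rule.
    """
    return max(records, key=lambda r: (r["geocoded"], r["as_of"], r["quality"]))

chosen = best_address(candidates)
```

Here the most recent address (the PO Box) loses to an older one because it cannot be geocoded, which shows why the ordering of the criteria is a substantive decision.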
Address Processing Results (StARS 1999)
• Almost 800 million addresses at start
• About 6 percent identified as potential businesses
• 136 million address records after unduplication
• About 75 percent geocoded
– 85 percent geocoding rate for city-style addresses
Person Processing Results (StARS 1999)
• 875 million records at start
• 845 million have valid SSN record (96.5%)
• 280 million after unduplication by SSN
• 261 million after removal of known deceased
• 257 million after removal of known deceased and persons residing in outlying territories
• StARS 2000: 266 million after removal of known deceased before April 1, 2000 and persons residing in outlying territories
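As a quick arithmetic check, the rounded figures above can be reproduced from the exact StARS 1999 record counts reported in the flow diagram earlier in this presentation. The variable names are illustrative only.

```python
# Exact StARS 1999 person-processing counts, as reported in the flow diagram.
person_records = 875_750_973
valid_ssn = 844_945_296
invalid_ssn = 30_805_677
after_unduplication = 279_601_038
composite_records = 257_764_909

# Valid and invalid SSN records should account for every input record.
accounted_for = valid_ssn + invalid_ssn

# Shares quoted on the slide, as percentages rounded to one decimal place.
valid_pct = round(100 * valid_ssn / person_records, 1)
invalid_pct = round(100 * invalid_ssn / person_records, 1)

# Average number of valid input records collapsed into each unique SSN.
records_per_ssn = valid_ssn / after_unduplication
```

The last ratio comes out at roughly three input records per unique SSN, which gives a sense of how much of the person processing is unduplication.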
Additional Operations of AREX 2000
• Clerical geocoding
• Request for physical address (for P.O. Boxes, Etc.)
• Match to Decennial Master Address File
• Field address verification
Major Analytic Issues with StARS Processing
• Ontologies
– The way in which an administrative agency “defines” the world may not match the way the Census Bureau “defines” the world, e.g.,
– A delivery address suitable for receiving a payment check may not suffice for putting individuals at a street address
– Difficult to distinguish individual units within the Basic Street Address
– Race coding: Hispanic Origin is a separate race on NUMIDENT
– Transaction data ≠ person data
– How many names does a person have (and in what order)?
• Proxies – IRS & Medicare records
– JOHN WILSON
– C/O MARY SMITH
– 1004 LAUREL LANE
– ROCKMONT, MD 22345
The address is (presumably) for Mary Smith. John Wilson may or may not live there.
Major Analytic Issues with StARS Processing, cont.
• Addresses that are difficult to place on the ground
– About 10% of addresses are rural style
– PO Boxes: 45% for IHS, 9.5% for Medicare, 7.5% for IRS 1040, 6.8% for SSS, 3.8% for IRS 1099, 0.4% for HUD-TRACS (Huang and Kim, 2000)
– 1995 IRS/CPS match: 86.5% of tax return cases had the same address as residence address, 94% coded to same county (Sater, 1995)
• John Smith
• H&R BLOCK
• P.O. BOX 12
• GREENWAY, MD 29752
– Addresses with both business and residential components
• Dean H. Judson
• JUDSON OLD GROWTH LOGGING SERVICES
• 45850 BACKWOODS HIGHWAY
• BOONDOCKS, OR 96432
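A first-pass screen for the address problems illustrated on these slides (PO Boxes, rural routes, and “C/O” proxies) could be sketched with regular expressions, as below. The patterns, the flag names, and the `classify` function are assumptions for illustration, not the StARS address hygiene rules; note that the business-plus-residence example is not caught by any of them.

```python
import re

# Illustrative patterns only; real address hygiene needs far more variants.
PO_BOX = re.compile(r"\bP\.?\s*O\.?\s*BOX\b", re.IGNORECASE)
RURAL_ROUTE = re.compile(r"\b(?:RURAL ROUTE|RR|RTE)\s*\d+", re.IGNORECASE)
CARE_OF = re.compile(r"\bC/O\b", re.IGNORECASE)

def classify(address_lines):
    """Return the set of flags raised by any line of a mailing address."""
    flags = set()
    for line in address_lines:
        if PO_BOX.search(line):
            flags.add("po_box")
        if RURAL_ROUTE.search(line):
            flags.add("rural_route")
        if CARE_OF.search(line):
            flags.add("proxy")
    return flags

# Example addresses taken from the slides.
preparer = classify(["John Smith", "H&R BLOCK", "P.O. BOX 12", "GREENWAY, MD 29752"])
proxy = classify(["JOHN WILSON", "C/O MARY SMITH", "1004 LAUREL LANE", "ROCKMONT, MD 22345"])
rural = classify(["SAM SMITH", "BOX 2 RURAL ROUTE 37", "WESTPORT, VA 32784"])
business = classify(["Dean H. Judson", "JUDSON OLD GROWTH LOGGING SERVICES",
                     "45850 BACKWOODS HIGHWAY", "BOONDOCKS, OR 96432"])
```

Pattern matching of this kind only flags candidates for special handling; it cannot say whether the person actually lives at the address.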
Major Analytic Issues with StARS Processing, cont.
• Unduplication and matching
– Addresses and personal characteristics are measured with substantial variation
• Often not obvious whether a particular pair of records represent a duplicate or not.
• Yet, with multiple files, unduplication decisions must be made.
– Address matching:
101 Elm Rd, # 1 97132
101 Elm St, apt 1 97701
Versus
101 Elm Rd, #1 97132
101 Elm St, apt 1 97132
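The two address pairs above differ only in which fields agree. A minimal sketch of field-by-field comparison follows; it assumes addresses of exactly the shape shown on the slide, and the `ADDRESS` pattern and the `parse` and `agreement` functions are hypothetical helpers, not the matcher used in StARS.

```python
import re

# Matches addresses shaped like "101 Elm Rd, # 1 97132" or "101 Elm St, apt 1 97132".
ADDRESS = re.compile(
    r"^(?P<number>\d+)\s+(?P<name>\w+)\s+(?P<suffix>\w+),\s*"
    r"(?:#|apt)\s*(?P<unit>\w+)\s+(?P<zip>\d{5})$",
    re.IGNORECASE,
)

def parse(text):
    """Split an address string into upper-cased components."""
    match = ADDRESS.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized address format: {text!r}")
    return {field: value.upper() for field, value in match.groupdict().items()}

def agreement(first, second):
    """Report, field by field, whether two address strings agree."""
    a, b = parse(first), parse(second)
    return {field: a[field] == b[field] for field in a}

pair_1 = agreement("101 Elm Rd, # 1 97132", "101 Elm St, apt 1 97701")
pair_2 = agreement("101 Elm Rd, #1 97132", "101 Elm St, apt 1 97132")
```

Both pairs disagree on the street suffix, but only the first also disagrees on ZIP code. A decision rule still has to say whether either pattern of agreement is enough to call the pair a duplicate.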
Major Analytic Issues with StARS Processing, cont.
• Variations in data from different sources
– Of the 50% of SSN’s found on multiple files,
• about 1% have more than one gender recorded
• about 32% have multiple addresses
• about 2% have multiple races (Huang and Kim, 2000)
• “Imputation” from the NUMIDENT
– Many files have limited microdata. For those that are found on the NUMIDENT, we can “impute” microdata from the approximately equivalent NUMIDENT fields.
• Race Model (Bye, 1998, 1999)
• Gender Model (Thompson, 1999)
• Mortality Model (Falkenstein, Resnick, and Judson, 2000)
– StARS 2002 “NUMIDENT Race Enhancement”
• Match NUMIDENT to Census 2000
• Use Census 2000 race response to improve imputation model
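The two ideas on this slide, detecting conflicting values across source files and filling gaps from a lookup file, can be sketched as follows. The records, the identifiers, and the rule that the lookup value also settles conflicts are all assumptions for illustration; the actual race, gender, and mortality models are statistical models described in the cited memoranda.

```python
from collections import defaultdict

# Hypothetical person records: (identifier, source file, gender or None).
records = [
    ("001", "IRS 1099", None),
    ("001", "Medicare", "F"),
    ("002", "SSS", "M"),
    ("002", "Medicare", "F"),
    ("003", "IRS 1040", None),
]
# Hypothetical lookup file keyed by the same identifier.
numident = {"001": "F", "002": "M", "003": "M"}

# Collect the distinct non-missing values observed for each identifier.
observed = defaultdict(set)
for person_id, _source, gender in records:
    observed[person_id]  # create the key even when every value is missing
    if gender is not None:
        observed[person_id].add(gender)

# Identifiers with more than one distinct value across files.
conflicts = sorted(pid for pid, values in observed.items() if len(values) > 1)

# Keep a unanimous observed value; otherwise fall back to the lookup file
# (an assumed tie-breaking rule for both missing and conflicting cases).
resolved = {}
for pid, values in observed.items():
    if len(values) == 1:
        resolved[pid] = next(iter(values))
    else:
        resolved[pid] = numident.get(pid)
```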
Major Analytic Issues with StARS Processing, cont.
• Changing information states– Distinct problem from “point in time” data collection
– Information states change over time/over databases
• Address information ages over time and varies over databases
SAM SMITH
BOX 2 RURAL ROUTE 37
WESTPORT, VA 32784
(Dated 10/14/98 from Medicare)
Versus
SAM SMITH
486 MAIN STREET
FAIRFIELD, VA 33412
(From TY97 IRS file, filed sometime in 1998)
• Mortality information ages over time and varies over databases
• One database provides information about the other, provided that matching can be performed
• Data processing requires complex, and substantively important, decision logic at each step
Applications and Evaluations
Applications
• SSN search and validation with GEOkey
– Earlier: 90% found in validation step, 5% in search step
– 2001 Evaluation: 92% found in search (with GEOkey) alone
– Apparently, our computer search outperforms SSA manual system
• CPS/NHIS/ACS to Census matching evaluations
– Compare different race responses
– Compare survey and Census coverage
– Compare variations in Poverty estimates
• Evaluation of synthetic estimation methods (Popoff, Judson and Fadali, 2001)
• Multiple-system Estimation for coverage evaluation
– Additional information to aid dual-system estimation (Asher and Fienberg, 2001)
– Erroneous enumerations (Biemer, Brown, Wiesen, and Judson, 2001)
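For readers unfamiliar with dual-system estimation, the textbook two-list (Lincoln-Petersen) estimator is sketched below. It assumes the two lists are independent, matching is error-free, and neither list contains erroneous enumerations, which are exactly the assumptions the multiple-system work cited above tries to relax. The counts are made up, and this is not the estimator used in any Census evaluation.

```python
def dual_system_estimate(n_first, n_second, n_matched):
    """Lincoln-Petersen estimate of population size: n1 * n2 / m.

    n_first:   persons counted on the first list (e.g., a census)
    n_second:  persons counted on the second list (e.g., administrative records)
    n_matched: persons found on both lists
    """
    if n_matched <= 0:
        raise ValueError("at least one matched person is required")
    return n_first * n_second / n_matched

# Hypothetical counts for one small area.
estimate = dual_system_estimate(n_first=900, n_second=800, n_matched=720)

# Implied coverage rate of the first list under the same assumptions.
coverage_first = 900 / estimate
```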
Applications
• Nonresponse follow up (NRFU) substitution (’04 simulation test)
• Imputation methods improvement (’04 simulation test)
• Master Address File (MAF) targeting
• Census unduplication confirmation
• Population estimation (postcensal estimates)
• Survey improvement (noninterview adjustments)
Evaluations
• Numident/PCF 1998 versus 1998 National estimates (Miller, Judson and Sater, 2000)
• State level comparisons of StARS 2000 versus Census 2000
• County StARS-synthetic methods versus county ratio estimates and Census 2000
• Detailed comparison by (fully crossed) age, race, sex, and Hispanic origin counts versus Census 2000, at the county level
• AREX tract, block, household evaluations on February 19th
Numident/PCF 1998 versus 1998 National Estimates
[Figure: Population Distribution by Age. Vertical axis: percent of population (0% to 20%). Age groups: Under 10, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90+. Series: National estimates; PCF population BEFORE applying the mortality model; PCF population AFTER applying the mortality model.]
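Numident/PCF 1998 versus 1998 National Estimates
[Figure: Percent Distribution by race and Hispanic origin, National Estimates versus PCF File. Vertical axis: 0% to 100%. Data labels as shown:]
– White Non-Hispanic: National Estimates 72.4%, PCF File 69.2%
– Black Non-Hispanic: National Estimates 12.1%, PCF File 12.5%
– API Non-Hispanic: National Estimates 3.6%, PCF File 4.3%
– AI Non-Hispanic: National Estimates 0.7%, PCF File 0.8%
– White Hispanic: National Estimates 10.1%, PCF File 12.7%
– Black Hispanic: National Estimates 0.6%, PCF File 0.3%
– API Hispanic: National Estimates 0.2%, PCF File 0.1%
– AI Hispanic: National Estimates 0.1%, PCF File 0.1%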
State Level Comparisons of Census 2000 to StARS 2000
[Figure: Ratio Census 2000/StARS 2000 by state, for the 50 states, DC, and the U.S. total. Vertical axis: 0.95 to 1.13.]
Over the entire U.S., Census 2000 is about 6% higher than StARS 2000. Alaska is the only state where StARS 2000 exceeds Census 2000.
County StARS-synthetic Methods versus 1999 Estimates
[Figure: Comparison of 99 Estimates and StARS 99 Race/Sex Distribution (Three Counties in Colorado). Vertical axis: 0% to 50%. Categories: White male, White female, Black male, Black female, AIAN male, AIAN female, API male, API female. Series: 99 Estimates; StARS 99.]
County StARS-synthetic methods versus 1999 Estimates versus Census 2000
[Figure: % Hispanic (StARS 99 vs. 99 Estimates vs. Census 2000), selected counties in Colorado where StARS and Estimates deviate by more than 4 percentage points. Vertical axis: 0 to 90. Counties: Alamosa, Archuleta, Bent, Chaffee, Conejos, Costilla, Crowley, Fremont, Garfield, Huerfano, Kiowa, La Plata, Las Animas, Lincoln, Mineral, Morgan, Otero, Phillips, Pueblo, Saguache, San Juan. Series: StARS 99; Census 2000; 99 Estimates.]
Counties in which StARS 99 is closer to Census 2000 are marked with a star.
Fully crossed age, race, sex, and Hispanic Origin array (ARSH array)
• For every county in the U.S., count the number of nondeceased persons by:
– Single year of age (0 to 101+)
– Race (four groups)
– Sex (two groups)
– Hispanic origin (Hispanic/non)
– Potentially 102 x 4 x 2 x 2 = 1632 cells per county, 3141x1632 = 5,126,112 in the U.S.
• Error Measures:
– Simple difference (C-S)
– Algebraic percent error (S-C)/C
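The cell counts and the two error measures above can be written out directly. In this sketch C is the Census 2000 count and S is the StARS count for one county's ARSH cell; the example cell values are made up, and returning `None` for a cell with a zero Census count is an assumed convention, since the slide does not say how such cells are handled.

```python
# Dimensions of the fully crossed ARSH array, as given on the slide.
AGES, RACES, SEXES, HISPANIC_ORIGINS = 102, 4, 2, 2
COUNTIES = 3141

cells_per_county = AGES * RACES * SEXES * HISPANIC_ORIGINS
total_cells = COUNTIES * cells_per_county

def simple_difference(census, stars):
    """Simple difference: C - S."""
    return census - stars

def algebraic_percent_error(census, stars):
    """Algebraic percent error: (S - C) / C, or None when C is zero."""
    if census == 0:
        return None
    return (stars - census) / census

# Hypothetical cell: Census counts 200 persons, StARS counts 190.
difference = simple_difference(200, 190)
percent_error = algebraic_percent_error(200, 190)
```

Note the sign conventions differ: an undercount by StARS gives a positive simple difference but a negative algebraic percent error.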
Note: Each data point is a single county’s ARSH cell.
Age/Sex distributions, selected counties in Texas
[Figure: four panels, one per county, each plotting StARS % of total population and Census % of total population by age (0 to 90), with separate graphs by sex (F, M). Panels: Anderson County (N of Houston); Andrews County (Far west, NM border); Atascosa County (Southern part of state); Brazos County (W of Houston).]
Concluding Thoughts
• Historians of science will say that there was an “explosion” of research into Administrative Records and Data Warehousing in the late 20th/early 21st century
• Using these databases in a statistically-principled way requires a new statistical paradigm:
– Not survey sampling per se
– Not econometric modeling per se
– Not coverage measurement per se
– Something new
• These databases share some data quality issues with conventional survey or census data, but many of their issues are different
• We are attacking these issues with real Census applications
For Further Reading
• Alvey, W., and Scheuren, F. (1982). Background for an Administrative Records Census. Proceedings of the Social Statistics Section. Alexandria, VA: American Statistical Association.
• Asher, J., and Fienberg, S. (2001). Statistical Variations on an Administrative Records Census. Proceedings of the Social Statistics Section. Alexandria, VA: American Statistical Association.
• Biemer, P., Brown, G., Wiesen, C., and Judson, D.H. (2001). Triple system estimation in the presence of erroneous enumerations. Proceedings of the Social Statistics Section. Alexandria, VA: American Statistical Association. Under review at the Journal of Official Statistics.
• Bye, B. (1997). Administrative Record Census for 2010 Design Proposal, Final Report. Rockville, MD: Westat, Inc.
• Bye, B. (1998). Race and ethnicity modeling with SSA Numident Data: Interim report: File development and tabulations. Unpublished document available from the U.S. Bureau of the Census.
• Bryant, C. (1995). Comparing the LUCA address list to “local records.” Paper presented at the 1995 State Data Center Meeting, San Francisco, CA, April 4, 1995.
• Czajka, J., Moreno, L., and Schirm, A.L. (1997). On the Feasibility of Using Internal Revenue Service Records to Count the U.S. Population. Washington, DC: Mathematica Policy Research, Inc.
• Czajka, J. (1999). Can we count on administrative records in future U.S. Censuses? Presentation at the Bureau of the Census, December 15, 1999.
• Falkenstein, Matthew, Resnick, Dean R., and Judson, Dean H. (2000). The Mortality Module of the Statistical Administrative Records System. Administrative Records Memorandum Series, U.S. Census Bureau.
• Farber, Jim, and Shaw, Kevin M. (2002). Dual System Estimates of Housing Units Based on Administrative Records. To appear in the 2002 Proceedings of the American Statistical Association, Government Statistics Section [CD-ROM], Alexandria, VA: American Statistical Association.
• Heimovitz, Harley K. (2002). Administrative Records Experiment 2000: Outcomes. To appear in the 2002 Proceedings of the American Statistical Association, Government Statistics Section [CD-ROM], Alexandria, VA: American Statistical Association.
• Huang, E., and Kim, J. (2000). One Percent Sample Study Report (SRD-DRAFT). Unpublished document available from the U.S. Bureau of the Census, February 10, 2000.
For Further Reading
• Judson, D.H., and Popoff, C.L. (2000). Research Use of Administrative Records. University of Nevada: Nevada State Demographer’s Office.
• Judson, D.H. (2000). The Statistical Administrative Records System: System Design, Successes, and Challenges. Paper presented at the 2000 Data Quality Workshop, Morristown, NJ, Nov 30-Dec 1.
• Judson, D.H., Popoff, Carole L., and Batutis, Michael (2001). An Evaluation of the Accuracy of U.S. Census Bureau County Population Estimation Methods. Statistics in Transition, 5:185-215.
• Judson, D.H. (2001). A Partial Order Approach to Record Linkage. Paper presented at the Federal Committee on Statistical Methodology, Washington, DC, November 14, 2001.
• Judson, D.H. (2002). Adventures in Bayesian Record Linkage. Paper presented at the Classification Society of North America, June 11, 2002.
• Judson, Dean H. (2002). Merging Administrative Records Databases in the Absence of a Register: Data Quality Concerns and Outcomes of an Experiment in Administrative Records Use. Paper presented at the UNECE-EUROSTAT work session on registers and administrative records in social and demographic statistics, Geneva, Switzerland, 9-11 December 2002.
• Kim, M. O., and Sater, D. (2000). Defining the Medicare Data Universe for the U.S. Census Bureau's Population Estimates Program. Paper presented at the Southern Demographic Association meetings, New Orleans, LA, August 29, 2000.
• Leggieri, Charlene, and Prevost, Ron (1999). Expansion Of Administrative Records Uses At The Census Bureau: A Long-Range Research Plan. Paper presented at the November 1999 Meeting of the Federal Committee on Statistical Methodology, Washington D.C.
• Miller, E., Judson, D.H., and Sater, D. (2000). The 100% Census NUMIDENT: Demographic Analysis of Modeled Race and Hispanic Origin Estimates Based Exclusively on Administrative Records Data. Paper presented at the Southern Demographic Association meetings, New Orleans, LA, August 29, 2000.
• Popoff, C.L., Judson, D.H., and Fadali, Betsy (2001). Measuring the Number of People Without Health Insurance: A Test of a Synthetic Estimates Approach for Small Area Estimates using SIPP Microdata. Paper presented at the Federal Committee on Statistical Methodology, Washington, DC, November 14, 2001.
For Further Reading
• Sailer, P., Weber, M., and Yau, E. (1993). How Well Can IRS Count the Population? 1993 Proceedings of the Survey Research Methods Section. Alexandria, VA: American Statistical Association.
• Sater, D. (1995). Differences in Location of Households and Tax Filing Units. Paper presented at the 1995 meeting of the Population Association of America, San Francisco, CA, April 6, 1995.
• Stuart, E. and Zaslavsky, A.M. (2002). Using administrative records to predict census day residency. In Constantine Gatsonis, Robert E. Kass, Alicia Carriquiry, Andrew Gelman, David Higdon, Donna K. Pauler, Isabella Verdinelli (Eds.), Case Studies in Bayesian Statistics Volume VI. New York, NY: Springer.
• Thompson, Herbert (1999). The Development of a Gender Model with SSA Numident Data. Administrative Records Research Memorandum Series #32, U.S. Census Bureau.
• Wand, Y., and Wang, R. Y. (1996). Anchoring data quality dimensions in ontological foundations. Communications of the ACM, 39: 86-95.
• Zanutto, Elaine, and Zaslavsky, Alan M. (2001). Using Administrative Records to Impute for Nonresponse. In R. Groves, R.J.A. Little, and J.Eltinge (Eds), Survey Nonresponse. New York: John Wiley.
Glossary of Terms
• Administrative records: Data collected wherein the primary purpose is to administer a regulation or record a transaction rather than data collection per se.
• Administrative Records Census: A Census of Population and Housing in which a predominant component of the census-taking is performed by using administrative records databases. In practice, field operations (for example, for coverage measurement or for Group Quarters enumeration) often coincide.
• AREX2000: Administrative Records Experiment in 2000, an experimental attempt to simulate an “Administrative Records Census” in two sites in the U.S.
• Basic Street Address: The primary street number and street name, omitting apartment numbers or other within-structure identifiers.
• CPS: Current Population Survey, an ongoing survey administered by the U.S. Census Bureau.
• Data Quality: The ability to construct a mapping from the ontological representation of a data item in a database to its appropriate ontological representation in the “real world.”
• Master Address File (MAF): A file of addresses maintained by the U.S. Census Bureau for the purpose of taking its decennial census, and acting as a frame for ongoing sample surveys. The Decennial Master Address File is referred to as the “DMAF.”
• Master Housing File: A file of addresses developed by the Statistical Administrative Records System.
• Microdata: Data on individual person or housing characteristics, e.g., race, sex, age, street address, zip code.
• Ontology: The study of “what is”, that is, the categories by which we understand the world.
• StARS: Statistical Administrative Records System, an experimental database that combines information from several major Federal databases into one database that can be used for census-taking purposes.