2008 Summer North American Stata Users Group meeting Chicago, 24-25 July 2008 Using SPSS files in Stata Sergiy Radyakin The World Bank
Page 1: Radyakin usespss

2008 Summer North American Stata Users Group meeting
Chicago, 24-25 July 2008

Using SPSS files in Stata

Sergiy Radyakin
The World Bank

Page 2: Radyakin usespss

How to get the data in?

• Use Stata to manipulate the data and read it in

• Use the data producing application to export the data in the proper format that Stata can later import

• Use specialized conversion software to convert to a proper format

• Use another statistical package that supports both formats to make it convert the dataset

• Write your own conversion program in/as:

– Stata (slow, portable)

– Mata (faster, portable)

– Plugin (very fast, not portable, dependent on Stata’s bit-width)

– Standalone (very fast, not portable, independent of Stata’s bit-width)

Page 3: Radyakin usespss

Which data formats does Stata support (as of v10)?

• Stata native formats (use)

• ASCII data, with or without dictionaries (insheet/infile/infix)

• SAS XPORT format (fdause)

• Data import via ODBC, provided that a required driver is installed and configured

• But, no SPSS support

Page 4: Radyakin usespss

When SPSS is available:

• SPSS v14 and later supports exporting data to Stata format

• SPSS_to_Stata_00.sbs script by Alasdair Crockett is available for earlier releases, requires both SPSS and Stata for conversion

Data Services Guides: SPSS_to_Stata Conversion Utility Guide

http://www.data-archive.ac.uk/support/conversionguide.pdf

• This can be automated with an .ado wrapper similar to USESAS by Dan Blanchette, which requires SAS to be installed to import data to Stata

• These are not “true readers”, since they require SPSS or SAS to be installed (with license costs, etc.)

Page 5: Radyakin usespss

Specialized Conversion Software

• Stat/Transfer

– http://www.stattransfer.com/

– $295 (New unit, Windows)

• DBMS/Copy

– http://www.dataflux.com/Product-Services/Products/dbms.asp

– $495 (New individual, Windows)

• Both support command line parameters for batch-mode conversion and thus can be “wrapped” for use with Stata, see e.g. STCMD by Roger Newson

(prices as of July 13, 2008)

Page 6: Radyakin usespss

USESPSS

• USESPSS is a new command for Stata to read in SPSS data (*.sav files)

• It is a “true reader”: it does not require any other software (other than the Windows operating system)

• Free

• Implemented as a plugin, with portions of code (e.g. file decompression) written in assembler for performance optimization

• Note: documentation of the SPSS file format has not been released, and only fragmentary information is available on the Internet

Page 7: Radyakin usespss

USESPSS Features

• Reads *.sav files originating from both Windows and UNIX versions of SPSS (LoHi and HiLo byte orders)

• Supports compressed and non-compressed SPSS files

• Preserves variable and value labels

• Optimizes data storage types (2-pass)

• Supports long variable names

• Automatically renames variables whose names are not allowed in Stata and resolves naming collisions

• Preserves number of decimals in numeric formats.

• Transfers date/time variables, but does not apply date/time formats to them
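
The renaming feature in the list above can be illustrated with a minimal Python sketch. It is not the algorithm usespss actually uses, which the slides do not describe. It assumes only the standard Stata naming rules (letters, digits and underscores, no leading digit, at most 32 characters); the replacement character, the numeric-suffix scheme and the function names `to_stata_name` and `rename_all` are hypothetical. Stata reserved words are not handled.

```python
import re

MAX_LEN = 32  # Stata variable names are limited to 32 characters


def to_stata_name(name):
    """Map an arbitrary name to a legal Stata name (hypothetical rules)."""
    # Replace every character that is not a letter, digit or underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # A Stata name may not be empty or start with a digit.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned[:MAX_LEN]


def rename_all(names):
    """Return legal, unique names in the original order."""
    used = set()
    result = []
    for name in names:
        base = to_stata_name(name)
        candidate = base
        counter = 1
        while candidate in used:
            # Collision: append a numeric suffix, staying within MAX_LEN.
            suffix = f"_{counter}"
            candidate = base[: MAX_LEN - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        result.append(candidate)
    return result


print(rename_all(["AGE", "WAGE$", "WAGE#", "2008INC"]))
```

Here `WAGE$` and `WAGE#` both map to `WAGE_`, so the second one receives a suffix, and `2008INC` gains a leading underscore.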

Page 8: Radyakin usespss

USESPSS Syntax

usespss can be used like any other command: on the command line, in users’ .do files, and in .ado programs:

usespss [using] "filename.sav" [, clear saving("filename.dta") iff(condition) inn(condition) memory(memsize) lowmemory(memsize)]

Page 9: Radyakin usespss

Memory Tradeoff

• Stata and plugins share the same address space

• As a consequence, plugins can read Stata’s data directly (if they know where it is located) and call Stata’s subroutines (if exposed).

• However, the more memory is allocated for Stata data, the less memory is available to the plugins, because the size of the address space is limited (typically 2GB on a 32-bit Windows system). In other words, plugins compete for memory between themselves and with Stata.

Page 10: Radyakin usespss

Memory Tradeoff

• Like Stata, usespss attempts to load the whole data file into memory; this speeds up the 2-pass processing (1st pass: optimization of the storage types, 2nd pass: actual conversion)

• However, when the user loads SPSS data, any data already in Stata is discarded anyway, so usespss.ado can temporarily reduce Stata’s memory allocation

• It is important to do this when working with large files, otherwise the plugin will not be able to allocate enough memory to load the SPSS data file.
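
The 2-pass processing mentioned above can be sketched in Python as follows. This is an illustration under stated assumptions, not the plugin's code: the integer ranges are Stata's documented limits for byte, int and long, while the float-versus-double rule (an exact round trip through 4-byte single precision) and the names `choose_type`, `fits_float` and `convert` are hypothetical. Missing values are represented by `None`, and pass 2 only tags each column with its chosen type instead of writing Stata data.

```python
import struct

# Valid (non-missing) integer ranges of Stata's numeric storage types.
INT_TYPES = [
    ("byte", -127, 100),
    ("int", -32767, 32740),
    ("long", -2147483647, 2147483620),
]


def fits_float(x):
    """True if x survives a round trip through 4-byte single precision."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0] == x
    except OverflowError:
        return False  # magnitude too large for single precision


def choose_type(values):
    """Pass 1: scan one column and pick the narrowest type that holds it."""
    present = [v for v in values if v is not None]  # None marks a missing value
    if all(float(v).is_integer() for v in present):
        lo = min(present, default=0)
        hi = max(present, default=0)
        for name, tmin, tmax in INT_TYPES:
            if tmin <= lo and hi <= tmax:
                return name
    if all(fits_float(float(v)) for v in present):
        return "float"
    return "double"


def convert(columns):
    """Pass 2: attach the type chosen in pass 1 to every column."""
    types = {name: choose_type(vals) for name, vals in columns.items()}
    return {name: (types[name], vals) for name, vals in columns.items()}


data = {"GENDER": [0.0, 1.0, None], "B_YEAR": [1950.0, 1987.0],
        "WAGE": [12.5, 7.25], "RATE": [0.1]}
for name, (stype, _) in convert(data).items():
    print(name, stype)
```

SPSS stores numeric values as 8-byte doubles, so a first pass over the data is what allows a column such as `GENDER` to shrink to a single byte per observation.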

Page 11: Radyakin usespss

Memory Use

[Figure: memory use over time within the address-space limit (e.g. 2GB), showing Stata code, Stata data, plugin code, plugin data and free memory. When usespss.ado starts, any dataset in Stata’s memory is cleared and Stata memory is temporarily set to a low value (10m). When usespss.ado ends, Stata memory is set to a higher value (800m).]

Consider the following code:

set mem 800m
usespss using "mydata.sav", lowmemory(10) memory(800)
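
The effect of lowering Stata's memory can be checked with simple arithmetic. The Python sketch below uses the 2GB example limit and the 800m and 10m settings from this slide. The function name `plugin_headroom_mb` and the `overhead_mb` parameter (an allowance for Stata code, plugin code and the operating system) are hypothetical, and the results are upper bounds, not measurements.

```python
ADDRESS_SPACE_MB = 2048  # the slide's example limit (2GB on 32-bit Windows)


def plugin_headroom_mb(stata_mem_mb, overhead_mb=0):
    """Upper bound on memory left for the plugin, in megabytes.

    overhead_mb is a hypothetical allowance for Stata code, plugin code and
    the operating system; it is not a figure taken from the slides.
    """
    return max(ADDRESS_SPACE_MB - stata_mem_mb - overhead_mb, 0)


# Stata keeps its 800m allocation while the plugin loads the file.
without_lowmemory = plugin_headroom_mb(800)
# usespss.ado first shrinks Stata's allocation to 10m, as in lowmemory(10).
with_lowmemory = plugin_headroom_mb(10)

print(without_lowmemory, with_lowmemory)  # 1248 2038
```

Under these assumptions the plugin can use at most about 1.2GB if Stata keeps its 800m allocation, and close to the full 2GB once the allocation is temporarily reduced.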

Page 12: Radyakin usespss

DESSPSS

• desspss is a new Stata command to describe the contents of an SPSS system *.sav file

• does not destroy data in the memory

• works much faster than

usespss using filename.sav, saving(filename.dta)
describe

because no optimization/conversion is actually performed, but does not list the variable types (these are determined after optimization)

• saves all descriptive information in r()

Page 13: Radyakin usespss

DESSPSS Example Report

. desspss using artificial.sav

DESSPSS Report
==============
SPSS System file: artificial.sav
Created (date): 17-Jul- 8
Created (time): 22: 4: 0
SPSS product: SPSS-X SYSTEM FILE. SPSS 5.0 MS/Windows made by DBMS/COPY
File label (if present):
File size (as stored on disk): 382692 bytes
Data size: 381432 bytes
Data stored in compressed format
This file is likely to originate from a Windows platform (LoHi byte order)

Number of cases (observations): 10000
Number of variables: 10
Case size: 88 bytes
----------------------------------------------------------------------

Variables:

GENDER MARRIED B_YEAR W_HOURS CITY_COD
AGE EMP_STAT WAGE FULLTIME CITY_NAM
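
The byte-order line in the report suggests how a reader can tell LoHi from HiLo files. The Python sketch below shows one plausible check; it is not taken from usespss, whose method the slides do not describe. It assumes the header layout given in publicly available, reverse-engineered descriptions of the format (such as the PSPP developer documentation): a 4-byte `$FL2` signature, a 60-byte product name, and then a 32-bit layout code that is normally 2 (occasionally 3). The function name `detect_byte_order` is hypothetical.

```python
import struct

LAYOUT_OFFSET = 64  # assumed: 4-byte "$FL2" signature + 60-byte product name


def detect_byte_order(header):
    """Return 'LoHi' or 'HiLo' given the first 68 or more bytes of a .sav file."""
    if len(header) < LAYOUT_OFFSET + 4 or header[:4] != b"$FL2":
        raise ValueError("not an SPSS system file header")
    raw = header[LAYOUT_OFFSET:LAYOUT_OFFSET + 4]
    # The layout code is a small integer, so it reads sensibly in one order only.
    if struct.unpack("<i", raw)[0] in (2, 3):
        return "LoHi"  # little-endian, e.g. written on Windows
    if struct.unpack(">i", raw)[0] in (2, 3):
        return "HiLo"  # big-endian
    raise ValueError("unrecognized layout code")


# Synthetic header for illustration only (not read from a real file).
fake = b"$FL2" + b"SPSS-X SYSTEM FILE.".ljust(60) + struct.pack("<i", 2)
print(detect_byte_order(fake))
```

Once the byte order is known, every other integer and floating-point field in the file can be decoded with the matching `struct` prefix (`<` or `>`).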

Page 14: Radyakin usespss

• Embedded artificially created dataset in SPSS format. Clicking on the icon opens the SPSS file in Stata if:

1. usespss is installed in Stata, and

2. a file association was set:

--------------------- beginning of sav_file.reg --------------------
Windows Registry Editor Version 5.00
[HKEY_CLASSES_ROOT\.sav]
@="sav_auto_file"
[HKEY_CLASSES_ROOT\sav_auto_file]
@="SPSS Dataset"
[HKEY_CLASSES_ROOT\sav_auto_file\shell]
[HKEY_CLASSES_ROOT\sav_auto_file\shell\open]
[HKEY_CLASSES_ROOT\sav_auto_file\shell\open\command]
@="\"C:\\Stata10se\\wsestata.exe\" usespss \"%1\""
---------------------- end of sav_file.reg -------------------------

(Substitute C:\Stata10se\wsestata.exe with the full name of Stata’s executable.)

Demonstration: artificial.sav

• Questions?