Using SPSS files in Stata
Sergiy Radyakin, The World Bank
2008 Summer North American Stata Users Group meeting, Chicago, 24-25 July 2008


Apr 02, 2015



How to get the data in?

• Use Stata to manipulate the data and read it in

• Use the data-producing application to export the data in a format that Stata can later import

• Use specialized conversion software to convert the data to a suitable format

• Use another statistical package that supports both formats to convert the dataset

• Write your own conversion program in/as:

– Stata (slow, portable)

– Mata (faster, portable)

– Plugin (very fast, not portable, dependent on Stata's bit width)

– Standalone program (very fast, not portable, independent of Stata's bit width)


Which data formats does Stata support (as of v10)?

• Stata native formats (use)

• ASCII data, delimited or fixed-format with dictionaries (insheet/infix)

• SAS XPORT format (fdause)

• Data import via ODBC, provided that the required driver is installed and configured

• But there is no SPSS support


When SPSS is available:

• SPSS v14 and later can export data to Stata format

• For earlier releases, the SPSS_to_Stata_00.sbs script by Alasdair Crockett is available; it requires both SPSS and Stata for the conversion

Data Services Guides: SPSS_to_Stata Conversion Utility Guide

http://www.data-archive.ac.uk/support/conversionguide.pdf

• This can be automated with an .ado wrapper similar to USESAS by Dan Blanchette, which imports data into Stata but requires SAS to be installed

• These are not "true readers", since they require SPSS or SAS to be installed (with the associated license costs, etc.)


Specialized Conversion Software

• Stat/Transfer
– http://www.stattransfer.com/
– $295 (new unit, Windows)

• DBMS/Copy
– http://www.dataflux.com/Product-Services/Products/dbms.asp
– $495 (new individual, Windows)

• Both accept command-line parameters for batch-mode conversion and can therefore be "wrapped" for use from Stata; see, e.g., STCMD by Roger Newson

(prices as of July 13, 2008)


USESPSS

• USESPSS is a new Stata command to read SPSS data (*.sav files)

• It is a "true reader": it requires no other software (apart from the Windows operating system)

• Free

• Implemented as a plugin, with portions of the code (e.g., file decompression) written in assembler for performance

• Note: the SPSS file format is not publicly documented, and only fragmentary information is available on the Internet
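The compression that usespss decompresses in assembler is, according to publicly available reverse-engineered descriptions of the .sav format (for example the GNU PSPP documentation), a simple bytecode scheme: case data are stored as blocks of eight one-byte command codes, each block followed by the uncompressed 8-byte values it refers to. The Python sketch below expands such a stream. It is not usespss's code; the names `decompress_bytecode` and `as_number` are hypothetical, and it assumes the usual compression bias of 100 and that the caller has already located the start of the case data.

```python
import struct
from typing import List, Optional, Union

# Sketch based on the reverse-engineered .sav layout, not on usespss itself.
# Hypothetical representation: compressed numbers as float, system-missing as
# None, and any 8-byte element that needs the variable dictionary to
# interpret (a literal number or a string segment) as raw bytes.
Element = Optional[Union[float, bytes]]


def decompress_bytecode(data: bytes, bias: float = 100.0) -> List[Element]:
    """Expand SPSS bytecode-compressed case data into 8-byte elements."""
    out: List[Element] = []
    pos = 0
    while pos + 8 <= len(data):
        commands = data[pos:pos + 8]  # a block of 8 one-byte command codes
        pos += 8                      # literal data (if any) follows the block
        for code in commands:
            if code == 0:
                continue                       # padding, produces nothing
            if code <= 251:
                out.append(code - bias)        # small integer packed in one byte
            elif code == 252:
                return out                     # end of data
            elif code == 253:
                out.append(data[pos:pos + 8])  # next 8 bytes stored uncompressed
                pos += 8
            elif code == 254:
                out.append(b" " * 8)           # eight blanks (string segment)
            else:                              # code == 255
                out.append(None)               # system-missing value
    return out


def as_number(raw: bytes, byteorder: str = "<") -> float:
    """Interpret an uncompressed 8-byte element as an IEEE double."""
    return struct.unpack(byteorder + "d", raw)[0]
```

The output is a flat list of 8-byte elements; mapping them to cases and variables, and telling numeric literals apart from string segments, requires the variable dictionary stored earlier in the file.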


USESPSS Features

• Reads *.sav files originating from both Windows and UNIX versions of SPSS (LoHi and HiLo byte orders)

• Supports compressed and uncompressed SPSS files

• Preserves variable and value labels

• Optimizes data storage types (2-pass)

• Supports long variable names

• Automatically renames invalid variable names and resolves naming collisions

• Preserves the number of decimals in numeric formats

• Transfers date/time variables, but does not apply date/time display formats to them
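Stata 10 variable names are at most 32 characters of letters, digits and underscores, may not start with a digit, and may not be one of Stata's reserved words, whereas SPSS names can be longer and can contain characters such as @, #, $ and ".". The slides do not say which renaming rule usespss applies, so the Python sketch below is a hypothetical scheme (functions `clean_name` and `stata_names` are illustrative names): invalid characters become underscores, and collisions, including collisions with reserved words, are resolved with a numeric suffix.

```python
import re
from typing import Iterable, List, Set

# Hypothetical renaming scheme for illustration; not usespss's actual rule.
MAX_LEN = 32  # maximum length of a Stata (v10) variable name
# Names Stata reserves and will not accept as variable names.
RESERVED = {"_all", "_b", "byte", "_coef", "_cons", "double", "float", "if",
            "in", "int", "long", "_n", "_N", "_pi", "_pred", "_rc", "_skip",
            "using", "with"}


def clean_name(name: str) -> str:
    """Map one SPSS name to a syntactically valid Stata name."""
    s = re.sub(r"[^A-Za-z0-9_]", "_", name)  # e.g. '.', '@', '#', '$', spaces
    if not s:
        s = "v"
    if s[0].isdigit():
        s = "_" + s                          # names may not start with a digit
    return s[:MAX_LEN]


def stata_names(names: Iterable[str]) -> List[str]:
    """Clean all names, then resolve collisions with a numeric suffix."""
    used: Set[str] = set()
    result: List[str] = []
    for original in names:
        base = clean_name(original)
        candidate, k = base, 1
        while candidate in used or candidate in RESERVED:
            suffix = f"_{k}"
            candidate = base[:MAX_LEN - len(suffix)] + suffix  # stay within 32
            k += 1
        used.add(candidate)
        result.append(candidate)
    return result
```

For example, `stata_names(["Q1.a", "my#var", "my_var"])` gives `["Q1_a", "my_var", "my_var_1"]`. Stata names are case-sensitive, so `AGE` and `age` are not treated as a collision.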


USESPSS Syntax

usespss can be used like any other command: from the command line, in do-files, and in .ado programs:

usespss [using] "filename.sav" [, clear saving("filename.dta") iff(condition) inn(condition) memory(memsize) lowmemory(memsize)]


Memory Tradeoff

• Stata and plugins share the same address space

• As a consequence, plugins can read Stata’s data directly (if they know where it is located) and call Stata’s subroutines (if exposed).

• However, the more memory is allocated to Stata's data, the less memory is available to plugins, because the address space is limited (typically to 2 GB on a 32-bit Windows system). In other words, plugins compete for memory with each other and with Stata.


Memory Tradeoff

• Like Stata, usespss attempts to load the whole data file into memory; this speeds up the two-pass processing (first pass: optimization of storage types; second pass: actual conversion)

• But when the user loads SPSS data, any data already in Stata's memory is discarded anyway, so usespss.ado can temporarily decrease Stata's memory allocation

• This is important when working with large files; otherwise the plugin will not be able to allocate enough memory to load the SPSS data file
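The first pass can be pictured as a scan of each numeric column that records its range and whether all values are integers, then picks the smallest Stata storage type that holds every value without loss. The Python sketch below is a hypothetical version of that rule (usespss's exact criteria are not given in the slides; `storage_type` and `fits_float32` are illustrative names). It uses the ranges Stata documents for byte, int and long, and treats SPSS system-missing values, represented here as None, as fitting any type.

```python
import struct
from typing import Iterable, Optional

# Hypothetical first-pass rule for illustration; not usespss's actual code.
# Ranges of non-missing values for Stata's integer storage types.
INT_TYPES = [("byte", -127, 100),
             ("int", -32767, 32740),
             ("long", -2147483647, 2147483620)]


def fits_float32(x: float) -> bool:
    """True if x survives a round trip through a 4-byte float unchanged."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0] == x
    except OverflowError:                    # too large for a 4-byte float
        return False


def storage_type(values: Iterable[Optional[float]]) -> str:
    """First pass over one numeric column: pick the smallest lossless type."""
    lo: Optional[float] = None
    hi: Optional[float] = None
    all_int = all_f32 = True
    for v in values:
        if v is None:                        # system-missing becomes Stata's '.'
            continue
        lo = v if lo is None else min(lo, v)
        hi = v if hi is None else max(hi, v)
        all_int = all_int and float(v).is_integer()
        all_f32 = all_f32 and fits_float32(v)
    if lo is None:                           # column is entirely missing
        return "byte"
    if all_int:
        for name, tmin, tmax in INT_TYPES:
            if tmin <= lo and hi <= tmax:
                return name
    return "float" if all_f32 else "double"
```

For example, `storage_type([1, 2, None, 100])` returns `"byte"`, while `storage_type([0.1])` returns `"double"` because 0.1 is not exactly representable as a 4-byte float. The second pass would then write each value into a column of the chosen type.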


Consider the following code:

set mem 800m
usespss using "mydata.sav", lowmemory(10) memory(800)

[Figure: memory use over time, against an address-space limit of e.g. 2 GB. When usespss.ado starts, any dataset in Stata's memory is cleared and Stata's memory is temporarily set to a low value (10m), leaving free memory for the plugin code and plugin data; when usespss.ado ends, Stata's memory is set back to a higher value (800m).]


DESSPSS

• desspss is a new Stata command to describe the contents of an SPSS system (*.sav) file

• It does not destroy the data in memory

• It works much faster than usespss using filename.sav, saving(filename.dta) followed by describe, because no optimization or conversion is performed; however, it does not list variable types (these are determined only during optimization)

• It saves all descriptive information in r()


DESSPSS Example Report

. desspss using artificial.sav

DESSPSS Report
==============
SPSS System file: artificial.sav
Created (date): 17-Jul- 8
Created (time): 22: 4: 0
SPSS product: SPSS-X SYSTEM FILE. SPSS 5.0 MS/Windows made by DBMS/COPY
File label (if present):
File size (as stored on disk): 382692 bytes
Data size: 381432 bytes
Data stored in compressed format
This file is likely to originate from a Windows platform (LoHi byte order)

Number of cases (observations): 10000
Number of variables: 10
Case size: 88 bytes
----------------------------------------------------------------------

Variables:

GENDER MARRIED B_YEAR W_HOURS CITY_COD AGE EMP_STAT WAGE FULLTIME CITY_NAM
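Most of the report above can be produced without touching the case data, because it sits in the fixed 176-byte header record at the start of every .sav file. The Python sketch below reads that header following the layout described in the GNU PSPP documentation of the system-file format; it is not desspss's own code, and the names `read_header` and `SavHeader` are hypothetical. It also shows one way to make the LoHi/HiLo distinction: the layout code at offset 64 is normally 2 (occasionally 3), so the byte order that decodes it as 2 or 3 is taken to be the file's byte order.

```python
import struct
from dataclasses import dataclass

# Sketch based on the reverse-engineered .sav header layout; not desspss code.
HEADER_SIZE = 176  # the fixed-size file header record


@dataclass
class SavHeader:
    byte_order: str         # "<" = little-endian (LoHi), ">" = big-endian (HiLo)
    product: str            # product string written by the creating software
    nominal_case_size: int  # 8-byte elements per case (-1 if not recorded)
    compression: int        # 0 = none, 1 = bytecode, 2 = zlib (newer files)
    ncases: int             # -1 if the writer did not record it
    bias: float             # compression bias, normally 100.0
    creation_date: str
    creation_time: str
    file_label: str


def read_header(buf: bytes) -> SavHeader:
    if len(buf) < HEADER_SIZE or buf[:4] not in (b"$FL2", b"$FL3"):
        raise ValueError("not an SPSS system file")
    # layout_code (offset 64) is normally 2, occasionally 3; try both orders.
    for order in ("<", ">"):
        (layout,) = struct.unpack_from(order + "i", buf, 64)
        if layout in (2, 3):
            break
    else:
        raise ValueError("cannot determine byte order")
    case_size, compression, _weight, ncases = struct.unpack_from(order + "4i", buf, 68)
    (bias,) = struct.unpack_from(order + "d", buf, 84)

    def text(start: int, end: int) -> str:
        # fixed-width text fields are padded with blanks (or NULs)
        return buf[start:end].decode("latin-1").rstrip(" \x00")

    return SavHeader(order, text(4, 64), case_size, compression, ncases, bias,
                     text(92, 101), text(101, 109), text(109, 173))
```

Under this layout, the reported case size of 88 bytes corresponds to a nominal case size of 11 eight-byte elements (88 / 8 = 11).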


• Embedded artificially created dataset in SPSS format: clicking the icon opens the SPSS file in Stata if (1) usespss is installed in Stata and (2) the file association has been set:

--------------------- beginning of sav_file.reg --------------------
Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\.sav]
@="sav_auto_file"

[HKEY_CLASSES_ROOT\sav_auto_file]
@="SPSS Dataset"

[HKEY_CLASSES_ROOT\sav_auto_file\shell]

[HKEY_CLASSES_ROOT\sav_auto_file\shell\open]

[HKEY_CLASSES_ROOT\sav_auto_file\shell\open\command]
@="\"C:\\Stata10se\\wsestata.exe\" usespss \"%1\""
---------------------- end of sav_file.reg -------------------------

Substitute the full path of your Stata executable for C:\Stata10se\wsestata.exe (with doubled backslashes inside the .reg file)

Demonstration: artificial.sav

• Questions?