• This course will provide practical solutions to problems that arise before doing analyses as well as the final push toward getting the results.
• I will talk about issues like finding unruly data, massaging data into a useful format, building datasets of valid data and choosing statistics.
Administrivia
Getting Help
• Mike Hurley [email protected] is the TA for the course.
• His office hours will be announced weekly.
• I will be available for online Q&A at [email protected] or, preferably, on the class newsgroup. I will answer questions every morning around dawn. If you post to the newsgroup and do not hear back quickly, please email me.
• Things labeled “Assignment”, but not “Homework”, can be done with the help of classmates.
• You are strongly encouraged to discuss your problems up until you start writing your answers to the homework problems.
• I assume you know how to use Windows or Mac OS.
• For this class you need access to a machine with:
– Windows XP Pro or Vista Business/Ultimate
– Windows 7 Professional/Business/Ultimate.
• XP Home Edition and Vista Home Edition will not work. Windows 7 Home Premium may work with the software in this class.
• I use: XP Pro, 7 Pro, and XP Pro running in Parallels on the Mac.
Administrivia
Getting a Computer
• If you want to get a new computer, you can get one at a very good price through Stanford. You can get ideas on what is an acceptable computer here:
afs.stanford.edu is the easy way to move files to your UNIX space.
Use Your Website
Use AFS and your Website
Mount your drive; then you can put stuff in the WWW folder!
Install OpenAFS
My UNIX Space
Stanford Software
After AFS is Installed
Stanford Software
SecureFX
Stanford Software
Secure AFS
• You can make a space that can hold PHI and be shared by anybody with a SUNet ID.
1. Set up the workgroup that will serve as your access control list: http://workgroup.stanford.edu
2. Request the Secure AFS space:
• The biggest weaknesses in computer security are the legal users of the system.
– Walking away from a terminal
– Using passwords that are easy to crack
– Taking data off of restricted machines
– Viruses and Trojan horses will kill you if you let them!
Security
Email
• Email provides all the confidentiality of a postcard.
• If you are sending HIPAA sensitive information you can secure your email:
• You may get unsolicited commercial solicitations, advertisements, chain letters, or pornography through your Stanford email account.
– NEVER respond to these messages, never use the REMOVE provided in the email.
– NEVER put your email address on a web page.
Security
• At webmail.stanford.edu you can choose the Preferences tab and Filters from the left to automatically sack repeat offenders.
• Each year, on average, one student in five loses all their work. Plan on your computer being destroyed at the worst possible time this year.
– Coffee, computer worm or virus, small child with refrigerator magnet, physical hard drive failure, theft, bicycle crash, etc.
• Every day back up your work to more than one location.
Security
Where to Backup
• PLEASE use removable media if you have no network access
– Floppy disk, CD, DVD, flash media
• NEVER back up or share confidential data (HIPAA sensitive protected health information) on mobile media without talking to security experts first.
• At home I use www.crashplan.com. Ask your Tech support person for recommendations.
Rcmdr is a friendly, but incomplete, graphical user interface (GUI) for R.
Other Software
Getting SAS
• If you have a machine with XP, Vista or Windows 7 Pro, Business or Ultimate and more than 30 Gig of extra hard drive space you can get SAS for $65 per year. Place the order here: https://itservices.stanford.edu/service/softwarelic/sas
– There is a digital download that is HUGE (15+ Gig, not Meg). If you have a wired connection on campus, use it.
• The instructions for installing it can be found here: http://www.stanford.edu/class/hrp223/2012/InstallingSAS93_20120702.pptx
• Stuff that …
– will make you famous or cry
– you want to pull from the electronic medical record
– the information you will need to store if it is not in the medical record
Data
Structured vs. Unstructured
• Unstructured data
– Text like dictations, operation notes, data entry comments
– Difficult to process
• Structured data
– Affords the ability to build ontologies
– Dates
– Pick lists (multiple choice)
– Relatively easy to process
Data
Structuring Biomedical Data
• RxNORM for drug ingredients / brand names
• ICD-9 for billing diagnostic and procedure codes
– fairly coarse but nicely hierarchical
• ICD-O for detailed cancer pathology
• CPT for procedures
– No hierarchical structure, difficult to search
• SNOMED-CT for general purpose clinical terms
– Hierarchical, detailed and vast but with some gaps
Data
What is structured data?
• All pieces of information that you collect and calculate as part of a study are data. Every person’s response to a questionnaire is called a data point.
• There are two fundamentally different types of data: numeric and character.
– Numeric data is always … numeric. Information that you could want to do math on is numeric data.
– Character data is alphanumeric. It includes the obvious things like names and addresses, but it also includes numbers that you should not do math on.
• Some systems, like R, make finer distinctions and let you set data so they are forced to be factors.
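The numeric vs. character distinction can be sketched in SAS as follows. The variable names and values are hypothetical, chosen only for illustration: a ZIP code is made of digits, but you would never average it, so it is stored as character.

```sas
/* Hypothetical values, not from any course dataset. */
data _null_;
   length zip $5;          /* character: digits we should not do math on */
   zip = "02139";          /* the leading zero survives as text          */
   zipAsNumber = 02139;    /* numeric: the leading zero is lost          */
   ageInYears = 42;        /* numeric: averaging ages makes sense        */
   putlog zip= zipAsNumber= ageInYears=;
run;
```

In the log, the character version should keep its leading zero while the numeric version does not.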
Data
What is data coding?
• A question such as, “What is your current age in years?” is going to generate numeric data.
• A question such as, “At what age did you first contract a sexually transmitted disease?” is going to generate numeric data ….
But you are going to need to allow for the possibility that somebody has never contracted a sexually transmitted disease.
… and you always need to allow for people who never knew or do not remember information or who may be dishonest in their answers.
Data
What is data coding? (2)
• When you have a question that generates numeric data and your subject’s response is not a “real number” you can code a bogus value.
– “Not applicable” can be coded as age –1000000.
– “Do not know” can be coded as –2000000.
• The better way to deal with this problem is to use the value “NULL” (SAS calls it a missing value).
– SAS allows you to code 27 special types of NULL (._ and .A through .Z) in addition to the ordinary missing value (.).
– Null values make your job easier when you try to do math on the values.
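A minimal sketch of why NULL is easier to live with than a bogus value, assuming three hypothetical responses (19, 23, and one "not applicable"). The MEAN function skips missing values, but it has no way of knowing that –1000000 is not a real age.

```sas
/* Hypothetical responses: two real ages and one "not applicable". */
data _null_;
   meanWithBogus   = mean(19, 23, -1000000);  /* bogus code is averaged in */
   meanWithMissing = mean(19, 23, .);         /* missing value is skipped  */
   putlog meanWithBogus= meanWithMissing=;
run;
```

The second mean comes out as 21, the average of the two real ages. The first is a large negative number that is easy to miss in a long report.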
Data
Missing Data
• SAS represents missing character data as a pair of quotes with nothing between them, and missing numbers are shown as a single period (.).
• You can also use .A, .B, etc. to code for missing numbers but you can’t enter them directly.
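Special missing values can be assigned in program code, as in this sketch. The coding scheme (.A for "not applicable") and the variable names are assumptions made up for this example.

```sas
/* Hypothetical coding scheme: .A = not applicable, .B = do not know. */
data _null_;
   ageAtDiagnosis = .A;
   /* missing() is true for the ordinary and all special missing values */
   if missing(ageAtDiagnosis) then putlog "The value is missing.";
   /* You can still test for the specific reason */
   if ageAtDiagnosis = .A then putlog "Reason: not applicable";
   /* sum() skips missing values instead of propagating them */
   total = sum(ageAtDiagnosis, 10);
   putlog total=;
run;
```

The value still behaves as missing in math, but the reason it is missing is preserved.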
Data
What is data coding? (3)
• Questions that generate alphanumeric data are always more complex than questions that generate numeric data.
• “Where were you born?” can be coded as a string of letters from a fill-in-the-blank question or coded as letters or numbers from a multiple choice format question.
– Do not use null in fill-in-the-blanks.
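One way to handle a multiple choice question coded as numbers is to store the code and attach a label with a user-defined format. This is only a sketch: the format name, the codes, and the labels below are all hypothetical.

```sas
/* Hypothetical pick list for "Where were you born?" */
proc format;
   value birthplace
      1  = "California"
      2  = "Other US state"
      3  = "Outside the US"
      .A = "Not applicable"
      .B = "Do not know";
run;

data _null_;
   bornIn = 2;                      /* stored as a number          */
   putlog bornIn= birthplace.;      /* displayed with its label    */
run;
```

The data stay compact and easy to process, and the labels show up only when you print or report.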
Data
Typical Tasks
• Importing data
• Cleaning
• Making a subset
• Numeric and graphical summaries
• Analyses with graphics
• Summary reports
or
• Doing simple math
Data
Basics
• While most people use SAS for processing complex collections of data, it can be used for simple math. The techniques that you use for simple math are also used to make complex changes to data sets of any size.
• I hope this stuff will make your lives easier in statistics classes…
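As a sketch of how simple math scales up, the same kind of formula you would use for a single number can be applied to every row of a dataset. The dataset name, variable names, and values here are hypothetical.

```sas
/* Hypothetical data: height in meters and weight in kilograms. */
data work.bmi;                      /* temporary dataset in the work library */
   input heightM weightKg;
   bmi = weightKg / heightM**2;     /* the formula runs once per row         */
   datalines;
1.60 55
1.75 80
;
run;

proc print data=work.bmi;
run;
```

The formula is written once, and SAS applies it to each line of data whether there are two rows or two million.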
SAS
Using EG for Math
SAS
A data set is shown in the flowchart. Its contents are displayed in the programming windowpane. You can see it stored in the temporary “work library” by browsing the Server List.
SAS
Make a temporary dataset to hold the answer.
The Log tab gives you feedback on what SAS did.
SAS
No Need for a Data Set
For a simple calculation you do not need to make a dataset to hold a single number. You can have the number show up in the log window.
1. Give SAS a formula: 1 + 1
2. Tell it what to call the results: theAnswer
3. Print the results out: putlog theAnswer =
4. Tell it you are done giving it instructions.
Use short meaningful names that do not include spaces, punctuation characters, or leading numbers.
SAS
Basic Math
• You put the instructions together by typing a program into the code window, like this:
    data _null_;
       theAnswer = 1 + 1;
       putlog theAnswer =;
    run;
• Run it.
Don’t bother to store the results in a dataset.
SAS
The count of how many lines have been submitted
The Answer
SAS
Don’t panic….
• The help that ships with SAS is good.
• It is its own program hidden in the documentation subfolder inside the SAS folder off the Windows start button.
Search for functions and call routines by category
Click the Favorites tab.
Final Administrivia
• Please save a table for the people who are officially enrolled (or are taking the class for deferred credit).
• Bring a laptop with SAS if possible.
• Grades (pass/fail only)
– Pass 4 of 4 homework assignments for 3 units
– Pass 3 of 4 homework assignments for 2 units