SQL Queries and CRS Plus Part 1 Jennifer Seiffert, Northrop Grumman under contract to CDC’s NPCR For Registry Plus Users Group August 21, 2008

SQL Queries and CRS Plus Part 1

Jennifer Seiffert, Northrop Grumman under contract to CDC’s NPCR

For Registry Plus Users Group

August 21, 2008

What is SQL?

• Structured Query Language for working with databases to manage and retrieve data

• Pronounced “sequel” or “S” “Q” “L”

• ANSI/ISO standard, but proprietary extensions for different DBMSs

• “SQL Server” is Microsoft’s database management system, which uses the SQL language

SQL Queries

• Queries are used to retrieve data or information about the data.

• Queries are a type of SQL Statement, the SELECT statement.

• Statements begin with a keyword, in this case the keyword is SELECT.

• Keywords are in red in these slides.

SQL Query Interface in CRS Plus (1)

SQL Query Interface in CRS Plus (2)

[Screenshot with callouts]

• Type query here.

• Results display here.

• Click here to execute query.

Sample SQL Query in CRS Plus

[Screenshot with callouts: Query, Results]

SQL Query Interface in CRS Plus (3)

Click here to save your query or your results.

SQL Query Interface in CRS Plus (4)

When focus is on the query entry area, you can save your query so you can re-run it in the future.

SQL Query Interface in CRS Plus (5)

When focus is on the results area, you can save your results as a file in one of several formats and use the results in another program. The default is a comma-delimited file.

CRS Plus Data Types

• SQL data types are listed in the Control Table

• Almost all fields in CRS Plus tables are of type nVarchar—character strings of varying length. There are exceptions for some system fields.

• Strings in SQL statements must be enclosed in single quotes.
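
A minimal sketch of the quoting rule, runnable on any SQL Server without touching CRS Plus tables. The column aliases (PSiteCode, LastName) are only labels chosen for this illustration.

```sql
-- A character code is written in single quotes.
-- The N prefix marks the literal as nVarchar (Unicode).
SELECT N'C509' AS PSiteCode;

-- A single quote inside a string is written as two single quotes.
SELECT 'O''Brien' AS LastName;
```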

Model CRS Plus Tables

PatientID  LastName  SSN        Sex
0001       Smith     234567839  2
0002       Jones     123456789  1

MedRefID  PatientID  AgeDx  PSite  DxCounty
0001      0001       62     C509   013
0408      0002       68     C619   999
0025      0002       71     C349   041

Tables are linked by including keys from other tables. Patient 0002 has two tumors.

Column name is FieldName from Control table.
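
The linkage between the two model tables can be tried out on scratch copies. This is a sketch for SQL Server 2008 or later; the table variable names (@Patients, @MedicalSum) and the column sizes are assumptions, and nothing here reads or changes a CRS Plus database.

```sql
-- Scratch copies of the two model tables (names and sizes are assumed).
DECLARE @Patients TABLE (PatientID nvarchar(4), LastName nvarchar(40),
                         SSN nvarchar(9), Sex nvarchar(1));
DECLARE @MedicalSum TABLE (MedRefID nvarchar(4), PatientID nvarchar(4),
                           AgeDx nvarchar(3), PSite nvarchar(4), DxCounty nvarchar(3));

INSERT INTO @Patients VALUES
    ('0001', 'Smith', '234567839', '2'),
    ('0002', 'Jones', '123456789', '1');
INSERT INTO @MedicalSum VALUES
    ('0001', '0001', '62', 'C509', '013'),
    ('0408', '0002', '68', 'C619', '999'),
    ('0025', '0002', '71', 'C349', '041');

-- PatientID in the tumor table is the key that points back to the patient.
SELECT MedRefID, PatientID, PSite
FROM @MedicalSum
WHERE PatientID = '0002';   -- returns the two tumors, 0408 and 0025
```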

SELECT (1)

• The SELECT statement is a description of the set of data to be returned by the query.

• The SELECT keyword is followed by a list of the columns to be returned in the set, followed by a FROM clause specifying the source table(s).

• The symbol * selects all columns in a table.
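
The two forms of the column list might look like this. The sketch assumes it is typed into the CRS Plus query window, where the Patients table and its PatientID and LastName fields exist.

```sql
-- Named columns: only PatientID and LastName come back.
SELECT PatientID, LastName
FROM Patients;

-- Asterisk: every column in the Patients table comes back.
SELECT *
FROM Patients;
```

On a full registry database both statements return one row per patient, so the results list can be long.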

SELECT (2)

• Two ways to specify columns:

– Use FieldName from the Control Table in CRS Plus. Example: TypeRepSrc

– Use the format TableName.FieldName. Example: MedicalSum_2.TypeRepSrc

– The second method is required for joins to distinguish fields with the same name in different tables, so it is good practice.
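
As a sketch, the same column written both ways, assuming the query is run in CRS Plus where MedicalSum_2 holds TypeRepSrc:

```sql
-- Unqualified: works while only one table in the query has a TypeRepSrc column.
SELECT TypeRepSrc
FROM MedicalSum_2;

-- Qualified: TableName.FieldName states exactly which table the column comes from.
SELECT MedicalSum_2.TypeRepSrc
FROM MedicalSum_2;
```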

SELECT (3)

• In standard SQL, all statements, including SELECT statements, must end with a semicolon (;). However, SQL Server will accept statements without a semicolon.

• SELECT statements are safe to run. They cannot modify or damage your database.

• But they can give a misleading answer if done incorrectly.

Not Case-Sensitive

• SQL keywords are not case-sensitive. “SELECT”, “select”, and “Select” mean the same thing. By tradition, SQL's own words are written in uppercase to distinguish SQL instructions from other words used in queries.

• Using all caps for keywords is good practice.
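
A small illustration that runs on any SQL Server without CRS Plus tables. The alias LastName is just a label for the result column.

```sql
-- These three statements are the same query; only the keyword case differs.
SELECT 'Smith' AS LastName;
select 'Smith' as LastName;
Select 'Smith' As LastName;
```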

Registry Data Tables in CRS Plus

• Patients, Patients_2

• MedicalSum, MedicalSum_2

• Abstracts, Abstracts_2, Abstracts_3

• Tables split for performance and because of field number limits

• See Saba Yemane’s handouts (or Control Table) to map fields to tables

Joining Tables

• Retrieving columns from more than one table requires a join of the tables.

– Example: LastName from the Patients table and PSite from the MedicalSum table

• A join requires a field that holds the same value in both of the tables being joined.

Joins

• WHERE Table1.FieldA = Table2.FieldA AND Table1.FieldA = Table3.FieldA AND Table3.FieldB = Table4.FieldB

• WHERE (Patients.PatientID = Patients_2.PatientID AND Patients.PatientID = MedicalSum.PatientID AND MedicalSum_2.MedRefID = MedicalSum.MedRefID)
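
To see what the join condition does, here is a sketch on scratch tables holding two patients and three tumors. It assumes SQL Server 2008 or later; the table variables and column sizes are made up for the illustration, and AS is used only to give each table variable a plain name so that the TableName.FieldName form can be shown.

```sql
-- Scratch tables: two patients, three tumors (names and sizes are assumed).
DECLARE @Patients TABLE (PatientID nvarchar(4), LastName nvarchar(40));
DECLARE @MedicalSum TABLE (MedRefID nvarchar(4), PatientID nvarchar(4), PSite nvarchar(4));
INSERT INTO @Patients VALUES ('0001', 'Smith'), ('0002', 'Jones');
INSERT INTO @MedicalSum VALUES
    ('0001', '0001', 'C509'), ('0408', '0002', 'C619'), ('0025', '0002', 'C349');

-- With the join condition: one row per tumor, 3 rows in all.
SELECT Patients.LastName, MedicalSum.PSite
FROM @Patients AS Patients, @MedicalSum AS MedicalSum
WHERE Patients.PatientID = MedicalSum.PatientID;

-- Without it: every patient is paired with every tumor, 2 x 3 = 6 rows.
SELECT Patients.LastName, MedicalSum.PSite
FROM @Patients AS Patients, @MedicalSum AS MedicalSum;
```

The second result pairs Smith with Jones's tumors and vice versa, which is the kind of misleading answer a missing join condition produces.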

Model Query

SELECT Patients.BirthDate, Patients.LastName, MedicalSum_2.RegNodExam
FROM Patients, MedicalSum_2, MedicalSum
WHERE Patients.PatientID = MedicalSum.PatientID
AND MedicalSum.MedRefID = MedicalSum_2.MedRefID
AND Patients.BirthDate > '19400100'
ORDER BY Patients.BirthDate

• SELECT clause: column names separated by commas

• FROM clause: table names separated by commas

• WHERE clause: filter criteria with joins and logical operators

• ORDER BY clause: sort order

Exclude Abstracts by Status (1)

• Status is an attribute of abstracts, not of consolidated data.

• In querying from Abstracts tables, you may want to exclude Voided and Pending abstracts.

• See Sanjeev’s Status Codes handout.

Exclude Abstracts by Status (2)

• Two versions of syntax to negate a condition:

– WHERE . . . AND NOT (Status1 = 97 OR Status1 = 99)

– WHERE . . . AND (Status1 <> 97 AND Status1 <> 99)

• Status1 and Status2 are system variables with data type TinyInt, so no quotes are used around the values.

• Be careful with use of AND and/or OR!
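
The two correct forms, and the easy mistake of mixing them, can be checked on a three-row scratch table. This sketch assumes SQL Server 2008 or later; @Status is a made-up table variable, and 0 is a made-up status value standing in for any abstract that should be kept.

```sql
DECLARE @Status TABLE (Status1 tinyint);
INSERT INTO @Status VALUES (0), (97), (99);

-- Both correct forms return only the row with Status1 = 0.
SELECT Status1 FROM @Status WHERE NOT (Status1 = 97 OR Status1 = 99);
SELECT Status1 FROM @Status WHERE (Status1 <> 97 AND Status1 <> 99);

-- Pitfall: with OR between the two <> tests every row passes,
-- because no single value can equal both 97 and 99.
SELECT Status1 FROM @Status WHERE (Status1 <> 97 OR Status1 <> 99);   -- returns 0, 97, 99
```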

Nulls

• NULL means data are not defined, BUT

• Not the same as blank or unknown

• Use of nulls complicates data retrieval with SQL, requiring 3-value logic (True, False, or Unknown)

• Nulls are not used in the CRS Plus database, so you can safely ignore this complication.

Parentheses

• Use them in complex conditional expressions.

• They aren’t always necessary for correct results, but you need to really know what you’re doing to know when it’s safe not to use them.

• They add clarity for reading.
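
A sketch of how a missing pair of parentheses changes the answer. SQL evaluates AND before OR, so the two queries below are not the same. The scratch table reuses three site and county values from the model tumor table; @MedicalSum and the column sizes are assumptions, and SQL Server 2008 or later is assumed.

```sql
DECLARE @MedicalSum TABLE (MedRefID nvarchar(4), PSite nvarchar(4), DxCounty nvarchar(3));
INSERT INTO @MedicalSum VALUES
    ('0001', 'C509', '013'), ('0408', 'C619', '999'), ('0025', 'C349', '041');

-- Intended: site C619 or C349, and county 041. Returns tumor 0025 only.
SELECT MedRefID
FROM @MedicalSum
WHERE (PSite = 'C619' OR PSite = 'C349') AND DxCounty = '041';

-- No parentheses: AND is evaluated first, so this is read as
--   PSite = 'C619' OR (PSite = 'C349' AND DxCounty = '041')
-- Returns tumors 0408 and 0025.
SELECT MedRefID
FROM @MedicalSum
WHERE PSite = 'C619' OR PSite = 'C349' AND DxCounty = '041';
```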

Using Site Group

• Standard site analysis categories used by SEER, NPCR, and NAACCR

• Codes are 5 characters. For codes, see:

– NAACCR vol. III, section 4.2.1.1, Table 3, p. 50

– SEER Web site: http://seer.cancer.gov/siterecode/icdo3_d01272003/

• 3 levels of codes: Detail, Subaggregate, and Aggregate

– GrpSite3, detailed code. Example: 35011 = acute lymphocytic leukemia

– GrpSite2, subaggregate. Example: 35010 = acute leukemias

– GrpSite1, aggregate. Example: 35000 = all leukemias

GRPSite Codes in CRS Plus (1)

• Three fields in MedicalSum table

– GrpSite1: Aggregate. Example: 35000, Leukemias

– GrpSite2: Subaggregate. Example: 35010, Acute leukemias

– GrpSite3: Detail. Example: 35011, Acute lymphocytic leukemia
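
Choosing the level of the site group is a matter of filtering on a different field. A sketch, assuming it is run in the CRS Plus query window and that the codes are stored as character strings:

```sql
-- Aggregate level: all leukemias.
SELECT MedRefID, PSite, GrpSite1, GrpSite2, GrpSite3
FROM MedicalSum
WHERE GrpSite1 = '35000';

-- Detail level: acute lymphocytic leukemia only.
SELECT MedRefID, PSite, GrpSite1, GrpSite2, GrpSite3
FROM MedicalSum
WHERE GrpSite3 = '35011';
```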

GRPSite Codes in CRS Plus (2)

• Stored when tumor records are created or updated

• Can also be computed on demand on all tumor records by selecting “Batch Update SEER Codes” from the Administration menu

How to Formulate a Query (1)

• See handout: SQL Queries and CRS Plus Sample Query.doc

• First, formulate your question in English. Example: I want a list of childhood leukemia cases in Kosciusko County residents, diagnosed in 2006, listed in order by age.

How to Formulate a Query (2)

Identify the data items needed to identify the cases, and their tables. Example: For selection, I can use GrpSite1 (the aggregate code), DxCounty, DxDate, and AgeDx, all from MedicalSum table.

How to Formulate a Query (3)

Identify the data items you want on the results list, and their tables. Example: For display, I want Patients.PatientID, Patients.LastName, MedicalSum.AgeDx, MedicalSum.PSite, MedicalSum.HistTypeICDO3, MedicalSum.DxCounty, MedicalSum.DxDate, MedicalSum.GrpSite1

TIP: Always display the items you are filtering on as a check.

Caution!

County codes are not unique. They are duplicated across states. To select a specific county, select on both state and county codes. For Kosciusko County, Indiana, I will need to select

State = IN

County = 085
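
The effect of leaving out the state can be seen on a scratch table. The three rows are made-up data for illustration only, @MedicalSum and the column sizes are assumptions, and SQL Server 2008 or later is assumed.

```sql
DECLARE @MedicalSum TABLE (MedRefID nvarchar(4), DxState nvarchar(2), DxCounty nvarchar(3));
INSERT INTO @MedicalSum VALUES
    ('0101', 'IN', '085'), ('0102', 'OH', '085'), ('0103', 'IN', '013');

-- County code alone: returns 0101 and 0102, one tumor from each state.
SELECT MedRefID, DxState, DxCounty
FROM @MedicalSum
WHERE DxCounty = '085';

-- State and county together: returns 0101 only.
SELECT MedRefID, DxState, DxCounty
FROM @MedicalSum
WHERE DxState = 'IN' AND DxCounty = '085';
```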

How to Formulate a Query (4a)

Construct first part of query:

SELECT Patients.PatientID, Patients.LastName, MedicalSum.AgeDx, MedicalSum.PSite, MedicalSum.HistTypeICDO3, MedicalSum.DxState, MedicalSum.DxCounty, MedicalSum.DxDate, MedicalSum.GrpSite1

--Continued

--(The 2 hyphens signal a comment.)

How to Formulate a Query (4b)

FROM Patients, MedicalSum

--Continued

How to Formulate a Query (5)

Determine codes to use in WHERE clause. Example:

– AgeDx less than 16 (15 years and younger)

– GrpSite1 = 35000 (leukemia group)

– DxState = IN

– DxCounty = 085 (Kosciusko)

– Year of DxDate = 2006

How to Formulate a Query (6a)

Construct WHERE clause with joins and filters

Example: WHERE Patients.PatientID = MedicalSum.PatientID AND

--Continued

How to Formulate a Query (6b)

MedicalSum.AgeDx < '016' AND

SUBSTRING(MedicalSum.DxDate, 1, 4) = '2006' AND

--This is SQL Server's substring function syntax, beginning in column 1 of the field for a length of 4 columns.

MedicalSum.GrpSite1 = '35000' AND

--Continued

How to Formulate a Query (6c)

DxState = 'IN' AND

DxCounty = '085'

--End of WHERE clause
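
Two of the filters in this WHERE clause work on character data, which is worth seeing in isolation. The sketch below runs on any SQL Server without CRS Plus tables. It assumes dates are stored as YYYYMMDD strings and that AgeDx is zero-padded to three characters, as the '016' in the filter suggests. The sample date and the aliases are made up.

```sql
-- SUBSTRING(value, start, length): characters 1 through 4 of a YYYYMMDD date are the year.
SELECT SUBSTRING('20060315', 1, 4) AS DxYear;   -- returns 2006

-- Character fields compare as strings, one character at a time.
-- Zero-padded to three characters, the comparison agrees with the numbers.
SELECT CASE WHEN '009' < '016' THEN 'kept' ELSE 'dropped' END AS PaddedAge9;     -- kept

-- Unpadded, '9' sorts after '016' because the character 9 comes after 0.
SELECT CASE WHEN '9' < '016' THEN 'kept' ELSE 'dropped' END AS UnpaddedAge9;     -- dropped
```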

How to Formulate a Query (7)

Construct ORDER BY clause

Example: ORDER BY AgeDx;

--Note terminal semicolon indicating end of entire SQL Select statement
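
Put together, steps 4 through 7 give one statement. This is a sketch of the assembled query, assuming it is run in the CRS Plus query window. Only Patients and MedicalSum appear in the FROM clause, so the only join condition used is the one on PatientID.

```sql
SELECT Patients.PatientID, Patients.LastName, MedicalSum.AgeDx,
       MedicalSum.PSite, MedicalSum.HistTypeICDO3, MedicalSum.DxState,
       MedicalSum.DxCounty, MedicalSum.DxDate, MedicalSum.GrpSite1
FROM Patients, MedicalSum
WHERE Patients.PatientID = MedicalSum.PatientID        -- join patients to tumors
  AND MedicalSum.AgeDx < '016'                         -- 15 years and younger
  AND SUBSTRING(MedicalSum.DxDate, 1, 4) = '2006'      -- diagnosed in 2006
  AND MedicalSum.GrpSite1 = '35000'                    -- leukemia group
  AND MedicalSum.DxState = 'IN'                        -- Indiana
  AND MedicalSum.DxCounty = '085'                      -- Kosciusko County
ORDER BY MedicalSum.AgeDx;
```

Every field used in a filter also appears in the SELECT list, so the results can be checked by eye.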

Preview of Part 2

• Using COUNT

• Using AS for alias names

• Converting data types for calculations

• Subqueries (nested queries)

• More complex conditional selections and joins

• Request time! Send your query requests and we’ll build a library on the Web site.

• The findings and conclusions in this report are those of the author(s) and do not necessarily represent the official position of the Centers for Disease Control and Prevention.