SQL Queries and CRS Plus, Part 1
Jennifer Seiffert, Northrop Grumman under contract to CDC’s NPCR
For Registry Plus Users Group
August 21, 2008
What is SQL?
• Structured Query Language for working with databases to manage and retrieve data
• Pronounced “sequel” or “S” “Q” “L”
• ANSI/ISO standard, but proprietary extensions for different DBMSs
• “SQL Server” is Microsoft’s database management system, which uses the SQL language
SQL Queries
• Queries are used to retrieve data or information about the data.
• Queries are a type of SQL Statement, the SELECT statement.
• Statements begin with a keyword, in this case the keyword is SELECT.
• Keywords are in red in these slides.
SQL Query Interface in CRS Plus (1)
Type query here.
Results display here.
Click here to execute query.
SQL Query Interface in CRS Plus (2)
Query
Results
Sample SQL Query in CRS Plus
Click here to save your query or your results.
SQL Query Interface in CRS Plus (3)
When focus is on the query entry area, you can save your query so you can re-run it in the future.
SQL Query Interface in CRS Plus (4)
When focus is on the results area, you can save your results as a file in one of several formats and use the results in another program. The default is a comma-delimited file.
SQL Query Interface in CRS Plus (5)
CRS Plus Data Types
• SQL data types are listed in the Control Table
• Almost all fields in CRS Plus tables are of type nVarchar (character strings of varying length). There are exceptions for some system fields.
• Strings in SQL statements must be enclosed in single quotes.
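As a small illustration of the quoting rule, the sketch below filters on a county code. It assumes DxCounty is a character field in the MedicalSum table, as most CRS Plus fields are; the value is compared as text, so the leading zero is part of the code.

```sql
-- Character values go in single quotes; '013' keeps its leading zero.
SELECT MedRefID, PatientID, DxCounty
FROM MedicalSum
WHERE DxCounty = '013';
```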
Model CRS Plus Tables
Patients
PatientID  LastName  SSN        Sex
0001       Smith     234567839  2
0002       Jones     123456789  1

MedicalSum
MedRefID  PatientID  AgeDx  PSite  DxCounty
0001      0001       62     C509   013
0408      0002       68     C619   999
0025      0002       71     C349   041
Tables are linked by including keys from other tables. Patient 0002 has two tumors.
Column name is FieldName from Control table.
SELECT (1)
• The SELECT statement is a description of the set of data to be returned by the query.
• The SELECT keyword is followed by a list of the columns to be returned in the set, followed by a FROM clause specifying the source table(s).
• The symbol * selects all columns in a table.
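The two forms described above might look like this. This is a minimal sketch using table and field names that appear elsewhere in these slides; the real tables hold many more columns than the ones listed here.

```sql
-- Return every column of the Patients table.
SELECT *
FROM Patients;

-- Return only the listed columns, in the order given.
SELECT PatientID, AgeDx, PSite
FROM MedicalSum;
```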
SELECT (2)
• Two ways to specify columns:
– Use FieldName from the Control Table in CRS Plus. Example: TypeRepSrc
– Use the format TableName.FieldName. Example: MedicalSum_2.TypeRepSrc
– The second method is required for joins to distinguish fields with the same name in different tables, so it is good practice to use it all the time.
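A short sketch of the two naming styles, using the TypeRepSrc example from this slide. Both queries should return the same rows; the difference only matters once a second table with a field of the same name is added to the query.

```sql
-- Unqualified: works while only one table in the query has this field.
SELECT TypeRepSrc
FROM MedicalSum_2;

-- Qualified: same result, and still unambiguous if more tables are joined in.
SELECT MedicalSum_2.TypeRepSrc
FROM MedicalSum_2;
```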
SELECT (3)
• In standard SQL, all statements, including SELECT statements, must end with a semicolon (;). However, SQL Server will accept statements without a semicolon.
• SELECT statements are safe to run. They cannot modify or damage your database.
• But they can give a misleading answer if done incorrectly.
Not Case-Sensitive
• SQL is not case-sensitive. “SELECT”, “select”, and “Select” mean the same thing. It is a tradition to write SQL's own words in uppercase to distinguish SQL instructions from other words used in queries.
• Using all caps for keywords is good practice.
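For example, the two statements below differ only in the case of the keywords and should behave identically. Table and field names are spelled as they appear in the Control Table in both; whether names are also case-insensitive depends on how the database is configured, so that is left unchanged here.

```sql
-- Lowercase keywords.
select LastName from Patients;

-- Uppercase keywords (the recommended style).
SELECT LastName FROM Patients;
```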
Registry Data Tables in CRS Plus
• Patients, Patients_2
• MedicalSum, MedicalSum_2
• Abstracts, Abstracts_2, Abstracts_3
• Tables split for performance and because of field number limits
• See Saba Yemane’s handouts (or Control Table) to map fields to tables
Joining Tables
• Retrieving columns from more than one table requires a join of the tables
– Example: LastName from the Patients table and PSite from the MedicalSum table
• A join requires a field that is present in both tables; rows are matched where the values of that field are equal
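The LastName and PSite example might be written as follows. This is a sketch that assumes PatientID is the field shared by Patients and MedicalSum, as in the model tables.

```sql
-- PatientID appears in both tables, so it serves as the join field.
SELECT Patients.LastName, MedicalSum.PSite
FROM Patients, MedicalSum
WHERE Patients.PatientID = MedicalSum.PatientID;
```

Against the two model tables, this would return three rows: Smith with C509, and Jones with both C619 and C349, because patient 0002 has two tumors. Without an ORDER BY clause the row order is not guaranteed.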
Joins
• WHERE Table1.FieldA = Table2.FieldA AND Table1.FieldA = Table3.FieldA AND Table3.FieldB = Table4.FieldB
• WHERE (Patients.PatientID = Patients_2.PatientID AND Patients.PatientID = MedicalSum.PatientID AND MedicalSum_2.MedRefID = MedicalSum.MedRefID)
Model Query
SELECT Patients.BirthDate, Patients.LastName, MedicalSum_2.RegNodExam
FROM Patients, MedicalSum_2, MedicalSum
WHERE
Patients.PatientID = MedicalSum.PatientID
AND MedicalSum.MedRefID = MedicalSum_2.MedRefID
AND Patients.BirthDate > '19400100'
ORDER BY
Patients.BirthDate
SELECT clause: column names separated by commas
FROM clause: table names separated by commas
WHERE clause: filter criteria with joins and logical operators
ORDER BY clause: sort order
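The model query can also be written with explicit INNER JOIN ... ON clauses, which SQL Server accepts as an alternative to listing the join conditions in WHERE. This form is not used in these slides, and it is an assumption that the CRS Plus query window passes it through unchanged. It is shown only to make the separation between join conditions and filter conditions visible.

```sql
-- Same model query, with the join conditions moved from WHERE into ON clauses.
SELECT Patients.BirthDate, Patients.LastName, MedicalSum_2.RegNodExam
FROM Patients
    INNER JOIN MedicalSum
        ON Patients.PatientID = MedicalSum.PatientID
    INNER JOIN MedicalSum_2
        ON MedicalSum.MedRefID = MedicalSum_2.MedRefID
WHERE Patients.BirthDate > '19400100'   -- filter only; joins are in ON
ORDER BY Patients.BirthDate;
```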
Exclude Abstracts by Status (1)
• Status is an attribute of abstracts, not of consolidated data.
• In querying from Abstracts tables, you may want to exclude Voided and Pending abstracts.
• See Sanjeev’s Status Codes handout.
Exclude Abstracts by Status (2)
• Two versions of syntax to negate a condition:
– WHERE . . . AND NOT (Status1 = 97 OR Status1 = 99)
– WHERE . . . AND (Status1 <> 97 AND Status1 <> 99)
• Status1 and Status2 are system variables with data type TinyInt, so no quotes are used around the values.
• Be careful with use of AND and/or OR!
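The sketch below puts the two equivalent forms into complete queries and adds the mistake the warning is about. It assumes Status1 is a field of the Abstracts table and uses the codes 97 and 99 from the slide; check the Status Codes handout for what each code means.

```sql
-- Keep abstracts whose Status1 is neither 97 nor 99.
SELECT *
FROM Abstracts
WHERE NOT (Status1 = 97 OR Status1 = 99);

-- Equivalent form: both inequalities must hold, so AND is required.
SELECT *
FROM Abstracts
WHERE Status1 <> 97 AND Status1 <> 99;

-- Common mistake: every value differs from at least one of the two codes,
-- so with OR this condition is always true and nothing is excluded.
SELECT *
FROM Abstracts
WHERE Status1 <> 97 OR Status1 <> 99;
```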
Nulls
• NULL means that a value is not defined, BUT
• It is not the same as a blank or an “unknown” code
• Use of nulls complicates data retrieval with SQL, requiring 3-value logic (True, False, or Unknown)
• Nulls are not used in the CRS Plus database, so you can safely ignore this complication.
Parentheses
• Use them in complex conditional expressions.
• They aren’t always necessary for correct results, but you need to really know what you’re doing to know when it’s safe not to use them.
• They add clarity for reading.
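A sketch of why this matters: in SQL, AND is evaluated before OR, so leaving out parentheses can silently change which rows are selected. The site codes are taken from the model tables and the state code from the Kosciusko County example; the combination is illustrative only.

```sql
-- Without parentheses this is read as
--   (DxState = 'IN' AND PSite = 'C509') OR PSite = 'C349'
-- so C349 cases from every state are returned.
SELECT PatientID, DxState, PSite
FROM MedicalSum
WHERE DxState = 'IN' AND PSite = 'C509' OR PSite = 'C349';

-- With parentheses, the state filter applies to both site codes.
SELECT PatientID, DxState, PSite
FROM MedicalSum
WHERE DxState = 'IN' AND (PSite = 'C509' OR PSite = 'C349');
```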
Using Site Group
• Standard site analysis categories used by SEER, NPCR, and NAACCR
• Codes are 5 characters. For codes, see:
– NAACCR vol. III, section 4.2.1.1, Table 3, p. 50
– SEER Web site: http://seer.cancer.gov/siterecode/icdo3_d01272003/
• 3 levels of codes: Detail, Subaggregate, and Aggregate
– GrpSite3, detailed code. Example: 35011 = acute lymphocytic leukemia
– GrpSite2, subaggregate. Example: 35010 = acute leukemias
– GrpSite1, aggregate. Example: 35000 = all leukemias
GRPSite Codes in CRS Plus (1)
• Three fields in MedicalSum table
– GrpSite1: Aggregate. Example: 35000, Leukemias
– GrpSite2: Subaggregate. Example: 35010, Acute leukemias
– GrpSite3: Detail. Example: 35011, Acute lymphocytic leukemia
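The level you filter on sets how broad the selection is. The sketch below assumes the GrpSite fields are stored as character strings, like most CRS Plus fields, so the codes are quoted.

```sql
-- Aggregate level: all leukemias.
SELECT PatientID, PSite, GrpSite1
FROM MedicalSum
WHERE GrpSite1 = '35000';

-- Detail level: acute lymphocytic leukemia only.
SELECT PatientID, PSite, GrpSite3
FROM MedicalSum
WHERE GrpSite3 = '35011';
```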
GRPSite Codes in CRS Plus (2)
• Stored when tumor records are created or updated
• Can also be computed on demand on all tumor records by selecting “Batch Update SEER Codes” from the Administration menu
How to Formulate a Query (1)
• See handout: SQL Queries and CRS Plus Sample Query.doc
• First, formulate your question in English.
Example: I want a list of childhood leukemia cases in Kosciusko County residents, diagnosed in 2006, listed in order by age.
How to Formulate a Query (2)
Identify the data items needed to identify the cases, and their tables.
Example: For selection, I can use GrpSite1 (the aggregate code), DxCounty, DxDate, and AgeDx, all from the MedicalSum table.
How to Formulate a Query (3)
Identify the data items you want on the results list, and their tables.
Example: For display, I want Patients.PatientID, Patients.LastName, MedicalSum.AgeDx, MedicalSum.PSite, MedicalSum.HistTypeICDO3, MedicalSum.DxCounty, MedicalSum.DxDate, MedicalSum.GrpSite1
TIP: Always display the items you are filtering on as a check.
Caution!
County codes are not unique. They are duplicated across states. To select a specific county, select on both state and county codes. For Kosciusko County, Indiana, I will need to select
State = IN
County = 085
How to Formulate a Query (4a)
Construct first part of query:
SELECT Patients.PatientID, Patients.LastName, MedicalSum.AgeDx, MedicalSum.PSite, MedicalSum.HistTypeICDO3, MedicalSum.DxState, MedicalSum.DxCounty, MedicalSum.DxDate, MedicalSum.GrpSite1
--Continued
--(The 2 hyphens signal a comment.)
How to Formulate a Query (4b)
FROM Patients, MedicalSum
--Continued
How to Formulate a Query (5)
Determine codes to use in WHERE clause
Example:
AgeDx less than 16 (15 years and younger)
GrpSite1 = 35000 (leukemia group)
DxState = IN
DxCounty = 085 (Kosciusko)
Year of Dx = 2006
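The year-of-diagnosis criterion depends on pulling the first four characters out of DxDate. As a quick check of how SQL Server's SUBSTRING function behaves, the sketch below first applies it to a made-up date string and then to the field itself. It assumes DxDate is stored as a character string with the year first.

```sql
-- SUBSTRING(expression, start, length); start position 1 is the first character.
SELECT SUBSTRING('20060415', 1, 4);   -- returns '2006'

-- The same expression applied to the DxDate field, shown next to the full date.
SELECT MedRefID, DxDate, SUBSTRING(DxDate, 1, 4)
FROM MedicalSum;
```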
How to Formulate a Query (6a)
Construct WHERE clause with joins and filters
Example: WHERE Patients.PatientID = MedicalSum.PatientID AND
--Continued
How to Formulate a Query (6b)
MedicalSum.AgeDx < '016' AND
SUBSTRING (MedicalSum.DxDate, 1, 4) = '2006' AND
--This is SQL Server’s substring function syntax, beginning at character 1 of the field for a length of 4 characters.
MedicalSum.GrpSite1 = '35000' AND
--Continued
How to Formulate a Query (6c)
DxState = 'IN' AND
DxCounty = '085'
--End of WHERE clause
How to Formulate a Query (7)
Construct ORDER BY clause
Example: ORDER BY AgeDx;
--Note terminal semicolon indicating end of entire SQL Select statement
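Putting steps 4 through 7 together, the complete query might look like the sketch below. It uses straight single quotes throughout, joins only Patients and MedicalSum (no MedicalSum_2 field is needed here), and assumes AgeDx is stored as a three-character, zero-padded string so that the text comparison with '016' orders the same way the numbers would.

```sql
SELECT Patients.PatientID, Patients.LastName, MedicalSum.AgeDx,
       MedicalSum.PSite, MedicalSum.HistTypeICDO3, MedicalSum.DxState,
       MedicalSum.DxCounty, MedicalSum.DxDate, MedicalSum.GrpSite1
FROM Patients, MedicalSum
WHERE Patients.PatientID = MedicalSum.PatientID      -- join
  AND MedicalSum.AgeDx < '016'                       -- 15 years and younger
  AND SUBSTRING(MedicalSum.DxDate, 1, 4) = '2006'    -- diagnosed in 2006
  AND MedicalSum.GrpSite1 = '35000'                  -- leukemia group
  AND MedicalSum.DxState = 'IN'                      -- Indiana
  AND MedicalSum.DxCounty = '085'                    -- Kosciusko
ORDER BY MedicalSum.AgeDx;
```

Every field used as a filter also appears in the SELECT list, following the tip to display the items you are filtering on as a check.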
Preview of Part 2
• Using COUNT
• Using AS for alias names
• Converting data types for calculations
• Subqueries (nested queries)
• More complex conditional selections and joins
• Request time! Send your query requests and we’ll build a library on the Web site.
• The findings and conclusions in this report are those of the author(s) and do not necessarily represent the official position of the Centers for Disease Control and Prevention.