Modern SQL: SQL Beyond 1992

Apr 16, 2017


Elizabeth Smith
Transcript


Modern SQL: SQL Beyond 1992

So why did I start this talk? I like SQL, and I was apprenticed to someone who was determined to teach me how to SQL right. I had done a lot of MySQL in my day and had learned some very bad habits. I was rather amazed at all the stuff I could do with proper SQL!

Why use an RDBMS?

We could talk a bunch about why you'd choose an RDBMS over NoSQL. But that's not what this talk is about. This is about: you already have SQL, so want to use it well. If you really want to see this in action, go argue with Derick.


Things to look at when choosing a database (blah blah blah):
- Consistency, availability, and partition tolerance (CAP)
- Robustness and reliability
- Scalability
- Performance and speed
- Operational and querying capabilities
- Database integrity and constraints
- Database security
- Database vendor/system funding, stability, community, and level of establishment
- Talent pool and availability of relevant skills
- The type and structure of the data being stored, and the ideal method of modeling the data

In other words: IT DEPENDS. And don't let any database employee tell you otherwise.

Look, lots of things go into choosing a database. Most of us in web look for fast and cheap. That might be nice, but you might be missing out on features you need. Also, I'm rather disappointed in a lot of open source databases.


The SQL you use is from 1992. At least that's a step up from SQL-89. (Until MSVC 2015, C was stuck on C89.) The first SQL standard was created in 1986; COBOL, FORTRAN, Pascal, and PL/I now feel old. It was driven by large companies. See if you can name some.

Interestingly enough, Microsoft didn't get into the game until the late 1980s, when they partnered with Sybase, then eventually did a rewrite to make it NT-happy. (That's why the old PHP mssql extension also gave you Sybase calls.)

IBM, Oracle, and later Microsoft have been the big players; Sybase, SAP, and others are also involved.

We are ANSI and we like standards!

"Second, because there was no outside buyer to shape the content of the Core level of SQL99, it was enlarged to such an extent that to implement it all is close to impossible for all vendors except for two or three. In short, the size of core is a natural barrier to practical product development."
(Michael Gorman, Secretary of the ANSI Database Languages Committee, http://tdan.com/is-sql-a-real-standard-anymore/4923)

SQL-92, SQL-99, SQL:2003, SQL:2011

The SQL standard is huge: more than 4,000 pages in its SQL:2011 incarnation. No single implementation can ever implement all features. Even in the early releases, such as SQL-92, the SQL standard defined different conformance levels so that vendors could claim conformance to a subset of the standard.

Starting with SQL:1999, all features are enumerated and flagged as either mandatory or optional. As a bare minimum, conforming systems must comply with all mandatory features, which are collectively called Core SQL. Besides entry-level SQL-92 features, Core SQL:1999 also requires some features previously only required for intermediate or full level, as well as a few new features. Beyond Core SQL, vendors can claim conformance on a feature-by-feature basis.


Reporting, reporting, reporting
- Almost all the features we'll discuss are most useful for reporting
- Some are syntactic sugar for (or in some cases run faster than) traditional SQL
- MySQL doesn't have any of this (sorry folks); maybe WITH in 8.0
- https://dveeden.github.io/modern-sql-in-mysql/ Go here and +1 all the feature requests!
- PostgreSQL has it all, plus some other goodies. But use a new version; I recommend 9.5

But it's 2016!!

The sad part is I don't even care about the REALLY new shiny (ahem, five-year-old) stuff; I just want 1999 support!! In fact, nothing I'm showing you here is beyond the spec for 1999.

OLAP: grouping sets, CUBE, and ROLLUP

OLAP: online analytical processing
- Processes multi-dimensional analytical (MDA) queries swiftly
- Consolidation (roll-up): aggregation of data that can be accumulated and computed in one or more dimensions
- Drill-down: navigate through the details
- Slicing and dicing: take out (slicing) a specific set of data and view (dicing) the slices from different dimensions

This is describing, in fancy terms, every reporting interface ever.


GROUPING SETS, ROLLUP, CUBE
- GROUPING SETS ( (e1, e2, e3, ...), (e1, e2), () ): lets you choose which sets of columns you group by
- ROLLUP (e1, e2, e3, ...): shorthand for the given list of expressions and all prefixes of the list, including the empty list; useful for hierarchical data
- CUBE (e1, e2, ...): shorthand for the given list and all of its possible subsets
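A minimal sketch of the idea, using a hypothetical sales(region, product, amount) table (the table and column names are made up for illustration):

```sql
-- One query, three groupings: per (region, product), per region, and a grand
-- total. Rows from the coarser groupings show NULL in the collapsed columns.
SELECT region, product, SUM(amount) AS total
  FROM sales
 GROUP BY GROUPING SETS ((region, product), (region), ());
```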

Getting some information grouped multiple ways

I've done this many, many, many times in one form or another (or even ran multiple queries to get it, or other evil).

GROUPING SETS example: https://gist.github.com/auroraeosrose/b6b71780ba4c91cd02e6d175a6eeb49a

Grouping sets let you do all kinds of cool, complex stuff! Guess what happens when you add a () to the end?

ROLLUP example

That, that is what I want to write. Simple, succinct, and it DOES THE SAME THING!!
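The "same thing" being: ROLLUP is just shorthand for a GROUPING SETS list of all prefixes. A hedged sketch against the same hypothetical sales table as before:

```sql
-- ROLLUP (region, product) is the short way to write it:
SELECT region, product, SUM(amount) AS total
  FROM sales
 GROUP BY ROLLUP (region, product);

-- ...equivalent to spelling out every prefix plus the grand total:
SELECT region, product, SUM(amount) AS total
  FROM sales
 GROUP BY GROUPING SETS ((region, product), (region), ());
```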

Why do we care?
- Simplify queries
- Perform fewer queries; get more data out of the same query
- Group information on multiple (and complex) dimensions


Support:
- MySQL (and MariaDB) have GROUP BY ... WITH ROLLUP. That's it, just ROLLUP. And it's kind of broken syntax compared to other DBs.
- PostgreSQL was late to the party (9.5) but implemented ALL THE THINGS.
- SQL Server, Oracle, and DB2 have had this stuff for ages (plus a bunch of proprietary OLAP features on top!).
- SQLite is missing this stuff (bad SQLite, bad!).

WITH (RECURSIVE): CTE for you and for me

Subqueries suck to read

Does anyone have any idea what this is doing? I do not!

WITH: organize complex queries

WITH query_name (column_name1, ...) AS (SELECT ...)
SELECT ...

A way to organize queries. Also called common table expressions (CTEs) or sub-query factoring. Makes highly complex queries sane to read and understand.
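A small sketch of the shape (not the gist from the slides; regional_sales and sales are hypothetical names):

```sql
-- The CTE gets a name and column list, then reads like a table below.
WITH regional_sales (region, total) AS (
    SELECT region, SUM(amount)
      FROM sales
     GROUP BY region
)
SELECT region, total
  FROM regional_sales
 WHERE total > 1000;
```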

Seriously: PostgreSQL and SQLite support this, but MySQL doesn't yet. It MIGHT in 8.0, maybe.

WITH example: https://gist.github.com/auroraeosrose/82cc6c420d7749336ef474c87df50841

Oooh, now it makes some sense.

Caveat lector: PostgreSQL treats WITH statements as an optimization fence! This is because you can UPDATE and DELETE using a WITH, a PostgreSQL extension. So beware of CTEs, depending on what you want for performance!

Why do we care?
- If it's easier to read, it's easier to maintain
- Assign column names to tables
- Hide tables for testing (a WITH query with the same name shadows the table)

Perl joke

WITH RECURSIVE

"The optional RECURSIVE modifier changes WITH from a mere syntactic convenience into a feature that accomplishes things not otherwise possible in standard SQL. Using RECURSIVE, a WITH query can refer to its own output."

WAT?

WITH RECURSIVE example
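The canonical minimal example (this is the counting query from the PostgreSQL documentation, not the one demoed on the VM):

```sql
-- Seed row (SELECT 1), then the query keeps feeding its own output back
-- into itself until the WHERE clause stops producing rows.
WITH RECURSIVE t(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM t WHERE n < 10
)
SELECT n FROM t;  -- rows 1 through 10
```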


TO THE VM!

Ubuntu 16.04:
sudo apt-get install postgresql
sudo -i -u postgres
psql

How to play along if you're so inclined: the default PostgreSQL in Ubuntu 16.04 IS 9.5, which is what you should be on. Not only for features and speed, but also because there were a couple of nasty jsonb and security bugs fixed.

Why do we care?
- Row generators: fake data, series, dates
- Processing graphs
- Finding distinct values: the loose index scan

The term "loose indexscan" is used in some other databases for the operation of using a btree index to retrieve the distinct values of a column efficiently; rather than scanning all equal values of a key, as soon as a new value is found, restart the search by looking for a larger value. This is much faster when the index has many equal keys. Postgres does not support loose indexscans natively, but they can be emulated using a recursive CTE.
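The emulation pattern looks roughly like this (a sketch, assuming the hypothetical sales table from earlier with an index on region; the PostgreSQL wiki has the full recipe):

```sql
-- Instead of scanning every row, each step jumps to the next distinct
-- value via the index; the outer query drops the terminating NULL.
WITH RECURSIVE t AS (
    (SELECT region FROM sales ORDER BY region LIMIT 1)
    UNION ALL
    SELECT (SELECT region FROM sales
             WHERE region > t.region
             ORDER BY region LIMIT 1)
      FROM t
     WHERE t.region IS NOT NULL
)
SELECT region FROM t WHERE region IS NOT NULL;
```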


Support: basically everyone but MySQL. Yes, even SQLite. Only PostgreSQL REQUIRES the RECURSIVE keyword; they do some weird things with WITH.


LATERAL: the foreach loop of SQL

LATERAL: joining by foreach. A LATERAL join is like a SQL foreach loop, in which the DB iterates over each row in a result set and evaluates a subquery using that row as a parameter. A lateral join can reference other tables in the query! Generally lateral joins are faster (the optimizer gets to have fun).

LATERAL join example: https://gist.github.com/auroraeosrose/73fb8d0779ef4c0251754f38eea228de

Yes, you could probably rewrite this as a subquery, but it is generally going to be faster this way, especially with large amounts of data.
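A sketch of the general shape, with hypothetical users and orders tables (the gist on the slide has the real demo):

```sql
-- "For each user, run this subquery": the three most recent orders per user.
SELECT u.name, o.id, o.created_at
  FROM users u
  JOIN LATERAL (
        SELECT id, created_at
          FROM orders
         WHERE orders.user_id = u.id
         ORDER BY created_at DESC
         LIMIT 3
       ) o ON true;
```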


TO THE VM!

Support: everyone but MySQL and SQL Server. Well, except in SQL Server you can use CROSS/OUTER APPLY to do the same thing.

Once again, MySQL is out in the cold.

2003 has arrived

Who remembers what was happening in 2003? You do realize this stuff was standardized 13 years ago? No more complaining about browser stuff, huh?

FILTER (and the CASE fakeout)

This is something I absolutely love; FILTER is amaaaazing.

Also, I lied: while CASE is a 1999 feature, FILTER is a 2003 feature (bite me). The next cool thing is 2003 too; the future is coming!

FILTER: selective aggregates

SUM(...) FILTER (WHERE ...)
Works on any aggregate function, including array_agg.

Faking it:
SUM(CASE WHEN ... THEN ... END)
COUNT(CASE WHEN ... THEN 1 END)

With the exception of subqueries and window functions, the FILTER clause may contain any expression that is allowed in regular WHERE clauses.

The biggest win here is that it's simply FASTER in PostgreSQL to use FILTER; the query planner is more clever with it than with a traditional CASE statement, which can be slow.

Use it for pivot tables, or for grabbing EAV data easily.
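A sketch of both forms side by side, using a hypothetical payments(status, amount) table:

```sql
-- One pass over the table, three conditional aggregates:
SELECT count(*) FILTER (WHERE status = 'paid')     AS paid,
       count(*) FILTER (WHERE status = 'refunded') AS refunded,
       sum(amount) FILTER (WHERE status = 'paid')  AS paid_total
  FROM payments;

-- The portable CASE fakeout from the slide, same results:
SELECT count(CASE WHEN status = 'paid' THEN 1 END)     AS paid,
       count(CASE WHEN status = 'refunded' THEN 1 END) AS refunded,
       sum(CASE WHEN status = 'paid' THEN amount END)  AS paid_total
  FROM payments;
```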


FILTER example: https://gist.github.com/auroraeosrose/f649e997ffd94df57c827ab3b0ee7d1b

CASE instead


TO THE VM! Which is faster, CASE or FILTER? Why is this not the #1 request everywhere? Look, we'll give you a keyword that optimizes the crap out of your query, and you can do more with just one query.

Support: only PostgreSQL has it. You can use CASE on almost any other DB to fake it.

But remember, as we discussed earlier, there CAN be a performance penalty for queries with CASE; FILTER has better query optimizations.

Window functions: RANK, OVER, PARTITION BY

The whole idea behind window functions is to allow you to process several values of the result set at a time: you see through the window some peer rows and are able to compute a single output value from them, much like when using an aggregate function.

Window functions: define which rows are visible at each row
- OVER () makes all rows visible at each row
- OVER (PARTITION BY ...) segregates like a GROUP BY
- OVER (ORDER BY ... BETWEEN ...) segregates using < >

So basically we're going to chop up the result set so we can do multiple things at a time.

As a query writer I want to:
- Merge rows that have the same things (GROUP BY, DISTINCT)
- Aggregate data from related rows (requires a GROUP BY, uses aggregate functions)

BUT AT THE SAME TIME


OVER (PARTITION BY)

Using the same DDL as the lateral stuff, you can see we can get our average salary AND our individual salary at the same time!
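The same trick as a hedged sketch, with a hypothetical employees(name, department, salary) table standing in for the demo's DDL:

```sql
-- Every row keeps its own salary but also sees its department's average.
SELECT name,
       department,
       salary,
       avg(salary) OVER (PARTITION BY department) AS dept_avg
  FROM employees;
```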

TO THE VM! You can do a LOT more with windowing: you can page, you can do ranges and BETWEEN, you can window more than once. I could do a whole talk on windowing!


OVER (ORDER BY ...)

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row; the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.

A window function call always contains an OVER clause directly following the window function's name and argument(s). This is what syntactically distinguishes it from a regular function or aggregate function. The OVER clause determines exactly how the rows of the query are split up for processing by the window function. The PARTITION BY list within OVER specifies dividing the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row.


Windowing functions:
- row_number(): number of the current row within its partition, counting from 1
- rank(): rank of the current row, with gaps
- dense_rank(): rank of the current row, without gaps
- percent_rank(): relative rank of the current row: (rank - 1) / (total rows - 1)
- cume_dist(): relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)
- ntile(num_buckets integer): integer ranging from 1 to the argument value, dividing the partition as equally as possible
- lag(value anyelement): returns value evaluated at the row that is offset rows before the current row within the partition
- lead(value anyelement): returns value evaluated at the row that is offset rows after the current row within the partition
- first_value(value any): returns value evaluated at the row that is the first row of the window frame
- last_value(value any): returns value evaluated at the row that is the last row of the window frame
- nth_value(value any, nth integer): returns value evaluated at the row that is the nth row of the window frame (counting from 1); null if there is no such row
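A quick sketch of the ranking trio, against the same hypothetical employees table:

```sql
-- With salaries 90, 90, 80: rank() gives 1, 1, 3; dense_rank() gives 1, 1, 2.
SELECT name,
       salary,
       row_number() OVER (ORDER BY salary DESC) AS row_num,
       rank()       OVER (ORDER BY salary DESC) AS salary_rank,
       dense_rank() OVER (ORDER BY salary DESC) AS salary_dense_rank
  FROM employees;
```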

Support: PostgreSQL, SQL Server, Oracle, DB2. Both SQLite and MySQL are missing support.

To the future!

Currently the biggest trend seems to be JSON. This amuses me a lot because I remember the XML push. You could argue JSON is a better format, but there are a BUNCH of XML standards for DBs, did you know that? SQL Server, Oracle, and DB2 have all sorts of fancy features for XML usage. But now they're all JSON bound.

NoSQL in your SQL: jsonb and PostgreSQL

But I like NoSQL! But I need SQL!

JSON types in PostgreSQL

json: string internal representation. Follows http://rfc7159.net/rfc7159 (previously supported http://www.ietf.org/rfc/rfc4627.txt). Stores the exact text, reparsed on each execution.

jsonb: binary internal representation. Follows http://rfc7159.net/rfc7159. Can have indexes on stuff inside. De-duplicates and decomposes to a binary format. Has shadow types (types unknown to the core SQL parser).

Creation

Simple Selection
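The two slides above were live demos; here is a hedged sketch of what creation and simple selection look like (the docs table and its contents are made up):

```sql
-- Creation: a jsonb column, filled from a JSON literal.
CREATE TABLE docs (id serial PRIMARY KEY, body jsonb);
INSERT INTO docs (body) VALUES ('{"name": "widget", "tags": ["a", "b"]}');

-- Simple selection: -> returns jsonb, ->> returns text.
SELECT body -> 'name'  AS name_jsonb,
       body ->> 'name' AS name_text
  FROM docs;
```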

TO THE VM! But wait, there's more

Operators

- -> : Get JSON array element (indexed from zero; negative integers count from the end)
  '[{"a":"foo"},{"b":"bar"},{"c":"baz"}]'::jsonb -> 2   yields   {"c":"baz"}
- -> : Get JSON object field by key
  '{"a": {"b":"foo"}}'::jsonb -> 'a'   yields   {"b":"foo"}
- ->> : Get JSON array element as text
  '[1,2,3]'::jsonb ->> 2   yields   3
- ->> : Get JSON object field as text
  '{"a":1,"b":2}'::jsonb ->> 'b'   yields   2
- #> : Get JSON object at specified path
  '{"a": {"b":{"c": "foo"}}}'::jsonb #> '{a,b}'   yields   {"c": "foo"}
- #>> : Get JSON object at specified path as text
  '{"a":[1,2,3],"b":[4,5,6]}'::jsonb #>> '{a,2}'   yields   3

Moar operators

- @> : Does the left value contain the right value?
  '{"a":1, "b":2}'::jsonb @> '{"b":2}'::jsonb