Advanced DBI Perl Conference 3.0 Tutorial August 1999 Tim Bunce.

Advanced DBI

Perl Conference 3.0 Tutorial

August 1999

Tim Bunce

Advanced DBI tutorialby Tim BunceAugust 1999

Topical Topics

Wheels within Wheels Error Checking and Handling Transactions Speed Speed Speed! Portability The Power of the Proxy DBI for the Web Handling LONG/BLOB data Tainting Flexing the Multiplex

Wheels within Wheels

The DBI architecture

and how to watch it at work

Under the hood

DBI defines and implements an interface Driver modules do much of the real work DBI provides default methods, functions, tools etc for

drivers Not limited to the lowest common denominator -

mechanism provided for driver specific extensions Designed and built for speed Valuable detailed call tracing/debugging built-in

DBI Module

Perl Application

DBD::OtherDBD::InformixDBD::Oracle

Oracle Server Informix Server Other Server

A picture is worth?

Setting the scene

Inner and outer worlds Application and Drivers

Inner and outer handles DBI handles are references to tied hashes

The DBI Method Dispatcher gateway between the worlds and the heart of the DBI

… Now we'll go all deep and visual for a while...

Architecture of the DBI classes #1

‘’outer’’

DBD::A::dr DBD::A::db DBD::A::st

DBI::dr DBI::db DBI::st

DBI handle classes visible to applications.These classes are effectively ‘empty’.

DBD::B::dr DBD::B::db DBD::B::st

Parallel handle classes implemented by drivers.

DBD::_::common

DBD::_::dr DBD::_::db DBD::_::st

‘’inner’’

Base classesprovidingfallbackbehavior.

Architecture of the DBI classes #2

methodmethod

method

methodmethod

DBD::A::db

DBD::B::db

DBD::A::stmethodmethod

DBI::st

Applicationmakes callsto methodsusing DBI handleobjects

methodmethodmethodmethodmethodmethod

DBI::db

‘’outer’’

dispatchDBI

methodmethodmethodmethodmethod

DBI::_::db

method

DBI::_::common

‘’inner’’

Anatomy of a DBI handle

HandleRef.

DBIMagic

‘’outer’’ ‘’inner’’

struct imp_dbh_t {

struct dbih_dbc_t com;

… implementers … … own data ...

struct dbih_dbc_t {

… DBI data ...

TiedHash

TieMagic

DBI::db DBI::db

Method call walk-through

Consider a simple prepare call: $dbh->prepare(...)

$dbh is reference to a DBI::db object (regardless of driver) DBI::db::prepare is an alias for DBI dispatch method Dispatch calls driver’s prepare method something like this: my $inner_hash_ref = … # from tie magic

my $implementor_class = … # from DBI magic data

$inner_hash_ref->$implementor_class::prepare(...) Driver code gets the inner hash so it has fast access to the hash

contents

Watching the DBI in action

DBI has detailed call tracing built-in The trace can be very helpful in understanding application

behavior and for debugging Shows parameters and results Trace information can be written to a file Not used often enough!

Our program for today...

1: #!/usr/bin/perl -w

3: use DBI;

5: $dbh = DBI->connect('', '', '', { RaiseError => 1 });

7: $upd = $dbh->prepare("UPDATE prices SET price=? WHERE prod=?");

8: $ins = $dbh->prepare("INSERT INTO prices (prod,price) VALUES(?,?)");

9: $rows = $upd->execute(42, "Widgets");

10: $ins->execute("Widgets", 42) if $rows == 0;

12: $dbh->disconnect;

Enabling Tracing

Per handle$h->trace($level);

$h->trace($level, $filename); Only effects that handle and any new child handles created from it Child handles get trace level of parent in effect at time of creation

Global (internal to application)DBI->trace(...);

Sets effective global default minimum trace level

Global (external to application) Enabled using DBI_TRACE environment variable

DBI_TRACE=digits DBI->trace(digits);

DBI_TRACE=filename DBI->trace(2, filename);

DBI_TRACE=digits=filename DBI->trace(digits, filename);

(extra)

Trace level 1

Trace level 1 shows method results and line numbers:

<- connect= DBI::db=HASH(0xe0abc) at DBI.pm line 356.

<- STORE('PrintError', 1)= 1 at DBI.pm line 382.

<- STORE('AutoCommit', 1)= 1 at DBI.pm line 382.

<- STORE('RaiseError', 1)= 1 at DBI.pm line 382.

<- prepare('UPDATE …')= DBI::st=HASH(0xe1238) at test.pl line 7.

<- prepare('INSERT …')= DBI::st=HASH(0xe1504) at test.pl line 8.

<- execute= '0E0' at test.pl line 9.

<- execute= 1 at test.pl line 10.

<- disconnect= 1 at test.pl line 11.

<- DESTROY= undef

Trace level 2 and above

Trace level 2 shows calls with parameters and more:

-> connect for DBD::ODBC::dr (DBI::dr=HASH(0x13dfec)~0xe14a4

'' '' **** HASH(0xe0a10))

<- connect= DBI::db=HASH(0xe0ab0) at DBI.pm line 356.

-> STORE for DBD::ODBC::db (DBI::db=HASH(0xe0abc)~INNER 'PrintError' 1)

<- STORE= 1 at DBI.pm line 382.

-> prepare for DBD::ODBC::db (DBI::db=HASH(0xe0ab0)~0xe0abc

'UPDATE prices SET price=? WHERE prod=?')

<- prepare= DBI::st=HASH(0xe1274) at test.pl line 7.

Trace level 3 and above shows more processing details

Error Checking and Handling

To err is human, to detect, divine.

The importance of error checking

Errors happen! Failure happens when you don't expect errors!

– database crash / network disconnection– lack of disk space for insert or select (sort space for order by)– server math error on select (divide by zero)– and maybe, just maybe, errors in your own code (Gasp!)

Beat failure by expecting errors!

Detect errors early to limit effects– Defensive Programming, e.g., check assumptions– Through Programming, e.g., check for errors after fetch loops

Error checking - ways and means

Error checking the hard way...

$h->method || die "DBI method failed: $DBI::errstr";

Error checking the easy way...

$h->{RaiseError} = 1;

$h->method;

Handling errors the smart way

For simple applications immediate death on error is fine

For more advanced applications greater control is needed

Life after death:$h->{RaiseError} = 1;

eval { … $h->method; … };

if ($@) { … }

Bonus prize Other, non-DBI, code within the eval block may also raise an

exception that will be caught and handled cleanly

Transactions

To do or to undo,

that is the question

Transactions - What's it all about?

Far more than just locking The A.C.I.D. test True transactions give true safety

– even from power failures and system crashes!– Incomplete transactions are automatically rolled-back by the

database server when it's restarted.

Also removes burden of undoing incomplete changes Hard to implement (for the vendor) and has significant

performance cost A very large topic worthy of an entire tutorial

Transactions - Life Preserving

Classic: system crash between one bank account being debited and another being credited.

Dramatic: power failure during update statement on 3 million rows after 2 minutes when only part way through.

Real-world: complex series of inter-related updates, deletes and inserts on many separate tables fails at the last step due to a duplicate primary key on an insert.

Transaction recovery would handle all these situations automatically. Makes a system far more robust and trustworthy over the long term. Use transactions if your database supports them. If it doesn't and you need them, switch to a different database.

Transactions - How the DBI helps

Tools of the trade: Set AutoCommit off, and RaiseError on Wrap eval { … } around the code Use $dbh->commit; and $dbh->rollback;

Disable AutoCommit via $dbh->{AutoCommit} = 0;– to enable transactions and thus rollback-on-error

Enable RaiseError via $dbh->{RaiseError} = 1;– to automatically 'throw an exception' on error

The surrounding eval { … }– catches the exception and puts the error message into $@

Test $@ and rollback() if set, else commit()

Transactions - Example code

$dbh->{RaiseError} = 1;

eval {

$dbh->method(…); # assorted DBI calls

foo(...); # application code

$dbh->commit; # commit the changes

if ($@) {

warn "Transaction aborted because $@";

$dbh->rollback;

Transactions - Further comments

The eval { … } catches all exceptions– not just from DBI calls. Also catches fatal runtime errors from Perl.

Put commit() inside the eval– ensures commit failure is caught cleanly.– remember that commit() itself may fail for many reasons.

Don't forget that rollback() may also fail– due to database crash or network failure etc.

Other points:– Always explicitly commit or rollback before disconnect().– Destroying a connected $dbh should always rollback.– END blocks can catch exit-without-disconnect to rollback and

disconnect cleanly.

Speed Speed Speed!

What helps,what doesn't

Give me speed!

DBI was designed for speed from day one DBI dispatch written in hand-crafted XS/C Dispatch to XS driver method calls optimized Cached attributes returned directly by DBI dispatch DBI overhead is generally insignificant

– So we'll talk about other speed issues instead ...

Partition for speed

Application partitioning– stop and think - work smarter not harder

Pick the right database for the job, if you have the choice Work close to the data Use stored procedures where appropriate Moving data to/from the client is always expensive

– think of latency as well as bandwidth Use proprietary bulk-load where appropriate Consider database maths vs. Perl maths Multiple simple queries with 'joins' in Perl may be faster Consider local caching, in memory or DBM file etc. Mix 'n Match techniques as needed

Prepare for speed

prepare() - what happens in the server– Receive and parse the SQL statement into internal form– Get details for all the selected tables– Check access rights for each– Get details for all the selected fields– Check data types in expressions– Get details for all the indices of the tables– Develop an optimised query 'access plan' for best execution– Return a handle for all this cached information

This can be an expensive process - especially the 'access plan' Some databases, like MySQL, don't cache the information but

have simpler, and thus faster, plan creation

The Best Laid Plans

Query optimisation is hard Intelligent high quality cost based query optimisation is really hard!

Know your optimiser Oracle, Informix, Sybase, DB2, SQL Server etc. all slightly different.

Check what it's doing Use tools to see the plans used for your queries - very helpful

Help it along Most 'big name' databases have a mechanism to analyse and store the

key distributions of indices to help the optimiser make good plans– beware: keep it fresh, old key distributions might be worse than none

Some also allow you to embed 'hints' into the SQL as comments– beware: take it easy, over hinting hinders dynamic optimisation

(extra)

Respect your server's SQL cache

Optimised Access Plan etc. is cached within the server keyed by the exact original SQL string used

Without placeholders, SQL string varies each time so cached one is not reused and time is wasted creating a new access plan the new statement and access plan are added to cache so the cache fills and other statements get pushed out

Compare do("insert … $id");

with do("insert … ?", $id);

Hot handles

Use prepare() to get a handle on the statement in the SQL cache and keep it there.

The handle for the pre-prepared statement avoids a round-trip to server for cache-check / parsing etc.

Avoid using $dbh->do(…) in a loop. Use $sth = $dbh->prepare(…)

and $sth->execute() instead. For example, convert looped do("insert … ?", $id)

to prepare("insert … ?")

and looped execute($id).

Let the DBI cache your handles

Sometimes it's not easy to hold all your handles– e.g., library code to lookup values from the database given a $dbh

The prepare_cached() method gives you a client side statement handle cache: while ( ($field, $value) = each %search_fields ) { push @sql, "$field = ?";

push @values, $value;

$where = "";

$where = "where ".join(" and ", @sql) if @sql;

$sth = $dbh->prepare_cached("select * from table $where");

$sth->execute(@values);

Can avoid the need for global statement handle variables

Sling less for speed

while(@row = $sth->fetchrow_array) { } one field: 3,100 fetches per cpu second ten fields: 1,000 fetches per cpu second

while($row = $sth->fetchrow_arrayref) { } one field: 5,300 fetches per cpu second ten fields: 4,000 fetches per cpu second

Notes: Timings made on an old SPARC 10 using DBD::Oracle Timings assume instant record fetch within driver Fields all one char. @row would be even slower for more/bigger fields Use bind_columns() for direct access to fetched fields

Keep a handle on your databases

Connecting to a database can be slow Try to connect once and stay connected where practical

We'll discuss web server issues later

New method: The new connect_cached() method acts like prepare_cached() but for database handles

It also checks the connection and reconnects if it's broken Similar, but not quite the same as Apache::DBI New and subject to change with experience Potentially most useful with DBD::Proxy & DBI::ProxyServer

Portability

A Holy Grail

To be taken with a pinch of salt

Portability in practice

Portability requires care and testing - it can be tricky Platform Portability - the easier bit

Availability of database client software (and server if required) Availability of DBD driver DBD::Proxy can address both these issues - see later

Database Portability - more tricky Differences in SQL dialects cause most problems Differences in data types can also be a problem Driver capabilities (placeholders etc)

DBIx::Compat module (in DBIx::RecordSet) may be useful. A standard DBI test suite is needed (and planned).

The Power of the Proxy

Thin clients ...

and other buzz words

DBD::Proxy & DBI::ProxyServer

Networking for Non-networked databases DBD::Proxy driver forwards calls over network to remote

DBI::ProxyServer No changes in application behavior

Only the DBI->connect statement needs to be changed

Proxy can be made completely transparent by setting the DBI_AUTOPROXY environment variable

So not even the DBI->connect statement needs to be changed!

DBI::ProxyServer works on Win32 Access to Access and other Win32 ODBC and ADO data sources

Developed by Jochen Wiedmann

A proxy picture

Application

RPC::pClientStorable IO::Socket

DBD::Foo

DBI::ProxyServer

RPC::pServerIO:Socket Storable

Network

DBD::Proxy

Thin clients and other buzz words

Proxying for remote access: "thin-client"– No need for database client code on the DBI client

Proxying for network security: "encryption"– Can use Crypt::IDEA, Crypt::DES etc.

Proxying for "access control" and "firewalls"– handy for web servers

Proxying for action control– e.g., only allow specific select or insert statements

Proxying for local access: "n-tier middleware"– connect_cached(), memory use, licensing limits

Proxying for performance: "compression"– Can compress data transfers using Compress::Zlib

DBI for the Web

Hand waving from 30,000 feet

Web DBI - Connect speed

Databases can be slow to connect– Traditional CGI forces a new connect per request

Move Perl and DBI into the web server– Apache with mod_perl and Apache::DBI module– Microsoft IIS with ActiveState's PerlEx

Connections can then persist between requests– Apache::DBI automatically used by DBI if loaded– No CGI script changes required to get persistence

Take care not to change the shared session behaviour– Leave the $dbh in the same state you found it!

Other alternatives include FastCGI, CGI::SpeedyCGI and CGI::MiniSvr

Web DBI - Too many connections

Limits on Database Connections Database platform resources or licensing Memory consumption of web servers

Partition web servers into General and Database

Direct database work to the Database web servers Use Reverse Proxy / Redirect / Rewrite to achieve this Allows each subset of servers to be tuned to best fit workload And be run on appropriate platforms

Web DBI - State-less-ness

No fixed client-server pair– Each request can be handled by a different process.– So can't simply stop fetching when one page is complete and continue

fetching from the same $sth when the next page is requested.– And transactions can't span requests. Even if they could you'd have

problems with database locks being held etc.

Need access to 'accumulated state' somehow: – via the client (e.g., hidden form fields)– via the database (records in a session_state table)– via the web server file system (DBM files etc)

Need to purge old state info if stored on server, so timestamp it

– See Apache::Session module– DBI::ProxyServer + connect_cached with session ID may suit

Web DBI - Browsing pages of results

Re-execute query each time then count/discard (simple but expensive) works well for small results sets or where users rarely view many pages fast initial response, degrades gradually for later pages count/discard in server is better but still inefficient for large result sets affected by inserts and deletes from other processes

Re-execute query with where clause using min/max keys from last results works well where original query can be qualified in that way, not common

Select and cache full result rows somewhere for fast access can be expensive for large result sets with big fields

Select and cache only the row keys, fetch full rows as needed optimisation of above, use ROWID if supported, "select … where … in (…)"

If data is static and queries predictable then custom pre-built indexes may be useful The cache can be on web server, using DBM file for example, or on database

server using a table keyed by session id

Web DBI - Concurrent editing

How to prevent updates overwriting each other? You can use Optimistic Locking via 'qualified update':

update table set ...=...

where key = $old_key

and field1 = $old_field1

and field2 = $old_field2 and … for all fields Potential problems with floating point data values not matching Some databases support a high-resolution 'update timestamp'

field that can be checked instead

Check the update row count If it's zero then you know the record has been changed or deleted

by another process

Web DBI - Tips for the novice

Test one step at a time– Test perl + DBI + DBD driver outside the web server first– Test web server + non-DBI CGI next

Remember that CGI scripts run as a different user with a different environment - expect to be tripped up by that

DBI trace is your friend - use it Use the Perl "-w" and "-T" options. Always "use strict;" Read and inwardly digest the WWW Security FAQ:

http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html Read the Perl CGI FAQ:

http://www.perl.com/cgi-bin/pace/pub/doc/FAQs/cgi/perl-cgi-faq.html And, if appropriate, read the mod_perl information available from:

http://perl.apache.org

Handling LONG/BLOB data

What makes LONG / BLOB / MEMO data special? Not practical to pre-allocate fixed size buffers for worst case

Fetching LONGs - treat as normal fields after setting: $dbh->{LongReadLen} - buffer size to allocate for expected data $dbh->{LongTruncOk} - should truncating-to-fit be allowed

Inserting LONGs The limitations of string literals The benefits of placeholders

Chunking / Piecewise processing not supported So you're limited to available memory Some drivers support an unofficial blob_read() method

DBI security tainting

By default the DBI ignores Perl tainting doesn't taint returned data doesn't check that parameters are not tainted

The Taint attribute enables that behaviour– If Perl itself is in taint mode

Each handle has it's own inherited Taint attribute So can be enabled for a connection and disabled for particular

statements, for example:

$dbh = DBI->connect(…, { Taint => 1 });

$sth = $dbh->prepare("select * from safe_table");

$sth->{Taint} = 0; # no tainting on this handle

Flexing the Multiplex

A new kid on the block

DBD::Multiplex

DBD::Multiplex - new and experimental Connects to multiple databases at once (via DBI) Executes statements on all of them by default Could be configured to:

insert into all databases but select from one fallback to alternate database if primary is unavailable select from all and check results (pick most common) select round-robin / or at random to distribute load

Can be used with DBD::Proxy, either above or below May also acquire fancy caching in later versions Watch this space!

Reference Materials

http://www.symbolstone.org/technology/perl/DBI– The DBI Home Page

http://www.isc.org/dbi-lists.html– To subscribe to the DBI Mailing Lists

http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk1_1997.tar.gz– My DBI session at The Perl Conference 1.0 - general introduction

http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk2_1998.tar.gz– My DBI session at The Perl Conference 2.0 - more depth

http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk3_1999.tar.gz– This DBI tutorial at The Perl Conference 3.0!

http://www.oreilly.com/catalog/perldbi/– or http://www.amazon.com/exec/obidos/ASIN/1565926994/dbi– The forthcoming DBI book! Due late 1999 or early 2000.– (Order in advance at the conference to get special discounted rate :-)

The end.

Till next year...

Advanced DBI Perl Conference 3.0 Tutorial August 1999 Tim Bunce.

dbi dbi

dbi classes

dbi data

driver dbi

dbi architecture

advanced dbi tutorial

dbi handle objects method

db method dbi

Documents

Postgres at CBC...Code Igniter ! Perl ! DBI Contests !...

Using Perl and DBI/DBD With Informix Databases Darryl Priest...

Using Perl DBI to Interface to MySQL

The Web Application Developer’s Vie · 2001. 12. 6. ·.....

Bunce Manor

Programming Perl DBI - The University of...

HARRY BUNCE - cdn.dmfa.co.uk

Perl Dbi Dpriest

Topic 8: Databases CSE2395/CSE3395 Perl Programming Llama3.....

Advanced Perl DBI - BackPan - Perl.org

Perl RDBMS Programming(DBI/DBIx::Sunnyのはなし)

Cory P. Bunce - Resume

Advanced Perl DBI

Advanced Perl DBI Making simple things easy and difficult...

Firewall SQL - n Firewall SQL.pdf · PDF file–C++, C, C#,...

Steve bunce mario & sonic