Database Access in Python

Authors: Tom Dunham
Date: 2009-04-05

Relational Databases

A Database Server

Database Users

MySQL

You and MySQL

A simple query

To return the user you connected as:

mysql> select user();
+----------------+
| user()         |
+----------------+
| root@localhost |
+----------------+
1 row in set (0.00 sec)

Python and MySQL

MySQLdb is an implementation of the Python Database API (see PEP 249 http://www.python.org/dev/peps/pep-0249/). The Python Database API is designed to give a set of guidelines to the authors of database connection libraries that aim to make them similar. The hope is that this makes code more portable across databases, and easier to understand.

Cursors

The safest, most database neutral way of processing rows is this:

cursor.execute('SELECT ...')
while 1:
      row = cursor.fetchone()
      if row == None:
         break
      # process row...

Since iterators were added in Python 2.2, most database libraries (including MySQLdb) have supported iterable cursors:

cursor.execute('SELECT ...')
for row in cursor:
      # process row...

Parameterization

Parameterization (2)

Always prefer parameterization to creating queries using string formatting. Parameters handle quoting values transparently, avoiding the need to examine a value before passing it to the DB, which saves the programmer work (and avoiding bugs for people named O'Malley). Many databases cache parsed queries to make subsequent calls more efficient - this is ineffective if every query differs by the values of its parameters.

Different databases use different characters as placeholders, you can find out which character by examining the paramstyle attribute of the module.

qmark Question mark style, e.g. ...WHERE name=?

numeric Numeric, positional style, e.g. ...WHERE name=:1

named Named style, e.g. ...WHERE name=:name

format ANSI C printf format codes, e.g. ...WHERE name=%s

pyformat Python extended format codes, e.g. ...WHERE name=%(name)s

Changing data

Transactions

This scarcely scratches the surface of transactions.

The ACID properties on Wikipedia http://en.wikipedia.org/wiki/ACID

InnoDB http://www.innodb.com/ http://dev.mysql.com/doc/refman/5.0/en/innodb.html

Bernstein & Newcomer Principles of Transaction Processing

Error Handling

Error Handling

The database can fail at any time - including between you reading the last result and closing the connections

MySQL does not throw an exception here, but it's not forbidden by the spec.

See PEP 249 for the list of exceptions.

Trapping DB Exceptions

Exceptions are not immediately obvious from a backtrace:

>>> curs.execute("select * from foo")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
...
_mysql_exceptions.ProgrammingError: (1146, "Table 'meps.foo' doesn't exist")

DB Exceptions

This is better:

from MySQLdb import DatabaseError, ProgrammingError

So is this (avoids importing them all):

try:
   ...
except db.ProgrammingError, err:
   ...

Resource Leaks

db = connect(...)
try:
    curs = db.cursor()
    try:
        try:
            curs.execute(...)
            db.commit()
        except:
            db.rollback()
            raise
    finally:
        curs.close()
finally:
   db.close()
Don't believe me? In "The Exception That Grounded an Airline", the first chapter in Release It! (Nygard 2007) Michal Nygard explains how a resource leak stopped checkins at an airline for three hours; long enough that it took the rest of the day to clear the backlog.

Context Managers

Exercises

See handout

Details for connecting to the database will be given in class.

You can use this function to parse rebase_allenz.txt into 8-value tuples:

def rebase_rec(itr):
    rec = []
    for line in itr:
        m = re.match(r"<([1-8])>(.*)", line)
        if m:
            field, val = int(m.group(1)), m.group(2)
            rec.append(val)
            if field == 8:
                yield tuple(rec)
                rec = []
  1. Write a program that loads all the records in rebase_allenz.txt into the database.

  2. Display the number of records for every source using:

    SELECT source, COUNT(*) FROM enzyme GROUP BY source ORDER BY 2 DESC;
    
  3. Display the names of the enzymes with a particular prototype:

    SELECT enzyme_name FROM enzyme WHERE prototype = 'Bsp1407I';