Updated Testimonials Page

Certification Validation
certificate id: 74651
password: sheeri

4.0 Core Certification Validation

Co-workers have said:

“It is with mixed feelings that I am writing today to tell you that Sheeri has accepted a job in the private sector and will be leaving Tufts on 22 February.

It’s hard to let go of a bright, cheerful and highly energetic co-worker like Sheeri. She always brings a lot of enthusiasm to bear on new projects and works hard at them even when they have lost their novelty and started to become tedious. Sheeri always puts her best effort into everything she takes on, so for purely selfish reasons I’m sorry to see her go.” — Bob Webber, Manager, University Systems Group, Tufts University

“Sheeri’s collaborative approach to systems design and administration ensured that relevant details came forth in the planning rather than the testing or production stage of a project. Her care to get a job done well and prompt handling of issues enabled the team I led to be more productive than we had been under any other system administrator at Tufts.” http://tinyurl.com/q7fzv — must log in to linkedin.com to view the original.

Clients have said:
“Your [technical] review has been really thorough, and you’ve gone above and beyond and done some things to make my job a lot easier!” — Brian Jepson, Editor, O’Reilly Media, Inc.

Other sites have said:
“Sheeri Kritzer is a certified MySQL administrator with an active interest and involvement in MySQL and the MySQL community. In her blog she discusses database security and performance in a practical and competent manner, while also covering events she is participating in as well as more general issues of interest to users and developers of MySQL. The blog seems to be updated regularly, and it contains many entries – all of which are well written and more extensive than one perhaps has come to expect from blogs today.” — Bill, the founder of PHPCentral.org at http://www.phpcentral.com/49-notable-mysql-blogs-reviews.html

Boston MySQL Meetup Group members have said the following about Sheeri’s presentations:

June 2006 Meetup
“Very KISS (keep it simple stupid). Presentation seemed accessible to people of different levels of technical knowledge; Jargon was kept to a minimum.”

March 2006 Meetup
“very interesting and informative”

January 2006 Meetup
“I found the meeting informative with many tips on backup techniques. Thank you for sharing your experience and for the pizza!”

“Very focused and informative presentation.”

“I thought it went rather well, I like the structure.”

Example of Query Clarity with BETWEEN

A while back I posted some optimizer hints and recommended using them to write more readable queries. Today I stumbled across an example of my own work that I changed. This is not quite a case of using BETWEEN instead of AND statements, but it sort of is. Basically, I needed to reuse this query:

SELECT COUNT(*), city,
CASE WHEN lastLogin < NOW() - INTERVAL 90 DAY THEN 90
     WHEN lastLogin < NOW() - INTERVAL 60 DAY THEN 60
     WHEN lastLogin < NOW() - INTERVAL 30 DAY THEN 30
     WHEN lastLogin > NOW() - INTERVAL 30 DAY THEN 0
     ELSE -1
END
FROM . . .

I wrote this query less than a month ago.

I looked at this query today and wondered, “why did I not put lastLogin < NOW() - INTERVAL 60 DAY AND lastLogin > NOW() - INTERVAL 90 DAY in there?” I then realized what I did.

Because the CASE statement evaluates in order, to make the query more concise I used a standard procedural programming trick — I put the older cases first. In this way, something older than 90 days gets caught in the first case, something between 60 and 90 days old gets caught in the second case, and so on.

This is great, and probably even standard for a procedural language. And in fact, MySQL’s CASE statement works the same.

However, it is much more clear (in my mind) to write:

SELECT COUNT(*), city,
CASE WHEN lastLogin < (NOW() - INTERVAL 90 DAY) THEN 90
     WHEN lastLogin BETWEEN (NOW() - INTERVAL 90 DAY) AND (NOW() - INTERVAL 60 DAY) THEN 60
     WHEN lastLogin BETWEEN (NOW() - INTERVAL 60 DAY) AND (NOW() - INTERVAL 30 DAY) THEN 30
     WHEN lastLogin > (NOW() - INTERVAL 30 DAY) THEN 0
     ELSE -1
END
FROM . . .

It’s more characters, which is a little more typing, but I feel it is more readable, because it really does spell out declaratively what I am looking for.
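
Note that BETWEEN expects the smaller value first, so the bounds must be written low-to-high (the older date goes first). A quick sanity check in the mysql client:

SELECT 2 BETWEEN 1 AND 3;  -- returns 1 (true)
SELECT 2 BETWEEN 3 AND 1;  -- returns 0 (false): reversed bounds never match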

Things to Blog About

Cardinality.

Also a blog post on “using the sakila sample database”

Speeding-up Queries: New Features of the MySQL 5.0 Query Engine — Katchaounov — blog about it, find the slides, explain well.

new fulltext parser like sergei’s talk. dayum!

Things I learned
Tuning server params is a last resort, AFTER application optimization (from Jay)

http://dev.mysql.com/doc/refman/5.0/en/communication-errors.html

http://lists.mysql.com/mysql/196677 and my response.

nested set/adjacency hybrid code from roland

FOUND_ROWS
ROW_COUNT
SQL_CALC_FOUND_ROWS — calculate how many rows there would be without the LIMIT (used as SELECT SQL_CALC_FOUND_ROWS …; fetch the total afterwards with FOUND_ROWS()).
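
A minimal sketch of how those fit together (table and column names are hypothetical):

SELECT SQL_CALC_FOUND_ROWS id, name
FROM users
WHERE active = 1
ORDER BY name
LIMIT 10;

-- total number of matching rows, ignoring the LIMIT:
SELECT FOUND_ROWS();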

http://firestuff.org/wordpress/2006/07/20/sqlsearch-100/

crosstab: http://rpbouman.blogspot.com/2005/10/creating-crosstabs-in-mysql.html
http://www.futhark.ch/mysql/106.html

(sun has a db tech group — neat, ponder a job?)

What about composite searches — and, or, not? Pivot tables? Crosstabs? SELECTs and defaults, as well as multiple selections? Newsome’s “building sane query interfaces” — get the code from newsome@acm.org (emailed 8/16)

‘mysqld --verbose --help’
how to find out how it was compiled?

New Book Available: SQL Hacks

Like many others in the “Hacks” series, this book offers 100 tips and tricks for making the most of your database through querying.

See more information at:
http://www.oreilly.com/catalog/sqlhks/

It is about to be published (Nov. 2006). I believe you can now order the final cut (as opposed to the “rough cut”), although last week they were still advertising it as a rough cut. You can order it online, as a print book or to read on Safari.

As a technical editor for this book, I have read through all of it (or at least, all of it that was in the draft I received) and I can say it is definitely worth the $30 USD. Most of the examples at the time I read it were in MySQL syntax (though they might have changed that to be more fair to other databases); while it has some material for DB2 and Access, it’s mostly a book for MySQL, Oracle and SQL Server (in that order).

There are hacks for all levels and all abilities. If you’re a beginner who cannot seem to read the manual, or an expert who refers to it all the time, there’s plenty in this book for you. I learned a lot just by having to edit it, and in fact rewrote a couple of important queries using what I learned from the book. I would put it on a list of ‘top 5 books to read if you are a DBA or write SQL’.

Isolation & Concurrency

A few meetups ago, we had Jim Starkey come and speak. During that evening, he asked “Why is MVCC not serializable?” It was a “go and do your homework” kind of question.

So, in order to answer that question, let’s explore the concepts of isolation and concurrency (isolation is the “I” in ACID compliance). For this post, I will refer to “transactions”, by which I mean actual transactions as well as single statements run with autocommit enabled. In other words, every statement is its own transaction unless it is within an explicit transaction, and for the purpose of this article, those statements are “transactions”. Of course, for many of the explanations below, if there’s only one statement in the transaction, there’s no problem. (As usual, if I make a mistake, please correct me. I will make sure all comments from all sources get to the article at http://www.sheeri.net/123 .)
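
To make that concrete, here is a minimal sketch of both kinds of “transaction” (the table name t is hypothetical):

-- With autocommit enabled, a single statement is its own transaction:
UPDATE t SET val = 'x' WHERE id = 1;

-- An explicit transaction groups several statements into one unit:
START TRANSACTION;
UPDATE t SET val = 'y' WHERE id = 1;
UPDATE t SET val = 'z' WHERE id = 2;
COMMIT;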

Isolation levels define how separate the transaction data is from the actual data in the table. So if I begin a transaction, read a row from the table, change the row, read the row from the table, and commit, what do I see at each step? What does someone else who is also reading the same row see?

Isolation Level: Read Uncommitted

#  Thread 1 (changing data)       Thread 2 (reading data)
0  Read row: sees original data   Read row: sees original data
1  Change row: changes data       N/A
2  Read row: sees new data        Read row: sees new data
3  Commit: commits data           N/A
4  Read row: sees new data        Read row: sees new data

So read uncommitted is the lowest level of isolation, in that there is none. This is how statements that are implicit transactions act — you see the data right away, even if what you are doing is logically a transaction. This is also called a “dirty read” because uncommitted information is seen right away. If the transaction rolled back and ‘undid’ the change, the original data would be back for all to see.

More isolated than read uncommitted is “read committed”, which, as you can guess, means that thread 2 can only read committed changes.

The cell marked (*) is the difference between read uncommitted and read committed:

Isolation Level: Read Committed

#  Thread 1 (changing data)       Thread 2 (reading data)
0  Read row: sees original data   Read row: sees original data
1  Change row: changes data       N/A
2  Read row: sees new data        Read row: sees original data (*)
3  Commit: commits data           N/A
4  Read row: sees new data        Read row: sees new data

However, this means that for the duration of thread 2’s transaction, the data may change. This is not necessarily desirable; it is known as a “non-repeatable read”, because the state of the data changes during thread 2’s transaction.

The next level, repeatable read, fixes that: if thread 1 commits a change, thread 2 does not see it until it reads the changed row in a transaction that started after the commit.

The cells marked (*) are the difference between read uncommitted and repeatable read:

Isolation Level: Repeatable Read

#  Thread 1 (changing data)       Thread 2 (reading data)
0  Read row: sees original data   Read row: sees original data
1  Change row: changes data       N/A
2  Read row: sees new data        Read row: sees original data (*)
3  Commit: commits data           N/A
4  Read row: sees new data        Read row: sees original data (*)
5  N/A                            Starts a new transaction
6  Read row: sees new data        Read row: sees new data

This sounds like a good level of isolation. However, there are many cases where a transaction will want the most recent available data. For instance, if two people who have a joint bank account take out money at the same time, having two systems start with the same balance, update the new balance and then write the new balance to the database will cause the bank to lose a lot of money, as only one of the two withdrawals will be reflected.
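
In SQL terms, that lost update looks something like this (table and numbers are hypothetical):

-- Session 1:
SELECT balance FROM accounts WHERE id = 1;       -- reads 100
-- Session 2:
SELECT balance FROM accounts WHERE id = 1;       -- also reads 100
-- Session 1 withdraws 40 and writes the balance it computed:
UPDATE accounts SET balance = 60 WHERE id = 1;
-- Session 2 withdraws 70 and overwrites session 1's update:
UPDATE accounts SET balance = 30 WHERE id = 1;   -- the 40 withdrawal is lost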

The serializable isolation level helps prevent this — it does not allow a read of data that is being written to. In other words, there is a serial order to transactions, even reads.

The cells marked (*) are the difference between repeatable read and serializable:

Isolation Level: Serializable

#  Thread 1 (changing data)       Thread 2 (reading data)
0  Read row: sees original data   Read row: sees original data
1  Change row: changes data       N/A
2  Read row: sees new data        Not allowed to start the read transaction (*)
3  Commit: commits data           N/A
4  Read row: sees new data        Allowed to start the read transaction: sees new data (*)
5  N/A                            Starts a new transaction
6  Read row: sees new data        Read row: sees new data

I have not yet addressed how these restrictions are enforced; that is what concurrency control is about.

Concurrency control is how these different working environments exist at the same time. Folks who know about version control understand concurrency issues. One way to control concurrency is to use READ and WRITE locks. To achieve the read uncommitted isolation level, there are no locks. For read committed, a read lock on the data allows multiple threads to read the data, but no thread may write to it; a write lock does not allow reading until the lock is released. For repeatable read, a read lock means that the thread always sees the same data even if it is changed on disk. For serializable, only one lock can be on the data at a time.
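
As a rough illustration of READ and WRITE locks at the table level (the table name t is hypothetical):

LOCK TABLES t READ;    -- any session may read t; no session may write to it
-- ... read-only work here ...
UNLOCK TABLES;

LOCK TABLES t WRITE;   -- this session may read and write t; other sessions block
-- ... read/write work here ...
UNLOCK TABLES;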

Those who deal with version control know that there is a better system than those locks. Imagine not being able to look at a code file while someone was changing it! In many version control systems, each user has a working copy where changes can be made, which is hidden from everyone else until commit. That is multi-version concurrency control (MVCC) — there are multiple versions, all out there at once, concurrently. MySQL employs MVCC.

The serializable isolation level makes more sense using the MVCC model — before a software developer commits, they usually update their files to see if other changes have been made. This does not quite carry over to databases.

Within a multi-statement transaction (for storage engines with support), the database gives a snapshot of the data to the transaction to work with, much like checking out a version. The physical database may be changing with other commits, but much like version control in software, you only see the data you ‘checked out’ for the duration of the transaction.

Therefore, all isolation levels up to repeatable read are easy to enforce, but serializable is more difficult. There is no way to “update” the snapshot of data within a transaction, to use what’s currently in the database; you can only look at the data as it was at the beginning of the transaction (repeatable read isolation level).

So that’s why MVCC in a database is not serializable.

To be clear: You can set the isolation level to be SERIALIZABLE in MySQL.

See:
http://dev.mysql.com/doc/refman/5.0/en/innodb-transaction-isolation.html
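
For completeness, here is how the level is set; under SERIALIZABLE, InnoDB implicitly converts plain SELECTs to SELECT ... LOCK IN SHARE MODE, so concurrent writers block (the table name t is hypothetical):

SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT val FROM t WHERE id = 1;  -- implicitly LOCK IN SHARE MODE under SERIALIZABLE
COMMIT;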


One TRIGGER Fact, Optimizer Rewriting Stuff

  • One thing about TRIGGERs is that you can have six triggers per table, one for each combination of timing and event:
    BEFORE INSERT, AFTER INSERT
    BEFORE UPDATE, AFTER UPDATE
    BEFORE DELETE, AFTER DELETE

    That was brought up at the User Group meeting.
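
    As a minimal sketch, here are two of the six (table and column names are hypothetical):

    DELIMITER //
    CREATE TRIGGER t_bi BEFORE INSERT ON t
    FOR EACH ROW
    BEGIN
        -- runs before each row is inserted; can modify the incoming values
        SET NEW.created_at = NOW();
    END//
    CREATE TRIGGER t_ai AFTER INSERT ON t
    FOR EACH ROW
    BEGIN
        -- runs after each row is inserted; NEW is read-only here
        INSERT INTO audit_log (msg) VALUES (CONCAT('inserted id ', NEW.id));
    END//
    DELIMITER ;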

    In other news, I have been reading the MySQL Internals Manual. The most interesting chapter yet is Chapter 3, the optimizer. This can be read out of order, and actually contains lots of information that folks can use. For instance:

    WHERE col1=col2 AND col2='x'
    transforms to
    WHERE col1='x' AND col2='x'

    because constant matching is faster.

    Tautologies and always-false statements are usually optimized out of the query so that something like WHERE 0=0 does not waste time by actually being tested.

    A table with 0 or 1 rows is a constant value. Another constant table scenario is WHERE col1=[constant] for all the columns of a UNIQUE or PRIMARY key, assuming they are defined as NOT NULL. This is a good reason to use NOT NULL in column definitions unless you really need NULL.

    A constant table is what is used by the optimizer when you see “const” as the “join type” in the result of an EXPLAIN.

    WHERE col1 IN (1,2,3) is the exact same expression as WHERE col1=1 OR col1=2 OR col1=3

    WHERE col1 BETWEEN 1 AND 3 is the exact same expression as WHERE col1>=1 AND col1<=3

    The latter two examples are good to know, because the first of the two statements in each case is clearer.
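
    And to see the constant-table optimization from the client side, EXPLAIN a primary-key lookup (table and value are hypothetical; output abbreviated):

    EXPLAIN SELECT * FROM users WHERE id = 42;
    -- type: const, key: PRIMARY, rows: 1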

  • Pros and Cons of Descriptive Foreign Keys?

    Given the following scenario:

    Addresses
    name     group      city
    Johnny   Friends    Cambridge
    Kevin    Friends    Allston
    Justin   Coworkers  Easton
    Aya      Coworkers  North Andover
    Josh     Family     Groton
    Connie   Family     Easton

    In a database, that could be represented as such:

    CREATE TABLE GroupNum (
    groupId TINYINT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
    groupName VARCHAR(20)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    CREATE TABLE AddressNum (
    firstName VARCHAR(20),
    groupId TINYINT UNSIGNED NOT NULL,
    city VARCHAR(20),
    KEY (groupId),
    FOREIGN KEY (groupId) REFERENCES GroupNum (groupId)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    INSERT INTO GroupNum (groupName) VALUES ('Friends'),('Coworkers'),('Family');
    INSERT INTO AddressNum (firstName,groupId,city) VALUES
    ('Johnny',1,'Cambridge'),
    ('Kevin',1,'Allston'),
    ('Justin',2,'Easton'),
    ('Aya',2,'North Andover'),
    ('Josh',3,'Groton'),
    ('Connie',3,'Easton');

    SELECT firstName,groupName,city
    FROM AddressNum INNER JOIN GroupNum USING (groupId);

    firstName  groupName  city
    Johnny     Friends    Cambridge
    Kevin      Friends    Allston
    Justin     Coworkers  Easton
    Aya        Coworkers  North Andover
    Josh       Family     Groton
    Connie     Family     Easton

    or, using descriptive foreign keys:

    CREATE TABLE GroupText (
    groupName VARCHAR(20) NOT NULL PRIMARY KEY
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    CREATE TABLE AddressText (
    firstName VARCHAR(20),
    groupName VARCHAR(20) NOT NULL,
    city VARCHAR(20),
    KEY (groupName),
    FOREIGN KEY (groupName) REFERENCES GroupText (groupName)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    INSERT INTO GroupText (groupName) VALUES ('Friends'),('Coworkers'),('Family');
    INSERT INTO AddressText (firstName,groupName,city) VALUES
    ('Johnny','Friends','Cambridge'),
    ('Kevin','Friends','Allston'),
    ('Justin','Coworkers','Easton'),
    ('Aya','Coworkers','North Andover'),
    ('Josh','Family','Groton'),
    ('Connie','Family','Easton');

    SELECT firstName,groupName,city
    FROM AddressText;

    firstName  groupName  city
    Johnny     Friends    Cambridge
    Kevin      Friends    Allston
    Justin     Coworkers  Easton
    Aya        Coworkers  North Andover
    Josh       Family     Groton
    Connie     Family     Easton

    So what are the pros and cons of using descriptive foreign keys, as opposed to surrogate keys?

    Con:
    1) A numeric auto-increment surrogate key is faster to reference than a string, so INSERTs using a descriptive (string) foreign key will be slightly slower.
    2) Data redundancy. The descriptive value is stored in every referencing row, and the foreign keys need to be updated (and/or deleted) whenever the referenced data changes.

    Pro:
    1) Using a surrogate key means a JOIN must be made to get user-readable data (i.e., the text). Using a descriptive foreign key, no JOIN is needed to get user-readable data. This applies to reading with SELECTs as well as INSERTing or modifying data. For example, to insert into AddressNum, the groupId is needed.
    2) Updates/deletes on a foreign key can be set to cascade.
    3) Human-readable data in the database. This makes it easier for the DBA to debug data issues: they can use SHOW CREATE TABLE to find the foreign key references, then get a sample of data from one table and understand the scenario, without having to JOIN.
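
    To get pro #2 with the descriptive key, the foreign key would be declared with cascading actions; a sketch based on the AddressText table above:

    CREATE TABLE AddressText (
    firstName VARCHAR(20),
    groupName VARCHAR(20) NOT NULL,
    city VARCHAR(20),
    KEY (groupName),
    FOREIGN KEY (groupName) REFERENCES GroupText (groupName)
    ON UPDATE CASCADE ON DELETE CASCADE
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    -- Renaming a group now propagates to every referencing row:
    UPDATE GroupText SET groupName = 'Colleagues' WHERE groupName = 'Coworkers';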

    My thinking right now is that I would use a descriptive foreign key where the foreign key is “static” data, or where the foreign key references are used in a small number of tables.

    For instance, in a user-based application, I would not use “username” as a descriptive foreign key, I would stick with a surrogate key of “uid”, because the foreign key is ubiquitous, and if a user changed their username, every single table would need to cascade — or else the update would fail.

    However, I would use descriptive foreign keys for “static” data like locale (language) data, country and state names, because they likely will not change, even though they touch many tables. Although note that in these places, I’m also using standards — “locale” for language, ISO country and state/province codes, etc.

    I would also use it in the scenario above, where the data is not static, but also does not touch many tables — the group name only affects the address table if it gets changed.

    So, what are the pros and cons of this method? When would you and would you not use it?

    Declarative Language: Transactions

    Developers* seem to think of a transaction as a block of code. A try/catch block seems analogous at first glance — try to do a series of actions, and if one fails, roll them all back, else commit. So many people trip up and try to write code within a transaction to avoid deadlock or race conditions.

    This is the source of the problems with the commands that perform an implicit COMMIT (that links to the 4.1-and-under documentation; be sure to check the manual for the specific version you are using). Implicit COMMITs can introduce a lot of unexpected bugs, so use those commands wisely. Transactions are session-specific, so a COMMIT, implicit or otherwise, affects any existing transaction within that session — and since START TRANSACTION performs an implicit commit, you can only work on one transaction at a time in a session. The SQL standard provides the START TRANSACTION, ROLLBACK and COMMIT syntax.
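
    A sketch of how an implicit COMMIT can bite (table name hypothetical; ALTER TABLE is one of the statements that implicitly commits):

    START TRANSACTION;
    INSERT INTO orders (id) VALUES (1);
    ALTER TABLE orders ADD COLUMN note VARCHAR(50);  -- implicitly commits the INSERT above
    ROLLBACK;  -- too late: the INSERT can no longer be undone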

    However, a transaction really should be treated as a command. The point of a transaction is to be atomic: it is designed to put a few tasks into a natural group. The beauty of a declarative language like SQL is that I tell it what I want, not how to do it; the optimizer figures out how. In procedural and object-oriented programming, I specify how to do something.
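
    A minimal sketch of such a natural group (the accounts table is hypothetical):

    START TRANSACTION;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    COMMIT;  -- the transfer happens as one unit, or not at all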

    Trying to fix a race condition or deadlock by using locking within a transaction defeats the point of a transaction. Locking a table, changing the schema, or starting another transaction, is not a natural grouping. I keep using “natural” but that is my made-up term for it.

    Think of a joint bank account — if one person uses their bank card to pay at a restaurant, the waiter will go to the terminal and authorize an amount — larger than the bill, probably 30% larger, to ensure that when the gratuity is added, there is enough money in the account to cover it. After the tip is added and the bill signed, the receipt gets put into a pile, and once a day a batch is run, which is when the money actually gets taken out of your account.

    Now imagine the other person fills their tank with gasoline. When the bank card is swiped at the gas pump, an amount is authorized**, and the driver is allowed to start filling the tank. At the end of the fill, the exact amount is determined and sent off to the bank for processing (or done in a batch, as above).

    What banks and credit cards do is authorize the amount and “hold” it — this is why the “available” bank balance may be different than the “actual” bank balance. That is the “natural” way to avoid a race condition. What developers try to do by adding “unnatural” statements such as locking tables, or transactions within transactions, is analogous to a bank skipping authorization and not allowing any transactions to occur until a payment clears.

    This is obviously primitive to folks who want to purchase more than one item per day with their card. And yet, developers foam at the mouth, cursing the database engine (all have implicit commits, MySQL is not alone) for not doing what they tell it to. Of course the database engine does not do what you tell it to — it is not designed to. It is designed to give you the answer you want, not follow the path you specify.

    So the moral of the story is that care should be taken that transactions actually encompass the smallest atomic operation, not just “things to group in a block”. Stored procedures actually help with getting around the issues folks have with implicit commits; in fact, while researching this article I found that the MySQL Reference Manual for Version 3.23.10 alpha states:

    For the moment, we are much more for implementing the SQL server language (something like stored procedures). With this you would very seldom really need COMMIT-ROLLBACK. This would also give much better performance.

    * When I say developers, I mean “people who think like developers” meaning they do not grok the declarative nature of SQL.

    ** I found this out the hard way driving from Boston to Chicago and back in a 24′ moving truck. I was filling the tank and the pump stopped filling at $50, and I had to re-swipe my debit card.

    COMMIT and Non-Transactional Tables

    At the July MySQL User Group Meeting, Jim Starkey wondered aloud, “What happens when I COMMIT on a memory table?” I wrote the question down, to research it later.

    The obvious answer is “COMMIT on a non-transactional table does nothing.”

    Tonight I was thinking about this, and I realized I do not actually COMMIT “on a table.”

    The manual page at http://dev.mysql.com/doc/refman/4.1/en/commit.html (and the 5.0 and 5.1 equivalents) states:

    By default, MySQL runs with autocommit mode enabled. This means that as soon as you execute a statement that updates (modifies) a table, MySQL stores the update on disk.

    If you are using a transaction-safe storage engine (such as InnoDB, BDB, or NDB Cluster), you can disable autocommit mode with the following statement:

    SET AUTOCOMMIT=0;

    But what does “If you are using a transaction-safe storage engine” really mean? I ask because I do not specify a table type with that command. So I tried it on a fresh MySQL install (5.0.19 GA, on a Red Hat Enterprise Linux machine):

    [skritzer]$ mysql -u root
    Welcome to the MySQL monitor. Commands end with ; or \g.
    Your MySQL connection id is 1 to server version: 5.0.19-standard
    Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

    mysql> select distinct table_schema,engine from information_schema.tables\G
    *************************** 1. row ***************************
    table_schema: information_schema
    engine: MEMORY
    *************************** 2. row ***************************
    table_schema: information_schema
    engine: MyISAM
    *************************** 3. row ***************************
    table_schema: mysql
    engine: MyISAM
    3 rows in set (0.00 sec)

    mysql> START TRANSACTION;
    Query OK, 0 rows affected (0.00 sec)

    mysql> COMMIT;
    Query OK, 0 rows affected (0.00 sec)

    mysql> SET autocommit=0;
    Query OK, 0 rows affected (0.00 sec)

    mysql> START TRANSACTION;
    Query OK, 0 rows affected (0.00 sec)

    mysql> COMMIT;
    Query OK, 0 rows affected (0.00 sec)

    mysql> SET autocommit=1;
    Query OK, 0 rows affected (0.00 sec)

    As you can see, this machine has no user-created tables (seriously, it was just installed today; that connection id of 1 is because it was the first time anyone logged into it). From my playing around, I can extrapolate that “If you are using a transaction-safe storage engine” really means “If you have a transaction-safe storage engine enabled”.

    Let’s try it out:


    mysql> show engines\G
    *************************** 1. row ***************************
    Engine: MyISAM
    Support: DEFAULT
    Comment: Default engine as of MySQL 3.23 with great performance
    *************************** 2. row ***************************
    Engine: MEMORY
    Support: YES
    Comment: Hash based, stored in memory, useful for temporary tables
    *************************** 3. row ***************************
    Engine: InnoDB
    Support: YES
    Comment: Supports transactions, row-level locking, and foreign keys
    *************************** 4. row ***************************
    Engine: BerkeleyDB
    Support: NO
    Comment: Supports transactions and page-level locking
    *************************** 5. row ***************************
    Engine: BLACKHOLE
    Support: NO
    Comment: /dev/null storage engine (anything you write to it disappears)
    *************************** 6. row ***************************
    Engine: EXAMPLE
    Support: NO
    Comment: Example storage engine
    *************************** 7. row ***************************
    Engine: ARCHIVE
    Support: YES
    Comment: Archive storage engine
    *************************** 8. row ***************************
    Engine: CSV
    Support: NO
    Comment: CSV storage engine
    *************************** 9. row ***************************
    Engine: ndbcluster
    Support: NO
    Comment: Clustered, fault-tolerant, memory-based tables
    *************************** 10. row ***************************
    Engine: FEDERATED
    Support: NO
    Comment: Federated MySQL storage engine
    *************************** 11. row ***************************
    Engine: MRG_MYISAM
    Support: YES
    Comment: Collection of identical MyISAM tables
    *************************** 12. row ***************************
    Engine: ISAM
    Support: NO
    Comment: Obsolete storage engine
    12 rows in set (0.00 sec)

    mysql> exit
    Bye
    [skritzer]$ ls -l /etc/my.cnf
    ls: /etc/my.cnf: No such file or directory
    [skritzer]$ sudo vi /etc/my.cnf

    We trust you have received the usual lecture from the local System
    Administrator. It usually boils down to these two things:

    #1) Respect the privacy of others.
    #2) Think before you type.

    Password:
    [mysqld]
    skip-innodb
    :wq
    "/etc/my.cnf" 2L, 21C written
    [skritzer]$ sudo /etc/init.d/mysql stop
    Shutting down MySQL.. [ OK ]
    [skritzer]$ ps -ef | grep mysql
    skritzer 7253 7179 0 21:07 pts/0 00:00:00 grep mysql
    [skritzer]$ sudo /etc/init.d/mysql start
    Starting MySQL[ OK ]
    [skritzer]$ ps -ef | grep mysql
    root 7261 1 0 21:07 pts/0 00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --pid-file=/var/lib/mysql/mysql.pid
    mysql 7282 7261 0 21:07 pts/0 00:00:00 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --pid-file=/var/lib/mysql/mysql.pid --skip-locking
    skritzer 7290 7179 0 21:07 pts/0 00:00:00 grep mysql
    [skritzer]$ mysql -u root
    Welcome to the MySQL monitor. Commands end with ; or \g.
    Your MySQL connection id is 1 to server version: 5.0.19-standard

    Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

    mysql> show engines\G
    *************************** 1. row ***************************
    Engine: MyISAM
    Support: DEFAULT
    Comment: Default engine as of MySQL 3.23 with great performance
    *************************** 2. row ***************************
    Engine: MEMORY
    Support: YES
    Comment: Hash based, stored in memory, useful for temporary tables
    *************************** 3. row ***************************
    Engine: InnoDB
    Support: YES
    Comment: Supports transactions, row-level locking, and foreign keys
    *************************** 4. row ***************************
    Engine: BerkeleyDB
    Support: NO
    Comment: Supports transactions and page-level locking
    *************************** 5. row ***************************
    Engine: BLACKHOLE
    Support: NO
    Comment: /dev/null storage engine (anything you write to it disappears)
    *************************** 6. row ***************************
    Engine: EXAMPLE
    Support: NO
    Comment: Example storage engine
    *************************** 7. row ***************************
    Engine: ARCHIVE
    Support: YES
    Comment: Archive storage engine
    *************************** 8. row ***************************
    Engine: CSV
    Support: NO
    Comment: CSV storage engine
    *************************** 9. row ***************************
    Engine: ndbcluster
    Support: NO
    Comment: Clustered, fault-tolerant, memory-based tables
    *************************** 10. row ***************************
    Engine: FEDERATED
    Support: NO
    Comment: Federated MySQL storage engine
    *************************** 11. row ***************************
    Engine: MRG_MYISAM
    Support: YES
    Comment: Collection of identical MyISAM tables
    *************************** 12. row ***************************
    Engine: ISAM
    Support: NO
    Comment: Obsolete storage engine
    12 rows in set (0.00 sec)

    mysql> SET autocommit=0;
    Query OK, 0 rows affected (0.00 sec)

    mysql> START TRANSACTION;
    Query OK, 0 rows affected (0.00 sec)

    mysql> COMMIT;
    Query OK, 0 rows affected (0.00 sec)

    mysql> SET autocommit=1;
    Query OK, 0 rows affected (0.00 sec)

    mysql> START TRANSACTION;
    Query OK, 0 rows affected (0.00 sec)

    mysql> COMMIT;
    Query OK, 0 rows affected (0.00 sec)

    No errors or warnings are generated. So technically speaking, I can SET autocommit=0 no matter what storage engines are in use, for any of the varying definitions of “in use”.
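
    What actually happens with a non-transactional table is that its changes take effect immediately, and a later ROLLBACK cannot undo them. A sketch of the behavior (warning text abbreviated):

    mysql> CREATE TABLE t_myisam (id INT) ENGINE=MyISAM;
    mysql> SET autocommit=0;
    mysql> START TRANSACTION;
    mysql> INSERT INTO t_myisam VALUES (1);
    mysql> ROLLBACK;
    -- warning 1196: Some non-transactional changed tables couldn't be rolled back
    mysql> SELECT * FROM t_myisam;
    -- the row is still there: MyISAM wrote it immediately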

    Twins and PlanetMySQL

    What is it about the folks on Planet MySQL having twin brothers?

    Roland Bouman
    Sheeri Kritzer
    Jay Pipes
    Zach Urlocker

    That’s 4 of the top 25 posters to Planet MySQL. Anyone else want to reveal having a twin? Anyone on here have a twin sister? If you’re a twin and aren’t on the Planet, note that here too…
