Updated Testimonials Page

Certification Validation
certificate id: 74651
password: sheeri

4.0 Core Certification Validation

Co-workers have said:

“It is with mixed feelings that I am writing today to tell you that Sheeri has accepted a job in the private sector and will be leaving Tufts on 22 February.

It’s hard to let go of a bright, cheerful and highly energetic co-worker like Sheeri. She always brings a lot of enthusiasm to bear on new projects and works hard at them even when they have lost their novelty and started to become tedious. Sheeri always puts her best effort into everything she takes on, so for purely selfish reasons I’m sorry to see her go.” — Bob Webber, Manager, University Systems Group, Tufts University

“Sheeri’s collaborative approach to systems design and administration ensured that relevant details came forth in the planning rather than the testing or production stage of a project. Her care to get a job done well and prompt handling of issues enabled the team I led to be more productive than we had been under any other system administrator at Tufts.” http://tinyurl.com/q7fzv — must log in to linkedin.com to view the original.

Clients have said:
“Your [technical] review has been really thorough, and you’ve gone above and beyond and done some things to make my job a lot easier!” — Brian Jepson, Editor, O’Reilly Media, Inc.

Other sites have said:
“Sheeri Kritzer is a certified MySQL administrator with an active interest and involvement in MySQL and the MySQL community. In her blog she discusses database security and performance in a practical and competent manner, while also covering events she is participating in as well as more general issues of interest to users and developers of MySQL. The blog seems to be updated regularly, and it contains many entries – all of which are well written and more extensive than one perhaps has come to expect from blogs today.” — Bill, the founder of PHPCentral.org at http://www.phpcentral.com/49-notable-mysql-blogs-reviews.html

Boston MySQL Meetup Group members have said the following about Sheeri’s presentations:

June 2006 Meetup
“Very KISS (keep it simple stupid). Presentation seemed accessible to people of different levels of technical knowledge; Jargon was kept to a minimum.”

March 2006 Meetup
“very interesting and informative”

January 2006 Meetup
“I found the meeting informative with many tips on backup techniques. Thank you for sharing your experience and for the pizza!”

“Very focused and informative presentation.”

“I thought it went rather well, I like the structure.”

Example of Query Clarity with BETWEEN

A while back I posted some optimizer hints and recommended using them to write more readable queries. Today I stumbled across an example of my own work that I changed. This is not quite a case of using BETWEEN instead of AND statements, but it sort of is. Basically, I needed to reuse this query:

SELECT COUNT(*), city,
CASE WHEN lastLogin < NOW() - INTERVAL 90 DAY THEN 90
     WHEN lastLogin < NOW() - INTERVAL 60 DAY THEN 60
     WHEN lastLogin < NOW() - INTERVAL 30 DAY THEN 30
     WHEN lastLogin > NOW() - INTERVAL 30 DAY THEN 0
     ELSE -1
END
FROM . . .

I wrote this query less than a month ago.

I looked at this query today and wondered, “why did I not put lastLogin < NOW() - INTERVAL 60 DAY AND lastLogin > NOW() - INTERVAL 90 DAY in there?” I then realized what I did.

Because the CASE statement evaluates in order, to make the query more concise I used a standard procedural programming trick — I put the older cases first. In this way, something older than 90 days gets caught in the first case, something between 60 and 90 days old gets caught in the second case, and so on.

This is great, and probably even standard for a procedural language. And in fact, MySQL’s CASE statement works the same.

However, it is much more clear (in my mind) to write:

SELECT COUNT(*), city,
CASE WHEN lastLogin < (NOW() - INTERVAL 90 DAY) THEN 90
     WHEN lastLogin BETWEEN (NOW() - INTERVAL 90 DAY) AND (NOW() - INTERVAL 60 DAY) THEN 60
     WHEN lastLogin BETWEEN (NOW() - INTERVAL 60 DAY) AND (NOW() - INTERVAL 30 DAY) THEN 30
     WHEN lastLogin > (NOW() - INTERVAL 30 DAY) THEN 0
     ELSE -1
END
FROM . . .

It’s more characters, which is a little more typing, but I feel it is more readable, because it really does spell out declaratively what I am looking for.
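
Note that BETWEEN expects the smaller value first, so the bounds must be written low-to-high (the older date goes first). A quick sanity check in the mysql client:

SELECT 2 BETWEEN 1 AND 3;  -- returns 1 (true)
SELECT 2 BETWEEN 3 AND 1;  -- returns 0 (false): reversed bounds never match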

Things to Blog About

Cardinality.

Also a blog post on “using the sakila sample database”

Speeding-up Queries: New Features of the MySQL 5.0 Query Engine — Katchaounov — blog about it, find the slides, explain well.

new fulltext parser like sergei’s talk. dayum!

Things I learned
Tuning server params is a last resort, AFTER application optimization (from Jay)

http://dev.mysql.com/doc/refman/5.0/en/communication-errors.html

http://lists.mysql.com/mysql/196677 and my response.

nested set/adjacency hybrid code from roland

FOUND_ROWS
ROW_COUNT
SQL_CALC_FOUND_ROWS — calculate how many rows there would be without the LIMIT (used as SELECT SQL_CALC_FOUND_ROWS …; fetch the total afterwards with FOUND_ROWS()).
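
A minimal sketch of how those fit together (table and column names are hypothetical):

SELECT SQL_CALC_FOUND_ROWS id, name
FROM users
WHERE active = 1
ORDER BY name
LIMIT 10;

-- total number of matching rows, ignoring the LIMIT:
SELECT FOUND_ROWS();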

http://firestuff.org/wordpress/2006/07/20/sqlsearch-100/

crosstab: http://rpbouman.blogspot.com/2005/10/creating-crosstabs-in-mysql.html
http://www.futhark.ch/mysql/106.html

(sun has a db tech group — neat, ponder a job?)

What about composite searches — and, or, not? Pivot tables? Crosstabs? SELECTs and defaults, as well as multiple selections? Newsome’s “building sane query interfaces” — get the code from newsome@acm.org (emailed 8/16)

‘mysqld --verbose --help’
how to find out how it was compiled?

New Book Available: SQL Hacks

Like many others in the “Hacks” series, this book offers 100 tips and tricks for making the most of your database through querying.

See more information at:
http://www.oreilly.com/catalog/sqlhks/

It is about to be published (Nov. 2006). I believe you can now order the final cut (as opposed to the “rough cut”), although last week they were still advertising it as a rough cut. You can order it online, as a print book or to read on Safari.

As a technical editor for this book, I have read through all of it (or at least, all of it that was in the draft I received) and I can say it is definitely worth the $30 USD. Most of the examples at the time I read it were in MySQL syntax (though they might have changed that to be more fair to other databases); while it has some material for DB2 and Access, it’s mostly a book for MySQL, Oracle and SQL Server (in that order).

There are hacks for all levels and all abilities. If you’re a beginner who cannot seem to read the manual, or an expert who refers to it all the time, there’s plenty in this book for you. I learned a lot just by having to edit it, and in fact rewrote a couple of important queries using what I learned from the book. I would put it on a list of ‘top 5 books to read if you are a DBA or write SQL’.

Isolation & Concurrency

A few meetups ago, we had Jim Starkey come and speak. During that evening, he asked “Why is MVCC not serializable?” It was a “go and do your homework” kind of question.

So, in order to answer that question, let’s explore the concepts of isolation and concurrency (isolation is the “I” in ACID compliance). For this post, I will refer to “transactions”, by which I mean actual transactions as well as single statements run with autocommit enabled. In other words, every statement is its own transaction unless it is within an explicit transaction, and for the purpose of this article, those statements are “transactions”. Of course, for many of the explanations below, if there’s only one statement in the transaction, there’s no problem. (As usual, if I make a mistake, please correct me. I will make sure all comments from all sources get to the article at http://www.sheeri.net/123 .)
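
To make that concrete, here is a minimal sketch of both kinds of “transaction” (the table name t is hypothetical):

-- With autocommit enabled, a single statement is its own transaction:
UPDATE t SET val = 'x' WHERE id = 1;

-- An explicit transaction groups several statements into one unit:
START TRANSACTION;
UPDATE t SET val = 'y' WHERE id = 1;
UPDATE t SET val = 'z' WHERE id = 2;
COMMIT;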

Isolation levels define how separate the transaction data is from the actual data in the table. So if I begin a transaction, read a row from the table, change the row, read the row from the table, and commit, what do I see at each step? What does someone else who is also reading the same row see?

Isolation Level: Read Uncommitted

#  Thread 1 (changing data)       Thread 2 (reading data)
0  Read row: sees original data   Read row: sees original data
1  Change row: changes data       N/A
2  Read row: sees new data        Read row: sees new data
3  Commit: commits data           N/A
4  Read row: sees new data        Read row: sees new data

So read uncommitted is the lowest level of isolation, in that there is none. This is how statements that are implicit transactions act — you see the data right away, even if what you are doing is logically a transaction. This is also called a “dirty read” because uncommitted information is seen right away. If the transaction rolled back and ‘undid’ the change, the original data would be back for all to see.

More isolated than read uncommitted is “read committed”, which, as you can guess, means that thread 2 can only read committed changes.

The cell marked (*) is the difference between read uncommitted and read committed:

Isolation Level: Read Committed

#  Thread 1 (changing data)       Thread 2 (reading data)
0  Read row: sees original data   Read row: sees original data
1  Change row: changes data       N/A
2  Read row: sees new data        Read row: sees original data (*)
3  Commit: commits data           N/A
4  Read row: sees new data        Read row: sees new data

However, this means that for the duration of thread 2’s transaction, the data may change. This is not necessarily desirable; it is known as a “non-repeatable read”, because the state of the data changes during thread 2’s transaction.

The next level, repeatable read, fixes that: if thread 1 commits a change, thread 2 does not see it until it reads the changed row in a transaction that started after the commit.

The cells marked (*) are the difference between read uncommitted and repeatable read:

Isolation Level: Repeatable Read

#  Thread 1 (changing data)       Thread 2 (reading data)
0  Read row: sees original data   Read row: sees original data
1  Change row: changes data       N/A
2  Read row: sees new data        Read row: sees original data (*)
3  Commit: commits data           N/A
4  Read row: sees new data        Read row: sees original data (*)
5  N/A                            Starts a new transaction
6  Read row: sees new data        Read row: sees new data

This sounds like a good level of isolation. However, there are many cases where a transaction will want the most recent available data. For instance, if two people who have a joint bank account take out money at the same time, having two systems start with the same balance, update the new balance and then write the new balance to the database will cause the bank to lose a lot of money, as only one of the two withdrawals will be reflected.
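
In SQL terms, that lost update looks something like this (table and numbers are hypothetical):

-- Session 1:
SELECT balance FROM accounts WHERE id = 1;       -- reads 100
-- Session 2:
SELECT balance FROM accounts WHERE id = 1;       -- also reads 100
-- Session 1 withdraws 40 and writes the balance it computed:
UPDATE accounts SET balance = 60 WHERE id = 1;
-- Session 2 withdraws 70 and overwrites session 1's update:
UPDATE accounts SET balance = 30 WHERE id = 1;   -- the 40 withdrawal is lost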

The serializable isolation level helps prevent this — it does not allow a read of data that is being written to. In other words, there is a serial order to transactions, even reads.

The cells marked (*) are the difference between repeatable read and serializable:

Isolation Level: Serializable

#  Thread 1 (changing data)       Thread 2 (reading data)
0  Read row: sees original data   Read row: sees original data
1  Change row: changes data       N/A
2  Read row: sees new data        Not allowed to start the read transaction (*)
3  Commit: commits data           N/A
4  Read row: sees new data        Allowed to start the read transaction: sees new data (*)
5  N/A                            Starts a new transaction
6  Read row: sees new data        Read row: sees new data

I have not yet addressed how these restrictions are enforced; that is what concurrency control is about.

Concurrency control is how these different working environments exist at the same time. Folks who know about version control understand concurrency issues. One way to control concurrency is to use READ and WRITE locks. To achieve the read uncommitted isolation level, there are no locks. For read committed, a read lock on the data allows multiple threads to read the data, but no thread may write to it; a write lock does not allow reading until the lock is released. For repeatable read, a read lock means that the thread always sees the same data even if it is changed on disk. For serializable, only one lock can be on the data at a time.
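
As a rough illustration of READ and WRITE locks at the table level (the table name t is hypothetical):

LOCK TABLES t READ;    -- any session may read t; no session may write to it
-- ... read-only work here ...
UNLOCK TABLES;

LOCK TABLES t WRITE;   -- this session may read and write t; other sessions block
-- ... read/write work here ...
UNLOCK TABLES;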

Those who deal with version control know that there is a better system than those locks. Imagine not being able to look at a code file while someone was changing it! In many version control systems, each user has a working copy where changes can be made, which is hidden from everyone else until commit. That is multi-version concurrency control (MVCC) — there are multiple versions, all out there at once, concurrently. MySQL employs MVCC.

The serializable isolation level makes more sense using the MVCC model — before a software developer commits, they usually update their files to see if other changes have been made. This does not quite carry over to databases.

Within a multi-statement transaction (for storage engines with support), the database gives a snapshot of the data to the transaction to work with, much like checking out a version. The physical database may be changing with other commits, but much like version control in software, you only see the data you ‘checked out’ for the duration of the transaction.

Therefore, all isolation levels up to repeatable read are easy to enforce, but serializable is more difficult. There is no way to “update” the snapshot of data within a transaction, to use what’s currently in the database; you can only look at the data as it was at the beginning of the transaction (repeatable read isolation level).

So that’s why MVCC in a database is not serializable.

To be clear: You can set the isolation level to be SERIALIZABLE in MySQL.

See:
http://dev.mysql.com/doc/refman/5.0/en/innodb-transaction-isolation.html
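
For completeness, here is how the level is set; under SERIALIZABLE, InnoDB implicitly converts plain SELECTs to SELECT ... LOCK IN SHARE MODE, so concurrent writers block (the table name t is hypothetical):

SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT val FROM t WHERE id = 1;  -- implicitly LOCK IN SHARE MODE under SERIALIZABLE
COMMIT;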


One TRIGGER Fact, Optimizer Rewriting Stuff

  • One thing about TRIGGERs is that you can have six triggers per table, one for each combination of timing and event:
    BEFORE INSERT, AFTER INSERT
    BEFORE UPDATE, AFTER UPDATE
    BEFORE DELETE, AFTER DELETE

    That was brought up at the User Group meeting.
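
    As a minimal sketch, here are two of the six (table and column names are hypothetical):

    DELIMITER //
    CREATE TRIGGER t_bi BEFORE INSERT ON t
    FOR EACH ROW
    BEGIN
        -- runs before each row is inserted; can modify the incoming values
        SET NEW.created_at = NOW();
    END//
    CREATE TRIGGER t_ai AFTER INSERT ON t
    FOR EACH ROW
    BEGIN
        -- runs after each row is inserted; NEW is read-only here
        INSERT INTO audit_log (msg) VALUES (CONCAT('inserted id ', NEW.id));
    END//
    DELIMITER ;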

    In other news, I have been reading the MySQL Internals Manual. The most interesting chapter yet is Chapter 3, the optimizer. This can be read out of order, and actually contains lots of information that folks can use. For instance:

    WHERE col1=col2 AND col2='x'
    transforms to
    WHERE col1='x' AND col2='x'

    because constant matching is faster.

    Tautologies and always-false statements are usually optimized out of the query so that something like WHERE 0=0 does not waste time by actually being tested.

    A table with 0 or 1 rows is a constant value. Another constant table scenario is WHERE col1=[constant] for all the columns of a UNIQUE or PRIMARY key, assuming they are defined as NOT NULL. This is a good reason to use NOT NULL in column definitions unless you really need NULL.

    A constant table is what is used by the optimizer when you see “const” as the “join type” in the result of an EXPLAIN.

    WHERE col1 IN (1,2,3) is the exact same expression as WHERE col1=1 OR col1=2 OR col1=3

    WHERE col1 BETWEEN 1 AND 3 is the exact same expression as WHERE col1>=1 AND col1<=3

    The latter two examples are good to know, because the first of the two statements in each case is clearer.
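
    And to see the constant-table optimization from the client side, EXPLAIN a primary-key lookup (table and value are hypothetical; output abbreviated):

    EXPLAIN SELECT * FROM users WHERE id = 42;
    -- type: const, key: PRIMARY, rows: 1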

  • Pros and Cons of Descriptive Foreign Keys?

    Given the following scenario:

    Addresses
    name     group      city
    Johnny   Friends    Cambridge
    Kevin    Friends    Allston
    Justin   Coworkers  Easton
    Aya      Coworkers  North Andover
    Josh     Family     Groton
    Connie   Family     Easton

    In a database, that could be represented as such:

    CREATE TABLE GroupNum (
    groupId TINYINT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
    groupName VARCHAR(20)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    CREATE TABLE AddressNum (
    firstName VARCHAR(20),
    groupId TINYINT UNSIGNED NOT NULL,
    city VARCHAR(20),
    KEY (groupId),
    FOREIGN KEY (groupId) REFERENCES GroupNum (groupId)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    INSERT INTO GroupNum (groupName) VALUES ('Friends'),('Coworkers'),('Family');
    INSERT INTO AddressNum (firstName,groupId,city) VALUES
    ('Johnny',1,'Cambridge'),
    ('Kevin',1,'Allston'),
    ('Justin',2,'Easton'),
    ('Aya',2,'North Andover'),
    ('Josh',3,'Groton'),
    ('Connie',3,'Easton');

    SELECT firstName,groupName,city
    FROM AddressNum INNER JOIN GroupNum USING (groupId);

    firstName  groupName  city
    Johnny     Friends    Cambridge
    Kevin      Friends    Allston
    Justin     Coworkers  Easton
    Aya        Coworkers  North Andover
    Josh       Family     Groton
    Connie     Family     Easton

    or, using descriptive foreign keys:

    CREATE TABLE GroupText (
    groupName VARCHAR(20) NOT NULL PRIMARY KEY
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    CREATE TABLE AddressText (
    firstName VARCHAR(20),
    groupName VARCHAR(20) NOT NULL,
    city VARCHAR(20),
    KEY (groupName),
    FOREIGN KEY (groupName) REFERENCES GroupText (groupName)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    INSERT INTO GroupText (groupName) VALUES ('Friends'),('Coworkers'),('Family');
    INSERT INTO AddressText (firstName,groupName,city) VALUES
    ('Johnny','Friends','Cambridge'),
    ('Kevin','Friends','Allston'),
    ('Justin','Coworkers','Easton'),
    ('Aya','Coworkers','North Andover'),
    ('Josh','Family','Groton'),
    ('Connie','Family','Easton');

    SELECT firstName,groupName,city
    FROM AddressText;

    firstName  groupName  city
    Johnny     Friends    Cambridge
    Kevin      Friends    Allston
    Justin     Coworkers  Easton
    Aya        Coworkers  North Andover
    Josh       Family     Groton
    Connie     Family     Easton

    So what are the pros and cons of using descriptive foreign keys, as opposed to surrogate keys?

    Con:
    1) A numeric auto-increment surrogate key is faster to reference than a string, so INSERTs using a descriptive (string) foreign key will be slightly slower.
    2) Data redundancy. The descriptive value is stored in every referencing row, and the foreign keys need to be updated (and/or deleted) whenever the referenced data changes.

    Pro:
    1) Using a surrogate key means a JOIN must be made to get user-readable data (i.e., the text). Using a descriptive foreign key, no JOIN is needed to get user-readable data. This applies to reading with SELECTs as well as INSERTing or modifying data. For example, to insert into AddressNum, the groupId is needed.
    2) Updates/deletes on a foreign key can be set to cascade.
    3) Human-readable data in the database. This makes it easier for the DBA to debug data issues: they can use SHOW CREATE TABLE to find the foreign key references, then get a sample of data from one table and understand the scenario, without having to JOIN.
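
    To get pro #2 with the descriptive key, the foreign key would be declared with cascading actions; a sketch based on the AddressText table above:

    CREATE TABLE AddressText (
    firstName VARCHAR(20),
    groupName VARCHAR(20) NOT NULL,
    city VARCHAR(20),
    KEY (groupName),
    FOREIGN KEY (groupName) REFERENCES GroupText (groupName)
    ON UPDATE CASCADE ON DELETE CASCADE
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    -- Renaming a group now propagates to every referencing row:
    UPDATE GroupText SET groupName = 'Colleagues' WHERE groupName = 'Coworkers';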

    My thinking right now is that I would use a descriptive foreign key where the foreign key is “static” data, or where the foreign key references are used in a small number of tables.

    For instance, in a user-based application, I would not use “username” as a descriptive foreign key, I would stick with a surrogate key of “uid”, because the foreign key is ubiquitous, and if a user changed their username, every single table would need to cascade — or else the update would fail.

    However, I would use descriptive foreign keys for “static” data like locale (language) data, country and state names, because they likely will not change, even though they touch many tables. Although note that in these places, I’m also using standards — “locale” for language, ISO country and state/province codes, etc.

    I would also use it in the scenario above, where the data is not static, but also does not touch many tables — the group name only affects the address table if it gets changed.

    So, what are the pros and cons of this method? When would you and would you not use it?

    Declarative Language: Transactions

    Developers* seem to think of a transaction as a block of code. A try/catch block seems analogous at first glance — try to do a series of actions, and if one fails, roll them all back, else commit. So many people trip up and try to write code within a transaction to avoid deadlock or race conditions.

    This is the source of the problems with the commands that perform an implicit COMMIT (that links to the 4.1-and-under documentation; be sure to check the manual for the specific version you are using). Implicit COMMITs can introduce a lot of unexpected bugs, so use those commands wisely. Transactions are session-specific, so a COMMIT, implicit or otherwise, affects any existing transaction within that session — and since START TRANSACTION performs an implicit commit, you can only work on one transaction at a time in a session. The SQL standard provides the START TRANSACTION, ROLLBACK and COMMIT syntax.
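
    A sketch of how an implicit COMMIT can bite (table name hypothetical; ALTER TABLE is one of the statements that implicitly commits):

    START TRANSACTION;
    INSERT INTO orders (id) VALUES (1);
    ALTER TABLE orders ADD COLUMN note VARCHAR(50);  -- implicitly commits the INSERT above
    ROLLBACK;  -- too late: the INSERT can no longer be undone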

    However, a transaction really should be treated as a command. The point of a transaction is to be atomic: it is designed to put a few tasks into a natural group. The beauty of a declarative language like SQL is that I tell it what I want, not how to do it; the optimizer figures out how. In procedural and object-oriented programming, I specify how to do something.
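
    A minimal sketch of such a natural group (the accounts table is hypothetical):

    START TRANSACTION;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    COMMIT;  -- the transfer happens as one unit, or not at all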

    Trying to fix a race condition or deadlock by using locking within a transaction defeats the point of a transaction. Locking a table, changing the schema, or starting another transaction, is not a natural grouping. I keep using “natural” but that is my made-up term for it.

    Think of a joint bank account — if one person uses their bank card to pay at a restaurant, the waiter will go to the terminal and authorize an amount — larger than the bill, probably 30% larger, to ensure that when the gratuity is added, there is enough money in the account to cover it. After the tip is added and the bill signed, the receipt gets put into a pile, and once a day a batch is run, which is when the money actually gets taken out of your account.

    Now imagine the other person fills their tank with gasoline. When the bank card is swiped at the gas pump, an amount is authorized**, and the driver is allowed to start filling the tank. At the end of the fill, the exact amount is determined and sent off to the bank for processing (or done in a batch, as above).

    What banks and credit cards do is authorize the amount and “hold” it — this is why the “available” bank balance may be different than the “actual” bank balance. That is the “natural” way to avoid a race condition. What developers try to do by adding “unnatural” statements such as locking tables, or transactions within transactions, is analogous to a bank skipping authorization and not allowing any transactions to occur until a payment clears.

    This is obviously primitive to folks who want to purchase more than one item per day with their card. And yet, developers foam at the mouth, cursing the database engine (all have implicit commits, MySQL is not alone) for not doing what they tell it to. Of course the database engine does not do what you tell it to — it is not designed to. It is designed to give you the answer you want, not follow the path you specify.

    So the moral of the story is that care should be taken that transactions actually encompass the smallest atomic operation, not just “things to group in a block”. Stored procedures actually help with getting around the issues folks have with implicit commits; in fact, while researching this article I found that the MySQL Reference Manual for Version 3.23.10 alpha states:

    For the moment, we are much more for implementing the SQL server language (something like stored procedures). With this you would very seldom really need COMMIT-ROLLBACK. This would also give much better performance.

    * When I say developers, I mean “people who think like developers” meaning they do not grok the declarative nature of SQL.

    ** I found this out the hard way driving from Boston to Chicago and back in a 24′ moving truck. I was filling the tank and the pump stopped filling at $50, and I had to re-swipe my debit card.

    COMMIT and Non-Transactional Tables

    At the July MySQL User Group Meeting, Jim Starkey wondered aloud, “What happens when I COMMIT on a memory table?” I wrote the question down, to research it later.

    The obvious answer is “COMMIT on a non-transactional table does nothing.”

    Tonight I was thinking about this, and I realized I do not actually COMMIT “on a table.”

    The manual page at http://dev.mysql.com/doc/refman/4.1/en/commit.html (and the 5.0 and 5.1 equivalents) states:

    By default, MySQL runs with autocommit mode enabled. This means that as soon as you execute a statement that updates (modifies) a table, MySQL stores the update on disk.

    If you are using a transaction-safe storage engine (such as InnoDB, BDB, or NDB Cluster), you can disable autocommit mode with the following statement:

    SET AUTOCOMMIT=0;

    But what does “If you are using a transaction-safe storage engine” really mean? I ask because I do not specify a table type with that command. So I tried it on a fresh MySQL install (5.0.19 GA, on a Red Hat Enterprise Linux machine):

    [skritzer]$ mysql -u root
    Welcome to the MySQL monitor. Commands end with ; or \g.
    Your MySQL connection id is 1 to server version: 5.0.19-standard
    Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

    mysql> select distinct table_schema,engine from information_schema.tables\G
    *************************** 1. row ***************************
    table_schema: information_schema
    engine: MEMORY
    *************************** 2. row ***************************
    table_schema: information_schema
    engine: MyISAM
    *************************** 3. row ***************************
    table_schema: mysql
    engine: MyISAM
    3 rows in set (0.00 sec)

    mysql> START TRANSACTION;
    Query OK, 0 rows affected (0.00 sec)

    mysql> COMMIT;
    Query OK, 0 rows affected (0.00 sec)

    mysql> SET autocommit=0;
    Query OK, 0 rows affected (0.00 sec)

    mysql> START TRANSACTION;
    Query OK, 0 rows affected (0.00 sec)

    mysql> COMMIT;
    Query OK, 0 rows affected (0.00 sec)

    mysql> SET autocommit=1;
    Query OK, 0 rows affected (0.00 sec)

    As you can see, this machine has no user-created tables (seriously, it was just installed today; that connection id of 1 is because it was the first time anyone logged into it). From my playing around, I can extrapolate that “If you are using a transaction-safe storage engine” really means “If you have a transaction-safe storage engine enabled”.

    Let’s try it out:


    mysql> show engines\G
    *************************** 1. row ***************************
    Engine: MyISAM
    Support: DEFAULT
    Comment: Default engine as of MySQL 3.23 with great performance
    *************************** 2. row ***************************
    Engine: MEMORY
    Support: YES
    Comment: Hash based, stored in memory, useful for temporary tables
    *************************** 3. row ***************************
    Engine: InnoDB
    Support: YES
    Comment: Supports transactions, row-level locking, and foreign keys
    *************************** 4. row ***************************
    Engine: BerkeleyDB
    Support: NO
    Comment: Supports transactions and page-level locking
    *************************** 5. row ***************************
    Engine: BLACKHOLE
    Support: NO
    Comment: /dev/null storage engine (anything you write to it disappears)
    *************************** 6. row ***************************
    Engine: EXAMPLE
    Support: NO
    Comment: Example storage engine
    *************************** 7. row ***************************
    Engine: ARCHIVE
    Support: YES
    Comment: Archive storage engine
    *************************** 8. row ***************************
    Engine: CSV
    Support: NO
    Comment: CSV storage engine
    *************************** 9. row ***************************
    Engine: ndbcluster
    Support: NO
    Comment: Clustered, fault-tolerant, memory-based tables
    *************************** 10. row ***************************
    Engine: FEDERATED
    Support: NO
    Comment: Federated MySQL storage engine
    *************************** 11. row ***************************
    Engine: MRG_MYISAM
    Support: YES
    Comment: Collection of identical MyISAM tables
    *************************** 12. row ***************************
    Engine: ISAM
    Support: NO
    Comment: Obsolete storage engine
    12 rows in set (0.00 sec)

    mysql> exit
    Bye
    [skritzer]$ ls -l /etc/my.cnf
    ls: /etc/my.cnf: No such file or directory
    [skritzer]$ sudo vi /etc/my.cnf

    We trust you have received the usual lecture from the local System
    Administrator. It usually boils down to these two things:

    #1) Respect the privacy of others.
    #2) Think before you type.

    Password:
    [mysqld]
    skip-innodb
    :wq
    "/etc/my.cnf" 2L, 21C written
    [skritzer]$ sudo /etc/init.d/mysql stop
    Shutting down MySQL.. [ OK ]
    [skritzer]$ ps -ef | grep mysql
    skritzer 7253 7179 0 21:07 pts/0 00:00:00 grep mysql
    [skritzer]$ sudo /etc/init.d/mysql start
    Starting MySQL[ OK ]
    [skritzer]$ ps -ef | grep mysql
    root 7261 1 0 21:07 pts/0 00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --pid-file=/var/lib/mysql/mysql.pid
    mysql 7282 7261 0 21:07 pts/0 00:00:00 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --pid-file=/var/lib/mysql/mysql.pid --skip-locking
    skritzer 7290 7179 0 21:07 pts/0 00:00:00 grep mysql
    [skritzer]$ mysql -u root
    Welcome to the MySQL monitor. Commands end with ; or \g.
    Your MySQL connection id is 1 to server version: 5.0.19-standard

    Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

    mysql> show engines\G
    *************************** 1. row ***************************
    Engine: MyISAM
    Support: DEFAULT
    Comment: Default engine as of MySQL 3.23 with great performance
    *************************** 2. row ***************************
    Engine: MEMORY
    Support: YES
    Comment: Hash based, stored in memory, useful for temporary tables
    *************************** 3. row ***************************
    Engine: InnoDB
    Support: YES
    Comment: Supports transactions, row-level locking, and foreign keys
    *************************** 4. row ***************************
    Engine: BerkeleyDB
    Support: NO
    Comment: Supports transactions and page-level locking
    *************************** 5. row ***************************
    Engine: BLACKHOLE
    Support: NO
    Comment: /dev/null storage engine (anything you write to it disappears)
    *************************** 6. row ***************************
    Engine: EXAMPLE
    Support: NO
    Comment: Example storage engine
    *************************** 7. row ***************************
    Engine: ARCHIVE
    Support: YES
    Comment: Archive storage engine
    *************************** 8. row ***************************
    Engine: CSV
    Support: NO
    Comment: CSV storage engine
    *************************** 9. row ***************************
    Engine: ndbcluster
    Support: NO
    Comment: Clustered, fault-tolerant, memory-based tables
    *************************** 10. row ***************************
    Engine: FEDERATED
    Support: NO
    Comment: Federated MySQL storage engine
    *************************** 11. row ***************************
    Engine: MRG_MYISAM
    Support: YES
    Comment: Collection of identical MyISAM tables
    *************************** 12. row ***************************
    Engine: ISAM
    Support: NO
    Comment: Obsolete storage engine
    12 rows in set (0.00 sec)

    mysql> SET autocommit=0;
    Query OK, 0 rows affected (0.00 sec)

    mysql> START TRANSACTION;
    Query OK, 0 rows affected (0.00 sec)

    mysql> COMMIT;
    Query OK, 0 rows affected (0.00 sec)

    mysql> SET autocommit=1;
    Query OK, 0 rows affected (0.00 sec)

    mysql> START TRANSACTION;
    Query OK, 0 rows affected (0.00 sec)

    mysql> COMMIT;
    Query OK, 0 rows affected (0.00 sec)

    No errors or warnings are generated. So technically speaking, I can SET autocommit=0 no matter what storage engines are in use, for any of the varying definitions of “in use”.
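
    What actually happens with a non-transactional table is that its changes take effect immediately, and a later ROLLBACK cannot undo them. A sketch of the behavior (warning text abbreviated):

    mysql> CREATE TABLE t_myisam (id INT) ENGINE=MyISAM;
    mysql> SET autocommit=0;
    mysql> START TRANSACTION;
    mysql> INSERT INTO t_myisam VALUES (1);
    mysql> ROLLBACK;
    -- warning 1196: Some non-transactional changed tables couldn't be rolled back
    mysql> SELECT * FROM t_myisam;
    -- the row is still there: MyISAM wrote it immediately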

    Twins and PlanetMySQL

    What is it about the folks on Planet MySQL having twin brothers?

    Roland Bouman
    Sheeri Kritzer
    Jay Pipes
    Zach Urlocker

    That’s 4 of the top 25 posters to Planet MySQL. Anyone else want to reveal having a twin? Anyone on here have a twin sister? If you’re a twin and aren’t on the Planet, note that here too…
